Ben Martin BA Lörrach, WI 4.Semester 4/21/2002

Business Intelligence/Data Warehouse, 1

Data Warehouse Day 2: Day 1 Review / Recall

Name the phases of the Business Intelligence process!

How would you describe the current business dynamic?

Why focus on Customers and Customer behavior?

How would you describe a Customer?

What is a profitable Customer?

What information do we need to record about them?

What are the technical and logical reasons for a Data Warehouse solution, as opposed to an operational system?

Data Warehouse Glossary: Data Warehousing Requirements

• Independence between data sources and analysis systems (with respect to availability, load, and ongoing changes)

• Permanent provision of integrated and derived data (persistence)

• Reusability of the provided data

• Ability to perform, in principle, any kind of analysis

Data Warehouse Glossary: Data Warehouse Requirements II

• Support for individual views (e.g. with respect to time horizon, structure)

• Extensibility (e.g. integration of new sources)

• Automation of processes

• Unambiguous definition of data structures, access rights, and processes

• Focus on the purpose: analysis of the data

Data Warehouse Glossary: Data Warehouse Characteristics

Application Processing - unstructured, heuristic, analytical

Priorities - Ease of use, flexible access, refresh, query

Processor Use - Highly unpredictable

Response Time - Seconds to hours (data mining may take hours)

Database - usually relational (RDBMS)

Data Content - Organized by subject, partitioned

Nature of Data - Historical

End Users - management, decision makers, knowledge workers

Data Warehouse Glossary: Data Warehouse Characteristics II

User Expectations

• differences in response time may be significant between a DWH and a client-server front-end application

• you need to manage users' expectations regarding response time

• set reasonable and achievable targets for query response, which can be assessed and proved in the first increment of development

• then you can define, specify and agree on an SLA

• Talk to the users!

Data Warehouse Glossary: Data Warehouse Characteristics III

Exponential Growth and Use

• once implemented, a DWH continues to grow in size

• at each refresh, more data is added (or archived)

• a DWH grows very quickly - on the order of gigabytes per month, terabytes per year

• once the success of a DWH implementation is proven, its use increases dramatically

• use often grows faster than expected

Data Warehouse Glossary: Data Warehouse Properties

Data Warehouse Glossary: Data Warehouse Properties II

Data Warehouse Glossary: Data Warehouse Properties III

Subject Areas

• For a given subject - snapshots of data across the business

- different time periods, different emphasis of data view

• Typical subject areas

- Customer accounts

- Product sales

- Customer savings

- Toll calls (telecommunication)

- Airline passenger booking information

- Insurance claim data

Data Warehouse Glossary: Data Warehouse Properties IV

Subject Areas and Warehouse Data Model

• you develop a data model to hold the data that you will use to measure the business

• you include the information that you will use to analyze the business

• you measure the business according to sales figures

• you analyze the sales by Customer, Region, Salesperson, Territory, Store (or any combination)

Subject-oriented information provides departments within a corporation with a common understanding of their business
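
The idea of analyzing sales "by any combination" of dimensions can be sketched as follows. This is only an illustration: the sample rows, the column names and the function total_sales_by are assumptions and do not come from the course material.

```python
from collections import defaultdict

# Hypothetical sales facts: each row carries dimension values plus one measure.
SALES = [
    {"customer": "A", "region": "North", "store": "S1", "amount": 100.0},
    {"customer": "B", "region": "North", "store": "S2", "amount": 50.0},
    {"customer": "A", "region": "South", "store": "S3", "amount": 70.0},
]

def total_sales_by(rows, *dimensions):
    """Sum the 'amount' measure for every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)

by_region = total_sales_by(SALES, "region")
by_customer_region = total_sales_by(SALES, "customer", "region")
```

The same measure (sales) is reused for every view; only the list of dimensions changes.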

Data Warehouse Glossary: Data Warehouse Properties V

Data Warehouse Glossary: Data Warehouse Properties VI

Status of online transaction processing data:

• dispersed across diverse and independent legacy systems

• it is impossible to measure business performance because of

- the diversity of the systems

- inconsistency in the data

- differences in database management systems

- lack of external information

Data Warehouse Glossary: Data Warehouse Properties VII

The DWH integrates the data into one set of quality information, which is:

• meaningful, accurate and intelligible for analysis

Standardization, Integration of Data:

• Naming conventions

• Coding structures

• Physical data attributes

• Measurement of variables

The cleaning and integration process is time-consuming and costly!
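
As an illustration of standardizing naming conventions and coding structures, here is a minimal sketch. The column names, the code mappings and the function standardize are hypothetical; real mappings would come from an analysis of the actual source systems.

```python
# Hypothetical mappings from source-system names and codes to the warehouse standard.
COLUMN_NAMES = {"cust_no": "customer_id", "CUSTOMER_NUM": "customer_id",
                "sex": "gender", "gender_cd": "gender"}
GENDER_CODES = {"m": "M", "male": "M", "1": "M",
                "f": "F", "female": "F", "0": "F"}

def standardize(record):
    """Rename columns to the warehouse convention and unify the gender coding."""
    out = {COLUMN_NAMES.get(name, name): value for name, value in record.items()}
    if "gender" in out:
        code = str(out["gender"]).strip().lower()
        out["gender"] = GENDER_CODES.get(code, "UNKNOWN")
    return out

crm_row = standardize({"cust_no": 17, "sex": "male"})
billing_row = standardize({"CUSTOMER_NUM": 17, "gender_cd": 1})
```

Two records that describe the same customer differently in their source systems end up identical after standardization.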

Data Warehouse Glossary: Data Warehouse Properties VIII

Data Warehouse Glossary: Data Warehouse Properties IX

The time key is a vital database attribute

• analysis of data is over a time period (days, weeks, months, quarters, years)

• database key columns contain an element of time that determines the business period to which the data relates

• the structure and meaning of this element vary with the implementation and the business needs

Refresh Cycles

• must be determined in the early stages of the analysis of the business users' requirements
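
A minimal sketch of how the time element of a key might be derived from a calendar date. The key formats and the function name time_keys are assumptions for illustration; as noted above, the actual structure depends on the implementation and the business needs.

```python
from datetime import date

def time_keys(day):
    """Derive period keys (hypothetical formats) from a calendar date."""
    iso_year, iso_week, _ = day.isocalendar()
    quarter = (day.month - 1) // 3 + 1
    return {
        "day": day.isoformat(),
        "week": f"{iso_year}-W{iso_week:02d}",    # ISO 8601 week numbering
        "month": f"{day.year}-{day.month:02d}",
        "quarter": f"{day.year}-Q{quarter}",
        "year": str(day.year),
    }

keys = time_keys(date(2002, 4, 21))
```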

Data Warehouse Glossary: Data Warehouse Properties X

Grain of Data (granularity)

• the grain is the level of detail at which the data is held in the DWH tables

• operational system: the grain of data is transactional (one record for each transaction)

• the refresh cycle need not have the same grain as the data

• it is more usual to store data in a summarized form by week, month or other business-defined time period

• you may choose to refresh the data warehouse every week while the grain of the data is daily totals (or refresh monthly with a weekly grain, etc.)
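
The difference between grain and refresh cycle can be sketched like this, assuming hypothetical transaction tuples of (day, store, amount): the operational data has one record per transaction, the warehouse holds daily totals, and a whole week is loaded in one refresh.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational data: one record per transaction.
TRANSACTIONS = [
    (date(2002, 4, 15), "S1", 20.0),
    (date(2002, 4, 15), "S1", 5.0),
    (date(2002, 4, 16), "S1", 12.5),
    (date(2002, 4, 16), "S2", 8.0),
]

def daily_totals(transactions):
    """Roll the transactional grain up to one row per (day, store)."""
    totals = defaultdict(float)
    for day, store, amount in transactions:
        totals[(day, store)] += amount
    return dict(totals)

def weekly_refresh(warehouse, transactions):
    """Add one week's worth of daily totals to the warehouse in a single load."""
    warehouse.update(daily_totals(transactions))
    return warehouse

warehouse = weekly_refresh({}, TRANSACTIONS)
```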

Data Warehouse Glossary: Data Warehouse Properties XI

Data Warehouse Glossary: Data Warehouse Properties XII

Changing Data - the following operations are typical of a DWH

• an initial set of data is loaded (first-time load)

• frequent snapshots of core data are added, according to the refresh cycle

DWH data may need to be changed in other ways

• the business determines how much historical data is needed for analysis (older data is archived or purged)

• inappropriate or inaccurate data values may be deleted from or migrated out of the DWH
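
A retention rule of this kind might look as follows. This is a sketch under the assumption that snapshots are keyed by business year; the function apply_retention and the sample data are hypothetical.

```python
def apply_retention(snapshots, current_year, keep_years):
    """Split yearly snapshots into those kept online and those to archive."""
    oldest_kept = current_year - keep_years + 1
    kept = {y: rows for y, rows in snapshots.items() if y >= oldest_kept}
    archived = {y: rows for y, rows in snapshots.items() if y < oldest_kept}
    return kept, archived

# Hypothetical warehouse content keyed by business year.
snapshots = {1998: ["..."], 1999: ["..."], 2000: ["..."], 2001: ["..."], 2002: ["..."]}
kept, archived = apply_retention(snapshots, current_year=2002, keep_years=3)
```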

Data Warehouse Glossary: Enterprise-Wide Data Warehouse

• Stores all data from all subject areas within the business for analysis by end users

• the scope is the entire business and all operational aspects within the business

• normally created through a series of incrementally developed solutions

• EDWH provides:

- a single source of corporate enterprise-wide data

- a single source of synchronized data for each subject area

- a single point for distribution of data to dependent data marts

Data Warehouse Glossary: Data Marts

Purpose

• Provide a view of the DW that is restricted in content (e.g. for a department or for specific functions)

Reasons

• Autonomy, data protection, load distribution, data volume, etc.

Implementation

• Distribution of the DW data

Forms

• Dependent data marts, independent data marts

Data Warehouse Glossary: Data Marts II

Benefits

• provide localization - they serve users at a specific level or for a specific purpose

• smaller and easier to manage than an EDWH

• the need may come from geographical or functional divisions or technical groups within an enterprise

• DMs reduce the demands on warehouse data and also the data access traffic

Data Warehouse Glossary: Data Marts Independent

Data Warehouse Glossary: Data Marts Independent II

• built and loaded directly from the operational systems

• motivation for this kind of implementation:

- Line Of Business (LOB) empowerment

- short time frame for implementation

• the methods for extracting and loading operational data are the same as in the DWH solution

• retrospective integration and transformation into a single DW solution is possible

• Issue: independent data transformation processes

Data Warehouse Glossary: Data Marts Dependent

Data Warehouse Glossary: Data Marts Dependent II

• subset of enterprise-wide data

• built and loaded from the Enterprise DW

• need only extract data from the data warehouse and transport it into themselves, at a higher level of aggregation than the DW

• they don't transform any data (faster, cheaper)

• other advantages

- performance, availability, connection costs

- more resistant to change

- maintains a single version of data

Data Warehouse Glossary: Data Mart Dependent III

Structural extracts

• Restriction to parts of the schema

• Example: only certain key figures or dimensions

Content extracts

• Restriction by content

• Example: only certain branches or the last annual result

Aggregated extracts

• Reduction of granularity

• Example: restriction to monthly results
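
The three kinds of extract can be sketched over a small in-memory table. The rows, column names and function names below are hypothetical and only illustrate the distinction.

```python
from collections import defaultdict

# Hypothetical warehouse rows at daily grain.
DW_ROWS = [
    {"day": "2002-03-30", "branch": "Basel", "product": "P1", "revenue": 10.0, "cost": 4.0},
    {"day": "2002-04-02", "branch": "Basel", "product": "P1", "revenue": 20.0, "cost": 9.0},
    {"day": "2002-04-03", "branch": "Freiburg", "product": "P2", "revenue": 5.0, "cost": 2.0},
]

def structural_extract(rows, columns):
    """Keep only part of the schema (selected dimensions and key figures)."""
    return [{c: row[c] for c in columns} for row in rows]

def content_extract(rows, branches):
    """Keep the full schema but only the rows for the selected branches."""
    return [row for row in rows if row["branch"] in branches]

def aggregated_extract(rows):
    """Reduce granularity from daily to monthly revenue per branch."""
    totals = defaultdict(float)
    for row in rows:
        month = row["day"][:7]              # "YYYY-MM"
        totals[(month, row["branch"])] += row["revenue"]
    return dict(totals)

mart_structural = structural_extract(DW_ROWS, ["day", "revenue"])
mart_content = content_extract(DW_ROWS, {"Freiburg"})
mart_monthly = aggregated_extract(DW_ROWS)
```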

Data Warehouse Glossary: Data Mart Considerations

• avoid disparate data mart solutions

• build towards the enterprise-wide strategy

• consistent use of products, technology and processes is vital

• always employ dependent data mart solutions to avoid disparity problems

Data Warehouse Glossary: Data Mart Characteristics

Priorities - Ease of use, flexible data access

Processor Use - Highly unpredictable

Response Time - Seconds to several minutes

Database - Relational, multidimensional

Data Content - Organized by subject for LOB

Nature of Data - historical (months or weeks rather than years)

Application Processing - unstructured, heuristic, analytical

End Users - same as DW, plus statisticians

Data Warehouse Glossary: Operational Data Store

Data Warehouse Glossary: Operational Data Store

• holds the current data for analysis or application integration

• may form a staging area for the Warehouse

• may contain integrated, clean, summarized data

• summary data has a limited life expectancy

• may be updated

- synchronously with the operational system

- on a store-and-forward basis

• exists in a separate environment

Data Warehouse Glossary: ODS - Characteristics

Priorities - Ease of use, flexible data access

Response Time - Seconds to minutes

Database - relational

Data Content - organized by subject, current value data, integrated

Nature of Data - Dynamic

Processing - structured, analytical

End Users - DBAs, clerical users

Data Warehouse Glossary: Meta Data

Definition: "any kind of information that is needed for the design, construction and use of an information system"

For the DW:

• necessary to cover the information, protection and security needs of the users and the software

• produced and used in all phases

Consistent provision of metadata from different sources is necessary -> Repository

Data Warehouse Glossary: Meta Data Usage

Passive:

• as documentation of the various aspects of a DW system

Active:

• storage of semantic aspects (e.g. transformation rules) and their interpretation at run time

Semi-active:

• storage of structural information (table definitions, configuration specifications) and its use for verification (not directly for execution)
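
To illustrate the difference between active and semi-active use, here is a minimal sketch. The rule format, the operation names and the table definition are assumptions made up for this example, not the format of any real repository product.

```python
# Hypothetical metadata repository entries.
TABLE_DEFINITION = {"customer_id": int, "revenue_eur": float}   # structural metadata
TRANSFORMATION_RULES = [                                        # semantic metadata
    {"target": "customer_id", "source": "cust_no", "op": "to_int"},
    {"target": "revenue_eur", "source": "revenue_cents", "op": "cents_to_units"},
]
OPERATIONS = {"to_int": int, "cents_to_units": lambda v: v / 100}

def transform(source_row):
    """Active use: the stored rules are interpreted at run time."""
    return {rule["target"]: OPERATIONS[rule["op"]](source_row[rule["source"]])
            for rule in TRANSFORMATION_RULES}

def conforms(row):
    """Semi-active use: the table definition only checks a row, it does not build it."""
    return (row.keys() == TABLE_DEFINITION.keys()
            and all(isinstance(row[col], typ) for col, typ in TABLE_DEFINITION.items()))

row = transform({"cust_no": "17", "revenue_cents": 1250})
```

Changing a transformation then means editing a metadata entry rather than program code, which is the flexibility argument made for the repository.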

Data Warehouse Glossary: Meta Data Objects

• Business key figures

• Views for individual user groups

• Transformation of data from the source systems into the DW

• Load routines and rules

• Structure of queries, filters, display templates, ...

Data Warehouse Glossary: Meta Data Objects II

• Administration information: access statistics, backup/recovery, building of aggregates, ...

• Database parameters and settings: server, hardware environment, tuning parameters

• Query performance: precomputed aggregates, caching, optimization strategies

• Granularity of the data

Data Warehouse Glossary: Meta Data Objects III

• General attributes: units of measure, etc.

• Security strategy: user profiles and groups, restrictions on views

• Reporting and analysis objects, reports

Data Warehouse Glossary: Meta Data Repository

Goal 1: Minimize the effort for building and operating a DW

System integration

• integration at the schema and data level requires information about the structure and semantics of the source and target systems

• uniform management of metadata for the integration of the DW tools

Automation of administration

• control of the DW processes via scheduling and configuration metadata

• data about the execution of the processes (logs, etc.)

Data Warehouse Glossary: Meta Data Repository II

Goal 1 (cont.): Minimize the effort for building and operating a DW

Flexible software design

• explicit representation of frequently changing aspects (e.g. transformation rules)

• improved maintainability and extensibility

Protection and security aspects

• handling of access and user rights as metadata

• global access mechanisms

Data Warehouse Glossary: Meta Data Repository III

Goal 2: Ensure an optimal information gain for all user groups

Data quality

• ensuring the required quality through validation rules

• traceability information (source system, author, timestamp, etc.)

Terminology

• uniform terminology as a prerequisite for uniform interpretation

• central management in the metadata repository

Data Warehouse Glossary: Meta Data Repository IV

Goal 2 (cont.): Ensure an optimal information gain for all user groups

Data analysis

• metadata about the meaning of data, key figure systems, ...

Data Warehouse Glossary: Meta Data Functional Requirements

User access

• mechanisms for navigating, filtering and selecting metadata

• support for manual updates

Interoperability and tool support

• programming interface for read and write access

• import and export interfaces

• extensible metamodel

Change management

• version and configuration management

• notification mechanisms

Data Warehouse Architecture: Reference Architecture I

Data Warehouse Architecture: Reference Architecture II

Data Warehouse Architecture: Extraction, Transformation and Load Process (ETL)

• ETL process

• Integration problems

• Data Cleaning

• Data Capture Methods

• Staging Area

• Load Window

This area typically takes 70% of the overall effort in building a DWH!

Data Warehouse Architecture: ETL - Problems

• Large number of sources

• Heterogeneity

• Data volume

• Complexity of the transformation

- schema and instance integration

- data cleaning

• Hardly any end-to-end method and system support, although a large number of tools is available

Data Warehouse Architecture: Extraction, Transformation and Load Process (ETL)

Extraction: selection of a subset of the data from the sources and its provision for transformation

Transformation: adaptation of the data to the specified schema and quality requirements

Load: physical insertion of the data from the staging area into the data warehouse (including any necessary aggregations)
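
The three ETL steps can be sketched end to end with Python's built-in sqlite3 module standing in for the warehouse database. The source rows, table names and column names are hypothetical; a real extract would read from the operational systems.

```python
import sqlite3

# Hypothetical source rows from an operational system.
SOURCE = [
    {"cust": " a ", "amount": "10.50", "status": "ok"},
    {"cust": "B", "amount": "4.50", "status": "ok"},
    {"cust": "a", "amount": "1.00", "status": "cancelled"},
]

def extract(source):
    """Extraction: select the relevant subset of the source data."""
    return [row for row in source if row["status"] == "ok"]

def transform(rows):
    """Transformation: adapt values to the target schema and quality rules."""
    return [(row["cust"].strip().upper(), float(row["amount"])) for row in rows]

def load(conn, staged):
    """Load: move staged rows into the warehouse, aggregating per customer."""
    conn.execute("CREATE TABLE staging (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", staged)
    conn.execute(
        "CREATE TABLE sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total FROM staging GROUP BY customer"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE)))
result = conn.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
```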

Data Warehouse Architecture: ETL - Definition Phase

Data Warehouse Architecture: ETL - Integration Problems

Focus:

• Problems of data integration

Starting point:

• the data resides in the operational information systems, which are different systems

-> Heterogeneity

Data Warehouse Architecture: ETL - Integration Requirements

• it must be possible to load all relevant data from the operational systems into the data warehouse

• conversion of different structures / representations of semantically identical or related data from the source systems into a common representation

• identification of identical information that originates from several systems

• elimination of unwanted redundancy, which can distort analysis results

Data Warehouse Architecture: ETL - Integration Conflicts

• Description conflicts

• Heterogeneity conflicts

• Structural conflicts

These conflict types usually occur in combination.

In addition, and particularly important for data warehouses:

• Data conflicts

Data Warehouse Architecture: ETL - Description Conflicts

• different properties/attributes of the same objects in the local schemas

• homonymous and synonymous names

• data type conflicts / value range conflicts: different data types / value ranges for the same property

• scaling conflicts: use of different units of measure that can be converted into one another

Examples?
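
One possible example, sketched under the assumption that two sources deliver a weight as text or number and in different units. The unit table and the function resolve_weight are hypothetical.

```python
# Hypothetical conversion factors to the warehouse's common unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def resolve_weight(value, unit):
    """Resolve a data type conflict (text vs number) and a scaling conflict (unit)."""
    number = float(str(value).replace(",", "."))   # accepts 2500, "2.5" and "2,5"
    return number * TO_KILOGRAMS[unit]

weights = [
    resolve_weight("2,5", "kg"),   # text with decimal comma
    resolve_weight(2500, "g"),     # integer in grams
    resolve_weight("1", "lb"),     # text in pounds
]
```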

Data Warehouse Architecture: ETL - Heterogeneity Conflicts

• different data models of the schemas to be integrated

• different modeling constructs and expressive power often imply structural conflicts as well

• resolved by transformation into a common global data model

• Example: object-oriented DB versus relational model

Data Warehouse Architecture: ETL - Structural Conflicts

• even when the same data model is used (object-oriented or relational), the same facts are often modeled differently, especially with semantically rich data models (with many modeling constructs)

• Example?

Data Warehouse Architecture: ETL - Data Conflicts

A. Incorrect data

1. incorrect entries

2. outdated data

B. Different representations

1. different expressions

2. different units

3. different precision

Examples?

Data Warehouse Architecture: ETL - Data Cleaning

Correction of incorrect, inconsistent or incomplete data. Also: Data Cleansing, Data Scrubbing

Techniques:

- conversion of different formats (e.g. text files into DB tables via Oracle SQL*Loader)

- mapping of data fields to a common format (strings in upper case / date format: dd/mm/yyyy / currencies)

- use of special tools is possible (often dictionary-based)

Examples:

• product names in the pharmaceutical sector, addresses via address databases (postal codes, telephone area codes)

• synonyms and abbreviations ("Str." for "Straße")
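
Two of these techniques, mapping to a common format and dictionary-based expansion of abbreviations, might be sketched as follows. The abbreviation dictionary, the list of accepted input date formats and the function names are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical abbreviation dictionary, in the spirit of dictionary-based tools.
ABBREVIATIONS = {"Str.": "Straße", "Pl.": "Platz"}
# Input formats assumed to occur in the sources; the first one is the target format.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")

def clean_date(text):
    """Map several input date formats to the common format dd/mm/yyyy."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def clean_street(text):
    """Expand known abbreviations, then convert the string to upper case."""
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words).upper()

street = clean_street("Basler Str. 12")
dates = [clean_date("21.04.2002"), clean_date("2002-04-21"), clean_date("21/04/2002")]
```

Note that Python's upper() turns "ß" into "SS", so the street above becomes "BASLER STRASSE 12".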

Data Warehouse Architecture: ETL - Data Capture Methods

Problem:

• after the initial load, incremental loads need to identify only the data that has changed on the source system

Triggers on the operational system

• whenever a record changes, the changed value is written to a file - problem: performance impact on the database of the operational system

Operational system generates a delta file

• code can be added to the operational system to generate a file containing the changed records - problem: code has to be added to the operational system
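A minimal sketch of the trigger approach, using SQLite from Python purely for illustration. The table and trigger names are assumptions, and a change table stands in for the file mentioned above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE customer_changes (id INTEGER, old_city TEXT, new_city TEXT);
    -- The trigger runs as part of every update on the operational table,
    -- which is where the performance cost comes from.
    CREATE TRIGGER capture_city AFTER UPDATE OF city ON customer
    BEGIN
        INSERT INTO customer_changes VALUES (OLD.id, OLD.city, NEW.city);
    END;
    INSERT INTO customer VALUES (1, 'Basel'), (2, 'Freiburg');
""")

# An ordinary operational update; the trigger records the change.
con.execute("UPDATE customer SET city = 'Lörrach' WHERE id = 1")
changes = con.execute("SELECT * FROM customer_changes").fetchall()
```

The incremental load would then read only `customer_changes` instead of scanning the whole `customer` table.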

Data Warehouse Architecture: ETL - Data Capture Methods

Analyze the log file of the operational system

• a copy of the log file can be used, checking the LAST UPDATE DATE field - recommended method

• Example ?

Compare the current extract to the last extract

• obtain an extract file containing the latest snapshot of the operational data

• compare it with the last extract file

• insert the changes into the warehouse - most commonly used method
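The extract comparison can be sketched as a snapshot diff. This is an illustrative assumption of how the comparison might look when each extract is held as a mapping from primary key to row; the sample data is invented.

```python
def diff_extracts(previous, current):
    """Compare two snapshots keyed by primary key and classify the changes."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserted, updated, deleted


# Invented sample extracts: key -> (name, city)
last_extract = {1: ("Meier", "Basel"), 2: ("Huber", "Freiburg")}
current_extract = {1: ("Meier", "Lörrach"), 3: ("Keller", "Weil")}

inserted, updated, deleted = diff_extracts(last_extract, current_extract)
```

Only the three resulting change sets need to be applied to the warehouse, at the cost of keeping and scanning two full extracts.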

Data Warehouse Architecture: ETL - Staging Area

• contains the tables that are transported to the data warehouse platform

• supplies the warehouse with both the first-time load and the regular refresh

• a typical requirement of a DWH implementation

• it may be an Operational Data Store (ODS), a series of tables in a relational database server, or flat files manipulated using in-house scripts or programs

• Multi-tier staging (optional)

Data Warehouse Architecture: ETL - Load Window

Data Warehouse Architecture: ETL - Load Window

• the load window is simply the amount of time available to extract, transform, load and post-load process the data and to make the data warehouse available to the user

• the load performs many sequential tasks that take time to execute

• you must ensure that every event that occurs during the load window is planned, tested, proven and constantly monitored

• you may have to face poor load performance and gaps in the data provided for user access

• careful planning, defining, testing and scheduling are critical !

Data Warehouse Architecture: ETL - Load Window

Load Window Strategy

• load time depends on a number of factors such as data volumes, network capacity and load utility capabilities

• consider the user requirements first - then work out the load schedule backwards from that point

Load Recovery

• you may also have to allow sufficient time within the batch load window to recover to a logical business point in time (up to the close of business on the previous day)
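Working the schedule backwards can be shown as simple date arithmetic. The step durations, the recovery allowance and the 07:00 availability target below are invented figures for illustration only.

```python
from datetime import datetime, timedelta


def latest_load_start(available_at, step_durations, recovery_allowance):
    """Work backwards from the time users need the data."""
    total = sum(step_durations.values(), timedelta()) + recovery_allowance
    return available_at - total


steps = {  # assumed durations of the sequential load tasks
    "extract": timedelta(hours=1),
    "transform": timedelta(hours=2),
    "load": timedelta(hours=1, minutes=30),
    "post-load": timedelta(minutes=30),
}

# Users need the warehouse at 07:00; one hour is reserved for recovery.
start = latest_load_start(
    datetime(2002, 4, 22, 7, 0), steps, timedelta(hours=1)
)
```

With these figures the load must start no later than 01:00 on the same day.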

Warehouse Data Schemas: Overview

Warehouse Data Schemas: Overview

Warehouse / Mart will contain a large number of objects:

• Core Objects

- Fact Data - Tables

- Dimensional Data - Tables

- Reference Data - Tables

- Summary Data - Tables

Warehouse Data Schemas: Star Schema

Warehouse Data Schemas: Star Schema II

• a single, large central table surrounded by a number of smaller tables radiating from it, connected by database primary and foreign keys

• the outlying tables are dimension tables; they control the query because they contain the data found in the query predicates

• the most dominant warehouse schema

• a DWH will contain many stars, not just one; each subject area will have its own fact table

• many fact tables may share dimensions (e.g. time)
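A minimal star, sketched with SQLite from Python. All table names, column names and figures are assumptions chosen for illustration; the point is the central fact table keyed by its dimensions and a query whose predicate sits on a dimension attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- Central fact table: its key is made up of the dimension keys.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL,
        PRIMARY KEY (time_id, product_id)
    );
    INSERT INTO dim_time VALUES (1, '2002-03'), (2, '2002-04');
    INSERT INTO dim_product VALUES (10, 'Pharma'), (11, 'Cosmetics');
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 150.0), (2, 11, 70.0);
""")

rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_product p ON p.product_id = f.product_id
    WHERE t.month = '2002-04'          -- predicate on a dimension attribute
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

A second fact table (for example for returns) could reuse `dim_time` unchanged, which is what sharing a dimension means here.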

Warehouse Data Schemas: Star Schema III

Warehouse Data Schemas: Snowflake Schema

Warehouse Data Schemas: Snowflake Schema II

• closer to an entity relationship diagram than the classic star model

• the dimension data is normalized

• developing a snowflake model means building class hierarchies out of each dimension (normalizing data)
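A small sketch of what normalizing a dimension looks like, again with SQLite from Python. The product/category hierarchy and all names are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- In a star, category_name would be repeated on every product row.
    -- In a snowflake, the hierarchy level gets its own table.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'Pharma');
    INSERT INTO dim_product VALUES (10, 'Product A', 1), (11, 'Product B', 1);
    INSERT INTO fact_sales VALUES (10, 100.0), (11, 50.0);
""")

total = con.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id  -- the extra join
    GROUP BY c.category_name
""").fetchone()
```

The category name is stored once, but every category-level query pays for one more join than it would in a star.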

Warehouse Data Schemas: Snowflake Schema III

Warehouse Data Schemas: Star Schema

Advantages:

• easy to understand, the structure is simple and straightforward

• provides fast response to queries through optimization and a reduced number of joins between fact and dimension tables

• supported by many front-end tools

Disadvantages:

• may require more frequent rebuilding

• slow to build because of the level of denormalization

• not easy to design and use if you need to maintain the history of data or a hierarchy within a dimension

Warehouse Data Schemas: Snowflake Schema

Advantages:

• certain advanced DSS tools and servers can use this structure directly

• provides a structure that is easier to change as requirements change

• loading data into smaller normalized tables is quicker than loading into huge denormalized tables

Disadvantages:

• the large number of dimension hierarchy tables may make the model unmanageable

• more joins may mean that performance declines

Warehouse Data Schemas: Fact Table

• comprises the bulk of the data within the data warehouse, many millions of rows

• holds the numerical measurements of business performance, such as sales figures or customer banking transactions

• is accessed by data values stored in dimension tables

• contains a multi-part primary key; each part of the key references a dimension by which the fact data is accessed

• you should consider the design of the fact table extremely carefully

Warehouse Data Schemas: Fact Table - Granularity

Granularity - Level of Detail

• individual transactions, daily snapshots, monthly, quarterly

• high level of detail: transaction/daily

• low level of detail: week/month ...

• determines the size of the data warehouse

• the level of granularity is defined by the users, not by technical restrictions
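How granularity drives size can be made concrete with a rough row count. The customer count and retention period below are invented figures, and the model (one fact row per customer per period) is a deliberate simplification.

```python
def fact_rows(customers, periods_per_year, years):
    """Rough row count, assuming one fact row per customer and period."""
    return customers * periods_per_year * years


# Assumed: 100,000 customers, three years of history.
daily_rows = fact_rows(100_000, 365, 3)    # daily snapshots
monthly_rows = fact_rows(100_000, 12, 3)   # monthly snapshots
```

Under these assumptions daily snapshots need 109,500,000 rows against 3,600,000 for monthly snapshots, roughly a factor of 30.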

Warehouse Data Schemas: Fact Table - Design Considerations

• access performance, flexibility and manageability

Partitioning

• Horizontal: the fact table is broken into a number of smaller tables (load into one table, performance)

• Vertical: the fact table is sliced into a number of narrower tables (performance, different user groups)
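The two partitioning directions can be illustrated on a handful of in-memory rows. This is a conceptual sketch only; real databases implement partitioning in the storage layer, and the column names and data are assumptions.

```python
from collections import defaultdict


def partition_horizontally(rows, key):
    """Split rows into smaller tables, one per key value (e.g. per month)."""
    parts = defaultdict(list)
    for row in rows:
        parts[key(row)].append(row)
    return dict(parts)


def partition_vertically(rows, key_column, column_groups):
    """Slice rows into narrower tables; each keeps the key column."""
    return [
        [{c: row[c] for c in (key_column, *cols)} for row in rows]
        for cols in column_groups
    ]


sales = [
    {"sale_id": 1, "month": "2002-03", "amount": 100.0, "discount": 0.0},
    {"sale_id": 2, "month": "2002-04", "amount": 150.0, "discount": 5.0},
    {"sale_id": 3, "month": "2002-04", "amount": 70.0, "discount": 0.0},
]

by_month = partition_horizontally(sales, key=lambda r: r["month"])
narrow = partition_vertically(
    sales, "sale_id", [("month", "amount"), ("discount",)]
)
```

A new month's load only touches one horizontal partition, while each vertical slice can serve a user group that needs only those columns.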

Warehouse Data Schemas: Dimension Data Tables

Warehouse Data Schemas: Dimension Data Tables II

Updating Dimension Data

• not refreshed in the same way as fact data

• changes in a dimension table are updates rather than inserts

• Example ?
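One possible example, sketched with SQLite from Python: during a refresh, new facts are appended while a changed customer attribute overwrites the existing dimension row. Table names and data are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Basel');
    INSERT INTO fact_sales VALUES (1, 100.0);
""")

# Refresh: a new fact is inserted as an additional row ...
con.execute("INSERT INTO fact_sales VALUES (1, 40.0)")
# ... while the customer's move overwrites the existing dimension row.
con.execute("UPDATE dim_customer SET city = 'Lörrach' WHERE customer_id = 1")

fact_count = con.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
dimension_rows = con.execute("SELECT * FROM dim_customer").fetchall()
```

Note that a plain overwrite like this discards the old city, so the earlier sale can no longer be attributed to it.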

Warehouse Data Schemas: Dimension Data Tables III

Warehouse Data Schemas: Dimension Data Tables - Time

Time in different environments:

Operational

• an up-to-date snapshot of the business transactions at any point in time

• the time element changes constantly; the system does not contain a significant amount of historical data

Warehouse

• provides an explicit time series of data

• snapshots of the operational system are moved into the warehouse in a series of layers

Warehouse Data Schemas: Dimension Data Tables - Time II

Warehouse Data Schemas: Reference Data Tables

Warehouse Data Schemas: Summary Data

Warehouse Data Schemas: Summary Data Tables

Warehouse Data Schemas: Summary Data Tables II

Performance

• improves query performance by giving queries direct access to pre-computed summaries and pre-defined views

• because of its effect on user acceptance, one of the most important implementation considerations of a warehouse

Content

• based on data stored in dimension tables (Customer attributes)

Number of tables

• hundreds

Warehouse Data Schemas: Summary Data Tables III

Summaries are stored as additional tables or even within the fact tables (a separate level indicator field/index is used)

Benefits of Separate Summary Fact Tables

• easier to manage: created, dropped, loaded and indexed separately

• accessed faster than summaries embedded within the facts

but: as this information must refer to dimensional data, additional dimension tables may also have to be created
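A separate summary fact table can be sketched as a pre-computed aggregate, here with SQLite from Python. The table names, the month/region summary level and the data are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2002-04-01', 'South', 100.0),
        ('2002-04-02', 'South', 50.0),
        ('2002-04-02', 'North', 70.0);
    -- Separate summary table: it can be dropped, rebuilt and indexed
    -- without touching the detailed fact table.
    CREATE TABLE sum_sales_month_region AS
        SELECT substr(sale_date, 1, 7) AS month,
               region,
               SUM(amount) AS amount
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), region;
""")

summary = con.execute(
    "SELECT month, region, amount FROM sum_sales_month_region ORDER BY region"
).fetchall()
```

Monthly queries now read the small summary table instead of aggregating the detailed facts each time; a month-level time dimension would be the kind of additional dimension table mentioned above.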

Managing the Warehouse: Sizing Storage

Attention must be paid to storage requirements for the warehouse:

• Data - facts, dimensions, reference and summary tables

• Staging file store

• Indexes

• Backup and Recovery Strategies

• temporary files

• log files

The database should be three to four times the size of the base fact table
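The rule of thumb can be turned into a quick estimate. The row count and the average row width below are invented figures; only the factor of three to four comes from the slide.

```python
def estimate_database_size_gb(fact_rows, bytes_per_row, factor):
    """Scale the raw size of the base fact table by a rule-of-thumb factor."""
    base_fact_gb = fact_rows * bytes_per_row / 1e9
    return base_fact_gb * factor


# Assumed: 100 million fact rows of about 100 bytes each (10 GB raw).
low = estimate_database_size_gb(100_000_000, 100, 3)
high = estimate_database_size_gb(100_000_000, 100, 4)
```

For these assumed figures the database would need roughly 30 to 40 GB, with the extra space going to indexes, summaries, staging, temporary and log files.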

Managing the Warehouse: Sizing Storage II

Managing the Warehouse: Sizing Storage III

Managing the Warehouse: Monitoring and Performance Tuning

• Not the same as OLTP - DBAs are not there to hunt and kill expensive queries

• DWH - high throughput, insert/update intensive systems

• may contain large volumes of data that grow continuously and are accessed concurrently by hundreds of users

Tuning goals are:

• availability

• Transaction speed

• Concurrency (numbers of users and transactions)

• Recoverability

Managing the Warehouse: Monitoring and Performance Tuning II

Techniques depend on the database vendor (Oracle, IBM ...)

• parallel query option

Managing the Warehouse: Monitoring and Performance Tuning III

Managing the Warehouse: Monitoring and Performance Tuning IV

Managing the Warehouse: Monitoring and Performance Tuning V

• Partitioning

- by dimension (region, time)

- high query performance and high scalability

- high availability as each partition can be managed independently

- faster backup and restore, as operations can be done on an individual partition

Managing the Warehouse: Monitoring and Performance Tuning VI

Managing the Warehouse: Monitoring and Performance Tuning VII

Managing the Warehouse: Monitoring and Performance Tuning VIII

Managing the Warehouse: Monitoring and Performance Tuning IX

Managing the Warehouse: Archiving Data

• old data may need to be archived

• you need to identify an archive frequency

• use the partitioning option for archiving

• archiving by dimension

• purge data and move the details to the archive

• plan and design early !
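Archiving by partition can be sketched with time-based partitions held in a dictionary. This is a conceptual illustration only: the month keys, the cut-off and the in-memory "archive" are assumptions, and a real system would detach partitions at the database level.

```python
def archive_old_partitions(partitions, archive, keep_from):
    """Move every partition older than keep_from into the archive."""
    # sorted() returns a new list, so popping while looping is safe.
    for month in sorted(partitions):
        if month < keep_from:
            archive[month] = partitions.pop(month)
    return partitions, archive


# Invented fact partitions keyed by month (YYYY-MM sorts chronologically).
partitions = {
    "2001-12": [("sale", 10.0)],
    "2002-03": [("sale", 20.0)],
    "2002-04": [("sale", 30.0)],
}
online, archived = archive_old_partitions(partitions, {}, keep_from="2002-03")
```

Because whole partitions are moved, the purge does not have to scan or delete individual rows.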

Managing the Warehouse: Archiving Data II

Managing the Warehouse: Backup and Recovery

• Strategy needs to be developed early in the project

• technology and approach are driven by the user requirements

• Impact of: partitioning, batch load window

• hot, cold, standby approaches, full, incremental

• what: facts, dimensions & reference, dependent data marts

• when: before DWH refresh ?, after ?, before & after ?

• Recovery: structure, data

• export/import

Managing the Warehouse: Hardware Architectures

• SMP - Symmetric Multiprocessing

• Cluster - Processor Cluster

• MPP - Massively Parallel Processing

• NUMA - Non-Uniform Memory Access

• Hybrids combine SMP and MPP

Managing the Warehouse: Hardware Architectures II

Managing the Warehouse: Hardware Architectures - SMP

Managing the Warehouse: Hardware Architectures - SMP II

Managing the Warehouse: Hardware Architectures - Clusters

Managing the Warehouse: Hardware Architectures - Clusters II

Managing the Warehouse: Hardware Architectures - NUMA

Managing the Warehouse: Hardware Architectures - NUMA II

Managing the Warehouse: Hardware Architectures - MPP

Managing the Warehouse: Hardware Architectures - MPP II