online analytical processing - vaibhav bajpai · 2020-03-01 · architecture information sources...

38
Online Analytical Processing Vaibhav Bajpai

Upload: others

Post on 28-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Online Analytical Processing

Vaibhav Bajpai

Page 2: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Decision Support Systems

Architecture

Information Sources

Data Warehouse

OLAP Servers

OLAP Clients

Page 3: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

DSS

Architecture

DSS are used to make business decisions based on data collected by OLTP.

Page 4: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Architecture

Information Sources

Operational Databases

ERP system

Semi-Structured Sources

does not conform with formal structure of relational data models but contains tags to separate semantic elements (XML).

Page 5: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Architecture

ETL

Extract-Transform-Load is a process involving

Extracting data from information sources.

Transforming to fit operational needs. (cleansing)

Loading it into the Data Warehouse.

Page 6: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Architecture

Data Warehouse

DW is a repository of data to support management decision making process.

Characteristics

Basic Elements

Conceptual Models

Data Marts

Page 7: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Warehouse

Characteristics

Subject-Oriented

Integrated (security, single-version)

Time Variant (particular time-period)

Non-Volatile (never removed!)

Page 8: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Warehouse

Basic Elements

Facts (measures + fKeys to dimTables)

Measures (additive + non-additive + semi-additive)

Dimensions (measures from different perspective)

Hierarchies (classification of dimensions)

Page 9: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Warehouse

Conceptual Models

Star Schema

Snowflake Schema (normalized)

Galaxy Schema (many fact tables)

Page 10: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Conceptual Models

Star Schema

good for large data warehouses!

Page 11: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Conceptual Models

Snowflake Schema

good for small data warehouses!

Page 12: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Warehouse

Data Marts

DM is a subset of DW oriented to specific business line or team.

It is an access layer of DW used to get data out to the users.

Page 13: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Mining

process of extracting patterns from data.

transforms data into business intelligence.

Page 14: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Data Mining

Page 15: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

OLAP

Data Cube

Aggregation

Navigational Operations

Page 16: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

core of an OLAP system.

provides the m-Dim way to look at the summary of the data.

m-Dim generalization of group by operator

are sparse in nature

OLAP

Data Cube

Page 17: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

pre-calculated summaries of data.

answers are ready before the questions!

improve query response time.

aggregations are stored in the data cube

OLAP

Aggregation

Page 18: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Roll-up (lower to higher aggregation)

Drill-down (higher to lower aggregation)

Slicing

Dicing

Pivot

OLAP

Navigational Operations

Page 19: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

a subset of a mDim array corresponding to a single value for one or more members of the dimensions not in the subset.

Navigational Operations

Slicing

Page 20: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

is a slice on more than two dimensions of a cube.

i.e. more than two consecutive slices.

Navigational Operations

Dicing

Page 21: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Navigational Operations

Pivoting

Page 22: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

OLAP Approaches

ROLAP

MOLAP

HOLAP

Page 23: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Overview

Architecture

Methods of Cubing

Performance Evaluation

OLAP Approaches

ROLAP

Page 24: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

uses a RDBMS as a source.however, a DB designed for OLTP will not function well with ROLAP.

does not require pre-aggregation

generate SQL queries at appropriate level at request time.

ROLAP

Overview

Page 25: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

scalable!

needs additional attributes to define position in m-Dim space.

ROLAP

Architecture

Page 26: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Sort-based Methods (pipeSort)

Hash-based Methods (pipeHash)

ROLAP

Methods of Cubing

Page 27: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

ROLAP is CPU-bound.

slow

requires more disk

requires more IOs

requires more IO time

ROLAP

Performance Evaluation

Page 28: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Overview

Architecture

Storage Issues

OLAP Approaches

MOLAP

Page 29: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

core is a m-Dim data cube.

allows position based computation.

the cube is *very* sparse (not-scalable).

extremely fast!

MOLAP

Overview

Page 30: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

MOLAP

Architecture

pre-aggregated cubes

formatting data to user’s

needs

data requests

}

Page 31: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Chunking

Chunk-offset Compression

MOLAP

Storage Issues

Page 32: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

dividing m-Dim array into small chunks.

allows chunks to fit into available memory for in-memory computations.

MOLAP | Storage Issues

Chunking

Page 33: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

store a pair for each valid entry.

solves the sparse-array problem.

MOLAP | Storage Issues

Chunk-offset Compression

Page 34: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Overview

Architecture

OLAP Approaches

HOLAP

Page 35: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

store detailed data in RDBMS

store aggregated data in MDBMS

HOLAP

Overview

Page 36: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

HOLAP

Architecture

Page 37: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Performance is a concern?

MOLAP!

Data Volume and Scalability is a concern?

ROLAP!

When to choose What?

Page 38: Online Analytical Processing - Vaibhav Bajpai · 2020-03-01 · Architecture Information Sources Operational Databases ERP system Semi-Structured Sources does not conform with formal

Thank You!