Mads Brink Hansen, Rehfeld Partners
DESCRIPTION
Application of Columnstore and In-Memory Technologies in Business Intelligence Projects. Presentation from the InfinIT theme day on 28 October 2014: Modern Analytical Database Technology
TRANSCRIPT
APPLICATION OF COLUMNSTORE
AND IN-MEMORY TECHNOLOGIES
IN BUSINESS INTELLIGENCE
PROJECTS
Mads Brink Hansen
Principal Consultant, Rehfeld Partners A/S
External Lecturer, Aarhus University
03-11-2014
What is Business Intelligence
Data → Information → Analytics → Knowledge → Wisdom
The traditional Kimball Lifecycle
Business Requirements
Definition
BI App. Design
BI App. Development
Dimensional Model Design
Physical Design
Technical Architecture
Design
Product Selection and
Installation
ETL Design & Development
Technical Deployment
Program / Project Planning
Growth
Maintenance
Business Development
Technical Development
Other roles
Organizational Deployment
BI Architecture
• [Too long] Time-to-Market
• [Too poor] Data Quality
• [Too poor] User Adoption
• Lack of goals
• Complex technical solutions
So Why does BI Fail?
• Changed and New Processes
• Better training and/or practices
• New Technology
What to do about it?
Agile BI [still Kimball]
Customer Representatives
Business Requirements
Definition
ETL Design & Development
Dimensional Model Design/
Development
BI App. Design/
Development
Business Requirements
Definition
Program / Project Planning
Technical Architecture
Design
Technical (Architecture) Deployment
Organizational Deployment
Sprint 0..a Sprint 1..n
Back-log
User Stories
Architecture Model
Developer Stories
Customer Representatives
Bus Architecture Matrix
Application of ColumnStore and
In-Memory technologies
In-Memory during development
In-Memory for Analysis
In-Memory for new opportunities
ColumnStore for Analysis
• Data Audit in the Kimball Lifecycle
– Data Identification
– Data Profiling
• Descriptive statistics for all relevant columns
– Insight into data
• Verification of Business-logic
• Input for ETL-coding
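The profiling step above (descriptive statistics for all relevant columns) can be sketched in a few lines. This is a minimal illustration and not the tooling used in the talk: the function names `profile_column` and `profile_table` and the sample rows are hypothetical, and the table is assumed to fit in memory as a list of dicts.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return simple descriptive statistics for one column of values."""
    non_null = [v for v in values if v is not None]
    stats = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    # A mean is only meaningful for numeric columns.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        stats["mean"] = mean(non_null)
    return stats


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = rows[0].keys() if rows else []
    return {col: profile_column([r[col] for r in rows]) for col in columns}


# Hypothetical sample rows, invented for illustration only.
orders = [
    {"customer": "A", "amount": 100, "country": "DK"},
    {"customer": "B", "amount": 250, "country": None},
    {"customer": "A", "amount": 40, "country": "DK"},
]
profile = profile_table(orders)
```

Null counts and distinct counts of this kind are the sort of findings that can verify business logic and feed into ETL coding.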
In-Memory during development 1#3
• Traditional Approach
– Write and execute a Query for all relevant columns
→ Table Scan
→ Static Analysis
• A table scan in an Oracle database takes 4-5 h per 200 million records → Data Profiling is very often not performed → Data Quality Risks
In-Memory during development 2#3
• Using in-memory engine
– Off-load the relevant table[s] to the in-memory engine
→ Table scan on source Database
→ Dynamic Analysis [Pivot-table and Graphical]
• Off-load is performed during off-hours; analysis is done the next day → Better Data Quality
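The off-load pattern can be sketched as follows: the source table is scanned once, and every later ad hoc question runs against the in-memory copy. The sketch uses an SQLite `:memory:` database as a stand-in for the in-memory engine and a second SQLite connection as a stand-in for the source database; both are assumptions for illustration, as are the `sales` table and the `offload_table` helper.

```python
import sqlite3


def offload_table(source, target, table):
    """Copy one table from the source connection into the in-memory target.

    The source is read exactly once (a single full scan). The table name is
    trusted input here; a real tool would validate it.
    """
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    target.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", cursor)
    target.commit()
    return columns


# Stand-in for the source database, with invented sample data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("bike", "north", 100.0),
        ("bike", "south", 150.0),
        ("helmet", "north", 20.0),
        ("helmet", "north", None),
    ],
)

# Off-hours: one scan of the source fills the in-memory engine.
engine = sqlite3.connect(":memory:")
offload_table(source, engine, "sales")

# Next day: repeated ad hoc questions no longer touch the source.
by_region = engine.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
null_amounts = engine.execute(
    "SELECT COUNT(*) FROM sales WHERE amount IS NULL"
).fetchone()[0]
```

The design point is that the cost of the slow scan is paid once, while the analysis itself stays interactive.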
In-Memory during development 3#3
• Performing Analytics on large volumes
of data
In-Memory for Analysis
Data Volume
• In-Memory Client: Data Volumes limited by Client memory
• In-Memory Server: Data Volumes limited by Server memory
• ColumnStore Server: Data Volumes can exceed Server memory [e.g. SSD memory extension]
Example: 10 million records (10 GB)
• After load to Excel In-Memory: 2.9 GB RAM
• Response time (Count Distinct): < 1 s
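The slide does not say why the loaded data takes less RAM than the raw records or why Count Distinct is fast. One technique commonly used by column-oriented engines is dictionary encoding, sketched below as an assumption rather than as a description of Excel's engine. The function names and the sample column are hypothetical.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary of distinct values, list of integer codes)."""
    dictionary = []   # each distinct value stored once
    index = {}        # value -> position in the dictionary
    codes = []        # one small integer per row
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def count_distinct(dictionary):
    """With a dictionary-encoded column, Count Distinct is the dictionary size."""
    return len(dictionary)


def decode(dictionary, codes):
    """Rebuild the original column from its encoded form."""
    return [dictionary[code] for code in codes]


# Invented sample column with repeated values.
city = ["Aarhus", "Odense", "Aarhus", "Aarhus", "Odense", "Aalborg"]
dictionary, codes = dictionary_encode(city)
distinct_cities = count_distinct(dictionary)
```

Repeated values are stored once, and the per-row codes are small integers, which is where the memory saving would come from under this assumption.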
• Client or Server Analysis
• Server
+ Larger Data Volumes
- Fixed Data Model
• Client
- Data Volumes limited by local RAM
+ Flexible Data Model → Self Service BI
In-Memory for Analysis
Self Service BI
Personal BI
Team BI
Enterprise BI
Non-Enterprise BI-data
• Analysis in relational databases
– Existing tools
– Special tools for relational analysis
+ Data Volumes can exceed RAM [and still perform using e.g. SSD disks and SSD-based RAM extension]
+ Flexible [within the Database]
- [Database] External data?
ColumnStore for Analysis
• The traditional Dimensional Model is
great – but has some limitations
– Requires PK-FK-relationships in the
database
→ issues in special situations
• Inventory [Periodic Snapshot Facts]
• Market Basket [non-key relationship for filtering]
In-Memory for new opportunities
• Example – Stock-at-hand
Inventory 1#4
• Star-schema data model
• Problem – a record must exist for all products [and other dimensions] at all dates
Inventory 2#4
• The ETL-process or Query must "fill the gaps"
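Filling the gaps can be sketched as a carry-forward: for every product and every date in the period, emit a row, reusing the last known quantity when no record exists for that day. The `fill_gaps` function, the choice of 0 before the first record, and the sample data are assumptions made for illustration.

```python
from datetime import date, timedelta


def fill_gaps(snapshots, start, end):
    """Expand sparse {(product, day): quantity} records into one row per product per day.

    Days without a record carry forward the last known quantity; days before
    a product's first record are assumed to have quantity 0.
    """
    products = sorted({product for product, _ in snapshots})
    rows = []
    for product in products:
        quantity = 0
        day = start
        while day <= end:
            quantity = snapshots.get((product, day), quantity)
            rows.append((product, day, quantity))
            day += timedelta(days=1)
    return rows


# Invented sparse stock records: only days where the level was recorded.
sparse = {
    ("bike", date(2014, 11, 1)): 10,
    ("bike", date(2014, 11, 4)): 7,
    ("helmet", date(2014, 11, 2)): 5,
}
dense = fill_gaps(sparse, date(2014, 11, 1), date(2014, 11, 4))
```

Three source records become eight rows here. The output grows with products times dates, which is why this approach gets heavy for large dimensions.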
Inventory 3#4
• What does it take to make the calculation "on the fly"?
– The transaction table must be scanned and a running total calculated
– Scanning a table in a traditional database takes too much time to be performed "on the fly"
– The same scan can be feasible in an In-Memory table
The In-Memory technology provides a leaner, more efficient and elegant solution
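The on-the-fly calculation can be sketched as a scan over the transaction table with a running total, so no gap-filled snapshot rows need to be stored. The functions `stock_at` and `running_stock` and the sample transactions are hypothetical; the transactions are assumed to be held in memory as (product, day, quantity change) tuples.

```python
from datetime import date


def stock_at(transactions, product, as_of):
    """Scan all transactions and sum the quantity changes up to a given date."""
    return sum(
        change
        for item, day, change in transactions
        if item == product and day <= as_of
    )


def running_stock(transactions, product):
    """Return (day, stock level) after each transaction of a product, in date order."""
    total = 0
    levels = []
    for item, day, change in sorted(transactions, key=lambda t: t[1]):
        if item == product:
            total += change
            levels.append((day, total))
    return levels


# Invented stock movements: positive = goods received, negative = goods sold.
transactions = [
    ("bike", date(2014, 11, 1), 10),
    ("helmet", date(2014, 11, 2), 5),
    ("bike", date(2014, 11, 4), -3),
]
bike_on_nov_3 = stock_at(transactions, "bike", date(2014, 11, 3))
bike_levels = running_stock(transactions, "bike")
```

Every question triggers a full scan, which is only practical when the scan itself is fast, as in an in-memory table.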
Inventory 4#4