Technical Deep-Dive in a Column-Oriented In-Memory Database
Martin Faust [email protected]
Research Group of Prof. Hasso Plattner
Hasso Plattner Institute for Software Engineering University of Potsdam
Enterprise Platform and Integration Concepts Group - Topics
Research Group of Prof. Dr. h.c. Hasso Plattner
□ Research focuses on the technical aspects of enterprise software and the design of complex applications
  § In-Memory Data Management for Enterprise Applications
  § Human-Centered Software Design and Engineering
  § Programming Model for Enterprise Applications
  § Maintenance and Evolution of Service-Oriented Enterprise Software
  § RFID Technology in Enterprise Platforms
□ Teaching activities:
  § Lectures, Exercises, Seminars
  § Bachelor/Master projects
  § Master thesis
The Design Thinking Methodology
[Figure: diagram of Desirability (Human Factors), Feasibility (Technical Factors), and Viability (Business Factors); further labels: Solutions, Needs & Opportunities]
Setup Enables Leading Research in the Field of In-Memory Data Management
[Figure: research setup. Labels: Demand Prediction, Supply Chain, Innovation, Database Technology, Massachusetts Institute of Technology, Software, Hardware, Cloud Computing, UC Berkeley, Design Research, In-Memory Data Management, Stanford University, Hasso Plattner Institute (Enterprise Systems & Applications Chair), End-Users]
Learning Map of our Online Lecture @ openHPI.de
□ The Future of Enterprise Computing
□ Foundations of Database Storage Techniques
□ In-Memory Database Operators
□ Advanced Database Storage Techniques
□ Foundations for a New Enterprise Application Development Era
Goals
Deep technical understanding of a column-oriented, dictionary-encoded in-memory database and its application in enterprise computing
Chapters
□ The future of enterprise computing
□ Foundations of database storage techniques
□ In-memory database operators
□ Advanced database storage techniques
□ Implications on application development
Chapter 1:
The Future of Enterprise Computing
New Requirements for Enterprise Computing
□ Sensors
□ Events
□ Structured + unstructured data
□ Social networks
□ ...
Enterprise Application Characteristics
□ Modern enterprise resource planning (ERP) systems are challenged by mixed workloads, including OLAP-style queries. For example:
  § OLTP-style: create sales order, invoice, accounting documents, display customer master data or sales order
  § OLAP-style: dunning, available-to-promise, cross selling, operational reporting (list open sales orders)
□ But: today's data management systems are optimized either for daily transactional or for analytical workloads, storing their data along rows or columns
OLTP vs. OLAP: Online Transaction Processing vs. Online Analytical Processing
Drawbacks of the Separation
□ OLAP systems do not have the latest data
□ OLAP systems only have a predefined subset of the data
□ Cost-intensive ETL processes have to sync both systems
□ Separation introduces data redundancy
□ Different data schemas introduce complexity for applications combining sources
Enterprise Workloads are Read Dominated
[Figure: two stacked bar charts of workload share, 0% to 100%. Left: OLTP and OLAP workloads, split into Read (Lookup, Table Scan, Range Select) and Write (Insert, Modification, Delete). Right: TPC-C workload, split into Read (Select) and Write (Insert, Modification, Delete).]
□ The workload in enterprise applications consists of
  § Mainly read queries (OLTP 83%, OLAP 94%)
  § Many queries that access large sets of data
Few Updates in OLTP
[Figure: percentage of rows updated]
Vision
Combine OLTP and OLAP data using modern hardware and database systems to create a single source of truth, enable real-time analytics, and simplify applications and database structures.
Additionally,
□ Extraction, transformation, and loading (ETL) processes
□ Pre-computed aggregates and materialized views
become (almost) obsolete.
Enterprise Data Characteristics
Standard enterprise software data is sparse and wide.
□ Many columns are not used even once
□ Many columns have a low cardinality of values
□ NULL values/default values are dominant
□ Sparse distribution facilitates high compression
Many Columns are not Used Even Once
□ 55% unused columns per company on average
□ 40% unused columns across all 12 analyzed companies
[Figure: bar chart, % of columns (0% to 80%) by number of distinct values (1 - 32, 33 - 1023, 1024 - 100000000) for Inventory Management and Financial Accounting; bar labels as extracted: 13%, 9%, 78% and 24%, 12%, 64%]
Changes in Hardware
Changes in hardware give an opportunity to re-think the assumptions of yesterday because of what is possible today.
□ Multi-core architecture (128 cores per server)
□ Large main memories: 2 TB per blade
□ One blade ~$50,000 = 1 enterprise-class server
□ Parallel scaling across blades
□ Main memory becomes cheaper and larger
  § 64-bit address space
  § 12 TB in current servers
  § Cost-performance ratio rapidly declining
  § Memory hierarchies
In the Meantime Research has come up with several advances in software for processing data
□ Column-oriented data organization (the column store)
  § Sequential scans allow best bandwidth utilization between CPU cores and memory
  § Independence of tuples allows easy partitioning and therefore parallel processing
□ Lightweight compression
  § Reducing data amount
  § Increasing processing speed through late materialization
□ And more, e.g., parallel scan/join/aggregation
A Blueprint of SanssouciDB
SanssouciDB: An In-Memory Database for Enterprise Applications
[Figure: architecture at blade i. Distribution layer: Interface Services and Session Management, Query Execution, Metadata, TA Manager. Main memory: Active Data in a Main Store and a Differential Store (columns and combined columns, connected by Merge), Indexes (Inverted, Object Data Guide), Data aging, Time travel. Non-volatile memory: Log, Snapshots, Passive Data (History), with Logging and Recovery.]
In-Memory Database (IMDB)
□ Data resides permanently in main memory
□ Main memory is the primary "persistence"
□ Still: logging and recovery to/from flash
□ Main memory access is the new bottleneck
□ Cache-conscious algorithms and data structures are crucial (locality is king)
Chapter 2:
Foundations of Database Storage Techniques
Data Layout in Main Memory
Memory Basics (1)
□ Memory in today's computers has a linear address layout: addresses start at 0x0 and go to 0xFFFFFFFFFFFFFFFF for 64-bit
□ Each process has its own virtual address space
□ Virtual memory allocated by the program can be distributed over multiple physical memory locations
□ Address translation is done in hardware by the CPU
Memory Basics (2)
□ Memory layout is only linear
□ Every higher-dimensional access (like two-dimensional database tables) is mapped to this linear band
[Figure: linear address band from 0x0 to 0xFFFFFFFFFFFFFFFF]
Memory Hierarchy
[Figure: two Nehalem quad-core CPUs (Core 0 to Core 3 each) connected via QPI, each with TLB, L1, L2, and L3 cache and its own main memory; a memory page is shown together with L1, L2, and L3 cache lines]
Physical Data Representation
□ Row store:
  § Rows are stored consecutively
  § Optimal for row-wise access (e.g. SELECT *)
□ Column store:
  § Columns are stored consecutively
  § Optimal for attribute-focused access (e.g. SUM, GROUP BY)
□ Note: the concept is independent of the storage type
[Figure: a table with columns Doc Num, Doc Date, Sold-To, Value, Status, Sales Org and rows 1 to 4, shown as row store and as column store]
Row Data Layout
□ Data is stored tuple-wise
□ Leverage co-location of attributes for a single tuple
□ Low cost for tuple reconstruction, but higher cost for sequential scan of a single attribute
[Figure: tuple-wise memory layout of attributes A, B, C, with a row operation and a column operation marked]
Columnar Data Layout
□ Data is stored attribute-wise
□ Leverage sequential scan speed in main memory for predicate evaluation
□ Tuple reconstruction is more expensive
[Figure: attribute-wise memory layout of attributes A, B, C, with a row operation and a column operation marked]
Row-oriented storage
Table:
A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4
Linear layout in memory (row after row):
A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4
Column-oriented storage
Table:
A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4
Linear layout in memory (column after column):
A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4
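The mapping of a two-dimensional table onto the linear address band can be sketched in a few lines of Python. This is only an illustration using the 4 x 3 example table from the slides; the function names (`row_layout`, `column_layout`, `row_offset`, `column_offset`) are hypothetical and not part of any database API.

```python
# Illustrative sketch: linearizing the 4 x 3 example table.
# All names here are assumptions made for this example.
table = [
    ["A1", "B1", "C1"],
    ["A2", "B2", "C2"],
    ["A3", "B3", "C3"],
    ["A4", "B4", "C4"],
]

def row_layout(rows):
    """Row store: linearize tuple by tuple."""
    return [value for row in rows for value in row]

def column_layout(rows):
    """Column store: linearize attribute by attribute."""
    return [value for column in zip(*rows) for value in column]

def row_offset(r, c, n_cols):
    """Offset of cell (r, c) in the row-wise layout."""
    return r * n_cols + c

def column_offset(r, c, n_rows):
    """Offset of cell (r, c) in the column-wise layout."""
    return c * n_rows + r

print(row_layout(table))
print(column_layout(table))
```

In the row layout, the values of one tuple are neighbors; in the column layout, the values of one attribute are neighbors, which is what makes a scan over a single attribute sequential.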
Dictionary Encoding
Motivation
□ Main memory access is the new bottleneck
□ Idea: trade CPU time to compress and decompress data
□ Compression reduces the number of memory accesses
□ Leads to fewer cache misses because more information fits on a cache line
□ Operating directly on compressed data is possible
□ Offsetting with bit-encoded fixed-length data types
□ Based on limited value domain
Dictionary Encoding Example
8 billion humans
□ Attributes
  § first name
  § last name
  § gender
  § country
  § city
  § birthday
  → 200 bytes per tuple
□ Each attribute is stored dictionary encoded
Sample Data
rec ID  fname  lname    gender  city       country  birthday
…       …      …        …       …          …        …
39      John   Smith    m       Chicago    USA      12.03.1964
40      Mary   Brown    f       London     UK       12.05.1964
41      Jane   Doe      f       Palo Alto  USA      23.04.1976
42      John   Doe      m       Palo Alto  USA      17.06.1952
43      Peter  Schmidt  m       Potsdam    GER      11.11.1975
…       …      …        …       …          …        …
Dictionary Encoding a Column
□ A column is split into a dictionary and an attribute vector
□ The dictionary stores all distinct values with an implicit value ID
□ The attribute vector stores value IDs for all entries in the column
□ Position is implicit, not stored explicitly
□ Enables offsetting with fixed-length data types
Column "fname"
rec ID  fname
…       …
39      John
40      Mary
41      Jane
42      John
43      Peter
…       …
Dictionary for "fname"
Value ID  Value
…         …
23        John
24        Mary
25        Jane
26        Peter
…         …
Attribute Vector for "fname"
position  Value ID
…         …
39        23
40        24
41        25
42        23
43        26
…         …
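As a minimal sketch of this split, the following Python snippet encodes the five sample first names. It assumes a sorted dictionary, so the value IDs differ from the ones on the slide (which shows an excerpt of a much larger dictionary); `dictionary_encode` and `decode` are hypothetical helper names.

```python
# Minimal sketch of dictionary encoding; function names are assumptions.
def dictionary_encode(column):
    """Split a column into (dictionary, attribute_vector).

    The dictionary holds each distinct value once, sorted; a value's
    index in the dictionary is its implicit value ID.
    """
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    attribute_vector = [value_id[value] for value in column]
    return dictionary, attribute_vector

def decode(dictionary, attribute_vector, position):
    """Look up the original value at an (implicit) position."""
    return dictionary[attribute_vector[position]]

fname = ["John", "Mary", "Jane", "John", "Peter"]
dictionary, attribute_vector = dictionary_encode(fname)
print(dictionary)        # distinct values
print(attribute_vector)  # one small integer per row
```

Repeated values such as "John" are stored once in the dictionary and appear in the attribute vector only as a small integer.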
Sorted Dictionary
□ Dictionary entries are sorted either by their numeric value or lexicographically
  § Dictionary lookup complexity: O(log(n)) instead of O(n)
□ Dictionary entries can be compressed to reduce the amount of required storage
□ Selection criteria with ranges are less expensive (order-preserving dictionary)
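A small sketch can show why an order-preserving dictionary helps. It assumes the dictionary is a sorted Python list and uses the standard `bisect` module for the O(log n) lookup; `lookup_value_id` and `range_scan` are hypothetical names. The range predicate is translated once into a value ID interval, so the scan compares integers only.

```python
from bisect import bisect_left, bisect_right

def lookup_value_id(dictionary, value):
    """Binary search in the sorted dictionary; None if the value is absent."""
    i = bisect_left(dictionary, value)
    if i < len(dictionary) and dictionary[i] == value:
        return i
    return None

def range_scan(dictionary, attribute_vector, low, high):
    """Positions whose value v satisfies low <= v <= high.

    Because the dictionary is sorted, the value range maps to a
    contiguous value ID range [lo_id, hi_id).
    """
    lo_id = bisect_left(dictionary, low)
    hi_id = bisect_right(dictionary, high)
    return [pos for pos, vid in enumerate(attribute_vector)
            if lo_id <= vid < hi_id]

dictionary = ["Jane", "John", "Mary", "Peter"]
attribute_vector = [1, 2, 0, 1, 3]  # John, Mary, Jane, John, Peter

print(lookup_value_id(dictionary, "Mary"))
print(range_scan(dictionary, attribute_vector, "Jane", "John"))
```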
Data Size Examples
Column      Cardinality  Bits Needed  Item Size  Plain Size  Size with Dictionary (Dictionary + Column)  Compression Factor
First name  5 million    23 bit       50 Byte    373 GB      238.4 MB + 21.4 GB                          ≈ 17
Last name   8 million    23 bit       50 Byte    373 GB      381.5 MB + 21.4 GB                          ≈ 17
Gender      2            1 bit        1 Byte     7 GB        2 bit + 953.7 MB                            ≈ 8
City        1 million    20 bit       50 Byte    373 GB      47.7 MB + 18.6 GB                           ≈ 20
Country     200          8 bit        47 Byte    350 GB      9.2 KB + 7.5 GB                             ≈ 47
Birthday    40,000       16 bit       2 Byte     15 GB       78.1 KB + 14.9 GB                           ≈ 1
Totals                                200 Byte   ≈ 1.6 TB    ≈ 92 GB                                     ≈ 17
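The numbers in the table can be reproduced with a short calculation. The sketch below assumes that the bits needed per value ID are ceil(log2(cardinality)), that the dictionary stores each distinct value at its full item size, and that the GB and MB figures are powers of two; the helper names are hypothetical.

```python
import math

N = 8_000_000_000  # number of tuples (8 billion humans)
GIB = 2**30
MIB = 2**20

def bits_needed(cardinality):
    """Bits per value ID for a dictionary with `cardinality` entries."""
    return max(1, math.ceil(math.log2(cardinality)))

def sizes(cardinality, item_bytes, n=N):
    """Return (plain, dictionary, attribute_vector) sizes in bytes."""
    plain = n * item_bytes
    dictionary = cardinality * item_bytes
    attribute_vector = n * bits_needed(cardinality) / 8
    return plain, dictionary, attribute_vector

# First name: 5 million distinct values of 50 bytes each
plain, dictionary, attribute_vector = sizes(5_000_000, 50)
print(bits_needed(5_000_000), "bit")
print(round(plain / GIB), "GB plain")
print(round(dictionary / MIB, 1), "MB dictionary")
print(round(attribute_vector / GIB, 1), "GB attribute vector")
print(round(plain / (dictionary + attribute_vector)), "x compression")
```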
Chapter 3:
In-Memory Database Operators
Scan Performance (1)
8 billion humans
□ Attributes
  § First Name
  § Last Name
  § Gender
  § Country
  § City
  § Birthday
  → 200 bytes per tuple
□ Question: How many men/women?
□ Assumed scan speed: 2 MB/ms/core
Scan Performance (2)
Row Store – Layout
[Figure: table "humans" with rows 1 to 8 x 10^9 and columns First Name, Last Name, Gender, Country, City, Birthday]
□ Table size = 8 billion tuples x 200 bytes per tuple → ~1.6 TB
□ Scan through all rows with 2 MB/ms/core → ~800 seconds with 1 core
Scan Performance (3)
Row Store – Full Table Scan
[Figure: table "humans"; legend: data loaded and used (Gender), data loaded but not used]
□ Table size = 8 billion tuples x 200 bytes per tuple → ~1.6 TB
□ Scan through all rows with 2 MB/ms/core → ~800 seconds with 1 core
Scan Performance (4)
Row Store – Stride Access "Gender"
[Figure: table "humans"; legend: data not loaded, data loaded and used (Gender), data loaded but not used]
□ 8 billion cache accesses x 64 bytes → ~512 GB
□ Read with 2 MB/ms/core → ~256 seconds with 1 core
Scan Performance (5)
Column Store – Layout
[Figure: table "humans" stored as separate columns First Name, Last Name, Gender, Country, City, Birthday]
□ Table size
  § Attribute vectors: ~91 GB
  § Dictionaries: ~700 MB
  → Total: ~92 GB
□ Compression factor: ~17
Scan Performance (6)
Column Store – Full Column Scan on "Gender"
[Figure: table "humans"; legend: data not loaded, data loaded and used (Gender), data loaded but not used]
□ Size of attribute vector "gender" = 8 billion tuples x 1 bit per tuple → ~1 GB
□ Scan through attribute vector with 2 MB/ms/core → ~0.5 seconds with 1 core
Scan Performance (7)
Column Store – Full Column Scan on "Birthday"
[Figure: table "humans"; legend: data not loaded, data loaded and used (Birthday), data loaded but not used]
□ Size of attribute vector "birthday" = 8 billion tuples x 2 bytes per tuple → ~16 GB
□ Scan through column with 2 MB/ms/core → ~8 seconds with 1 core
Scan Performance – Summary
□ How many women, how many men?
                 Column Store  Row Store (full table scan)  Row Store (stride access)
Time in seconds  0.5           800                          256
Relative         baseline      1,600x slower                512x slower
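The scan times follow directly from the data volume and the assumed scan speed. The sketch below redoes the arithmetic, taking 2 MB/ms/core as 2 x 10^9 bytes per second and using decimal units as the slides do; the variable names are chosen for this example only.

```python
# Worked arithmetic for the scan examples (decimal units, single core).
SCAN_SPEED = 2 * 10**6 * 1000   # 2 MB/ms/core expressed in bytes per second
N = 8 * 10**9                   # tuples

def scan_seconds(num_bytes):
    return num_bytes / SCAN_SPEED

row_full_scan = scan_seconds(N * 200)   # every 200-byte tuple is read
row_stride = scan_seconds(N * 64)       # one 64-byte cache line per tuple
col_gender = scan_seconds(N * 1 / 8)    # 1 bit per tuple
col_birthday = scan_seconds(N * 2)      # 2 bytes per tuple

print(row_full_scan, row_stride, col_gender, col_birthday)
print(row_full_scan / col_gender, row_stride / col_gender)
```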
Tuple Reconstruction
Tuple Reconstruction (1)
Accessing a record in a row store (table: world_population)
□ All attributes are stored consecutively
□ 200 bytes → 4 cache accesses x 64 bytes → 256 bytes
□ Read with 2 MB/ms/core → ~0.128 μs with 1 core
Tuple Reconstruction (2)
Table: world_population, addressed by virtual record IDs
□ All attributes are stored in separate columns
□ Implicit record IDs are used to reconstruct rows
Tuple Reconstruction (3)
Table: world_population, addressed by virtual record IDs
□ 1 cache access for each attribute
□ 6 cache accesses x 64 bytes → 384 bytes
□ Read with 2 MB/ms/core → ~0.192 μs with 1 core
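The two reconstruction costs can be checked with the same kind of arithmetic. The sketch assumes 64-byte cache lines, a tuple that starts on a cache line boundary, and 2 MB/ms expressed as 2,000 bytes per microsecond; the function names are hypothetical.

```python
import math

CACHE_LINE = 64       # bytes per cache line
BYTES_PER_US = 2000   # 2 MB/ms/core = 2,000 bytes per microsecond

def row_store_us(tuple_bytes):
    """Read all cache lines covering one consecutively stored tuple."""
    lines = math.ceil(tuple_bytes / CACHE_LINE)
    return lines * CACHE_LINE / BYTES_PER_US

def column_store_us(n_attributes):
    """Read one cache line per attribute, each from a separate column."""
    return n_attributes * CACHE_LINE / BYTES_PER_US

print(row_store_us(200))    # 4 cache lines, 256 bytes
print(column_store_us(6))   # 6 cache lines, 384 bytes
```

Under these assumptions, reconstructing a full tuple costs about 1.5 times as much in the column store, which is small compared with the scan-time gap in the opposite direction.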
Select
SELECT Example
SELECT fname, lname FROM world_population WHERE country = 'Italy' AND gender = 'm'
fname      lname      country      gender
Gianluigi  Buffon     Italy        m
Lena       Gercke     Germany      f
Mario      Balotelli  Italy        m
Manuel     Neuer      Germany      m
Lukas      Podolski   Germany      m
Klaas-Jan  Huntelaar  Netherlands  m
Query Plan
SELECT fname, lname FROM world_population WHERE country = 'Italy' AND gender = 'm'
□ Multiple plans exist to execute a query
□ The query optimizer decides which one is executed
□ Based on cost model, statistics, and other parameters
□ Alternatives
  § Scan "country" and "gender", positional AND
  § Scan over "country" and probe into "gender"
  § Indices might be used
  § The decision depends on data and query parameters, e.g. selectivity
Query Plan (i)
Positional AND:
□ Predicates are evaluated and generate position lists
□ Intermediate position lists are logically combined
□ The final position list is used for materialization
[Figure: query plan. The selections country = "Italy" and gender = "m" each produce a position list; a positional AND combines them; the projection π returns fname, lname]
Query Execution (i)
Dictionary for "country"
Value ID  Value
0         Algeria
1         France
2         Germany
3         Italy
4         Netherlands
…
Dictionary for "gender"
Value ID  Value
0         f
1         m
Table with dictionary-encoded "country" and "gender"
Position  fname      lname      country  gender
0         Gianluigi  Buffon     3        1
1         Lena       Gercke     2        0
2         Mario      Balotelli  3        1
3         Manuel     Neuer      2        1
4         Lukas      Podolski   2        1
5         Klaas-Jan  Huntelaar  4        1
Position lists
country = 3 ("Italy"):  0, 2
gender = 1 ("m"):       0, 2, 3, 4, 5
AND:                    0, 2
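The positional AND plan can be sketched in Python on the example data. The column names and helper functions (`scan`, `positional_and`) are assumptions for this illustration; fname and lname are kept as plain lists for brevity, although they would be dictionary-encoded as well.

```python
# Sketch of plan (i): scan both predicate columns, intersect position lists,
# then materialize fname and lname. All names are illustrative.
country_dict = ["Algeria", "France", "Germany", "Italy", "Netherlands"]
gender_dict = ["f", "m"]

fname = ["Gianluigi", "Lena", "Mario", "Manuel", "Lukas", "Klaas-Jan"]
lname = ["Buffon", "Gercke", "Balotelli", "Neuer", "Podolski", "Huntelaar"]
country_av = [3, 2, 3, 2, 2, 4]
gender_av = [1, 0, 1, 1, 1, 1]

def scan(attribute_vector, value_id):
    """Return the sorted list of positions holding value_id."""
    return [pos for pos, vid in enumerate(attribute_vector) if vid == value_id]

def positional_and(a, b):
    """Intersect two sorted position lists with a linear merge."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

italy = scan(country_av, country_dict.index("Italy"))
men = scan(gender_av, gender_dict.index("m"))
positions = positional_and(italy, men)
result = [(fname[p], lname[p]) for p in positions]
print(italy, men, positions)
print(result)
```

The predicates are evaluated on integer value IDs only; the strings in fname and lname are touched just for the two qualifying positions (late materialization).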
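Query Plan (ii)
Based on the position list produced by the first selection, the gender column is probed.
[Figure: query plan. The selection country = "Italy" produces a position list; ProbePositionalFilter checks gender = "m" in the Gender column at those positions; the projection π returns fname, lname]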
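A sketch of the probing plan on the same example data follows. Only the first predicate scans a whole column; the second is checked just at the positions that survived. The helper names `scan` and `probe` are hypothetical.

```python
# Sketch of plan (ii): scan "country", then probe "gender" at the hits.
country_av = [3, 2, 3, 2, 2, 4]   # value ID 3 = "Italy"
gender_av = [1, 0, 1, 1, 1, 1]    # value ID 1 = "m"

def scan(attribute_vector, value_id):
    """Full column scan: positions holding value_id."""
    return [pos for pos, vid in enumerate(attribute_vector) if vid == value_id]

def probe(attribute_vector, positions, value_id):
    """Check the predicate only at the given positions."""
    return [pos for pos in positions if attribute_vector[pos] == value_id]

italy = scan(country_av, 3)
positions = probe(gender_av, italy, 1)
print(positions)
```

Probing touches two entries of the gender column here instead of six, which tends to pay off when the first predicate is selective.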
Insert
□ Insert is the dominant modification operation
  § Delete/Update can be modeled as Inserts as well (insert-only approach)
□ Inserting into a compressed in-memory persistence can be expensive
  § Updating sorted sequences (e.g. dictionaries) is a challenge
  § Inserting into columnar storages is generally more expensive than inserting into row storages
Insert Example
Table: world_population
rowID  fname    lname     gender  country  city       birthday
0      Martin   Albrecht  m       GER      Berlin     08-05-1955
1      Michael  Berg      m       GER      Berlin     03-05-1970
2      Hanna    Schulze   f       GER      Hamburg    04-04-1968
3      Anton    Meyer     m       AUT      Innsbruck  10-20-1992
4      Sophie   Schulze   f       GER      Potsdam    09-03-1977
...    ...      ...       ...     ...      ...        ...
INSERT INTO world_population VALUES ('Karen', 'Schulze', 'f', 'GER', 'Rostock', '11-15-2012')
INSERT (1) w/o new Dictionary entry
INSERT INTO world_population VALUES ('Karen', 'Schulze', 'f', 'GER', 'Rostock', '11-15-2012')
Column "lname" before the insert
Dictionary (D): 0 Albrecht, 1 Berg, 2 Meyer, 3 Schulze
Attribute Vector (AV), position → value ID: 0 → 0, 1 → 1, 2 → 3, 3 → 2, 4 → 3
Steps
1. Look-up on D → entry found (Schulze has value ID 3)
2. Append ValueID to AV
Column "lname" after the insert
Dictionary (D): unchanged
Attribute Vector (AV), position → value ID: 0 → 0, 1 → 1, 2 → 3, 3 → 2, 4 → 3, 5 → 3
Row 5 of the table now holds lname = Schulze; the other attributes follow.
INSERT (2) with new Dictionary Entry I/II
INSERT INTO world_population VALUES ('Karen', 'Schulze', 'f', 'GER', 'Rostock', '11-15-2012')
Column "city" before the insert
Dictionary (D): 0 Berlin, 1 Hamburg, 2 Innsbruck, 3 Potsdam
Attribute Vector (AV), position → value ID: 0 → 0, 1 → 0, 2 → 1, 3 → 2, 4 → 3
Steps
1. Look-up on D → no entry found
2. Append new value to D (no re-sorting necessary, because Rostock sorts after all existing entries)
3. Append ValueID to AV
Column "city" after the insert
Dictionary (D): 0 Berlin, 1 Hamburg, 2 Innsbruck, 3 Potsdam, 4 Rostock
Attribute Vector (AV), position → value ID: 0 → 0, 1 → 0, 2 → 1, 3 → 2, 4 → 3, 5 → 4
Row 5 of the table now holds lname = Schulze and city = Rostock.
INSERT (2) with new Dictionary Entry II/II
INSERT INTO world_population VALUES ('Karen', 'Schulze', 'f', 'GER', 'Rostock', '11-15-2012')
Column "fname" before the insert
Dictionary (D, old): 0 Anton, 1 Hanna, 2 Martin, 3 Michael, 4 Sophie
Attribute Vector (AV, old), position → value ID: 0 → 2, 1 → 3, 2 → 1, 3 → 0, 4 → 4
Steps
1. Look-up on D → no entry found
2. Insert new value to D (Karen sorts between Hanna and Martin)
3. Change ValueIDs in AV (all value IDs of entries behind the insert position shift by one)
4. Append new ValueID to AV
Column "fname" after the insert
Dictionary (D, new): 0 Anton, 1 Hanna, 2 Karen, 3 Martin, 4 Michael, 5 Sophie
Attribute Vector (AV, new), position → value ID: 0 → 3, 1 → 4, 2 → 1, 3 → 0, 4 → 5, 5 → 2
Changed value IDs: positions 0, 1, and 4.
Row 5 of the table now holds fname = Karen, lname = Schulze, and city = Rostock.
Result
INSERT INTO world_population VALUES ('Karen', 'Schulze', 'f', 'GER', 'Rostock', '11-15-2012')
Table: world_population
rowID  fname    lname     gender  country  city       birthday
0      Martin   Albrecht  m       GER      Berlin     08-05-1955
1      Michael  Berg      m       GER      Berlin     03-05-1970
2      Hanna    Schulze   f       GER      Hamburg    04-04-1968
3      Anton    Meyer     m       AUT      Innsbruck  10-20-1992
4      Sophie   Schulze   f       GER      Potsdam    09-03-1977
5      Karen    Schulze   f       GER      Rostock    11-15-2012
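The three insert cases can be captured in one small function. This is a sketch under the assumption that the dictionary is a sorted Python list and the attribute vector a plain list of value IDs; `insert_value` is a hypothetical name. It reproduces the "lname", "city", and "fname" examples.

```python
from bisect import bisect_left

def insert_value(dictionary, attribute_vector, value):
    """Append `value` to a column with a sorted dictionary (in place).

    Returns the value ID that was appended to the attribute vector.
    """
    vid = bisect_left(dictionary, value)            # 1. look-up on D
    if vid == len(dictionary) or dictionary[vid] != value:
        dictionary.insert(vid, value)               # 2. insert new value to D
        # 3. entries at or behind the insert position moved up by one,
        #    so the value IDs pointing to them must be increased.
        for pos, old in enumerate(attribute_vector):
            if old >= vid:
                attribute_vector[pos] = old + 1
    attribute_vector.append(vid)                    # 4. append value ID to AV
    return vid

# Case 1: value already in the dictionary ("lname")
lname_d, lname_av = ["Albrecht", "Berg", "Meyer", "Schulze"], [0, 1, 3, 2, 3]
insert_value(lname_d, lname_av, "Schulze")

# Case 2: new value sorts last, no value IDs change ("city")
city_d, city_av = ["Berlin", "Hamburg", "Innsbruck", "Potsdam"], [0, 0, 1, 2, 3]
insert_value(city_d, city_av, "Rostock")

# Case 3: new value sorts in the middle, value IDs change ("fname")
fname_d = ["Anton", "Hanna", "Martin", "Michael", "Sophie"]
fname_av = [2, 3, 1, 0, 4]
insert_value(fname_d, fname_av, "Karen")

print(lname_av, city_av, fname_av)
```

Case 3 rewrites part of the attribute vector, which is the expensive step for a column with billions of entries.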
Chapter 4:
Advanced Database Storage Techniques
Differential Buffer
Motivation
□ Inserting new tuples directly into a compressed structure can be expensive
  § Especially when using sorted structures
  § New values can require reorganizing the dictionary
  § The number of bits required to encode all dictionary values can change, in which case the attribute vector has to be reorganized
Differential Buffer
□ New values are written to a dedicated differential buffer (Delta)
□ Cache Sensitive B+ Tree (CSB+) used for fast search on Delta
[Figure: table world_population. Data-modifying operations (write) go to the Differential Buffer; read operations access Main Store and Differential Buffer.]
Main Store (compressed, 8 billion entries), column "fname"
Dictionary: 0 Anton, 1 Hanna, 2 Michael, 3 Sophie
Attribute Vector, position → value ID: 0 → 0, 1 → 1, 2 → 1, 3 → 3, 4 → 2, 5 → 1
Differential Buffer / Delta (up to 50,000 entries), column "fname"
Dictionary (with CSB+ tree): 0 Angela, 1 Klaus, 2 Andre
Attribute Vector, position → value ID: 0 → 0, 1 → 1, 2 → 1, 3 → 2
Differential Buffer
□ Inserts of new values are fast, because dictionary and attribute vector do not need to be resorted
□ Range selects on the differential buffer are expensive
  § The unsorted dictionary allows no direct comparison of value IDs
  § Scans with range selection need to look up values in the dictionary for comparisons
□ The differential buffer requires more memory:
  § Attribute vector not bit-compressed
  § Additional CSB+ tree for the dictionary
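A minimal sketch of a delta column illustrates both properties. It assumes an unsorted, append-only dictionary and uses a Python dict as a stand-in for the CSB+ tree (a real CSB+ tree is a cache-conscious search tree, not a hash map); the class name `DeltaColumn` is hypothetical.

```python
class DeltaColumn:
    """Sketch of one column in the differential buffer (names are assumptions)."""

    def __init__(self):
        self.dictionary = []        # unsorted, values in order of first insert
        self.index = {}             # value -> value ID; stand-in for the CSB+ tree
        self.attribute_vector = []  # uncompressed value IDs

    def insert(self, value):
        vid = self.index.get(value)
        if vid is None:
            # New value: append to the dictionary, existing value IDs stay valid.
            vid = len(self.dictionary)
            self.dictionary.append(value)
            self.index[value] = vid
        self.attribute_vector.append(vid)

    def range_scan(self, low, high):
        # Value IDs carry no order here, so every entry must be
        # decoded through the dictionary before it can be compared.
        return [pos for pos, vid in enumerate(self.attribute_vector)
                if low <= self.dictionary[vid] <= high]

delta = DeltaColumn()
for name in ["Angela", "Klaus", "Klaus", "Andre"]:
    delta.insert(name)

print(delta.dictionary)
print(delta.attribute_vector)
print(delta.range_scan("A", "B"))
```

The insert never rewrites existing value IDs, while the range scan pays for a dictionary look-up per entry, matching the trade-off listed above.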
Tuple Lifetime
Michael moves from Berlin to Potsdam
UPDATE world_population SET city = 'Potsdam' WHERE fname = 'Michael' AND lname = 'Berg'
Main Store, main table world_population
recId     fname      lname       gender  country  city       birthday
0         Martin     Albrecht    m       GER      Berlin     08-05-1955
1         Michael    Berg        m       GER      Berlin     03-05-1970
2         Hanna      Schulze     f       GER      Hamburg    04-04-1968
3         Anton      Meyer       m       AUT      Innsbruck  10-20-1992
4         Ulrike     Schulze     f       GER      Potsdam    09-03-1977
5         Sophie     Schulze     f       GER      Rostock    06-20-2012
...       ...        ...         ...     ...      ...        ...
8 * 10^9  Zacharias  Perdopolus  m       GRE      Athen      03-12-1979
Differential Buffer (after the update)
recId  fname    lname  gender  country  city     birthday
0      Michael  Berg   m       GER      Potsdam  03-05-1970
Tuple Lifetime
□ Tuples are now available in Main Store and Differential Buffer
□ Tuples of a table are marked by a validity vector to reduce the required amount of reorganization steps
  § Additional attribute vector for validity
  § 1 bit required per database tuple
□ Invalidated tuples stay in the database table until the next reorganization takes place
□ Query results
  § Main and delta have to be queried
  § Results are filtered using the validity vector
Tuple Lifetime
Michael moves from Berlin to Potsdam
UPDATE world_population SET city = 'Potsdam' WHERE fname = 'Michael' AND lname = 'Berg'
Main Store, main table world_population
recId     fname      lname       gender  country  city       birthday    valid
0         Martin     Albrecht    m       GER      Berlin     08-05-1955  1
1         Michael    Berg        m       GER      Berlin     03-05-1970  0
2         Hanna      Schulze     f       GER      Hamburg    04-04-1968  1
3         Anton      Meyer       m       AUT      Innsbruck  10-20-1992  1
4         Ulrike     Schulze     f       GER      Potsdam    09-03-1977  1
5         Sophie     Schulze     f       GER      Rostock    06-20-2012  1
...       ...        ...         ...     ...      ...        ...         ...
8 * 10^9  Zacharias  Perdopolus  m       GRE      Athen      03-12-1979  1
Differential Buffer
recId  fname    lname  gender  country  city     birthday    valid
0      Michael  Berg   m       GER      Potsdam  03-05-1970  1
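The update can be sketched as "invalidate the old version, append the new version to the delta". For readability the sketch stores rows as Python dicts instead of dictionary-encoded columns, which is a simplification; the names `update` and `select` are hypothetical.

```python
# Sketch of an insert-only update with validity vectors (names are assumptions).
main = [
    {"fname": "Martin", "lname": "Albrecht", "city": "Berlin"},
    {"fname": "Michael", "lname": "Berg", "city": "Berlin"},
    {"fname": "Hanna", "lname": "Schulze", "city": "Hamburg"},
]
main_valid = [1, 1, 1]          # 1 bit per tuple in the main store
delta, delta_valid = [], []     # differential buffer and its validity vector

def update(predicate, changes):
    """Invalidate matching tuples and append their new versions to the delta."""
    new_versions = []
    for rows, valid in ((main, main_valid), (delta, delta_valid)):
        for pos, row in enumerate(rows):
            if valid[pos] and predicate(row):
                valid[pos] = 0                        # old version stays in place
                new_versions.append({**row, **changes})
    for row in new_versions:
        delta.append(row)
        delta_valid.append(1)

def select(predicate):
    """Query main and delta, filtering by the validity vectors."""
    return [row
            for rows, valid in ((main, main_valid), (delta, delta_valid))
            for pos, row in enumerate(rows)
            if valid[pos] and predicate(row)]

update(lambda r: r["fname"] == "Michael" and r["lname"] == "Berg",
       {"city": "Potsdam"})
print(main_valid, delta, delta_valid)
print(select(lambda r: r["city"] == "Berlin"))
```

The main store is never modified apart from a single validity bit, so its compressed structures stay untouched until the next merge.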
Merge
Handling Write Operations
□ All write operations (INSERT, UPDATE) are stored within a differential buffer (delta) first
□ Read operations on the differential buffer are more expensive than on the main store
□ The differential buffer is merged periodically with the main store
  § To avoid performance degradation caused by a large delta
  § The merge is performed asynchronously
Merge Overview I/II
□ The merge process is triggered for single tables
□ It is triggered by:
  § Amount of tuples in buffer
  § Cost model to
    § Schedule
    § Take query cost into account
  § Manually
Merge Overview II/II
□ Working on data copies allows asynchronous merge
□ Very limited interruption due to short lock
□ At least twice the memory of the table needed!
[Figure: merge operation on a table in three phases: Prepare, Attribute Merge, Commit. Before: Main Store and Differential Buffer, with read operations and data-modifying operations. During the merge process: Main Store, Differential Buffer, a new Differential Buffer, and the new Main Store being built. After: new Main Store and new Differential Buffer, with read operations and data-modifying operations.]
Attribute Merge
Step 1: Dictionary Merge
□ Merge main and delta dictionary
  § Optionally remove unused values
  § Merge of two sorted arrays
□ Create mapping if valueIDs changed
Step 2: Update Attribute Vector
□ Create new merged main partition
□ Update valueIDs reflecting changed dictionary
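The two steps can be sketched for a single column as follows. This is a simplified illustration, not the actual merge algorithm: it builds the new dictionary with a set union and a sort instead of a linear merge of two sorted arrays, it drops invalid tuples directly, and the function name `merge_column` and the sample data (three main tuples, three delta tuples of a "city" column) are assumptions.

```python
def merge_column(main_dict, main_av, main_valid,
                 delta_dict, delta_av, delta_valid):
    """Merge one column of main store and delta into a new main column.

    Returns (new_dict, new_av, main_map, delta_map); a mapping entry of
    None marks a value that is no longer used and was removed.
    """
    # Values that are still referenced by valid tuples
    main_vals = [main_dict[v] for v, ok in zip(main_av, main_valid) if ok]
    delta_vals = [delta_dict[v] for v, ok in zip(delta_av, delta_valid) if ok]

    # Step 1: dictionary merge, unused values removed, result sorted
    new_dict = sorted(set(main_vals) | set(delta_vals))
    new_id = {value: i for i, value in enumerate(new_dict)}
    main_map = {old: new_id.get(value) for old, value in enumerate(main_dict)}
    delta_map = {old: new_id.get(value) for old, value in enumerate(delta_dict)}

    # Step 2: new attribute vector, valid main tuples first, then valid delta tuples
    new_av = [main_map[v] for v, ok in zip(main_av, main_valid) if ok]
    new_av += [delta_map[v] for v, ok in zip(delta_av, delta_valid) if ok]
    return new_dict, new_av, main_map, delta_map

# "city" column: main holds Berlin, London, Berlin; delta holds Berlin, Potsdam, Potsdam
new_dict, new_av, main_map, delta_map = merge_column(
    ["Berlin", "London"], [0, 1, 0], [0, 0, 1],
    ["Berlin", "Potsdam"], [0, 1, 1], [0, 1, 1],
)
print(new_dict, new_av)
print(main_map, delta_map)
```

London is no longer referenced by any valid tuple, so it disappears from the merged dictionary and its old value ID maps to nothing.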
Example
Main Store
Dictionaries
valueID  fname    city
0        Albert   Berlin
1        Michael  London
2        Nadja
Attribute Vectors
recID  fname  city
0      2      0
1      1      1
2      0      0
Validity Vector
recID  valid
0      1
1      1
2      1
Delta
Dictionaries, Attribute Vectors, Validity Vector: empty
Example: Michael moves from London to Berlin
Main Store (dictionaries and attribute vectors unchanged)
Validity Vector
recID  valid
0      1
1      0
2      1
Delta
Dictionaries
valueID  fname    city
0        Michael  Berlin
Attribute Vectors
recID  fname  city
0      0      0
Validity Vector
recID  valid
0      1
Example: Nadja moves from Berlin to Potsdam
Main Store (dictionaries and attribute vectors unchanged)
Validity Vector
recID  valid
0      0
1      0
2      1
Delta
Dictionaries
valueID  fname    city
0        Michael  Berlin
1        Nadja    Potsdam
Attribute Vectors
recID  fname  city
0      0      0
1      1      1
Validity Vector
recID  valid
0      1
1      1
Example: Michael moves from Berlin to Potsdam
Main Store (dictionaries and attribute vectors unchanged)
Validity Vector
recID  valid
0      0
1      0
2      1
Delta
Dictionaries
valueID  fname    city
0        Michael  Berlin
1        Nadja    Potsdam
Attribute Vectors
recID  fname  city
0      0      0
1      1      1
2      0      1
Validity Vector
recID  valid
0      0
1      1
2      1
First Step: Validity Detection
Rows with valid = 0 will be deleted / moved into the history partition.

Main Store
  Dictionaries
    valueID  fname    city
    0        Albert   Berlin
    1        Michael  London
    2        Nadja
  Attribute Vectors
    recID  fname  city
    0      2      0
    1      1      1
    2      0      0
  Validity Vector
    recID  valid
    0      0
    1      0
    2      1

Delta
  Dictionaries
    valueID  fname    city
    0        Michael  Berlin
    1        Nadja    Potsdam
  Attribute Vectors
    recID  fname  city
    0      0      0
    1      1      1
    2      0      1
  Validity Vector
    recID  valid
    0      0
    1      1
    2      1
First Step: Dictionary Merge

Main Store Dictionaries
  valueID  fname    city
  0        Albert   Berlin
  1        Michael  London
  2        Nadja

Delta Dictionaries
  valueID  fname    city
  0        Michael  Berlin
  1        Nadja    Potsdam

Mappings
  fname: old to new Main
    valueID (old)    0  1  2
    valueID (new)    0  1  2
  fname: Delta to Main
    valueID (Delta)  0  1
    valueID (Main)   1  2
  city: old to new Main
    valueID (old)    0  1
    valueID (new)    0  NA
  city: Delta to Main
    valueID (Delta)  0  1
    valueID (Main)   0  1

New Combined Main Store Dictionaries
  valueID  fname    city
  0        Albert   Berlin
  1        Michael  Potsdam
  2        Nadja
Second Step: New Main Column

Main Store (valid rows only)
  Dictionaries
    valueID  fname    city
    0        Albert   Berlin
    1        Michael  London
    2        Nadja
  Attribute Vectors
    recID  fname  city
    0      0      0
  Validity Vector
    recID  valid
    0      1

Delta (valid rows only)
  Dictionaries
    valueID  fname    city
    0        Michael  Berlin
    1        Nadja    Potsdam
  Attribute Vectors
    recID  fname  city
    0      1      1
    1      0      1
  Validity Vector
    recID  valid
    0      1
    1      1

Mappings
  fname: old to new Main
    valueID (old)    0  1  2
    valueID (new)    0  1  2
  fname: Delta to Main
    valueID (Delta)  0  1
    valueID (Main)   1  2
  city: old to new Main
    valueID (old)    0  1
    valueID (new)    0  NA
  city: Delta to Main
    valueID (Delta)  0  1
    valueID (Main)   0  1
Second Step: New Main Column

Main Store
  Dictionaries
    valueID  fname    city
    0        Albert   Berlin
    1        Michael  Potsdam
    2        Nadja
  Attribute Vectors
    recID  fname  city
    0      0      0
    1      2      1
    2      1      1
  Validity Vector
    recID  valid
    0      1
    1      1
    2      1

Delta
  Dictionaries, Attribute Vectors, Validity Vector: empty
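The walkthrough above can be reproduced with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in the database: the function name merge_column and its argument layout are hypothetical, each column is merged independently, unused main values are dropped while all delta values are kept, and None stands for the "NA" entries in the mapping tables.

```python
import heapq


def merge_column(main_dict, main_av, main_valid, delta_dict, delta_av, delta_valid):
    """Merge one column of the delta into the main store.

    Returns (new_dict, new_av, main_map, delta_map). A mapping entry of
    None corresponds to "NA": the old value is no longer referenced.
    """
    # Validity detection: keep only the valueIDs of valid rows.
    main_rows = [vid for vid, ok in zip(main_av, main_valid) if ok]
    delta_rows = [vid for vid, ok in zip(delta_av, delta_valid) if ok]

    # Step 1: dictionary merge. Main values that no valid main row
    # references are dropped; all delta values are kept (a simplification).
    used_main = sorted({main_dict[vid] for vid in main_rows})
    new_dict = []
    for value in heapq.merge(used_main, sorted(delta_dict)):
        if not new_dict or new_dict[-1] != value:  # skip duplicates
            new_dict.append(value)

    # Mappings from old valueIDs to positions in the new dictionary.
    position = {value: i for i, value in enumerate(new_dict)}
    main_map = [position.get(value) for value in main_dict]
    delta_map = [position[value] for value in delta_dict]

    # Step 2: new attribute vector, main rows first, then delta rows.
    new_av = [main_map[vid] for vid in main_rows]
    new_av += [delta_map[vid] for vid in delta_rows]
    return new_dict, new_av, main_map, delta_map


# State after the three updates in the example.
main_valid = [0, 0, 1]
delta_valid = [0, 1, 1]

fname = merge_column(["Albert", "Michael", "Nadja"], [2, 1, 0], main_valid,
                     ["Michael", "Nadja"], [0, 1, 0], delta_valid)
city = merge_column(["Berlin", "London"], [0, 1, 0], main_valid,
                    ["Berlin", "Potsdam"], [0, 1, 1], delta_valid)

print(fname)
print(city)
```

With the example data, the resulting dictionaries, attribute vectors, and mappings match the tables on the slides: London disappears from the city dictionary, and the delta valueIDs for fname are shifted by one.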
Chapter 5:
Implications on Application Development
How does it all come together?
1. Mixed workload combining OLTP and analytic-style queries
  ■ Column stores are best suited for analytic-style queries
  ■ In-memory databases enable fast tuple reconstruction
  ■ In-memory column store allows aggregation on the fly
2. Sparse enterprise data
  ■ Lightweight compression schemes are optimal
  ■ Speeds up query execution
  ■ Improves feasibility of in-memory databases
3. Mostly read workload
  ■ Read-optimized stores provide best throughput
    ■ i.e. compressed in-memory column store
  ■ Write-optimized store as delta partition to handle data changes is sufficient
[Figure labels: Changed Hardware; Advances in data processing (software); Complex Enterprise Applications; Our focus.]
An In-Memory Database for Enterprise Applications
□ In-Memory Database (IMDB)
  § Data resides permanently in main memory
  § Main memory is the primary "persistence"
  § Still: logging and recovery from/to flash
  § Main memory access is the new bottleneck
  § Cache-conscious algorithms/data structures are crucial (locality is king)
[Figure: IMDB architecture at blade i. Layers: interface services and session management; query execution, metadata, and TA manager; distribution layer. Main memory holds the active data: main store and differential store, each with columns and combined columns, connected by the merge, plus indexes (inverted, object data guide). Non-volatile memory holds the log, snapshots, and passive data (history). Functions: recovery, logging, time travel, data aging.]
Simplified Application Development
[Figure: Traditional vs. Column-oriented. Layers shown: application cache, database cache, prebuilt aggregates, raw data.]
¨ Fewer caches necessary
¨ No redundant data (OLAP/OLTP, LiveCache)
¨ No maintenance of materialized views or aggregates
¨ Minimal index maintenance
Examples for Implications on Enterprise Applications
SAP ERP Financials on In-Memory Technology
In-memory column database for an ERP system
□ Combined workload (parallel OLTP/OLAP queries)
□ Leverage in-memory capabilities to
  § Reduce amount of data
  § Aggregate on-the-fly
  § Run analytic-style queries (to replace materialized views)
  § Execute stored procedures
□ Use case: SAP ERP Financials solution
  § Post and change documents
  § Display open items
  § Run dunning job
  § Analytical queries, such as balance sheet
Current Financials Solutions
The Target Financials Solution
Only base tables, algorithms, and some indices
Feasibility of Financials on In-Memory Technology in 2009
□ Modifications on SAP Financials
  § Removed secondary indices, sum tables, and pre-calculated and materialized tables
  § Reduced code complexity and simplified locks
  § Insert-only to enable history (change document replacement)
  § Added stored procedures with business functionality
□ European division of a retailer
  § ERP 2005 ECC 6.0 EhP3
  § 5.5 TB system database size
  § Financials:
    § 23 million headers / 1.5 GB in main memory
    § 252 million items / 50 GB in main memory (including inverted indices for join attributes and insert-only extension)
In-Memory Financials on SAP ERP
[Figure: tables of the classic Financials data model. Accounting documents: BKPF, BSEG. Sum tables: LFC1, KNC1, GLT0. Secondary indices: BSAD, BSAK, BSAS, BSID, BSIK, BSIS. Dunning data: MHNK, MHND. Change documents: CDHDR, CDPOS.]
[Figure: the in-memory data model keeps only the accounting documents: BKPF, BSEG.]
                     Classic Row-Store (w/o compr.)   IMDB
BKPF                 8.7 GB                           1.5 GB
BSEG                 255 GB                           50 GB
Secondary Indices    255 GB                           -
Sum Tables           0.55 GB                          -
Complete             519.25 GB                        51.5 GB
BKPF + BSEG only     263.7 GB                         51.5 GB
Reduction by a factor of 10
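The totals in the table can be checked with a few lines of Python. This is only an arithmetic check of the figures on the slide; the variable names are chosen for illustration.

```python
# Sizes in GB as (classic row store without compression, IMDB).
sizes_gb = {
    "BKPF": (8.7, 1.5),
    "BSEG": (255.0, 50.0),
    "Secondary Indices": (255.0, 0.0),
    "Sum Tables": (0.55, 0.0),
}

row_store_total = sum(row for row, _ in sizes_gb.values())
imdb_total = sum(imdb for _, imdb in sizes_gb.values())
base_tables_only = sizes_gb["BKPF"][0] + sizes_gb["BSEG"][0]

print(round(row_store_total, 2), round(imdb_total, 2))
print(round(row_store_total / imdb_total, 2))   # all tables vs. IMDB
print(round(base_tables_only / imdb_total, 2))  # base tables only vs. IMDB
```

The complete row-store footprint is about 10 times the in-memory footprint. Comparing only the base tables BKPF and BSEG, the ratio is about 5, so roughly half of the reduction comes from dropping secondary indices and sum tables.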
Booking an accounting document
□ Insert into BKPF and BSEG only
□ Lack of updates reduces locks
Wrap Up (I)
□ The future of enterprise computing
  § Big data challenges
  § Changes in hardware
  § OLTP and OLAP in one single system
□ Foundations of database storage techniques
  § Data layout optimized for memory hierarchies
  § Lightweight compression techniques
□ In-memory database operators
  § Operators on dictionary-compressed data
  § Query execution: scan, insert, tuple reconstruction
Wrap Up (II)
□ Advanced database storage techniques
  § Differential buffer accumulates changes
  § Merge combines changes periodically with main storage
□ Implications on application development
  § Move data-intensive operations closer to the data
  § New analytical applications on transactional data possible
  § Less data redundancy, more on-the-fly calculation
  § Reduced code complexity
References
□ A Course in In-Memory Data Management, H. Plattner
  http://epic.hpi.uni-potsdam.de/Home/InMemoryBook
□ Publications of our research group:
  § Papers about the inner workings of in-memory databases
  § http://epic.hpi.uni-potsdam.de/Home/Publications
Thank You!
Dunning Run
□ Dunning run determines all open and due invoices
□ Customer-defined queries on 250M records
□ Current system: 20 min
□ New logic: 1.5 sec
  § In-memory column store
  § Parallelized stored procedures
  § Simplified Financials
Bring Application Logic Closer to the Storage Layer
□ Select accounts to be dunned, for each:
  § Select open account items from BSID, for each:
    § Calculate due date
    § Select dunning procedure, level and area
  § Create MHNK entries
□ Create and write dunning item tables
[Build slides repeat the steps above with annotations: 1 SELECT; 10000 SELECTs; 10000 SELECTs; 31000 entries. Final overlays: "One single stored procedure executed within IMDB" and "Calculated on-the-fly".]
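The difference between issuing one query per account from the application and pushing the logic to the database can be sketched as follows. This is a minimal illustration, not the SAP implementation: SQLite stands in for the IMDB, a single set-based statement stands in for the stored procedure (SQLite has no stored procedures), and the table open_items with its columns is a hypothetical simplification of BSID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the open-items table.
conn.execute("CREATE TABLE open_items (account TEXT, amount REAL, due_date TEXT)")
conn.executemany(
    "INSERT INTO open_items VALUES (?, ?, ?)",
    [
        ("A1", 100.0, "2012-01-10"),
        ("A1", 50.0, "2012-03-01"),
        ("A2", 75.0, "2012-01-20"),
        ("A3", 20.0, "2012-04-15"),
    ],
)
KEY_DATE = "2012-02-01"  # items due before this date are overdue


def dunning_per_account(conn, key_date):
    """Application-side loop: one SELECT for the accounts, one per account."""
    statements = 1
    accounts = [row[0] for row in conn.execute(
        "SELECT DISTINCT account FROM open_items ORDER BY account")]
    overdue = {}
    for account in accounts:
        statements += 1
        rows = conn.execute(
            "SELECT amount FROM open_items WHERE account = ? AND due_date < ?",
            (account, key_date),
        ).fetchall()
        if rows:
            overdue[account] = sum(amount for (amount,) in rows)
    return overdue, statements


def dunning_set_based(conn, key_date):
    """Database-side logic: one statement aggregates all accounts."""
    rows = conn.execute(
        "SELECT account, SUM(amount) FROM open_items "
        "WHERE due_date < ? GROUP BY account",
        (key_date,),
    ).fetchall()
    return dict(rows), 1


looped = dunning_per_account(conn, KEY_DATE)
set_based = dunning_set_based(conn, KEY_DATE)
print(looped)
print(set_based)
```

Both variants return the same overdue amounts, but the loop needs one statement per account plus one, while the set-based variant needs a single statement. With 10,000 accounts, as annotated on the slide, the looped variant would issue thousands of round trips.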
Dunning Application
Available-to-Promise Check
□ Can I get enough quantities of a requested product on a desired delivery date?
□ Goal: Analyze and validate the potential of in-memory and highly parallel data processing for Available-to-Promise (ATP)
□ Challenges
  § Dynamic aggregation
  § Instant rescheduling in minutes vs. nightly batch runs
  § Real-time and historical analytics
□ Outcome
  § Real-time ATP checks without materialized views
  § Ad-hoc rescheduling
  § No materialized aggregates
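An ATP check without materialized aggregates can be sketched as an aggregation over the raw stock movements at query time. The following Python sketch rests on simplifying assumptions that are not taken from the slides: movements are (date, quantity) pairs with positive quantities for supply and negative quantities for confirmed demand, and a quantity can be promised only if the projected stock stays non-negative on the desired date and on every later date. The function names are hypothetical.

```python
from datetime import date
from itertools import accumulate


def available_to_promise(movements, desired_date):
    """Return the quantity that can be promised on desired_date.

    The projected stock is aggregated from the raw movements on every
    call; no pre-computed totals are stored.
    """
    per_day = {}
    for day, quantity in movements:
        per_day[day] = per_day.get(day, 0) + quantity
    days = sorted(per_day)
    balances = list(accumulate(per_day[day] for day in days))

    on_hand = 0  # projected stock on the desired date
    later = []   # projected stock after each later movement
    for day, balance in zip(days, balances):
        if day <= desired_date:
            on_hand = balance
        else:
            later.append(balance)
    return max(0, min([on_hand] + later))


def can_promise(movements, desired_date, quantity):
    return available_to_promise(movements, desired_date) >= quantity


movements = [
    (date(2012, 5, 1), 100),   # stock receipt
    (date(2012, 5, 3), -60),   # confirmed customer order
    (date(2012, 5, 7), 50),    # planned production
    (date(2012, 5, 10), -30),  # confirmed customer order
]

print(available_to_promise(movements, date(2012, 5, 2)))
print(available_to_promise(movements, date(2012, 5, 8)))
print(can_promise(movements, date(2012, 5, 2), 50))
```

Because every check scans the movements directly, a rescheduled order only changes one row and the next check reflects it immediately, which is the property the slide describes as ad-hoc rescheduling.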
In-Memory Available-to-Promise
Demand Planning
□ Flexible analysis of demand planning data
□ Zooming to choose granularity
□ Filter by certain products or customers
□ Browse through time spans
□ Combination of location-based geo data with planning data in an in-memory database
□ External factors such as the temperature or the level of cloudiness can be overlaid to incorporate them in planning decisions
GORFID
□ HANA for streaming data processing
□ Use case: in-memory RFID data management
□ Evaluation of SAP OER
□ Prototypical implementation of:
  § RFID Read Event Repository on HANA
  § Discovery Service on HANA (10 billion data records with ~3 seconds response time)
  § Front ends for iPhone & iPad
□ Key findings:
  § HANA is suited for streaming data (using bulk inserts)
  § Analytics on streaming data is now possible
GORFID: “Near Real-Time” as a Concept
[Figure: Discovery Service, Read Event Repositories, and Verification Services on SAP HANA. Up to 8,000 read event notifications per second; up to 2,000 requests per second. Bulk load every 2-3 seconds: > 50,000 inserts/s.]
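The bulk-load pattern from the figure can be sketched as a small buffer that collects read events and writes them in one batch once the load interval has elapsed. At up to 8,000 notifications per second, a window of 2 to 3 seconds buffers roughly 16,000 to 24,000 events per bulk load. The sketch below is an illustration only: SQLite stands in for HANA, and the class BulkLoader, the table read_events, and its columns are hypothetical. Timestamps are passed in explicitly so that the behavior is deterministic.

```python
import sqlite3


class BulkLoader:
    """Buffer events and insert them in bulk every interval_s seconds."""

    def __init__(self, conn, interval_s=2.0):
        self.conn = conn
        self.interval_s = interval_s
        self.buffer = []
        self.last_flush = None
        self.bulk_loads = 0

    def add(self, event, now):
        if self.last_flush is None:
            self.last_flush = now
        self.buffer.append(event)
        if now - self.last_flush >= self.interval_s:
            self.flush(now)

    def flush(self, now):
        if self.buffer:
            # One bulk insert instead of one statement per event.
            self.conn.executemany(
                "INSERT INTO read_events VALUES (?, ?, ?)", self.buffer)
            self.conn.commit()
            self.bulk_loads += 1
            self.buffer = []
        self.last_flush = now


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE read_events (tag_id TEXT, reader_id TEXT, read_at REAL)")

loader = BulkLoader(conn, interval_s=2.0)
for i in range(10):
    t = i * 0.5  # simulated arrival time in seconds
    loader.add((f"tag-{i}", "reader-1", t), now=t)
loader.flush(now=5.0)  # write whatever is still buffered

stored = conn.execute("SELECT COUNT(*) FROM read_events").fetchone()[0]
print(stored, loader.bulk_loads)
```

Events become visible to queries only after the next bulk load, which is why the slide title speaks of "near real-time" rather than real-time.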
POS Explorer I
POS Explorer II
POS Explorer III