monetdb intro ppt

M tDB/X100MonetDB/X100a (very) fast column-store( y)

M i Z k ki P t BMarcin Zukowski, Peter BonczSándor Héman, Niels NesSándor Héman, Niels Nes

CWI, Amsterdam, The Netherlands

Disclaimer… a different one ☺

This talk is about data-intensive applications pp

Data warehousing, analytical processing

Scientific data information retrievalScientific data, information retrieval

Transaction processing is a different story…

A lot of content… wake up!

MonetDB/X100 overview 2

Outline

Traditional database performancep

Improvements in MonetDB

MonetDB/X100MonetDB/X100

Query execution

Storage


Database performancep

TPC-H 1GB, Query 1, Q y

Selects 98% of fact table (6M rows), computes t i d t llnet prices and aggregates all

Performance:

C program: ?

MySQL: 26.2sMySQL: 26.2s

DBMS “X”: 28.1s


Database performancep

TPC-H 1GB, Query 1, Q y

Selects 98% of fact table (6M rows), computes t i d t llnet prices and aggregates all

Performance:

C program: 0.2s

MySQL: 26.2sMySQL: 26.2s

DBMS “X”: 28.1s


Database performance analyzedatabase pe o a ce a a y ed

Why so slow?y

Inefficient data storage format

Inefficient query processing modelInefficient query processing model


N-ary storage model (NSM)y g ( )

Fixed-width attributes in a record

Joe 27 Black101

Edward 21 Scissorhand103


Real-life NSM implementationp

“Slotted” pages, example:p g , p

27 Joe Black 101

21 Edward Scissorhand103 21 Edward Scissorhand 03


NSM problemsp

Poor bandwidth use

Always read all the attributes

Terrible on diskTerrible on disk

Bad in memory

Complex attribute access

Variable-length fields

Null fields


Column stores to the rescue!

Store attributes separatelyp y

R d l ib d bRead only attributes used by a query


“Traditional” column stores

Data pathp

Read columns from disk

Convert into NSMConvert into NSM

Use NSM-based processing

Examples: Sybase IQ, Vertica

Not enough!g

Only I/O problem addressed


How databases run a queryq y

Query

SELECT name, salary*.19 AS taxsalary .19 AS tax

FROMemployee

WHERE age > 25


Database operatorspTuple-at-a-time iterator interface:- open()- next(): tuple

close()- close()

next() is called:- for each operator- for each operator- for each tuple

Complex code repeatedComplex code repeated over and over


Primitive functions

Provide data-specific computational functionality

Called once for every operation on e er t pleoperation on every tuple.

Even worse with complex tuple representationtuple representation

Perform one operation (e.g. addition) in one call(e.g. addition) in one call


DBMS performance - IPTp

Lots of repeated, unnecessary codep , y

Operator logic

Function callsFunction calls

Attribute access

Most instructions NOT processing any actual data!

High instructions-per-tuple (IPT) factor


Modern CPUs

New CPU features over last 20 yearsy

Instruction and data cache

Deep pipeline with out of order executionDeep pipeline with out-of-order execution

Superscalar features – multiple instructions at once

SIMD instructions (SSE)

Great for e.g. multimedia processing…

… but bad for database code!


DBMS performance - CPIp

CPU-unfriendly codey

Complicated code

Poor use of CPU cache (both data and instructions)Poor use of CPU cache (both data and instructions)

Processing one value at a time

Compilers can’t help much

High cycles-per-instruction (CPI) factor


DBMS performancep

Performance factors:

High instructions-per-tuple

High cycles per instructionHigh cycles-per-instruction

Very high cycles-per-tuple (CPT)

Others can do better

Scientific computing, …

How can we?


MonetDB

MonetDB – 1993-now, developed at CWI , p

Peter’s PhD ☺

Column storeColumn store

Improves computational efficiency

Predecessor of MonetDB/X100


MonetDB: a column store

“save disk I/O when scan-intensive queries / qneed a few columns”




“ id i i t t t i“avoid an expression interpreter to improve computational efficiency”


MonetDB in actionSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30WHERE age > 30


MonetDB in actionSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30

CPU Efficiency depends on “nice” code- out-of-order execution- few dependencies (control,data)WHERE age > 30

int select gt float( oid* res,

few dependencies (control,data)- compiler support

Compilers love simple loops over arraysl i li i

Simple hard

_g _ ( ,float* column, float val, int n)

{for(int j=0 i=0; i<n; i++)

- loop-pipelining- automatic SIMD

Simple, hard-coded semantics

in operators

for(int j=0,i=0; i<n; i++)if (column[i] >val) res[j++] = i;

return j;}




“ id i i t t t i“avoid an expression interpreter to improve computational efficiency”

Simple algebra – Monet Interpreter Language (MIL)

Hard-coded operator semantics – no function calls

Array-like processing


MonetDB problempSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30WHERE age > 30

MATERIALIZED intermediate

results


Materialization problemp

Extra main-memory bandwidth y

Performance is sub-optimal…

but still faster than anything else (5 years ago ☺)… but still faster than anything else (5 years ago ☺)

Reduces scalability

Can’t afford writing to disk

Only effective for limited data sizes and not all query types


MonetDB: a Faustian Pact

You want efficiencyy

Simple hard-coded operators

I take scalabilitI take scalability

Result materializationSupports SQL and XQuery

C program: 0.2s

MonetDB: 3 7s

Open-source download:monetdb.cwi.nl

MonetDB: 3.7s

MySQL: 26.2s

DBMS “X”: 28.1s


MonetDB/X100/

My PhD thesisy

Motivation:

l ’ fi M DB l bililet’s fix MonetDB scalability…

… and improve the performance on the way ☺

Core ideas:

New execution modele e e u o ode

High performance column storage


Typical Relational DBMS Engineyp g

Query

SELECT name, salary*.19 AS taxsalary .19 AS tax

FROMemployee

WHERE age > 25


MonetDB/X100: “Vectors”/


MonetDB/X100: “Vectors”o et / 00 ecto s

Vector contains data of multiple tuples (~100-1000)

All operations work on entire vectorson entire vectors

Effect: much less operator.next() and primitive calls.


VectorsColumn slices as

unary arrays

NOT:NOT:Vertical is a better table storage layout than horizontal

(though we still think it often is)(though we still think it often is)

RATIONALE:- Simple array operations are p y p

well-supported by compilers- SIMD friendly layout- Assumed cache-resident


Vectorized Primitivesint select_lt_int_col_int_val (

int *res,t es,int *col, int val, int n)

{for(int j i 0; i<n; i++)

Most primitives take just 0.5 (!) tofor(int j=i=0; i<n; i++)

if (col[i] < val) res[j++] = i;return j;

}

take just 0.5 (!) to 10 cycles per tuple

}10-100+ times faster thant l t tituple-at-a-time


MonetDB/X100/

Both efficiency…y

Vectorized primitives

and scalabilit… and scalability

Pipelined query evaluation

C program: 0.2s

MonetDB/X100: 0.6s

MonetDB: 3.7s

MySQL: 26.2s

DBMS “X”: 28 1s


DBMS “X”: 28.1s

Memory Hierarchyy yX100 query engine

CPUhcache

ColumnBM (buffer manager)

RAM


(raid)Disk(s)

Optimal Vector size?pX100 query engine

All vectors together should fit the CPU cache

Depends on the query

CPUhcache


RAM


(raid)Disk(s)

Varying the Vector sizey g

Less and less operator.next() and

primitive function calls (“interpretation overhead”)( interpretation overhead )


Varying the Vector sizey g

Vectors start to exceed theCPU cache, causing

additional memory trafficadditional memory traffic


MonetDB/MIL materializes columnso et / ate a es co u sMonetDB/MILX100 query engine

CPUhcache


RAM


(raid)Disk(s)

Why is X100 so fast?yReduced interpretation overhead

100x less Function Calls100x less Function Calls

Good CPU cache useHigh locality in the primitivesg y pCache-conscious data placement

No Tuple NavigationPrimitives only see arrays

Vectorization allows algorithmic optimizationCPU and compiler-friendly function bodies

Multiple work units, loop-pipelining, SIMD…


Feeding the Beastg

X100 uses < 100 cycles per tuple for TPC-H Q1y p p Q

Q1 has ~30 bytes of used columns per tuple

3GHz CPU core3GHz CPU core

eats 900MB/s

No problem for RAM

But disk-based data?


Using Disk in the 21th centuryUsing Disk in the 21th century

Poor random disk access needs toPoor random disk access needs to be compensated with more and

more disk heads.(tens, hundreds… thousands!)

You’re better off with scanning!


Using Disk in Data Warehousingg gGoals:1 S b d di k * l * (f ll ti l )1. Scan-based disk access *only* (full or partial scans)2. Minimize bandwidth3. Benefit, not suffer from concurrency

Database strategies:replicate tables in multiple orders (goal 1)p p (g )clustering join-tables in foreign-key order (goal 1)keep dimension tables in RAM (goal 1&2)scan-optimized indices (goal 1&2)scan-optimized indices (goal 1&2)use a column-store (goal 2)increase disk bandwidth with lightweight compression (goal 2)

di di k ( l 2&3)


coordinate concurrent disk access (goals 2&3)

Feeding the Beast (1)g ( )

Two ideas pursued:p

Lightweight compression to enhance disk bandwidth

Maximizing diskMaximizing disk

scan sharing in

concurrent queries.


Compression to improve I/O bandwidthCompression to improve I/O bandwidth

0.9GB/s query consumption/ q y p

1/3 CPU for decompression 1.8GB/s needed

new lightweight compression schemesMonetDB/X100 overview 46

new lightweight compression schemes

Key Ingredientsy g

Compress relations on a per-column basisEasy to exploit redundancy

Keep data compressed in main-memoryKeep data compressed in main memoryMore data can be buffered

Decompress vector at a timeDecompress vector at a time Minimize main-memory overhead

Use light-weight, CPU-efficient algorithmsExploit processing power of modern CPUs


Results


TPC-H 100 GBDecent improvement

with fast disksTPC-H

query

MonetDB/X100 on 1 CPU DB2 – 8 CPUs

Compression ratio

4 disks 12 disks 142 disks

Speedup Time (s) Speedup Time (s) Time (s)Speedup Time (s) Speedup Time (s) Time (s)

01 4.33 4.41 69.6 1.29 50.9 111.9

03 3.04 3.10 11.3 1.48 6.0 15.1

04 8.15 7.58 2.4 2.67 1.8 12.5

05 3.81 3.55 15.3 1.06 16.2 84.0

06 4.39 4.50 10.7 2.35 4.6 17.1

07 1.71 1.66 72.0 0.84 40.8 86.5


Linear speedup with slow disks

Competes with DB2 using ~10x less resources

Feeding the Beast (2)g ( )

Two ideas pursued:p

Lightweight compression to enhance disk bandwidth

Maximizing diskMaximizing disk

scan sharing in

concurrent queries.


Concurrent scans

Multiple queries p qscanning the same table

Different start timesDifferent start times

Different scan ranges

Compete for disk accessCompete for disk access and buffer space

C SFCFS request scheduling: poor latency


“Normal” scans in real life


Shared scans

Observation: queries qoften do not need data in a sequential orderq

Idea: make queries “share” the scanningshare the scanning process

Two existing types:Two existing types:Attach

Elevator


Elevator

“Attach” in real life


“Elevator” in real life


Existing shared scansg

Benefits

Less I/O operations

Better data reuseBetter data reuse

Problems

Sharing decisions static (when a query starts)

Misses opportunities in a dynamic environment

Not sensitive to different query types


“Relevance” scans

Core ideas

Dynamically adapt to the current situation

Allow fully arbitrary data orderAllow fully arbitrary data order

Goals:

Maximize data sharing

Optimize latency and throughput

Work for different types of queries


“Relevance” in real life


Results


ConclusionPresented MonetDB/X100

A d t b k l d l d t CWIA new database kernel developed at CWI

Uses block-oriented iterator model (vectorization)

works amazingly wellworks amazingly well

So fast must reduce hunger for hard disksColumn storage specialized in sequential accessg p q

+ Lightweight compression schemes (give ~~ factor 3)

+ Cooperative Bandwidth Sharing (gives ~~ factor 2)

Good performance resultsFastest raw 100GB TPC-H performance around (** not fair)


Beats IR systems on Terabyte TREC

LiteratureMonetDB

P.A. Boncz. MIL Primitives for Querying a Fragmented World. VLDB Journal, 1999.

P.A. Boncz. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. Ph.d. thesis, 2002.

MonetDB/X100P.A. Boncz, M.Zukowski, N.Nes. MonetDB/X100: Hyper-pipelining Query Execution, CIDR 2005.

M Zukowski S Heman N Nes P A Boncz Super-Scalar RAM-CPU Cache Compression ICDE 2006M. Zukowski, S. Heman, N. Nes, P.A. Boncz. Super Scalar RAM CPU Cache Compression. ICDE 2006.

M. Zukowski, S. Heman, N.Nes, P.A. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS, VLDB 2007.

OtherS. Padmanabhan, T. Malkemus, R. Agarwal, A. Jhingran. Block oriented processing of relational database operations in modern computer architectures. ICDE 2001.

J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. ICDE 1998.

M. Stonebraker, et al. C-Store: A Column Oriented DBMS. VLDB 2005.

D.J. Abadi, S.R. Madden, M.C. Ferreira. Integrating Compression and Execution in Column-Oriented DatabaseD.J. Abadi, S.R. Madden, M.C. Ferreira. Integrating Compression and Execution in Column Oriented Database Systems. SIGMOD 2006.

S. Harizopoulos, V. Liang, D. Abadi, S. Madden. Performance Tradeoffs in Read-Optimized Databases. VLDB 2006.

D.J. Abadi, D.S. Myers, D.J. DeWitt, S.R. Madden. Materialization Strategies in a Column-Oriented DBMS. ICDE 2007.

C A L B Bh tt h j T M lk S P d bh K W I i b ff l lit f lti l


C.A. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, K. Wong. Increasing buffer-locality for multiple relational table scans through grouping and throttling. ICDE 2007.

The End

Thank you!

Questions?


monetdb intro ppt

Documents