-
column stores 2.0
prof. Stratos Idreos
class 5
http://daslab.seas.harvard.edu/classes/cs165/
-
CS165, Fall 2015, Stratos Idreos
what just happened?
where is my data?
email, cloud, social media, …
can we design systems that let us know what is going on?
worth thinking about 2.0
-
cool papers 2.0
The Case for RodentStore: An Adaptive, Declarative Storage System
Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden
In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2009
Abstraction Without Regret in Database Systems Building: a Manifesto
Christoph Koch
IEEE Data Eng. Bull. 37(1): 70-79 (2014)
declarative processing and design
-
design doc (optional)
think, design, create a 1-2 page PDF doc and ask for feedback
by email or ideally during office hours or sections
do not worry about perfection: fail fast
wrong ideas are ok if you eventually find out they are wrong :)
(holds for midterms as well)
-
am I keeping up ok?
1) follow concepts in class
2) keep up with project timeline & readings
if not, then OH & sections for more help
-
feedback on starter code, api & tests
-
Jim Gray (IBM, Tandem, DEC, Microsoft; ACM Turing Award; ACM SIGMOD Edgar F. Codd Innovations Award)
level            relative cost   analogy
registers                        my head, ~0
on chip cache    2x              this room, 1 min
on board cache   10x             this building, 10 min
memory           100x            New York, 1.5 hours
disk             100Kx           Pluto, 2 years
-
the way we store data defines the possible (efficient) access methods
-
employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)
(1, name1, office1, tel1, city1, salary1)
(2, name2, office2, tel2, city2, salary2)
(3, name3, office3, tel3, city3, salary3)
(4, name4, office4, tel4, city4, salary4)
(5, name5, office5, tel5, city5, salary5)
(6, name6, office6, tel6, city6, salary6)
(7, name7, office7, tel7, city7, salary7)
(8, name8, office8, tel8, city8, salary8)
(9, name9, office9, tel9, city9, salary9)
…
data storage: blocks < pages < files
file
-
free_offset, N, offset1-length1, offset2-length2, …
free space
slotted page
scan null
update var length
…
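A minimal sketch of a slotted page with the header fields named above (free_offset, N, and offset-length pairs). The 4096-byte page size, the choice to grow records from the back of the page, the use of length 0 to mark a deleted slot, and all type and function names are assumptions for illustration, not the course's starter code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096                               /* assumed page size */
#define BODY_SIZE (PAGE_SIZE - 2 * sizeof(uint16_t))

/* One slot directory entry: an offset-length pair. */
typedef struct {
    uint16_t offset;   /* record start, relative to body */
    uint16_t length;   /* record length in bytes; 0 marks a deleted slot */
} Slot;

typedef struct {
    uint16_t free_offset;      /* records occupy body[free_offset .. BODY_SIZE) */
    uint16_t n;                /* number of slots (N) */
    uint8_t  body[BODY_SIZE];  /* slot directory at the front, records at the back */
} Page;

static Slot slot_get(const Page *p, uint16_t i) {
    Slot s;
    memcpy(&s, p->body + (size_t)i * sizeof(Slot), sizeof(Slot));
    return s;
}

static void slot_set(Page *p, uint16_t i, Slot s) {
    memcpy(p->body + (size_t)i * sizeof(Slot), &s, sizeof(Slot));
}

void page_init(Page *p) {
    p->free_offset = (uint16_t)BODY_SIZE;
    p->n = 0;
}

/* Insert a variable-length record; returns its slot number or -1 if it does not fit. */
int page_insert(Page *p, const void *rec, uint16_t len) {
    size_t dir_end = ((size_t)p->n + 1) * sizeof(Slot);  /* directory size after adding a slot */
    if (len == 0 || dir_end + len > p->free_offset) return -1;
    p->free_offset = (uint16_t)(p->free_offset - len);
    memcpy(p->body + p->free_offset, rec, len);
    Slot s = { p->free_offset, len };
    slot_set(p, p->n, s);
    return p->n++;
}

/* Look up a record by slot number; returns NULL for a missing or deleted slot. */
const uint8_t *page_get(const Page *p, uint16_t i, uint16_t *len) {
    if (i >= p->n) return NULL;
    Slot s = slot_get(p, i);
    if (s.length == 0) return NULL;
    *len = s.length;
    return p->body + s.offset;
}

/* Mark a slot as deleted; the record bytes are not reclaimed in this sketch. */
void page_delete(Page *p, uint16_t i) {
    if (i >= p->n) return;
    Slot s = slot_get(p, i);
    s.length = 0;
    slot_set(p, i, s);
}
```

A scan over such a page has to check every slot for the deleted marker, and an update that changes a record's length may need to move the record within the page.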
-
row-store    column-store
ABCD         A B C D
-
a1 a2 a3 a4 a5 a6
b1 b2 b3 b4 b5 b6
c1 c2 c3 c4 c5 c6
virtual ids/ positional alignment
positional lookups/joins
A(i) = A + i * width(A)
tuple 1, tuple 2, tuple 3, tuple 4, tuple 5, tuple 6
A B C
fixed-width + dense
columns do not need to have the same width
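The address computation A(i) = A + i * width(A) can be sketched in C as follows. The Column struct and the accessor names are hypothetical; the point is that two columns with different widths are joined purely by position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A fixed-width, dense column: value i starts at data + i * width. */
typedef struct {
    const uint8_t *data;
    size_t width;   /* bytes per value */
    size_t count;   /* number of values */
} Column;

/* A(i) = A + i * width(A) */
const void *column_at(const Column *col, size_t i) {
    return col->data + i * col->width;
}

/* Typed accessors for a 4-byte and an 8-byte column. */
int32_t column_get_i32(const Column *col, size_t i) {
    int32_t v;
    memcpy(&v, column_at(col, i), sizeof v);
    return v;
}

int64_t column_get_i64(const Column *col, size_t i) {
    int64_t v;
    memcpy(&v, column_at(col, i), sizeof v);
    return v;
}
```

Reading position i from each column reconstructs tuple i without storing any explicit id.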
-
today
column-stores 2.0
-
select min(C) from R where A min
sequential access patterns, max 1 if
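One possible column-at-a-time plan for this query, as a sketch. The predicate on A did not survive in the transcript, so the comparison `A < v` is a stand-in assumption, as are the function names. Each operator is a tight loop over one column with a single if.

```c
#include <limits.h>
#include <stddef.h>

/* Step 1: scan column A sequentially and emit qualifying positions.
   The predicate a[i] < v is an assumption standing in for the slide's predicate. */
size_t select_lt(const int *a, size_t n, int v, size_t *pos) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] < v)
            pos[j++] = i;
    return j;
}

/* Step 2: fetch C at the qualifying positions and keep the minimum.
   Returns INT_MAX when no position qualifies. */
int min_at(const int *c, const size_t *pos, size_t k) {
    int m = INT_MAX;
    for (size_t i = 0; i < k; i++)
        if (c[pos[i]] < m)
            m = c[pos[i]];
    return m;
}
```

Because the positions come out in increasing order, the fetch from C also moves forward through memory.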
-
working over fixed width & dense columns
for (i=0; i<n; i++)
  if (A[i] > v)
    res[j++] = i
no function calls, no indirections, no auxiliary data, min ifs easy to prefetch next data values
for (i=0;i
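A selection over a fixed-width, dense column can be written with or without a branch. The sketch below shows both forms under assumed names (`select_branch`, `select_nobranch`) and an assumed predicate `a[i] > v`. The second form is a common branch-free variant, not necessarily the one on the slide.

```c
#include <stddef.h>

/* Branching form: one if per value, positions written only on a match. */
size_t select_branch(const int *a, size_t n, int v, size_t *res) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > v)
            res[j++] = i;
    return j;
}

/* Branch-free form: always write the position, advance the output
   cursor only when the predicate holds. res needs room for n entries. */
size_t select_nobranch(const int *a, size_t n, int v, size_t *res) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        res[j] = i;
        j += (a[i] > v);
    }
    return j;
}
```

Both loops read the column sequentially with no function calls or indirections. The branch-free form trades extra writes for the absence of a data-dependent branch.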
-
disk -> memory: columns A B C D
option 1: early tuple reconstruction/materialization (A, B, C -> ABC), then a row-store engine
option 2: column-store engine (A)
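Early tuple reconstruction can be sketched as stitching the needed columns into rows right after they are loaded, so that a row-oriented engine can process them. The RowABC type and the `stitch_rows` name are hypothetical.

```c
#include <stddef.h>

/* A materialized tuple over columns A, B, C. */
typedef struct {
    int a, b, c;
} RowABC;

/* Glue position i of each column into row i (positional alignment). */
void stitch_rows(const int *A, const int *B, const int *C,
                 size_t n, RowABC *out) {
    for (size_t i = 0; i < n; i++) {
        out[i].a = A[i];
        out[i].b = B[i];
        out[i].c = C[i];
    }
}
```

The copy costs one pass over every loaded column before any operator runs. A column-store engine avoids this by operating on the columns directly.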
-
possible data flow patterns
tuple at a time
block/vector at a time
column at a time
-
select min(C) from R where A
-
CEO/Co-founder of Vectorwise (now Actian)
now: “changing the world, one terabyte at a time”, co-founder of Snowflake
the beer analogy
Marcin Zukowski, PhD
-
registers
on chip cache
on board cache
memory
disk
CPU
cheaper
faster
query plan: op1, op2, op3
columns: A, B
size of vector
-
tuple at a time - good for minimizing memory footprint
bulk processing - good for minimizing function call overhead
vectorized processing - somewhere in between
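A sketch of vector-at-a-time processing: the columns are consumed in chunks, and the intermediate result of each operator (here, a list of positions) stays small. The vector size of 1024, the example query (sum of C where A > v), and the function name are all assumptions for illustration.

```c
#include <stddef.h>

#define VECTOR_SIZE 1024   /* assumed; small enough for intermediates to stay in cache */

/* Sum c[i] over all i with a[i] > v, one vector at a time. */
long sum_where_gt_vectorized(const int *a, const int *c, size_t n, int v) {
    size_t pos[VECTOR_SIZE];   /* intermediate result, reused for every vector */
    long sum = 0;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t len = (n - start < VECTOR_SIZE) ? n - start : VECTOR_SIZE;

        /* operator 1: select qualifying positions within this vector */
        size_t k = 0;
        for (size_t i = 0; i < len; i++)
            if (a[start + i] > v)
                pos[k++] = start + i;

        /* operator 2: aggregate C at those positions */
        for (size_t i = 0; i < k; i++)
            sum += c[pos[i]];
    }
    return sum;
}
```

With VECTOR_SIZE set to 1 this degenerates to tuple at a time, and with VECTOR_SIZE set to n it becomes column at a time with a full-size intermediate.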
-
history/timeline
~1960s: tuple at a time
1980s: ideas about block processing
2005: Vectorwise
tuple at a time tuple at a time
>2010: industry adoption
-
project: column-at-a-time
bonus: vectorized processing
-
update row7=(A=a,B=b,C=c,D=d)
row-store    column-store
ABCD         A B C D
vs
which is better to update and why?
how much does it cost to update a single row? (think about pages, data movement)
how to update in column-stores? (query plan + algorithms)
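As a starting point for these questions, here is a sketch of an in-place update of one row in each layout. The four-int schema and all names are assumptions. With in-place updates, the row-store writes one contiguous record, while the column-store writes once per column, and each column typically lives on a different page.

```c
#include <stddef.h>

/* Row-store: the four attributes of a tuple are stored together. */
typedef struct {
    int a, b, c, d;
} Row;

/* Column-store: one array per attribute, aligned by position. */
typedef struct {
    int *a, *b, *c, *d;
} Columns;

/* One contiguous write (typically a single page). */
void update_row_store(Row *rows, size_t i, Row v) {
    rows[i] = v;
}

/* One write per column (typically one page per column). */
void update_column_store(Columns *t, size_t i, Row v) {
    t->a[i] = v.a;
    t->b[i] = v.b;
    t->c[i] = v.c;
    t->d[i] = v.d;
}
```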
-
base data: A B C D
pending updates: A B C D
update -> pending updates
query -> base data + pending updates
periodically: merge pending updates into base data
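A sketch of the pending-updates idea for a single column, under assumed names and an assumed fixed-size buffer. Updates are appended to a small side structure, reads consult it before the base data, and a merge step applies it to the base data in bulk.

```c
#include <stddef.h>

#define MAX_PENDING 64   /* assumed buffer capacity */

typedef struct {
    size_t pos;
    int value;
} PendingUpdate;

typedef struct {
    int *base;                            /* base data, dense and fixed-width */
    size_t n;                             /* number of values in base */
    PendingUpdate pending[MAX_PENDING];   /* updates not yet merged */
    size_t n_pending;
} UpdatableColumn;

/* Record an update without touching base; returns -1 if out of range
   or if the buffer is full and a merge is needed first. */
int column_update(UpdatableColumn *c, size_t pos, int value) {
    if (pos >= c->n || c->n_pending == MAX_PENDING) return -1;
    c->pending[c->n_pending].pos = pos;
    c->pending[c->n_pending].value = value;
    c->n_pending++;
    return 0;
}

/* Read a value: the newest pending update wins, otherwise use base. */
int column_read(const UpdatableColumn *c, size_t pos) {
    for (size_t i = c->n_pending; i-- > 0; )
        if (c->pending[i].pos == pos)
            return c->pending[i].value;
    return c->base[pos];
}

/* Periodic merge: apply pending updates oldest to newest, then clear. */
void column_merge(UpdatableColumn *c) {
    for (size_t i = 0; i < c->n_pending; i++)
        c->base[c->pending[i].pos] = c->pending[i].value;
    c->n_pending = 0;
}
```

The linear search in `column_read` is only reasonable for a small buffer. A real system would index or sort the pending updates.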
-
fractured mirrors
columns copy: A B C D
rows copy: ABCD
optimizer
query
-
reading
The Design and Implementation of Modern Column-store Database Systems (Sections: all -4.6 & 4.8)
by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden
IEEE Data Engineering Bulletin, 35(1), March 2012 Special Issue on Column-stores (9 short overview papers)
-
research papers
Database Architecture Optimized for the New Bottleneck: Memory Access
Peter Boncz, Stefan Manegold, Martin Kersten
In Proc. of the Very Large Databases Conference (VLDB), 1999
MonetDB/X100: Hyper-Pipelining Query Execution
Peter A. Boncz, Marcin Zukowski, Niels Nes
In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2005
Materialization Strategies in a Column-Oriented DBMS
Daniel Abadi, Daniel Myers, David DeWitt, Samuel Madden
In Proc. of the Inter. Conference on Data Engineering (ICDE), 2007
Self-organizing tuple reconstruction in column-stores
Stratos Idreos, Martin Kersten, Stefan Manegold
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009
-
DATA SYSTEMS
prof. Stratos Idreos
class 5
column-stores 2.0