class 5 column stores 2 - harvard...

column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 5

  • column stores 2.0

    prof. Stratos Idreos

    class 5

    http://daslab.seas.harvard.edu/classes/cs165/

  • CS165, Fall 2016 Stratos Idreos /282

    what just happened?

    where is my data?

    email, cloud, social media, …

    can we design systems that let us know what is going on?

    worth thinking about…

  • CS165, Fall 2016 Stratos Idreos /283

    cool papers 2.0

    The Case for RodentStore: An Adaptive, Declarative Storage System. Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2009

    Abstraction Without Regret in Database Systems Building: a Manifesto. Christoph Koch. IEEE Data Eng. Bull. 37(1): 70-79 (2014)

    dbTouch: Analytics at your Fingertips. Stratos Idreos and Erietta Liarou. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2013

  • CS165, Fall 2016 Stratos Idreos /284

    design doc

    think, design, create a 1-2 page PDF doc and ask for feedback

    mandatory for M1-M3, optional afterwards

    submit through Canvas

    do not worry about perfection: fail fast

    wrong ideas are ok if you eventually find out they are wrong :) (holds for midterms as well)

  • CS165, Fall 2016 Stratos Idreos /285

    Jim Gray: IBM, Tandem, DEC, Microsoft; ACM Turing Award; ACM SIGMOD Edgar F. Codd Innovations Award

    disk (100Kx): Pluto, 2 years

    memory (100x): New York, 1.5 hours

    on board cache (10x): this building, 10 min

    on chip cache (2x): this room, 1 min

    registers: my head, ~0

  • CS165, Fall 2016 Stratos Idreos /28

    the way we store data defines the possible (efficient) access methods

  • CS165, Fall 2016 Stratos Idreos /287

    free_offset, N, offset1-length1, offset2-length2, …

    free space

    slotted page

    scan null

    update var length
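The header layout on this slide (free_offset, N, then one offset-length pair per record) can be sketched in C as follows. This is only an illustration under assumptions not on the slide: a 4096-byte page, 16-bit header fields, records written backwards from the end of the page so that free space sits between the slot directory and the records, and hypothetical names (`slotted_page`, `sp_init`, `sp_insert`, `sp_get`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096                          /* assumed page size */
#define HDR_SIZE  (2 * sizeof(uint16_t))
#define DATA_SIZE (PAGE_SIZE - HDR_SIZE)

/* One slot directory entry: where a record starts and how long it is. */
typedef struct { uint16_t offset, length; } slot_t;

typedef struct {
    uint16_t free_offset;      /* records occupy data[free_offset .. DATA_SIZE) */
    uint16_t n;                /* N: number of slots in use */
    uint8_t  data[DATA_SIZE];  /* slot directory | free space | records */
} slotted_page;

static void sp_init(slotted_page *p) {
    p->free_offset = DATA_SIZE;
    p->n = 0;
}

/* Append a variable-length record; returns its slot id, or -1 if it does not fit. */
static int sp_insert(slotted_page *p, const void *rec, uint16_t len) {
    size_t dir_end = (size_t)(p->n + 1) * sizeof(slot_t);  /* directory size after insert */
    if (len > p->free_offset || dir_end > (size_t)(p->free_offset - len))
        return -1;
    p->free_offset -= len;
    memcpy(p->data + p->free_offset, rec, len);
    slot_t s = { p->free_offset, len };
    memcpy(p->data + (size_t)p->n * sizeof(slot_t), &s, sizeof s);
    return p->n++;
}

/* Look up a record by slot id: one indirection through the slot directory. */
static const void *sp_get(const slotted_page *p, int slot, uint16_t *len) {
    assert(slot >= 0 && slot < p->n);
    slot_t s;
    memcpy(&s, p->data + (size_t)slot * sizeof(slot_t), sizeof s);
    *len = s.length;
    return p->data + s.offset;
}
```

Every read goes through the slot directory, which is what lets a record move inside the page (for example when a variable-length value grows) without changing its slot id.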

  • CS165, Fall 2016 Stratos Idreos /288

    row-store: ABCD

    column-store: A B C D

  • CS165, Fall 2016 Stratos Idreos /289

    A: a1 a2 a3 a4 a5 a6

    B: b1 b2 b3 b4 b5 b6

    C: c1 c2 c3 c4 c5 c6

    tuple 1, tuple 2, tuple 3, tuple 4, tuple 5, tuple 6

    virtual ids / positional alignment

    positional lookups/joins: A(i) = A + i * width(A)

    fixed-width + dense

    columns do not need to have the same width
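The address computation A(i) = A + i * width(A) can be sketched in C as follows. The `column` struct and the `col_at` helper are hypothetical names used only for illustration; the sketch assumes each column is one dense array of fixed-width values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A dense, fixed-width column: n values, each `width` bytes wide. */
typedef struct {
    const uint8_t *base;  /* address of the value at position 0 */
    size_t width;         /* bytes per value; may differ between columns */
    size_t n;             /* number of values */
} column;

/* A(i) = A + i * width(A): no index or search, just arithmetic. */
static const void *col_at(const column *c, size_t i) {
    assert(i < c->n);
    return c->base + i * c->width;
}
```

Because position i refers to the same tuple in every column, a 4-byte column A and an 8-byte column B can both be probed with the same i, and no explicit tuple id has to be stored.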

  • CS165, Fall 2016 Stratos Idreos /28

    today: column-stores 2.0

  • CS165, Fall 2016 Stratos Idreos /2811

    select min(C) from R where A min

    sequential access patterns, max 1 if

  • CS165, Fall 2016 Stratos Idreos /2812

    working over fixed width & dense columns

    for (i=0; i<n; i++) if (A[i]>v) inter1[j++]=i

    no function calls, no indirections, no auxiliary data, min ifs

    easy to prefetch next data values

    for (i=0;i
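A runnable sketch of this kind of tight loop over a fixed-width, dense column is given below. It assumes `int` values and a predicate of the form `a[i] > v`; the function names are hypothetical. The second function is one possible reading of "min ifs": it always writes the position and advances the output cursor by the result of the comparison, so the loop body has no branch. The third function fetches values from another column at the selected positions and aggregates them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Scan column a and store the positions i where a[i] > v. Returns how many qualified. */
static size_t select_gt(const int *a, size_t n, int v, size_t *out) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > v)
            out[j++] = i;
    return j;
}

/* Branch-free variant: always write, advance only on a match.
   out must have room for n entries. */
static size_t select_gt_nobranch(const int *a, size_t n, int v, size_t *out) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        out[j] = i;
        j += (a[i] > v);
    }
    return j;
}

/* Minimum of column c at the given positions; INT_MAX if there are none. */
static int min_at(const int *c, const size_t *pos, size_t k) {
    int m = INT_MAX;
    for (size_t i = 0; i < k; i++)
        if (c[pos[i]] < m)
            m = c[pos[i]];
    return m;
}
```

Both select variants read the column strictly left to right, which is what makes the next values easy to prefetch.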

  • CS165, Fall 2016 Stratos Idreos /2815

    disk: A B C D

    memory: A B C

    option 1: row-store engine (early tuple reconstruction/materialization)

    option 2: column-store engine

  • CS165, Fall 2016 Stratos Idreos /2816

    possible data flow patterns

    tuple at a time

    block/vector at a time

    column at a time

  • CS165, Fall 2016 Stratos Idreos /2817

    select min(C) from R where A

  • CS165, Fall 2016 Stratos Idreos /2818

    Marcin Zukowski, PhD

    CEO/Co-founder of Vectorwise (now Actian)

    now: co-founder of Snowflake

    “changing the world, one terabyte at a time”

    the beer analogy

  • CS165, Fall 2016 Stratos Idreos /2819

    memory hierarchy: CPU, registers, on chip cache, on board cache, memory, disk (faster toward the CPU, cheaper toward the disk)

    query plan: op1, op2, op3 over columns A and B

    size of vector

  • CS165, Fall 2016 Stratos Idreos /2820

    tuple at a time - good for minimizing memory footprint

    bulk processing - good for minimizing function call overhead

    vectorized processing - somewhere in between
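The trade-off above can be sketched in C as a select-then-aggregate pipeline that runs one vector at a time. This is an illustrative sketch, not the course's reference design: the predicate `a[i] > v`, the value `VECTOR_SIZE = 1024`, and all function names are assumptions. Each operator call handles a whole vector, so call overhead is paid once per vector rather than once per tuple, and the only intermediate is a position buffer of `VECTOR_SIZE` entries that is reused throughout.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define VECTOR_SIZE 1024  /* assumed; small enough for the intermediate to stay in cache */

/* Operator 1: positions (relative to the vector start) where a[i] > v. */
static size_t vec_select_gt(const int *a, size_t n, int v, size_t *pos) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        pos[j] = i;
        j += (a[i] > v);
    }
    return j;
}

/* Operator 2: fold the minimum of c at the selected positions into m. */
static int vec_min_at(const int *c, const size_t *pos, size_t k, int m) {
    for (size_t i = 0; i < k; i++)
        if (c[pos[i]] < m)
            m = c[pos[i]];
    return m;
}

/* min(C) over the rows where A > v, processed one vector at a time.
   Returns INT_MAX if no row qualifies. */
static int min_c_where_a_gt(const int *a, const int *c, size_t n, int v) {
    size_t pos[VECTOR_SIZE];  /* the only intermediate, reused for every vector */
    int m = INT_MAX;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t len = n - start < VECTOR_SIZE ? n - start : VECTOR_SIZE;
        size_t k = vec_select_gt(a + start, len, v, pos);
        m = vec_min_at(c + start, pos, k, m);
    }
    return m;
}
```

Setting `VECTOR_SIZE` to 1 degenerates to tuple at a time, and setting it to the column length degenerates to column at a time, which is why vectorized processing sits between the two.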

  • CS165, Fall 2016 Stratos Idreos /2821

    history/timeline

    ~1960s: tuple at a time

    1980s: ideas about block processing

    2005: Vectorwise

    >2010: industry adoption

  • CS165, Fall 2016 Stratos Idreos /28

    project: column-at-a-time

    bonus: vectorized processing

  • CS165, Fall 2016 Stratos Idreos /2823

    update row7=(A=a,B=b,C=c,D=d)

    row-store (ABCD) vs column-store (A B C D)

    which is better to update and why?

    how much does it cost to update a single row? (think about pages, data movement)

    how to update in column-stores? (query plan + algorithms)

  • CS165, Fall 2016 Stratos Idreos /28

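    base data: A B C D

    pending updates: A B C D

    update: goes to the pending updates

    query: reads base data and pending updates

    periodically: merge pending updates into base data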
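A minimal sketch of the base data plus pending updates idea for a single column is given below. It makes several assumptions not on the slide: updates overwrite an existing position (no inserts or deletes), the pending buffer has a fixed capacity, a merge is triggered when the buffer fills up or when called explicitly, and all names (`upd_column`, `col_update`, `col_read`, `col_merge`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64  /* assumed capacity of the pending-update buffer */

typedef struct { size_t pos; int value; } pending_t;

typedef struct {
    int      *base;                  /* dense, read-optimized base column */
    size_t    n;                     /* number of values in base */
    pending_t pending[MAX_PENDING];  /* updates not yet applied to base */
    size_t    n_pending;
} upd_column;

/* Apply all buffered updates to the base data, oldest first, then clear the buffer. */
static void col_merge(upd_column *c) {
    for (size_t i = 0; i < c->n_pending; i++)
        c->base[c->pending[i].pos] = c->pending[i].value;
    c->n_pending = 0;
}

/* Buffer an update instead of touching base data; merge first if the buffer is full. */
static void col_update(upd_column *c, size_t pos, int value) {
    assert(pos < c->n);
    if (c->n_pending == MAX_PENDING)
        col_merge(c);
    c->pending[c->n_pending++] = (pending_t){ pos, value };
}

/* Read a value as a query should see it: newest pending update wins, else base. */
static int col_read(const upd_column *c, size_t pos) {
    assert(pos < c->n);
    for (size_t i = c->n_pending; i-- > 0; )
        if (c->pending[i].pos == pos)
            return c->pending[i].value;
    return c->base[pos];
}
```

Updates stay cheap because they only append to a small buffer, while queries pay a little extra to consult that buffer until the next merge.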

  • CS165, Fall 2016 Stratos Idreos /2825

    fractured mirrors

    columns copy (A B C D) and rows copy (ABCD)

    query, optimizer

    A Case for Fractured Mirrors. Ravishankar Ramamurthy, David J. DeWitt, Qi Su. Very Large Databases Journal, 12(2): 89-101, 2003

  • CS165, Fall 2016 Stratos Idreos /2826

    Notes to remember

    column-stores great for analytics

    row-stores great for transactions

    still basic concepts are the same

    hybrids possible

    keep access patterns sequential and simple (min ifs)

  • CS165, Fall 2016 Stratos Idreos /2827

    reading

    The Design and Implementation of Modern Column-store Database Systems (Sections: all -4.6 & 4.8) by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

    IEEE Data Engineering Bulletin, 35(1), March 2012 Special Issue on Column-stores (9 short overview papers)

  • CS165, Fall 2016 Stratos Idreos /2828

    research papers

    Database Architecture Optimized for the New Bottleneck: Memory Access. Peter Boncz, Stefan Manegold, Martin Kersten. In Proc. of the Very Large Databases Conference (VLDB), 1999

    MonetDB/X100: Hyper-Pipelining Query Execution. Peter A. Boncz, Marcin Zukowski, Niels Nes. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2005

    Materialization Strategies in a Column-Oriented DBMS. Daniel Abadi, Daniel Myers, David DeWitt, Samuel Madden. In Proc. of the Inter. Conference on Data Engineering (ICDE), 2007

    Self-organizing tuple reconstruction in column-stores. Stratos Idreos, Martin Kersten, Stefan Manegold. In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009

  • DATA SYSTEMS

    prof. Stratos Idreos

    class 5

    column-stores 2.0