135_pgcon 2009 - aster v6


  • 7/28/2019 135_PGCon 2009 - Aster v6

    1/18

Building PetaByte Warehouses with Unmodified PostgreSQL

Emmanuel Cecchet, Member of Research Staff

    May 21st, 2009

    Topics

    Introduction to frontline data warehouses

    PetaByte warehouse design with PostgreSQL

    Aster contributions to PostgreSQL

    Q&A

    2 PGCon 2009, Ottawa 2009 Aster Data Systems


Enterprise Data Warehouse Under Stress

Enterprise Data Warehouse

Offloading The Enterprise Data Warehouse

Enterprise Data Warehouse

Frontline Data Warehouse: petabytes of source data; 24x7 availability; TB/day load capacity; in-database transforms; rapid data access

Archival Data Warehouse: petabytes of detailed data; low cost/TB; flexible compression; on-line access; aging out to off-line



    Requirements for Frontline Data Warehouses


    Who is Aster Data Systems?

Aster nCluster is a software-only RDBMS for large-scale frontline data warehousing

High performance: Always Parallel MPP architecture

High availability: Always On on-line operations

High value analytics: In-Database MapReduce

Low cost: petabytes on commodity HW



    MySpace Frontline Data Warehouse


    Topics

    Introduction to frontline data warehouses

    PetaByte warehouse design with PostgreSQL

    Aster contributions to PostgreSQL

    Q&A


Petabyte Data Warehouse Design

PostgreSQL as a building block: do not hack PostgreSQL to serve the distributed database

Build on top of mainline PostgreSQL

Use standard PostgreSQL APIs

Service-Oriented Architecture: hierarchical query planning and optimization

Shared-nothing replication treating PostgreSQL and Linux as a service

Compression at the OS level, transparent to PostgreSQL

In-database MapReduce runs out-of-process

Aster nCluster Database

Queen Server Group: queries/answers

Worker Server Group: queries, data

Loader/Exporter Server Group

Queen Node

Worker Node

Loader Node


    Query Processing: How It Works

    A: Users send queries to the Queen (via ACT, ODBC/JDBC, BI tools, etc.)

B: The Queen parses the query, creates an optimal distributed plan, sends subqueries to the Workers, and supervises the processing

    C: Workers execute their subqueries (locally or via distributed communication)

    D: The Queen aggregates Worker results and returns the final result to the user
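The four steps above amount to a scatter-gather flow. The sketch below is a hypothetical illustration in plain Python (the worker data layout and the aggregate being computed are invented for the example), not Aster's actual protocol:

```python
from concurrent.futures import ThreadPoolExecutor

def worker_subquery(partition):
    """A Worker computes a partial aggregate over its local rows (step C)."""
    return sum(row["amount"] for row in partition)

def queen_execute(partitions):
    """The Queen fans subqueries out to Workers (step B) and merges
    the partial results into the final answer (step D)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(worker_subquery, partitions))
    return sum(partials)

partitions = [
    [{"amount": 10}, {"amount": 20}],   # Worker 1's local data
    [{"amount": 5}],                    # Worker 2's local data
    [{"amount": 7}, {"amount": 8}],     # Worker 3's local data
]
print(queen_execute(partitions))  # 50
```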


    Table Compression Architecture

How It Works

Architecture Benefits

Logical Database (e.g. schema)

Physical Tablespaces


    Table Compression Enables Powerful Archival

    Compressed - High

    Compressed - Medium

    Compressed - Low

    Not Compressed

Older data is accessed less frequently: compress it to save space and cost

The oldest data is compressed the most, the most recent the least

Compressed tables remain fully available for queries (true online archival)
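The age-based tiering policy can be sketched as below; the day thresholds and tier names are illustrative, not nCluster's actual defaults:

```python
def compression_level(age_days):
    """Map data age to a compression tier: the oldest data is
    compressed the most, recent data the least (illustrative cutoffs)."""
    if age_days < 30:
        return "none"
    elif age_days < 90:
        return "low"
    elif age_days < 365:
        return "medium"
    return "high"

for age in (7, 45, 200, 800):
    print(age, compression_level(age))
```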


    What is MapReduce and Why Should I Care?

    What is MapReduce?

    Popularized by Google

    http://labs.google.com/papers/mapreduce.html

Processes data in parallel across a distributed cluster

    Why is MapReduce significant?

    Empowers ordinary developers

    Write application logic, not debug cluster communication code

    Why is In-Database MapReduce significant?

    Unites MapReduce with SQL: power invoked from SQL

    Develop SQL/MR functions with common languages
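A minimal word count in the classic MapReduce style shows the programming model the slides refer to: map emits (key, value) pairs, pairs are grouped by key, and reduce folds each group. This is plain Python, not Aster's SQL/MR API:

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    # Emit one (word, 1) pair per word in the input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Fold all counts for one word into a single total.
    return (key, sum(values))

def mapreduce(lines):
    groups = defaultdict(list)           # shuffle: group values by key
    for key, value in chain.from_iterable(map_fn(l) for l in lines):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["the quick fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

In a real cluster the map and reduce calls run in parallel on many nodes; the sequential version only shows the contract between the two functions.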


    Aster In-Database MapReduce

Aster nCluster stack: SQL/MR Functions, In-Database MapReduce, Data Store Engine

Users, analysts, applications

In-Database MapReduce

Extensible framework (MapReduce + SQL)

Flexible: MapReduce expressiveness, languages, polymorphism

Performance: massively parallel, computational push-down

Availability: fault isolation, resource management

Out-of-process executables

Does not use PL/* for custom code execution

Can execute Map and Reduce functions in any language that has a runtime on Linux (e.g. Java, Python, C#, C/C++, Ruby, Perl, R, etc.)

Standard PostgreSQL APIs to send/receive data to executables

Fault isolation, security, and resource management for arbitrary user code


Always On: Minimize Planned Downtime

Live Queries

Backup & Restore

Data Backup

Add Capacity / Rebalance Data

Load & Export


    Precision Scaling

When more CPU/memory/capacity is needed, new nodes can be added for scale-out.

Precision Scaling uses standard PostgreSQL APIs to migrate vWorker partitions to new nodes, either for load balancing (more compute power) or capacity expansion.

Example: Assume Workers 1/2/3 are 100% CPU-bottlenecked. Incorporation adds a new Worker4 node and migrates over vWorker partitions D/H/L. As a result of load balancing, CPU utilization drops to 75% per node, eliminating hotspots.
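The arithmetic behind the example: three workers at 100% CPU carry 300 "units" of aggregate load, and spreading them evenly over four nodes gives 300/4 = 75% each. A one-liner makes the model explicit:

```python
def utilization_after_rebalance(busy_nodes, utilization, total_nodes):
    """Spread the aggregate load of `busy_nodes` (each at `utilization`
    percent) evenly across `total_nodes` after partition migration."""
    total_load = busy_nodes * utilization
    return total_load / total_nodes

# Workers 1-3 at 100% CPU; adding Worker4 and migrating vWorker
# partitions spreads the same load over four nodes:
print(utilization_after_rebalance(3, 100, 4))  # 75.0
```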



    Replication


    Fault Tolerance & Automatic Online Failover

    Replication Failover

    Automatic, non-disruptive, graceful performance impact

    Replication Restoration

    Delta-based (fast) and online (non-disruptive)



    Using Commodity Hardware


Scaling On-Demand to a PetaByte

    Commodity Hardware

    2 TB Building Block

    Dell, HP, IBM, Intel x86

    16 GB Memory

    2.4TB of Storage

    8 Disks

    $5k to $10k Node


    More Blocks = More Power

    Massive Power Per Rack

    160 Cores

    640 GB RAM

    48 TB SAS


    Heterogeneous Hardware Support

Heterogeneous hardware support enables customers to add cheaper/faster nodes to existing systems for investment protection

Mix-n-match different servers as you grow (faster CPUs, more memory, bigger disk drives, different vendors, etc.)


    CPU

    Memory

    Disk


    Topics

    Introduction to frontline data warehouses

    PetaByte warehouse design with PostgreSQL

    Aster contributions to PostgreSQL

    Q&A


    Error Logging in COPY

    Separate good from malformed tuples

    Per tuple error information

    Errors to capture

    Type mismatch (e.g. text vs. int)

    Check constraints (e.g. int < 0)

    Malformed chars (e.g. invalid UTF-8 seq.)

    Missing / extra column

Low performance overhead

    Activated on-demand using environment variables
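The core idea, separating good tuples from malformed ones with per-tuple error context instead of aborting the whole load, can be sketched as follows. The `parse_row` format and the error-record fields are hypothetical illustrations, not the patch's actual interface:

```python
def load_with_error_log(rows, parse):
    """Keep well-formed tuples, and log malformed ones with per-tuple
    error context rather than failing the entire COPY."""
    good, errors = [], []
    for lineno, raw in enumerate(rows, start=1):
        try:
            good.append(parse(raw))
        except (ValueError, KeyError) as exc:
            errors.append({"line": lineno, "tuple": raw, "error": str(exc)})
    return good, errors

def parse_row(raw):
    # Hypothetical two-column format; int() raises on a type mismatch,
    # and unpacking raises on a missing/extra column.
    name, age = raw.split(",")
    return (name, int(age))

good, errors = load_with_error_log(["alice,30", "bob,abc"], parse_row)
print(len(good), len(errors))  # 1 1
```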

Error Logging in COPY

Detailed error context is logged along with the tuple content


    Error Logging Performance

    1 million tuples COPY performance


    Auto-partitioning in COPY

COPY into a parent table routes tuples directly into the child table with matching constraints

COPY y2008 FROM data.txt


Auto-partitioning in loading

Activated on demand: set tuple_routing_in_copy = 0/1;

Configurable LRU cache size: set tuple_routing_cache_size = 3;

COPY performance into the parent table is similar to direct COPY into the child table if the data is sorted

Will leverage partitioning information in the future (WIP for 8.5)
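Constraint-based routing with an LRU cache can be sketched as below. The child predicates and the `TupleRouter` class are hypothetical illustrations of the mechanism, not the patch's code; the cache is why sorted input performs like a direct COPY into the child: consecutive tuples keep hitting the most recently used child:

```python
class TupleRouter:
    """Route each tuple to the child table whose CHECK-style predicate
    it satisfies, trying recently used children first."""

    def __init__(self, children, cache_size=3):
        self.children = children          # child name -> predicate
        self.cache = []                   # MRU list of child names
        self.cache_size = cache_size

    def route(self, row):
        candidates = self.cache + [n for n in self.children
                                   if n not in self.cache]
        for name in candidates:
            if self.children[name](row):
                if name in self.cache:
                    self.cache.remove(name)
                self.cache.insert(0, name)    # promote to MRU position
                del self.cache[self.cache_size:]
                return name
        raise ValueError("no child table matches row")

router = TupleRouter({
    "y2007": lambda r: r["year"] == 2007,
    "y2008": lambda r: r["year"] == 2008,
})
print([router.route({"year": y}) for y in (2008, 2008, 2007)])
# ['y2008', 'y2008', 'y2007']
```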

    Other contributions

    Temporary tables and 2PC transactions

    [Auto-]Partitioning infrastructure

    Regression test suite

LFI (http://lfi.sourceforge.net/)

Fault injection at the library level or below: out-of-memory conditions, network connection errors, interrupted system calls, data corruption, hardware failures, etc.

Lightning talk on Friday!


    Topics

    Introduction to Aster and data warehousing

    PetaByte warehouse design with PostgreSQL

    Aster contributions to PostgreSQL

    Q&A


    PetaByte Warehouses with PostgreSQL

    Unmodified PostgreSQL

    Always Parallel MPP architecture

    Always On on-line operations

    In-Database MapReduce

    PetaByte on commodity Hardware


Learn more: www.asterdata.com

    Free TDWI report on advanced analytics:

    asterdata.com/mapreduce

    Free Gartner webcast on mission-critical DW:

    asterdata.com/gartner

    Contact us

    [email protected]

    Aster Data Systems


    Bonus slides


    nCluster Components


    Aster Loader

Scalable Partitioning