Authors Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni Presenters Daniel Burgener, Gautam Bhawsar

Upload: vanessa-chase

Post on 13-Dec-2015

Page 1: Authors Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana

Authors

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni

Presenters

Daniel Burgener, Gautam Bhawsar

Road Map

Introduction

Background

Requirements

PNUTS Overview & Functionality

System Architecture & Applications

Experimental Results

Comparison to Competitors

Conclusion & Future Work

Introduction

PNUTS is a massively parallel, geographically distributed database system

Designed by Yahoo!

Used by Yahoo!'s web applications

Shared among several applications

Background

* taken from http://msdn.microsoft.com/en-us/library/ms978603.aspx

Pub/Sub Model

Sending applications (publishers) and receiving applications (subscribers) communicate through an asynchronous messaging paradigm
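
The pub/sub model above can be sketched as a small in-process toy. The `Broker` class and its `subscribe`, `publish`, and `deliver_pending` methods are hypothetical names for illustration, not the API of YMB or any real broker: a publisher only enqueues a message and returns, and subscribers receive it later when the broker drains its queue, so neither side ever calls the other directly.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, List, Tuple

Handler = Callable[[str], None]


class Broker:
    """Hypothetical toy broker: publishers and subscribers only know topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, str]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        # Asynchronous from the publisher's point of view: the message is
        # only queued here and the call returns without contacting subscribers.
        self._queue.append((topic, message))

    def deliver_pending(self) -> int:
        """Drain the queue in publish order; return the number of deliveries made."""
        deliveries = 0
        while self._queue:
            topic, message = self._queue.popleft()
            for handler in self._subscribers.get(topic, []):
                handler(message)
                deliveries += 1
        return deliveries


if __name__ == "__main__":
    broker = Broker()
    inbox: List[str] = []
    broker.subscribe("orders", inbox.append)
    broker.publish("orders", "order-1")
    print(inbox)  # [] : published but not yet delivered
    broker.deliver_pending()
    print(inbox)  # ['order-1']
```

Delivery is decoupled from publishing here by an explicit `deliver_pending` call; a real broker would do this on its own threads or servers.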

Requirements

PNUTS is designed to meet the following requirements:

Scalability

Response Time and Geographic Scope

High Availability & Fault Tolerance

Relaxed Consistency Guarantees

PNUTS Overview & Functionality

Data Model & Features

Fault Tolerance

Pub-Sub Message System

Record-level Mastering

Hosting

PNUTS Overview & Functionality (Cont’d)

Functionality

Data & Query Model

Consistency Model

Data & Query Model

Simplified relational data model

Organizes data into tables of records with attributes

Allows arbitrary structures inside a record via a “blob” data type

Schemas are flexible

New attributes can be added without halting query or update activity

Records are not required to have a value for every attribute

Query language

Supports selection and projection from a single table

Updates and deletes must specify the primary key
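
The data and query model above can be illustrated with a tiny single-table store. This is a hypothetical sketch (the `Table` class and its methods are illustrative names, not the PNUTS API): records are plain dictionaries with no fixed schema, queries are limited to selection and projection over one table, and updates and deletes are addressed only by primary key.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


class Table:
    """Hypothetical single table keyed by one primary-key attribute, with no fixed schema."""

    def __init__(self, key_attr: str) -> None:
        self.key_attr = key_attr
        self._rows: Dict[Any, Record] = {}

    def insert(self, record: Record) -> None:
        # Any set of attributes is accepted, so adding a new attribute needs no migration.
        self._rows[record[self.key_attr]] = dict(record)

    def select(self, where: Callable[[Record], bool], project: Iterable[str]) -> List[Record]:
        """Selection (row filter) plus projection (column subset) over this one table."""
        columns = list(project)
        return [
            {col: row.get(col) for col in columns}  # absent attributes read as None
            for row in self._rows.values()
            if where(row)
        ]

    def update(self, key: Any, changes: Record) -> bool:
        # Updates are addressed by primary key only.
        row = self._rows.get(key)
        if row is None:
            return False
        row.update(changes)
        return True

    def delete(self, key: Any) -> bool:
        # Deletes are addressed by primary key only.
        return self._rows.pop(key, None) is not None
```

For example, after inserting `{"user_id": "bob"}` a later `update("bob", {"age": 30})` adds a brand-new attribute to that one record without touching any other record.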

Consistency Model

Hides the complexity of replication

Lies between general serializability and eventual consistency

Per-record timeline consistency: “All replicas of a given record apply all updates to the record in the same order”
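
Per-record timeline consistency can be sketched as follows, under the assumption that a single master numbers each record's updates consecutively. The `Master` and `Replica` classes are hypothetical: a replica that receives updates out of order (or twice) buffers or ignores them so that it still applies versions 1, 2, 3, ... of each record in exactly that order.

```python
from typing import Any, Dict, List, Tuple

Update = Tuple[str, int, Any]  # (record key, version, new value)


class Master:
    """Hypothetical master: gives each update the next version in its record's timeline."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def stamp(self, key: str, value: Any) -> Update:
        version = self._latest.get(key, 0) + 1
        self._latest[key] = version
        return (key, version, value)


class Replica:
    """Applies each record's updates strictly in version order, whatever the arrival order."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, Any]] = {}
        self.applied: Dict[str, List[int]] = {}
        self._early: Dict[str, Dict[int, Any]] = {}

    def receive(self, update: Update) -> None:
        key, version, value = update
        current = self.data.get(key, (0, None))[0]
        if version <= current:
            return  # duplicate of an update already applied
        pending = self._early.setdefault(key, {})
        pending[version] = value
        # Apply as many consecutive versions as are now available.
        while current + 1 in pending:
            current += 1
            self.data[key] = (current, pending.pop(current))
            self.applied.setdefault(key, []).append(current)
```

Versions are tracked per record, so different records advance independently; nothing here orders updates across records, which is what separates this model from full serializability.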

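Consistency Model (Cont’d)

Supports a range of API calls with different levels of consistency:

Read-any: returns a possibly stale version of the record

Read-critical(required_version): returns a version of the record at least as new as required_version

Read-latest: returns the latest version of the record

Write: updates the record

Test-and-set-write(required_version): performs the write only if the record's current version equals required_version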
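
The five calls above can be modeled on a single record that has one authoritative master copy and one local replica that may lag behind. This is a hypothetical sketch: `RecordCopies`, `sync`, and the snake_case method names are illustrative stand-ins, not the real PNUTS client API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Versioned:
    version: int
    value: Any


class RecordCopies:
    """Hypothetical model of one record: the master copy plus one possibly stale local replica."""

    def __init__(self, value: Any) -> None:
        self.master = Versioned(1, value)
        self.local = self.master

    def sync(self) -> None:
        # Stand-in for asynchronous replication catching the local replica up.
        self.local = self.master

    def read_any(self) -> Versioned:
        return self.local  # cheapest read; may be stale

    def read_critical(self, required_version: int) -> Versioned:
        # Use the local replica if it is new enough, otherwise go to the master.
        return self.local if self.local.version >= required_version else self.master

    def read_latest(self) -> Versioned:
        return self.master  # always reflects every completed write

    def write(self, value: Any) -> int:
        self.master = Versioned(self.master.version + 1, value)
        return self.master.version

    def test_and_set_write(self, required_version: int, value: Any) -> Optional[int]:
        # Write only if no one has updated the record since required_version.
        if self.master.version != required_version:
            return None
        return self.write(value)
```

A typical read-modify-write loop reads the record, computes a new value, and calls `test_and_set_write` with the version it read; a `None` result means a concurrent writer got there first and the loop retries.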

System Arch. & App.

Data tables are horizontally partitioned into groups of records called tablets

System Arch. & App. (Cont’d)

Data Storage & Retrieval

Replication & Consistency

PNUTS Applications

Data Storage & Retrieval

Ordered table

The primary-key space of a table is divided into intervals

Each interval corresponds to one tablet

The router stores the interval mapping

For a given primary key (PMK), binary search over the intervals is used to find the tablet
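
The ordered-table lookup above amounts to a binary search over sorted interval boundaries. In this sketch the `OrderedRouter` class and the representation of the mapping as a sorted list of split keys are assumptions for illustration; Python's standard `bisect` module performs the binary search.

```python
import bisect
from typing import List


class OrderedRouter:
    """Hypothetical router for an ordered table.

    split_keys[i] is the first primary key of tablet i + 1, so tablet 0 holds
    every key below split_keys[0] and the last tablet holds every key from
    split_keys[-1] upward.
    """

    def __init__(self, split_keys: List[str], tablets: List[str]) -> None:
        if split_keys != sorted(split_keys) or len(tablets) != len(split_keys) + 1:
            raise ValueError("split keys must be sorted, with one more tablet than split keys")
        self.split_keys = split_keys
        self.tablets = tablets

    def tablet_for(self, key: str) -> str:
        # bisect_right counts the split keys <= key: that count is the tablet's index.
        return self.tablets[bisect.bisect_right(self.split_keys, key)]


if __name__ == "__main__":
    router = OrderedRouter(["grape", "melon"], ["tablet-0", "tablet-1", "tablet-2"])
    print(router.tablet_for("apple"), router.tablet_for("grape"), router.tablet_for("zucchini"))
    # tablet-0 tablet-1 tablet-2
```

Because the keys stay sorted across tablets, a range scan only needs to visit the tablets whose intervals overlap the requested range.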

Data Storage & Retrieval (Cont’d)

Hash-organized table

An n-bit hash function H(), with 0 ≤ H() < 2^n

The hash space [0, 2^n) is divided into intervals

Each interval corresponds to a single tablet

To map a key to a tablet:

1. Hash the key

2. Search the set of intervals using binary search
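
The two-step mapping above can be sketched as follows. The choice of hash (the top n bits of SHA-1), the default n = 16, and evenly sized intervals are all assumptions made for this example; the slides only require some n-bit function with 0 ≤ H() < 2^n. `hash_n_bits` and `HashRouter` are hypothetical names.

```python
import bisect
import hashlib
from typing import List


def hash_n_bits(key: str, n_bits: int) -> int:
    """An n-bit hash H with 0 <= H(key) < 2**n_bits (top bits of SHA-1; an assumed choice)."""
    digest = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
    return digest >> (160 - n_bits)  # SHA-1 digests are 160 bits long


class HashRouter:
    """Hypothetical router for a hash-organized table with evenly sized hash intervals."""

    def __init__(self, tablets: List[str], n_bits: int = 16) -> None:
        if not 1 <= n_bits <= 160:
            raise ValueError("n_bits must be between 1 and 160")
        self.tablets = tablets
        self.n_bits = n_bits
        space = 2 ** n_bits
        # Interval starts for tablets 1..k-1; tablet 0 starts at 0.
        self.boundaries = [i * space // len(tablets) for i in range(1, len(tablets))]

    def tablet_for(self, key: str) -> str:
        h = hash_n_bits(key, self.n_bits)                             # 1. hash the key
        return self.tablets[bisect.bisect_right(self.boundaries, h)]  # 2. binary search
```

Hashing spreads keys evenly across tablets regardless of key order, at the cost of losing efficient range scans.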

Replication & Consistency

No traditional redo log: YMB serves as the redo log

The system uses asynchronous replication to ensure low-latency updates

Yahoo! Message Broker (YMB)

Used for replication and logging because:

1. Multiple steps are applied before an update is committed to the database

2. YMB is designed for wide-area replication
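
The asynchronous replication above can be sketched with an append-only log standing in for YMB. `MessageLog`, `Region`, and `write` are hypothetical names: a write is treated as committed as soon as it is appended to the log, and each region replays the log at its own pace, so the writer never waits on a remote region.

```python
from typing import Any, Dict, List, Tuple


class MessageLog:
    """Hypothetical stand-in for the message broker: an append-only log that doubles as the redo log."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Any]] = []

    def publish(self, key: str, value: Any) -> int:
        self.entries.append((key, value))
        return len(self.entries) - 1  # position of the update in the log


class Region:
    """One replica region that applies logged updates whenever it gets to them."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}
        self.applied = 0  # how many log entries this region has applied

    def catch_up(self, log: MessageLog) -> int:
        new_entries = log.entries[self.applied:]
        for key, value in new_entries:  # replay in log order
            self.store[key] = value
        self.applied += len(new_entries)
        return len(new_entries)


def write(log: MessageLog, key: str, value: Any) -> int:
    # Treated as committed once logged: the caller does not wait for any
    # region to apply it, which is what keeps write latency low.
    return log.publish(key, value)
```

Because every region replays the same log in the same order, a region that falls behind (or restarts) converges to the same state simply by continuing from its last applied position.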

Replication & Consistency (Cont’d)

Consistency via YMB & mastership

Per-record timeline consistency

One copy of each record is designated as the master

All updates are directed to the master copy

This is called record-level mastering

Mastership is assigned on a record-by-record basis

Records in the same table can have their masters in different clusters

All updates are propagated to non-master replicas by publishing them to YMB, which delivers them in commit order
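
Record-level mastering can be sketched as below. `RecordMastering` is a hypothetical class, and two simplifications are assumptions of this sketch, not PNUTS behavior: the region that inserts a record becomes its master, and propagation to the other regions happens immediately instead of asynchronously through YMB. The point shown is that two records in the same table can be mastered in different regions, and a write issued in a non-master region is forwarded to the record's master, which orders it.

```python
from typing import Any, Dict, List, Tuple


class RecordMastering:
    """Hypothetical sketch: each record has its own master region; replicas mirror it."""

    def __init__(self, regions: List[str]) -> None:
        self.regions = regions
        self.master_of: Dict[str, str] = {}  # record key -> region holding its master copy
        self.copies: Dict[str, Dict[str, Tuple[int, Any]]] = {r: {} for r in regions}
        self.forwarded = 0  # writes that had to be sent to a remote master

    def insert(self, key: str, value: Any, region: str) -> None:
        # Assumed policy for this sketch: the inserting region becomes the record's master.
        self.master_of[key] = region
        self._propagate(key, (1, value))

    def write(self, key: str, value: Any, from_region: str) -> str:
        master = self.master_of[key]
        if master != from_region:
            self.forwarded += 1  # non-master replicas forward the update to the master
        version = self.copies[master][key][0] + 1  # the master orders the update
        self._propagate(key, (version, value))
        return master

    def _propagate(self, key: str, versioned: Tuple[int, Any]) -> None:
        # Simplification: delivered to every region immediately, in commit order.
        # In PNUTS this step goes through YMB asynchronously.
        for region in self.regions:
            self.copies[region][key] = versioned
```

Placing each record's master near where most of its writes originate keeps most writes local; only writes from other regions pay the cross-region forwarding cost counted by `forwarded`.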

Replication & Consistency (Cont’d)

Recovery from failure (3 steps)

1. The tablet controller requests a copy from the source tablet (a remote replica)

2. A “checkpoint message” is published to YMB so that updates in flight when the copy starts are applied to the source tablet

3. The source tablet is copied to the destination region
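
The three recovery steps above can be sketched as a single function. `Tablet`, `apply_until_checkpoint`, and `recover` are hypothetical names, and the sketch simplifies step 2 by assuming every update still queued at the source was published before the checkpoint, so all of them are applied before the copy is taken.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class Tablet:
    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}
        # Updates already published to the broker but not yet applied here.
        self.in_flight: Deque[Tuple[str, Any]] = deque()

    def apply_until_checkpoint(self) -> None:
        # Checkpoint handling: apply every update that was in flight before it.
        while self.in_flight:
            key, value = self.in_flight.popleft()
            self.rows[key] = value


def recover(source: Tablet) -> Tablet:
    """Hypothetical: rebuild a lost tablet from a remote replica (the source tablet)."""
    # Step 1: the tablet controller has requested a copy from `source` (this call).
    # Step 2: a checkpoint is published; the source applies in-flight updates first.
    source.apply_until_checkpoint()
    # Step 3: the source tablet is copied to the destination region.
    destination = Tablet()
    destination.rows = dict(source.rows)
    return destination
```

Without step 2, the copy could miss an update that was committed before recovery began but had not yet reached the source tablet.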

PNUTS Applications

User Database

Social Applications

Content Meta-Data

Listings Management

Session Data

Experimental Results

PNUTS cluster spanning 3 regions

2 on the west coast and 1 on the east coast

Storage engine for hash tables: Yahoo!'s proprietary disk-based hash table

Storage engine for ordered tables: MySQL with InnoDB

Written primarily in C++, with some components written in PHP and Perl

Experimental Results (Cont’d)

Experimental parameters:

The following experiments show the impact of several factors on the average request latency

Varying Load

Varying Read/Write Ratio

Varying Skew

Varying Number of Storage Units

Varying Size of Range Scan

Comparison to Competitors

Google BigTable

Geographic replication

Secondary indexes

Materialized views

Create multiple tables

Hash-organized tables

Comparison to Competitors

Amazon Dynamo

Eventual consistency is too weak

No support for ordered tables

Comparison to Competitors

Sharding

No automated data migration

No shard splitting

Comparison to Competitors

DFS

Hard to scale

Less rich database functionality

Conclusion

PNUTS is a massively parallel, geographically distributed database system

Designed by Yahoo! for use by its web applications

Yahoo!'s Hosted Data Serving Platform

The PNUTS architecture is based on a record-level consistency model

Delivers data management as a hosted service

Future Work

Improving query functionality

Enforcing constraints such as referential integrity

Complex ad hoc queries such as joins and group-by

Query optimization techniques

Providing better techniques than simple incremental scanning

Adding more API calls to the consistency model:

Bundled updates

Relaxed consistency

Thank You