cloud computing cloud data serving systems keke chen

37
Cloud Computing Cloud Data Serving Systems Keke Chen

Upload: hilary-turner

Post on 30-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cloud Computing Cloud Data Serving Systems Keke Chen

Cloud Computing

Cloud Data Serving Systems

Keke Chen

Page 2: Cloud Computing Cloud Data Serving Systems Keke Chen

Outline Frequent operations: append, squential

read. Great scalability GFS, HDFS

Random Read/Write. Great scalability BigTable (Hbase) DynamoDB Cassandra PNUTS

Page 3: Cloud Computing Cloud Data Serving Systems Keke Chen

BigTable Structured data

May generate from MapReduce processing

Support random R/W

Page 4: Cloud Computing Cloud Data Serving Systems Keke Chen

Data model Objects are arrays of bytes Objects are indexed by 3 dimensions

Row Column and column family Timestamp (versions)

<picture>

“a big sparse table”

Page 5: Cloud Computing Cloud Data Serving Systems Keke Chen

Data model Row

Row key: strings Row range (groups of rows) – tablet (100-200M)

Column Grouped into “column families”

Column key “Family: qualifier” Data physically organized by column-family – a

embedded substructure of a record Access controls are on column-family Example:

“anchor” family, qualifier is the anchor link, and object is the anchortext

Timestamps Representing versions of the same object

Page 6: Cloud Computing Cloud Data Serving Systems Keke Chen

Operations API

Schema Create, delete tables/column families Access control

Data Write, delete, lookup values Iterate a subset

Single-row transaction BigTable data can be input/output of

mapreduce

Page 7: Cloud Computing Cloud Data Serving Systems Keke Chen

Physical organization Built on top of GFS A tablet contains multiple SSTables SSTable: a block file with sparse index

Data (key, value) are stored in blocks Block index

sorted keys, Keys are partitioned to ranges First Key of the range ->block(s)

Data Possibly organized in a nested way

Use Chubby distributed lock service

Page 8: Cloud Computing Cloud Data Serving Systems Keke Chen

Basic technical points Use index to speed up random R/W

Tablet servers maintain the indices

Use locking service to solve the R/W conflicts guarantees consistency

Page 9: Cloud Computing Cloud Data Serving Systems Keke Chen

How to get tablet location? Through Chubby service

Metadata 128M, 1K per item 2^17 entriesTwo levels 2^34 entries(tablets)

Chubby clientCaches locations

Page 10: Cloud Computing Cloud Data Serving Systems Keke Chen

How it works Chubby service handles tablet locking Master maintains consistency

Start up by reading Metadata from Chubby Check lock status

insert/delete/split/merge tablets Tablet operations

Each tablet contains multiple SSTables (info stored in Metadata)

Use log to maintain consistency

Page 11: Cloud Computing Cloud Data Serving Systems Keke Chen

HBASE Open source version to BigTable Built on top of HDFS

Page 12: Cloud Computing Cloud Data Serving Systems Keke Chen

Dynamo (DynamoDB) Developed by Amazon Fast R/W speed, satisfying strong service

level agreement Using a fully distributed architecture Store only key-value pairs

Page 13: Cloud Computing Cloud Data Serving Systems Keke Chen

Service Level Agreements (SLA)

Application can deliver its functionality in abounded time: Every dependency in the platform needs to deliver its functionality with even tighter bounds.

Example: service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second.

Service-oriented architecture of Amazon’s platform

Page 14: Cloud Computing Cloud Data Serving Systems Keke Chen

Design Consideration Sacrifice strong consistency for

availability Conflict resolution is executed during

read instead of write, i.e. “always writeable”.

Other principles: Incremental scalability. Symmetry. Decentralization. Heterogeneity.

Page 15: Cloud Computing Cloud Data Serving Systems Keke Chen

Partition Algorithm: distributed hash table

Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.

”Virtual Nodes”: Each node can be responsible for more than one virtual node.

Page 16: Cloud Computing Cloud Data Serving Systems Keke Chen

Advantages of using virtual nodes

If a node becomes unavailable the load handled by this node is evenly dispersed across the remaining available nodes.

When a node becomes available again, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.

The number of virtual nodes that a node is responsible can decided based on its capacity, accounting for heterogeneity in the physical infrastructure.

Page 17: Cloud Computing Cloud Data Serving Systems Keke Chen

Replication

Each data item is replicated at N hosts.

“preference list”: The list of nodes that is responsible for storing a particular key.

Page 18: Cloud Computing Cloud Data Serving Systems Keke Chen

Scalability Easy to scale up/down

Membership join/leaving Redundant data stores

Page 19: Cloud Computing Cloud Data Serving Systems Keke Chen

Evaluation

Page 20: Cloud Computing Cloud Data Serving Systems Keke Chen

Cassandra Use BigTable’s data model

“Rows” and “column family”

Use Dynamo’s architecture

Impressive measures MySQL > 50 GB Data

Writes Average : ~300 msReads Average : ~350 ms

Cassandra > 50 GB DataWrites Average : 0.12 msReads Average : 15 ms

Page 21: Cloud Computing Cloud Data Serving Systems Keke Chen

Table is a multi dimensional map indexed by key (row key).

Columns are grouped into Column Families.

2 Types of Column Families Simple Super (nested Column Families)

Each Column has Name Value Timestamp

Data Model

Page 22: Cloud Computing Cloud Data Serving Systems Keke Chen

* Figure taken from Eben Hewitt’s (author of Oreilly’s Cassandra book) slides.

Data Model

Page 23: Cloud Computing Cloud Data Serving Systems Keke Chen

PartitioningHow data is partitioned across nodes

ReplicationHow data is duplicated across nodes

Cluster MembershipHow nodes are added, deleted to the cluster

Architecture

Page 24: Cloud Computing Cloud Data Serving Systems Keke Chen

Nodes are logically structured in Ring Topology (DHT).

Hashed value of key associated with data partition is used to assign it to a node in the ring.

Hashing rounds off after certain value to support ring structure.

Lightly loaded nodes moves position to alleviate highly loaded nodes.

Partitioning

Page 25: Cloud Computing Cloud Data Serving Systems Keke Chen

Each data item is replicated at N (replication factor) nodes.

Different Replication Policies Rack Unaware – replicate data at N-1 successive

nodes after its coordinator Rack Aware – uses ‘Zookeeper’ to choose a leader

which tells nodes the range they are replicas for Datacenter Aware – similar to Rack Aware but

leader is chosen at Datacenter level instead of Rack level.

Replication

Page 26: Cloud Computing Cloud Data Serving Systems Keke Chen

Network Communication protocols inspired for real life rumour spreading.

Periodic, Pairwise, inter-node communication. Low frequency communication ensures low

cost. Random selection of peers. Example – Node A wish to search for pattern

in data Round 1 – Node A searches locally and then gossips

with node B. Round 2 – Node A,B gossips with C and D. Round 3 – Nodes A,B,C and D gossips with 4 other

nodes …… Round by round doubling makes protocol very

robust.

Gossip protocol

Page 27: Cloud Computing Cloud Data Serving Systems Keke Chen

Variety of Gossip Protocols exists

Dissemination protocol Event Dissemination: multicasts events via gossip. high

latency might cause network strain. Background data dissemination: continuous gossip

about information regarding participating nodes

Anti Entropy protocol Used to repair replicated data by comparing and

reconciling differences. This type of protocol is used in Cassandra to repair data in replications.

Gossip protocol

Page 28: Cloud Computing Cloud Data Serving Systems Keke Chen

Uses Scuttleback (a Gossip protocol) to manage nodes.

Uses gossip for node membership and to transmit system control state.

Node Fail state is given by variable ‘phi’ which tells how likely a node might fail (suspicion level) instead of simple binary value (up/down).

This type of system is known as Accrual Failure Detector.

Cluster management

Page 29: Cloud Computing Cloud Data Serving Systems Keke Chen

Two ways to add new node New node gets assigned a random token which

gives its position in the ring. It gossips its location to rest of the ring

New node reads its config file to contact it initial contact points.

New nodes are added manually by administrator via CLI or Web interface provided by Cassandra.

Scaling in Cassandra is designed to be easy.

Lightly loaded nodes can move in the ring to alleviate heavily loaded nodes.

Bootstrapping and Scaling

Page 30: Cloud Computing Cloud Data Serving Systems Keke Chen

Relies on local file system for data persistency.

Write operations happens in 2 steps Write to commit log in local disk of the node Update in-memory data structure. Why 2 steps or any preference to order or

execution?

Read operation Looks up in-memory ds first before looking up

files on disk. Uses Bloom Filter (summarization of keys in file

store in memory) to avoid looking up files that do not contain the key.

Local persistent

Page 31: Cloud Computing Cloud Data Serving Systems Keke Chen

Query

Closest replica

Cassandra Cluster

Replica A

Result

Replica B Replica C

Digest QueryDigest Response Digest Response

Result

Client

Read repair if digests differRead repair if digests differ

* Figure taken from Avinash Lakshman and Prashant Malik (authors of the paper) slides.

Read operation

Page 32: Cloud Computing Cloud Data Serving Systems Keke Chen

Cassandra developed to address this problem.

50+TB of user messages data in 150 node cluster on which Cassandra is tested.

Search user index of all messages in 2 ways. Term search : search by a key word Interactions search : search by a user id

Latency Stat

Search Interactions

Term Search

Min 7.69 ms 7.78 ms

Median 15.69 ms 18.27 ms

Max 26.13 ms 44.41 ms

Facebook inbox search

Page 33: Cloud Computing Cloud Data Serving Systems Keke Chen

Comparison on cloud data serving systems Paper “Benchmarking Cloud Serving

Systems with YCSB”

Page 34: Cloud Computing Cloud Data Serving Systems Keke Chen

Update heavy workload

Page 35: Cloud Computing Cloud Data Serving Systems Keke Chen

Read heavy

Page 36: Cloud Computing Cloud Data Serving Systems Keke Chen

Short scan

Page 37: Cloud Computing Cloud Data Serving Systems Keke Chen

Cluster size