consistent hashing4.a. lakshman and p. malik, “cassandra: a decentralized structured storage...

20
Consistent Hashing : Dynamic Load distribution and Load balancing BY: MONIKA SHAH DATE: 23-4-2016

Upload: others

Post on 22-Sep-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Consistent Hashing : Dynamic Load distribution and Load balancing

BY: MONIKA SHAH

DATE: 23-4-2016

Page 2: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

OutlineMotivation

Key Features

Research Applications

Introduction

Background

Challenges

Application

Page 3: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Motivation case study: Web cachingKnown fact: caching for improving efficiency of data delivery over internet

Single cache – problems:◦ Cache size limit

◦ Single Point Of Failure

◦ Swamped by user requests

Solution : Hashing based Distributed Web Cache servers◦ Cache server s = Hash(URL)

◦ User get content from cache server S

Problems:◦ What if node joins/ fails ?

◦ What if different view?

Solution : Consistent hashing

Items distributed among caches

Users get itemsfrom caches

Server

Page 4: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Key features ◦ Dynamic Load distribution

◦ Better Scalability

◦ Efficient Resource utilization

◦ Efficient Data retrieval

◦ Improving Performance

◦ Parition Tolerance

Page 5: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Research ApplicationsDesign for Bigdata storage and efficient retrieval using NoSQL Database over cloud E.g.. Cassendra [2], Amazon’s Dynamo[3]

Cloud storage mgt.: Data placement strategy in Cloud computing[7]

Virtual Partition Consistent Hashing (VPCH) : Balancing reducer load in map-reduce [8]

UbiCrawler: One of best Distributed Web Crawler [5]

Distributed Real – time storage : Fast, timely storage for IoT applications[6] E.g. Transport monitoring for traffic management system

Page 6: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Other Applications:Task: distribute items into buckets

◦ Data to memory locations

◦ Files to disks

◦ Tasks to processors

◦ Web pages to caches (our motivation)

Goal: balanced distribution

Page 7: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Background: Basic Hashing

to distribute objects among cache servers

Goal of Hashing in Distributed System :◦ Balanced distribution

◦ Smoothness: Little impact on buckets on

◦ Adding a bucket ( for scalability)

◦ Remove a bucket (Node Failure)

◦ Light Load : on each Node

◦ Spread : small set of object that may hold an object

Simple Hash Function Location = Hash(Key) % # Nodes

Node 3Node 2

Node 1Node 0

A B

C D

E F

G H

I J

K L

M N

O P

Page 8: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Hashing problem : Add/Remove nodeNew Node to Scale up performance

Redistribution Problem :%Data Moved 100%

New Node 4

Node 3Node 2

Node 1Node 0

A B

C D

F G

H

K

J

LP

O

M

I

N

E

Hashing : Distribution of object to nodes

Node 3Node 2

Node 1Node 0

A B

C D

E F

G H

I J

K L

M N

O P

Page 9: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Hashing problem : Add/Remove nodeRemove Node on Failure

Redistribution Problem : %Data Moved 100%

Node 3Node 2

Node 0

A

B C

D

F

G

H

K

J

L

P

O

M

I

N

E

Hashing : Distribution of object to nodes

Node 3Node 2

Node 1Node 0

A B

C D

E F

G H

I J

K L

M N

O P

Page 10: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Consistent HashingIntroduced by Karger et al.[1]

Consistent (same) hashing function for Cache (node) and data objects

Hash server : i.e. Hash(IP) Hash objects: i.e. Hash(o) % servers

Page 11: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

I

F

E

Redistribution on Joining of new node/Back up of failed node

Redistribution = % Data Moved = 100 * 1 / N i.e. 1/5 * (16) = 3

0

A

E

I

M

B

F

J

N C

G

K

O

D

H

L

P

Add Node

0

A

E

I

M

B

F

J

N C

G

K

O

D

H

L

P

0Node

1

Node 2Node

3

Node 4D

Page 12: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Consistent Hashing:Redistribution on a server down

Server 2 Fail

Page 13: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenge : Balanced Distributionfor efficient resource utilization and performance

Solution : Use Virtual Node for better balancing

A

B

C

DE

F

A

B

C

D

E

F

Page 14: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenge : Fault / Partition tolerance

Solution : Use Virtual Node for replica of object to nearer nodes

Page 15: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenge in designing Query model : Data access by variety of predicates

Solution : Use Key mapping for predicate attributes[4]

SELECT * FROM CUSTOMER WHERE zipcode=12345

Page 16: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenge : Consistency Cassandra solution: Tuneable consistency [3]

0

Node 1

Node 2

Node 3Node 4

Node 5

Node 6

replication_factor = 3

R1

R2

R3Client

INSERT INTO table (column1, …) VALUES (value1, …) USING CONSISTENCY ONE

Page 17: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenge : Consistency Cassandra solution: Tuneable consistency [3]

0

Node 1

Node 2

Node 3Node 4

Node 5

Node 6

replication_factor = 3

R1

R2

R3Client

INSERT INTO table (column1, …) VALUES (value1, …) USING CONSISTENCY QUORUM

Page 18: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Challenges :Deciding Hash function parameters (ie. URL, IP, website size, structure of site[5])

Parameters for Virtual node configuration for heterogeneous nodes (Power, memory capacity, disk capacity)

Configuring number of virtual nodes as per Energy Efficiency algorithm[9]

Page 19: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

References

1.D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, R. Panigrahy, "Consistent Hashing and Random Trees: Distributed Caching Protocols for RelievingHot Spots on the World Wide Web", In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, El Paso Texas, 1997, pp. 654-663.

2.G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s HighlyAvailable Key-Value Store,” in Symposium on Operating System Principles, vol. 7, 2007, pp. 205–220.

3.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,” ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40,2010.

4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,” ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40,2010.

5.Mitra Nasri1, Mohsen Sharifi2, “Load Balancing using Consistent Hashing: a Real Challenge for Large Scale Distributed Web Crawlers”, In Proceeding of theInternational Conference on Advanced Information Networking and Applications Workshops, 2009, IEEE DOI 10.1109pp. 715-720

6.Yanhua Sun, Jun Fang, Yanbo Han, “A Distributed Real-time Storage Method for Stream Data”, In Proceeding of 10th Web Information System andApplication Conference, 2013, IEEE DOI 10.1109, pp. 314-317

7.Qiang Li, Kun Wang, Nigel Linge, “A data placement strategy based on clustering and consistent hashing algorithm in Cloud Computing”, In proceeding of9th International Conference on Communications and Networking in China (CHINACOM), 2015, IEEE DOI 10.1109, pp. 69-72

8.Qi Liu*, Weidong Cai, Jian Shen, Baowei Wang, Nigel Linge, “VPCH: A Consistent Hashing Algorithm for Better Load Balancing in a Hadoop Environment”, Inproceeding of Third International Conference on Advanced Cloud and Big Data, 2015, IEEE DOI 10.1109, pp. 69-82

9.Frezewd Lemma, Thomas Knauth, Christof Fetzer, “PowerCass: Energy Efficient, Consistent Hashing Based Storage For Micro Clouds Based Infrastructure”,In proceeding of IEEE International Conference on Cloud Computing,, 2014, IEEE DOI 10.1109, pp. 48-54

Page 20: Consistent Hashing4.A. Lakshman and P. Malik, “Cassandra: A Decentralized Structured Storage System,”ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010. 5.Mitra

Thank You