ZooKeeper: Wait-free coordination for Internet-scale systems. Patrick Hunt and Mahadev Konar (Yahoo! Grid), Flavio Junqueira and Benjamin Reed (Yahoo! Research)

Page 1: ZooKeeper - USENIX · 2019. 2. 25. · ZooKeeper Service Server Server Server Server Server •All servers have a copy of the state in memory •A leader is elected at startup •Followers

ZooKeeper

Wait-free coordination for Internet-scale systems

Patrick Hunt and Mahadev Konar (Yahoo! Grid)

Flavio Junqueira and Benjamin Reed (Yahoo! Research)


Internet-scale Challenges

• Lots of servers, users, data

• FLP, CAP

• Mere mortal programmers


Classic Distributed System

[Diagram: one Master above six Slaves]


Fault Tolerant Distributed System

[Diagram: a Master, six Slaves, a Coordination Service, and a second Master]


Fully Distributed System

[Diagram: six Workers and a Coordination Service]


What is coordination?

• Group membership

• Leader election

• Dynamic Configuration

• Status monitoring

• Queuing

• Barriers

• Critical sections


Goals

• Been done in the past

– ISIS, distributed locks (Chubby, VMS)

• High Performance

– Multiple outstanding ops

– Read dominant

• General (Coordination Kernel)

• Reliable

• Easy to use


Wait-free

• Pros

– Slow processes cannot slow down fast ones

– No deadlocks

– No blocking in the implementations

• Cons

– Some coordination primitives are blocking

– Need to be able to efficiently wait for conditions


Serializability vs Linearizability

• Linearizable writes

• Serializable reads (may be stale)

• Client FIFO ordering


Change Events

• Clients request change notifications

• Service delivers notifications in a timely manner

• Notifications do not block write requests

• Clients get notification of a change before they see the result of a change


Solution

Order + wait-free + change events = coordination


ZooKeeper API

String create(path, data, acl, flags)

void delete(path, expectedVersion)

Stat setData(path, data, expectedVersion)

(data, Stat) getData(path, watch)

Stat exists(path, watch)

String[] getChildren(path, watch)

void sync()

Stat setACL(path, acl, expectedVersion)

(acl, Stat) getACL(path)
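The expectedVersion argument on setData and delete makes those writes conditional, which is what lets clients build read-modify-write loops without locks. The sketch below is a toy in-memory model written for illustration, not the ZooKeeper client: `ToyStore`, `BadVersion`, and `increment` are hypothetical names, and the store keeps only a (data, version) pair per path instead of a full Stat. It assumes that a version of -1 matches any version, as the Configuration slide's setData call suggests.

```python
class BadVersion(Exception):
    """Raised when expected_version does not match the stored version."""


class ToyStore:
    """In-memory stand-in for a znode store (hypothetical, not the real client)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = (data, 0)
        return path

    def get_data(self, path):
        return self._nodes[path]  # (data, version)

    def set_data(self, path, data, expected_version):
        _, version = self._nodes[path]
        # -1 means "any version"; otherwise the write is a compare-and-set.
        if expected_version != -1 and expected_version != version:
            raise BadVersion(path)
        self._nodes[path] = (data, version + 1)
        return version + 1


def increment(store, path):
    """Optimistic read-modify-write: retry until nobody wrote in between."""
    while True:
        data, version = store.get_data(path)
        try:
            return store.set_data(path, data + 1, version)
        except BadVersion:
            continue  # another writer won the race; re-read and retry


store = ToyStore()
store.create("/counter", 0)
increment(store, "/counter")
increment(store, "/counter")
```

A writer holding a stale version gets `BadVersion` instead of silently overwriting newer data, so no client ever has to block another.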


Data Model

• Hierarchical namespace (like a file system)

• Each znode has data and children

• Data is read and written in its entirety

[Diagram: znode tree rooted at / containing the znodes services, users, apps, locks, workers, YaView, s-1, worker1, and worker2]


Create Flags

• Ephemeral: znode is deleted when its creator fails or when it is explicitly deleted

• Sequence: append a monotonically increasing counter

[Diagram: znode tree rooted at / containing the znodes services, users, apps, locks, workers, YaView, s-1, worker1, and worker2, with two callouts: "Ephemerals created by Session X" and "Sequence appended on create"]
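The two flags can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the real service: `ToyZk`, its `session` argument, and the paths `/services/demo` and `/locks/x-` are hypothetical, the counter is kept per parent path, and the suffix is left unpadded to match the names shown on the slides.

```python
class ToyZk:
    """Toy model of the EPHEMERAL and SEQUENCE create flags (hypothetical)."""

    def __init__(self):
        self.nodes = {}     # path -> owning session for ephemerals, else None
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequence=False):
        if sequence:
            # Append a monotonically increasing counter kept per parent.
            parent = path.rsplit("/", 1)[0]
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear together with the session that made them.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


zk = ToyZk()
zk.create("/services/demo")  # regular znode, survives session loss
first = zk.create("/locks/x-", session="X", ephemeral=True, sequence=True)
second = zk.create("/locks/x-", session="X", ephemeral=True, sequence=True)
zk.close_session("X")        # simulates the creator failing
```

Combining both flags, as the lock slides do, yields uniquely named znodes that clean themselves up when their creator goes away.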


Configuration

• Workers get configuration

– getData(".../config/settings", true)

• Administrators change the configuration

– setData(".../config/settings", newConf, -1)

• Workers are notified of the change and get the new settings

– getData(".../config/settings", true)

[Diagram: znode config with child settings]
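The worker calls getData with the watch flag both the first time and after every notification, because a ZooKeeper watch fires once and must then be set again. A minimal sketch of that loop, using a toy in-memory store: `ToyZk`, `Worker`, and the full path `/app/config/settings` are hypothetical (the slide elides the path prefix), and the watch is passed as a callback instead of a boolean.

```python
PATH = "/app/config/settings"  # hypothetical; the slide elides the prefix


class ToyZk:
    """Toy store with one-shot data watches (not the real client)."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> callbacks, each fired at most once

    def set_data(self, path, value):
        self.data[path] = value
        # Remove the watches before firing so callbacks can re-register.
        for callback in self.watches.pop(path, []):
            callback(path)

    def get_data(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data[path]


class Worker:
    def __init__(self, zk, path):
        self.zk, self.path = zk, path
        self.settings = None
        self.reload()

    def reload(self, _path=None):
        # Read the settings and set the watch again in the same call.
        self.settings = self.zk.get_data(self.path, watch=self.reload)


zk = ToyZk()
zk.set_data(PATH, {"threads": 4})
worker = Worker(zk, PATH)
zk.set_data(PATH, {"threads": 8})  # administrator changes the configuration
```

Because the read and the new watch happen in one call, the worker cannot miss an update that lands between being notified and re-reading.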


Group Membership

• Register a worker in the group

– create(".../workers/workerName", hostInfo, EPHEMERAL)

• List group members

– getChildren(".../workers", true)

[Diagram: znode workers with children worker1 and worker2]
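Group membership falls out of ephemeral znodes plus a child watch: each member's znode vanishes with its session, and anyone watching the parent is told to list the children again. The sketch below models this in memory. `ToyZk`, `GroupView`, and the path `/app/workers` are hypothetical names chosen for illustration, and session loss is simulated by an explicit `close_session` call.

```python
class ToyZk:
    """Toy model of ephemeral znodes with one-shot child watches."""

    def __init__(self):
        self.owner = {}          # path -> session that created it
        self.child_watches = {}  # parent path -> callbacks, fired once

    def _fire(self, parent):
        for callback in self.child_watches.pop(parent, []):
            callback(parent)

    def create_ephemeral(self, path, session):
        self.owner[path] = session
        self._fire(path.rsplit("/", 1)[0])

    def close_session(self, session):
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
            self._fire(path.rsplit("/", 1)[0])

    def get_children(self, parent, watch=None):
        if watch is not None:
            self.child_watches.setdefault(parent, []).append(watch)
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self.owner if p.startswith(prefix))


class GroupView:
    """Keeps an up-to-date member list by re-listing on every notification."""

    def __init__(self, zk, parent):
        self.zk, self.parent = zk, parent
        self.members = []
        self.refresh()

    def refresh(self, _parent=None):
        self.members = self.zk.get_children(self.parent, watch=self.refresh)


GROUP = "/app/workers"  # hypothetical; the slide elides the prefix
zk = ToyZk()
view = GroupView(zk, GROUP)
zk.create_ephemeral(GROUP + "/worker1", session="s1")
zk.create_ephemeral(GROUP + "/worker2", session="s2")
zk.close_session("s1")  # worker1 fails; its znode disappears
```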


Leader Election

1. getData(".../workers/leader", true)

2. if successful, follow the leader described in the data and exit

3. create(".../workers/leader", hostname, EPHEMERAL)

4. if successful, lead and exit

5. goto step 1

[Diagram: znode workers with children worker1, worker2, and leader]

If a watch is triggered for ".../workers/leader", followers will restart the leader election process.
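The election steps can be traced with a toy in-memory model. This is a sketch, not the real client: `ToyZk`, `Candidate`, the exception names, and the path `/app/workers/leader` are hypothetical, each candidate uses its own name as its session id, and a leader failure is simulated by closing that session.

```python
LEADER = "/app/workers/leader"  # hypothetical; the slide elides the prefix


class NoNode(Exception):
    pass


class NodeExists(Exception):
    pass


class ToyZk:
    def __init__(self):
        self.nodes = {}    # path -> (data, owning session)
        self.watches = {}  # path -> callbacks, fired once

    def get_data(self, path, watch=None):
        if path not in self.nodes:
            raise NoNode(path)
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes[path][0]

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = (data, session)

    def close_session(self, session):
        gone = [p for p, (_, s) in self.nodes.items() if s == session]
        for path in gone:
            del self.nodes[path]
        for path in gone:
            for callback in self.watches.pop(path, []):
                callback(path)


class Candidate:
    def __init__(self, zk, name):
        self.zk, self.name, self.leader = zk, name, None
        self.elect()

    def elect(self, _path=None):
        while True:
            try:
                # Step 1: read the leader znode and leave a watch on it.
                self.leader = self.zk.get_data(LEADER, watch=self.elect)
                return  # Step 2: follow the leader named in the data.
            except NoNode:
                pass
            try:
                # Step 3: no leader yet, so try to become one.
                self.zk.create_ephemeral(LEADER, self.name, session=self.name)
                self.leader = self.name
                return  # Step 4: lead.
            except NodeExists:
                continue  # Step 5: lost the race, go back to step 1.


zk = ToyZk()
a = Candidate(zk, "worker1")
b = Candidate(zk, "worker2")
```

Closing the leader's session deletes the ephemeral leader znode, which fires the followers' watches and sends them back through the same loop.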


Locks

1) id = create(".../locks/x-", SEQUENCE|EPHEMERAL)

2) getChildren(".../locks/", false)

3) if id is the 1st child, exit (the lock is held)

4) exists(name of last child before id, true)

5) if it does not exist, goto 2)

6) wait for event

7) goto 2)

[Diagram: znode locks with children x-11, x-19, and x-20]

Each client watches only the znode immediately before its own. No herd effect.
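The lock steps can be followed in a toy in-memory model of a single lock directory. Everything here is an illustrative assumption: `ToyLockDir`, `Lock`, and `seq` are hypothetical names, waiting is modelled as a callback that re-runs the check, and this toy `exists` only leaves a watch when the znode is present.

```python
def seq(name):
    """Sequence number at the end of a name such as 'x-19'."""
    return int(name.rsplit("-", 1)[1])


class ToyLockDir:
    """Toy model of the children of one lock znode (not the real client)."""

    def __init__(self):
        self.children = {}  # name -> owner
        self.next_seq = 0
        self.watches = {}   # name -> callbacks, fired once on delete

    def create_seq_ephemeral(self, prefix, owner):
        name = f"{prefix}{self.next_seq}"
        self.next_seq += 1
        self.children[name] = owner
        return name

    def get_children(self):
        return list(self.children)

    def exists(self, name, watch):
        if name in self.children:
            self.watches.setdefault(name, []).append(watch)
            return True
        return False

    def delete(self, name):
        del self.children[name]
        for callback in self.watches.pop(name, []):
            callback(name)


class Lock:
    def __init__(self, zk, owner):
        self.zk, self.held = zk, False
        self.id = zk.create_seq_ephemeral("x-", owner)  # step 1
        self._check()

    def _check(self, _name=None):
        while True:
            ahead = [c for c in self.zk.get_children()
                     if seq(c) < seq(self.id)]          # step 2
            if not ahead:                               # step 3
                self.held = True
                return
            previous = max(ahead, key=seq)
            if self.zk.exists(previous, watch=self._check):  # step 4
                return  # step 6: wait; the watch re-runs from step 2
            # step 5: the predecessor vanished in between, so list again

    def release(self):
        self.zk.delete(self.id)
        self.held = False


zk = ToyLockDir()
a, b, c = Lock(zk, "A"), Lock(zk, "B"), Lock(zk, "C")
```

Releasing `a` wakes only `b`, since `c` watches `b`'s znode and not `a`'s; that is the "no herd effect" property.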


Shared Locks

1) id = create(".../locks/s-", SEQUENCE|EPHEMERAL)

2) getChildren(".../locks/", false)

3) if there are no children that start with x- before id, exit (the shared lock is held)

4) exists(name of the last x- before id, true)

5) if it does not exist, goto 2)

6) wait for event

7) goto 2)

[Diagram: znode locks with children s-11, x-19, s-20, x-20, s-21, and x-22]
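A shared (read) lock only has to wait for exclusive (x-) znodes ahead of it, so several readers can hold the lock together. The sketch below is a toy in-memory model under stated assumptions: `ToyLockDir`, `SharedLock`, and `seq` are hypothetical names, the writer rule (wait for any earlier child) is taken from the previous Locks slide, and ephemeral cleanup is left out for brevity.

```python
def seq(name):
    """Sequence number at the end of a name such as 's-21'."""
    return int(name.rsplit("-", 1)[1])


class ToyLockDir:
    """Toy model of the children of one lock znode (not the real client)."""

    def __init__(self):
        self.children = set()
        self.next_seq = 0
        self.watches = {}  # name -> callbacks, fired once on delete

    def create_seq(self, prefix):
        name = f"{prefix}{self.next_seq}"
        self.next_seq += 1
        self.children.add(name)
        return name

    def exists(self, name, watch):
        if name in self.children:
            self.watches.setdefault(name, []).append(watch)
            return True
        return False

    def delete(self, name):
        self.children.discard(name)
        for callback in self.watches.pop(name, []):
            callback(name)


class SharedLock:
    def __init__(self, zk, prefix):  # "s-" for readers, "x-" for writers
        self.zk, self.prefix, self.held = zk, prefix, False
        self.id = zk.create_seq(prefix)  # step 1
        self._check()

    def _blockers(self):
        earlier = [c for c in self.zk.children if seq(c) < seq(self.id)]
        if self.prefix == "s-":
            # Readers ignore other readers and wait only for writers.
            earlier = [c for c in earlier if c.startswith("x-")]
        return earlier

    def _check(self, _name=None):
        while True:
            blockers = self._blockers()  # steps 2 and 3
            if not blockers:
                self.held = True
                return
            if self.zk.exists(max(blockers, key=seq), watch=self._check):
                return  # steps 4 and 6: wait for the watch to fire
            # step 5: the blocker vanished in between, so check again

    def release(self):
        self.zk.delete(self.id)
        self.held = False


zk = ToyLockDir()
r1 = SharedLock(zk, "s-")  # s-0, held immediately
w = SharedLock(zk, "x-")   # x-1, waits for s-0
r2 = SharedLock(zk, "s-")  # s-2, waits for x-1
r3 = SharedLock(zk, "s-")  # s-3, waits for x-1
```

When the writer releases, both waiting readers are notified because both watch the same x- znode; that small group wake-up is intended, since they can all proceed.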


ZooKeeper Servers

[Diagram: the ZooKeeper Service as a group of servers]

• All servers have a copy of the state in memory

• A leader is elected at startup

• Followers serve clients; all updates go through the leader

• Update responses are sent when a majority of servers have persisted the change

We need 2f+1 machines to tolerate f failures
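The 2f+1 figure follows from the majority rule on the previous bullet: with 2f+1 servers a majority is f+1, so after f failures the f+1 survivors can still acknowledge an update. A short arithmetic sketch, with function names chosen here for illustration:

```python
def majority(servers):
    """Smallest number of servers that forms a majority."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """Failures after which a majority of the ensemble is still alive."""
    return (servers - 1) // 2


def ensemble_size(failures):
    """Servers needed to tolerate the given number of failures."""
    return 2 * failures + 1


# Ensemble size, majority, and tolerated failures for a few sizes.
table = [(n, majority(n), tolerated_failures(n)) for n in (3, 4, 5, 6, 7)]
```

The table shows why odd sizes are the usual choice: going from 3 to 4 servers, or from 5 to 6, raises the majority without tolerating any additional failure.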


ZooKeeper Servers

[Diagram: clients connected to the servers of the ZooKeeper Service, one of which is the Leader]


Current Performance


Summary

• Easy to use

• High Performance

• General

• Reliable

• Release 3.3 on Apache

– See http://hadoop.apache.org/zookeeper

– Committers from Yahoo! and Cloudera