The Chubby Lock Service for Loosely-coupled Distributed Systems
Mike Burrows, Google Inc.
Presented by Xin (Joyce) Zhan
Outline
• Design
  – System structure
  – Locks, caching, failovers
  – Scaling mechanisms
• Use and observations
  – As name service
  – Failover problems
Lock service for distributed systems
• Synchronize access to shared resources
• Other uses
  – Primary election, meta-data storage, name service
• Reliability, availability
System Structure
• Set of replicas
• Periodically elected master
  – master lease
  – Paxos protocol
• All client requests are directed to the master (see the sketch below)
  – updates propagated to replicas
• Replace failed replicas
  – master periodically polls DNS
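A minimal Go sketch of this request path; the server type and its fields are our invention, not Chubby's actual RPC interface:

package main

import "fmt"

// Clients send all requests to the master. A replica that receives
// a request only reports the master's location; the master applies
// writes through the Paxos-replicated log before replying.
type server struct {
	isMaster bool
	masterAt string
}

func (s *server) handle(op string) string {
	if !s.isMaster {
		return "not master; retry at " + s.masterAt
	}
	return "ok: " + op // in reality: replicate via Paxos, then reply
}

func main() {
	replica := &server{masterAt: "replica-1"}
	master := &server{isMaster: true}
	fmt.Println(replica.handle("write /ls/foo/x"))
	fmt.Println(master.handle("write /ls/foo/x"))
}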
Design
• Store small files
• Event notification mechanism
• Consistent caching
• Advisory locks (vs. mandatory)
  – conflict only when others attempt to acquire the same lock
• Coarse-grained locks
  – survive lock server failures
Design - File Interface
• Eases distribution
  – /ls/foo/wombat/pouch
• Node meta-data includes Access Control Lists
• Handle (see the sketch below)
  – analogous to UNIX file descriptors
  – supports use across master changes
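A toy Go sketch of the path-and-handle idea; Open, Handle, and their behavior are hypothetical illustrations, not the paper's client API:

package main

import "fmt"

// Handle plays the role of a UNIX file descriptor: opened by path,
// and usable across master fail-overs because the client library
// re-establishes it transparently. (Hypothetical API, ours.)
type Handle struct{ path string }

// Open resolves a path of the form /ls/<cell>/<dir>/<node>;
// "ls" is the lock-service prefix and "foo" below names the cell.
func Open(path string) (*Handle, error) {
	// A real library would contact the cell's master here.
	return &Handle{path: path}, nil
}

func main() {
	h, err := Open("/ls/foo/wombat/pouch")
	if err != nil {
		panic(err)
	}
	fmt.Println("opened:", h.path)
}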
Design - Sequencer for lock
• Delayed / out-of-order messages
  – introduce sequence numbers into interactions that use locks
  – lock holder requests a sequencer, passes it to the file server to validate (see the sketch below)
• Alternative: lock-delay
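A hedged Go sketch of how a sequencer might be validated by a cooperating file server; the generation-comparison rule follows the paper's description, but all identifiers are ours:

package main

import "fmt"

// Sequencer is a hypothetical rendering of the paper's idea: a
// byte string naming the lock, its mode, and a generation count
// that grows each time the lock is acquired.
type Sequencer struct {
	LockName   string
	Exclusive  bool
	Generation uint64
}

// fileServerCheck sketches what a cooperating server does with a
// sequencer: reject requests whose generation is older than the
// newest one observed for that lock, which filters out delayed or
// out-of-order messages from stale lock holders.
func fileServerCheck(newest map[string]uint64, s Sequencer) error {
	if g, ok := newest[s.LockName]; ok && s.Generation < g {
		return fmt.Errorf("stale sequencer for %s: generation %d < %d",
			s.LockName, s.Generation, g)
	}
	newest[s.LockName] = s.Generation
	return nil
}

func main() {
	newest := map[string]uint64{}
	// The current holder acquired the lock at generation 7.
	fmt.Println(fileServerCheck(newest, Sequencer{"/ls/foo/wombat", true, 7}))
	// A delayed message from the previous holder (generation 6) is rejected.
	fmt.Println(fileServerCheck(newest, Sequencer{"/ls/foo/wombat", true, 6}))
}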
Design - Events
• Client subscribes when creating a handle (see the sketch below)
• Delivered asynchronously via an up-call from the client library
• Event types
  – file contents modified
  – child node added / removed / modified
  – Chubby master failed over
  – handle / lock has become invalid
  – lock acquired / conflicting lock request (rarely used)
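A small Go sketch of event delivery as asynchronous up-calls; the event names mirror the slide, while the channel-based delivery is just our stand-in for the client library's mechanism:

package main

import "fmt"

// EventType values mirror the slide's list; the identifiers are ours.
type EventType string

const (
	ContentsModified EventType = "file contents modified"
	ChildChanged     EventType = "child node added/removed/modified"
	MasterFailedOver EventType = "Chubby master failed over"
	HandleInvalid    EventType = "handle or lock has become invalid"
)

// Handle carries the subscription made at creation time; the client
// library delivers events asynchronously, sketched here with a
// goroutine feeding a channel.
type Handle struct{ events chan EventType }

func OpenWithEvents(path string) *Handle {
	h := &Handle{events: make(chan EventType, 4)}
	go func() { // stand-in for the library's delivery thread
		h.events <- ContentsModified
		close(h.events)
	}()
	return h
}

func main() {
	h := OpenWithEvents("/ls/foo/wombat/pouch")
	for ev := range h.events { // asynchronous up-calls, delivered in order
		fmt.Println("event:", ev)
	}
}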
Design - Caching
• Clients cache file data and meta-data
  – consistent, write-through
• Invalidation (sketched below)
  – master keeps a list of what each client may have cached
  – master sends invalidations piggybacked on KeepAlive replies
  – clients flush changed data and acknowledge via KeepAlive
  – master proceeds with the modification only after the invalidations are acknowledged
• Clients also cache open handles and locks
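A toy Go sketch of the invalidate-before-write discipline; the data structures are ours, and in the real system the acknowledgements ride on KeepAlive messages rather than a callback:

package main

import "fmt"

// A toy rendering of the write path: before applying a modification,
// the master invalidates every client that may hold the file in its
// cache and waits for all acknowledgements (which, in the real
// system, arrive piggybacked on KeepAlive messages).
type cacheMaster struct {
	mayCache map[string][]string // file -> clients that may cache it
}

func (m *cacheMaster) write(file, data string, invalidate func(client string)) {
	for _, c := range m.mayCache[file] {
		invalidate(c) // blocks until the client flushes and acks
	}
	m.mayCache[file] = nil // no client caches the file any more
	fmt.Printf("modification applied: %s = %q\n", file, data)
}

func main() {
	m := &cacheMaster{mayCache: map[string][]string{"/ls/foo/f": {"c1", "c2"}}}
	m.write("/ls/foo/f", "v2", func(c string) {
		fmt.Println("invalidated cache of", c)
	})
}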
Design - Sessions
• Session maintained through KeepAlives
  – handles, locks, and cached data remain valid
  – governed by a lease
• Lease timeout advanced when
  – a session is created
  – a master fail-over occurs
  – the master responds to a KeepAlive RPC
Design - KeepAlive
• Master responds close to the lease timeout
• Client sends another KeepAlive immediately
• Client maintains a local lease timeout
  – a conservative approximation of the master's
• When the local lease expires (see the sketch below)
  – cache disabled
  – session in jeopardy; client waits out a grace period
  – cache re-enabled on reconnect
• Application informed about session state changes
  – jeopardy / safe / expired events
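A compressed Go sketch of the client library's lease handling; the 12 s lease and 45 s grace period match the paper's defaults, but the control flow is our simplification:

package main

import (
	"fmt"
	"time"
)

// clientLeaseLoop tracks a conservative local approximation of the
// master's lease and walks through the jeopardy / safe / expired
// states the slide lists.
func clientLeaseLoop(keepAliveReply <-chan time.Duration, grace time.Duration) {
	localLease := time.NewTimer(12 * time.Second) // conservative local estimate
	for {
		select {
		case lease := <-keepAliveReply: // master replied near the old timeout
			localLease.Reset(lease)
		case <-localLease.C: // local lease expired
			fmt.Println("jeopardy: cache disabled, waiting out grace period")
			select {
			case lease := <-keepAliveReply: // session rescued in time
				fmt.Println("safe: cache re-enabled on reconnect")
				localLease.Reset(lease)
			case <-time.After(grace):
				fmt.Println("expired: session loss reported to application")
				return
			}
		}
	}
}

func main() {
	replies := make(chan time.Duration)
	go clientLeaseLoop(replies, 45*time.Second)
	replies <- 12 * time.Second // one successful KeepAlive round-trip
	time.Sleep(100 * time.Millisecond)
}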
Design - Failovers
• In-memory state discarded
  – sessions, handles, locks, etc.
• Lease timer “stops”
• Fast master election
  – clients reconnect before their leases expire
• Slow master election
  – clients flush caches and enter the grace period
• New master reconstructs a conservative approximation of the previous master’s in-memory state
Design - Failovers
Steps of a newly-elected master:
• Pick a new epoch number (see the sketch below)
• Respond only to master-location requests
• Build in-memory state for sessions / locks from the database
• Respond to KeepAlives
• Emit fail-over events to sessions; clients flush caches
• Wait for acknowledgements / session expiry
• Allow all operations to proceed
• Allow clients to use handles created before the fail-over
• Delete ephemeral files without open handles after an interval
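A tiny Go sketch of why the epoch number matters: it lets the new master detect and reject requests that were addressed to its predecessor. Identifiers are ours:

package main

import (
	"errors"
	"fmt"
)

// Requests carry the epoch they were addressed to, so a newly
// elected master can distinguish current traffic from packets
// that were sent to the previous master.
type epochMaster struct{ epoch uint64 }

func (m *epochMaster) handleRequest(reqEpoch uint64) error {
	if reqEpoch != m.epoch {
		return errors.New("stale epoch: client must rediscover the master")
	}
	return nil // proceed with the operation
}

func main() {
	m := &epochMaster{epoch: 8}     // newly elected; predecessor used 7
	fmt.Println(m.handleRequest(7)) // late packet to the old master: rejected
	fmt.Println(m.handleRequest(8)) // current-epoch request: accepted
}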
Design - Backup and Mirroring
• Master writes snapshots every few hours
  – to a GFS server in a different building
• Collection of files mirrored across cells
  – /ls/global/master mirrored to /ls/cell/slave
• Mostly for configuration files
  – Chubby’s own ACLs
  – files advertising presence / location
  – pointers to Bigtable cells
Design - Scaling Mechanisms
• 90,000 clients communicate with one cell
• Regulate the number of Chubby cells
  – clients use a nearby cell
• Increase lease times
• Client caching
• Protocol-conversion servers
Scaling - Proxies
• Proxies pass requests from clients to cell
• Reduce KeepAlive and read-request traffic
  – not writes, but writes are << 1% of the workload
  – KeepAlive traffic is by far the most dominant
• Overheads
  – an additional RPC for writes / first-time reads
  – increased probability of unavailability
Scaling - Partitioning
• Namespace of a cell partitioned between servers
• N partitions, each with a master and replicas
  – node D/C stored on partition P(D/C) = hash(D) mod N (see the sketch below)
  – meta-data for D may be on a different partition
• Little cross-partition communication
• Reduces read/write traffic, but not necessarily KeepAlive traffic
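A runnable Go sketch of the placement rule P(D/C) = hash(D) mod N; the paper does not name a hash function, so FNV below is our arbitrary choice:

package main

import (
	"fmt"
	"hash/fnv"
	"path"
)

// partition assigns a node to a partition by hashing the name of
// its parent directory D, so all children of one directory land
// on the same partition.
func partition(node string, n uint32) uint32 {
	dir := path.Dir(node)
	h := fnv.New32a()
	h.Write([]byte(dir))
	return h.Sum32() % n
}

func main() {
	const N = 5
	fmt.Println(partition("/ls/foo/wombat/pouch", N))
	fmt.Println(partition("/ls/foo/wombat/other", N)) // same dir => same partition
	fmt.Println(partition("/ls/foo/bar/x", N))        // different dir may differ
}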
Use and Observations
• Many files used for naming
• Config, ACL, and meta-data files common
• 10 clients use each cached file, on average
• Few locks held; no shared locks
• KeepAlives dominate RPC traffic
Use as Name Service
• DNS uses TTL values
  – entries must be refreshed within that time
  – huge (and variable) load on DNS servers
• Chubby’s caching uses invalidations, not polling
  – client builds up the entries it needs in its cache
  – name entries further grouped into batches
Failover problems
• Master writes sessions to the database when they are created
  – overloads the database when many processes start at once
• Instead, store a session at its first modification / lock acquisition, etc.
• Active read-only sessions recorded with some probability on each KeepAlive (sketched below)
  – spreads the writes out in time
  – a young read-only session may be discarded in a fail-over
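A toy Go sketch of probabilistic session recording on KeepAlive; the 10% probability is purely illustrative:

package main

import (
	"fmt"
	"math/rand"
)

// maybeRecord writes a session durably only with some probability
// on each KeepAlive, so database writes spread out in time instead
// of spiking when many processes start at once.
func maybeRecord(recorded map[string]bool, session string, p float64) {
	if !recorded[session] && rand.Float64() < p {
		recorded[session] = true // durable write to the database
		fmt.Println("recorded", session)
	}
}

func main() {
	recorded := map[string]bool{}
	for i := 0; i < 50; i++ { // 50 KeepAlive rounds for one session
		maybeRecord(recorded, "session-42", 0.1)
	}
	// A young read-only session that was never recorded may be
	// discarded if a fail-over happens first.
}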
Failover problems
• New design: do not record sessions in the database
  – recreate them like handles after a fail-over
  – new master waits a full lease time before operations proceed
Lessons learnt
• Developers rarely consider availability
  – should plan for short Chubby outages
• Fine-grained locking not essential
• Poor API choices
  – handles acquiring locks cannot be shared
• RPC use affects transport protocols
  – forced to send KeepAlives by UDP for timeliness
Q & A