Distributed Coordination with Python

Ben Bangert, Mozilla

Uploaded by oscon-byrum on 06-Dec-2014


DESCRIPTION

This talk covers why Apache ZooKeeper is a good fit for coordinating processes in a distributed environment; earlier attempts at a Python client and the current state-of-the-art Python client library; how merging several Python client libraries into one project has paid off; the features available to Python processes; and how to handle failures gracefully in a set of distributed processes.

TRANSCRIPT

Page 1: Distributed Coordination with Python

DISTRIBUTED COORDINATION WITH PYTHON

Ben Bangert, Mozilla

Page 2

Tools of the Trade

Page 3

DISTRIBUTED COORDINATION IS NOT...

• Distributed Databases (Cassandra, Riak)

• Distributed Computing (Hadoop, etc.)

• Distributed Event Analysis (Storm)

Page 4

The Common Element

Page 5

Apache ZooKeeper

Page 6

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Page 7

ZOOKEEPER

Page 8

WHY NOT USE...

• Memcached?

• MongoDB?

• Postgres/MySQL?

Page 11

Data stored in a hierarchy of znodes (a tree of paths, like a filesystem)

Page 13

• Session-based

• Znode watches

• Ephemeral and Sequential Znodes

Page 14

EPHEMERAL ZNODES

• Last for the duration of the client session

• Deleted when the session ends: the client closes it, or it expires (a briefly dropped connection alone does not end it)

• Can’t have child znodes

Page 15

SEQUENTIAL ZNODES

• Supply a node name prefix (or none) and get back the full name with a zero-padded, 10-digit sequence number appended (0000000001, 0000000002, 0000000003, etc.)

• Can be combined with ephemeral flag
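Combining the two flags is a common way to register a live process: the node disappears with the session, and the suffix gives each process a unique, ordered slot. The sketch below is an illustration, not code from the talk. It assumes `client` is an already-started kazoo `KazooClient` and uses kazoo's `create(path, value, ephemeral=..., sequence=..., makepath=...)`; the helper name `register_member` and the `member-` prefix are hypothetical.

```python
def register_member(client, parent: str, data: bytes = b"", prefix: str = "member-"):
    """Register this process under `parent` (hypothetical helper).

    Assumes `client` is a started kazoo KazooClient.
    Returns (full_path, sequence_number).
    """
    # ephemeral=True: ZooKeeper deletes the znode when this client's session ends.
    # sequence=True: ZooKeeper appends a zero-padded 10-digit counter to the name.
    # makepath=True: create any missing parent znodes first.
    path = client.create(
        f"{parent}/{prefix}", data, ephemeral=True, sequence=True, makepath=True
    )
    # create() returns the real path, e.g. "/app/members/member-0000000007"
    name = path.rsplit("/", 1)[1]
    return path, int(name[len(prefix):])
```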

Page 16

BASIC COMMANDS

• create(PATH, DATA...)

• get(PATH...)

• get_children(PATH...)

• set(PATH, DATA...)

• delete(PATH...)
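As a rough illustration, the five commands map directly onto kazoo client methods. This sketch assumes `client` is a started `KazooClient`; the helper `crud_roundtrip` and the `/demo/greeting` path are hypothetical.

```python
def crud_roundtrip(client, path: str = "/demo/greeting"):
    """Exercise create/set/get/get_children/delete (hypothetical helper).

    Assumes `client` is a started kazoo KazooClient.
    """
    # create(): makepath=True also creates the missing parent ("/demo")
    client.create(path, b"hello", makepath=True)
    # set(): replace the data; the default version=-1 skips the version check
    client.set(path, b"hello again")
    # get(): returns a (data, ZnodeStat) tuple
    data, _stat = client.get(path)
    # get_children(): returns child *names*, not full paths
    parent = path.rsplit("/", 1)[0] or "/"
    children = client.get_children(parent)
    # delete(): fails if the znode has children unless recursive=True
    client.delete(path)
    return data, children
```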

Page 18

PYTHON CLIENTS

• txzookeeper

• kazoo

   • unified client that works with gevent

   • implements the wire protocol in pure Python

Page 19

USE KAZOO

Page 20

EASY TO USE

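from kazoo.client import KazooClient

client = KazooClient()  # defaults to hosts="127.0.0.1:2181"
client.start()          # blocks until connected, raises if the timeout passes
# ... use the client ...
client.stop()           # close the session when done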

Page 21

USE CASES

Page 22

CONFIGURATION

• Store settings in node data

• Organize node structure

• Set watches on nodes of interest
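One way to combine these three points is kazoo's `DataWatch`, which calls a function with the node's current data right away and again after every change. This is a sketch under stated assumptions: `client` is a started `KazooClient`, the settings are stored as UTF-8 JSON (our choice, not something ZooKeeper imposes), and `watch_settings` is a hypothetical name.

```python
import json


def watch_settings(client, path: str, settings: dict):
    """Keep `settings` in sync with JSON stored at `path` (hypothetical helper).

    Assumes `client` is a started kazoo KazooClient and that the znode
    data is UTF-8 JSON; that format is an assumption, not ZooKeeper's.
    """
    @client.DataWatch(path)
    def _on_change(data, stat):
        # kazoo calls this once right away, then again after every change.
        # `data` is None while the znode does not exist.
        settings.clear()
        if data:
            settings.update(json.loads(data.decode("utf-8")))

    return _on_change
```

Because kazoo re-registers the watch after each notification, the dict stays current without the caller polling.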

Page 24

PARTY MEMBERSHIP

• Join a party, find out who else is around

• Elect a leader if desired

• Recipe in Kazoo
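A minimal sketch of kazoo's `Party` recipe, assuming `client` is a started `KazooClient`; `join_party` and the example paths are hypothetical. For the leader-election part, kazoo has a separate `Election` recipe (`client.Election(path, identifier).run(func)` blocks until this client is elected, then calls `func`).

```python
def join_party(client, path: str, identifier: str):
    """Join a party and list who is present (hypothetical helper).

    Assumes `client` is a started kazoo KazooClient.
    Returns the Party object (call .leave() to leave early) and the members.
    """
    party = client.Party(path, identifier)
    # join() creates an ephemeral member znode, so this process drops out
    # of the party automatically if its session ends.
    party.join()
    # Iterating a Party yields the identifier of every current member.
    return party, sorted(party)
```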

Page 25

LOCKS

• Lock a resource for a single client

• Lock a resource for multiple clients (Semaphore)

• Hard to write properly

• Recipe in Kazoo
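With the recipes, the caller's side stays small. This sketch assumes `client` is a started `KazooClient` and uses kazoo's `Lock(path, identifier)` and `Semaphore(path, identifier, max_leases=...)`, both usable as context managers; the helper names and the default of 3 leases are hypothetical.

```python
def run_exclusively(client, path: str, identifier: str, work):
    """Run `work()` while holding a ZooKeeper lock (hypothetical helper).

    Assumes `client` is a started kazoo KazooClient.
    """
    lock = client.Lock(path, identifier)
    # Entering the block waits until this client holds the lock;
    # leaving it (even via an exception) releases the lock.
    with lock:
        return work()


def run_limited(client, path: str, identifier: str, work, max_leases: int = 3):
    """Like run_exclusively, but up to `max_leases` clients may hold it at once."""
    semaphore = client.Semaphore(path, identifier, max_leases=max_leases)
    with semaphore:
        return work()
```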

Page 26

BUILDING HIGHER-LEVEL ABSTRACTIONS ON ZOOKEEPER

Page 27

CAVEAT

Page 28

DO NOT IMPLEMENT IT YOURSELF; USE THE RECIPE

Page 30

BASIC STEPS

• Create the lock parent node if needed

• Create an ephemeral+sequential node under the parent; store the returned node name

• Get the children of the lock parent node

• Sort the children by sequence number

• The first child in the list holds the lock!
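The ordering decision in the last two steps is plain bookkeeping, sketched below to show how the recipe works, not as a replacement for kazoo's `Lock`. The function name `lock_position` is hypothetical; it assumes every child name ends in ZooKeeper's 10-digit sequence suffix.

```python
def lock_position(my_node, children):
    """Decide whether `my_node` holds the lock (illustration only).

    `children` are the names returned by get_children() on the lock
    parent; each is assumed to end in a 10-digit sequence number.
    Returns (has_lock, node_to_watch).
    """
    # Sort by the numeric suffix rather than the full name, so differing
    # name prefixes between clients cannot skew the order.
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    position = ordered.index(my_node)
    if position == 0:
        return True, None  # lowest sequence number: we hold the lock
    # Otherwise, the only node worth watching is the one directly ahead.
    return False, ordered[position - 1]
```

For example, with children `lock-0000000003`, `lock-0000000001` and `lock-0000000002`, the owner of `lock-0000000002` does not hold the lock and should watch `lock-0000000001`.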

Page 31

THINGS TO WATCH OUT FOR

• Avoid the thundering herd: set watches only where they are needed

• When our node isn’t the lowest, watch only the node directly in front of ours

• When the lock is released, only the next client in line is ‘woken’, not every waiting client
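Watching only the predecessor can be sketched with kazoo's `exists(path, watch=...)`, which leaves a one-shot watch on that single znode. This is an illustration under assumptions: `client` is a started `KazooClient`, and `wait_for_predecessor` is a hypothetical name.

```python
import threading


def wait_for_predecessor(client, lock_path: str, predecessor: str, timeout=None) -> bool:
    """Block until the node ahead of ours changes (illustration only).

    Assumes `client` is a started kazoo KazooClient.
    Returns False if `timeout` seconds pass first.
    """
    changed = threading.Event()

    def _watch(event):
        # One-shot watch on a single znode: only this waiter is woken
        # when the node in front of it goes away.
        changed.set()

    # exists() returns None if the znode is already gone; otherwise it
    # returns the node's stat and leaves the watch in place.
    if client.exists(f"{lock_path}/{predecessor}", watch=_watch) is None:
        return True
    return changed.wait(timeout)
```

When this returns True, the caller still has to fetch the children again and redo the ordering check: the predecessor may have vanished because its own client gave up or its session expired, in which case another node may still be ahead.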

Page 32

HANDLING FAILURE

Page 33

ROBUST CODE TAKES EFFORT

• What happens when a server fails?

• What happens when the client fails?

• What happens when we don’t know if the server has failed?

Page 34

STOPPING WHEN UNCERTAIN

Page 35

A BIT BETTER VERSION...

Page 36

EVEN BETTER
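One way to stop working while the connection state is uncertain is a kazoo connection-state listener registered with `client.add_listener(...)`. The sketch below is an illustration, not the code from these slides; `ConnectionGate` is a hypothetical name. It relies on kazoo's `KazooState` constants being the plain strings `"CONNECTED"`, `"SUSPENDED"` and `"LOST"`.

```python
import threading


class ConnectionGate:
    """Track kazoo connection state so work pauses when uncertain (sketch).

    Register with client.add_listener(gate). kazoo calls listeners from
    its own thread, so this only flips flags and never blocks.
    """

    def __init__(self):
        self.may_work = threading.Event()
        self.session_lost = threading.Event()

    def __call__(self, state):
        # kazoo's KazooState constants are the plain strings below.
        if state == "CONNECTED":
            self.may_work.set()
        elif state == "SUSPENDED":
            # Connection dropped: the session may still be alive, but we
            # cannot be sure we still hold our locks or memberships.
            self.may_work.clear()
        elif state == "LOST":
            # Session is gone, and with it every ephemeral node we created.
            self.may_work.clear()
            self.session_lost.set()
```

Register the gate before `client.start()` so it also sees the first `CONNECTED`. The main loop can then wait on `gate.may_work` before each unit of work and call `sys.exit()` once `gate.session_lost` is set; calling `sys.exit()` inside the listener itself would only end kazoo's thread, not the process.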

Page 37

FAILURE WILL HAPPEN

• Fail fast, fail completely.

• Session expiration is a good time to sys.exit

• Always include jitter (kazoo adds jitter to its connection and command retries)

• Consider what exceptions can occur in any code relying on a distributed system
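For your own retry loops, jitter can be as simple as stretching each exponential delay by a random fraction. kazoo's built-in retries live in `kazoo.retry.KazooRetry`; the standalone function below is only an illustration, and its name and default values are hypothetical.

```python
import random


def backoff_delays(tries: int, base: float = 0.1, factor: float = 2.0,
                   max_delay: float = 60.0, jitter: float = 0.4, rng=random):
    """Exponential backoff delays with random jitter (illustrative sketch).

    Each delay is the exponential value plus up to `jitter` (a fraction)
    of extra random time, capped at `max_delay`, so clients that failed
    together do not all retry at the same instant.
    """
    delays = []
    delay = base
    for _ in range(tries):
        delays.append(min(delay * (1 + rng.uniform(0, jitter)), max_delay))
        delay *= factor
    return delays
```

Passing a seeded `random.Random` as `rng` makes the sequence reproducible in tests while production code keeps the module-level generator.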

Page 38

• Distributed systems are hard

• Use existing battle-tested tools (ZooKeeper, Kazoo)

• Always consider everything that can fail, and how

• Be wary of tools that don’t tell you how they fail

• Read Kyle Kingsbury’s Jepsen posts to see examples of systems failing: http://aphyr.com/tags/jepsen

Page 39

FIN

Page 40

QUESTIONS?