Python, Async Web Frameworks, and MongoDB

Post on 17-Dec-2014


DESCRIPTION

A talk covering the state of the art for writing asynchronous web applications using Python and MongoDB.

TRANSCRIPT

Python, MongoDB, and asynchronous web frameworks

A. Jesse Jiryu Davis
jesse@10gen.com
emptysquare.net

Agenda
• Talk about web services in a really dumb ("abstract"?) way
• Explain when we need async web servers
• Why is async hard?
• What is Tornado and how does it work?
• Why am I writing a new PyMongo wrapper to work with Tornado?
• How does my wrapper work?

CPU-bound web service

[Diagram: client connected to server over a socket]

• No need for async
• Just spawn one process per core
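As a rough stdlib sketch of "one process per core" (the names and workload here are illustrative, not from the talk), Python's multiprocessing module can fan CPU-bound work out across cores:

```python
# Hypothetical sketch: scale a CPU-bound handler by running one
# worker process per core with the stdlib multiprocessing module.
import multiprocessing

def handle(payload):
    # Stand-in for CPU-bound work (e.g. hashing, compression).
    return sum(i * i for i in range(payload))

if __name__ == "__main__":
    # One worker process per core; map requests across them.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(handle, [10, 100, 1000])
        print(results)
```

Because each request is pure computation, there is nothing to wait on and no payoff for async here; the OS scheduler spreads the processes across cores.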

Normal web service

[Diagram: client connected to server over a socket; the server connected over another socket to a backend (DB, web service, SAN, …)]

• Assume backend is unbounded
• Service is bound by:
  • Context-switching overhead
  • Memory!

What’s async for?

• Minimize resources per connection
• I.e., wait for backend as cheaply as possible

CPU- vs. Memory-bound

[Diagram: spectrum from CPU-bound to memory-bound; crypto sits at the CPU-bound end, chat at the memory-bound end, and "most web services?" toward memory-bound]

HTTP long-polling (“COMET”)

• E.g., chat server
• Async's killer app
• Short-polling is CPU-bound: tradeoff between latency and load
• Long-polling is memory-bound
• "C10K problem": kegel.com/c10k.html
• Tornado was invented for this

Why is async hard to code?

[Sequence diagram: the client sends a request to the server; the server issues a request to the backend and must store state while it waits; when the backend's response arrives, the server restores that state and sends the response to the client]

Ways to store state (this slide is in beta)

[Chart plotting coding difficulty against memory per connection for three approaches: multithreading, Tornado / Node.js, and greenlets / Gevent]

What’s a greenlet?

• A.K.A. "green threads"
• A feature of Stackless Python, packaged as a module for standard Python
• Greenlet stacks are stored on heap, copied to / from OS stack on resume / pause
• Cooperative
• Memory-efficient
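greenlet itself is a third-party package, but as a rough stdlib analogy, generators also pause and resume a function cooperatively, with its frame kept on the heap (greenlets can switch from anywhere; generators only at a yield). A minimal sketch, with illustrative names:

```python
# Rough analogy for cooperative scheduling: each generator's frame
# lives on the heap and is resumed explicitly, like a greenlet stack.
log = []

def worker(name):
    log.append((name, 1))
    yield  # pause: control returns to the scheduler
    log.append((name, 2))
    yield

def run_all(tasks):
    # Round-robin "scheduler": resume each task until all are done.
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)          # resume the task's frame
            tasks.append(task)  # it paused again; requeue it
        except StopIteration:
            pass                # task finished; drop it

run_all([worker("a"), worker("b")])
print(log)  # the two workers' steps interleave
```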

Threads: state stored on OS stacks

# pseudo-Python
sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)
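A runnable version of this blocking style, using socketpair() as a stand-in for a real listening socket (all names illustrative): each connection gets a thread, and the per-request state is nothing but local variables on that thread's OS stack.

```python
# Blocking style: one thread per connection; "state" is just locals
# on the thread's OS stack while recv() blocks in the kernel.
import socket
import threading

def handle(conn):
    request = conn.recv(1024)       # blocks; the thread sleeps
    response = b"echo: " + request  # per-request state = these locals
    conn.sendall(response)
    conn.close()

server_side, client_side = socket.socketpair()
t = threading.Thread(target=handle, args=(server_side,))
t.start()

client_side.sendall(b"hello")
reply = client_side.recv(1024)
t.join()
print(reply)  # b'echo: hello'
```

The cost the slides point at is exactly this: every idle connection holds an OS thread stack while it waits.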

Gevent: state stored on greenlet stacks

# pseudo-Python
import gevent.monkey; gevent.monkey.patch_all()
sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)

Tornado: state stored in RequestHandler

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        AsyncHTTPClient().fetch(
            "http://example.com",
            callback=self.on_response)

    def on_response(self, response):
        formatted = format_response(response)
        self.write(formatted)
        self.finish()

Tornado IOStream

class IOStream(object):
    def read_bytes(self, num_bytes, callback):
        # Remember how much to read and whom to call back
        self.bytes_to_read = num_bytes
        self.read_callback = callback
        io_loop.add_handler(
            self.socket.fileno(),
            self.handle_events,
            events=READ)

    def handle_events(self, fd, events):
        data = self.socket.recv(self.bytes_to_read)
        self.read_callback(data)

Tornado IOLoop

class IOLoop(object):
    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        # _impl is epoll or kqueue or ...
        self._impl.register(fd, events)

    def start(self):
        while True:
            event_pairs = self._impl.poll()
            for fd, events in event_pairs:
                self._handlers[fd](fd, events)
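The same register-then-dispatch loop can be sketched with the stdlib selectors module standing in for Tornado's epoll/kqueue wrapper. This is a minimal one-iteration demo with illustrative names, not Tornado's actual code:

```python
# Minimal event loop: register a readable fd with a callback,
# poll once, and dispatch to the registered handler.
import selectors
import socket

sel = selectors.DefaultSelector()
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

received = []

def on_readable(conn, events):
    received.append(conn.recv(1024))

# add_handler: register the fd and remember its callback
sel.register(left, selectors.EVENT_READ, on_readable)

right.sendall(b"ping")

# One turn of start(): poll, then dispatch to handlers
for key, events in sel.select(timeout=1):
    key.data(key.fileobj, events)

print(received)  # [b'ping']
```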

Python, MongoDB, & concurrency

• Threads work great with pymongo
• Gevent works great with pymongo
  – monkey.patch_socket(); monkey.patch_thread()
• Tornado works so-so
  – asyncmongo
    • No replica sets, only first batch, no SON manipulators, no document classes, …
  – pymongo
    • OK if all your queries are fast
    • Use extra Tornado processes

Introducing: “Motor”

• Mongo + Tornado
• Experimental
• Might be official in a few months
• Uses Tornado IOLoop and IOStream
• Presents standard Tornado callback API
• Stores state internally with greenlets
• github.com/ajdavis/mongo-python-driver/tree/tornado_async

Motor

class MainHandler(tornado.web.RequestHandler):
    def initialize(self):
        self.c = MotorConnection()

    @tornado.web.asynchronous
    def post(self):
        # No-op if already open
        self.c.open(callback=self.connected)

    def connected(self, c, error):
        self.c.collection.insert(
            {'x': 1}, callback=self.inserted)

    def inserted(self, result, error):
        self.write('OK')
        self.finish()

Motor internals

[Sequence diagram: a request reaches the RequestHandler, which starts a client greenlet; the greenlet runs pymongo code until it calls IOStream.sendall(callback) and switches back to the main greenlet; later the IOLoop schedules the callback and switches into the client greenlet again; pymongo parses the Mongo response, the user's callback() runs, and the HTTP response is written]

Motor internals: wrapper

class MotorCollection(object):
    def insert(self, *args, **kwargs):
        callback = kwargs['callback']
        del kwargs['callback']
        kwargs['safe'] = True

        def call_insert():
            # Runs on child greenlet
            result, error = None, None
            try:
                sync_insert = self.sync_collection.insert
                result = sync_insert(*args, **kwargs)
            except Exception as e:
                error = e

            # Schedule the callback to be run on the main greenlet
            tornado.ioloop.IOLoop.instance().add_callback(
                lambda: callback(result, error))

        # Start child greenlet
        greenlet.greenlet(call_insert).switch()

        return

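Since greenlet and Tornado are both third-party, the same run-the-blocking-call-elsewhere-then-schedule-a-callback shape can be sketched with stdlib pieces only: here a worker thread plays the child greenlet and a queue plays IOLoop.add_callback. All names are hypothetical stand-ins, not Motor's API.

```python
# Schedule-a-callback pattern: do blocking work off the main loop,
# then hand the (result, error) callback back to the loop to run.
import queue
import threading

main_loop_callbacks = queue.Queue()  # stands in for IOLoop.add_callback

def async_insert(doc, callback):
    def call_insert():
        # Runs off the main loop (here: a worker thread)
        result, error = None, None
        try:
            result = {"inserted": doc}  # stand-in for the blocking driver call
        except Exception as e:
            error = e
        # Schedule the callback to run back on the "main loop"
        main_loop_callbacks.put(lambda: callback(result, error))

    threading.Thread(target=call_insert).start()

results = []
async_insert({"x": 1}, lambda result, error: results.append((result, error)))

# One turn of the main loop: run the scheduled callback
main_loop_callbacks.get(timeout=1)()
print(results)  # [({'inserted': {'x': 1}}, None)]
```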

Motor internals: fake socket

class MotorSocket(object):
    def __init__(self, socket):
        # Makes socket non-blocking
        self.stream = tornado.iostream.IOStream(socket)

    def sendall(self, data):
        child_gr = greenlet.getcurrent()

        # This is run by IOLoop on the main greenlet
        # when data has been sent;
        # switch back to child to continue processing
        def sendall_callback():
            child_gr.switch()

        self.stream.write(data, callback=sendall_callback)

        # Resume main greenlet
        child_gr.parent.switch()


Motor

• Shows a general method for asynchronizing synchronous network APIs in Python
• Who wants to try it with MySQL? Thrift?
• (Bonus round: resynchronizing Motor for testing)

Questions?

A. Jesse Jiryu Davis
jesse@10gen.com
emptysquare.net

(10gen is hiring, of course: 10gen.com/careers)
