Asynchronous Architectures for Implementing Scalable Cloud Services - Evan Cooke - Gluecon 2012


DESCRIPTION

Cloud services power the apps that are becoming the backbone of modern society. The workload of cloud APIs is typically driven by external customers and can fluctuate dramatically minute-by-minute. Rapid spikes in load can result in request failures as load increases beyond backend capacity and the size of web worker pools. This talk explores the use of asynchronous frameworks like Python Twisted and gevent to implement services that can dynamically keep socket connections open and increase request latency in order to avoid request failures. We explore how that architectural approach helps Twilio provide high-availability Voice and SMS APIs.

TRANSCRIPT

Asynchronous Architectures for Implementing Scalable Cloud Services

Designing for Graceful Degradation

EVAN COOKE

CO-FOUNDER & CTO, TWILIO

CLOUD COMMUNICATIONS

Cloud services power the apps that are the backbone of modern society. How we work, play, and communicate.

Cloud Workloads Can Be Unpredictable

[Chart: SMS API usage, showing a 6x spike in 5 minutes]

[Chart: request latency vs. load over time; when load exceeds capacity, requests FAIL]

Danger! Load higher than instantaneous throughput.

Don’t Fail Requests

[Diagram: a load balancer spreads incoming requests across app servers (worker pools, e.g., Apache/Nginx), each with AAA and throttling layers in front of a fixed pool of workers. As load climbs from 10% to 70% to 100%+ of pool capacity, requests start to fail.]

Problem Summary

• Cloud services often use worker pools to handle incoming requests

• When load grows beyond the size of the worker pool, requests fail

What next?

A few observations based on work implementing and scaling the Twilio API over the past 4 years...

• Twilio Voice/SMS Cloud APIs

• 100,000 Twilio Developers

• 100+ employees

Observation 1

For many APIs, taking more time to service a request is better than failing that request.

Implication: under load, it is often better to queue a request and serve it with some delay than to fail it outright.

Observation 2

Matching the amount of available backend resources precisely to the size of the incoming-request worker pool is challenging.

Implication: under load, it may be possible to delay or drop only those requests that truly impact constrained resources.

What are we going to do?

Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...

Event-driven programming and the Reactor Pattern


req = 'GET /';
req.append('\r\n\r\n');
socket.write(req);
resp = socket.read();
print(resp);

Relative cost of each line: 1, 1, 10,000x, 10,000,000x, 10.

Huge IO latency blocks the worker.

Make IO operations async and "callback" when done:

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});

Central dispatch to coordinate event callbacks:

reactor.run_forever();

[Timing diagram: the worker executes only the cheap steps back-to-back; the expensive IO waits happen off the worker]

Result: we don’t block the worker
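For intuition, here is a toy reactor in Python built on select (an illustrative sketch, not any particular framework's implementation): callbacks are registered per socket and dispatched when that socket becomes readable, so one worker can juggle many connections.

import select

class ToyReactor(object):
    # Minimal reactor: run a callback when its socket becomes readable.
    def __init__(self):
        self._readers = {}  # socket -> callback

    def add_reader(self, sock, callback):
        self._readers[sock] = callback

    def remove_reader(self, sock):
        self._readers.pop(sock, None)

    def run_forever(self):
        while self._readers:
            # Block until at least one registered socket has data.
            readable, _, _ = select.select(list(self._readers), [], [])
            for sock in readable:
                # Callbacks must not block, or they stall every connection.
                self._readers[sock](sock)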

(Some) Reactor Pattern Frameworks

js/node.js

python/twisted
python/gevent

c/libevent
c/libev

ruby/eventmachine

java/nio/netty

The Callback Mess

Python Twisted:

req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print resp

def w(result):
    socket.read().addCallback(r)

socket.write(req).addCallback(w)

Use deferred generators and inline callbacks:

req = 'GET /'
req += '\r\n\r\n'

yield socket.write(req)
resp = yield socket.read()
print resp

Easy sequential programming with mostly implicit async IO.
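As a runnable version of the same idea (using twisted.web's Agent in place of the slide's raw socket, which is my substitution), an inlineCallbacks generator turns each Deferred into a plain yield:

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, readBody

@inlineCallbacks
def fetch(url):
    # Each yield suspends the generator until the Deferred fires,
    # so the code reads sequentially while the IO stays asynchronous.
    agent = Agent(reactor)
    response = yield agent.request(b'GET', url)
    body = yield readBody(response)
    print(body)
    reactor.stop()

fetch(b'http://example.com/')
reactor.run()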

Enter gevent

"gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop."

socket.write(req)
resp = socket.read()
print resp

Natively async.

Simple Echo Server

from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall('Welcome to the echo server!\r\n')
    fileobj = socket.makefile()
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()

Easy sequential model. Fully async.
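To try it (assuming gevent is installed), run the script and connect with nc localhost 6000; StreamServer gives each connection its own greenlet, so many idle clients can wait concurrently without tying up OS threads.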

Async Services with Ginkgo

Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.

https://github.com/progrium/ginkgo

Let's look at a simple example that implements a TCP and HTTP server...

# Import WSGI/TCP servers
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer

from ginkgo.core import Service

# HTTP handler
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print 'new http request!'
    return ["hello world"]

# TCP handler
def handle_tcp(socket, address):
    print 'new tcp connection!'
    while True:
        socket.send('hello\n')
        gevent.sleep(1)

# Service composition
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()
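Running the script starts both services in one process: curl http://127.0.0.1:8080/ returns "hello world", and nc 127.0.0.1 1234 prints "hello" once a second. Each request and connection is served by a cheap greenlet rather than a pooled worker.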

Using our async reactor-based approach, let's redesign our serving infrastructure.

[Diagram: a load balancer spreads incoming requests across a pool of async servers]

Step 1: define an authentication and authorization (AAA) layer that identifies the user and the resource being requested.

[Diagram: each async server fronted by an AAA layer]

Step 2: add a throttling layer and a concurrency manager.

[Diagram: on each async server, AAA feeds a throttling layer; a shared concurrency manager coordinates the throttlers]

Concurrency Admission Control

• Goal: limit concurrency by delaying or selectively failing requests

• Common metrics:
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources

• What we've found useful: by (Account, Resource Type) (see the sketch below)
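A minimal sketch of (Account, Resource Type) admission control, assuming gevent; the cap and helper names are illustrative, not Twilio's implementation. A BoundedSemaphore per key makes excess requests wait cheaply in their greenlets instead of failing:

from gevent.lock import BoundedSemaphore

MAX_CONCURRENT = 10  # hypothetical per-(account, resource) cap
_semaphores = {}

def _sem(account, resource):
    # One semaphore per (account, resource type) key.
    key = (account, resource)
    if key not in _semaphores:
        _semaphores[key] = BoundedSemaphore(MAX_CONCURRENT)
    return _semaphores[key]

def admit(account, resource, handler):
    # Beyond the cap, the calling greenlet simply waits: the request
    # gets slower instead of failing.
    with _sem(account, resource):
        return handler()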

Delay: delay responses without failing requests.

[Chart: with delay, latency rises gracefully as load grows instead of requests failing]

Deny: deny requests based on resource usage.
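To make the two policies concrete, here is a hedged sketch (the thresholds and load signal are hypothetical): under moderate load the handler sleeps its greenlet to add backpressure; past a hard limit it denies outright.

import gevent

def current_load():
    # Hypothetical load signal in [0, 1]; a real service would measure
    # backend utilization or queue depth.
    return 0.5

def handle(request, process):
    load = current_load()
    if load > 0.95:
        # Deny: fail fast based on resource usage.
        return '503 Service Unavailable'
    if load > 0.75:
        # Delay: hold the cheap async connection open instead of failing;
        # only this greenlet sleeps.
        gevent.sleep((load - 0.75) * 4.0)  # up to ~0.8s of added latency
    return process(request)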

Step 3: allow backend resources to throttle requests.

[Diagram: dependent services behind the app servers apply their own throttling, coordinated through the concurrency manager]

Summary

Async frameworks like gevent allow you to easily decouple a request from access to constrained resources.

[Chart: request latency vs. time. Instead of service-wide failure, don't fail requests: decrease performance gracefully.]

CONTENTS CONFIDENTIAL & COPYRIGHT © TWILIO INC. 2012

Evan Cooke (@emcooke)

twilio
