Asynchronous Architectures for Implementing Scalable Cloud Services - Evan Cooke - Gluecon 2012


DESCRIPTION

Cloud services power the apps that are becoming the backbone of modern society. The workload of cloud APIs is typically driven by external customers and can fluctuate dramatically minute-by-minute. Rapid spikes in load can result in request failures as load increases beyond backend capacity and the size of web worker pools. This talk explores the use of asynchronous frameworks like Python Twisted and gevent to implement services that can dynamically keep socket connections open and increase request latency in order to avoid request failures. We explore how that architectural approach helps Twilio provide high-availability Voice and SMS APIs.

TRANSCRIPT

Asynchronous Architectures for Implementing Scalable Cloud Services

Designing for Graceful Degradation

EVAN COOKE

CO-FOUNDER & CTO, TWILIO

CLOUD COMMUNICATIONS

Cloud services power the apps that are the backbone of modern society. How we work, play, and communicate.

Cloud Workloads Can Be Unpredictable

[Chart: SMS API usage, showing a 6x spike in 5 minutes]

[Chart: request latency vs. load over time; when load exceeds capacity, requests FAIL]

Danger! Load higher than instantaneous throughput.

Don’t Fail Requests

[Diagram: a load balancer spreads incoming requests across app servers (worker pools, e.g., Apache/Nginx), each with AAA and throttling layers in front of a fixed pool of workers. As load climbs from 10% to 70% to 100%+ of pool capacity, requests start to fail.]

Problem Summary

• Cloud services often use worker pools to handle incoming requests

• When load grows beyond the size of the worker pool, requests fail

What next?

A few observations based on work implementing and scaling the Twilio API over the past 4 years...

• Twilio Voice/SMS Cloud APIs

• 100,000 Twilio Developers

• 100+ employees

Observation 1

For many APIs, taking more time to service a request is better than failing that request.

Implication: under load, it is often better to queue a request and serve it with some delay than to fail it outright.

Observation 2

Matching the amount of available backend resources precisely to the size of the incoming-request worker pool is challenging.

Implication: under load, it may be possible to delay or drop only those requests that truly impact constrained resources.

What are we going to do?

Suggestion: if request concurrency were very cheap, we could implement delay and finer-grained resource controls much more easily...

Event-driven programming and the Reactor Pattern


req = 'GET /';
req.append('\r\n\r\n');
socket.write(req);
resp = socket.read();
print(resp);

Relative cost of each line: 1, 1, 10,000x, 10,000,000x, 10.

Huge IO latency blocks the worker.

Make IO operations async and "callback" when done:

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});

Central dispatch to coordinate event callbacks:

reactor.run_forever();

[Timing diagram: the worker executes only the cheap steps back-to-back; the expensive IO waits happen off the worker]

Result: we don’t block the worker
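For intuition, here is a toy reactor in Python built on select (an illustrative sketch, not any particular framework's implementation): callbacks are registered per socket and dispatched when that socket becomes readable, so one worker can juggle many connections.

import select

class ToyReactor(object):
    # Minimal reactor: run a callback when its socket becomes readable.
    def __init__(self):
        self._readers = {}  # socket -> callback

    def add_reader(self, sock, callback):
        self._readers[sock] = callback

    def remove_reader(self, sock):
        self._readers.pop(sock, None)

    def run_forever(self):
        while self._readers:
            # Block until at least one registered socket has data.
            readable, _, _ = select.select(list(self._readers), [], [])
            for sock in readable:
                # Callbacks must not block, or they stall every connection.
                self._readers[sock](sock)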

(Some) Reactor Pattern Frameworks

js/node.js

python/twisted
python/gevent

c/libevent
c/libev

ruby/eventmachine

java/nio/netty

The Callback Mess

Python Twisted:

req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print resp

def w(result):
    socket.read().addCallback(r)

socket.write(req).addCallback(w)

Use deferred generators and inline callbacks:

req = 'GET /'
req += '\r\n\r\n'

yield socket.write(req)
resp = yield socket.read()
print resp

Easy sequential programming with mostly implicit async IO.
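As a runnable version of the same idea (using twisted.web's Agent in place of the slide's raw socket, which is my substitution), an inlineCallbacks generator turns each Deferred into a plain yield:

from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks
from twisted.web.client import Agent, readBody

@inlineCallbacks
def fetch(url):
    # Each yield suspends the generator until the Deferred fires,
    # so the code reads sequentially while the IO stays asynchronous.
    agent = Agent(reactor)
    response = yield agent.request(b'GET', url)
    body = yield readBody(response)
    print(body)
    reactor.stop()

fetch(b'http://example.com/')
reactor.run()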

Enter gevent

"gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop."

socket.write(req)
resp = socket.read()
print resp

Natively async.

Simple Echo Server

from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall('Welcome to the echo server!\r\n')
    fileobj = socket.makefile()
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()

Easy sequential model. Fully async.
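To try it (assuming gevent is installed), run the script and connect with nc localhost 6000; StreamServer gives each connection its own greenlet, so many idle clients can wait concurrently without tying up OS threads.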

Async Services with Ginkgo

Ginkgo is a simple framework for composing async gevent services with common configuration, logging, daemonizing, etc.

https://github.com/progrium/ginkgo

Let's look at a simple example that implements a TCP and HTTP server...

# Import WSGI/TCP servers
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer

from ginkgo.core import Service

# HTTP handler
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print 'new http request!'
    return ["hello world"]

# TCP handler
def handle_tcp(socket, address):
    print 'new tcp connection!'
    while True:
        socket.send('hello\n')
        gevent.sleep(1)

# Service composition
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()
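Running the script starts both services in one process: curl http://127.0.0.1:8080/ returns "hello world", and nc 127.0.0.1 1234 prints "hello" once a second. Each request and connection is served by a cheap greenlet rather than a pooled worker.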

Using our async reactor-based approach, let's redesign our serving infrastructure.

[Diagram: a load balancer spreads incoming requests across a pool of async servers]

Step 1: define an authentication and authorization (AAA) layer that identifies the user and the resource being requested.

[Diagram: each async server fronted by an AAA layer]

Step 2: add a throttling layer and a concurrency manager.

[Diagram: on each async server, AAA feeds a throttling layer; a shared concurrency manager coordinates the throttlers]

Concurrency Admission Control

• Goal: limit concurrency by delaying or selectively failing requests

• Common metrics:
  - By Account
  - By Resource Type
  - By Availability of Dependent Resources

• What we've found useful: by (Account, Resource Type) (see the sketch below)
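A minimal sketch of (Account, Resource Type) admission control, assuming gevent; the cap and helper names are illustrative, not Twilio's implementation. A BoundedSemaphore per key makes excess requests wait cheaply in their greenlets instead of failing:

from gevent.lock import BoundedSemaphore

MAX_CONCURRENT = 10  # hypothetical per-(account, resource) cap
_semaphores = {}

def _sem(account, resource):
    # One semaphore per (account, resource type) key.
    key = (account, resource)
    if key not in _semaphores:
        _semaphores[key] = BoundedSemaphore(MAX_CONCURRENT)
    return _semaphores[key]

def admit(account, resource, handler):
    # Beyond the cap, the calling greenlet simply waits: the request
    # gets slower instead of failing.
    with _sem(account, resource):
        return handler()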

Delay: delay responses without failing requests.

[Chart: with delay, latency rises gracefully as load grows instead of requests failing]

Deny: deny requests based on resource usage.
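To make the two policies concrete, here is a hedged sketch (the thresholds and load signal are hypothetical): under moderate load the handler sleeps its greenlet to add backpressure; past a hard limit it denies outright.

import gevent

def current_load():
    # Hypothetical load signal in [0, 1]; a real service would measure
    # backend utilization or queue depth.
    return 0.5

def handle(request, process):
    load = current_load()
    if load > 0.95:
        # Deny: fail fast based on resource usage.
        return '503 Service Unavailable'
    if load > 0.75:
        # Delay: hold the cheap async connection open instead of failing;
        # only this greenlet sleeps.
        gevent.sleep((load - 0.75) * 4.0)  # up to ~0.8s of added latency
    return process(request)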

Step 3: allow backend resources to throttle requests.

[Diagram: dependent services behind the app servers apply their own throttling, coordinated through the concurrency manager]

Summary

Async frameworks like gevent allow you to easily decouple a request from access to constrained resources.

[Chart: request latency vs. time. Instead of service-wide failure, don't fail requests: decrease performance gracefully.]

CONTENTS CONFIDENTIAL & COPYRIGHT © TWILIO INC. 2012

Evan Cooke (@emcooke)

twilio
