Scalable Service Architectures

Lessons learned

Zoltán Németh

Engineering Manager, Core Systems

Agenda

Our scalability experience

What is Scalability?

Requirements in detail

Tips and tools

Extras, Closing remarks

Our experience

Streaming stack

Defining scalability

Scalability is the ability to handle increased workload by repeatedly applying a cost-effective strategy for extending a system’s capacity.

(CMU paper, 2006)

How well a solution to some problem will work when the size of the problem increases. When the size decreases, the solution must fit.

(dictionary.com and Theo Schlossnagle, 2006)

Self-contained service

Explicitly declare and isolate dependencies

Isolation from the outside system

Static linking

Do not rely on system packages

Disposability

Maximize robustness with fast startup and graceful shutdown

Disposable processes

Graceful shutdown on SIGTERM

Handling sudden death: robust queue backend
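
Graceful shutdown on SIGTERM can be sketched as follows. This is a minimal illustration, not the stack described in the talk: the `Worker` class, `request_stop`, `run` and `install_handler` are hypothetical names, and the in-process `queue.Queue` only stands in for a robust queue backend.

```python
import queue
import signal


class Worker:
    """Hypothetical worker that stops pulling jobs once a stop is requested."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.stopping = False
        self.done = []

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only set a flag, the loop exits at a safe point.
        self.stopping = True

    def run(self):
        while not self.stopping:
            try:
                job = self.jobs.get_nowait()
            except queue.Empty:
                break
            self.done.append(job)  # stand-in for real work
        # Jobs not yet taken stay in the queue for another process.
        return self.done


def install_handler(worker):
    # Register the handler for SIGTERM (must be called from the main thread).
    signal.signal(signal.SIGTERM, worker.request_stop)
```

Because the handler only flips a flag, a job that is already running completes before the process exits, and anything still queued survives for the next worker.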

Startup and Shutdown

Automate all the things

Chef

Docker

Gold image based deployment

Immutable

Handling tasks before shutdown

Backing Services

Treat backing services as attached resources

No distinction between local and third party services

Easily swap out resources

Export services via port binding

Become the backing service for another app
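
One way to treat a backing service as an attached resource is to locate it through a URL in the environment, so swapping a local service for a third party one is a configuration change only. The sketch below assumes hypothetical variable names (`QUEUE_URL`, `PORT`) and helper names (`backing_service`, `listen_port`); none of them come from the talk.

```python
import os
from urllib.parse import urlparse


def backing_service(name, env=None):
    """Return (scheme, host, port) for the resource attached under `name`."""
    env = os.environ if env is None else env
    url = urlparse(env[name])
    return url.scheme, url.hostname, url.port


def listen_port(env=None, default=8080):
    # Port binding: the port this service exports itself on is also configuration.
    env = os.environ if env is None else env
    return int(env.get("PORT", default))
```

For example, `backing_service("QUEUE_URL", {"QUEUE_URL": "redis://cache.internal:6379/0"})` yields the scheme, host and port without the code knowing whether the resource is local or remote.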

Processes, concurrency

Stateless processes (not even sticky sessions)

Process types by work type

We <3 Linux processes

Shared-nothing: adding concurrency is safe

Process distribution spanning machines

Statelessness

Store everything in a datastore

Aggregate data

Chandra

Scalable datastores

Redis

Cassandra

Aerospike

Handling user sessions
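
Keeping user sessions in a datastore rather than in process memory might look like the sketch below. `SessionStore` and `handle_request` are hypothetical names, and the dict-like `backend` is only a stand-in for a scalable datastore such as the ones listed above.

```python
import json
import uuid


class SessionStore:
    """Hypothetical session store; `backend` is any dict-like key-value store."""

    def __init__(self, backend):
        self.backend = backend

    def create(self, data):
        session_id = uuid.uuid4().hex
        self.backend[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else None


def handle_request(store, session_id):
    # Any process can serve any request: state comes from the store, not memory.
    session = store.load(session_id)
    return "anonymous" if session is None else session["user"]
```

Since no process holds session state of its own, requests do not need sticky routing and processes can be added or removed freely.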

Monitoring

Application state and metrics

Dashboards

Alerting

Health

Remove failing nodes

Capacity

Act on trends

Monitoring

Metrics collection

Graphite, New Relic

Self-aware checks

Cluster state

Zookeeper, Consul

Scaling decision types

Capacity amount

Graph derivative

App requests
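
A scaling decision that combines the capacity amount with the graph derivative could be sketched like this. The thresholds, the `lookahead` parameter and the function names are illustrative assumptions, not values from the talk.

```python
def slope(samples):
    """Average change per step over evenly spaced load samples."""
    if len(samples) < 2:
        return 0.0
    return (samples[-1] - samples[0]) / (len(samples) - 1)


def scaling_decision(samples, capacity, high=0.8, low=0.3, lookahead=5):
    """Return 'up', 'down' or 'hold' from the current load and its trend."""
    current = samples[-1]
    # Project the load a few steps ahead using the recent trend.
    projected = current + slope(samples) * lookahead
    if current > high * capacity or projected > high * capacity:
        return "up"
    if current < low * capacity and projected < low * capacity:
        return "down"
    return "hold"
```

Scaling up triggers on either the current or the projected load, while scaling down requires both to be low, which biases the decision toward keeping capacity when the trend is unclear.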

Load Balancing and Resource Allocation

Load balancing: distribute tasks

Utilize machines efficiently

VM compatible apps

Flexibility

Adapting to available resources

Load Balance

DNS or API

App level balance

Uniform entry point or proxy

Balance decisions

Load

Zookeeper state

Resource policies
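
A balance decision that takes load, cluster state and a resource policy into account might be sketched as below. The `Node` fields and the `max_load` policy are hypothetical; in practice the health flag would come from the cluster state (for example Zookeeper) rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float      # fraction of capacity in use, 0.0 to 1.0
    healthy: bool    # as reported by the cluster state


def pick_node(nodes, max_load=0.9):
    """Return the least loaded healthy node under the policy limit, or None."""
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.load)
```

Returning `None` when no node qualifies lets the caller decide whether to reject the request or ask for more capacity.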

Service Separation

Failure is inevitable

Protect from failing components

Cascading failure

Fail fast

Decoupling

Asynchronous operations

Message queues
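
Decoupling through a message queue combined with failing fast can be illustrated with a bounded queue. This is a sketch using Python's in-process `queue.Queue` as a stand-in for a real message broker; `submit` and `drain` are hypothetical helper names.

```python
import queue


def submit(tasks, task):
    """Enqueue without blocking; return False when the queue is full."""
    try:
        tasks.put_nowait(task)
        return True
    except queue.Full:
        # Fail fast: reject now instead of letting callers pile up.
        return False


def drain(tasks, handler):
    """Consumer side: process whatever is queued, independent of producers."""
    results = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return results
        results.append(handler(task))
```

The producer never waits on the consumer, so a slow or failed consumer shows up as rejected submissions rather than as blocked callers upstream.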

Service Separation

Rate limiting

Circuit Breaker pattern

Stop cascading failure, allow recovery

Hystrix

Fail fast, fail silent

Service decoupling
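
The Circuit Breaker pattern can be sketched in a few lines. This is a simplified illustration of the pattern, not Hystrix's API: the class, its parameters (`max_failures`, `reset_after`, `clock`) and `CircuitOpenError` are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a struggling dependency, which stops the failure from cascading and gives the dependency time to recover.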

Extras

Debugging features

Logs

Clojure / JS consoles

Runtime configuration via env

Scaling API

Integrating several cloud providers

Automatic start / stop

Reading

Scalable Internet Architectures by Theo Schlossnagle

The 12-factor App: http://12factor.net/

Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf

Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html

Release It! by Michael T. Nygard

Questions
