erlang as a cloud citizen

69
Erlang as a cloud citizen Paolo Negri @hungryblank

Upload: wooga

Post on 29-Nov-2014

3.306 views

Category:

Technology


1 download

DESCRIPTION

This talk wants to sum up the experience of designing, deploying and maintaining an Erlang application targeting the cloud and precisely AWS as hosting infrastructure. As the application now serves a significantly large user base with a sustained throughput of thousands of games actions per second we're able to analyse retrospectively our engineering and architectural choices and see how Erlang fits in the cloud environment also comparing it to previous experiences of clouds deployments of other platforms. We'll discuss properties of Erlang as a language and OTP as a framework and how we used them to design a system that is a good cloud citizen. We'll also discuss topics that are still open for a solution.

TRANSCRIPT

Page 1: Erlang as a Cloud Citizen

Erlang as a cloud citizenPaolo Negri @hungryblank

Page 2: Erlang as a Cloud Citizen

• 10 millions players

• 2 billions game server requests (http)

• 20 devops people

1 day at

Page 3: Erlang as a Cloud Citizen

Cloud

“A cloud is made of billows upon billows upon billows that look like clouds.

As you come closer to a cloud you don't get something smooth, but irregularities at a smaller scale.”

Benoît B. Mandelbrot

http://www.flickr.com/photos/nirak/644336486

Page 4: Erlang as a Cloud Citizen

AWS Cloud

Page 5: Erlang as a Cloud Citizen

This talk will answer

• Why building a system targeting the cloud?

• How many EC2 instances do you need to respond to 0.25 billion uncacheable game reqs/day?

Page 6: Erlang as a Cloud Citizen

15 months ago...

http://www.flickr.com/photos/wheatfields/515068701

Page 7: Erlang as a Cloud Citizen

1st cloud hosted project, lessons learned

Pushing live the 60th application server?

not different from adding the 6th!

push a button1

Page 8: Erlang as a Cloud Citizen

1st cloud hosted project, lessons learned

local network/local disk are low performance general purpose tools

(nothing to do with ad hoc data center solutions)2

Page 9: Erlang as a Cloud Citizen

1st cloud hosted project, lessons learned

Complete automation is cool

Ease of adding hosts/automation can lead to bloated infrastructure3

Page 10: Erlang as a Cloud Citizen

1st cloud hosted project, points of pain

• A lot of inefficient app servers (as per tweets)

• Much effort to scale up/maintain databases (mySQL & Redis)

• Expensive, not crazy expensive, but expensive

Page 11: Erlang as a Cloud Citizen

Why trying again?

http://www.flickr.com/photos/kky/704056791/

Page 12: Erlang as a Cloud Citizen

Uncertainity

• will we reach100K or 3 millions users?

• 3 million users in 2 weeks or 12 months?

• cheat tool released 1h ago => single game call up 5000%

• weekly releases, new feature performance impact?

Page 13: Erlang as a Cloud Citizen

the cloud

• standard units (instances) of computing capacity

• a network connecting all instances

• an API to provision/dismiss instances

Page 14: Erlang as a Cloud Citizen

the cloud

Sounds like a good framework to compose computing capacity

Why didn’t work as a framework to compose throughput?

Page 15: Erlang as a Cloud Citizen

Scaling in the cloudthe recipe

CLOUD: composable units of computing capacity

+DEVELOPER: turn a unit of computing capacity in

a unit of throughput

=composable throughput, a plan for scaling BIG

Page 16: Erlang as a Cloud Citizen

turn a unit of computing capacity

in a unit of throughput

http://www.flickr.com/photos/pasukaru76

Page 17: Erlang as a Cloud Citizen

Unit of throughput,where?

Database

App server

Page 18: Erlang as a Cloud Citizen

Unit of throughput,joke?

Database

App serverApp server App server App server

Database

Cache

Page 19: Erlang as a Cloud Citizen

Unit of throughput?

Database

App server

No unit!

Page 20: Erlang as a Cloud Citizen

Unit of throughput?

Database

App server

Tightly coupled throughput?

Page 21: Erlang as a Cloud Citizen

Unit of throughput?

Database

App server

Monolithic throughput?

Page 22: Erlang as a Cloud Citizen

Monolithic throughput

Page 23: Erlang as a Cloud Citizen

Monolithic throughput

• likes monolithic infrastructure!

• scales well vertically

• wants screaming fast stack (network, disks...)

• any performance glitch impacts the whole system

Page 24: Erlang as a Cloud Citizen

Tightly coupled throughput+

loosely coupled hardware(like cloud)

=frustration

Page 25: Erlang as a Cloud Citizen

Who leads the tightly coupled dance?

Database

App server

Page 26: Erlang as a Cloud Citizen

Who leads the tightly coupled dance?

Database

App server

The stateless application server!

Page 27: Erlang as a Cloud Citizen

Stateless application servers guarantee

one thing...

which?

Page 28: Erlang as a Cloud Citizen

Data is neverwhere you need it

Page 29: Erlang as a Cloud Citizen

And another one...

Page 30: Erlang as a Cloud Citizen

If you can feed them data fast enough...

they’ll choke on garbage collection

Page 31: Erlang as a Cloud Citizen

We measure memcacheHIT / MISS

why app servers need to be 100% MISS?

Page 32: Erlang as a Cloud Citizen

Where’s the best knowledge about hot/

cold data?

Page 33: Erlang as a Cloud Citizen

Even the reverse makes more sense

Database

App server

1. pick your data up

2. go in the stateless app server

Page 34: Erlang as a Cloud Citizen

WhatWentWrong?

Page 35: Erlang as a Cloud Citizen

He can tell you!

• Rich Hickey

• Clojure author

“...If not in Erlang which I think has a complete story for how they do state”[1][1] Value Identity State @0.27

http://goo.gl/Zdjv0

http://www.flickr.com/photos/ghoseb/5120173586

Page 36: Erlang as a Cloud Citizen

Most languages and runtimes don’t have a safe solution for concurrent, long lived state

Erlang stands out as an exception in this panorama

Page 37: Erlang as a Cloud Citizen

Erlang...

Processes are the primary means to structure an Erlang

application.

wikipedia

Page 38: Erlang as a Cloud Citizen

Erlang + OTPGeneric Server Behaviour

A generic server process (gen_server) implemented

using this module...

otp documentation

Page 39: Erlang as a Cloud Citizen

Erlang + OTPGeneric Server Behaviour

handle_call(_Request, _From, State) ->    {reply, ignored, State}.

handle_cast(_Msg, State) ->    {noreply, State}.

handle_info(_Info, State) ->    {noreply, State}.

terminate(_Reason, _State) ->    ok.

code_change(_OldVsn, State, _Extra) ->    {ok, State}.

Page 40: Erlang as a Cloud Citizen

gen_server

• An erlang process

• With LOCAL state

• responding to requests from clients

A unit of throughput!

Page 41: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instancegen_server

+

Page 42: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instance

1 EC2 instance + 1 erlang VM=

N kilo gen_servers (N kilo units of throughput)

Page 43: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instance

1 EC2 instance + 1 erlang VM=

N kilo gen_servers (N kilo units of throughput)

Page 44: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instance

1 EC2 instance + 1 erlang VM=

N kilo gen_servers (N kilo units of throughput)

Page 45: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instance

1 EC2 instance + 1 erlang VM=

N kilo gen_servers (N kilo units of throughput)

Page 46: Erlang as a Cloud Citizen

Composing throughput with erlang

EC2 instance

1 EC2 instance + 1 erlang VM=

N kilo gen_servers (N kilo units of throughput)

Page 47: Erlang as a Cloud Citizen

Scale by adding units instances

Page 48: Erlang as a Cloud Citizen

Scale up adding units instances

Page 49: Erlang as a Cloud Citizen

Erlang distribution

Page 50: Erlang as a Cloud Citizen

Throughput complexity

• Losely coupled peers• Independent throughput

VS.

• Tightly coupled roles• Dependent throughput

Page 51: Erlang as a Cloud Citizen

Where does the state come from?

gen_serverDatabase

Start

Stop

Page 52: Erlang as a Cloud Citizen

Now with database

AWSS3

Page 53: Erlang as a Cloud Citizen

Database scalability

AWSS3

DB is (almost) never on latency critical path

No need for low latency DB

Page 54: Erlang as a Cloud Citizen

Database scalability

AWSS3

Throughput required is low

We can approximate S3 capacity as infinite

Page 55: Erlang as a Cloud Citizen

Database scalability

AWSS3

Ubiquitous and uniform from application servers point of view.

Page 56: Erlang as a Cloud Citizen

Remember?

AWSS3

Page 57: Erlang as a Cloud Citizen

How it actually works

Data from the S3 is uniformly

available to any ec2 instance

Page 58: Erlang as a Cloud Citizen

And as you zoom in...

Page 59: Erlang as a Cloud Citizen

And zoom in...

Page 60: Erlang as a Cloud Citizen

And zoom in you see...

EC2 instance

Page 61: Erlang as a Cloud Citizen

Always the samekind of structure

A fractal approach to throughput

Page 62: Erlang as a Cloud Citizen

Fractal

“A cauliflower shows how an object can be made of many parts, each of which is like a whole, but smaller.”

Benoît B. Mandelbrot

http://www.flickr.com/photos/paulobrabo/3588387063

Page 63: Erlang as a Cloud Citizen

Homework

The exact same solution might not work for you...

but look for that unit of throughput

Page 64: Erlang as a Cloud Citizen

You needXXXX

Smallish instancesto serve 0.25

billions uncacheable reqs/day

Answer time

Page 65: Erlang as a Cloud Citizen

You need0XXX

Smallish instancesto serve 0.25

billions uncacheable reqs/day

Answer time

Page 66: Erlang as a Cloud Citizen

You need00XX

Smallish instancesto serve 0.25

billions uncacheable reqs/day

Answer time

Page 67: Erlang as a Cloud Citizen

You need0012

Smallish instancesto serve 0.25

billions uncacheable reqs/day

Answer time

Page 68: Erlang as a Cloud Citizen

What’s1200 ?

Page 69: Erlang as a Cloud Citizen

Thanks

Paolo Negri @hungryblank

http://www.wooga.com/jobs