scaling erlang cluster to 10,000 nodes · 2019-08-14 · scaling erlang cluster to 10,000 nodes....

33
Maxim Fedorov Software Engineer @ WhatsApp Scaling Erlang cluster to 10,000 nodes

Upload: others

Post on 19-Mar-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Maxim FedorovSoftware Engineer @ WhatsApp!

Scaling Erlang cluster to 10,000 nodes

Page 2: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

WhatsApp User Base GrowthFrom 200m to 1.5B and growing fast !

0

200

400

600

800

1000

1200

1400

1600

2013 2014 2015 2016 2017 2018

Page 3: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

WhatsApp Features DevelopmentNot just a simple messenger !

•  New platforms support!•  Voice and video calls!•  End-to-end encryption!•  WhatsApp Business!•  Live location !•  … and a few more!

Page 4: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Paradigm ShiftFew powerful servers to many tightly packed blades !

•  Dual Socket Xeon E5-2690 2.6-3.5 GHz!128 - 512 G RAM!

•  Xeon-D 1540 2.0 – 2.6 GHz 32 G RAM!•  Dual Skylake-X, 256 G RAM!

Page 5: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Foundation Replacement

WhatsApp!

Erlang R16

FreeBSD

IBM (SoftLayer)

WhatsApp!

Erlang R21

Linux

Facebook

Page 6: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

WhatsApp Cluster SizeFrom just a few to over 10,000 !

2013 2014 2015 2016 2017 2018

Page 7: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Erlang Cluster!

Page 8: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Erlang Cluster: Fully Connected Mesh

node 1

node 2

node 3node 4

node 5

•  1500 nodes in a single distribution cluster!

•  TCP connection maintenance overhead is hardly noticeable after a few tweaks!

•  Fast startup!

Discovery is a Problem!

Page 9: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Distributed Process RegistryA good problem to solve!

Coordinated•  global•  pg2•  gproc•  s_groups

Eventual•  Riak PG•  cpg•  Syn•  Swarm•  Lasp PG

Page 10: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Coordinated Approach (global)

node 1

node 2

node 3node 4

node 5

Registration

•  Lock boss node•  Lock the rest•  Register process•  Unlock all except boss•  Unlock boss node

Page 11: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Distributed Hash Tables

node 1

node 2

node 3node 4

node 5

Consistent Hashing CRDT

node 1

node 2

node 3

Kademlia

1 2 4 5 6

Page 12: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Keep It SimpleMore than one process registry!

•  Centralised store for high rate registrations -!session manager!

•  Globally replicated state for rare changes -!pg2!

Page 13: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Session ManagerCentral storage of phone-to-node mapping!

chat

chat 1

chat 2

chat 3

chat 4

chat 5

chat 6

chat 7

chat 8

chat 9

chat 10

session

session 1

session 2

session 3

session 4

Page 14: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

pg2 for Service Discovery

chat

chat 1

<...>

chat 8

session

session 1�{session, 1}�{session, 2}

session 2�{session, 3}�{session, 4}

Page 15: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

WhatsApp Meta-cluster!

Page 16: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Meta-clustering

chat

offline session contacts

notifications groups

Page 17: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Limits Are Still Therepg2 scaling is limited, but limits can be pushed further away!

•  Denormalise pg2_table for fast access to local and remote group members!

•  Apply ‘boss node’ algorithm to pg2!•  Add monitoring for local processes!•  … Introduce hidden (non-transitive) pg2 membership!•  Pushed from 32 partitions to hundreds!

Page 18: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

wandist: Extending Erlang DistributionConnecting disjoint Erlang clusters!

•  SSL support!•  SOCKS proxy support!•  Delivery confirmation!•  Standby connections!•  Maintain non-transitive pg2 lists!•  Compatibility (R16 <-> R21)!

chat�{session, 1}�{account, 1}�{group, 1}

session�{session, 1}

account�{account, 1}

group�{group, 1}

Page 19: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Challenges•  I/O scaling – going from kqueue to epoll!

Upgrade to Erlang R21!•  Routing performance – pg2 concurrent updates!

Reduce contention!•  Long-range communications – increased latency!

Test with injected latency!Absorb latency with increased concurrency!

•  SSL performance – handshake bottleneck!Reduce contention!

Page 20: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Bits & Bolts!

Page 21: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Diagnostic Tools•  Built-in inspection: process_info, statistics, system_info!•  MSACC – microstate accounting (with extra acc on)!•  Lock-counting BEAM!•  gdb (with etp-commands)!•  BPF/BCC!•  fprof, valgrind!•  Erlang OTP source code!!

Page 22: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Microstate Accounting

When things go wrong

Page 23: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Lock Counting

Page 24: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Lock Counting & Source Code

Meaningful lock name

Page 25: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Lock Counting

Name is already there!

Page 26: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

BCC (BPF Compiler Collection)

Trace large reads

Page 27: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

gdb + etp-commandsNot necessarily post-mortem

Page 28: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Erlang OTP is getting betterSome original WhatsApp patches are no longer in use!

•  GC Throttling -> Off-Heap message queue !•  prim_file patches -> built-in NIF-based file I/O !•  TLS 1.2 support (cipher suite selection)!•  HW-accelerated crypto!•  public_key: PKCS8 support, certificate verification, SNI!•  Hashing clashes (ETS to mnesia)!•  Bugfixes!!

Page 29: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

But Not There YetRecently added and reworked patches!

•  SSL/TLS handshake acceleration, PEM cache validation!•  inet_db: race condition during .hosts file reload !•  prim_inet: race/suboptimal accept() behaviour!•  prepend send (stuck worker detection, TTL)!•  flush process message queue!•  process message/signal queue stats!

Page 30: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

But Not There YetRecently added and reworked patches!

•  system monitoring (signal queues, rpc tracing)!•  httpc_client TLS upgrade timeout!•  wider lock tables (check IO, ETS meta)!•  worker pools, dispatcher pools!•  convenience patches (noisy logging suppression, default

eunit timeouts, listen backlog queues sizes, pretty printing, shell history, supervisor ‘ETS-TRANSFER’)!

Page 31: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Embrace Open Source community

•  Upstream our patches!•  Be Open!!

Page 32: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Questions?

Maxim Fedorovdane@ whatsapp.comGitHub: max-au

Page 33: Scaling Erlang cluster to 10,000 nodes · 2019-08-14 · Scaling Erlang cluster to 10,000 nodes. WhatsApp User Base Growth From 200m to 1.5B and growing fast ! 0 200 400 600 800 1000

Paradigm Shift

•  Per-node monitoring -> cluster health!•  Years of uptime -> simple & reliable restart!•  Trigger-based alerts -> level-based!•  Local configuration -> global!•  Aggregates and centralised logging!