Scaling the Britain's Got Talent Buzzer
DESCRIPTION
How Live Talkback scaled the Britain's Got Talent buzzer to support 50,000 requests/second
TRANSCRIPT
Powering the Britain’s Got Talent buzzer*
*And Big Data
Big Data Meetup, London 25/5/2011
Thursday, 26 May 2011
What we do
Me
Malcolm Box, Co-founder & CTO
@malcolmbox
The Buzzer
BIG DATA
The challenge
10 million+ viewers
Design goal of 50,000 requests/s, 10,000 buzzes/second
Equivalent to ~130 billion requests/month if sustained 24/7
But only on Saturday nights
And four weeks to build
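The "130 billion requests/month" figure follows from running the 50,000 requests/s design peak around the clock. A quick check, assuming a 30-day month (the month length is an assumption; the slide does not state it):

```python
# Back-of-envelope check of the "130 billion requests/month" equivalence.
# Assumes the 50,000 requests/s peak is sustained 24/7 for a 30-day month;
# in reality the load arrived only on Saturday nights.
PEAK_REQUESTS_PER_SECOND = 50_000
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption


def monthly_equivalent(rate_per_second: int, days: int = DAYS_PER_MONTH) -> int:
    """Requests per month if `rate_per_second` ran continuously for `days` days."""
    return rate_per_second * SECONDS_PER_DAY * days


total = monthly_equivalent(PEAK_REQUESTS_PER_SECOND)
print(f"{total:,} requests/month")  # prints "129,600,000,000 requests/month"
```

129.6 billion rounds to the ~130 billion on the slide; the catch is that the whole load is compressed into a few hours a week.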
The challenge
Where do 130 billion requests/month fit?
Source: http://www.google.com/adplanner/static/top1000/#
Where we started...
Architecture diagram:
app.livetalkback.com: ELB → webservers (Django on Ubuntu) → MySQL
cdn.livetalkback.com: S3 + CloudFront
Control plane: Zabbix
Step 1: Testing
Started with a platform with a previous peak of 100 requests/s
No idea where it would break
Tsung! http://tsung.erlang-projects.org/
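With no idea where a 100 requests/s platform would break, a stepped ramp is a natural way to structure the load test. Tsung expresses load as a sequence of arrival phases; the sketch below is not Tsung configuration, just a hypothetical Python helper (`ramp_plan`, `breaking_point`, the 300 s phase length and the 1% error threshold are all assumptions) that plans doubling phases up to the design goal and picks out the first rate where errors appeared:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Phase:
    rate: int        # target requests per second during this phase
    duration_s: int  # how long to hold that rate before stepping up


def ramp_plan(start: int, target: int, duration_s: int = 300) -> List[Phase]:
    """Double the rate each phase from `start`, finishing exactly at `target`."""
    if start <= 0:
        raise ValueError("start rate must be positive")
    phases: List[Phase] = []
    rate = start
    while rate < target:
        phases.append(Phase(rate, duration_s))
        rate *= 2
    phases.append(Phase(target, duration_s))  # last phase hits the design goal
    return phases


def breaking_point(error_rates: Dict[int, float], max_error_rate: float = 0.01) -> Optional[int]:
    """First tested rate whose observed error rate exceeded the threshold, if any."""
    for rate in sorted(error_rates):
        if error_rates[rate] > max_error_rate:
            return rate
    return None


plan = ramp_plan(100, 50_000)
print([p.rate for p in plan])
# [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 50000]

# Made-up measurements, purely to show how the helper is used.
print(breaking_point({100: 0.0, 200: 0.0, 400: 0.0, 800: 0.12}))  # 800
```

Doubling gets from 100 to 50,000 requests/s in ten phases, so the first bottleneck shows up quickly; once it is found, the ramp can be repeated with finer steps around that rate.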
Step 2: ELB
Amazon Elastic Load Balancer
“Infinite capacity”
BUT very long impulse response (slow to scale up under a sudden spike) and NO controls :(
HAProxy to the rescue
5K requests/s per node
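At roughly 5K requests/s per HAProxy node, the load-balancer tier size follows directly from the 50,000 requests/s goal. A small sketch; the optional headroom parameter is an assumption added for illustration, not something the slide specifies:

```python
import math


def nodes_needed(peak_rps: int, per_node_rps: int, headroom: float = 0.0) -> int:
    """Smallest node count that carries `peak_rps`, plus optional spare capacity.

    headroom=0.25 means provisioning for 25% more than the expected peak.
    """
    return math.ceil(peak_rps * (1 + headroom) / per_node_rps)


print(nodes_needed(50_000, 5_000))        # 10
print(nodes_needed(50_000, 5_000, 0.25))  # 13
```

Rounding up matters: a fractional node is a whole extra machine, and running every node at its measured ceiling leaves no margin for a node failure.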
Step 3: Avoid the DB
MySQL was never going to handle 10,000 writes/s, nor 50,000 reads/s
“Hey, Django does memcached. Problem solved”
Help, our memcached server I/O is maxed out :(
Two-layer cache: https://gist.github.com/953524
Write-behind data
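The linked gist is not reproduced here. As a generic illustration of the two ideas on this slide, the sketch below puts a short-TTL per-process cache in front of a shared cache, so hot keys stop costing a network round trip on every request, and buffers buzz increments in memory so they are written behind in batches. All names (`DictBackend`, `TwoLayerCache`, `WriteBehindCounter`, the `act:42:buzzes` key) are hypothetical; `DictBackend` merely stands in for a memcached client, and the code is single-threaded:

```python
import time
from collections import Counter
from typing import Any, Callable, Dict, Optional, Tuple


class DictBackend:
    """Stand-in for a shared memcached client; only get() and set() are used."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.gets = 0  # counts round trips so the effect of the local layer is visible

    def get(self, key: str) -> Optional[Any]:
        self.gets += 1
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class TwoLayerCache:
    """Layer 1: per-process dict with a short TTL. Layer 2: the shared backend."""

    def __init__(self, backend: DictBackend, local_ttl_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend
        self.local_ttl_s = local_ttl_s
        self.clock = clock
        self.local: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]               # served from process memory, no network I/O
        value = self.backend.get(key)     # miss or expired: ask the shared cache
        if value is not None:
            self.local[key] = (now + self.local_ttl_s, value)
        return value


class WriteBehindCounter:
    """Buffer increments in memory and hand aggregated deltas to `sink` in one batch."""

    def __init__(self, sink: Callable[[Dict[str, int]], None]) -> None:
        self.pending: Counter = Counter()
        self.sink = sink

    def incr(self, key: str, n: int = 1) -> None:
        self.pending[key] += n            # no I/O on the request path

    def flush(self) -> None:
        """Call periodically (e.g. from a timer); one write per key, not per buzz."""
        if self.pending:
            self.sink(dict(self.pending))
            self.pending.clear()


# With a frozen clock, 1,000 reads cost a single round trip to the backend.
backend = DictBackend()
backend.set("act:42:buzzes", 0)
cache = TwoLayerCache(backend, local_ttl_s=1.0, clock=lambda: 0.0)
for _ in range(1000):
    cache.get("act:42:buzzes")
print(backend.gets)  # 1

# 10,000 buzzes become one batched write when flushed.
batches = []
counter = WriteBehindCounter(batches.append)
for _ in range(10_000):
    counter.incr("act:42:buzzes")
counter.flush()
print(batches)  # [{'act:42:buzzes': 10000}]
```

Both layers trade freshness and durability for throughput: local reads can be up to one TTL stale, and increments still in the buffer are lost if the process dies before `flush()`.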
But we want analytics!
Now 10K things to write to disk every second
Logging? Database?
This is starting to look like BIG DATA
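To see why 10K writes every second starts to look like big data, it helps to multiply it out. The 100-byte record size below is purely an assumption for illustration; the slides do not give one:

```python
BUZZES_PER_SECOND = 10_000   # design goal from the challenge slide
ASSUMED_RECORD_BYTES = 100   # hypothetical size of one buzz record


def records(seconds: int, rate: int = BUZZES_PER_SECOND) -> int:
    """Number of records produced in `seconds` at a constant `rate`."""
    return rate * seconds


per_hour = records(3600)
print(per_hour)                               # 36000000
print(per_hour * ASSUMED_RECORD_BYTES / 1e9)  # 3.6  (GB per hour of peak)
```

Tens of millions of rows per hour of peak is well beyond what ad-hoc logging or a single relational database comfortably absorbs at that write rate.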
Step 4: Baby
Step 5: Cassandra
Deployed Cassandra cluster on EC2 to handle buzz records
Tested to > 10K writes/s
All good!
“So how many users did we have last night?”
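The punchline is that a store tuned for 10K writes/s does not automatically answer ad-hoc questions. One common approach in Cassandra-style stores is to decide on the query up front and maintain its answer at write time, for example one row per hour bucket holding the user IDs seen. The sketch below is an illustration of that idea, not Live Talkback's schema: a Python dict of sets stands in for the per-hour rows, and `BuzzLog` and `hour_bucket` are hypothetical names. At larger scale an approximate distinct counter would replace the sets:

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Set


def hour_bucket(ts: datetime) -> str:
    """Row key for the UTC hour containing `ts`, e.g. '2011-05-21T19'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


class BuzzLog:
    def __init__(self) -> None:
        # bucket -> user IDs seen in that hour; stands in for one row per hour
        self.users_by_hour: Dict[str, Set[str]] = defaultdict(set)
        self.buzz_count = 0

    def record(self, user_id: str, ts: datetime) -> None:
        """Write path: count the buzz and note the user in its hour bucket."""
        self.buzz_count += 1
        self.users_by_hour[hour_bucket(ts)].add(user_id)

    def distinct_users(self, buckets: Iterable[str]) -> int:
        """Distinct users across the given hours, without scanning every buzz."""
        seen: Set[str] = set()
        for b in buckets:
            seen |= self.users_by_hour.get(b, set())
        return len(seen)


log = BuzzLog()
t = datetime(2011, 5, 21, 19, 30, tzinfo=timezone.utc)
log.record("alice", t)
log.record("bob", t)
log.record("alice", t.replace(hour=20))
print(log.distinct_users(["2011-05-21T19", "2011-05-21T20"]))  # 2
```

The cost is flexibility: this answers "how many users in these hours?" cheaply, but any question not planned for at write time still means a full scan.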
Where we ended...
Architecture diagram:
app.livetalkback.com: HAProxy → webservers (Django on Ubuntu) → Memcached, Cassandra, RDS master
cdn.livetalkback.com: S3 + CloudFront
Control plane: Chef, Zabbix
Node counts shown on the diagram: 10 nodes; 100+ nodes
Scaling up - and down
Configuring 100+ servers by hand each week would have been a pain
Used Chef to automate
Also builds the test swarm
http://wiki.opscode.com/display/chef/Home
Now what?
Still challenges with analytics & ad-hoc queries
Looking at Brisk and Hadoop
We’re sucking the Twitter firehose for Tellybug
MySQL is coping so far, but only just
Questions?
@malcolmbox