Making clouds go faster, for fun and profit!
This slide intentionally left blank.
Wednesday, 17 October 2012
MAKING CLOUDS GO FASTER FOR FUN AND PROFIT
Speakers
Who crafted this talk?
Alex Howells (@nixgeek)
Technical Operations, LivingSocial
[email protected]
github.com/agh
Paul Thomas (@ftergl0w)
Technical Operations, LivingSocial
[email protected]
github.com/AfterGlow
Bedtime Reading
You can get a copy of these slides after the talk:
https://speakerdeck.com/u/nixgeek
Problem?
Performance
It doesn’t need to be rocket science.
It does matter though!
I promise I’m not trolling you.
“Oh man, that was too fast! It’s so much better now it’s slow!!”
-- Average User
In a parallel universe...
YEAH RIGHT
I wish I had users who were that easy to please!
But since we live in the real world...
“Why is that dude smiling?! This is too slow!
Why can’t it be faster?”
-- Average Users
In our universe...
THINGS ARE IMPROVING
Cactus => Diablo => Essex => Folsom
But things can improve faster with focus!
Today
Mostly reliable, but can be a bit slow!
The Future?
Faster. More scalable. A real driving experience.
Why should I listen to you?
What’s the big deal?
WE’RE A LOT LIKE YOU!
Developers. Operators. Engineers. Users.
We see potential. We see opportunities.
Airspace
LivingSocial PaaS
We care about speed because ...
* Scaling services up/down needs to happen fast!
* Needing to maintain huge pools of “slack capacity” to account for sudden spikes in traffic sucks.
* Upgrading applications should be fast.
What does fast mean to us? One example?
New instances online in under 10 seconds.
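One way to keep score against a target like this is to time how long a new instance takes to become usable. A minimal sketch, assuming a hypothetical `is_ready()` probe that you supply (for example "does the instance report ACTIVE and accept SSH?"); neither function is part of Airspace or the OpenStack APIs:

```python
import time
from typing import Callable


def time_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 60.0,
                     poll_interval_s: float = 0.5) -> float:
    """Poll is_ready() until it returns True and return the elapsed seconds.

    is_ready is a hypothetical probe supplied by the caller.
    Raises TimeoutError if the probe never succeeds within timeout_s.
    """
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"instance not ready after {timeout_s:.1f}s")
        time.sleep(poll_interval_s)


def meets_target(elapsed_s: float, target_s: float = 10.0) -> bool:
    """True if the instance came online within the target (10 s on the slide)."""
    return elapsed_s <= target_s
```

The poll interval bounds the measurement error, so a finer interval gives a more honest number at the cost of more probe calls.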
Performance Matters
What could your business do if instances came online in under 5 seconds vs. 50 seconds?
> Makes integration tests leveraging the Cloud complete much faster.
> Seasonal spikes? React to them faster: happier customers spend more money.
> Engineers who don’t grumble that “getting servers is a pain in the ass”.
> Deploy new applications and services more quickly and easily.
Along with many other things ...
What do we do?
Think Positive
Because solutions are better than problems!
Two-Pronged Approach
Hardware & Software: “A Love Story”
Warning!
Picking the right hardware is quite hard. It’s often specific to your users’ needs.
What works for us may not rock your world.
Hardware
Our Servers
Supermicro 1027R-WRFT+
2x Intel Xeon E5-2670 (8C/16T, 2.60GHz)
16 x 8GB 1600MHz ECC memory
LSI 9266-8i (1-LD RAID-10)
8 x Intel 520-series 240GB SSD
Dual-port Intel X540 10GBASE-T
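The spec list works out to the per-node totals below. A small worked calculation using only the figures on this slide; the RAID-10 number is mirrored capacity in vendor gigabytes, before filesystem overhead:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    dimms: int
    dimm_gb: int
    ssds: int
    ssd_gb: int

    @property
    def threads(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def ram_gb(self) -> int:
        return self.dimms * self.dimm_gb

    @property
    def raid10_usable_gb(self) -> int:
        # RAID-10 mirrors every drive, so usable space is half the raw total.
        return self.ssds * self.ssd_gb // 2

    @property
    def ram_gb_per_thread(self) -> float:
        return self.ram_gb / self.threads


# Figures from the "Our Servers" slide.
node = ServerSpec(sockets=2, cores_per_socket=8, threads_per_core=2,
                  dimms=16, dimm_gb=8, ssds=8, ssd_gb=240)
print(node.threads, node.ram_gb, node.raid10_usable_gb, node.ram_gb_per_thread)
# -> 32 128 960 4.0
```

So each node offers 32 hardware threads, 128GB of RAM (4GB per thread) and roughly 960GB of mirrored SSD.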
Benefits
* ‘Just right’ balance of CPU/RAM for us.
* Exceptional ephemeral I/O performance
  > Not using eMLC (a trade-off?)
  > We can think about SQL on IaaS
* A surplus of network bandwidth
Servers are not a bottleneck!
Our Network
Top of Rack: Arista Networks 7050T
48-port 10GBASE-T switch + 4-port 40GbE (uplinks)
Zone Spine: Arista Networks 7050Q
16-port 40GbE switch
Benefits
* A network which runs Linux!
* Ability to automate it via ZTP and Chef
* Non-blocking communication within a rack
* Provision 160Gbps to the spine via four cables
* Under 2:1 contention for comms in/out of the rack
* Less need to think about QoS!
Network is not a bottleneck!
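Contention is server-facing bandwidth divided by uplink bandwidth, and four 40GbE uplinks give 4 x 40 = 160Gbps. A quick sketch of that arithmetic: with all 48 10GBASE-T ports busy the ratio would be 3:1, so staying under 2:1 implies fewer than 32 active 10G ports per rack. The 30-port case below is a hypothetical example, not a figure from the talk:

```python
def contention_ratio(server_ports: int, port_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by uplink bandwidth (the N in N:1)."""
    uplink_capacity = uplinks * uplink_gbps
    if uplink_capacity <= 0:
        raise ValueError("uplink capacity must be positive")
    return (server_ports * port_gbps) / uplink_capacity


# Four 40GbE uplinks from a 48-port 10GBASE-T top-of-rack switch.
print(contention_ratio(48, 10, 4, 40))  # every port in use -> 3.0
print(contention_ratio(30, 10, 4, 40))  # hypothetical 30 active ports -> 1.875
```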
Software
Production
Ubuntu 12.04 LTS (‘Precise Pangolin’)
Hypervisor: KVM
Cloudscaling OCS 1.3 (based on OpenStack Essex)
Moving to OCS 2.0 in the near future (that one is based on OpenStack Folsom)
Ubuntu 12.04 LTS (‘Precise Pangolin’)
Hypervisor: KVM
Useful for development and testing (we’re running OpenStack Folsom now)
Most of the data shown later was gathered with help from DevStack, running on hardware similar to our production environment.
WHAT NOW?
We’ve picked the hardware stack. It’s awesome.
We’ve got our software installed. It’s looking great.
Support calls are imprecise. We need data!
Monitoring
Old School
* Is my service (API) responding on TCP/8774?
* Am I able to make a GET and fetch instance info?
* Is my server running all the processes it should?
* Are there any errors on my network ports?
If any of this looks broken, send me alerts saying so!
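The first “old school” check can be sketched as a plain TCP connect test. A minimal example, assuming all you want to know is whether something accepts connections on TCP/8774 (the Nova API port named on the slide); it does not issue an API request, and the `run_checks` helper is hypothetical:

```python
import socket
from typing import Callable, Dict, List


def port_is_open(host: str, port: int = 8774, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    This only proves that something is listening; it does not exercise
    the API itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        # (socket.timeout is a subclass of OSError).
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named check and return the names of those that failed."""
    return [name for name, check in checks.items() if not check()]


# Hypothetical usage; "nova-api.example" is a placeholder hostname.
# failed = run_checks({"nova-api": lambda: port_is_open("nova-api.example")})
# If failed is non-empty, send an alert listing the failed checks.
```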
New Thinking
* “How long did my website take to show?”
* Individual performance of each click or API call
* Inspection of latency within the application
If lots of users’ interactions are slow, then I want you to alert me.
If it’s just an outlier, log it and shut up.
“End-User Experience Monitoring”
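The “alert on many, log the outliers” rule can be expressed as a small classifier over one window of request latencies. A sketch under stated assumptions: the 5-second slow threshold and the 5% alert fraction are arbitrary example values, not settings from the talk:

```python
from typing import List, Literal

Action = Literal["alert", "log", "ok"]


def classify(latencies_ms: List[float],
             slow_ms: float = 5000.0,
             alert_fraction: float = 0.05) -> Action:
    """Decide what to do about one window of request latencies.

    "alert": more than alert_fraction of requests were slower than slow_ms.
    "log":   some requests were slow, but only as isolated outliers.
    "ok":    nothing was slow.
    """
    if not latencies_ms:
        return "ok"
    slow = sum(1 for ms in latencies_ms if ms > slow_ms)
    if slow == 0:
        return "ok"
    if slow / len(latencies_ms) > alert_fraction:
        return "alert"
    return "log"
```

One slow request out of a hundred is logged; ten out of a hundred pages someone.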
DEMO TIME!
Because pretty pictures are awesome.
We’ll call the slowest transactions our “Disaster Porn”.
Boundary
“AppViz”
* Port-to-port throughput/latency
* How much SQL traffic are you doing?
Updates in real time.
Look backwards in time.
Powered by IPFIX (RFC 5101)
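IPFIX messages start with a fixed 16-octet header: version (always 10), message length, export time, sequence number and observation domain ID, all in network byte order. A minimal sketch that parses just that header; it does not handle templates or data sets, and it makes no claim about how Boundary itself consumes IPFIX:

```python
import struct
from typing import NamedTuple

IPFIX_VERSION = 10                 # 0x000a, per RFC 5101 section 3.1
HEADER = struct.Struct("!HHIII")   # 16 octets, network byte order


class IpfixHeader(NamedTuple):
    version: int
    length: int                    # total message length in octets
    export_time: int               # seconds since the Unix epoch
    sequence_number: int
    observation_domain_id: int


def parse_ipfix_header(data: bytes) -> IpfixHeader:
    """Parse and sanity-check the message header at the start of data."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for an IPFIX message header")
    header = IpfixHeader(*HEADER.unpack_from(data))
    if header.version != IPFIX_VERSION:
        raise ValueError(f"not IPFIX: version {header.version}")
    if header.length < HEADER.size or header.length > len(data):
        raise ValueError("length field inconsistent with buffer")
    return header
```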
Tracelytics
Lots more cool stuff to help...
We’ll blitz through a few more things next...
Latency Trends
* Over the last 60 minutes
* Over the last 24 hours
* Over the last 7 days
Top Tip: This is bad news.
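The three trend views amount to the same latency percentile computed over different time windows. A sketch, assuming samples arrive as (Unix timestamp, milliseconds) pairs and using the nearest-rank p95; Tracelytics’ own aggregation may well differ:

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Window lengths in seconds for the three views listed on the slide.
WINDOWS_S = {"60m": 60 * 60, "24h": 24 * 60 * 60, "7d": 7 * 24 * 60 * 60}


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence (0 < pct <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_trends(samples: Iterable[Tuple[float, float]], now: float,
                   pct: float = 95.0) -> Dict[str, Optional[float]]:
    """Percentile latency per window; None where a window has no samples.

    samples are (unix_timestamp, latency_ms) pairs.
    """
    points = list(samples)
    trends: Dict[str, Optional[float]] = {}
    for name, span_s in WINDOWS_S.items():
        window = [ms for ts, ms in points if now - span_s <= ts <= now]
        trends[name] = percentile(window, pct) if window else None
    return trends
```

A p95 that climbs from the 7-day view to the 60-minute view is the kind of trend worth a closer look.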
Tracelytics Patches
If you want to try out OpenStack APM:
https://github.com/Afterglow/tracelytics-openstack
Any questions? Just open an issue!
Glance
Keystone
Nova
“Call to Arms”
Reminder about those patches:
https://github.com/Afterglow/tracelytics-openstack
> Performance regression tests as an OpenStack CI gate?
> More people talking about “How I fixed those >5 second outliers!”
> Better ‘shared knowledge’ about what settings to tweak for added oomph
> Architectural analysis asking about “big picture” (big impact) changes
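A performance regression gate could start as simply as comparing a candidate build’s p95 latency against a baseline run. A hypothetical sketch; the 10% tolerance and the nearest-rank p95 are assumptions, and this is not an existing OpenStack CI job:

```python
import math
from typing import Sequence


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]


def regression_gate(baseline_ms: Sequence[float],
                    candidate_ms: Sequence[float],
                    tolerance: float = 0.10) -> bool:
    """Pass (True) unless the candidate's p95 latency exceeds the
    baseline's p95 by more than `tolerance` (10% by default)."""
    return p95(candidate_ms) <= p95(baseline_ms) * (1 + tolerance)
```

A CI job could run a benchmark against both builds, call this, and exit non-zero when it returns False; that exit status is what would make it a gate.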
Credits
Because these folks are awesome
N.B. Not intended as an exhaustive list of all the awesome people in the world/room!
http://www.livingsocial.com
http://www.cloudscaling.com
http://www.aristanetworks.com
http://www.tracelytics.com
We’re done talking, thanks for listening!
Any questions?
Interested?
E-mail Ken -
Or just find me!
Reminder that these slides are over at:
https://speakerdeck.com/u/nixgeek