Austin Cloud Users Group - August 23rd, 2011


CopperEgg: Austin CUG - August 23rd, 2011

(presented by Eric Anderson, anderson@copperegg.com)

Wednesday, August 24, 11

About Us

CopperEgg
• Founded spring 2010
• Super real-time monitoring and analytics

About me (Eric Anderson)
• SysAdmin - Centaur - 1999-2007
  • 1400 compute nodes, ~50-100 file servers, ~200 misc systems, hundreds of TBs
• Software Engineer - StorSpeed - 2007-2010
  • Built distributed file system cache for NAS acceleration product
• Co-Founder/COO - CopperEgg - 2010-Present


Why Cloud? Important Differences:

• Installs in seconds - copy/paste install
• No configuration required - anyone can do it

All reliable and business-worthy systems need something like this:

Physical
• Physical security
• Redundant power
• Redundant AC
• Redundant & fast network
• Peak hardware
• Spare equipment
• Physical space (storage of spare stuff too)
• People to manage physical infrastructure
• Hardware repairs

Cloud
• Redundant infrastructure
  • Multi-AZ, regions, storage, etc.
• Resilient applications
  • Designed for failure
• Performance measurement
• Automatic failover/recovery
• Security of your infrastructure
• Monitoring - up/down/status
• Visibility into system as a whole
• Don’t rely on cloud vendor!
  • Delayed, inaccurate


Why Cloud? (for CopperEgg)

Why did we go cloud?
• Needed to get building fast
• We didn’t know what we needed
• Just-in-time scaling
• Keep costs low and still provide awesome service levels
• Easy deployment for developers
  • Test different scenarios, try new setups, etc.
• We use it for everything!
  • Code repositories, tickets, email, phone, alerting, etc.


What we were building

Storage analytics product
• Visualize network attached storage in real time
• Massive amounts of data
  • Analyzing 10 billion ops/day in beta, in real time
• Super real-time (seconds vs. minutes)

Requirements:
• Highly available
• Super responsive
• Gobble large amounts of analytics data in real time
• Historical data for 2 years
• Great UI
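To put the beta figure in perspective, 10 billion operations per day works out to an average sustained rate of roughly 116,000 operations per second. The quick check below assumes the load is spread evenly over 24 hours. Real traffic is burstier, so peak rates would be higher than this average.

```python
# Average ingest rate implied by "10 billion ops/day".
# Assumption: load spread evenly over 24 hours; real peaks will be higher.
OPS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

ops_per_second = OPS_PER_DAY / SECONDS_PER_DAY
print(f"{ops_per_second:,.0f} ops/sec on average")  # 115,741 ops/sec on average
```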


Where we started: SimpleDB

Bad:
• Outgrew it before we outgrew it
• Slow!

So then what?


Amazon RDS to save the day!

Good:
• Faster than SimpleDB
• Could scale the storage

Bad:
• Realized it still would not handle our dataset
  • Inserts were too slow

So then what?

[Path so far: SimpleDB, RDS]


MySQL on EC2 to save the day!

Good:
• Faster than RDS
• Increased insert performance
  • Using some cheats to get the insert rate up

Bad:
• Insert performance still not good enough...

So then what?

[Path so far: SimpleDB, RDS, MySQL on EC2]


MySQL on Rackspace Cloud

Good:
• Faster than Amazon (CPU)
• Seemed cheaper

Bad:
• No easy way to scale across different zones or regions
• No way to expand storage per instance (whole instance only - costly!)
• Then we got the bill: they charge for data transfer between instances - OUCH

So then what?

[Path so far: SimpleDB, RDS, MySQL on EC2, MySQL on Rackspace Cloud]


Back to Amazon!

Why did we move back?
• Lots of great services: S3, EC2, EBS, Route 53, ELB (we use all of these)
• Even more: SQS, SES, etc.
• Multiple regions and availability zones
• Scale-as-you-need: storage, memory, CPU, redundancy
• Documentation

We’re still happy with this (9 months and running).

[Path so far: SimpleDB, RDS, MySQL on EC2, MySQL on Rackspace Cloud, and now EC2 + EBS + MongoDB]


What’s this NoSQL thing?

Realized maybe MySQL was not the best choice
• How about a NoSQL database?
• So we tested and measured every one we thought was worth looking at:
  • Redis
  • Tokyo Tyrant, Kyoto Cabinet
  • Cassandra
  • MongoDB
  • etc., etc., etc. (there are a lot)
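The slides don't say how the candidates were measured. Insert speed was the recurring sticking point, so the sketch below is offered purely as a hypothetical illustration. It shows a minimal insert-throughput harness that times a batch of writes through a pluggable `insert` function. The names `measure_insert_rate` and `dict_insert` and the sample record shape are invented for this example. The in-memory dict stands in for a real database client.

```python
import time
from typing import Callable, Iterable


def measure_insert_rate(insert: Callable[[str, dict], None],
                        records: Iterable[tuple[str, dict]]) -> tuple[int, float]:
    """Insert every (key, document) pair and return (count, inserts per second)."""
    count = 0
    start = time.perf_counter()
    for key, doc in records:
        insert(key, doc)
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


# Hypothetical stand-in backend: a plain dict. A real comparison would wrap
# each candidate store's own client call in a function with this signature.
store: dict[str, dict] = {}


def dict_insert(key: str, doc: dict) -> None:
    store[key] = doc


sample = ((f"op:{i}", {"ts": i, "latency_ms": i % 50}) for i in range(100_000))
count, rate = measure_insert_rate(dict_insert, sample)
print(f"{count} inserts at {rate:,.0f}/sec")
```

Keeping the backend behind a single function signature makes it possible to run the same workload against each store. A fair comparison would also need the same record shapes, batch sizes, and durability settings everywhere.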


MongoDB won

MongoDB won the award - why?
• Redundant
• Scalable
• Persistent data store
• Handles large amounts of data
• Awesome user community
• Vendor support
• Open source
• Lots of momentum


Where are we now?

Needed a way to monitor our site:
• Requirements:
  • Know right away when problems occur
  • See into the performance of the system
  • See historical trends as we grow the business
  • Super real-time product needs super real-time monitoring
• Not satisfied with existing solutions:
  • Slow updates (1m or 5m is way too slow - not real-time)
  • Not ‘cloud friendly’
  • Pain to maintain
  • Some are pricey


Not real-time?

Then what *is* real-time?
• The smallest amount of time you can comfortably have poor service before someone notices and changes their behavior.
• Examples:
  • A web site can only be slow/unavailable for a few seconds before people leave
  • Email can be slow for tens of seconds before people get grumpy (or less, depending on the people!)
  • Twitter - well, we’ll leave that one for you to decide

So, if seconds are the yardstick for measuring poor performance, why do we monitor every 1 or 5 minutes?


CPU Usage: 5min sampling

[Chart: CPU usage (%), 5:00 PM to 5:05 PM]

Here’s what a 5-minute sample provides:
• Doesn’t look like much is happening
• Users should not be complaining, right?


CPU Usage: 1min sampling

Same data - 1-minute sample:
• Looks like there was some kind of CPU activity at 5:01pm - 5:02pm
• Still no issue though, right?

[Chart: CPU usage (%), 5:00 PM to 5:05 PM, 1-minute intervals]


CPU Usage: 5 second sampling

Same data - 5s sampling:
• Becomes clear there was something happening:
  • between 5:01:10pm - 5:01:25pm

[Chart: CPU usage (%), 5:00 PM to 5:05 PM, 5-second intervals]
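The three CPU charts differ because of averaging. Coarse monitoring reports the mean over each interval, and a short spike gets diluted by the quiet samples around it. The sketch below uses synthetic numbers, not the data behind the charts. It assumes a hypothetical 5% idle baseline and a 100% spike from 5:01:10 to 5:01:25, then shows how that spike reads at each resolution. The names `BASELINE`, `SPIKE`, and `downsample` are illustrative only.

```python
from statistics import mean

BASELINE = 5.0   # hypothetical idle CPU %, not taken from the slides
SPIKE = 100.0    # hypothetical CPU % during the event

# 60 samples at 5 s intervals cover 5:00:00-5:04:55.
# Indices 14-16 are 5:01:10, 5:01:15 and 5:01:20, i.e. the 5:01:10-5:01:25 window.
samples = [SPIKE if 14 <= i <= 16 else BASELINE for i in range(60)]


def downsample(values, factor):
    """Average consecutive groups of `factor` samples, as coarse monitoring does."""
    return [mean(values[i:i + factor]) for i in range(0, len(values), factor)]


print("5 s peak:  ", max(samples))             # 100.0
print("1 min avgs:", downsample(samples, 12))  # [5.0, 28.75, 5.0, 5.0, 5.0]
print("5 min avg: ", downsample(samples, 60))  # [9.75]
```

With these made-up values, the event reads as 100% at 5-second resolution. At 1-minute resolution it becomes a 28.75% bump in the 5:01 minute. Over the full five minutes it averages to under 10%, which would look like nothing is happening.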


So we rolled our own: RevealCloud

• Turns out a lot of people agreed with us
• Highlights:
  • Built on our super real-time analytics engine
  • Updates in seconds vs. minutes
  • Easy to install, no config required
  • Great-looking and usable interface
  • Works anywhere (public/private cloud, VM, bare metal)


Questions


Demo


Demo Screenshots
