high availability clouds-cloud computing expo

Cloud Computing Expo 2009 High Availability Clouds “Moving mission critical applications to the cloud.” Jeremy Hitchcock, CEO Dynamic Network Services

Page 1: High Availability Clouds-Cloud Computing Expo

Cloud Computing Expo

2009

High Availability Clouds
“Moving mission-critical applications to the cloud.”

Jeremy Hitchcock, CEO
Dynamic Network Services

Who cares? Why Relevant?

• Enterprises and service providers: “now what?”
• Desire to move business- or mission-critical apps
  – That’s most of them
• Clouds have an “unstable” feel

Who cares? Why Relevant?

• Still, there are benefits to virtualizing computing resources
• Most don’t care about raw hardware
• Becoming more software/resource integrators
  – Less concerned with software/hardware integration
• Better use of hardware resources
  – Most systems are pretty idle all the time
• Hardware is getting expensive (well, power is)

Where are Clouds?

[Figure: chart with a “You Are Here” marker]

Where we are going (or would like to be)

• Is cloud adoption going to look like this?
  – Limited to spiky demand or distributed processing
• Will more services move to cloud environments?
• Even between clouds and traditional hosting?
• No hardware?
  – Someone has to worry about infrastructure, though

Background on me

• Internet infrastructure: DNS for other people
  – DynDNS.com, Dynect Platform
• Do traffic management and dynamic “routing” for clouds
• Work with many cloud providers to send domain.com to node-19334 but not node-49291
• Background in networking and software engineering
• Use only unmanaged hosting (but do have a consumer VPS offering; it was a dev project)

Terms

• Unmanaged hosting – corporate/outsourced datacenter, your own everything
• Managed hosting – hardware is provided with ping, port, and power
• Cloud hosting – using virtual resources to accomplish the same as the two items above

Goals with High Availability

• Availability: users do not see outages
• Scaling: neither impossible nor easy
  – Does not mean more resources are available
  – Important when you think “on demand”
• Efficient use of resources (more on that later)
• Institutionalized operations practices
  – Monitoring, security regimes
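“Users do not see outages” is usually turned into a number. As a rough sketch, assuming a 365-day year, this converts an availability target into the downtime it allows per year; the function name is illustrative, not from the talk.

```python
# Convert an availability target (e.g. 0.999) into the downtime it permits.
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year (8,760 hours)


def downtime_hours_per_year(availability: float) -> float:
    """Return the hours of outage per year allowed by `availability`."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


for target in (0.99, 0.999, 0.9999):
    hours = downtime_hours_per_year(target)
    print(f"{target:.2%} available -> {hours:.2f} h/year ({hours * 60:.0f} min)")
```

For example, 99.9% leaves about 8.76 hours of outage per year, which is the budget that scaling, monitoring, and operations practices have to protect.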

Highly Available What?

• Well, anything?
• Applications
• File systems
• CPU, I/O, and network
  – I/O is both storage space and retrieval

HA Availability

Early Days of Hosting

• Been here before: mainframes to 1U servers
• Copy over the redundancy found in larger systems
  – “That’s how larger systems were so accessible”
• Expensive 1Us led to commodity hardware
• “We just take our application and move it over here”
• And that was when things took a turn…

Ouch!

• Lots of cheap hardware, gained efficiency
  – Most of the time, anyway
• Applications were not available
  – Up and down all the time
• DB admins, network admins, and system admins all pointing fingers

Ouch!

• Needed more 1Us to do the job
• 1U equipment quality was not as good
• More people, more operations issues
• Security concerns: DB admins having system access
• Failures and scaling became a problem until…

Ah Ha Moment!

It’s ok if a 1U fails. It happens all the time!

Ah Ha Moment!

• Make the system more redundant and fault-tolerant
• Break apart units to create working spaces
  – N+1 redundancy, or whatever your risk tolerance is
• Specialized hardware to maintain efficiency
• Monitor the units of work
  – Ping, port, and power, separately
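To see why N+1 helps, suppose the service needs N working units out of N+k, and each unit is up with probability p. Assuming units fail independently (shared power or network breaks that assumption), the service availability is a binomial tail. A minimal sketch, with hypothetical numbers:

```python
from math import comb


def service_availability(p_unit: float, needed: int, spares: int) -> float:
    """Probability that at least `needed` of `needed + spares` units are up.

    Assumes units fail independently, each up with probability `p_unit`.
    """
    total = needed + spares
    return sum(
        comb(total, up) * p_unit**up * (1 - p_unit) ** (total - up)
        for up in range(needed, total + 1)
    )


# Hypothetical example: 4 units needed, each unit 99% available.
for spares in (0, 1, 2):
    a = service_availability(0.99, needed=4, spares=spares)
    print(f"N+{spares}: {a:.6f}")
```

With these numbers, one spare takes the service from about 96.1% to about 99.9%; each further spare buys less, which is where risk tolerance comes in.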

Ah Ha Moment!

• Separate DB/app/file into clusters
  – That makes scaling and failover easy
• Filers for DB and large-scale storage
• Demand SLAs for network transit
• Get the NOC to work on cross-system outages

Still Some Lingering Issues

• Architectures grew to match applications
  – Tightly coupled; is that good?
  – Makes it hard to move around
  – Specialized hardware pieces
• Do you look like Flickr?
  – If you do, their hosting platform will work for your app

Still Some Lingering Issues

• Systems are more complicated
  – Yahoo 9/11 Memorial site cascade failures
• Fix was a load balancer/DNS tweak
• Lots of “glue” to make sure everything works
• Each architecture is [slightly] different

Finally: Some Lingering Issues

• Therefore:
  – Handling failures works if the application is split into shards
  – Scaling is application-specific, with different bottlenecks
  – Reasonable efficiency, limited specialized hardware
  – More people to maintain “the system,” but it is secure

Now Onto Clouds…

• Promise:
  – On-demand resources (true if you can use them)
  – Greater computing efficiency (all costs are internalized)
  – More flexibility for development and peak usage
  – Greater availability
• Reality:
  – Your responsibility to throw in more hardware
  – Trade specialization for generalization (bottlenecks)
  – Limited by the tools provided and consumed
  – Maybe

Availability

Availability is Defined by Outages

Amazon/Cloud Outages?

• Not clear:
  – “There was this one in July 2008”
  – “Some DNS issues yesterday”
• How often? How regular?
• Out of 500,000 hard drives, x will fail in 3.243 years
• Out of 1 cloud provider? (or maybe 5)
  – We don’t know.
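The hard-drive line works because drive failure rates are published and can be modeled; there is no equivalent figure for a single cloud provider. As an illustration only, assuming a constant failure rate (exponential lifetimes) and a hypothetical MTTF, the expected number of failures in a fleet over a period is:

```python
import math


def expected_failures(units: int, years: float, mttf_years: float) -> float:
    """Expected number of units that fail within `years`.

    Assumes a constant failure rate (exponential lifetimes) and no replacement,
    so each unit fails in that window with probability 1 - exp(-t / MTTF).
    """
    p_fail = 1.0 - math.exp(-years / mttf_years)
    return units * p_fail


# Hypothetical MTTF of 1,000,000 hours, converted to years.
mttf = 1_000_000 / 8760
print(round(expected_failures(500_000, 3.243, mttf)))
```

The MTTF here is a placeholder, not a measured value; the point is that the same calculation cannot be done for a provider without outage data.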

Cloud Realities

• “Best effort” to provide services
• Ever ask for an SLA?
  – I’m sure it’s coming, but not soon enough for some
• Remember, Amazon is providing a service
  – Unmanaged environment
• Relax, that’s the Internet; we’ll figure it out

Cloud Realities

• No physical access to systems
• No guarantee that systems will be available
• No guarantee that new systems will be available
• No continuity guarantee
  – Great performance one moment, maybe not the next
  – Shared resources
• Everything is local; security is a lot different

But Clouds are Virtualized 1Us!

• Well, they are, but not really
• Used to be:
  – Ping, port, power: raw access
  – Hybrids: corporate datacenter, managed, unmanaged
• Now:
  – Ping, port, power, file I/O: virtual access
• Outsourcing network, hardware, and OS

Why is it different?

• Hardware becomes a service
  – Depending on the application, that may matter
• More vendors in the mix
  – Network, hardware, OS much more packaged
• Simpler presentation, but complicated behind the scenes
• Library issues, security issues, OS upgrades?

Availability

• Goal: eliminate single points of failure
  – Clouds are consolidations of services
  – Solution is to split them apart
• Achieve true diversity
  – Business continuity diversity
  – Geographic diversity
  – Network diversity
  – OS diversity
• More layers make interactions hard to predict

Eliminate Points of Failure

• Cloud diversity
• Cloud outages are typically binary
• Interoperability is needed to make it easier
  – That will come in several ways
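If a cloud is simply up or down, spreading across clouds means the service fails only when all of them are down at once. A minimal sketch of that arithmetic, assuming independent outages and instant failover (both optimistic) and hypothetical availability figures:

```python
from functools import reduce


def combined_availability(providers: list[float]) -> float:
    """Availability when the service is up if ANY provider is up.

    Assumes binary, independent provider outages and instant failover.
    """
    # Probability that every provider is down at the same time.
    p_all_down = reduce(lambda acc, a: acc * (1.0 - a), providers, 1.0)
    return 1.0 - p_all_down


print(combined_availability([0.995]))         # a single cloud
print(combined_availability([0.995, 0.995]))  # two independent clouds
```

Two 99.5% clouds give about 99.9975% on paper; correlated failures (a shared DNS provider, for instance) pull the real figure back down, which is why network and business-continuity diversity matter too.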

Failover Events

• Failure events happen (more frequently in clouds?)
• Trick is detecting and redirecting
  – “Once is a mistake, twice is jazz” – Miles Davis
• Needs to be seamless and automatic
• Good provisioning and monitoring in place
  – Server builds, revisioning, server configurations
  – Everything more modular
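One way to read “once is a mistake, twice is jazz” for automatic failover: ignore a single failed check, and redirect only after consecutive failures. A minimal sketch; the class, endpoint names, and threshold are hypothetical, and health checks themselves are assumed to run elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Switch endpoints only after `threshold` consecutive failed checks.

    A single failure is treated as noise; repeated failures move traffic to
    the next endpoint in the list (wrapping around).
    """

    endpoints: list[str]
    threshold: int = 2
    active: int = 0
    _failures: int = field(default=0, init=False)

    @property
    def current(self) -> str:
        return self.endpoints[self.active]

    def record_check(self, healthy: bool) -> str:
        """Record one health-check result for the active endpoint."""
        if healthy:
            self._failures = 0
        else:
            self._failures += 1
            if self._failures >= self.threshold:
                # Redirect to the next endpoint and start counting afresh.
                self.active = (self.active + 1) % len(self.endpoints)
                self._failures = 0
        return self.current


ctl = FailoverController(["primary.example.net", "planb.example.net"])
for result in (True, False, True, False, False):
    print(ctl.record_check(result))
```

The isolated failure in the middle is forgiven; only the final two in a row trigger the switch to plan B.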

Scaling

• Go from 1 to 2 to 4 to 10,000 units
• Split apart work units
• Have to do it sooner rather than later
• More sharding, less efficiency
• Not all units are going to be equal or constant
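Splitting work across a changing number of units raises the question of which unit owns which piece of work. The slides don’t name a method; consistent hashing is one common choice, shown here as a minimal sketch with hypothetical unit and key names. Adding a unit moves only the keys it takes over instead of reshuffling everything.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # SHA-256 gives the same value in every process (built-in hash() is salted).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring that maps work keys to units.

    Each unit is placed at `vnodes` points on the ring; a key belongs to the
    first point at or after its own hash, wrapping around at the end.
    """

    def __init__(self, units: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for unit in units:
            self.add(unit)

    def add(self, unit: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{unit}#{i}"), unit))

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no units on the ring")
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user-{n}" for n in range(10_000)]
ring = HashRing(["unit-1", "unit-2", "unit-3", "unit-4"])
before = {k: ring.lookup(k) for k in keys}
ring.add("unit-5")
moved = sum(before[k] != ring.lookup(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")
```

Going from four to five units should move roughly a fifth of the keys, all of them to the new unit. The virtual nodes smooth out load, but as the slide says, units still won’t be perfectly equal.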

Provisioning

• Everything needs to be automatic (or at least close)
• As you grow, this hurts more and more
• Provisioning means lab, dev, and production
• This becomes a critical system
  – Monitoring and backups should work with provisioning
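Automatic provisioning across lab, dev, and production is often built as “declare the desired state, then compute the difference.” A minimal sketch of the planning step only, assuming a hypothetical `DESIRED` table and a list of running servers tagged by environment and role; actually launching or terminating servers is left to whatever cloud API is in use.

```python
from collections import Counter

# Hypothetical desired state: servers per role in each environment.
DESIRED = {
    "lab": {"web": 1, "db": 1},
    "dev": {"web": 2, "db": 1},
    "production": {"web": 6, "db": 2},
}


def plan(desired: dict[str, dict[str, int]],
         running: list[tuple[str, str]]) -> list[tuple[str, str, str, int]]:
    """Compare desired counts with running (env, role) servers.

    Returns (action, env, role, count) steps. Once the plan is applied,
    planning again yields no steps, so the process is safe to repeat.
    Roles missing from `desired` are left alone in this sketch.
    """
    actual = Counter(running)
    steps = []
    for env, roles in sorted(desired.items()):
        for role, want in sorted(roles.items()):
            have = actual.get((env, role), 0)
            if want > have:
                steps.append(("launch", env, role, want - have))
            elif have > want:
                steps.append(("terminate", env, role, have - want))
    return steps


running = [("production", "web")] * 4 + [("production", "db")] * 2 + [("dev", "web")] * 3
steps = plan(DESIRED, running)
for step in steps:
    print(step)
```

Because the plan is derived from the same table for every environment, monitoring and backup configuration can be hung off that table as well.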

Hardware Considerations

• Hardware-optimized software packages may change
• Security patches
  – Default images vs. custom images
• Physical access is granted not to you but to others
  – Physical access means all access
  – Encrypted data on disk
  – Fewer recovery options
• Do you really have access to your data?
  – See backups

Host Issues

• Host system security vulnerabilities
• Everything is local
  – VLANing becoming more available
• Underlying systems need maintenance
  – Live migrations

Monitoring

• System-related outages, because units will fail
• Normal tools are based on physical limitations
• In cloud environments, it is not always clear where the failure is
• Test from the last mile
• Performance testing is important too
• System testing and transactions
• May not pinpoint problems, but it does send pages
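Last-mile tests run end-to-end transactions from several outside vantage points, which separates “down everywhere” from “down for some users.” A minimal sketch of the alerting logic, assuming the probes run elsewhere and report a hypothetical `ProbeResult`; the locations and the latency threshold are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    location: str      # hypothetical vantage point outside the cloud
    ok: bool           # did the end-to-end transaction succeed?
    latency_ms: float


def assess(results: list[ProbeResult], slow_ms: float = 2000.0) -> str:
    """Classify last-mile probe results into an alert level."""
    if not results:
        return "no-data"
    failed = [r for r in results if not r.ok]
    if len(failed) == len(results):
        return "page: outage seen from every location"
    if failed:
        where = ", ".join(sorted(r.location for r in failed))
        return f"page: partial outage ({where})"
    if any(r.latency_ms > slow_ms for r in results):
        return "warn: slow from some locations"
    return "ok"


results = [
    ProbeResult("boston", True, 180.0),
    ProbeResult("london", False, 0.0),
    ProbeResult("tokyo", True, 2600.0),
]
print(assess(results))  # page: partial outage (london)
```

As the slide says, this may not pinpoint the problem, but it does send the page, and it catches cases where everything looks fine from inside the cloud.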

Backups

• Incremental backups are much more important
• Back up within the same cloud?
  – Probably not, but where?
• Data files, application files, configuration files
  – Version everything
  – Document how they all go together
• But you already do that, so it’s OK
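Incremental backups only ship what changed since the last run. A minimal sketch, assuming files are compared by SHA-256 digest against a manifest kept from the previous backup; copying the changed files off-cloud is out of scope here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_since(root: Path, manifest: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return files under `root` that are new or changed vs. `manifest`.

    `manifest` maps relative paths to digests from the previous backup; the
    returned manifest describes the current state for the next run. Deleted
    files simply drop out of the new manifest in this sketch.
    """
    current = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = [name for name, digest in current.items() if manifest.get(name) != digest]
    return changed, current


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "app.conf").write_text("port=80\n")
    (root / "data.db").write_text("v1")
    first, manifest = changed_since(root, {})
    (root / "data.db").write_text("v2")
    second, manifest = changed_since(root, manifest)
    print(first, second)  # ['app.conf', 'data.db'] ['data.db']
```

Versioning the manifest alongside the files also records how data, application, and configuration files fit together at each point in time.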

Migrations

• Be able to take your data (server image)
  – Server import and export
• Live migration, provided by the underlying software
• These are all interoperability needs

Disaster Planning

• When things go really wrong:
  – Need to communicate using other means
• Social networking like Twitter (are they affected as well?)
  – Have a plan B, diversity of cloud providers
  – Seek SLAs?

Some Things External

• DNS
  – Point domain.com to your plan B
• Backups and files
  – When you want to publish content at plan B
• Customer communications
  – Tell customers and users what’s going on
• Last-mile monitoring
  – Everything might look OK in the cloud
• Want options if there is an outage
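Pointing domain.com at plan B is not instant: the failure has to be detected, and resolvers may keep serving the old answer until its TTL expires. A rough estimate of the worst case, with hypothetical settings; it ignores resolvers that do not honor TTLs.

```python
def worst_case_redirect_seconds(check_interval_s: float,
                                failures_to_trigger: int,
                                dns_ttl_s: float) -> float:
    """Rough upper bound on how long users may keep reaching a dead site.

    Detection takes up to `failures_to_trigger` check intervals; after the
    DNS record is switched, cached answers live until the TTL expires.
    """
    detection = check_interval_s * failures_to_trigger
    return detection + dns_ttl_s


# Hypothetical settings: checks every 60 s, 2 failures to trigger, 300 s TTL.
print(worst_case_redirect_seconds(60, 2, 300))  # 420
```

Lower TTLs and shorter check intervals shrink the window at the cost of more DNS queries and more monitoring traffic, another cost-versus-availability trade-off.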

Key Points

• Clouds are great for applications, even mission critical ones

• Best practices for server farms aren’t always best practices for clouds

• Need to rely on software to make hardware assumptions work right

• Constant trade-off between cost and availability: what’s the risk tolerance?

Questions

Jeremy Hitchcock
[email protected]
http://dyn-inc.com/