uc berkeley 1 beyond the hype: a berkeley view of cloud computing anthony d. joseph*, uc berkeley...

31
UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31 May 2010 Image: John Curley http://www.flickr.com/photos/jay_que/1834540/ *Director, Intel Labs Berkeley ://abovetheclouds.cs.berkeley.edu/

Upload: lorraine-fisher

Post on 25-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

UC Berkeley

1

Beyond the Hype:A Berkeley View of Cloud Computing

Anthony D. Joseph*, UC BerkeleyReliable Adaptive Distributed Systems Lab

TNC 201031 May 2010

Image: John Curley http://www.flickr.com/photos/jay_que/1834540/

*Director, Intel Labs Berkeleyhttp://abovetheclouds.cs.berkeley.edu/

Page 2: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

2

Outline

• What is it? • Why now?

– Cloud economics and opportunities• Challenges

Page 3: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

3

Datacenter is new “server”

• “Program” == Web search, email, map/GIS, …• “Computer” == 1000’s computers, storage, network• Warehouse-sized facilities and workloads• New datacenter ideas (2007-2008): truck container (Sun),

floating (Google), datacenter-in-a-tent (Microsoft)• How to enable innovation in new services without first

building & capitalizing a large company?

photos: Sun Microsystems & datacenterknowledge.com

Page 4: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

RAD Lab 5-year Mission

Enable 1 person to develop, deploy, operate next -generation Internet application

• Key enabling technology: Statistical machine learning– debugging, power management, performance prediction, ...

• Highly interdisciplinary faculty & students– PI’s: Patterson/Fox/Katz (systems/networks), Jordan (machine

learning), Joseph (systems/networks/security), Stoica (networks/P2P), Franklin (databases)

– 2 postdocs, ~30 PhD students, ~5 undergrads

4

Page 5: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Examples

• Predict performance of complex software system when demand is scaled up

• Automatically add/drop servers to fit demand, without violating SLO

• Distill millions of lines of log messages into an operator-friendly “decision tree” that pinpoints “unusual” incidents/conditions

• Recurring themes: – Cutting-edge SML methods work where simpler methods

have failed– Demonstrate applicability on at least 1000’s of machines!

5

Page 6: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Utility Computing Arrives• Amazon Elastic Compute Cloud (EC2)• “Compute unit” rental: $0.10-0.80 0.085-0.68/hour

– 1 CU ≈ 1.0-1.2 GHz 2007 AMD Opteron/Intel Xeon core

• No up-front cost, no contract, no minimum• Billing rounded to nearest hour (also regional, spot pricing)• New paradigm(!) for deploying services?, HPC?

Platform Units Memory Disk

Small - $0.10 $.085/hour 32-bit 1 1.7GB 160GB

Large - $0.40 $0.35/hour 64-bit 4 7.5GB 850GB – 2 spindles

X Large - $0.80 $0.68/hour 64-bit 8 15GB 1690GB – 4 spindles

High CPU Med - $0.20 $0.17 64-bit 5 1.7GB 350GB

High CPU Large - $0.80 $0.68 64-bit 20 7GB 1690GB

High Mem X Large - $0.50 64-bit 6.5 17.1GB 1690GB

High Mem XXL - $1.20 64-bit 13 34.2GB 1690GB

High Mem XXXL - $2.40 64-bit 26 68.4GB 1690GBNorthern VA cluster

Page 7: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

• Major enabler for SW as a Service (SaaS) startups– Animoto traffic doubled every 12 hours for 3 days when

released as Facebook plug-in in April 2008

• Peak EC2 instances: – Mon 50, Tues 400, Wed 900, Friday 3400

Cloud as Major Enabler

Page 8: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

But...

What is cloud computing, exactly?

8

Page 9: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

“It’s nothing new”

“...we’ve redefined Cloud Computing to include everything that we already do... I don’t understand what we would do differently ... other than change the wording of some of our ads.”

Larry Ellison, CEO, Oracle (Wall Street Journal, Sept. 26, 2008)

9

Page 10: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

“It’s a trap”

“It’s worse than stupidity: it’s marketing hype. Somebody is saying this is inevitable—and whenever you hear that, it’s very likely to be a set of businesses campaigning to make it true.”

Richard Stallman, Founder, Free Software Foundation (The Guardian, Sept. 29, 2008)

10

Page 11: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Above the Clouds:A Berkeley View of Cloud Computing

abovetheclouds.cs.berkeley.edu• White paper by RAD Lab PI’s and students

– Clarify terminology around Cloud Computing– Quantify comparison with conventional computing– Identify Cloud Computing challenges & opportunities

• Why can we offer new perspective?– Strong engagement with industry– Users of cloud computing in our own research and

teaching in last 3 years

• Goal: stimulate discussion on what’s really new– without resorting to weather analogies ad nauseam

11

Page 12: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Cloud Computing: True UtilityCloud Computing: App and Infrastructure over Internet

Software as a Service: Applications over the Internet

Utility Computing: “Pay-as-You-Go” Datacenter Hardware and Software

Three New Aspects to Cloud Computing

The Illusion of Infinite Computing Resources Available on Demand

The Elimination of an Upfront Commitment by Cloud Users

The Ability to Pay for Use of Computing Resources on a Short-Term Basis as Needed

Page 13: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Classifying Clouds

App Model for Utility ComputingSomething

New???

???

???

Amazon EC2

Close to Physical Hardware

User Controls Most of Stack

Hard to Auto Scale and Failover

Windows Azure

.NET and CLR… ASP.NET Support

More Constraints on User Stack

Auto Provisioning of Stateless App

Google AppEngineApp Specific Traditional

Web App Model

Constrained Stateless/Stateful Tiers

Auto Scaling and Auto High-Availability

Constraints on App Model Offer Tradeoffs… Lots of Ongoing Innovation…

Lower-level,Less managed“flexibility/portability”

Higher-level,More managed

“more built-in functionality”

• Instruction Set VM (Amazon EC2, 3Tera)• Managed runtime VM (Microsoft Azure)• Framework VM (Google AppEngine, Force.com)

Page 14: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

14

Outline

• What is it? • Why now?

– Cloud economics and opportunities• Challenges

Page 15: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Why Now (not then)?

15

Economies of Scale for Humongous Datacenters(1,000’s to 10,000’s of commodity computers)

ElectricityPut Datacenters at Cheap Power

NetworkPut Datacenters on Main Trunks

OperationsStandardize and Automate Ops

HardwareContainerized

Low-Cost Servers

Technology Cost in Medium-Sized DC

Cost in Very Large DC Ratio

Network $95 per Mbit/sec/month $13 per Mbit/sec/Month 7.1

Storage $2.20 per GByte/month $0.40 per Gbyte/month 5.7

Administration ≈ 140 Servers / Administrator

> 1000 Servers / Administrator

7.1

James Hamilton, Internet Scale Service Efficiency, Large-Scale Distributed Systems and Middleware (LADIS) Workshop Sept‘08

Huge DCs 5-7X as Cost Effective as Medium-Scale DCs

Page 16: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

16

Why Now (not then)?

• Common HW & SW platform– x86 as universal ISA, plus fast virtualization – Standard software stack, largely open source (LAMP)– Bet: Can statistically multiplex multiple instances onto

a single box without interference between instances

• Novel economic model: fine grain billing– Earlier examples: Sun, Intel Computing Services—

longer commitment, more $$$/hour

• Infrastructure software: eg Google FileSystem• Operational expertise: failover, DDoS, firewalls...• More pervasive broadband Internet

Page 17: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Unused resources

Cloud Economics 101

• Static provisioning for peak: wasteful, but necessary for SLO

“Statically provisioned” data center

“Virtual” data center in the cloud

Demand

Capacity

Time

Re

sou

rce

s

Demand

Capacity

TimeR

eso

urc

es

17

Risk of underutilization if peak predictions are too

optimistic – Wasted CapEx

Page 18: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Risks of underprovisioning

Lost revenue

Lost users

Re

sou

rce

s

Demand

Capacity

Time (days)1 2 3

Re

sou

rce

s

Demand

Capacity

Time (days)1 2 3

Re

sou

rce

s

Demand

Capacity

Time (days)1 2 3

18

Page 19: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

New Scenarios Enabled by “Risk Transfer”

• Not (just) CapEx vs. OpEx!

• “Cost associativity”: 1,000 computers for 1 hour same price as 1 computer for 1,000 hours– Washington Post converted Hillary Clinton’s travel

documents to post on WWW <1 day after released– RAD Lab graduate students demonstrate improved

Hadoop (batch job) scheduler—on 1,000 servers

19

Page 20: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

20

Outline

• What is it? • Why now?

– Cloud economics and opportunities• Challenges

Page 21: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Obstacles and OpportunitiesObstacle Opportunity

1 Availability of Service Use Multiple Cloud Providers; Use Elasticity to Prevent DDOS

2 Data Lock-In Standardized APIs; Compatible Software to Enable Surge Computing

3 Data Confidentiality and Auditability Deploy Encryption, VLANs, Firewalls; Geographical Data Storage

4 Data Transfer Bottlenecks FedExing Disks; Data Backup/Archival; Higher Bandwidth Switches

5 Performance Unpredictability Improved VM Support; Flash Memory; Gang Scheduling VMs

6 Scalable Storage Invent Scalable Store

7 Bugs in Large Distributed Systems Invent Debugger that Relies on Dist VMs

8 Scaling quickly Auto-Scaler; Snaphots for Conservation

9 Reputation Fate Sharing Reputation Guarding Services

10 Software Licensing Pay-for-Use Licenses; Bulk Use Sales

Open source reimplementations of Google AppEngine (AppScale), EC2 API (Eucalyptus), BigTable

(HyperTable)

Freedom OSS partnership with Amazon to allow FedEx-ing disks into their datacenters, Amazon hosting free public datasets to “attract” cycles

2/11/09: IBM WebSphere™ and other service-delivery software available on AWS with pay-as-you-go pricing

Page 22: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Cloud or Earthbound: “Should I Move to the Cloud?”

• Some app types made more compelling– Surge computing: overflow into the cloud– Extend desktop apps into cloud: Matlab,

Mathematica; soon productivity apps?– Batch processing to exploit cost associativity, e.g. for

business analytics

• Other apps more challenged– Bulk data movement expensive, slow– Jitter-sensitive apps (long-haul latency & transient

performance distortion due to virtualization)

• What about HPC/Grid apps?22

Page 23: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

2008 Observed EC2 Topology

Caveat: only shows routers, not switches

Page 24: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

EC2 Large Instance Network Performance

• Measurement:– Amount of data

transferred between 2 instances in 10 seconds

• Network quirks– 96% RTT < 1ms– Occasional route

changes lead to weird paths and increased latency

Page 25: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Effects on MPI Performance

• Many research opportunities to address issues– Collocated rack-level allocation schemes– Gang scheduling for clouds/virtual clouds– New network topologies/implementations

25[Walker, :login 2008-10]

Page 26: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Summary

• Many Cloud Computing Benefits:– Shift CapEx to OpEx , Scale OpEx to demand– Startups and prototyping, One-off tasks (Wash. Post)– Cost associativity– Research at scale

• Many Cloud Computing Challenges:– Availability – Data in cloud may be a “gravity well” ($$$ to move)

• Not ready for HPC applications yet– Opportunities to address performance issues

• More: http://abovetheclouds.cs.berkeley.edu/26

Page 27: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

UC Berkeley

Thank you!

[email protected]

http://abovetheclouds.cs.berkeley.edu/

27

Page 28: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

BACKUP SLIDES

28

Page 29: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Performance for Money Spent on EC2

• LINPACK cost effectiveness on EC2

29[Napper and Bientinesi]

Page 30: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Heterogeneity in Virtualized Environments

• VM technology isolates CPU and memory, but disk and network are shared– Full bandwidth when no contention– Equal shares when there is contention

• 2.5x performance difference

1 2 3 4 5 6 70

10

20

30

40

50

60

70

VMs on Physical Host

IO P

erf

orm

an

ce

pe

r V

M (

MB

/s)

EC2 small

instances

Page 31: UC Berkeley 1 Beyond the Hype: A Berkeley View of Cloud Computing Anthony D. Joseph*, UC Berkeley Reliable Adaptive Distributed Systems Lab TNC 2010 31

Power and Cooling Is Expensive!

Belady, C., “In the Data Center, Power and Cooling Costs More than IT Equipment it Supports”, Electronics Cooling Magazine (Feb 2007)

The Infrastructure for Power and Cooling Costs a LOT

Infrastructure PLUS Energy > Server Cost Since 2001

Infrastructure Alone > Server Cost Since 2004

Energy Alone > Server Cost Since 2008

Cost Effective to Discard Inefficient Servers

Power Savings Infrastructure Savings!

Like Airlines Retiring Fuel-Guzzling Airplanes

Willing to pay more $/server for more power efficient servers