Basics of Cloud Computing

MTAT.08.027 Basics of Cloud Computing (3 ECTS)

MTAT.08.011 Basics of Grid and Cloud Computing

Satish Srirama

Course Purpose

• Introduce cloud computing concepts

• Introduce cloud providers

• Introduce distributed computing algorithms like MapReduce

• Glance at research in the cloud computing domain at the Mobile Cloud Lab

• http://courses.cs.ut.ee/2013/cloud/

4/3/2013 Satish Srirama 2/33

Questions

• Is everyone comfortable with data structures?

• How comfortable are you with algorithms?

• How comfortable are you with programming?

– Java?

• External APIs?

– Python – I assume you are

– Web programming

• Were you able to submit all exercises in the Grid part?

Outline

• Cloud computing

• Cloud providers

• MapReduce

• MapReduce in different domains

• Large scale data processing on cloud


Grading

• Written exam – 50%

• Labs – 45%

– 6 lab exercises

• Active participation in the lectures (Max 5%)

• To pass the course

– You need to score at least 50% in each of the subsections

– You need to score at least 50% in the total


Course schedule

• 03.04 Basics of Cloud computing

• 10.04 Cloud Providers

• 17.04 MapReduce

• 24.04 MapReduce algorithms

• 01.05 No lecture – May day

• 08.05 MapReduce in Information Retrieval

• 15.05 Cloud scale distributed data storage

• 22.05 No lecture

– I will be at a project meeting in Paris

Course schedule - continued

• Labs

– 03-09.04 Starting with a cloud

– 10-16.04 Working with SciCloud

– 17-23.04 MapReduce - Basics

– 24-30.04 Data analysis with MapReduce

– 08-14.05 MapReduce in information retrieval

– 15-21.05 NoSQL

Course schedule - continued

• 28.05 - Examination 1

• 29.05 - Examination 2

• Students who are defending their theses should pass the exam in the first attempt

• Examination for second-attempt students: 12th June

CLOUD COMPUTING

Lecture 1


WHAT IS CLOUD COMPUTING?

No consistent answer!

Everyone thinks it is something else…

“It’s nothing new”

“...we’ve redefined Cloud Computing to include everything that we already do... I don’t understand what we would do differently ... other than change the wording of some of our ads.”

Larry Ellison, CEO, Oracle (Wall Street Journal, Sept. 26, 2008)

“It’s a trap”

“It’s worse than stupidity: it’s marketing hype. Somebody is saying this is inevitable, and whenever you hear that, it’s very likely to be a set of businesses campaigning to make it true.”

Richard Stallman, Founder, Free Software Foundation (The Guardian, Sept. 29, 2008)

Slide taken from Professor Anthony D. Joseph’s lecture at RWTH Aachen

What is Cloud Computing?

• Computing as a utility

– Utility services, e.g. water, electricity, gas etc.

– Consumers pay based on their usage

• Cloud Computing characteristics

– Illusion of infinite resources

– No up-front cost

– Fine-grained billing (e.g. hourly)

• Gartner: “Cloud computing is a style of computing where massively scalable IT-related capabilities are provided ‘as a service’ across the Internet to multiple external customers”

Timeline


How Cloud & Grid are related…

• Share a lot in common

– Intention, architecture and technology

• Differences

– Programming model, business model, compute model, applications, and virtualization

• The problems are mostly the same

– Manage large facilities

– Define methods by which consumers discover, request and use resources provided by the central facilities

– Implement the often highly parallel computations that execute on those resources

Virtualization

• Virtualization techniques are the basis of cloud computing

• Virtualization technologies partition hardware and thus provide flexible and scalable computing platforms

• Virtual machine techniques

– VMware and Xen

– OpenNebula

– Amazon EC2

• Grids do not rely on virtualization as much as clouds do; each individual organization maintains full control of its resources

• For clouds, virtualization is almost an indispensable ingredient

Figure: Virtualized Stack (layers shown: App, OS, Hypervisor, Hardware)

Enabling Grids for E-sciencE - EGEE

Clouds - Why Now (not then)?

• Commoditization of HW

– x86 as universal ISA, plus fast virtualization

– Bet: Can statistically multiplex multiple instances onto a single box without interference between instances

• Web 2.0

– Standard software stack, largely open source (LAMP)

– Asynchronous JavaScript and XML (AJAX)

• Novel economic model: fine-grained billing

– Earlier examples: Sun, Intel Computing Services (longer commitment, more $$$/hour)

• Infrastructure software: e.g. Google File System, HDFS

• Operational expertise: failover, DDoS, firewalls...

• More pervasive broadband Internet

ISA – Instruction Set Architecture

LAMP – Linux, Apache HTTP Server, MySQL, PHP, Perl or Python
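The statistical multiplexing bet can be illustrated with a small worked example. The sketch below uses invented hourly CPU demand for three hypothetical instances (`web`, `batch`, `mail`; none of these numbers come from the lecture). It compares the capacity needed when each instance gets its own box, sized for its own peak, with the capacity needed when all three share one box.

```python
# Hypothetical hourly CPU demand (in cores) for three instances.
# The numbers are invented for illustration; the peaks fall at different hours.
demand = {
    "web":   [1, 1, 4, 8, 4, 1],
    "batch": [8, 6, 1, 1, 1, 6],
    "mail":  [2, 5, 2, 1, 5, 2],
}

# Dedicated boxes: each one must be sized for its own peak.
sum_of_peaks = sum(max(series) for series in demand.values())

# Shared box: sized for the peak of the combined demand.
combined = [sum(hour) for hour in zip(*demand.values())]
peak_of_sum = max(combined)

print("dedicated capacity:", sum_of_peaks)  # 21 cores
print("shared capacity:", peak_of_sum)      # 12 cores
```

The saving only appears because the peaks do not coincide. The sketch also ignores interference between instances, which is exactly the part of the bet that virtualization has to deliver.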

Cloud Computing - Services

• Software as a Service – SaaS

– A way to access applications hosted on the web through your web browser

– Examples: Facebook, Flickr, Myspace.com, Google Maps API, Gmail

• Platform as a Service – PaaS

– A pay-as-you-go model for IT resources accessed over the Internet

– Examples: Google App Engine, Force.com, Hadoop, Azure, Amazon S3, etc.

• Infrastructure as a Service – IaaS

– Use of commodity computers, distributed across Internet, to perform parallel processing, distributed storage, indexing and mining of data

– Virtualization

– Examples: Amazon EC2, SciCloud, Joyent Accelerators, Nirvanix Storage Delivery Network, etc.

Figure: Level of Abstraction (SaaS, PaaS, IaaS)

Cloud Computing - Themes

• Massively scalable

• On-demand & dynamic

• Only use what you need - Elastic

– No upfront commitments, use on short term basis

• Accessible via Internet, location independent

• Transparent

– Complexity concealed from users, virtualized, abstracted

• Service oriented

– Easy to use SLAs

SLA – Service Level Agreement
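The "on-demand" and "elastic" themes can be made concrete with a tiny sizing rule. This is a minimal sketch under stated assumptions: the function name `desired_instances`, the per-instance capacity and the instance limits are all hypothetical, and no real provider API is involved.

```python
import math

def desired_instances(demand, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Return how many instances to run for the current demand.

    demand and capacity_per_instance use the same (hypothetical) unit,
    e.g. requests per second. The result is clamped to the allowed range.
    """
    needed = math.ceil(demand / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# Scale out as demand grows, scale back in when it drops.
for load in (0, 250, 5000):
    print(load, "->", desired_instances(load, capacity_per_instance=100))
```

With these assumed values the rule keeps one instance at zero load, asks for three instances at a load of 250, and stops at the cap of 20 for a load of 5000.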

Cloud Models

• Internal (private) cloud

– Cloud within an organization

• Community cloud

– Cloud infrastructure jointly owned by several organizations

• Public cloud

– Cloud infrastructure owned by an organization, provided to general public as service

• Hybrid cloud

– Composition of two or more cloud models

Short Term Implications of Clouds

• Startups and prototyping

– Minimize infrastructure risk

– Lower cost of entry

• Batch jobs

• One-off tasks

– Washington Post, NY Times

• Cost associativity for scientific applications

• Research at scale
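Cost associativity, as the term is used in the "Above the Clouds" report, means that with pay-per-use billing many servers for a short time cost the same as one server for a long time. A minimal sketch, assuming a flat hypothetical price of 0.10 per server-hour (not a real provider price):

```python
def rental_cost(servers, hours, price_per_server_hour):
    """Pay-per-use cost: only server-hours matter, not how they are split."""
    return servers * hours * price_per_server_hour

PRICE = 0.10  # hypothetical price per server-hour

one_for_long = rental_cost(servers=1, hours=1000, price_per_server_hour=PRICE)
many_for_short = rental_cost(servers=1000, hours=1, price_per_server_hour=PRICE)

print(one_for_long, many_for_short)
```

Both runs consume 1000 server-hours, so the bill is the same while the second one finishes in an hour. This is why batch jobs and scientific workloads that parallelize well benefit.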


Cloud Application Demand

• Many cloud applications have cyclical demand curves

– Daily, weekly, monthly, …

• Workload spikes are more frequent and significant

– When some event happens, like the death of a pop star:

• More tweets, Wikipedia traffic increases

• 22% of tweets, 20% of Wikipedia traffic when Michael Jackson died in 2009

– Google thought they were under attack

Figure: Demand (Resources vs. Time)

Economics of Cloud Users

• Pay by use instead of provisioning for peak

Figure: Resources vs. Time, showing Demand, Capacity and Unused resources for a static data center and for a data center in the cloud
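The difference between provisioning for peak and paying by use can be worked through with a toy demand curve. The demand values and the price below are invented for illustration, and the sketch assumes the same price per server-hour in both cases, which need not hold in practice.

```python
# Hypothetical demand in servers for eight consecutive hours.
demand = [2, 2, 3, 5, 9, 10, 6, 3]
PRICE = 1  # hypothetical cost units per server-hour (same in both cases)

# Static data center: capacity is fixed at the peak for the whole period.
static_capacity = max(demand)
static_cost = static_capacity * len(demand) * PRICE

# Data center in the cloud: pay only for the servers actually used each hour.
cloud_cost = sum(demand) * PRICE

unused_server_hours = static_capacity * len(demand) - sum(demand)

print("static:", static_cost)                # 80
print("cloud:", cloud_cost)                  # 40
print("unused server-hours:", unused_server_hours)  # 40
```

In this made-up case half of the statically provisioned capacity is never used. Pay-per-use stays cheaper as long as the cloud price per server-hour is less than twice the in-house one.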

Economics of Cloud Users - continued

• Risk of over-provisioning: underutilization

– Huge sunk cost in infrastructure

Figure: Static data center (Resources vs. Time, showing Demand, Capacity and Unused resources)

Economics of Cloud Users - continued

• Heavy penalty for under-provisioning

– Lost revenue

– Lost users

Figure: three panels of Resources vs. Time (days 1 to 3), each showing Demand and Capacity
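Both risks of a fixed capacity, unused resources when over-provisioned and unserved demand when under-provisioned, can be measured with the same helper. The function name `provisioning_gap` and the demand numbers are hypothetical.

```python
def provisioning_gap(demand, capacity):
    """Return (unused, unmet) resource-hours for a fixed capacity.

    unused: capacity that sat idle (over-provisioning cost)
    unmet:  demand that could not be served (lost revenue, lost users)
    """
    unused = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return unused, unmet

demand = [4, 6, 10, 12, 7, 3]  # hypothetical demand per time step

over = provisioning_gap(demand, capacity=12)   # sized for the peak
under = provisioning_gap(demand, capacity=6)   # sized too small

print("provisioned for peak:", over)   # (30, 0)
print("under-provisioned:", under)     # (5, 11)
```

Sizing for the peak leaves 30 resource-hours idle, while the smaller capacity turns away 11 resource-hours of demand. The sketch does not model users leaving for good after being turned away, so the real penalty of under-provisioning is larger.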

Economics of Cloud Providers

• Building a very large-scale datacenter is very expensive

– $100+ Million (Minimum)

• Large Internet Companies Already Building Huge DCs

– Google, Amazon, Microsoft…

• 5-7x economies of scale [Hamilton 2008]

Resource | Cost in Medium DC | Cost in Very Large DC | Ratio
Network | $95 / Mbps / month | $13 / Mbps / month | 7.3x
Storage | $2.20 / GB / month | $0.40 / GB / month | 5.5x
Administration | ≈140 servers/admin | >1000 servers/admin | 7.1x
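The ratios in the table follow directly from the two cost columns. The short check below recomputes them; for administration the metric is servers per admin, so the ratio is large over medium, and 1000 is used as the lower bound of ">1000".

```python
# (medium DC, very large DC) values taken from the table above.
network = (95.0, 13.0)    # $ per Mbps per month
storage = (2.20, 0.40)    # $ per GB per month
admin = (140, 1000)       # servers per admin (1000 is a lower bound)

network_ratio = round(network[0] / network[1], 1)
storage_ratio = round(storage[0] / storage[1], 1)
admin_ratio = round(admin[1] / admin[0], 1)  # more servers per admin is better

print(network_ratio, storage_ratio, admin_ratio)  # 7.3 5.5 7.1
```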

Economics of Cloud Providers - continued

• Power

Price per KWH | Where | Possible Reasons Why
3.6¢ | Idaho | Hydroelectric power; not sent long distance
10.0¢ | California | Electricity transmitted long distance over the grid; limited transmission lines in Bay Area; no coal fired electricity allowed in California.
18.0¢ | Hawaii | Must ship fuel to generate electricity

• Cooling is also expensive

– Build data centers near rivers

• Extra benefits

– Amazon: utilize off-peak capacity

– Microsoft: sell .NET tools

– Google: reuse existing infrastructure
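To see why the location of a data center matters, the power prices from the table can be applied to a constant load. The 1 MW load is a hypothetical figure chosen for round numbers; only the prices per kWh come from the slide.

```python
HOURS_PER_YEAR = 24 * 365
LOAD_KW = 1000  # hypothetical constant load of 1 MW

# Prices in cents per kWh, from the table above.
price_cents_per_kwh = {"Idaho": 3.6, "California": 10.0, "Hawaii": 18.0}

annual_cost_usd = {
    where: LOAD_KW * HOURS_PER_YEAR * cents / 100
    for where, cents in price_cents_per_kwh.items()
}

for where, cost in annual_cost_usd.items():
    print(f"{where}: ${cost:,.0f} per year")
```

Under this assumption the yearly bill is about $315,360 in Idaho, $876,000 in California and $1,576,800 in Hawaii, a fivefold spread for the same load.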

Economics of Cloud Providers - Failures

• Cloud Computing providers bring a shift from high reliability/availability servers to commodity servers

– At least one failure per day in large datacenter

• Why?

– Significant economic incentives – much lower per-server cost

• Caveat: User software has to adapt to failures

– Very hard problem!

• Solution: Replicate data and computation

– MapReduce & Distributed File System (Will discuss later in Lecture 3)
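A back-of-the-envelope calculation shows both why failures become routine at scale and why replication helps. The server count, the annual failure rate and the per-replica loss probability below are hypothetical, and the replication formula assumes that replicas fail independently.

```python
def expected_failures_per_day(servers, annual_failure_rate):
    """Expected number of server failures per day across the datacenter."""
    return servers * annual_failure_rate / 365

def prob_all_replicas_lost(p_single, replicas):
    """Probability that every replica is lost, assuming independent failures."""
    return p_single ** replicas

# Hypothetical datacenter: 10,000 commodity servers, 4% fail per year.
per_day = expected_failures_per_day(servers=10_000, annual_failure_rate=0.04)

# Hypothetical 1% chance of losing any single copy within some time window.
loss_1 = prob_all_replicas_lost(0.01, replicas=1)
loss_3 = prob_all_replicas_lost(0.01, replicas=3)

print(f"expected failures per day: {per_day:.2f}")  # about 1.10
print(f"loss with 1 copy: {loss_1:.0e}, with 3 copies: {loss_3:.0e}")
```

With these assumed numbers the datacenter sees slightly more than one failure per day, while keeping three copies cuts the chance of losing data from one in a hundred to about one in a million. Correlated failures, such as a whole rack losing power, break the independence assumption and make the real figure worse.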


Adoption Challenges

Challenge | Opportunity
Availability | Multiple providers & use elasticity to prevent DDoS attacks
Data lock-in | Standardization
Data Confidentiality and Auditability | Encryption, VLANs, Firewalls; Geographical Data Storage

Growth Challenges

Challenge | Opportunity
Data transfer bottlenecks | FedEx-ing disks, Data Backup/Archival
Performance unpredictability | Improved VM support, flash memory, scheduling VMs
Scalable storage | Invent scalable store
Bugs in large distributed systems | Invent Debugger that relies on Distributed VMs
Scaling quickly | Invent Auto-Scaler; Snapshots for conservation
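The data transfer bottleneck is easy to quantify. The sketch below estimates how long an upload takes over a wide-area link; the 10 TB data set and the 20 Mbit/s sustained bandwidth are hypothetical inputs, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
def transfer_days(data_terabytes, link_mbps):
    """Days needed to move the data over a link at the given sustained rate."""
    bits = data_terabytes * 1e12 * 8      # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6)    # Mbit/s to bit/s
    return seconds / 86_400               # seconds per day

days = transfer_days(data_terabytes=10, link_mbps=20)
print(f"{days:.1f} days")  # about 46.3 days
```

At that rate the upload takes more than six weeks, so shipping disks by courier, which takes on the order of days, is the faster option for data sets of this size.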

Policy and Business Challenges

Challenge | Opportunity
Reputation Fate Sharing | Offer reputation-guarding services like those for email
Software Licensing | Pay-for-use licenses; Bulk use sales

Long Term Implications of clouds

• Application software:

– Cloud & client parts, disconnection tolerance

• Infrastructure software:

– Resource accounting, VM awareness

• Hardware systems:

– Containers, energy proportionality


Cloud Computing Progress

Armando Fox, 2010

This week in Lab (Homework)

• Registration to the cloud & keys

– Firefox plugin & working with Euca2ools and API

• Study the following paper and write an excerpt (at least 1 page)

– M. Armbrust et al., “Above the Clouds, A Berkeley View of Cloud Computing”, Technical Report, University of California, Feb, 2009.

• Write why and where you think the cloud is useful for you

Next lecture

• Cloud providers

– Amazon EC2, S3, EBS

• Eucalyptus

• OpenStack

References

• Several of the slides are taken from Prof. Anthony D. Joseph’s lecture at RWTH Aachen (March 2010)

• Papers to read

– M. Armbrust et al., “Above the Clouds, A Berkeley View of Cloud Computing”, Technical Report, University of California, Feb, 2009.

• The Cloud: Battle of the Tech Titans

– Cover story in Businessweek

– http://www.businessweek.com/magazine/content/11_11/b4219052599182.htm