Cloud Computing Part #2 Materials adopted from the slides by Jimmy Lin, The iSchool, University of Maryland Zigmunds Buliņš, Mg. sc. ing 1


Cloud Computing Part #2

Materials adopted from the slides by Jimmy Lin, The iSchool, University of Maryland

Zigmunds Buliņš, Mg. sc. ing


Source: http://www.free-pictures-photos.com/

What is Cloud Computing?

1. Web-scale problems

2. Large data centers

3. Different models of computing

4. Highly-interactive Web applications


1. Web-Scale Problems

Characteristics:
  Definitely data-intensive
  May also be processing-intensive

Examples:
  Crawling, indexing, searching, mining the Web
  “Post-genomics” life sciences research
  Other scientific data (physics, astronomy, etc.)
  Sensor networks
  Web 2.0 applications
  …


How much data?
  Wayback Machine has 2 PB + 20 TB/month (2006)
  Google processes 20 PB a day (2008)
  “all words ever spoken by human beings” ~ 5 EB
  NOAA has ~1 PB climate data (2007)
  CERN’s LHC will generate 15 PB a year (2008)

640K ought to be enough for anybody.
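To put the "20 PB a day" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 PB = 10^15 bytes) and a hypothetical single disk reading at 100 MB/s; neither assumption comes from the slides.

```python
# Back-of-the-envelope arithmetic for the "20 PB a day" figure.
# Assumptions (not from the slides): decimal units and a 100 MB/s disk.
PB = 10**15
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(bytes_per_day):
    """Average throughput needed to process a daily volume, in GB/s."""
    return bytes_per_day / SECONDS_PER_DAY / GB


def single_disk_days(total_bytes, disk_bytes_per_s=100 * MB):
    """Days one disk would need to read the volume sequentially."""
    return total_bytes / disk_bytes_per_s / SECONDS_PER_DAY


daily_volume = 20 * PB
print(f"{sustained_rate_gb_per_s(daily_volume):.1f} GB/s sustained")
print(f"{single_disk_days(daily_volume):.0f} days on one disk")
```

Under these assumptions, one day of data needs roughly 231 GB/s of sustained throughput, and a single disk would take more than six years to read it once, which is why such workloads are spread over many machines.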


Maximilien Brice, © CERN

Maximilien Brice, © CERN

What to do with more data?

Answering factoid questions
  Pattern matching on the Web
  Works amazingly well
  Example: Who shot Abraham Lincoln? → X shot Abraham Lincoln
  (Brill et al., TREC 2001; Lin, ACM TOIS 2007)

Learning relations
  Start with seed instances
  Search for patterns on the Web
  Use patterns to find more instances
  Seeds: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
  Text: “Wolfgang Amadeus Mozart (1756 - 1791)”, “Einstein was born in 1879”
  Patterns: “PERSON (DATE –”, “PERSON was born in DATE”
  (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
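The relation-learning loop above can be sketched in a few lines. This is a toy illustration, not the method of the cited papers: the regular expressions are hand-written stand-ins for learned patterns, and the third corpus sentence is an invented extra showing how a pattern finds a new instance.

```python
import re

# Toy corpus standing in for Web text. The first two sentences come from the
# slide; the third is a made-up extra for illustration.
CORPUS = [
    "Wolfgang Amadeus Mozart (1756 - 1791)",
    "Einstein was born in 1879",
    "Darwin was born in 1809",
]

# Surface patterns of the kind shown on the slide: "PERSON (DATE -" and
# "PERSON was born in DATE". Real systems learn these from seed instances.
PATTERNS = [
    re.compile(r"(?P<person>[A-Z][a-z]+) \((?P<date>\d{4}) -"),
    re.compile(r"(?P<person>[A-Z][a-z]+) was born in (?P<date>\d{4})"),
]


def extract_birthdays(sentences):
    """Apply every pattern to every sentence and collect (person, year) pairs."""
    found = set()
    for sentence in sentences:
        for pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                found.add((match.group("person"), int(match.group("date"))))
    return found


for person, year in sorted(extract_birthdays(CORPUS)):
    print(f"Birthday-of({person}, {year})")
```

The patterns recover both seed facts and also pick up the Darwin sentence, which was not among the seeds. With more text, simple patterns like these match more often, which is the point the slide makes about having more data.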


2. Large Data Centers

Web-scale problems? Throw more machines at it!

Clear trend: centralization of computing resources in large data centers
  Necessary ingredients: fiber, juice, and space

Important Issues:
  Redundancy
  Efficiency
  Utilization
  Management


Source: Harper’s (Feb, 2008)

Maximilien Brice, © CERN

Key Technology: Virtualization

Traditional Stack (bottom to top): Hardware, Operating System, Apps (App, App, App)

Virtualized Stack (bottom to top): Hardware, Hypervisor, guest operating systems (OS, OS, OS), each running its own App


3. Different Computing Models

Utility computing
  Why buy machines when you can rent cycles?
  Examples: Amazon’s EC2, GoGrid, AppNexus

Platform as a Service (PaaS)
  Give me a nice API and take care of the implementation
  Examples: Google App Engine, Heroku

Software as a Service (SaaS)
  Just run it for me!
  Example: Gmail

“Why do it yourself if you can pay someone to do it for you?”


4. Web Applications

What is the nature of software applications?
  From the desktop to the browser
  SaaS = Web-based applications
  Examples: Google Maps, Facebook

How do we deliver highly-interactive Web-based applications?
  AJAX (asynchronous JavaScript and XML)
  For better, or for worse…


What is the course about?

MapReduce: the “back-end” of cloud computing
  Batch-oriented processing of large datasets

Ajax: the “front-end” of cloud computing
  Highly-interactive Web-based applications

Computing “in the clouds”
  Amazon’s EC2/S3 as an example of utility computing


Amazon Web Services

Elastic Compute Cloud (EC2)
  Rent computing resources by the hour
  Basic unit of accounting = instance-hour
  Additional costs for bandwidth

Simple Storage Service (S3)
  Persistent storage
  Charge by the GB/month
  Additional costs for bandwidth


Simple Storage Service

Pay for what you use:
  $0.20 per GByte of data transferred
  $0.15 per GByte-Month for storage used
  Second Life Update:
    ○ 1 TByte, 40,000 downloads in 24 hours - $200
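The Second Life figure follows directly from the transfer price. The sketch below reproduces the arithmetic, assuming the 1 TByte is the total data transferred and counts as 1000 GBytes; the function name and parameters are illustrative, and the prices are the historical ones quoted above, not current AWS pricing.

```python
# Prices as quoted on the slide (historical; not current AWS pricing).
TRANSFER_USD_PER_GB = 0.20
STORAGE_USD_PER_GB_MONTH = 0.15


def s3_cost(transferred_gb, stored_gb=0.0, months=0.0):
    """Total cost in USD: bandwidth charge plus storage charge."""
    transfer = transferred_gb * TRANSFER_USD_PER_GB
    storage = stored_gb * months * STORAGE_USD_PER_GB_MONTH
    return transfer + storage


# 1 TByte transferred in total, taken as 1000 GBytes (assumption: decimal units).
print(f"${s3_cost(1000):.2f}")
```

Storage adds little in this scenario: keeping a hypothetical 10 GB of files for one month would cost a further $1.50 at the quoted rate.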


Some cloud providers


Cloud Computing Zen

Don’t get frustrated (take a deep breath)…
  This is bleeding edge technology
  Those W$*#T@F! moments

Be patient…
  This is the second first time I’ve taught this course

Be flexible…
  There will be unanticipated issues along the way

Be constructive…
  Tell me how I can make everyone’s experience better


Source: Wikipedia

Web-Scale Problems?

Don’t hold your breath:
  Biocomputing
  Nanocomputing
  Quantum computing
  …

It all boils down to…
  Divide-and-conquer
  Throwing more hardware at the problem

Simple to understand… a lifetime to master…


Divide and Conquer

[Diagram: “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”; the workers produce partial results r1, r2, r3, which are combined into the “Result”.]
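The partition / work / combine flow can be sketched as follows. This is a minimal illustration using a thread pool and a sum as the "work"; the function names are invented for the example. In standard CPython, threads show the structure rather than giving a CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_parts):
    """Split the work into n_parts roughly equal contiguous slices (w1, w2, ...)."""
    size, extra = divmod(len(work), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(work[start:end])
        start = end
    return parts


def worker(part):
    """Each worker computes a partial result (r1, r2, ...); here, a partial sum."""
    return sum(part)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    parts = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, parts))
    return combine(partials)


print(divide_and_conquer(list(range(1, 101))))  # sum of 1..100 = 5050
```

The scheme works here because addition is associative, so partial sums can be combined in any grouping. Work that lacks this property needs a different combine step.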


Different Workers

Different threads in the same core
Different cores in the same CPU
Different CPUs in a multi-processor system
Different machines in a distributed system


Choices, Choices, Choices

Commodity vs. “exotic” hardware
Number of machines vs. processors vs. cores
Bandwidth of memory vs. disk vs. network
Different programming models


Flynn’s Taxonomy

                     | Instructions: Single (SI)       | Instructions: Multiple (MI)
Data: Single (SD)    | SISD: Single-threaded process   | MISD: Pipeline architecture
Data: Multiple (MD)  | SIMD: Vector processing         | MIMD: Multi-threaded programming


SISD

[Diagram: a single Processor executes one stream of Instructions over one stream of data items (D D D D D D D).]


SIMD

[Diagram: a single Processor applies one stream of Instructions to many data streams at once; each step operates on a whole vector of items D0, D1, D2, D3, D4, …, Dn.]


MIMD

[Diagram: two Processors, each executing its own stream of Instructions over its own stream of data items (D D D D D D D).]


Memory Typology: Shared

[Diagram: four Processors all attached to a single shared Memory.]


Memory Typology: Distributed

[Diagram: four nodes, each a Processor with its own Memory, connected by a Network.]


Memory Typology: Hybrid

[Diagram: four nodes connected by a Network; each node has one Memory shared by two Processors.]


Parallelization Problems

How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?

What is the common theme of all of these problems?


General Theme?

Parallelization problems arise from:
  Communication between workers
  Access to shared resources (e.g., data)

Thus, we need a synchronization system!

This is tricky:
  Finding bugs is hard
  Fixing bugs is even harder


Managing Multiple Workers

Difficult because
  (Often) don’t know the order in which workers run
  (Often) don’t know where the workers are running
  (Often) don’t know when workers interrupt each other

Thus, we need:
  Semaphores (lock, unlock)
  Condition variables (wait, notify, broadcast)
  Barriers

Still, lots of problems:
  Deadlock, livelock, race conditions, ...

Moral of the story: be careful!
  Even trickier if the workers are on different machines
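As a minimal sketch of the three primitives listed above, the example below uses Python's `threading` module: a `Lock` (a mutex, standing in for the semaphore's lock/unlock), a `Barrier`, and a `Condition`. The function name, worker count, and increment count are arbitrary choices for illustration.

```python
import threading


def run_workers(n_workers=4, increments=10_000):
    """Run n_workers threads that each add `increments` to a shared counter."""
    state = {"counter": 0, "finished": 0}
    lock = threading.Lock()                  # mutual exclusion (lock, unlock)
    barrier = threading.Barrier(n_workers)   # rendezvous point for all workers
    done = threading.Condition()             # wait / notify

    def worker():
        for _ in range(increments):
            with lock:                       # without the lock, updates could be lost
                state["counter"] += 1
        barrier.wait()                       # block until every worker has finished counting
        with done:
            state["finished"] += 1
            done.notify_all()                # "broadcast" to whoever is waiting

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    with done:
        # The main thread sleeps until the predicate holds.
        done.wait_for(lambda: state["finished"] == n_workers)
    for t in threads:
        t.join()
    return state["counter"]


print(run_workers())  # 4 workers x 10,000 increments = 40000
```

Everything here runs on one machine with shared memory. None of these primitives work across machines as-is, which is the "even trickier" case the slide mentions.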


Patterns for Parallelism

Parallel computing has been around for decades

Here are some “design patterns” …


Master/Slaves

[Diagram: one master distributing work to several slaves.]


Producer/Consumer Flow

[Diagram: producers (P) feeding consumers (C), shown in two arrangements.]


Work Queues

[Diagram: producers (P) put work items (W W W W W) on a shared queue; consumers (C) take items off it.]
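A shared work queue between producers and consumers might look like the sketch below. It assumes one producer thread, a configurable number of consumer threads, and squaring a number as the placeholder "work"; all names are illustrative.

```python
import queue
import threading


def run_work_queue(items, n_consumers=2):
    """Push items through a shared queue and return the sorted results."""
    work = queue.Queue()              # the shared queue (thread-safe)
    results = []
    results_lock = threading.Lock()
    stop = object()                   # sentinel telling a consumer to exit

    def producer():
        for item in items:
            work.put(item)
        for _ in range(n_consumers):  # one sentinel per consumer
            work.put(stop)

    def consumer():
        while True:
            item = work.get()         # blocks until an item is available
            if item is stop:
                break
            with results_lock:
                results.append(item * item)  # placeholder for real work

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_work_queue(range(5)))  # [0, 1, 4, 9, 16]
```

The queue decouples the two sides: consumers pull items whenever they are free, so the work balances itself without the producer knowing how many consumers exist beyond the number of sentinels to send.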


Rubber Meets Road

From patterns to implementation:
  pthreads, OpenMP for multi-threaded programming
  MPI for cluster computing
  …

The reality:
  Lots of one-off solutions, custom code
  Write your own dedicated library, then program with it
  Burden on the programmer to explicitly manage everything

MapReduce to the rescue!


Map/Reduce (1)

Document

Query

Source: http://ayende.com/blog/4435/map-reduce-a-visual-explanation

Map/Reduce (2)

Query result

Source: http://ayende.com/blog/4435/map-reduce-a-visual-explanation

Map/Reduce (3)

Reduce

Source: http://ayende.com/blog/4435/map-reduce-a-visual-explanation

Map/Reduce (4)

Reduce…

Source: http://ayende.com/blog/4435/map-reduce-a-visual-explanation

Map/Reduce (5)

Reduce…

Source: http://ayende.com/blog/4435/map-reduce-a-visual-explanation
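Since the Map/Reduce slides are image-only, the following single-process sketch may help fix the idea. It uses word counting as the example, which is an assumption and not necessarily what the pictures show. The function names are illustrative, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


docs = {"d1": "the cloud", "d2": "the back end of the cloud"}
print(map_reduce(docs))
```

Each `map_phase` call depends only on its own document and each `reduce_phase` call only on its own key, so both phases can be spread across workers without the synchronization issues discussed earlier.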


Map/Reduce performance

Sorting 10^10 100-byte records (~1 TB)

Source: http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf