Survey on Programming and Tasking in Cloud Computing Environments


Page 1: Survey on Programming and Tasking in Cloud Computing Environments

Survey on Programming and Tasking in Cloud Computing Environments

PhD Qualifying Exam
Zhiqiang Ma
Supervisor: Lin Gu
Feb. 18, 2011

Page 2: Survey on Programming and Tasking in Cloud Computing Environments

Outline
Introduction
Approaches
  Application framework level approach
  Language level approach
  Instruction level approach
Our work: MRlite
Conclusion

2

Page 3: Survey on Programming and Tasking in Cloud Computing Environments

Cloud computing
Internet services are the most popular applications nowadays
  Millions of users
  Computation is large and complex
  Google already processed 20 TB of data in 2004
Cloud computing provides massive computing resources
  Available on demand

3

A promising model to support processing large datasets housed on clusters

Page 4: Survey on Programming and Tasking in Cloud Computing Environments

How to program and task?
Challenges
  Parallelizing the execution
  Scheduling the large-scale distributed computation
  Handling faults
  Achieving high performance
  Ensuring fairness
Programming models for the Grid
  Do not automatically parallelize users’ programs
  Pass the fault-tolerance work to applications

4

Page 5: Survey on Programming and Tasking in Cloud Computing Environments

Outline
Introduction
Approaches
  Application framework level approach
  Language level approach
  Instruction level approach
Our work: MRlite
Conclusion

5

Page 6: Survey on Programming and Tasking in Cloud Computing Environments

Approaches

6

Approach | Advantage | Disadvantage
Application framework level | |
Language level | |
Instruction level | |

Page 7: Survey on Programming and Tasking in Cloud Computing Environments

MapReduce
MapReduce: a parallel computing framework for large-scale data processing
  Successfully used in datacenters comprising commodity computers
  A fundamental piece of software in the Google architecture for many years
  An open-source variant already exists: Hadoop
  Widely used in solving data-intensive problems

7


Page 8: Survey on Programming and Tasking in Cloud Computing Environments

MapReduce
Map and Reduce are higher-order functions
  Map: apply an operation to all elements in a list
  Reduce: like “fold”; aggregate the elements of a list

8

Example: compute 1² + 2² + 3² + 4² + 5²
  Map (m: x → x²) over the list 1 2 3 4 5 produces 1 4 9 16 25
  Reduce (r: +) folds the mapped values, starting from the initial value 0, through the partial sums 1, 5, 14, 30 to the final value 55
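As a minimal illustration (not code from the slides), the same computation can be expressed with Python’s built-in higher-order functions; map applies the squaring operation to every element, and reduce folds the results with +:

from functools import reduce

values = [1, 2, 3, 4, 5]
squares = map(lambda x: x * x, values)              # m: x -> x^2, applied to all elements
total = reduce(lambda acc, x: acc + x, squares, 0)  # r: +, folded from the initial value 0
print(total)                                        # 55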

Page 9: Survey on Programming and Tasking in Cloud Computing Environments

MapReduce’s data flow

9

Page 10: Survey on Programming and Tasking in Cloud Computing Environments

MapReduce
Massive parallel processing made simple
Example: word count
  Map: parse a document and generate <word, 1> pairs
  Reduce: receive all pairs for a specific word, and count

10

Map:
// D is a document
for each word w in D
  output <w, 1>

Reduce for key w:
count = 0
for each input item
  count = count + 1
output <w, count>
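A minimal single-process sketch of the same word-count logic in Python (illustrative only; a real Hadoop job implements Map and Reduce inside the framework, which also performs the shuffle between them):

from collections import defaultdict

def map_doc(doc):
    # Map: parse a document and emit a <word, 1> pair for every word
    return [(word, 1) for word in doc.split()]

def reduce_word(word, values):
    # Reduce: all pairs for a specific word arrive together; count them
    return (word, sum(values))

documents = ["the quick brown fox", "the lazy dog"]
grouped = defaultdict(list)
for doc in documents:
    for word, one in map_doc(doc):   # map phase
        grouped[word].append(one)    # shuffle: group intermediate pairs by key
counts = [reduce_word(w, vals) for w, vals in grouped.items()]  # reduce phase
print(counts)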

Page 11: Survey on Programming and Tasking in Cloud Computing Environments

MapReduce easily scales up

11

Figure: input files → map phase → intermediate files → reduce phase → output files.

Page 12: Survey on Programming and Tasking in Cloud Computing Environments

12

MapReduce

Figure: input → computation (MapReduce) → output.

Page 13: Survey on Programming and Tasking in Cloud Computing Environments

Dryad
A general-purpose execution environment for distributed, data-parallel applications
  Concentrates on throughput, not latency
An application written in Dryad is modeled as a directed acyclic graph (DAG)
  Many programs can be represented as a distributed execution graph

13

Page 14: Survey on Programming and Tasking in Cloud Computing Environments

Dryad

14

Figure: inputs and outputs flow through processing vertices connected by channels (file, pipe, or shared memory).

Page 15: Survey on Programming and Tasking in Cloud Computing Environments

Dryad
Concurrency arises from vertices running simultaneously across multiple machines
  Vertex subroutines are usually quite simple sequential programs
Users have control over the communication graph
  Each vertex can have multiple inputs and outputs

15
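To make the DAG model concrete, here is an illustrative Python sketch (not Dryad’s actual API, and the job shown is hypothetical) of a program expressed as simple sequential vertices joined by channels:

class Vertex:
    # A vertex runs a simple sequential subroutine over the data on its input channels
    def __init__(self, name, func):
        self.name, self.func, self.inputs = name, func, []

def channel(src, dst):
    # A channel (a file, pipe, or shared-memory segment in Dryad) connects two vertices
    dst.inputs.append(src)

def run(vertex):
    # Execute the graph in dependency order: a vertex runs after all of its inputs
    return vertex.func([run(v) for v in vertex.inputs])

# Hypothetical job: two readers feed a merge vertex, whose output is then sorted
read_a = Vertex("read_a", lambda _: ["banana", "apple"])
read_b = Vertex("read_b", lambda _: ["cherry"])
merge = Vertex("merge", lambda ins: [x for part in ins for x in part])
sort = Vertex("sort", lambda ins: sorted(ins[0]))
channel(read_a, merge)
channel(read_b, merge)
channel(merge, sort)
print(run(sort))   # ['apple', 'banana', 'cherry']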

Page 16: Survey on Programming and Tasking in Cloud Computing Environments

Approaches

16

Approach | Advantage | Disadvantage
Application framework level | Users are relieved of the details of distributing the execution; automatically parallelizes users’ programs | Programs must follow the specific model
Language level | |
Instruction level | |

Page 17: Survey on Programming and Tasking in Cloud Computing Environments

Tasking of execution
Performance
  Locality is crucial
  Speculative execution
Fairness
  The same cluster is shared by multiple users
  Small jobs require short response times while throughput is important for big jobs
Correctness
  Fault tolerance

17

Page 18: Survey on Programming and Tasking in Cloud Computing Environments

Locality and fairness
Locality is crucial
  Bandwidth is a scarce resource
  Input data, with duplications, are stored in the same cluster that performs the executions
Fairness
  Short jobs require short response times

18

Locality and fairness conflict with each other

Page 19: Survey on Programming and Tasking in Cloud Computing Environments

FIFO scheduler in Hadoop
Jobs are kept in a queue in priority order
  FIFO by default
When there are available slots
  Assign slots to tasks that have local data, in priority order
  Limit the assignment of non-local tasks to optimize locality

19
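A simplified sketch of this policy in Python (my own reconstruction of the described behavior, not Hadoop’s implementation; the job and task dictionaries are made up for illustration): free slots are offered to jobs in priority order, local tasks are preferred, and only a limited number of non-local tasks may be launched per pass:

def pick_task(job, node, allow_nonlocal):
    # Prefer a task whose input data already lives on this node
    for task in job["pending"]:
        if node in task["data_nodes"]:
            job["pending"].remove(task)
            return task
    if allow_nonlocal and job["pending"]:
        return job["pending"].pop(0)      # fall back to a non-local task
    return None

def fifo_assign(free_nodes, job_queue, max_nonlocal=1):
    assignments, nonlocal_used = [], 0
    for node in free_nodes:
        for job in job_queue:             # jobs are scanned in priority (FIFO) order
            task = pick_task(job, node, nonlocal_used < max_nonlocal)
            if task is None:
                continue
            if node not in task["data_nodes"]:
                nonlocal_used += 1        # limit non-local launches to optimize locality
            assignments.append((node, job["name"], task["id"]))
            break
    return assignments

jobs = [{"name": "job1", "pending": [{"id": 1, "data_nodes": {"node1"}}]},
        {"name": "job2", "pending": [{"id": 1, "data_nodes": {"node4"}}]}]
print(fifo_assign(["node1", "node2"], jobs))
# [('node1', 'job1', 1), ('node2', 'job2', 1)] -- the second launch is non-local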

Page 20: Survey on Programming and Tasking in Cloud Computing Environments

FIFO scheduler

20

Figure: a job queue with a 2-task job and a 1-task job dispatched across Node 1–Node 4.

Page 21: Survey on Programming and Tasking in Cloud Computing Environments

FIFO scheduler – locality optimization

21

Figure: a job queue with a 4-task job and a 1-task job; one node is far away in the network topology, and only one non-local task is dispatched at a time.

Page 22: Survey on Programming and Tasking in Cloud Computing Environments

Problem: fairness

22

Figure: two 3-task jobs in the queue scheduled across Node 1–Node 4, illustrating the fairness problem.

Page 23: Survey on Programming and Tasking in Cloud Computing Environments

Problem: response time

23

Figure: a small job with only 1 task waits behind two 3-task jobs in the queue across Node 1–Node 4, illustrating the response-time problem.

Page 24: Survey on Programming and Tasking in Cloud Computing Environments

Fair scheduling
Assign free slots to the job that has the fewest running tasks
Strict fairness
  Running jobs get nearly equal numbers of slots
  Small jobs finish quickly

24
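A minimal sketch of the core rule (my own illustration, not the Hadoop Fair Scheduler code): every free slot goes to the job that currently has the fewest running tasks:

def fair_assign(free_slots, running):
    # running: dict mapping job name -> number of currently running tasks
    assignments = []
    for _ in range(free_slots):
        job = min(running, key=running.get)   # job with the fewest running tasks
        running[job] += 1
        assignments.append(job)
    return assignments

print(fair_assign(4, {"big_job": 6, "small_job": 0}))
# ['small_job', 'small_job', 'small_job', 'small_job'] -- the small job gets slots first and finishes quickly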

Page 25: Survey on Programming and Tasking in Cloud Computing Environments

Fair Scheduling

25

Figure: the job queue and Node 1–Node 4 under fair scheduling.

Page 26: Survey on Programming and Tasking in Cloud Computing Environments

Problem: locality

26

Figure: the job queue and Node 1–Node 4, illustrating the locality problem under strict fair scheduling.

Page 27: Survey on Programming and Tasking in Cloud Computing Environments

Delay Scheduling
Skip a job that cannot launch a local task
  Relaxes fairness slightly
Allow a job to launch non-local tasks if it has been skipped long enough
  Avoids starvation

27
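A compact sketch of delay scheduling (illustrative Python, not the actual scheduler code; the job dictionaries are invented for the example): a job is skipped while it cannot launch a local task, and once it has been skipped beyond a threshold it may launch a non-local task:

def delay_schedule(node, jobs_in_fair_order, skip_threshold=2):
    # each job records the nodes holding its pending input data and how often it was skipped
    for job in jobs_in_fair_order:
        if node in job["local_nodes"]:
            job["skipcount"] = 0
            return (job["name"], "local")
        if job["skipcount"] >= skip_threshold:
            return (job["name"], "non-local")   # skipped long enough: relax locality to avoid starvation
        job["skipcount"] += 1                   # skip this job for now, relaxing fairness slightly
    return None

jobs = [{"name": "jobA", "local_nodes": {"node3"}, "skipcount": 0},
        {"name": "jobB", "local_nodes": {"node1"}, "skipcount": 0}]
for free_node in ["node1", "node1", "node1"]:
    print(delay_schedule(free_node, jobs))
# ('jobB', 'local'), ('jobB', 'local'), ('jobA', 'non-local')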

Page 28: Survey on Programming and Tasking in Cloud Computing Environments

Delay Scheduling

28

Figure: delay scheduling on Node 1–Node 4 with a skip-count threshold of 2; the skipped job’s skip count grows (0, 1, 2) until it may launch a non-local task. The waiting time is short: tasks finish quickly, and the skipped job stays at the head of the queue.

Page 29: Survey on Programming and Tasking in Cloud Computing Environments

“Fault” Tolerance
Nodes fail
  Re-run tasks
Nodes are slow (stragglers)
  Run backup tasks (speculative execution)
  To minimize the job’s response time
    Important for short jobs

29

Page 30: Survey on Programming and Tasking in Cloud Computing Environments

Speculative execution
The scheduler schedules backup executions of the remaining in-progress tasks
The task is marked as completed whenever either the primary or the backup execution completes
Improves job response time by 44% according to Google’s experiments

30

Page 31: Survey on Programming and Tasking in Cloud Computing Environments

Speculative execution mechanism
Seems a simple problem, but
  Resources for speculative tasks are not free
  How to choose nodes to run speculative tasks?
  How to distinguish “stragglers” from nodes that are slightly slower?
    Stragglers should be found out early

31

Page 32: Survey on Programming and Tasking in Cloud Computing Environments

Hadoop’s scheduler
Starts speculative tasks based on a simple heuristic
  Comparing each task’s progress to the average
Assumes a homogeneous environment
  The default scheduler works well there
  Broken in utility computing
    Virtualized “utility computing” environments, such as EC2

32

How to robustly perform speculative execution (backup tasks) in heterogeneous environments?

Page 33: Survey on Programming and Tasking in Cloud Computing Environments

Speculative execution in Hadoop
When there are no “higher priority” tasks, looks for a task to execute speculatively
  Assumption: there is no cost to launching a speculative task
Compares each task’s progress to the average progress
  Assumption: nodes perform similarly (“a slow node is faulty”; “nodes that ask for new tasks are fast”)
  In “utility computing”, nodes may be slightly (2-3x) slower, which may not hurt the response time, or may ask for tasks without being fast

33

Page 34: Survey on Programming and Tasking in Cloud Computing Environments

Speculative execution in Hadoop
Threshold for speculative execution
  (Average progress score of each category of tasks) – 0.2
  Tasks below the threshold are “equally slow”
  Candidates are ranked by locality
Wrong tasks may be chosen
  A 35%-completed, 2x-slower task with data available on an idle node, or a 5%-completed, 10x-slower task?
Too many speculative tasks and thrashing
  Taking away resources from useful tasks

34
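A sketch of this heuristic (my own reconstruction of the behavior described above, not Hadoop source code): every running task whose progress score falls more than 0.2 below its category’s average becomes a speculation candidate, and candidates are then ranked by locality:

def speculation_candidates(tasks, requesting_node):
    # tasks: dicts with "id", "progress" (a 0..1 progress score) and "data_nodes"
    average = sum(t["progress"] for t in tasks) / len(tasks)
    threshold = average - 0.2                                 # fixed 20% gap below the average
    slow = [t for t in tasks if t["progress"] < threshold]    # all of these are treated as "equally slow"
    slow.sort(key=lambda t: requesting_node not in t["data_nodes"])   # rank by locality
    return [t["id"] for t in slow]

tasks = [{"id": "t1", "progress": 0.8, "data_nodes": {"n1"}},
         {"id": "t2", "progress": 0.8, "data_nodes": {"n1"}},
         {"id": "t3", "progress": 0.8, "data_nodes": {"n1"}},
         {"id": "t4", "progress": 0.35, "data_nodes": {"n2"}},   # 2x-slower task, data on the idle node
         {"id": "t5", "progress": 0.05, "data_nodes": {"n9"}}]   # 10x-slower task, data far away
print(speculation_candidates(tasks, "n2"))
# ['t4', 't5'] -- both are "equally slow", and locality picks the less harmful one first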

Page 35: Survey on Programming and Tasking in Cloud Computing Environments

Speculative execution in Hadoop
Progress score
  Map: fraction of input data processed
  Reduce: three phases (1/3 each) plus the fraction of data processed in the current phase
Incorrect speculation of reduce tasks
  The copy phase takes most of the time, but accounts for only 1/3
  If 30% of tasks finish quickly and 70% are in the copy phase:
    Avg. progress score = 30%*1 + 70%*1/3 ≈ 53%, threshold ≈ 33%

35
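The arithmetic above can be reproduced with a small sketch (my own illustration of the scoring rule described on this slide): a reduce task’s score is the number of completed phases plus the fraction of the current phase, divided by 3:

def reduce_progress_score(phases_done, fraction_of_current_phase):
    # the copy, sort and reduce phases each account for 1/3 of the progress score
    return (phases_done + fraction_of_current_phase) / 3.0

# 30% of the reduce tasks have finished (score 1.0); 70% have just completed copying (score 1/3)
avg = 0.30 * 1.0 + 0.70 * reduce_progress_score(0, 1.0)
threshold = avg - 0.2
print(round(avg, 2), round(threshold, 2))   # 0.53 0.33
# A score of 1/3 is not below the 0.33 threshold, so none of the copy-phase tasks is speculated,
# even though the copy phase is where most of the time is actually spent.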

Page 36: Survey on Programming and Tasking in Cloud Computing Environments

LATE
Longest Approximate Time to End
Principles
  Rank candidates by longest time to end
    Choose the task that really hurts the job’s response time; slow nodes can be utilized as long as that doesn’t hurt the response time
  Only launch speculative tasks on fast nodes
    Not every node that asks for a task is fast
  Cap speculative tasks
    Limit resource contention and thrashing

36

Page 37: Survey on Programming and Tasking in Cloud Computing Environments

LATE algorithm
If a node asks for a new task and there are fewer than SpeculativeCap speculative tasks running:
  Ignore the request if the node’s total progress is below SlowNodeThreshold
  Rank currently running tasks by estimated time left
  Launch a copy of the highest-ranked task with progress rate below SlowTaskThreshold

37

(These steps correspond to the three principles: cap speculative tasks, only launch speculative tasks on fast nodes, and rank candidates by longest time to end.)
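Putting the steps together, here is an illustrative Python sketch (my own reconstruction, not the authors’ implementation; the threshold values are placeholders, and the estimated time left uses the formula from the appendix, (1 − progress score) / progress rate):

SPECULATIVE_CAP = 2          # at most this many speculative tasks in the cluster
SLOW_NODE_THRESHOLD = 0.5    # ignore requests from nodes whose total progress is below this
SLOW_TASK_THRESHOLD = 0.05   # only speculate tasks whose progress rate is below this

def estimated_time_left(task):
    progress_rate = task["progress"] / task["elapsed"]
    return (1.0 - task["progress"]) / progress_rate

def late_pick(node_progress, running_tasks, speculative_running):
    if speculative_running >= SPECULATIVE_CAP:
        return None                                    # cap speculative tasks
    if node_progress < SLOW_NODE_THRESHOLD:
        return None                                    # only launch speculative tasks on fast nodes
    for task in sorted(running_tasks, key=estimated_time_left, reverse=True):
        if task["progress"] / task["elapsed"] < SLOW_TASK_THRESHOLD:
            return task["id"]                          # highest-ranked sufficiently slow task
    return None

tasks = [{"id": "t1", "progress": 0.9, "elapsed": 10.0},   # rate 0.09/s, about 1.1 s left
         {"id": "t2", "progress": 0.2, "elapsed": 10.0}]   # rate 0.02/s, about 40 s left
print(late_pick(node_progress=0.8, running_tasks=tasks, speculative_running=0))   # t2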

Page 38: Survey on Programming and Tasking in Cloud Computing Environments

Approaches

38

Approach | Advantage | Disadvantage
Application framework level | Users are relieved of the details of distributing the execution; automatically parallelizes users’ programs | Programs must follow the specific model
Language level | |
Instruction level | |

Language level

Page 39: Survey on Programming and Tasking in Cloud Computing Environments

Language level approach
Programming frameworks
  Still not clear and compact enough
Traditional programming languages
  Give no special focus to high parallelism on large computing clusters
New languages
  Clear, compact and expressive
  Automatically parallelize “normal” programs
  A comfortable way for users to think about data processing problems on large distributed datasets

39

Page 40: Survey on Programming and Tasking in Cloud Computing Environments

Sawzall
An interpreted, procedural, high-level programming language
  Exploits high parallelism
  Automates the analysis of very large data sets
  Gives users a way to clearly and expressively design distributed data processing programs

40

Page 41: Survey on Programming and Tasking in Cloud Computing Environments

Overall flow
Filtering (the Map step)
  Analyzes each record individually
  Expressed in Sawzall
Aggregation (the Reduce step)
  Collates and reduces the intermediate values
  Predefined aggregators

41

Page 42: Survey on Programming and Tasking in Cloud Computing Environments

An example

42

max_pagerank_url: table maximum(1)[domain:string] of url:string weight pagerank:int;

doc:Document = input;

emit max_pagerank_url[domain(doc.url)] <- doc.url weight doc.pagerank;

Find out the most-linked-to page of each domain

Aggregator: keeps the highest value; stores a url
Indexed by domain; weighted by pagerank
input: a pre-defined variable initialized by Sawzall, interpreted here as a Document
emit: sends an intermediate value to the aggregator
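For readers unfamiliar with Sawzall, roughly the same per-domain aggregation can be sketched in Python (an illustrative analogue only, not how Sawzall programs are executed): the filtering step emits (domain, url, pagerank) for each record, and the aggregator keeps only the highest-weighted url per domain:

from urllib.parse import urlparse

def filter_record(doc):
    # filtering step: runs on one record at a time and emits to the aggregator
    return (urlparse(doc["url"]).netloc, doc["url"], doc["pagerank"])

def aggregate(emitted):
    # maximum(1) table indexed by domain: keep the url with the highest pagerank weight
    best = {}
    for domain, url, weight in emitted:
        if domain not in best or weight > best[domain][1]:
            best[domain] = (url, weight)
    return {domain: url for domain, (url, _) in best.items()}

docs = [{"url": "http://a.com/x", "pagerank": 3},
        {"url": "http://a.com/y", "pagerank": 7},
        {"url": "http://b.com/z", "pagerank": 1}]
print(aggregate(filter_record(d) for d in docs))
# {'a.com': 'http://a.com/y', 'b.com': 'http://b.com/z'}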

Page 43: Survey on Programming and Tasking in Cloud Computing Environments

Unusual features
Sawzall runs on one record at a time
  Nothing in the language lets one input record influence another
The emit statement is the only output primitive
  An explicit line between filtering and aggregation

43

This enables a high degree of parallelism, even though it is hidden from the language

Page 44: Survey on Programming and Tasking in Cloud Computing Environments

Approaches

44

Approach | Advantage | Disadvantage
Application framework level | Users are relieved of the details of distributing the execution; automatically parallelizes users’ programs | Programs must follow the specific model
Language level | Clearer, more expressive; a comfortable way of programming | More restrictive programming model
Instruction level | |

Page 45: Survey on Programming and Tasking in Cloud Computing Environments

Instruction level approach
Provides instruction-level abstraction and compatibility to users’ applications
  May choose a traditional ISA such as x86/x86-64
  Runs traditional applications without any modification
Easier to migrate applications to cloud computing environments

45

Page 46: Survey on Programming and Tasking in Cloud Computing Environments

Amazon Elastic Compute Cloud (EC2)
Provides virtual machines that run traditional OSs
  Traditional programs can work on EC2
Amazon Machine Image (AMI)
  Used to boot instances
  The unit of deployment: a packaged-up environment
  Users design and implement the application logic in an AMI; EC2 handles the deployment and resource allocation

46

Page 47: Survey on Programming and Tasking in Cloud Computing Environments

vNUMA
A virtual shared-memory multiprocessor machine built from commodity workstations
Makes the computational power available to legacy applications and OSs

47

Figure: conventional virtualization runs several VMs on one physical machine (PM); vNUMA runs a single VM that spans several PMs.

Page 48: Survey on Programming and Tasking in Cloud Computing Environments

48

Architecture
Hypervisor
  Runs on each node
CPU
  Virtual CPUs are mapped to real CPUs on the nodes
Memory
  Divided between the nodes in equal-sized portions
  Each node manages a subset of the pages

Page 49: Survey on Programming and Tasking in Cloud Computing Environments

49

Memory mapping

Figure: the application issues read *a, where a is an address in the application’s virtual memory; the OS translates a to the VM’s physical memory address b; the VMM maps b to the real physical address c on one of the nodes and finds *c.
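A toy sketch of this two-level translation (my own illustration with made-up page tables, not vNUMA’s implementation): the guest OS maps an application virtual address to a VM physical address, and the hypervisor maps that VM physical address to a (node, real physical address) pair:

PAGE = 4096

# guest OS page table: application virtual page -> VM physical page
guest_page_table = {0x10: 0x2A}
# hypervisor map: VM physical page -> (node, real physical page); memory is divided among the nodes
vmm_page_map = {0x2A: ("node2", 0x07)}

def translate(app_virtual_addr):
    page, offset = divmod(app_virtual_addr, PAGE)
    vm_physical_page = guest_page_table[page]          # step 1: a -> b (VM physical address)
    node, real_page = vmm_page_map[vm_physical_page]   # step 2: b -> c on some node
    return node, real_page * PAGE + offset

print(translate(0x10 * PAGE + 0x123))
# ('node2', 28963) -- i.e. offset 0x123 within real page 0x07, served by node2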

Page 50: Survey on Programming and Tasking in Cloud Computing Environments

Approaches

50

Approach | Advantage | Disadvantage
Application framework level | Users are relieved of the details of distributing the execution; automatically parallelizes users’ programs | Programs must follow the specific model
Language level | Clearer, more expressive; a comfortable way of programming | More restrictive programming model
Instruction level | Supports traditional applications | Users handle the tasking; hard to scale up

Page 51: Survey on Programming and Tasking in Cloud Computing Environments

Outline
Introduction
Approaches
  Application framework level approach
  Language level approach
  Instruction level approach
Our work: MRlite
Conclusion

51

Page 52: Survey on Programming and Tasking in Cloud Computing Environments

Our work
Analyze MapReduce’s design and use a case study to probe its limitations
  One-way scalability
  Difficult to handle dynamic, interactive and semantic-rich applications
Design a new parallelization framework – MRlite
  Able to scale “up” like MapReduce, and to scale “down” to process moderate-size data
  Low latency and massive parallelism
  Small run-time system overhead

52

Design a general parallelization framework and programming paradigm for cloud computing

Page 53: Survey on Programming and Tasking in Cloud Computing Environments

Architecture of MRlite

53

Figure: an application is linked with the MRlite client; the MRlite master (scheduler) coordinates a set of slaves; data flow and command flow are shown separately.
  MRlite client: linked together with the application, the client library accepts calls from the application and submits jobs to the master.
  MRlite master: accepts jobs from clients and schedules them to execute on the slaves.
  Slaves: distributed nodes that accept tasks from the master and execute them.
  High-speed distributed storage stores the intermediate files.

Page 54: Survey on Programming and Tasking in Cloud Computing Environments

Result

54

The evaluation shows that MRlite is one order of magnitude faster than Hadoop on problems that MapReduce has difficulty in handling.

Page 55: Survey on Programming and Tasking in Cloud Computing Environments

Outline
Introduction
Approaches
  Application framework level approach
  Language level approach
  Instruction level approach
Our work: MRlite
Conclusion

55

Page 56: Survey on Programming and Tasking in Cloud Computing Environments

Conclusion
Cloud computing needs a general programming framework
  Cloud computing shall not be a platform that runs just simple OLAP applications; it is important to support complex computation and even OLTP on large data sets
Design of MRlite: a general parallelization framework for cloud computing
  Handles applications with complex logic flow and data dependencies
  Mitigates the one-way scalability problem
  Able to handle all MapReduce tasks with comparable (if not better) performance

56

Page 57: Survey on Programming and Tasking in Cloud Computing Environments

Conclusion
Emerging computing platforms increasingly emphasize parallelization capability, such as GPGPU
MRlite respects applications’ natural logic flow and data dependencies
This modularization of parallelization capability from application logic enables MRlite to integrate GPGPU processing very easily (future work)

57

Page 58: Survey on Programming and Tasking in Cloud Computing Environments

Thank you!

Page 59: Survey on Programming and Tasking in Cloud Computing Environments

Appendix

Page 60: Survey on Programming and Tasking in Cloud Computing Environments

LATE: Estimate finish times

60

progress rate = progress score / execution time

estimated time left = (1 – progress score) / progress rate = (1 / progress score – 1) × execution time

The smaller the progress score, the longer the estimated time left.

Appendix

Page 61: Survey on Programming and Tasking in Cloud Computing Environments

LATE: Solving the problems in Hadoop’s default scheduler
  Nodes may be slightly (2-3x) slower in “utility computing”, which may not hurt the response time, or may ask for tasks without being fast
  Too many speculative tasks and thrashing
  Candidates ranked by locality
  Wrong tasks may be chosen
  Incorrect speculation of reducers

61

Appendix