
Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs

Lu Fang 1, Khanh Nguyen 1, Guoqing (Harry) Xu 1, Brian Demsky 1, Shan Lu 2

1 University of California, Irvine
2 University of Chicago

SOSP'15, October 7, 2015, Monterey, California, USA


Motivation

Data-parallel systems
- Input data are divided into independent partitions
- Many popular big data systems
  - Memory pressure on single nodes

Our study
- Searched for "out of memory" and "data parallel" on Stack Overflow
- Collected 126 related problems


Memory Pressure in the Real World

Memory pressure on individual nodes
- Executions push the heap limit (under managed languages)
- Data-parallel systems struggle for memory

[Figure: memory consumption over execution time against the heap size. As the execution approaches the heap limit it suffers long and useless GCs (SLOW: huge GC effort) and may reach the OutOfMemoryError point (CRASH).]


Root Cause 1: Hot Keys

Key-value pairs

Popular keys have many associated values

Case study (from Stack Overflow)
- Process Stack Overflow posts
- Some posts are long and popular
- Many tasks process the long and popular posts


Root Cause 2: Large Intermediate Results

Temporary data structures

Case study (from Stack Overflow)
- Use an NLP library to process customer reviews
- Some reviews are quite long
- The NLP library creates giant temporary data structures for long reviews


Existing Solutions

More memory? Not really!
- Data volume doubles every two years [http://goo.gl/tM92i0]
- Memory capacity doubles every three years [http://goo.gl/50Rrgk]

Application-level solutions
- Configuration tuning
- Skew fixing

System-level solutions
- Cluster-wide resource managers, such as YARN

We need a systematic and effective solution!


Our Solution

Interruptible Tasks (ITasks): treat memory pressure as an interrupt

Dynamically change the degree of parallelism

Why Does Our Technique Help

[Animated figure: memory consumption over execution time against the heap size limit, with several running tasks, each holding some consumed memory.]

1. The program starts with multiple tasks.
2. The program pushes the heap limit.
3. Long and useless GCs begin; left alone, the execution would end with an OutOfMemoryError.
4. The long and useless GCs are detected, and the runtime starts interrupting (killing) tasks.
5. Interrupted tasks release memory, and the memory pressure is gone:
   - The killed task's consumed memory (local data structures): released.
   - Unprocessed input: kept in memory, can be serialized.
   - Intermediate results: kept in memory, can be serialized.
   - Final results: pushed out and released.
6. The program keeps executing without memory pressure.
7. If there is enough memory, the runtime increases the degree of parallelism by creating new tasks.

Challenges

How to expose semantics → a programming model

How to interrupt/reactivate tasks → a runtime system


The Programming Model

A unified representation of input/output
- Separate processed and unprocessed input
- Specify how to serialize and deserialize

A definition of an interruptible task
- Safely interrupt tasks
- Specify the actions taken when an interrupt happens
- Merge the intermediate results


Representing Input/Output as DataPartitions

- How to separate processed and unprocessed input
- How to serialize and deserialize the data

1. A cursor points to the first unprocessed tuple
2. Users implement the serialize and deserialize methods

DataPartition abstract class:

    // The DataPartition abstract class
    abstract class DataPartition {
        // Some fields and methods
        ...
        // A cursor points to the first unprocessed tuple
        int cursor;
        // Serialize the DataPartition
        abstract void serialize();
        // Deserialize the DataPartition
        abstract DataPartition deserialize();
    }

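To make the abstraction concrete, here is a minimal sketch of a user-defined partition, assuming the DataPartition class above (with its cursor field and the hasNext() used later by scaleLoop). The LinePartition name, the line-array representation, the next() accessor, and the spill-file handling are all illustrative, not part of the ITask library.

    // Hypothetical example: a DataPartition over an array of text lines.
    // Only cursor, serialize, and deserialize come from the slides;
    // the rest (field names, spill format) is illustrative.
    class LinePartition extends DataPartition {
        private final String[] lines;     // the tuples in this partition
        private final String spillPath;   // where to spill on serialization

        LinePartition(String[] lines, String spillPath) {
            this.lines = lines;
            this.spillPath = spillPath;
            this.cursor = 0;              // nothing processed yet
        }

        boolean hasNext() { return cursor < lines.length; }

        String next() { return lines[cursor++]; }  // advances past processed tuples

        @Override
        void serialize() {
            // Spill only the unprocessed tail, starting at cursor.
            try (java.io.PrintWriter w = new java.io.PrintWriter(spillPath)) {
                for (int i = cursor; i < lines.length; i++) w.println(lines[i]);
            } catch (java.io.FileNotFoundException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        DataPartition deserialize() {
            // Reload the spilled tuples as a fresh partition.
            try {
                java.util.List<String> ls = java.nio.file.Files.readAllLines(
                        java.nio.file.Paths.get(spillPath));
                return new LinePartition(ls.toArray(new String[0]), spillPath);
            } catch (java.io.IOException e) {
                throw new RuntimeException(e);
            }
        }
    }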

Defining an ITask

- What actions should be taken when an interrupt happens
- How to safely interrupt a task

1. In interrupt, we define how to deal with partial results
2. Tasks are always interrupted at the beginning of a scaleLoop iteration

ITask abstract class:

    // The ITask abstract class in the library
    abstract class ITask {
        // Some methods
        ...
        abstract void interrupt();

        boolean scaleLoop(DataPartition dp) {
            // Iterate over dp and process each tuple
            while (dp.hasNext()) {
                // If memory pressure occurs, interrupt
                if (HasMemoryPressure()) {
                    interrupt();
                    return false;
                }
                process();
            }
            return true;
        }
    }

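As an illustration of where interrupt() fits, here is a minimal sketch of a word-counting ITask over the hypothetical LinePartition from the earlier sketch. Treating process() as the per-tuple hook a subclass provides is an assumption, and the spill helper is illustrative; only interrupt() and the scaleLoop contract come from the slides.

    // Hypothetical example: an ITask that tallies word frequencies.
    class WordCountTask extends ITask {
        private final java.util.HashMap<String, Integer> counts = new java.util.HashMap<>();
        private final LinePartition input;   // the partition being consumed

        WordCountTask(LinePartition input) { this.input = input; }

        // Per-tuple work: count the words on one line.
        // Assumption: process() is the hook scaleLoop calls for each tuple.
        void process() {
            for (String w : input.next().split("\\s+")) {
                counts.merge(w, 1, Integer::sum);
            }
        }

        @Override
        void interrupt() {
            // Deal with partial results: spill the unprocessed input and the
            // partial counts so they can be resumed or merged later.
            input.serialize();        // unprocessed tuples go to disk
            spillPartialCounts();     // illustrative helper, not a library call
            counts.clear();           // release this task's local memory
        }

        private void spillPartialCounts() {
            // Illustrative: a real ITask would hand the partial map to the
            // partition manager as an intermediate DataPartition.
            System.err.println("spilling " + counts.size() + " partial counts");
        }
    }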

Multiple Input for an ITask

- How to merge intermediate results

1. scaleLoop takes a PartitionIterator as input

MITask abstract class:

    // The MITask abstract class in the library
    abstract class MITask extends ITask {
        // Most parts are the same as ITask
        ...
        // Only difference: scaleLoop takes a PartitionIterator
        boolean scaleLoop(PartitionIterator<DataPartition> i) {
            // Iterate over partitions through the iterator
            while (i.hasNext()) {
                DataPartition dp = (DataPartition) i.next();
                // Iterate over all the data tuples in this partition
                ...
            }
            return true;
        }
    }


ITask WordCount on Hyracks

[Dataflow diagram omitted: n Map Operators, Shuffling, Reduce Operators, Merge Operators, and final results, with input read from and output written back to HDFS.]

MapOperator:

    class MapOperator extends ITask
            implements HyracksOperator {
        void interrupt() {
            // Push out final results to shuffling
            ...
        }
        // Some other fields and methods
        ...
    }

ReduceOperator:

    class ReduceOperator extends ITask
            implements HyracksOperator {
        void interrupt() {
            // Tag the results;
            // output them as intermediate results
            ...
        }
        // Some other fields and methods
        ...
    }

MergeOperator:

    class MergeTask extends MITask {
        void interrupt() {
            // Tag the results;
            // output them as intermediate results
            ...
        }
        // Some other fields and methods
        ...
    }
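A minimal sketch of what a user-level MITask might look like, in the spirit of the MergeTask above: it folds several partitions of partial word counts into one map. The CountPartition type and its entries() accessor are invented for the example; only the PartitionIterator-driven scaleLoop and the interrupt() obligation come from the slides.

    // Illustrative partition of (word, count) tuples produced by earlier tasks.
    class CountPartition extends DataPartition {
        final java.util.Map<String, Integer> counts;
        CountPartition(java.util.Map<String, Integer> counts) { this.counts = counts; }
        java.util.Set<java.util.Map.Entry<String, Integer>> entries() { return counts.entrySet(); }
        @Override void serialize() { /* spill counts to disk (omitted) */ }
        @Override DataPartition deserialize() { return this; /* illustrative no-op */ }
    }

    // Hypothetical MITask that merges partial word counts.
    class WordCountMerge extends MITask {
        private final java.util.HashMap<String, Integer> merged = new java.util.HashMap<>();

        @Override
        boolean scaleLoop(PartitionIterator<DataPartition> i) {
            while (i.hasNext()) {
                CountPartition dp = (CountPartition) i.next();
                // Fold every (word, count) tuple of this partition into the merge.
                for (java.util.Map.Entry<String, Integer> e : dp.entries()) {
                    merged.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return true;   // all available partitions consumed
        }

        @Override
        void interrupt() {
            // Tag the merged map and emit it as an intermediate result so a
            // later merge invocation can pick it up (illustrative).
            System.err.println("emitting intermediate merge of " + merged.size() + " words");
            merged.clear();
        }
    }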

Challenges

How to expose semantics → a programming model

How to interrupt/reactivate tasks → a runtime system

ITask Runtime System

[Architecture diagram: the ITask runtime system comprises a Monitor, a Scheduler, and a Partition Manager, with DataPartitions kept in memory or on disk. Labeled interactions include Check and Grow/Reduce (monitoring memory and adjusting the number of ITasks), Interrupt/Create (managing ITasks), Serialize/Deserialize (moving partitions between memory and disk), and Input/Output (feeding ITasks and collecting their results).]
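To make the division of labor concrete, here is a minimal sketch of how a monitor might drive the scheduler. This is not the paper's implementation: the heap-occupancy heuristic stands in for the runtime's detection of long and useless GCs, and the Scheduler interface, method names, and threshold values are assumptions for illustration.

    // Illustrative stand-in for the real scheduler.
    interface Scheduler {
        void reduceParallelism();    // ask running ITasks to interrupt
        void growParallelism();      // create new ITasks for pending partitions
    }

    // Sketch of a monitor thread; real detection is GC-based, not occupancy-based.
    class Monitor implements Runnable {
        private final Scheduler scheduler;
        Monitor(Scheduler scheduler) { this.scheduler = scheduler; }

        @Override
        public void run() {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long used = rt.totalMemory() - rt.freeMemory();
                double occupancy = (double) used / rt.maxMemory();
                if (occupancy > 0.9) {
                    // Heap nearly full: interrupt some tasks so their partitions
                    // can be serialized and their local memory released.
                    scheduler.reduceParallelism();
                } else if (occupancy < 0.5) {
                    // Plenty of headroom: raise the degree of parallelism.
                    scheduler.growParallelism();
                }
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
            }
        }
    }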

Evaluation Environments

We have implemented ITask on
- Hadoop 2.6.0
- Hyracks 0.2.14

An 11-node Amazon EC2 cluster
- Each machine: 8 cores, 15 GB memory, 2 x 80 GB SSD


Experiments on Hadoop

Goal
- Show the effectiveness on real-world problems

Benchmarks
- Original: five real-world programs collected from Stack Overflow
- RFix: apply the fixes recommended on the websites
- ITask: apply ITask to the original programs

Name                                Dataset
Map-Side Aggregation (MSA)          Stack Overflow Full Dump
In-Map Combiner (IMC)               Wikipedia Full Dump
Inverted-Index Building (IIB)       Wikipedia Full Dump
Word Cooccurrence Matrix (WCM)      Wikipedia Full Dump
Customer Review Processing (CRP)    Wikipedia Sample Dump


Improvements

Benchmark    Original Time     RFix Time    ITask Time    Speed Up
MSA          1047 (crashed)    48           72            -33.3%
IMC          5200 (crashed)    337          238           41.6%
IIB          1322 (crashed)    2568         1210          112.2%
WCM          2643 (crashed)    2151         1287          67.1%
CRP          567 (crashed)     6761         2001          237.9%

- With ITask, all programs survive memory pressure
- On average, the ITask versions are 62.5% faster than RFix

Experiments on Hyracks

Goal
- Show the improvements in performance
- Show the improvements in scalability

Benchmarks
- Original: five hand-optimized applications from the Hyracks repository
- ITask: apply ITask to the original programs

Name                   Dataset
WordCount (WC)         Yahoo Web Map and Its Subgraphs
Heap Sort (HS)         Yahoo Web Map and Its Subgraphs
Inverted Index (II)    Yahoo Web Map and Its Subgraphs
Hash Join (HJ)         TPC-H Data
Group By (GR)          TPC-H Data


Tuning Configurations for Original Programs

Configurations for best performance

Name                   Thread Number    Task Granularity
WordCount (WC)         2                32KB
Heap Sort (HS)         6                32KB
Inverted Index (II)    8                16KB
Hash Join (HJ)         8                32KB
Group By (GR)          6                16KB

Configurations for best scalability

Name                   Thread Number    Task Granularity
WordCount (WC)         1                4KB
Heap Sort (HS)         1                4KB
Inverted Index (II)    1                4KB
Hash Join (HJ)         1                4KB
Group By (GR)          1                4KB

Improvements on Performance

[Bar chart: normalized speedup over the Original Best configuration for WC, HS, II, HJ, and GR. The Original Best bars are 1; the ITask bars are approximately 1.4, 1.11, 1.28, 1.67, and 1.61, respectively.]

On average, ITask is 34.4% faster

Improvements on Scalability

[Bar chart: normalized dataset size (largest input processed) relative to the Original Best configuration for WC, HS, II, HJ, and GR. The Original Best bars are 1; the ITask bars are approximately 5.1, 2.7, 24, 6, and 5, respectively.]

On average, ITask scales to 6.3x larger datasets

Conclusions

A programming model + a runtime system
- Non-intrusive
- Easy to use

The first systematic approach
- Helps data-parallel tasks survive memory pressure

ITask improves performance and scalability
- On Hadoop, ITask is 62.5% faster (than the recommended fixes)
- On Hyracks, ITask is 34.4% faster (than the hand-optimized originals)
- ITask helps programs scale to 6.3x larger datasets


Thank You

Q & A
