Interruptible Tasks: Treating Memory Pressure as Interrupts for Highly Scalable Data-Parallel Programs
Lu Fang 1, Khanh Nguyen 1, Guoqing (Harry) Xu 1, Brian Demsky 1, Shan Lu 2
1 University of California, Irvine
2 University of Chicago
SOSP’15, October 7, 2015, Monterey, California, USA
Lu Fang et al. Interruptible Tasks SOSP’15, October 7, 2015 1 / 25
Motivation
Data-parallel systems
  - Input data are divided into independent partitions
  - Many popular big data systems
  => Memory pressure on single nodes
Our study
  - Searched for “out of memory” and “data parallel” on StackOverflow
  - Collected 126 related problems
Memory Pressure in the Real World
Memory pressure on individual nodes
  - Executions push the heap limit (using managed languages)
  - Data-parallel systems struggle for memory
[Figure: memory consumption over execution time, rising toward the heap size. Annotations: long and useless GC; OutOfMemoryError point]
CRASH: OutOfMemoryError
SLOW: huge GC effort
Root Cause 1: Hot Keys
Key-value pairs
Popular keys have many associated values
Case study (from StackOverflow)
  - Process StackOverflow posts
  - Long and popular posts
  - Many tasks process long and popular posts
Root Cause 2: Large Intermediate Results
Temporary data structures
Case study (from StackOverflow)
  - Use an NLP library to process customer reviews
  - Some reviews are quite long
  - The NLP library creates giant temporary data structures for long reviews
Existing Solutions
More memory? Not really!
  - Data doubles in size every two years [http://goo.gl/tM92i0]
  - Memory doubles in size every three years [http://goo.gl/50Rrgk]
Application-level solutions
  - Configuration tuning
  - Skew fixing
System-level solutions
  - Cluster-wide resource managers, such as YARN
We need a systematic and effective solution!
Our Solution
Interruptible Task: treat memory pressure as interrupt
Dynamically change parallelism degree
Why Does Our Technique Help
[Figure, animated: memory consumption over execution time against the heap size, with one "consumed memory" bar per task]
  1. Program starts with multiple tasks
  2. Program pushes the heap limit
  3. Left alone, the program runs long and useless GCs and ends in an OutOfMemoryError
  4. Instead, long and useless GCs are detected and tasks start being interrupted (killed)
  5. Memory of the interrupted tasks is released, and the memory pressure is gone
       - Local data structures: released
       - Processed input: released
       - Unprocessed input: kept in memory, can be serialized
       - Output, final result: pushed out and released
       - Output, intermediate result: kept in memory, can be serialized
  6. Program executes without memory pressure
  7. If there is enough memory, increase the parallelism degree (a new task is created)
Challenges
How to expose semantics → a programming model
How to interrupt/reactivate tasks → a runtime system
The Programming Model
A unified representation of input/output
  - Separate processed and unprocessed input
  - Specify how to serialize and deserialize
A definition of an interruptible task
  - Safely interrupt tasks
  - Specify the actions when an interrupt happens
  - Merge the intermediate results
Representing Input/Output as DataPartitions
  - How to separate processed and unprocessed input
  - How to serialize and deserialize the data
  1. A cursor points to the first unprocessed tuple
  2. Users implement the serialize and deserialize methods
DataPartition Abstract Class
  // The DataPartition abstract class
  abstract class DataPartition {
    // Some fields and methods
    ...
    // A cursor points to the first
    // unprocessed tuple
    int cursor;
    // Serialize the DataPartition
    abstract void serialize();
    // Deserialize the DataPartition
    abstract DataPartition deserialize();
  }
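To make the cursor and the serialize/deserialize contract concrete, here is a minimal sketch of one possible subclass. LinePartition, its fields, and the temp-file spill format are hypothetical and not part of the talk; the sketch also assumes that tuples are single-line strings and that only the unprocessed suffix needs to survive a spill.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Mirrors the abstract class on the slide (elided members left out).
abstract class DataPartition {
    int cursor;                            // index of the first unprocessed tuple
    abstract void serialize();             // move the data out of the heap
    abstract DataPartition deserialize();  // bring the data back
}

// Hypothetical partition of text lines that spills to a temp file.
class LinePartition extends DataPartition {
    private List<String> lines;  // null while the data lives on disk
    private Path spillFile;      // null while the data lives in memory

    LinePartition(List<String> lines) {
        this.lines = new ArrayList<>(lines);
    }

    boolean inMemory() { return lines != null; }
    boolean hasNext() { return lines != null && cursor < lines.size(); }
    String next() { return lines.get(cursor++); }

    @Override
    void serialize() {
        try {
            spillFile = Files.createTempFile("partition", ".txt");
            // Processed tuples are dropped; only the unprocessed suffix is written.
            Files.write(spillFile, lines.subList(cursor, lines.size()));
            lines = null;  // make the tuples garbage-collectable
            cursor = 0;    // the spilled data starts at the first unprocessed tuple
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    DataPartition deserialize() {
        try {
            lines = new ArrayList<>(Files.readAllLines(spillFile));
            Files.delete(spillFile);
            spillFile = null;
            return this;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After serialize() the partition holds no tuples on the heap, and after deserialize() iteration resumes at the tuple the cursor pointed to before the spill.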
Defining an ITask
  - What actions should be taken when an interrupt happens
  - How to safely interrupt a task
  1. In interrupt, we define how to deal with partial results
  2. Tasks are always interrupted at the beginning of a scaleLoop iteration
ITask Abstract Class
  // The ITask abstract class in the library
  abstract class ITask {
    // Some methods
    ...
    abstract void interrupt();
    boolean scaleLoop(DataPartition dp) {
      // Iterate dp, and process each tuple
      while (dp.hasNext()) {
        // If pressure occurs, interrupt
        if (HasMemoryPressure()) {
          interrupt();
          return false;
        }
        process();
      }
      // The whole partition was processed without an interrupt
      return true;
    }
  }
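The interrupt protocol above can be exercised end to end with a small, self-contained sketch. Everything here is illustrative rather than the library's API: ListPartition, SketchITask, and CharCountTask are hypothetical names, process takes the tuple as an argument, and memory pressure is simulated with a fixed tuple budget instead of a real runtime check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory partition; only the cursor idea comes from the slides.
class ListPartition {
    final List<String> tuples;
    int cursor;  // first unprocessed tuple

    ListPartition(List<String> tuples) { this.tuples = tuples; }
    boolean hasNext() { return cursor < tuples.size(); }
    String next() { return tuples.get(cursor++); }
}

// Same loop shape as the slide's ITask, with the pressure check as a hook.
abstract class SketchITask {
    abstract void interrupt();
    abstract void process(String tuple);
    abstract boolean hasMemoryPressure();  // stand-in for the runtime's check

    boolean scaleLoop(ListPartition dp) {
        while (dp.hasNext()) {
            // Checked before each tuple, so a tuple is never half-processed.
            if (hasMemoryPressure()) {
                interrupt();
                return false;
            }
            process(dp.next());
        }
        return true;
    }
}

// Sums tuple lengths; reports "pressure" after a fixed number of tuples.
class CharCountTask extends SketchITask {
    int total;      // partial result held by the task
    int processed;  // tuples handled since the last interrupt
    final int budget;
    final List<Integer> emitted = new ArrayList<>();  // results pushed out on interrupt

    CharCountTask(int budget) { this.budget = budget; }

    @Override void process(String tuple) { total += tuple.length(); processed++; }
    @Override boolean hasMemoryPressure() { return processed >= budget; }
    @Override void interrupt() {
        emitted.add(total);  // hand the partial result downstream
        total = 0;           // release the task-local state
        processed = 0;
    }
}
```

Because the cursor lives in the partition, calling scaleLoop again on the same partition resumes at the first unprocessed tuple, which is what makes re-activation after an interrupt cheap.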
Multiple Input for an ITask
  - How to merge intermediate results
  1. scaleLoop takes a PartitionIterator as input
MITask Abstract Class
  // The MITask abstract class in the library
  abstract class MITask extends ITask {
    // Most parts are the same as ITask
    ...
    // Only difference
    boolean scaleLoop(
        PartitionIterator<DataPartition> i) {
      // Iterate partitions through iterator
      while (i.hasNext()) {
        DataPartition dp = (DataPartition) i.next();
        // Iterate all the data tuples in this partition
        ...
      }
      return true;
    }
  }
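As an illustration of what a multi-input task does with its iterator, the sketch below merges several partial word-count results into one. It is a simplification under stated assumptions: each Map stands in for one intermediate DataPartition, a plain java.util.Iterator stands in for PartitionIterator, MergeSketch is a hypothetical name, and the memory-pressure check is left out.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MergeSketch {
    // Folds every partial count map into a single merged map.
    static Map<String, Integer> merge(Iterator<Map<String, Integer>> partitions) {
        Map<String, Integer> merged = new HashMap<>();
        while (partitions.hasNext()) {
            Map<String, Integer> partial = partitions.next();
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                // Add this partition's count to whatever has been seen so far.
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```

Summation is associative and commutative, so the partial results can be merged in whatever order interrupted tasks happen to produce them.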
ITask WordCount on Hyracks
[Figure: WordCount dataflow on Hyracks. Elements shown: HDFS, Map Operators (1..n), shuffling, Reduce Operators (1..n), Merge Operators, final results, HDFS]
MapOperator
  class MapOperator extends ITask
      implements HyracksOperator {
    void interrupt() {
      // Push out final
      // results to shuffling
      ...
    }
    // Some other fields and methods
    ...
  }
ReduceOperator
  class ReduceOperator extends ITask
      implements HyracksOperator {
    void interrupt() {
      // Tag the results;
      // Output as intermediate
      // results
      ...
    }
    // Some other fields and methods
    ...
  }
MergeOperator
  class MergeTask extends MITask {
    void interrupt() {
      // Tag the results;
      // Output as intermediate
      // results
    }
    // Some other fields and methods
    ...
  }
Challenges
How to expose semantics → a programming model
How to interrupt/activate tasks → a runtime system
ITask Runtime System
[Figure: ITask runtime system. Components: Monitor, Scheduler, Partition Manager, data partitions, ITasks, memory, disk. Edge labels: Check, Grow/Reduce, Reduce, Interrupt/Create, Serialize/Deserialize, Input/Output]
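A rough sketch of the monitor's grow/reduce decision follows. Note the substitution: the talk's monitor reacts to long and useless GCs, whereas this sketch uses a much simpler heap-occupancy check. MonitorSketch, the Action enum, and both thresholds are hypothetical; only the java.lang.management calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class MonitorSketch {
    enum Action { REDUCE, KEEP, GROW }

    // Illustrative thresholds; these are assumptions, not values from the talk.
    static final double HIGH_WATER = 0.90;
    static final double LOW_WATER = 0.50;

    // Pure decision function, so the policy can be tested without a real heap.
    static Action decide(long usedBytes, long maxBytes) {
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= HIGH_WATER) return Action.REDUCE;  // interrupt tasks, spill partitions
        if (ratio <= LOW_WATER) return Action.GROW;     // room for more parallelism
        return Action.KEEP;
    }

    // Samples the current JVM heap and applies the policy.
    static Action checkHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to committed.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return decide(heap.getUsed(), max);
    }
}
```

A scheduler thread could call checkHeap() periodically, interrupting tasks on REDUCE and creating new ones on GROW.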
Evaluation Environments
We have implemented ITask on
  - Hadoop 2.6.0
  - Hyracks 0.2.14
An 11-node Amazon EC2 cluster
  - Each machine: 8 cores, 15 GB, 2 × 80 GB SSD
Experiments on Hadoop
Goal
  - Show the effectiveness on real-world problems
Benchmarks
  - Original: five real-world programs collected from Stack Overflow
  - RFix: apply the fixes recommended on websites
  - ITask: apply ITask to the original programs
Name                              Dataset
Map-Side Aggregation (MSA)        Stack Overflow Full Dump
In-Map Combiner (IMC)             Wikipedia Full Dump
Inverted-Index Building (IIB)     Wikipedia Full Dump
Word Cooccurrence Matrix (WCM)    Wikipedia Full Dump
Customer Review Processing (CRP)  Wikipedia Sample Dump
Improvements
Benchmark  Original Time   RFix Time  ITask Time  Speed Up
MSA        1047 (crashed)  48         72          -33.3%
IMC        5200 (crashed)  337        238         41.6%
IIB        1322 (crashed)  2568       1210        112.2%
WCM        2643 (crashed)  2151       1287        67.1%
CRP        567 (crashed)   6761       2001        237.9%
  - With ITask, all programs survive memory pressure
  - On average, ITask versions are 62.5% faster than RFix
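The per-benchmark Speed Up values are consistent with the formula (RFix time − ITask time) / ITask time. That formula is inferred from the numbers in the table, not stated on the slide, and the class name SpeedUp is hypothetical. A small sketch to reproduce the column:

```java
import java.util.Locale;

class SpeedUp {
    // Relative speedup of ITask over RFix, in percent.
    static double percent(double rfixTime, double itaskTime) {
        return (rfixTime - itaskTime) / itaskTime * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"MSA", "IMC", "IIB", "WCM", "CRP"};
        double[] rfix = {48, 337, 2568, 2151, 6761};
        double[] itask = {72, 238, 1210, 1287, 2001};
        for (int i = 0; i < names.length; i++) {
            // One line per benchmark, rounded to one decimal like the table.
            System.out.println(String.format(Locale.ROOT, "%s %.1f%%",
                    names[i], percent(rfix[i], itask[i])));
        }
    }
}
```

A negative value, as for MSA, means the recommended fix ran faster than the ITask version on that benchmark.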
Experiments on Hyracks
Goal
  - Show the improvements in performance
  - Show the improvements in scalability
Benchmarks
  - Original: five hand-optimized applications from the repository
  - ITask: apply ITask to the original programs
Name                 Dataset
WordCount (WC)       Yahoo Web Map and Its Subgraphs
Heap Sort (HS)       Yahoo Web Map and Its Subgraphs
Inverted Index (II)  Yahoo Web Map and Its Subgraphs
Hash Join (HJ)       TPC-H Data
Group By (GR)        TPC-H Data
Tuning Configurations for Original Programs
Configurations for best performance
Name                 Thread Number  Task Granularity
WordCount (WC)       2              32 KB
Heap Sort (HS)       6              32 KB
Inverted Index (II)  8              16 KB
Hash Join (HJ)       8              32 KB
Group By (GR)        6              16 KB
Configurations for best scalability
Name                 Thread Number  Task Granularity
WordCount (WC)       1              4 KB
Heap Sort (HS)       1              4 KB
Inverted Index (II)  1              4 KB
Hash Join (HJ)       1              4 KB
Group By (GR)        1              4 KB
Improvements on Performance
[Figure: bar chart of normalized speedup (y-axis, 0 to 2) for WC, HS, II, HJ, GR. Original Best is 1 for every benchmark; the ITask bars are labeled 1.4, 1.11, 1.28, 1.67, 1.61]
On average, ITask is 34.4% faster
Improvements on Scalability
[Figure: bar chart of normalized dataset size (y-axis, 0 to 20) for WC, HS, II, HJ, GR. Original Best is 1 for every benchmark; the ITask bars are labeled 5.1, 2.7, 24, 6, 5]
On average, ITask scales to 6.3×+ larger datasets
Conclusions
A programming model + a runtime system
  - Non-intrusive
  - Easy to use
First systematic approach
  - Helps data-parallel tasks survive memory pressure
ITask improves performance and scalability
  - On Hadoop, ITask is 62.5% faster
  - On Hyracks, ITask is 34.4% faster
  - ITask helps programs scale to 6.3× larger datasets