1
The Packing Server for Real-time Scheduling of MapReduce Workflows
Shen Li, Shaohan Hu, Tarek Abdelzaher
University of Illinois at Urbana-Champaign
3
Significance
[Figure: tasks τ1, τ2, …, τn are each placed in a packing server (Packing Server 1, 2, …, n) with its own app-level scheduler (App-Sched) and utilization bound, on top of an underlying independent scheduler.]
1. The main contribution is the notion of a packing server.
2. Packing servers allow graphs of tasks with precedence constraints to be converted into a set of budgets that the underlying scheduler treats as independent.
-This is achieved by the app-level scheduler, which schedules the workload inside the server.
3. As a result, we can convert bounds for independent tasks into equivalent bounds for parallel tasks.
4. This leads to the notion of a conversion bound.
5. Using this approach, we derive bounds for parallel task models that beat the best known ones.
6. We apply the approach to MapReduce.
4
Independent vs Parallel Tasks
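Comparing Utilization Bounds
In MapReduce applications: m >> 1, D >> L
Max util. of any task is 1/β: assume β = 5

Scheduler    Parallel tasks    Independent tasks
G-EDF        38.2%             $\frac{m - (m-1)/\beta}{m} \approx 80\%$
G-RM         26.8%             $\frac{\frac{m}{2}\left(1 - \frac{1}{\beta}\right) + \frac{1}{\beta}}{m} \approx 40\%$
Federated    50%               -
EDF-FF       -                 $\frac{m\beta + 1}{(\beta + 1)\,m} \approx 80\%$
EDF-FFD      -                 $\frac{m\left(1 - \frac{1}{\beta}\right) + \frac{1}{\beta}}{m} \approx 80\%$
RM-ST        -                 $\frac{(m-2)\left(1 - \frac{1}{\beta}\right) + 1 - \ln 2}{m} \approx 80\%$
EDZL         -                 $\frac{m\left(1 - \frac{1}{e}\right)}{m} \approx 63\%$

[Li et al. ECRTS'14] [Davis et al. ACM Computing Surveys'11]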
5
The Conversion Bound
Parallel Task Set Utilization Bound = Independent Task Set Utilization Bound ($U_B$) × $\frac{\sigma - \beta}{\sigma}$
σ: the stretch, i.e., deadline over critical path length
β: the reciprocal of the maximum task utilization
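The conversion bound is a single multiplication, so it is easy to check numerically. The sketch below is only an illustration: the function name `conversion_bound` and its argument names are hypothetical, and the sample values (σ = 30, β = 5, independent bounds of 80%, 63% and 40%) are the ones used in the example on the next slide.

```python
def conversion_bound(independent_bound: float, sigma: float, beta: float) -> float:
    """Scale an independent-task utilization bound by (sigma - beta) / sigma.

    sigma: stretch (deadline over critical path length)
    beta:  reciprocal of the maximum task utilization
    """
    if sigma <= 0 or beta <= 0:
        raise ValueError("sigma and beta must be positive")
    if beta >= sigma:
        raise ValueError("beta must be smaller than sigma for a positive bound")
    return independent_bound * (sigma - beta) / sigma


if __name__ == "__main__":
    # sigma = 30 and beta = 5 are the example values used in the slides.
    for bound in (0.80, 0.63, 0.40):
        converted = conversion_bound(bound, sigma=30, beta=5)
        print(f"{bound:.0%} -> {converted:.1%}")
```

A larger stretch σ moves the factor (σ − β)/σ closer to 1, which is why workloads with D >> L lose little by the conversion.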
6
An Example of σ = 30, β = 5

Scheduler    Parallel    Independent    Interdependent (using conversion)
G-EDF        38.2%       80%            80% × (30 − 5)/30 ≈ 67%
G-RM         26.8%       40%            40% × (30 − 5)/30 ≈ 33%
Federated    50%         -              -
EDF-FF       -           80%            80% × (30 − 5)/30 ≈ 67%
EDF-FFD      -           80%            80% × (30 − 5)/30 ≈ 67%
RM-ST        -           80%            80% × (30 − 5)/30 ≈ 67%
EDZL         -           63%            63% × (30 − 5)/30 ≈ 52.5%
7
Construct a Packing Server for a Pipeline
Two questions:
1. How to schedule the pipeline in its budgets?
2. What is the conversion bound when using this technique?
Pack to minimum parallelism without violating the deadline Di.
[Figure: timeline with the deadline Di marked]
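A minimal sketch of the packing step, under stated assumptions: a pipeline is a list of phases, each given as (number of segments, per-segment WCET); a phase with at least m segments is assumed to take m_j · c_j / m time when limited to m parallel budgets (the expression used in the deck's backup derivation), and a phase with fewer segments keeps its length c_j. The names `packed_length` and `min_parallelism` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Phase = Tuple[int, float]  # (number of segments m_j, per-segment WCET c_j)


def packed_length(phases: List[Phase], m: int) -> float:
    """Pipeline length when every phase is limited to m parallel budgets.

    Assumption: a phase with m_j >= m segments is packed into length
    m_j * c_j / m; a phase with fewer segments keeps its length c_j.
    """
    return sum(mj * cj / m if mj >= m else cj for mj, cj in phases)


def min_parallelism(phases: List[Phase], target: float) -> int:
    """Smallest m whose packed length does not exceed the target deadline."""
    if not phases:
        raise ValueError("the pipeline has no phases")
    critical_path = sum(cj for _, cj in phases)
    if target < critical_path:
        raise ValueError("target is shorter than the critical path")
    widest = max(mj for mj, _ in phases)
    for m in range(1, widest + 1):
        if packed_length(phases, m) <= target:
            return m
    return widest  # at full width the length equals the critical path


if __name__ == "__main__":
    pipeline = [(4, 2.0), (2, 3.0)]  # made-up example, not from the slides
    print(min_parallelism(pipeline, target=6.0))
```

A linear scan is enough here because the packed length never increases as m grows; a binary search would also work for very wide phases.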
8
Before Packing After Packing
The App-Scheduler
1. Find the time instant t such that the cumulative execution time before t equals the total budget size of the first phase.
2. Schedule each phase in its corresponding budget portions using a best-fit-like algorithm.
3. For each phase, process one segment at a time. Lay each segment into the budget portions from right to left, starting from the smallest budget portion. Skip any parallelism conflict.
4. This algorithm is guaranteed to schedule every phase in its own budget portions, using a simulation of the underlying scheduler Dmax time ahead. Please refer to the paper for more details.
[Figure: budget portions on a timeline, with the time instant t marked]
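Step 1 can be sketched as a scan over the simulated budget schedule. The representation is an assumption made for illustration: budget portions are non-overlapping intervals (start, end, parallelism) with constant parallelism inside each interval, and `find_split_time` is a hypothetical name. The best-fit-like placement of steps 2 and 3 is not reproduced here, since the slides defer its details to the paper.

```python
from typing import List, Tuple

Portion = Tuple[float, float, int]  # (start, end, parallelism) of a budget portion


def find_split_time(portions: List[Portion], first_phase_budget: float) -> float:
    """Return t such that the budget accumulated before t equals first_phase_budget.

    Assumption: portions do not overlap in time and parallelism is constant
    within each portion.
    """
    accumulated = 0.0
    for start, end, parallelism in sorted(portions):
        capacity = (end - start) * parallelism
        if parallelism > 0 and accumulated + capacity >= first_phase_budget:
            # The target is reached inside this portion; interpolate linearly.
            return start + (first_phase_budget - accumulated) / parallelism
        accumulated += capacity
    raise ValueError("budget portions are smaller than the first phase's budget")


if __name__ == "__main__":
    schedule = [(0.0, 2.0, 3), (2.0, 5.0, 1)]  # made-up budget schedule
    print(find_split_time(schedule, first_phase_budget=7.0))
```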
9
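The Conversion Bound for M-R Pipelines

Task τi
-utilization ($u_i$)
Workflow Job i
-deadline ($D_i$)
-crit. path len ($L_i$)
-stretch ($\sigma_i$)
-budget utilization bound ($1/\beta$)
-# of segments ($m_i$)
Phase j
-# of segments ($m_i^j$)
-WCET ($c_i^j$)

Lower bound of total WCET:
$\sum_j m_i^j c_i^j \ge (m_i - 1)\left(\frac{\sigma_i}{\beta} - 1\right)\sum_j c_i^j$

Writing $u_b$ for the utilization of the budgets that task τi is converted into, the relative overhead of virtual segments is:
$\frac{u_b - u_i}{u_i} = \frac{\sum_{\{j \mid m_i^j < m_i\}} (m_i - m_i^j)\, c_i^j}{\sum_j m_i^j c_i^j}$, where $m_i - m_i^j$ is the # of virtual segments in phase j
$\le \frac{(m_i - 1)\sum_{\{j \mid m_i^j < m_i\}} c_i^j}{\sum_j m_i^j c_i^j}$, as $m_i^j \ge 1$

Substituting the lower bound of total WCET into the denominator:
$\frac{u_b - u_i}{u_i} \le \frac{(m_i - 1)\sum_{\{j \mid m_i^j < m_i\}} c_i^j}{(m_i - 1)\left(\frac{\sigma_i}{\beta} - 1\right)\sum_j c_i^j} \le \frac{\beta}{\sigma_i - \beta}$, as $\sum_{\{j \mid m_i^j < m_i\}} c_i^j \le \sum_j c_i^j$

The conversion bound:
$u_b \le u_i \cdot \frac{\sigma_i}{\sigma_i - \beta} \le U_B$
$u_i \le U_B \cdot \frac{\sigma_i - \beta}{\sigma_i}$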
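The overhead ratio in this derivation can be evaluated directly for a concrete pipeline. The sketch below is illustrative only: phases are (number of segments, WCET) pairs, and the function names are hypothetical. The upper bound β/(σ − β) is guaranteed by the derivation only when the concurrency passed in is the minimum one that meets the shortened deadline D/β, so the code reports the two quantities separately rather than asserting the inequality for arbitrary inputs.

```python
from typing import List, Tuple

Phase = Tuple[int, float]  # (number of segments m_j, per-segment WCET c_j)


def virtual_overhead_ratio(phases: List[Phase], m_i: int) -> float:
    """Virtual execution time divided by real execution time.

    Every phase with fewer than m_i segments is padded with
    (m_i - m_j) virtual segments of length c_j.
    """
    real = sum(mj * cj for mj, cj in phases)
    if real <= 0:
        raise ValueError("total WCET must be positive")
    virtual = sum((m_i - mj) * cj for mj, cj in phases if mj < m_i)
    return virtual / real


def overhead_upper_bound(sigma: float, beta: float) -> float:
    """The bound beta / (sigma - beta) on the overhead ratio."""
    if beta >= sigma:
        raise ValueError("beta must be smaller than sigma")
    return beta / (sigma - beta)


if __name__ == "__main__":
    pipeline = [(4, 2.0), (2, 3.0)]  # made-up example, not from the slides
    print(virtual_overhead_ratio(pipeline, m_i=3))
    print(overhead_upper_bound(sigma=30, beta=5))
```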
10
Transform Workflow into Pipeline
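[Figure: an example workflow whose nodes are labeled $m_1^1 = 2, c_1^1 = 5$; $m_1^2 = 3, c_1^2 = 7$; $m_1^3 = 4, c_1^3 = 3$; $m_1^4 = 3, c_1^4 = 3$; $m_1^5 = 6, c_1^5 = 5$; $m_1^6 = 2, c_1^6 = 2$, transformed into a pipeline on a time axis t from 0 to 20.]
-Introducing no computational penalty
-Respecting dependencies
-Preserving critical path length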
11
Summary
1. The packing operation packs a pipeline into minimum parallelism.
2. The app-scheduler schedules the pipeline into budgets using underlying-scheduler simulations.
3. The conversion bound, $\frac{\sigma - \beta}{\sigma}$, is proved by analyzing the upper bound of the amount of introduced virtual execution time.
4. A workflow is translated into a pipeline without introducing virtual computation overhead or lengthening the critical path.
12
Evaluation: Algorithms
1. Packing & EDF-FF: The packing server uses EDF First-Fit as the underlying scheduler. Independent tasks are partitioned into the first resource slot that does not violate the 100% utilization bound.
2. Packing & GEDF: The packing server uses GEDF as the underlying scheduler. GEDF assigns the highest priority to the job with the most urgent deadline.
3. GEDF: The workflow with the most urgent deadline gets the highest priority.
4. Federated: Each high-utilization task (u ≥ 1) is assigned a set of dedicated cores, and the remaining low-utilization tasks share the remaining cores.
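The first-fit partitioning used under Packing & EDF-FF can be sketched as follows. This is a generic first-fit assignment by utilization, not the authors' implementation: the function name, the list-of-utilizations input, and the small floating-point tolerance are assumptions made for illustration.

```python
from typing import List


def first_fit_partition(utilizations: List[float], num_slots: int) -> List[List[int]]:
    """Assign each task to the first slot whose total utilization stays <= 100%.

    Returns, per slot, the indices of the tasks placed on it.
    Raises ValueError if some task fits on no slot.
    """
    loads = [0.0] * num_slots
    assignment: List[List[int]] = [[] for _ in range(num_slots)]
    for task, u in enumerate(utilizations):
        for slot in range(num_slots):
            if loads[slot] + u <= 1.0 + 1e-9:  # tolerance for float rounding
                loads[slot] += u
                assignment[slot].append(task)
                break
        else:
            raise ValueError(f"task {task} with utilization {u} does not fit")
    return assignment


if __name__ == "__main__":
    # Made-up utilizations of independent budgets on two slots.
    print(first_fit_partition([0.6, 0.5, 0.4, 0.3], num_slots=2))
```

Each slot then runs its own tasks under uniprocessor EDF, for which a total utilization of at most 100% is sufficient when deadlines equal periods.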
13
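Evaluation: Compute β
Packing & EDF-FF:
$U_B \cdot \frac{\sigma - \beta}{\sigma} = \frac{m\beta + 1}{m(\beta + 1)} \cdot \frac{\sigma - \beta}{\sigma}$
By taking the derivative with respect to β, the highest utilization bound is achieved at:
$\beta = \sqrt{\frac{(\sigma + 1)(m - 1)}{m}} - 1$
Packing & GEDF:
$U_B \cdot \frac{\sigma - \beta}{\sigma} = \frac{m\beta - m + 1}{m\beta} \cdot \frac{\sigma - \beta}{\sigma}$
Similarly:
$\beta = \sqrt{\frac{\sigma(m - 1)}{m}}$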
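The two closed forms for β can be checked numerically against the values reported in the evaluation slides (σ = 20 or 30, m = 500). The function names below are hypothetical; the formulas are the ones on this slide.

```python
import math


def optimal_beta_edf_ff(sigma: float, m: int) -> float:
    """beta maximizing the Packing & EDF-FF bound."""
    return math.sqrt((sigma + 1) * (m - 1) / m) - 1


def optimal_beta_gedf(sigma: float, m: int) -> float:
    """beta maximizing the Packing & GEDF bound."""
    return math.sqrt(sigma * (m - 1) / m)


def bound_edf_ff(sigma: float, m: int, beta: float) -> float:
    """(m*beta + 1) / (m*(beta + 1)) scaled by (sigma - beta) / sigma."""
    return (m * beta + 1) / (m * (beta + 1)) * (sigma - beta) / sigma


def bound_gedf(sigma: float, m: int, beta: float) -> float:
    """(m*beta - m + 1) / (m*beta) scaled by (sigma - beta) / sigma."""
    return (m * beta - m + 1) / (m * beta) * (sigma - beta) / sigma


if __name__ == "__main__":
    for sigma in (20, 30):
        b_ff = optimal_beta_edf_ff(sigma, 500)
        b_g = optimal_beta_gedf(sigma, 500)
        print(f"sigma={sigma}: EDF-FF beta={b_ff:.2f}, bound={bound_edf_ff(sigma, 500, b_ff):.1%}")
        print(f"sigma={sigma}: GEDF   beta={b_g:.2f}, bound={bound_gedf(sigma, 500, b_g):.1%}")
```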
14
Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set σ = 20, m = 500 (small granularity)
Compute β = 3.58 for Packing & EDF-FF
β = 4.47 for Packing & GEDF
Theoretical utilization bounds:
Packing & EDF-FF: 64%
Packing & GEDF: 60.3%
Federated: 50% [Li et al. ECRTS'14]
GEDF: 38.2% [Li et al. ECRTS'14]
Domino effect
15
Evaluation: Accepted Utilization
Workflows are generated based on Yahoo! WebScope data.
Set σ = 30, m = 500 (small granularity)
Compute β = 4.56 for Packing & EDF-FF
β = 5.47 for Packing & GEDF
Theoretical utilization bounds:
Packing & EDF-FF: 70%
Packing & GEDF: 66.9%
Federated: 50% [Li et al. ECRTS'14]
GEDF: 38.2% [Li et al. ECRTS'14]
Domino effect
16
Evaluation: Admission Control
Workflows are generated based on Yahoo! WebScope data.
Implemented a prototype on WOHA [Li et al., ICDCS'14], a variant of Hadoop.
Submitted a set of tasks with a total utilization above 100%.
Admission control is enforced at the theoretical utilization bound.
Set σ = 20, m = 160 (small granularity)
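A minimal sketch of utilization-based admission control, under assumptions the slides do not spell out: workflows are considered in submission order, each carries a utilization normalized by the number of resource slots, and a workflow is admitted only if the running total stays within the theoretical bound. The function name `admit` and the sample numbers are hypothetical.

```python
from typing import List


def admit(utilizations: List[float], bound: float) -> List[int]:
    """Return indices of workflows admitted in submission order.

    Assumption: a workflow is rejected, and later ones are still considered,
    whenever admitting it would push total utilization above the bound.
    """
    admitted: List[int] = []
    total = 0.0
    for index, u in enumerate(utilizations):
        if total + u <= bound + 1e-9:  # tolerance for float rounding
            admitted.append(index)
            total += u
    return admitted


if __name__ == "__main__":
    # Made-up workload whose total utilization exceeds the bound.
    print(admit([0.3, 0.3, 0.2, 0.04], bound=0.64))
```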
18
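The Conversion Bound for M-R Pipelines

Task τi
-utilization ($u_i$)
Workflow Job i
-deadline ($D_i$)
-crit. path len ($L_i$)
-stretch ($\sigma_i$)
-budget utilization bound ($1/\beta$)
Phase j
-# of segments ($m_i^j$)
-WCET ($c_i^j$)

Find the minimum concurrency $m_i$ such that the converted budgets do not violate the deadline. Since $m_i$ is minimal, concurrency $m_i - 1$ is not enough, so we have:
$\sum_{\{j \mid m_i^j \ge m_i - 1\}} \frac{m_i^j c_i^j}{m_i - 1} + \sum_{\{j \mid m_i^j < m_i - 1\}} c_i^j \ge D_i'$
The first sum covers the phases that need to be packed (big); the second covers the phases that need virtual segments (small).

To cap the max util. of the resulting tasks:
$D_i' = \frac{D_i}{\beta} = \frac{\sigma_i}{\beta}\sum_j c_i^j$

Together, we have:
$\sum_{\{j \mid m_i^j \ge m_i - 1\}} \frac{m_i^j c_i^j}{m_i - 1} \ge \frac{\sigma_i}{\beta}\sum_j c_i^j - \sum_j c_i^j = \left(\frac{\sigma_i}{\beta} - 1\right)\sum_j c_i^j$
The subtracted sum may range over all phases because the small phases are a subset of them.

Moreover:
$\sum_j m_i^j c_i^j \ge \sum_{\{j \mid m_i^j \ge m_i - 1\}} m_i^j c_i^j \ge (m_i - 1)\left(\frac{\sigma_i}{\beta} - 1\right)\sum_j c_i^j$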
19
The Packing Server: straightforward strategy
Budgets
The Problem: It introduces too much virtual computational overhead.
Consider a MapReduce workflow of two phases:
is bad
20
The Packing Server: fit into Hadoop
[Figure: the packing server in Hadoop, showing the input task set τ, the RM, three AMs (AM1, AM2, AM3) each with a budget schedule, a schedule spanning Dmax, and containers.]
RM: Resource Manager
AM: Application Master
Container: executes a segment