Posted on 23-Feb-2016
Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications
Rohan Kurian, Pavan Balaji, P. Sadayappan
The Ohio State University
Parameter Sweep Applications
• An important class of applications: a set of independent tasks
• MCell Application: 3D simulations for sub-cellular architecture/physiology
• GTOMO (Parallel Tomography) Application: multiple view-point simulation
• Systems exist for scheduling on the Grid. Cluster-based scheduling?
Application Level Schedulers
• Manage the scheduling of applications
• Break the application into appropriate chunks
• Examples: APST (AppLeS Parameter Sweep Template), NIMROD
• Greedy approach to schedule PSA chunks
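Because a PSA is just a set of independent tasks, an application-level scheduler is free to cut it into chunks of any size. A minimal sketch of that step, assuming tasks are numbered 0..n-1 and using a fixed, hypothetical `chunk_size` (the slides do not say how APST or NIMROD size their chunks):

```python
from typing import List


def make_chunks(num_tasks: int, chunk_size: int) -> List[List[int]]:
    """Split task ids 0..num_tasks-1 into consecutive chunks of at most chunk_size tasks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tasks = list(range(num_tasks))
    # Slicing past the end is safe in Python, so the last chunk may be shorter.
    return [tasks[i:i + chunk_size] for i in range(0, num_tasks, chunk_size)]


if __name__ == "__main__":
    print(make_chunks(10, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```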
Presentation Roadmap
• Job Scheduling in Clusters
• Multi-Site Job Scheduling
• PSA Scheduling Strategies
• Multi-Site Scheduling of PSAs
• Performance Evaluation
• Conclusions
Job Scheduling in Clusters
• Mapping arriving jobs to available resources
• Multiple schemes for scheduling: First Come First Serve (FCFS), Conservative Scheduling, Aggressive or EASY Scheduling
• Fair-Share constraints: a user cannot have more than 'N' queued jobs
• Submitting the multiple chunks of a PSA job separately violates the Fair-Share constraints
• Instead, combine the chunks to form a single parallel job
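EASY (aggressive) backfilling appears among the cluster schemes here and is the policy used in the evaluation. The following is a minimal single-decision sketch of the textbook EASY rule, not the simulator used in this work: the queue head gets a reservation at the earliest time enough processors free up (the shadow time), and later jobs may jump ahead only if they fit now and either finish before the shadow time or use processors the head job will not need. Runtimes are assumed to be accurate user estimates, and `Job` and `easy_backfill` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    procs: int       # processors requested
    runtime: float   # user-estimated runtime (assumed accurate here)


def easy_backfill(now: float, free: int, running: List[Tuple[float, int]],
                  queue: List[Job]) -> List[Job]:
    """Return the queued jobs to start at time `now` under EASY backfilling.

    running: (end_time, procs) of jobs already executing.
    queue:   waiting jobs in FCFS order.
    """
    started: List[Job] = []
    running = list(running)
    queue = list(queue)
    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append((now + job.runtime, job.procs))
        started.append(job)
    if not queue:
        return started
    # 2. Reserve for the blocked head job: the shadow time is when enough
    #    processors will have been released; `extra` is what it leaves spare.
    head = queue[0]
    avail, shadow, extra = free, float("inf"), 0
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow, extra = end, avail - head.procs
            break
    # 3. Backfill later jobs that fit now without delaying the reservation.
    for job in queue[1:]:
        if job.procs > free:
            continue
        if now + job.runtime <= shadow:   # done before the head job starts
            free -= job.procs
            started.append(job)
        elif job.procs <= extra:          # only uses processors the head job won't need
            free -= job.procs
            extra -= job.procs
            started.append(job)
    return started


if __name__ == "__main__":
    # 10-processor machine: a 6-processor job runs until t=10, so 4 are free.
    queue = [Job("A", 8, 50), Job("B", 2, 5), Job("C", 2, 20), Job("D", 1, 100)]
    print([j.name for j in easy_backfill(0.0, 4, [(10.0, 6)], queue)])  # ['B', 'C']
```

In the usage example, B backfills because it ends before A's reservation at t=10, and C backfills because it only needs the 2 processors A will leave spare.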
Formation of PSAs in Clusters
[Figure: small independent tasks are combined into a single parallel parameter sweep application.]
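The fair-share argument can be made concrete. A minimal sketch, assuming each chunk needs one processor and that the combined job holds all its processors until the longest chunk ends; `ParallelJob`, `violates_fair_share` and `combine_chunks` are hypothetical names, and the limit N is whatever the site configures:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelJob:
    procs: int       # one processor per chunk (assumption)
    runtime: float   # the combined job runs until its longest chunk finishes


def violates_fair_share(queued: int, new: int, max_queued: int) -> bool:
    """True if adding `new` jobs to a user's `queued` jobs exceeds the limit N."""
    return queued + new > max_queued


def combine_chunks(chunk_runtimes: List[float]) -> ParallelJob:
    """Bundle single-processor chunks into one parallel job request."""
    if not chunk_runtimes:
        raise ValueError("need at least one chunk")
    return ParallelJob(procs=len(chunk_runtimes), runtime=max(chunk_runtimes))


if __name__ == "__main__":
    chunks = [2.0, 5.0, 3.0, 4.0, 1.0]
    print(violates_fair_share(0, len(chunks), max_queued=4))  # True: 5 separate jobs
    print(violates_fair_share(0, 1, max_queued=4))            # False: 1 combined job
    print(combine_chunks(chunks))  # ParallelJob(procs=5, runtime=5.0)
```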
Multi-Site Job Scheduling
• Multiple Simultaneous Requests: a job is submitted to multiple sites and started on whichever cluster can start it earliest
• Existing schemes have limitations: heterogeneous clusters, different scheduling schemes
[Figure: Multiple simultaneous requests. Sites 1, 2 and 3 each run a Meta Scheduler on top of a Local Scheduler that receives jobs.]
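To illustrate why heterogeneous clusters are a limitation, here is a minimal sketch of the multiple-simultaneous-requests rule, assuming each site can predict when a submitted copy would start and that runtime scales inversely with a site's relative speed. The start times and the amount of work are made up; only the 2:1:3 speed ratio comes from the evaluation setup, and the function names are hypothetical.

```python
from typing import Dict


def msr_choice(start: Dict[str, float]) -> str:
    """Multiple simultaneous requests: the copy that starts first wins; others are cancelled."""
    return min(start, key=lambda site: start[site])


def completion_times(start: Dict[str, float], speed: Dict[str, float],
                     work: float) -> Dict[str, float]:
    """Completion time per site, assuming runtime scales as work / relative speed."""
    return {site: start[site] + work / speed[site] for site in start}


if __name__ == "__main__":
    start = {"site1": 10.0, "site2": 0.0, "site3": 20.0}   # hypothetical predicted starts
    speed = {"site1": 2.0, "site2": 1.0, "site3": 3.0}     # 2:1:3, as in the evaluation
    finish = completion_times(start, speed, work=120.0)
    chosen = msr_choice(start)
    print(chosen, finish[chosen])        # site2 120.0
    print(min(finish, key=finish.get))   # site3 (finishes at 60.0)
```

Here the earliest-starting site is also the slowest, so the job finishes twice as late as it would have on site3.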
PSA Scheduling Strategies
Flooding based Job Shredding
• Submit all chunks in the PSA at once
• Greedy approach
• Improves user and system metrics
• Doesn't ensure fairness to Non-PSA jobs
Opportune Job Shredding
• Uses an additional Application-Level Scheduler
• Monitors the current schedule of the system
• If no normal backfill is possible, allows PSA jobs to shred and backfill
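The two strategies differ in when PSA chunks are released to the local scheduler. A minimal sketch contrasting them, assuming one-processor tasks with a known, uniform runtime and an EASY-style reservation (`shadow`, `extra`) for the blocked queue head. All names are hypothetical; this illustrates the rule stated on the slide, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PSA:
    remaining_tasks: int   # independent tasks not yet started
    task_runtime: float    # assumed identical for every task


def flood(psa: PSA) -> int:
    """Flooding: every remaining task is submitted as its own job right away."""
    return psa.remaining_tasks


def opportune_shred(now: float, free: int, shadow: float, extra: int,
                    normal_backfill_possible: bool, psa: PSA) -> int:
    """Number of PSA tasks to shred off and backfill at time `now`.

    shadow/extra describe the reservation of the blocked queue head, as in
    EASY backfilling; each task is assumed to need one processor.
    """
    # Normal jobs keep priority: shred only when nothing else can backfill.
    if normal_backfill_possible or free <= 0 or psa.remaining_tasks == 0:
        return 0
    if now + psa.task_runtime <= shadow:
        slots = free               # tasks end before the reserved start
    else:
        slots = min(free, extra)   # tasks may only use processors left over
    return min(slots, psa.remaining_tasks)


if __name__ == "__main__":
    psa = PSA(remaining_tasks=10, task_runtime=5.0)
    print(flood(psa))                                     # 10
    print(opportune_shred(0.0, 4, 10.0, 1, False, psa))   # 4
    print(opportune_shred(0.0, 4, 10.0, 1, True, psa))    # 0
```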
Multi-Site Scheduling for PSAs
• Two-level Application Level Schedulers
• No constraints on sites: allowed to have different speeds and different scheduling policies
• Similar to "Multiple Simultaneous Requests", but simultaneous requests are made only for PSAs
[Figure: Multi-Site Scheduling for PSAs. A Meta Application-Level Scheduler sits above Sites 1, 2 and 3; at each site, an App-Level Scheduler feeds the Job Queue of the Local Scheduler.]
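The two-level arrangement can be sketched as a meta-level step that collects, from each site's application-level scheduler, how many PSA tasks it could shred into its current schedule, and then hands out the remaining tasks. In this sketch the per-site offers are taken as given, and filling faster sites first is a hypothetical policy chosen for illustration; the slides do not specify how the meta scheduler splits the work.

```python
from typing import Dict


def dispatch(remaining: int, slots: Dict[str, int],
             speed: Dict[str, float]) -> Dict[str, int]:
    """Split the remaining PSA tasks across sites that currently offer slots.

    slots: tasks each site-level scheduler could shred and backfill right now.
    Faster sites are filled first, which is an assumption, not the paper's rule.
    """
    plan: Dict[str, int] = {}
    for site in sorted(slots, key=lambda s: speed[s], reverse=True):
        take = min(remaining, slots[site])
        if take > 0:
            plan[site] = take
            remaining -= take
    return plan


if __name__ == "__main__":
    slots = {"site1": 3, "site2": 5, "site3": 4}
    speed = {"site1": 2.0, "site2": 1.0, "site3": 3.0}
    print(dispatch(10, slots, speed))  # {'site3': 4, 'site1': 3, 'site2': 3}
```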
Performance Metrics
• Response Time = Completion Time - Submit Time
• Slowdown = Response Time / Runtime
• Loss of Capacity (LOC) = min{processors requested by waiting jobs, idle processors} × T, where T is the time for which this state lasts
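The three metrics translate directly into code. A minimal sketch, assuming the simulator records each interval between scheduling events as (duration T, processors requested by waiting jobs, idle processors); the function names are hypothetical.

```python
from typing import Iterable, Tuple


def response_time(submit: float, completion: float) -> float:
    return completion - submit


def slowdown(submit: float, completion: float, runtime: float) -> float:
    return response_time(submit, completion) / runtime


def loss_of_capacity(states: Iterable[Tuple[float, int, int]]) -> float:
    """states: (T, procs requested by waiting jobs, idle procs) per interval.

    Each interval contributes min(waiting, idle) * T; summing the contributions
    over the whole trace is an assumption about how the total is reported.
    """
    return sum(min(waiting, idle) * t for t, waiting, idle in states)


if __name__ == "__main__":
    print(response_time(100.0, 400.0))       # 300.0
    print(slowdown(100.0, 400.0, 150.0))     # 2.0
    print(loss_of_capacity([(10.0, 8, 4), (5.0, 2, 6), (3.0, 0, 10)]))  # 50.0
```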
Evaluation Scheme
• Simulation-based approach
• CTC trace from Feitelson's archive
• EASY backfilling used
• For multi-site evaluation: CTC traces from 3 different months, with processing speeds in the ratio 2:1:3
Flooding Based Job Shredding
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for All Jobs, PSA Jobs and Non-PSA Jobs.]
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for All Jobs, PSA Jobs and Non-PSA Jobs.]
• Up to 60% improvement for PSA Jobs
• Up to 90% worse performance for Non-PSA Jobs
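The charts report each metric as a percentage decrease, with negative values read as worse performance. The slides do not state the baseline explicitly; assuming it is the same workload scheduled without shredding, the plotted quantity for a lower-is-better metric would be computed as in this sketch (the function name is hypothetical):

```python
def percentage_decrease(baseline: float, new: float) -> float:
    """Percent reduction of a lower-is-better metric relative to a baseline.

    Positive values are improvements; negative values mean the metric got worse.
    """
    return (baseline - new) * 100.0 / baseline


if __name__ == "__main__":
    print(percentage_decrease(100.0, 40.0))   # 60.0  (a "60% improvement")
    print(percentage_decrease(100.0, 190.0))  # -90.0 (a "90% worse" result)
```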
Flooding: Job Category wise breakup
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for Narrow Short, Narrow Long, Wide Short and Wide Long jobs.]
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for Narrow Short, Narrow Long, Wide Short and Wide Long jobs.]
• Narrow Short Non-PSA jobs suffer the most
• Loss of backfilling opportunities is the main reason
Flooding: Loss of Capacity
[Chart: Loss of Capacity (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5).]
• Up to 75% improvement in the Loss of Capacity
Opportune Job Shredding
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for All Jobs, PSA Jobs and Non-PSA Jobs.]
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for All Jobs, PSA Jobs and Non-PSA Jobs.]
• Up to 70% improvement for PSA Jobs
• Less than 2% worsening in performance for Non-PSA Jobs
Opportune: Job Category wise breakup
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for Narrow Short, Narrow Long, Wide Short and Wide Long jobs.]
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for Narrow Short, Narrow Long, Wide Short and Wide Long jobs.]
• No category of Non-PSA jobs suffers more than 7%
Opportune: Loss of Capacity
[Chart: Loss of Capacity (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5).]
• Up to 12% improvement in the Loss of Capacity
Opportune (Multi-Site)
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for PSA and Non-PSA Jobs on Cluster1, Cluster2 and Cluster3.]
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for PSA and Non-PSA Jobs on Cluster1, Cluster2 and Cluster3.]
• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Response Time
[Chart: Average Response Time (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for PSA and Non-PSA Jobs on Cluster1, Cluster2 and Cluster3.]
• Up to 75% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Slowdown
[Chart: Average Slowdown (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for PSA and Non-PSA Jobs on Cluster1, Cluster2 and Cluster3.]
• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs
Opportune (Multi-Site): Loss of Capacity
[Chart: Loss of Capacity (10% PSA Jobs). Percentage decrease vs. load (1, 1.2, 1.5) for Cluster1, Cluster2 and Cluster3.]
• Up to 45% improvement in the Loss of Capacity
Concluding Remarks
Opportune Job Shredding
• Efficient scheduling of PSAs
• Single-site and multi-site versions
• Significant improvement for PSA jobs
• Ensures that Non-PSA jobs are not significantly affected
• Plan to integrate this with production schedulers
Thank You!