analysis of user submission behavior on hpc and htc · 7/27/2016  · analysis of user submission...

27
ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC Rafael Ferreira da Silva USC Information Sciences Institute CS&T Directors' Meeting

Upload: others

Post on 22-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

ANALYSIS OF USER SUBMISSION BEHAVIORON HPC AND HTC

Rafael Ferreira da SilvaUSC Information Sciences Institute

CS&T Directors' Meeting

Page 2: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

OUTLINE

Introduction Workload Characterization User Behavior in HPC

User Behavior in HTC Summary

HPC and HTCJob SchedulersProblem Statement

Mira (ALCF)Compact Muon Solenoid (CMS)

Think TimeRuntime and Waiting TimeJob Notifications

ConclusionsFuture Research Directions

Think TimeBatches of JobsBatch-Wise Submission

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC 2Pegasus

Page 3: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

REFERENCES

Consecutive Job Submission Behavior at Mira Supercomputer

Understanding User Behavior: from HPC to HTC

S. Schlagkamp, R. Ferreira da Silva, W. Allcock, E. Deelman, and U. Schwiegelshohn25th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2016

S. Schlagkamp, R. Ferreira da Silva, E. Deelman, and U. SchwiegelshohnInternational Conference on Computational Science (ICCS), 2016

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC 3Pegasus

Page 4: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

4

INTRODUCTION

Feedback-Aware Performance Evaluation of Job Schedulers

Evaluation with previously recorded workload traces

One instantiation of a dynamic processUser reaction is a mystery

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

D. G. Feitelson, 2015

Lack between theory and practiceFurther understanding of

reactions to system performanceU. Schwiegelshohn, 2014

Stable state

Systemperformance

User reaction

Generated load

Page 5: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

5

THINK TIME

>

How do users react to system performance?data-driven analysis

Pegasus

Think TimeTime between job

completion and consecutive job submission

>R. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

Page 6: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

OUTLINE

Introduction Workload Characterization User Behavior in HPC

User Behavior in HTC Summary

HPC and HTCJob SchedulersProblem Statement

Mira (ALCF)Compact Muon Solenoid (CMS)

Think TimeRuntime and Waiting TimeJob Notifications

ConclusionsFuture Research Directions

Think TimeBatches of JobsBatch-Wise Submission

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC 6Pegasus

Page 7: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

7

WORKLOAD CHARACTERISTICS

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTCPegasus

487Total Number of Users

MIRA (ALCF)

78,782Total Number of Jobs

5,698CPU hours (millions)

6,093Avg. Runtime (seconds)

786,432Total Number of Cores

49,152Total Number of Nodes

2014 Physics73 Users24,429 Jobs2,256 CPU hours (millions)7,147 Avg. Runtime (sec)

Materials Science77 Users12,546 Jobs895 CPU hours (millions)5,820 Avg. Runtime (sec)

Chemistry51 Users10,286 Jobs810 CPU hours (millions)6,131 Avg. Runtime (sec)

Page 8: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

8

WORKLOAD CHARACTERISTICS

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTCPegasus

COMPACT MUON SOLENOID (CMS)

1,435,280Total Number of Jobs

AUGUST 2014 OCTOBER 2014

392Total Number of Users

75Execution Sites

15,484Execution Nodes

792,603Completed Jobs

385,447Exit Code (!= 0)

9,444.6Avg. Runtime (sec)

55.3Avg. Disk Usage (MB)

408Total Number of Users

15,034Execution Nodes

476,391Exit Code (!= 0/)

32.9Avg. Disk Usage (MB)

1,638,803Total Number of Jobs

72Execution Sites

816,678Completed Jobs

9967.1Avg. Runtime (sec)

Page 9: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

WORKLOAD CHARACTERIZATION

9Pegasus

Jobs’ resource requirements at Mira

Job submission interarrival times per day

Job submission interarrival times per hour

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

Page 10: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

OUTLINE

Introduction Workload Characterization User Behavior in HPC

User Behavior in HTC Summary

HPC and HTCJob SchedulersProblem Statement

Mira (ALCF)Compact Muon Solenoid (CMS)

Think TimeRuntime and Waiting TimeJob Notifications

ConclusionsFuture Research Directions

Think TimeBatches of JobsBatch-Wise Submission

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC 10Pegasus

Page 11: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

11

USER THINK TIME

We only account for positive times (and less than 8 hours) between subsequent job submissions

Average think times in several traces from the Parallel Workloads Archive and Mira>

Pegasus

The user’s think time quantifies the timespan between a job completion and the submission of the next job

(by the same user)

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

No change in the past 20 years

Page 12: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

CORRELATIONS

Response TimeWaiting Time + Runtime

12

General Behavior

Pegasus

>

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

Subsequent behavior independent of the science field/application

Low values (nearly instantaneous submissions) is typically due to the user of automated scripts

Peaks (e.g., Engineering) is due to outliers (about 8h)

Page 13: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

13

CORRELATIONS

Both runtime and waiting time have equal influence on the user behavior Reducing queuing times would not significantly improve

think times for long running jobs

RU

NTI

ME

WA

ITIN

G T

IME

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

Page 14: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

14

CORRELATIONS

For small jobs (~103 nodes), average think times are relatively small (<1.5h)

For larger jobs, it substantially increases:

NO

DES

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

- Users do not fully understand the behavior of their applications as the number of cores increase

- Resource allocation for larger jobs is delayed

- Larger jobs require additional settings and refinements (increased job complexity)

>

Page 15: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

15

CORRELATIONS

Think time is heavily correlated to the workload

Workload has more impact as it also considers runtime

WO

RK

LOA

D

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

w(j) = processing time x number of nodes

>Similar conclusions to the number of nodes analysis

Page 16: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

16

ANALYSIS OF JOB CHARACTERISTICSMore complex jobs do yield higher think times, however there is a similar behavior when runtime or waiting time prevail

Think times are small when runtime prevails

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

WO

RK

LOA

D

<= 106 > 106

NO

DES

<= 512 > 512

small jobsRepresent49.2% oftotaljobs

User behavior is not impacted by the job size

less complex jobsConsumelessthan277CPUhours

- Complex jobs requires more think time- Lack of accurate runtime estimates

Page 17: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

17

ANALYSIS OF JOB NOTIFICATIONS

17,736 out of 78,782 jobs used the email notification mechanism

Average think time as a function of response time for jobs with and without notification upon job completion>

Pegasus

The overall user behavior is nearly identicalregardless of whether the user is notified

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

With notification Without notificationLAR

GE

JOB

S

Page 18: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

>

SUMMARY

DiscussionThere is no shift on the think time behavior during the past 20 years. This similar behavior is due to the current restrictive definition to model think time

Simulating submission behavior has to consider other job characteristics and system performance components

A notification mechanism has no influence on the subsequent user behavior. Thus, there is no urging to model user (un)awareness of job completion in performance evaluation simulations

18PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

User Behavior in HPCThink TimeRuntime and Waiting TimeJob Notifications

Page 19: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

OUTLINE

Introduction Workload Characterization User Behavior in HPC

User Behavior in HTC Summary

HPC and HTCJob SchedulersProblem Statement

Mira (ALCF)Compact Muon Solenoid (CMS)

Think TimeRuntime and Waiting TimeJob Notifications

ConclusionsFuture Research Directions

Think TimeBatches of JobsBatch-Wise Submission

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC 19Pegasus

Page 20: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

20

CHARACTERIZING THINK TIME

>>>HPCTightly coupled applicationsMethods: Think time

HTCEmbarrassingly parallel applications

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

atypical behavior

Page 21: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

21

BATCHES OF JOBSUser-triggered job submissions are often clustered

and denoted as batches

Pegasus

CMS10

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

CMS08MIRA

Two jobs successively submitted by a user belong to the same batch if the interarrival time between their submissions is within a threshold:Large interarrival thresholds may not capture the

actual job submission behavior in HTC systems

Page 22: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

REDEFINING THINK TIME

Think Time for HTCQuantifies the timespan between two subsequent submissions of bags of tasks

22

General Behavior

Pegasus

>

R. Ferreira da SilvaAnalysis of User Submission Behavior on HPC and HTC

Most of the jobs belonging to the same experiment and user (97%) are submitted within one minute

We use the threshold of 60s to distinguish between automated bag of tasks submissions and human-triggered submissions (batches)

Distribution of interarrival times (CDF)

Page 23: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

23

THINK TIME IN HTCBoth HTC workloads follow the same linear trend

User behavior in CMS is not strictly related to waiting time

RES

PON

SE

WA

ITIN

G T

IME

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

Batch-Wise Analysis: Lower think times when compared to standard analysis based on individual jobs HTC BoTs are comparable to HPC jobs

Page 24: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

24

ALTERNATIVE THINK TIME DEFINITIONSBexp the ground truth knowledge (from CMS traces)

CM

S 10

PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

CM

S 08

Buser bag of tasks based on jobs submitted by the same user (most common approach)

general jobs are treated individually

The subsequent think time behavior for the general behavior is closer to Bexp than the Buser>

Page 25: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

>

SUMMARY

DiscussionAlthough HTC jobs are composed of thousands of embarrassingly parallel jobs, the general human submission behavior is comparable to HPC

Additional information is required to properly identify HTC batches

Subsequent behavior in HPC is sensitive to the job complexity, while BoTsdrives the HTC behavior

There is no strong correlation between waiting and think times in the CMS experiments due to the dynamic behavior of queuing times within BoTs

25PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

User Behavior in HTCThink TimeBatches of JobsBatch-Wise Submission

Page 26: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

>

FUTURE WORK

Summary Future Research DirectionsConclusionFuture Research Directions

Extend and explore different think time definitions (e.g., based on concurrent activities)

Model think time as a function of job complexity from past job submissions

Cognitive studies: understand user reactions based on waiting times and satisfaction

In-depth characterization of waiting times in bags of tasks to improve correlation analysis between queuing time and think time

26PegasusR. Ferreira da Silva

Analysis of User Submission Behavior on HPC and HTC

Page 27: ANALYSIS OF USER SUBMISSION BEHAVIOR ON HPC AND HTC · 7/27/2016  · Analysis of User Submission Behavior on HPC and HTC D. G. Feitelson, 2015 Lack between theory and practice Further

ANALYSIS OF USER SUBMISSION BEHAVIORON HPC AND HTC

Rafael Ferreira da Silva, Ph.D.Research Assistant ProfessorDepartment of Computer ScienceUniversity of Southern [email protected] – http://rafaelsilva.com

Thank You

Questions?

http://pegasus.isi.edu