Upcoming Biowulf Seminars
• November 30, 1-3 pm: Python in HPC. Overview of Python tools used in high-performance computing, and how to improve the performance of your Python jobs on Biowulf.
• Jan 16, 1-3 pm: RELION tips and tricks, and Parallel jobs and benchmarking. Mechanics and best practices for submitting RELION jobs to the batch system, from both the command line and via the RELION GUI, as well as methods for monitoring and evaluating the results. Scaling of parallel jobs, and how to benchmark to make effective use of your allocated resources.
Bldg 50, Rm 1227
Making Effective Use of the Biowulf Batch System
NIH HPC Systems
Steven Fellini, staff@hpc.nih.gov
NIH HPC Staff, CIT
Oct 30, 2017
Effective Use == Effective Resource Allocation
• Specifying resources
• Estimating required resources
• Allocating resources with sbatch and swarm
• Monitoring resource allocation
• Scheduling and resource allocation
• Post-mortem analysis
Hardware Terminology Review
(diagram labels: Node, Processor, CPU/CPUs, Hyper-threading)
Estimating Resources
• CPU
  • Check the documentation (https://hpc.nih.gov/apps/)
  • Objective: match CPUs to threads 1:1 (there are exceptions, e.g., MD jobs)
• Memory
  • Run a job or swarm with a large memory allocation
  • Check the actual memory usage
  • Add 10% to the actual memory usage
• Time
  • Run a job or swarm with a large time allocation
  • Check the actual walltime
  • Add 10% to the actual walltime
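The "check actual usage, then add 10%" rule of thumb can be sketched in a few lines of shell. The peak-memory figure below is made up for illustration; in practice it would come from jobhist or jobload:

```shell
# Illustrative only: suppose a trial run reported a peak memory usage of 7.3 GB.
peak_mem_gb=7.3

# Add ~10% headroom and round up to a whole GB for the next submission.
mem_request_gb=$(awk -v m="$peak_mem_gb" 'BEGIN { printf "%d", m * 1.1 + 1 }')

echo "sbatch --mem=${mem_request_gb}g myjob.sh"   # -> sbatch --mem=9g myjob.sh
```

The same calculation applies to the walltime estimate: take the observed runtime, add roughly 10%, and use that as the --time request.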
Allocating Resources with sbatch and swarm
• All jobs
  • --mem (sbatch) or -g (swarm)
  • --time (sbatch and swarm)
  • -b to bundle command lines (swarm)
• Single-threaded jobs
  • "-p 2" to load cores with 2 threads (swarm)
• Multi-node jobs
  • See "Parallel Jobs and Benchmarking", Jan 16
• Multi-threaded jobs
  • --cpus-per-task (sbatch) or -t (swarm)
  • Use $SLURM_CPUS_PER_TASK in the batch script
  • OMP_NUM_THREADS
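Putting the multi-threaded options together, a minimal batch script might look like the sketch below. The application name my_app is hypothetical; the sbatch options and $SLURM_CPUS_PER_TASK are as described above:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=8g
#SBATCH --time=04:00:00

# Size the thread count from the allocation instead of hard-coding it,
# so the script stays correct if --cpus-per-task changes.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# my_app is a stand-in for your actual multi-threaded application.
my_app --threads=$SLURM_CPUS_PER_TASK input.dat
```

An equivalent swarm submission would pass the same per-command resources on the swarm command line, e.g. "swarm -t 8 -g 8 --time 04:00:00 -b 4 commands.swarm".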
Monitoring Resource Allocation
• CPU
  • jobload while the job is running
  • Dashboard during or after the job
  • (No easy way to monitor GPU utilization at the moment)
• Walltime
  • jobhist, Dashboard or sacct during or after the job has completed
• Memory
  • jobload while the job is running
  • jobhist, Dashboard or sacct during or after the job has completed
jobload

% jobload -u someuser
       JOBID              TIME             NODES  CPUS   THREADS  LOAD      MEMORY
               Elapsed / Wall                     Alloc  Active          Used / Alloc
    51863534   6-22:08:01 / 10-00:00:00   cn3095    4       4    100%    1.0 / 8.0 GB
    51863535   6-22:08:01 / 10-00:00:00   cn3256    4       5    125%    0.9 / 8.0 GB
    51863536   6-22:08:01 / 10-00:00:00   cn3348    4       1     25%    1.0 / 8.0 GB
    51863537   6-22:08:01 / 10-00:00:00   cn3401    4       3     75%    0.9 / 8.0 GB
    51881591   6-19:42:16 / 10-00:00:00   cn3097    4       1     25%    1.0 / 8.0 GB

% jobload -j 51874438_233
       JOBID              TIME             NODES  CPUS   THREADS  LOAD      MEMORY
               Elapsed / Wall                     Alloc  Active          Used / Alloc
51874438_233   6-20:10:13 / 10-00:00:00   cn3105    2       1     50%    0.5 / 1.5 GB
jobhist

# jobhist 52102264_67
      Jobid  Partition      State  Nodes  CPUs  Walltime   Runtime      MemReq  MemUsed  Nodelist
52102264_67       norm  COMPLETED      1     2  02:00:00  00:05:29  4.0GB/node    0.8GB    cn3185

(Walltime and MemReq show what was allocated; Runtime and MemUsed show what the job actually used.)
sacct
% sacct --format=Jobname,AllocCPUS,AllocNodes,ReqMem,MaxRSS,Elapsed -j 52102332
   JobName  AllocCPUS  AllocNodes      ReqMem      MaxRSS     Elapsed
---------- ---------- ---------- ---------- ---------- ----------
tbss_2_reg          2          1         4Gn              00:05:29
     batch          2          1         4Gn     815152K  00:05:29
Using Your Dashboard to Monitor Jobs
https://hpc.nih.gov
https://hpc.nih.gov/dashboard/
Scheduling and Resource Allocation
• Scheduling is determined by job priority
• Priority is determined by the user's Fairshare value
• Fairshare is determined by the recent CPU and memory allocations of running jobs
• An unnecessarily long time allocation will prevent jobs from being backfilled

Why are my jobs pending?
• 'freen' shows free CPUs but not free memory or disk
• Other jobs have higher priority (sprio)
• Nodes are reserved for higher-priority jobs
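To see how a pending job's priority breaks down, the standard Slurm sprio utility mentioned above can be queried directly; the job ID below is illustrative, and this sketch only runs on a system with a Slurm scheduler:

```shell
# Show the priority components (fairshare, job age, ...) for one pending job.
sprio -j 12345678

# List the priorities of all of your own pending jobs.
sprio -u $USER
```

A low fairshare component relative to other users is the usual explanation for a long-pending job after heavy recent usage.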
Consequences of…

Resource  Specifying more resources than needed            Specifying fewer resources than needed
CPU       Wasted CPU resources, possibly unnecessary       Job runs a little / a lot slower
          scheduling delays
Memory    Wasted memory resources, possibly unnecessary    Job is "killed" by the kernel
          scheduling delays
Time      Possibly unnecessary scheduling delays           Job is killed by the batch system
Post-mortem of Jobs Using the User Dashboard
Or: The Good, the Bad, and the Ugly…
Comment: the job is running with the default allocations for CPU and memory
Recommendation: if this is a subjob of a large swarm, try "-p 2"
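The "-p 2" recommendation could be applied when resubmitting the swarm; the swarm file name and the other option values below are illustrative:

```shell
# Pack two single-threaded commands per core, so a large swarm of
# single-threaded subjobs ties up half as many cores.
swarm -p 2 -g 2 --time 02:00:00 commands.swarm
```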
A+
Another A+
Recommendation: reduce the CPU allocation
A+
Comment: perfect CPU utilization; memory is underutilized, but the entire node is allocated because of the CPU allocation
Comment: good overall utilization; possibly split into two jobs with a dependency, and with differing resource allocations
Comment: 8 CPUs too little or too much? 2 would do
Recommendation: could run in half the memory
Comment: good memory utilization
Recommendation: increasing the CPU allocation probably won't help
Comment: CPUs badly overloaded
Recommendation: could run in less than half the memory
Comment: CPUs overloaded at 200%
Recommendation: 256 GB of memory allocated, only MBs used
Comment: good memory utilization
Recommendation: might run faster with 32 CPUs?
Recommendation: increasing the CPU count might improve performance?
Recommendation: allocating 56 CPUs would likely help
staff@hpc.nih.gov