TRANSCRIPT
Compute Grid: Parallel Processing
RCS Lunch & Learn Training Series
Bob Freeman, PhD, Director, Research Technology Operations
HBS
8 November 2017
Overview
• Introduction
• Serial vs parallel
• Approaches to parallelization
• Submitting parallel jobs on the compute grid
• Parallel tasks
• Parallel code
• Q&A
Serial vs Parallel Work
Traditionally, software has been written for serial computers:
• To be run on a single computer having a single Central Processing Unit (CPU)
• The problem is broken into a discrete set of instructions
• Instructions are executed one after the other
• Only one instruction can be executed at any moment in time
Serial vs Multicore Approaches
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
• To be run using multiple CPUs
• A problem is broken into discrete parts (either by you or by the application itself) that can be solved concurrently
• Each part is further broken down into a series of instructions
• Instructions from each part execute simultaneously on different CPUs or different machines
Serial vs Multicore Approaches
Many different parallelization approaches exist, which we won't discuss:
• Shared memory
• Distributed memory
• Hybrid distributed-shared memory
Serial vs Multicore Approaches
So, we are going to briefly touch on two approaches:
• Parallel tasks
  • Tasks in the background
  • gnu_parallel
  • Pleasantly parallelizing
• Parallel code
  • Considerations for parallelizing
  • Parallel frameworks & examples
We will not discuss parallelized frameworks such as Hadoop, Apache Spark, MongoDB, ElasticSearch, etc.
Parallel Processing…
Nota bene!!
• In order to run in parallel, programs (code) must be explicitly programmed to do so.
• And you must ask the scheduler to reserve those cores for your program/work to use.
Thus, requesting cores from the scheduler does not automagically parallelize your code!

# SAMPLE JOB FILE
#!/bin/bash
#BSUB -q normal   # Queue to submit to (comma separated)
#BSUB -n 8        # Number of cores
...
blastn -query seqs.fasta -db nt -out seqs.nt.blastn                                        # WRONG!!
blastn -query seqs.fasta -db nt -out seqs.nt.blastn -num_threads $LSB_MAX_NUM_PROCESSORS   # YES!!
# SAMPLE PARALLELIZED CODE
bsub -q normal -n 4 -W 24:00 -R "rusage[mem=4000]" stata-mp4 -d myfile.do
# SAMPLE PARALLEL TASKS
bsub -q normal -n 4 -W 24:00 -R "rusage[mem=4000]" \
    parallel --joblog .log --outputasfiles -j\$LSB_MAX_NUM_PROCESSORS :::: tasklist.txt
# SAMPLE PLEASANT PARALLELIZATION
for file in folder/*.txt; do
    echo $file
    bsub -q normal -W 24:00 -R "rusage[mem=1000]" python process_input_data.py $file
done
Parallel Jobs on the Compute Grid…
Parallel Tasks
Shells, by default, have the ability to multitask: doing more than one thing at a time. In BASH, this can be accomplished by sending a command to the background (see the sketch below):
• Explicitly, with &
• After the fact, with ^Z and bg
When you put a task in the background:
• The task keeps running, while you continue to work at the shell in the foreground
• If any output is done, it appears on your screen immediately
• If input is required, the process prints a message and stops
• When it is done, a message will be printed
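For instance, a minimal sketch of both mechanisms (long_task.sh is a hypothetical stand-in for any long-running command):

./long_task.sh > task.log 2>&1 &   # run in the background explicitly with &

# Or suspend a running foreground command with ^Z, then resume it
# in the background:
#   ^Z
#   bg

jobs    # list background tasks and their job numbers
fg %1   # bring job 1 back to the foreground
wait    # block until all background tasks finish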
Background Tasks
From Processes & Job Control: http://slideplayer.com/slide/4592906/
GNU parallel is a shell tool for executing jobs in parallel using one or more computers:
• A single command or small script that has to be run for each of the lines in the input
• Typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables
• Many options for working with control and output of results
• Can specify the degree of parallelization
# create list of files to unzip
for index in `seq 1 100`; do echo "unzip myfile$index.zip" >> tasklist.txt; done
# Ask the compute cluster to do this for me in parallel, using 4 CPUs/cores
bsub -q normal -n 4 -W 2:00 -R "rusage[mem=4000]" \
    parallel --joblog .log --outputasfiles -j\$LSB_MAX_NUM_PROCESSORS :::: tasklist.txt
Gnu Parallel Approach
Problem: How do I BLAST 200,000 transcripts against NR?
Solution: Fake a parallel BLAST. But how?
• Divide your input file into n separate files
• BLAST each smaller input file on a separate core
• Running on n cores will be almost exactly n times faster!
Why?
• Each core doesn't need to talk to the others
• You could submit n jobs individually, but that's not recommended
• Use more sophisticated techniques: job arrays, gnu_parallel, GridRunner (a sketch of the split-and-scatter pattern follows below)
• Shouldn't confuse this with the truly parallel mpiBLAST
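A minimal sketch of this pattern with gnu_parallel; it assumes two-line FASTA records (header plus a single sequence line) so that unix split can be used, and the file names are illustrative:

# Split the input into ~100 chunks of whole records (2 lines per record;
# multi-line FASTA needs a proper splitter instead of split -l)
total=$(grep -c '^>' seqs.fasta)
split -l $(( (total / 100 + 1) * 2 )) seqs.fasta chunk_

# Build a task list with one BLAST command per chunk
for f in chunk_*; do
    echo "blastn -query $f -db nt -out $f.blastn" >> tasklist.txt
done

# Scatter the chunks across 4 cores with gnu_parallel
bsub -q normal -n 4 -W 24:00 -R "rusage[mem=4000]" \
    parallel --joblog .log -j\$LSB_MAX_NUM_PROCESSORS :::: tasklist.txt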
The efficiency of your work depends on how parallelized you make your task: you want to ensure that your jobs spend most of their time computing, and not sitting in the queue or doing compute prep.
[Diagram: each job's timeline runs schedule → module load → BLAST → job finish, versus repeating that scheduling overhead ×100. What would you choose?]
Concept of Pleasant Parallelization
• Split the input file into N files that run 1 to 6 hrs each
• Can be done with a perl or python script, unix split, etc.
• The user script parses the data file whose name is passed as the command parameter
for file in my*.dat
do
    echo $file
    bsub -q normal -W 6:00 -R "rusage[mem=1000]" \
        python process_data_file.py $file
    sleep 1
done
For advanced users, one can submit this as one job in a job array, a feature of most schedulers:
# create script for job array (process_data_file_array.py)
# and now submit the file as a job array
num_files=$(ls -1 my*.dat | wc -l)
bsub -J myarray[1-$num_files] -q normal -W 6:00 -R "rusage[mem=1000]" \
    python process_data_file_array.py
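The array script itself is not shown on the slide; one minimal approach (sketched here as a bash wrapper rather than the Python script named above) is to map LSF's array index $LSB_JOBINDEX to the Nth input file:

#!/bin/bash
# process_data_file_array.sh (hypothetical): select the file matching
# this array element's index, then hand it to the per-file worker
file=$(ls -1 my*.dat | sed -n "${LSB_JOBINDEX}p")
python process_data_file.py "$file"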
This process is ideal for serially numbered files, parameter sweeps, & optimization routines!!
Manual (Script) Approach
Parallel Code
Can my code be parallelized?
• Does it have large loops that repeat the same operations?
• Does your code do multiple tasks that are not dependent on one another? If so, is the dependency weak?
• Can any dependencies or information sharing be overlapped with computation? If not, is the amount of communication small?
• Do multiple tasks depend on the same data?
• Does the order of operations matter? If so, how strict does it have to be?
Basic guidance for efficient parallelization:
• Is it even worth parallelizing my code?
• Does your code take an intractably long amount of time to complete?
• Do you run a single large model, or do statistics on multiple small runs?
• Would the amount of time it takes to parallelize your code be worth the gain in speed?
• Parallelizing established code vs. starting from scratch:
  • Established code: maybe easier/faster to parallelize, but may not give good performance or scaling
  • Starting from scratch: takes longer, but will give better performance, accuracy, and the opportunity to turn a "black box" into code you understand
Basic guidance for efficient parallelization:
• Increase the fraction of your program that can be parallelized. Identify the most time-consuming parts of your program and parallelize them. This could require modifying your intrinsic algorithm and your code's organization
• Balance the parallel workload
• Minimize time spent in communication
• Use simple arrays instead of user-defined derived types
• Partition data. Distribute arrays and matrices – allocate specific memory for each MPI process
Designing parallel programs - partitioning:
One of the first steps in designing a parallel program is to break the problem into discrete "chunks" that can be distributed to multiple parallel tasks.
Domain decomposition: data associated with the problem is partitioned – each parallel task works on a portion of the data.
There are different ways to partition the data (a shell-level sketch follows below).
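At the shell level, the same idea appears as block-splitting an input before scattering it to tasks; a minimal sketch, assuming GNU coreutils split and an illustrative data.txt:

# Block decomposition: split data.txt into 4 pieces without breaking
# lines (-n l/4), so each parallel task owns one contiguous chunk
split -n l/4 data.txt chunk_
ls chunk_*   # chunk_aa .. chunk_ad, one per task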
Designing parallel programs - partitioning:
Functional decomposition: the problem is decomposed according to the work that must be done. Each parallel task performs a fraction of the total computation.
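A Unix pipeline is a familiar small-scale instance of functional decomposition: each stage performs a different function, and all stages run concurrently on the data stream. A sketch with an illustrative access.log:

# Three concurrent functional stages: extract a field, sort it, and
# count distinct values; each process runs in parallel with the others
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head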
Designing parallel programs - communication:
Most parallel applications require tasks to share data with each other.
• Cost of communication: computational resources are used to package and transmit data. It requires frequent synchronization – some tasks will wait instead of doing work. It could saturate network bandwidth.
• Latency vs. bandwidth: latency is the time it takes to send a minimal message between two tasks; bandwidth is the amount of data that can be communicated per unit of time. Sending many small messages can cause latency to dominate communication overhead.
• Synchronous vs. asynchronous communication: synchronous communication is referred to as blocking communication – other work stops until the communication is completed. Asynchronous communication is referred to as non-blocking, since other work can be done while communication is taking place.
• Scope of communication: point-to-point communication is data transmission between two tasks; collective communication involves all tasks (in a communication group).
This is only a partial list of things to consider!
Designing parallel programs – load balancing:
Load balancing is the practice of distributing approximately equal amounts of work so that all tasks are kept busy all the time.
How to achieve load balance?
• Equally partition the work given to each task: for array/matrix operations, equally distribute the data set among parallel tasks; for loop iterations where the work done in each iteration is equal, evenly distribute the iterations among tasks.
• Use dynamic work assignment: certain classes of problems result in load imbalance even if data is distributed evenly among tasks (sparse matrices, adaptive grid methods, many-body simulations, etc.). Use a scheduler / task-pool approach: as each task finishes, it queues to get a new piece of work (gnu_parallel sketches this at the shell level below). Modify your algorithm to handle imbalances dynamically.
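gnu_parallel's -j option behaves as exactly this kind of task pool at the shell level: it keeps N slots busy and hands each one a new command the moment it frees up, so long and short tasks interleave instead of being pre-assigned. A sketch, with sleep lengths standing in for uneven work:

# Tasks of very different lengths; a static 4-way split would leave some
# workers idle, but the task pool keeps all 4 slots busy until the end
for secs in 8 1 1 1 7 2 1 6; do echo "sleep $secs"; done > tasklist.txt
parallel -j4 :::: tasklist.txt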
Designing parallel programs – I/O:
The bad news:
• I/O operations are inhibitors of parallelism
• I/O operations are orders of magnitude slower than memory operations
• Parallel file systems may be immature or not available on all systems
• I/O that must be conducted over a network can cause severe bottlenecks
The good news:
• Parallel file systems are available (e.g., Lustre)
• The MPI parallel I/O interface has been available since 1996 as a part of MPI-2
I/O tips:
• Reduce overall I/O as much as possible
• If you have access to a parallel file system, use it
• Writing large chunks of data rather than small ones is significantly more efficient
• Fewer, larger files perform much better than many small files
• Have a subset of parallel tasks perform the I/O instead of using all tasks, or
• Confine I/O to a single task and then broadcast (gather) data to (from) the other tasks (a shell-level sketch follows below)
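One shell-level way to follow the "fewer, larger files" tip: have each parallel task write its own piece to scratch, then do a single sequential gather at the end. A sketch; process_chunk.sh and the file names are illustrative:

# Each task writes its output to its own scratch file ({#} is
# gnu_parallel's job sequence number)...
parallel -j4 "./process_chunk.sh {} > /tmp/out.{#}" ::: chunk_*
# ...and one final gather produces a single large results file
cat /tmp/out.* > results.all && rm /tmp/out.*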
Languages that use Parallel Computing
• C/C++ • Fortran • MATLAB • Python • R • Perl • Julia • Scala • …
• By default, R, Python, Perl, and MATLAB* are not multithreaded… so do not ask for or try to use more than 1 core/CPU!!
• For all these programs, you cannot use the drop-down GUI menus, and you must set the # of CPUs/cores dynamically! DO NOT USE STATIC VALUES!
• For R, you can use appropriate routines with Rparallel
  • Now part of base-R
  • Includes Rforeach, RdoMC, or Rsnow
• For Python, you can use the multiprocessing library (or many others)
• For Perl, there's threads or Parallel::ForkManager
• MATLAB has parpool, and do not set the worker thread count in GUI settings
# R example (parallel.R)
library(parallel)   # mclapply() comes from base-R's parallel package
# seq_len() needs the number of tasks (10 here is illustrative), and the
# core count read from the environment must be cast to numeric
mclapply(seq_len(10), run2, mc.cores = as.numeric(Sys.getenv('LSB_MAX_NUM_PROCESSORS')))

bsub -q normal -n 4 -app R-5g R CMD BATCH parallel.R   # custom submission command
# MATLAB example (parallel.m)
hPar = parpool( 'local', str2num( getenv('LSB_MAX_NUM_PROCESSORS') ) );
…

matlab-5g -n4 parallel.m   # uses command-line wrapper
Parallel Options in R, Python, & MATLAB
See more info on our website at http://grid.rcs.hbs.org/parallel-processing
Stata/MP Performance Report Summary (1)
1 Summary
Stata/MP[1] is the version of Stata that is programmed to take full advantage of multicore and multiprocessor computers. It is exactly like Stata/SE in all ways except that it distributes many of Stata's most computationally demanding tasks across all the cores in your computer and thereby runs faster—much faster.
In a perfect world, software would run 2 times faster on 2 cores, 3 times faster on 3 cores, and so on. Stata/MP achieves about 75% efficiency. It runs 1.7 times faster on 2 cores, 2.4 times faster on 4 cores, and 3.2 times faster on 8 cores (see figure 1). Half the commands run faster than that. The other half run slower than the median speedup, and some of those commands are not sped up at all, either because they are inherently sequential (most time-series commands) or because they have not been parallelized (graphics, mixed).
In terms of evaluating average performance improvement, commands that take longer to run—such as estimation commands—are of greater importance. When estimation commands are taken as a group, Stata/MP achieves an even greater efficiency of approximately 85%. Taken at the median, estimation commands run 1.9 times faster on 2 cores, 3.1 times faster on 4 cores, and 4.1 times faster on 8 cores. Stata/MP supports up to 64 cores.
This paper provides a detailed report on the performance of Stata/MP. Command-by-command performance assessments are provided in section 8.
[Figure 1. Performance of Stata/MP: speed on multiple cores relative to speed on a single core (1, 2, 4, 8 cores). Curves shown: theoretical upper bound, logistic regression, median performance (estimation), median performance (all commands), and the lower bound (no improvement), with the possible performance region between the bounds.]
[1] Support for this effort was partially provided by the U.S. National Institutes of Health, National Institute on Aging grants 1R43AG019542-01A1, 2R44AG019542-02, and 5R44AG019542-03. We also thank the Cornell Institute for Social and Economic Research (CISER) at Cornell University for graciously providing access to several highly parallel SMP platforms. CISER staff, in particular John Abowd, Kim Burlingame, Janet Heslop, and Lars Vilhuber, were exceptionally helpful in scheduling time and helping with configuration. The views expressed here do not necessarily reflect those of any of the parties thanked above.
Revision 3.0.1 30jan2016
Stata offers a 293-page report on its parallelization efforts. They are pretty impressive. However:
Example: Stata Parallelization
With multiple cores, one might expect to achieve the theoretical upper bound of doubling the speed by doubling the number of cores—2 cores run twice as fast as 1, 4 run twice as fast as 2, and so on. However, there are three reasons why such perfect scalability cannot be expected: 1) some calculations have parts that cannot be partitioned into parallel processes; 2) even when there are parts that can be partitioned, determining how to partition them takes computer time; and 3) multicore/multiprocessor systems only duplicate processors and cores, not all the other system resources.
Stata/MP achieved 75% efficiency overall and 85% efficiency among estimation commands.
Speed is more important for problems that are quantified as large in terms of the size of the dataset or some other aspect of the problem, such as the number of covariates. On large problems, Stata/MP with 2 cores runs half of Stata's commands at least 1.7 times faster than on a single core. With 4 cores, the same commands run at least 2.4 times faster than on a single core.
This parallelization benefit is mostly realized in batch mode… most of interactive Stata is waiting for user input (or left idle), as CPU efficiency is typically < 5–10%.
• By default, R, Python, Perl, and MATLAB* are not multithreaded… so do not ask for or try to use more than 1 core/CPU!!
• For R, you can use appropriate routines with Rparallel
  • Now part of base-R
  • Includes Rforeach, RdoMC, or Rsnow
  • multicore base enables parallelization through the apply() functions, but will not work on Windows systems due to how parallelization is achieved (no fork())
# R example (parallel.R)
library(parallel)   # mclapply() comes from base-R's parallel package
# seq_len() needs the number of tasks (10 here is illustrative), and the
# core count read from the environment must be cast to numeric
mclapply(seq_len(10), run2, mc.cores = as.numeric(Sys.getenv('LSB_MAX_NUM_PROCESSORS')))

bsub -q normal -n 4 -app R-5g R CMD BATCH parallel.R   # custom submission command
Parallel Processing in R
See more info on our website at http://grid.rcs.hbs.org/parallel-r
# library(parallel): snow single-node parallel cluster
library(parallel)
# wraps the makeSOCKcluster() function and launches the specified number
# of R processes on the local machine (the core count must be numeric)
cluster <- makeCluster(as.numeric(Sys.getenv('LSB_MAX_NUM_PROCESSORS')))
# one must explicitly make vars/functions available in the sub-processes
clusterExport(cluster, list('myProc'))
# now run, print the results, and shut down the workers
result <- clusterApply(cluster, 1:10, function(i) myProc())
result
stopCluster(cluster)

bsub -q normal -n 4 -app R-5g R CMD BATCH parallel_snow.R   # custom submission command
# library(parallel): foreach + multicore
library(foreach)
library(doMC)

# the core count read from the environment must be cast to numeric
registerDoMC(cores = as.numeric(Sys.getenv('LSB_MAX_NUM_PROCESSORS')))
result <- foreach(i = 1:10, .combine = c) %dopar% {
    myProc()
}
result

bsub -q normal -n 4 -app R-5g R CMD BATCH parallel_foreach.R   # custom submission command
Parallel Processing in R
See more info on our website at http://grid.rcs.hbs.org/parallel-r
By default, R, Python, Perl, and MATLAB* are not multithreaded… so do not ask for or try to use more than 1 core/CPU!!
• Python has the 'multiprocessing' module
  • Evolved from the threading module
  • Uses subprocesses, instead of threads, to bypass Python's Global Interpreter Lock
  • Rich subclasses, including Pool, which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism)
  • Runs on both Unix & Windows systems
'Multiprocessing' in Python
import multiprocessing, os

def worker(num):
    """thread worker function"""
    print('Worker: {}'.format(num))
    return

if __name__ == '__main__':
    jobs = []
    # environment variables are strings, so cast the core count to int
    cores = int(os.environ['LSB_MAX_NUM_PROCESSORS'])
    for i in range(cores):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
$ python multiprocessing_simpleargs.py
Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
bsub -q normal -n 5 -W 1:00 -R "rusage[mem=1000]" python parallel_workers.py
'Multiprocessing' in Python
See more info on our website at http://grid.rcs.hbs.org/parallel-processing
By default, R, Python, Perl, and MATLAB* are not multithreaded… so do not ask for or try to use more than 1 core/CPU!!
• MATLAB has parpool, part of the Parallel Computing Toolbox (PCT), standard on all installations
  • This operates on a single machine!
  • On some systems (e.g. FASRC's Odyssey), workers can be spawned across multiple machines for large-scale work via DCS, the Distributed Computing Server
# MATLAB example (parallel.m)
hPar = parpool( 'local', str2num( getenv('LSB_MAX_NUM_PROCESSORS') ) );

R = 1; darts = 1e7; count = 0;   % Prepare settings
parfor i = 1:darts
    x = R * rand(1);
    y = R * rand(1);
    if x^2 + y^2 <= R^2
        count = count + 1;       % parfor reduction variable
    end
end
myPI = 4 * count / darts;

% Log results & close down parallel pool (hLog: a file handle opened earlier)
fprintf( hLog, 'The computed value of pi is %2.7f\n', myPI );
delete(gcp);

matlab-5g -n4 par_compute_pi.m                                              # command-line wrapper
bsub -n 4 -q normal -W 2:00 -R "rusage[mem=1000]" matlab par_compute_pi.m   # custom submission
Multicore Options in MATLAB
See more info on our website at http://grid.rcs.hbs.org/parallel-processing
Other Important Points & Troubleshooting
Choosing a core count can be difficult, especially if there's a mix of serial and parallel steps…
• Think about how long your code will be in either mode
• Determine the fractional resource use across the whole job
• If < 20% of it is in multicore use, then split up the tasks into two separate jobs
• Can use job dependencies to make submission easier (see the sketch below)
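On LSF, job dependencies are expressed with bsub -w. A minimal sketch of splitting a mixed workflow into a serial prep job and a multicore job that is held until the prep finishes (job names and scripts are illustrative):

# Serial prep step on 1 core
bsub -J prep -q normal -n 1 -W 4:00 -R "rusage[mem=1000]" ./prepare_inputs.sh

# Multicore step, held until the prep job completes successfully
bsub -J analyze -w "done(prep)" -q normal -n 8 -W 12:00 -R "rusage[mem=4000]" \
    ./run_parallel_analysis.sh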
[Diagram: serial portion + multicore portion combined: OK, can run as one long job]
Mixed Multicore and Serial Workflows
Not all programs can be scaled well. This is due to:
• Overhead of program start
• Overhead of communication between processes (threads) within the program
• (Worse:) waiting to write to the network or disk (I/O)
• Other, serial parts of the program (parts that cannot be parallelized)
Scaling tests are important to help you determine the optimal # of cores to use!!
Scaling Tests Ensure Efficiency
# Create a SLURM script for an analysis that can be used for multiple CPU (core) values.
# Input seqs.fa file has 350 FASTA sequences so we can get good parallelization values:

--- file: blast_scale_test.slurm ---
#!/bin/bash
#
#SBATCH -p serial_requeue      # Partition to submit to (comma separated)
#SBATCH -J blastx              # Job name
#SBATCH -N 1                   # Ensure that all cores are on one machine
#SBATCH -t 0-4:00              # Runtime in D-HH:MM (or use minutes)
#SBATCH --mem 10000            # Memory pool in MB for all cores
#SBATCH --mail-type=END,FAIL   # Type of email notification: BEGIN,END,FAIL,ALL

source new-modules.sh; module load ncbi-blast/2.2.31+-fasrc01
export BLASTDB=/n/regal/informatics_public/

# note: blastx takes its input via -query
blastx -query seqs.fa -db $BLASTDB/custom/other/model_chordate_proteins \
    -out sk_shuffle_seqs.n${1}.modelchordate.blastx -num_threads $1
-----
# and now submit the file multiple times with different core values
for i in 1 2 4 8 16; do
    echo $i
    # sbatch flags here will override those in the SLURM submission script
    sbatch -n $i -J blastx$i -o blastx_n$i.out -e blastx_n$i.err blast_scale_test.slurm $i
    sleep 1
done
Your Own Scaling Tests!
[bfreeman@rclogin04 ~]$ sacct -u bfreeman -S 4/6/16 --format=jobid,\
elapsed,alloccpus,cputime,totalcpu,state

       JobID    Elapsed  AllocCPUS    CPUTime   TotalCPU      State
------------ ---------- ---------- ---------- ---------- ----------
59817008       16:12:26          1   16:12:26   16:03:34  COMPLETED
59817008.ba+   16:12:26          1   16:12:26   16:03:34  COMPLETED
59817024       10:49:16          2   21:38:32   17:53:07  COMPLETED
59817024.ba+   10:49:16          2   21:38:32   17:53:07  COMPLETED
59817026       06:03:38          4 1-00:14:32   15:56:55  COMPLETED
59817026.ba+   06:03:38          4 1-00:14:32   15:56:55  COMPLETED
59817028       04:55:44          8 1-15:25:52   21:27:30  COMPLETED
59817028.ba+   04:55:44          8 1-15:25:52   21:27:30  COMPLETED
59817043       03:01:51         16 2-00:29:36 1-01:33:03  COMPLETED
59817043.ba+   03:01:51         16 2-00:29:36 1-01:33:03  COMPLETED
59847485       02:04:58         32 2-18:38:56 1-11:42:36  COMPLETED
59847485.ba+   02:04:58         32 2-18:38:56 1-11:42:36  COMPLETED
Cores        1          2          4          8           16          32
Elapsed      16:12:26   10:49:16   6:03:38    4:55:44     3:01:51     2:04:58
  Ideal      16:12:26   8:06:13    4:03:07    2:01:33     1:00:47     0:30:23
CPUTime      16:12:26   21:38:32   24:14:32   39:25:52    48:29:36    66:38:56
  Ideal      16:12:26   16:12:26   16:12:26   16:12:26    16:12:26    16:12:26
  NoGain     16:12:26   32:24:52   64:49:44   129:39:28   259:18:56   518:37:52
TotalCPU     16:03:34   17:53:07   15:56:55   21:27:30    25:33:03    35:42:36
  Ideal      16:03:34   16:03:34   16:03:34   16:03:34    16:03:34    16:03:34
  NoGain     16:03:34   32:07:08   64:14:16   128:28:32   256:57:04   513:54:08
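To turn elapsed times into speedup (T1/Tn) and parallel efficiency (T1/(n*Tn)), a small shell sketch over the values above:

# Convert H:MM:SS elapsed times into speedup and efficiency relative
# to the 1-core run
t1=$(echo "16:12:26" | awk -F: '{print $1*3600 + $2*60 + $3}')
for pair in "2 10:49:16" "4 6:03:38" "8 4:55:44" "16 3:01:51" "32 2:04:58"; do
    set -- $pair
    tn=$(echo "$2" | awk -F: '{print $1*3600 + $2*60 + $3}')
    awk -v n="$1" -v t1="$t1" -v tn="$tn" \
        'BEGIN { printf "%2d cores: speedup %.2fx, efficiency %.0f%%\n", n, t1/tn, 100*t1/(n*tn) }'
done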
Your Own Scaling Test Results!
[Charts: Elapsed vs. Ideal, CPUTime vs. Ideal and NoGain, and TotalCPU vs. Ideal and NoGain, each plotted on a log time axis against core counts 1–32.]
Your Own Scaling Test Results!
RCS Website & Documentation -- the only authoritative source: https://grid.rcs.hbs.org/
Submit a help request to [email protected]
Best way to help us to help you? Give us...
• Description of the problem
• Additional info (login/batch? queue? JobIDs?)
• Steps to reproduce (1., 2., 3...)
• Actual results
• Expected results
Getting Help
• Please talk to your peers, and…
• We wish you success in your research!
• http://intranet.hbs.edu/dept/research/• https://grid.rcs.hbs.org/• https://training.rcs.hbs.org/
• @hbs_rcs
Research Computing Services