Lecture 17: Analytical Modeling of Parallel Programs: Scalability
CSCE 569 Parallel Computing
Department of Computer Science and Engineering
Yonghong Yan
http://cse.sc.edu/~yanyh
Topic Overview
• Introduction
• Performance Metrics for Parallel Systems
  – Execution Time, Overhead, Speedup, Efficiency, Cost
• Amdahl’s Law
• Scalability of Parallel Systems
  – Isoefficiency Metric of Scalability
• Minimum Execution Time and Minimum Cost-Optimal Execution Time
• Asymptotic Analysis of Parallel Programs
• Other Scalability Metrics
  – Scaled speedup, Serial fraction
Speedup and Efficiency
Amdahl’s Law Speedup
Scalability of Parallel Systems
• Scalability: the pattern of speedup
  – How the performance of a parallel application changes as the number of processors is increased
• Scaling: performance improves steadily
• Not scaling: performance does not improve or becomes worse
Scalability of Parallel Systems
• Two different types of scaling with regard to the problem size
  – Strong scaling
    • Total problem size stays the same as the number of processors increases
  – Weak scaling
    • The problem size increases at the same rate as the number of processors, keeping the amount of work per processor the same
• Strong scaling is generally more useful and more difficult to achieve than weak scaling
• http://www.mcs.anl.gov/~itf/dbpp/text/node30.html
• https://www.sharcnet.ca/help/index.php/Measuring_Parallel_Scaling_Performance
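The two kinds of scaling are usually measured with two different efficiency formulas. The sketch below is a minimal illustration, assuming the common definitions E = T1 / (p * Tp) for strong scaling and E = T1 / Tp for weak scaling; the timings are hypothetical numbers chosen for illustration, not measurements from the lecture.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Efficiency when the total problem size is fixed: E = T1 / (p * Tp)."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Efficiency when the work per processor is fixed: E = T1 / Tp."""
    return t1 / tp


# Hypothetical wall-clock times in seconds, keyed by processor count.
strong_times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
weak_times = {1: 100.0, 2: 102.0, 4: 105.0, 8: 110.0}

for p, tp in strong_times.items():
    e = strong_scaling_efficiency(strong_times[1], tp, p)
    print(f"strong p={p}: speedup={strong_times[1] / tp:.2f} efficiency={e:.2f}")

for p, tp in weak_times.items():
    e = weak_scaling_efficiency(weak_times[1], tp)
    print(f"weak   p={p}: efficiency={e:.2f}")
```

In the strong-scaling run the time must keep dropping to hold efficiency; in the weak-scaling run it only has to stay flat, which is one reason strong scaling is harder to achieve.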
Strong Scaling
Weak Scaling of Parallel Systems
Extrapolate performance
• From small problems and small systems → larger problems on larger configurations
3 parallel algorithms for computing an n-point FFT on 64 PEs
Inferences from small datasets or small machines can be misleading
Scaling Characteristics of Parallel Programs: Increase problem size
• Efficiency: $E = \frac{S}{p} = \frac{T_S}{p T_P}$
• Parallel overhead: $T_o = p T_P - T_S$, so $E = \frac{T_S}{T_S + T_o}$
  – Overhead increases as p increases
• Problem size:
  – For a given problem size, $T_S$ remains constant
• Efficiency increases if
  – the problem size ($T_S$) increases while
  – the number of PEs is kept constant
Example: Adding n Numbers on p PEs
• Addition = 1 time unit; communication = 1 time unit
Speedup tends to saturate and efficiency drops
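The saturation can be reproduced numerically. The sketch below assumes the parallel time T_P = n/p + 2 log2 p (n/p local additions, then log p steps of one addition and one communication each) and approximates the serial time as T_S = n; both are modeling assumptions consistent with the unit costs above, and the chosen values of n and p are illustrative.

```python
import math


def parallel_time(n, p):
    """T_P = n/p + 2*log2(p): local additions plus log p add/communicate steps."""
    return n / p + 2 * math.log2(p)


def speedup(n, p):
    """S = T_S / T_P, with T_S approximated as n for the serial sum."""
    return n / parallel_time(n, p)


def efficiency(n, p):
    """E = S / p."""
    return speedup(n, p) / p


for n in (64, 192, 320, 512):
    row = ", ".join(
        f"p={p}: S={speedup(n, p):.2f} E={efficiency(n, p):.2f}"
        for p in (1, 4, 8, 16, 32)
    )
    print(f"n={n}: {row}")
```

For a fixed n, each row shows efficiency falling as p grows; reading down a column shows efficiency recovering as n grows with p fixed.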
Scaling Characteristics of Parallel Programs: Increase problem size and increase # PEs
• Overhead $T_o = f(T_S, p)$, i.e., a function of the problem size and p
  – In many cases, $T_o$ grows sublinearly with respect to $T_S$
• Efficiency:
  – Decreases as we increase p (because $T_o$ grows)
  – Increases as we increase the problem size ($T_S$)
• Keep efficiency constant by
  – increasing the problem size and
  – proportionally increasing the number of PEs
• Such systems are scalable parallel systems
Isoefficiency Metric of Scalability
Rate at which the problem size ($T_S$) must increase per additional PE (which increases $T_o$) to keep the efficiency fixed
• This rate determines the scalability of the system
  – The slower this rate, the better the scalability
  – Rate == 0: strong scaling
    • The same problem (same size) scales when increasing the number of PEs
• To formalize this rate, we define
  – The problem size W = the asymptotic number of operations associated with the best serial algorithm to solve the problem
    • i.e., the serial execution time, $T_S$
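Isoefficiency Metric of Scalability
• Parallel overhead: $T_o(W, p)$; again, $W \approx T_S$
• Parallel execution time: $T_P = \frac{W + T_o(W, p)}{p}$
• Speedup: $S = \frac{W}{T_P} = \frac{W p}{W + T_o(W, p)}$
• Efficiency: $E = \frac{S}{p} = \frac{W}{W + T_o(W, p)} = \frac{1}{1 + T_o(W, p) / W}$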
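Isoefficiency Metric of Scalability
• To maintain constant efficiency (between 0 and 1), the ratio $T_o / W$ should be maintained at a constant value:
  $\frac{T_o(W, p)}{W} = \frac{1 - E}{E}$, i.e., $W = \frac{E}{1 - E} T_o(W, p) = K T_o(W, p)$
• $K = E / (1 - E)$ is a constant related to the desired efficiency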
Isoefficiency Metric of Scalability
$W = \Phi(p)$ such that efficiency is constant
• $W = \Phi(p)$ is called the isoefficiency function
  – Read as: what problem size is needed with p PEs to maintain constant efficiency?
  – $W_{p+1} - W_p = \Phi(p+1) - \Phi(p)$
    • To maintain constant efficiency, how much must the problem size increase when one more PE is added?
• The isoefficiency function determines the ease
  – with which a parallel system maintains constant efficiency
  – and hence achieves speedups increasing in proportion to the number of PEs
Isoefficiency Example 1
Adding n numbers using p PEs
• Parallel overhead: $T_o = 2 p \log p$
• $W = K T_o(W, p)$; substituting $T_o$:
  – $W = 2 K p \log p$
    • $2 K p \log p$ is the isoefficiency function
• The asymptotic isoefficiency function for this parallel system is $\Theta(p \log p)$
• To have the same efficiency on p’ processors as on p
  – the problem size n must increase by a factor of $(p' \log p') / (p \log p)$ when increasing the number of PEs from p to p’
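The claim that efficiency stays fixed along W = 2 K p log p can be checked directly. The sketch below is a small illustration, assuming base-2 logarithms and a target efficiency of 0.8 (an arbitrary choice, which gives K = 4); the function names are illustrative.

```python
import math


def overhead(p):
    """T_o = 2 p log2(p) for adding n numbers on p PEs."""
    return 2 * p * math.log2(p)


def isoefficiency_size(p, target_e):
    """Problem size W = K * T_o with K = E / (1 - E)."""
    k = target_e / (1 - target_e)
    return k * overhead(p)


def efficiency(w, p):
    """E = W / (W + T_o)."""
    return w / (w + overhead(p))


for p in (4, 8, 16, 32, 64):
    w = isoefficiency_size(p, 0.8)
    print(f"p={p}: W={w:.0f} E={efficiency(w, p):.2f}")
```

Every row reports the same efficiency, while W grows as p log p rather than linearly in p.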
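Examples
• The problem size must grow by a factor of $(p' \log p') / (p \log p)$
• If p = 8 and p’ = 16 (base-2 logarithms)
  • $16 \log 16 / (8 \log 8) = (16 \cdot 4) / (8 \cdot 3) = 8/3 \approx 2.67$
• 10M on 8 cores
• 10 × 2.67M ≈ 26.7M on 16 cores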
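The same arithmetic as a small helper, assuming base-2 logarithms and p > 1; the function name is illustrative.

```python
import math


def growth_factor(p_old, p_new):
    """Factor by which n must grow: (p' log2 p') / (p log2 p), for p > 1."""
    return (p_new * math.log2(p_new)) / (p_old * math.log2(p_old))


factor = growth_factor(8, 16)
print(f"8 -> 16 PEs: grow n by {factor:.2f}x")
print(f"10M numbers on 8 cores -> {10 * factor:.1f}M on 16 cores")
```

Doubling the PEs therefore needs more than double the data, because the log p term in the overhead grows as well.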
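Cost-Optimality and Isoefficiency
• A parallel system is cost-optimal if and only if $p T_P = \Theta(W)$
  – Parallel cost == total work (asymptotically)
    • Efficiency = $\Theta(1)$
• From this, we have: $W + T_o(W, p) = \Theta(W)$, so $T_o(W, p) = O(W)$ and $W = \Omega(T_o(W, p))$
  – i.e., work dominates overhead
• If we have an isoefficiency function f(p)
  – The relation $W = \Omega(f(p))$ must be satisfied to ensure the cost-optimality of a parallel system as it is scaled up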
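For the running example of adding n numbers, the condition can be seen in the ratio of parallel cost to serial work. The sketch below assumes T_P = n/p + 2 log2 p and W = n (so the cost is n + 2 p log2 p); the value n = 1024 for the fixed-size case is an arbitrary choice.

```python
import math


def cost_ratio(n, p):
    """p*T_P / W for adding n numbers: (n + 2 p log2 p) / n."""
    return (n + 2 * p * math.log2(p)) / n


for p in (4, 16, 64, 256, 1024):
    fixed = cost_ratio(1024, p)                # W held fixed
    scaled = cost_ratio(p * math.log2(p), p)   # W grows as p log p
    print(f"p={p}: fixed-n ratio={fixed:.2f} scaled-n ratio={scaled:.2f}")
```

With n fixed the ratio keeps growing, so the system stops being cost-optimal; with n growing as p log p the ratio stays bounded by a constant.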
Minimum Execution Time
• Often, we are interested in the minimum time to solution
• To determine the minimum execution time $T_P^{min}$ for a given W
  – Differentiate the expression for $T_P$ with respect to p and equate it to 0: $\frac{d}{dp} T_P = 0$
• If $p_0$ is the value of p determined by this equation
  – $T_P(p_0)$ is the minimum parallel time
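Minimum Execution Time: Example
Adding n numbers
• Parallel execution time: $T_P = \frac{n}{p} + 2 \log p$
• Compute the derivative: $\frac{dT_P}{dp} = -\frac{n}{p^2} + \frac{2}{p}$ (the base of the logarithm only affects constant factors)
• Set the derivative = 0, solve for p: $p = \frac{n}{2}$
• The corresponding execution time: $T_P^{min} = 2 \log n$
Note that at this point, the formulation is not cost-optimal.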
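A brute-force check of the minimum, as a sketch. It assumes T_P = n/p + 2 ln p with the natural logarithm, so that the derivative is exactly -n/p^2 + 2/p, and n = 1024 is an arbitrary choice.

```python
import math


def parallel_time(n, p):
    """T_P = n/p + 2 ln(p); the natural log is an assumption made here."""
    return n / p + 2 * math.log(p)


def best_p(n):
    """Scan integer p in [1, n] for the smallest T_P."""
    return min(range(1, n + 1), key=lambda p: parallel_time(n, p))


n = 1024
p0 = best_p(n)
print(f"n={n}: best p={p0} (n/2={n // 2}), T_P={parallel_time(n, p0):.3f}")
print(f"cost p*T_P={p0 * parallel_time(n, p0):.0f} vs serial work W={n}")
```

The scan lands on p = n/2, and the cost at that point is several times the serial work, which is the sense in which the fastest configuration is not cost-optimal.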
Asymptotic Analysis of Parallel Programs
Sorting a list of n numbers.
• The fastest serial programs: $\Theta(n \log n)$.
• Four parallel algorithms: A1, A2, A3, and A4
Asymptotic Analysis of Parallel Programs
Sorting a list of n numbers.
• If the metric is speed ($T_P$), algorithm A1 is the best, followed by A3, A4, and A2
• In terms of efficiency (E), A2 and A4 are the best, followed by A3 and A1.
• In terms of cost ($p T_P$), algorithms A2 and A4 are cost-optimal; A1 and A3 are not.
• It is important to identify the analysis objectives and to use appropriate metrics!
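The three rankings can be reproduced from processor counts and run times alone. The sketch below assumes the values given for these four algorithms in Chapter 5 of Grama et al. (A1: p = n^2, T_P = 1; A2: p = log n, T_P = n; A3: p = n, T_P = sqrt(n); A4: p = sqrt(n), T_P = sqrt(n) log n), with constants dropped and base-2 logarithms; treat these entries as an assumption, since they are not spelled out in the text above.

```python
import math

# Assumed (p, T_P) per algorithm, as functions of n; constants are dropped.
ALGORITHMS = {
    "A1": (lambda n: float(n) ** 2, lambda n: 1.0),
    "A2": (lambda n: math.log2(n), lambda n: float(n)),
    "A3": (lambda n: float(n), lambda n: math.sqrt(n)),
    "A4": (lambda n: math.sqrt(n), lambda n: math.sqrt(n) * math.log2(n)),
}


def metrics(n):
    """Return {name: (T_P, efficiency, cost)} with serial time n*log2(n)."""
    serial = n * math.log2(n)
    result = {}
    for name, (p_of, tp_of) in ALGORITHMS.items():
        p, tp = p_of(n), tp_of(n)
        cost = p * tp
        result[name] = (tp, serial / cost, cost)
    return result


m = metrics(2 ** 16)
by_speed = sorted(m, key=lambda a: m[a][0])
print("fastest first:", by_speed)
for name, (tp, e, cost) in m.items():
    print(f"{name}: T_P={tp:.0f} E={e:.4f} cost={cost:.0f}")
```

The fastest algorithm (A1) has the worst efficiency, and the two cost-optimal ones (A2 and A4) are the two slowest, which is why the choice of metric matters.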
Scaled Speedup: Example
n × n matrix multiplication
• The serial execution time: $t_c n^3$.
• The parallel execution time:
• Speedup:
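A sketch of the two expressions, assuming the simple parallel matrix-multiplication formulation analyzed in Chapter 5 of Grama et al., where $t_s$ is the message startup time and $t_w$ the per-word transfer time (both symbols are assumptions not defined in these slides):

```latex
T_P = t_c \frac{n^3}{p} + t_s \log p + 2 t_w \frac{n^2}{\sqrt{p}}
\qquad
S = \frac{t_c n^3}{t_c \frac{n^3}{p} + t_s \log p + 2 t_w \frac{n^2}{\sqrt{p}}}
```

The first term of $T_P$ is the computation shared among the p PEs; the other two are communication overhead.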
Scaled Speedup: Example (continued)
Consider memory-constrained scaled speedup.
• We have memory complexity $m = \Theta(n^2) = \Theta(p)$, or $n^2 = c \times p$.
• At this growth rate, scaled speedup S’ is given by:
• Note that this is scalable.
Scaled Speedup: Example (continued)
Consider time-constrained scaled speedup.
• We have $T_P = O(1) = O(n^3 / p)$, or $n^3 = c \times p$.
• Time-constrained speedup S’’ is given by:
• Memory-constrained scaling yields better performance.
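The two scaling rules can be compared numerically. The sketch below assumes the parallel time model T_P = t_c n^3/p + t_s log2 p + 2 t_w n^2/sqrt(p) from Chapter 5 of Grama et al. for a simple parallel matrix multiplication; the machine constants and the scaling constant c = 64 are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical machine constants: compute, message startup, per-word transfer.
T_COMPUTE, T_STARTUP, T_WORD = 1.0, 10.0, 1.0


def speedup(n, p):
    """S = t_c n^3 / (t_c n^3/p + t_s log2 p + 2 t_w n^2/sqrt(p)) (assumed model)."""
    serial = T_COMPUTE * n ** 3
    parallel = (serial / p
                + T_STARTUP * math.log2(p)
                + 2 * T_WORD * n ** 2 / math.sqrt(p))
    return serial / parallel


def memory_constrained_n(p, c=64.0):
    """Matrix dimension when memory grows with p: n^2 = c * p."""
    return math.sqrt(c * p)


def time_constrained_n(p, c=64.0):
    """Matrix dimension when run time is held constant: n^3 = c * p."""
    return (c * p) ** (1.0 / 3.0)


for p in (16, 64, 256, 1024):
    s_mem = speedup(memory_constrained_n(p), p)
    s_time = speedup(time_constrained_n(p), p)
    print(f"p={p}: memory-constrained S'={s_mem:.1f} time-constrained S''={s_time:.1f}")
```

Under these assumptions the memory-constrained problem grows faster with p, so computation dominates communication more strongly and the scaled speedup is higher.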
References
• Adapted from slides “Principles of Parallel Algorithm Design” by Ananth Grama
• “Analytical Modeling of Parallel Systems”, Chapter 5 in Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, “Introduction to Parallel Computing”, Addison Wesley, 2003.
• Grama, Ananth Y.; Gupta, A.; Kumar, V., “Isoefficiency: measuring the scalability of parallel algorithms and architectures,” in Parallel & Distributed Technology: Systems & Applications, IEEE, vol. 1, no. 3, pp. 12-21, Aug. 1993, doi: 10.1109/88.242438, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=242438&isnumber=6234