topic 2b basic back-end optimization
Post on 20-Jan-2016
30 views
Embed Size (px)
DESCRIPTION
Topic 2b Basic Back-End Optimization. Register allocation. Slides: Topic 3a Dragon book: chapter 10 S. Cooper: Chapter 13 Other papers as assigned in class or homework. Reading List. Focus of This Topic. We focus on “scalar register allocation” - PowerPoint PPT PresentationTRANSCRIPT
*\course\cpeg421-10F\Topic-2b.ppt*Topic 2b Basic Back-End OptimizationRegister allocation
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Reading ListSlides: Topic 3aDragon book: chapter 10 S. Cooper: Chapter 13Other papers as assigned in class or homework
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Focus of This TopicWe focus on scalar register allocationLocal register is straightforward (read Coopers Section 13.3)This global register allocation problem is essentially solved by graph coloring techniques:Chaitin et. al. 1981, 82 (IBM)Chow, Hennesy 1983 (Stanford)Briggs, Kennedy 1992 (Rice)Register allocation for array variables in loops -- subject not discussed here
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Interprocedural Analysis and OptimizationLoop Nest Optimization and ParallelizationGlobal OptimizationCode GenerationFront endGood IRHigh-Level Compiler Infrastructure Needed A Modern View
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Good IPOGood LNOGood global optimizationGood integration of IPO/LNO/OPTSmooth information passing between FE and CGComplete and flexible support of inner-loop scheduling (SWP), instruction scheduling and register allocation Inter-ProceduralOptimization (IPO)Loop NestOptimization (LNO)Global Optimization(OPT)SourceInnermostLoopschedulingGlobal instschedulingReg allocLocal instschedulingExecutableArchModelsCGMEGeneral Compiler Framework
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*A Map of Modern Compiler PlatformsGNU CompilersIMPACT CompilerCydra VLIWCompilerMultiflow VLIW CompilerUcode CompilerChow/HennessyHP ResearchCompiler SGI Pro Compiler - Designed for ILP/MP - Production quality - Open SourceTrimaranCompilerSUIF CompilerLLVM CompilerOpen64 Compiler (PathScale, ORC, Osprey)
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Osprey Compiler Performance (4/3/07)GCC4.3 at O3 With additional options recommended by GCC developersTwo programs has runtime error using additional optionsOsprey3.1 with vanilla O3The performance delta is ~10%, excluding two failing programsCurtesy: S.M. Liu (HP)
\course\cpeg421-10F\Topic-2b.ppt
Chart1
900705
1016954
1132176.gcc
10861090
936945
804815
1074907
883871
641571
1223255.vortex
951800
10831097
923954
26291996
397375
1144873
953.7900062392868.6840686344
opencc3.1 -O3
gcc4.3 -O3 -funroll-loops -fgcse-las -fgcse-sm
SPEC2000 C/C++ Benchmark ComparisonMontecito 1.6GHz 4G Memory (higher is better)
Sheet1
opencc O3gcc4.3 O3 -funroll-loops -fgcse-las -fgcse-smopencc -O3/gcc -O3C/C++
int:timeratiotimeratioopencc3.1 -O3gcc4.3 -O3 -funroll-loops -fgcse-las -fgcse-sm
164.gzip1569001997051.27659574471.2765957447164.gzip900705
175.vpr13810161479541.06498951781.0649895178175.vpr1016954
176.gcc97.21132RE176.gcc1132
181.mcf166108616510900.99633027520.9963302752181.mcf10861090
186.crafty1079361069450.99047619050.9904761905186.crafty936945
197.parser2248042218150.98650306750.9865030675197.parser804815
252.eon12110741439071.1841234841.184123484252.eon1074907
253.perlbmk2048832078711.01377726751.0137772675253.perlbmk883871
254.gap1726411935711.1225919441.122591944254.gap641571
255.vortex1551223255.vortex1223
256.bzip21589511888001.188751.18875256.bzip2951800
300.twolf277108327310970.98723792160.9872379216300.twolf10831097
9568611.0766680749177.mesa923954
fp:179.art26291996
168.wupwise2666023254931.2210953347183.equake397375
171.swim12025915225944.361952862188.ammp1144873
172.mgrid12014976372835.2897526502Geomean953.7900062392868.6840686344
173.applu17811814244952.3858585859979.1533788234868.6840686344
177.mesa1529231479540.96750524110.96750524111.0979711044
178.galgel58.249851.1271685693
179.art98.9262913019961.31713426851.3171342685
183.equake3283973473751.05866666671.0586666667
187.facerec4364364344380.99543379
188.ammp19211442528731.31042382591.3104238259
189.lucas14014332557851.825477707
191.fma3d6533227772701.1925925926
200.sixtrack1616825551983.4444444444
301.apsi24510593926631.5972850679
geomean1050537.38058869481.74747655011.0979711044
spec2006
int:
400.perlbench15416.3416166.041.0496688742
401.bzip213207.3113886.951.0517985612
403.gcc12156.6311676.90.9608695652
429.mcf20534.4421494.241.0471698113
445.gobmk13467.7912568.360.9318181818
456.hmmer10738.6912377.541.1525198939
458.sjeng16637.2816377.390.9851150203
462.libquantum68573.0266963.090.9773462783
464.h264ref154714.3152114.60.9794520548
471.omnetpp15064.1513834.520.9181415929
473.astar8688.0910606.621.2220543807
483.xalancbmk16784.1115674.40.9340909091
6.31293179586.22684801641.0138246155
fp:
410.bwavesRE30544.45
416.gamess27767.0530826.35
433.milc25413.6126073.521.0255681818
434.zeusmp18045.0432672.79
435.gromacs8688.2219643.64
436.cactusADM32993.6235513.37
437.leslie3d1045942192.23
444.namd9198.7212966.191.408723748
447.dealII16836.821115.421.2546125461
450.soplex18714.4618834.431.006772009
453.povray6837.797826.81.1455882353
454.calculix20873.9522343.69
459.GemsFDTD21994.8242152.52
465.tonto10709.223324.22
470.lbm8571619607.012.2824536377
481.wrfRECE
482.sphinx3144813.521149.221.464208243
1.1173802018
Sheet2
Sheet3
*\course\cpeg421-10F\Topic-2b.ppt*Vision and Status of Open64 Today ?People should view it as GCC with an alternative backend with great potential to reclaim the best compiler in the worldThe technology incorporated all top compiler optimization research in 90'sIt has regain momentum in the last three years due to Pathscale, HP and AMDs (and others) investment in robustness and performanceTargeted to x86, Itanium in the public repository, ARM, MIPS, PowerPC, and several other signal processing CPU in private branches
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Register AllocationMotivationLive ranges and interference graphsProblem formulationSolution methods
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Motivation Registers much faster than memory Limited number of physical registers Keep values in registers as long as possible (minimize number of load/stores executed)
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Goals of Optimized Register Allocation1 Pay careful attention to allocating registers to variables that are more profitable to reside in registers2 Use the same register for multiple variables when legal to do so
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Brief History of Register AllocationChaitin: Coloring Heuristic. Use the simple stack heuristic forACM register allocation. Spill/no-spillSIGPLAN decisions are made during theNotices stack construction phase of the1982 algorithmBriggs: Finds out that Chaitins algorithmPLDI spills even when there are available1989 registers. Solution: the optimistic approach: may-spill during stack construction, decide at spilling time.
\course\cpeg421-10F\Topic-2b.ppt
*\course\cpeg421-10F\Topic-2b.ppt*Brief History of Register Allocation (Cont)Callahan: Hierarchical Coloring Graph,PLDI register preference,1991 profitability of spilling.Chow-Hennessy: Priority-based coloring.SIGPLAN Integrate spilling decisions in the1984 coloring decisions: spill a variableASPLOS for a limited life range.1990 Favor dense over sparse use regions. Consider parameter passing convention.
\course\cpeg421-10F\Topic-2b.ppt
- *\course\cpeg421-10F\Topic-2b.ppt*Assigning Registers to more Profitable Variables (example)c = S ;sum = 0 ;i = 1 ;while ( i
- *\course\cpeg421-10F\Topic-2b.ppt*[105] sum := sum + i[106]i := i + 1[107]goto L1[100] c = S[101]sum := 0[102]i := 1[103]label L1:[104]if i > 100 goto L2[108]label L2:[109]square = sum * sum[110]print c, sum, squareThe Control Flow Graph of the Examplec = S ;sum = 0 ;i = 1 ;while ( i
- *\course\cpeg421-10F\Topic-2b.ppt*Desired Register Allocation for ExampleAssume that there are only two non-reserved registers available for allocation ($t2 and $t3). A desired register allocation for the above example is as follows:VariableRegistercno registersum$t2i$t3square$t3c = S ;sum = 0 ;i = 1 ;while ( i
*\course\cpeg421-10F\Topic-2b.ppt*1. Pay careful attention to assigning registers to variables that are more profitableThe number of defs (writes) and uses (reads) to the variables in this sample program is as follows:
Register Allocation GoalsVariable#defs#uses c 1 1 sum101103 i101301 square 1 1 variables sum and i should get priority over variable c for register assignment.
c = S ;sum = 0 ;i = 1 ;while ( i
*\course\cpeg421-10F\Topic-2b.ppt*2. Use the same register for multiple variables when legal to do so
Reuse same register ($t3) for variables I and square since there is no point in the program where both variables are simultaneously live.Register Allocation GoalsVariableRegistercno registersum$t2i$t3square$t3c = S ;sum = 0 ;i = 1 ;while ( i
*\course\cpeg421-10F\Topic-2b.ppt*
Register Allocation vs. Register AssignmentRegister Allocation determining which values should be kept i