Post on 20-Dec-2015
Seminar Series
Static and Dynamic Compiler Optimizations (6/28)
Speculative Compiler Optimizations (7/05)
ADORE: An Adaptive Object Code ReOptimization System (7/19)
Current Trends in CMP/CMT Processors (7/26)
Static and Dynamic Helper Thread Prefetching (8/02)
Dynamic Instrumentation/Translation (8/16)
Virtual Machine Technologies and their Emerging Applications (8/23)
Professional Background

CE BS and CE MS, NCTU
CS Ph.D., University of Wisconsin, Madison
Cray Research, 1987-1993
  Architect for Cray Y-MP, Cray C-90, FAST
  Compiler optimization for Cray X-MP, Y-MP, Cray-2, Cray-3
Hewlett Packard, 1993-1999
  Compiler technical lead for HP-7200, HP-8000, IA-64
  Lab technical lead for adaptive systems
University of Minnesota, 2000-now
  ADORE/Itanium and ADORE/Sparc systems
Sun Microsystems, 2004-2005
  Visiting professor
Static and Dynamic Compiler Optimizations

Wei Chung Hsu
6/28/2006
Optimization: a process of making something as effective as possible.

Compiler: a computer program that translates programs written in high-level languages into machine instructions.

Compiler Optimization: the phases of compilation that generate good code, using the target machine as efficiently as possible.
Background

Static Optimization: compile-time optimization; a one-time, fixed optimization that does not change after distribution.

Dynamic Optimization: optimization performed at program execution time, adaptive to the execution environment.
Background (cont.)
Redundancy elimination
  C = (A+B)*(A+B)  becomes  t = A+B; C = t*t;
Register allocation
  Keep frequently used data items in registers
Instruction scheduling
  To avoid pipeline bubbles
Cache prefetching
  To minimize cache miss penalties
Some Examples
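To make the redundancy-elimination example concrete, here is a minimal sketch, in Python, of local common-subexpression elimination over three-address code. The IR format and names (`eliminate_redundancy`, the `(dest, expr)` pairs) are invented for illustration; a real pass would also invalidate table entries when an operand is redefined.

```python
# Local redundancy elimination sketch: each right-hand-side expression is
# looked up in a table of available expressions; a repeated expression is
# replaced by the temp that already holds its value.
# (Kills on redefinition of A or B are ignored in this toy version.)

def eliminate_redundancy(code):
    """code: list of (dest, expr) pairs, expr a string like 'A+B'."""
    available = {}   # expr -> temp currently holding its value
    out = []
    for dest, expr in code:
        if expr in available:
            out.append((dest, available[expr]))  # reuse the prior result
        else:
            out.append((dest, expr))
            available[expr] = dest
    return out

# C = (A+B)*(A+B) lowered to three-address code:
before = [("t1", "A+B"), ("t2", "A+B"), ("C", "t1*t2")]
after = eliminate_redundancy(before)
# t2 now simply copies t1 instead of recomputing A+B
```

A follow-up copy-propagation pass would then rewrite `t1*t2` as `t1*t1` and delete the copy entirely.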
In the last 15 years, computer performance has increased by ~2000 times:
  Clock rate increased by ~100X (the number of transistors doubles every 18 months)
  Micro-architecture contributed ~5-10X
  Compiler optimization added ~2-3X for single processors

How Important Is Compiler Optimization?
Have you used compiler optimization lately?
Speed up from Compiler Optimization (SPEC95Int, running on HP PA-8000; speedup over the unoptimized baseline at each optimization level)

Benchmark      O1    O2    O3    O4    Peak
099.go         1.29  1.94  1.93  2.03  2.60
124.m88ksim    1.20  2.18  2.27  2.51  4.89
126.gcc        1.20  1.56  1.55  1.64  2.28
129.compress   1.51  1.89  2.34  2.26  3.82
130.li         1.38  1.78  1.97  2.07  4.14
132.ijpeg      1.24  3.43  3.56  3.62  4.14
134.perl       1.07  1.24  1.23  1.29  2.40
147.vortex     1.05  1.41  1.43  1.47  4.62
Average        1.24  1.93  2.03  2.11  3.61
Speed up from Compiler Optimization (SPEC95fp, running on HP PA-8000; speedup over the unoptimized baseline at each optimization level)

Benchmark      O1    O2     O3     O4     Peak
101.tomcatv    1.48  4.09   6.28   6.28   10.56
102.swim       1.62  4.47   4.47   4.48   12.01
103.su2cor     1.32  3.99   3.99   3.99   5.48
104.hydro2d    1.30  3.51   3.61   3.62   6.32
107.mgrid      5.59  37.39  38.66  38.80  77.02
110.applu      1.67  10.57  12.50  12.44  18.93
125.turb3d     1.12  7.95   8.03   8.03   9.04
141.apsi       1.47  6.03   6.52   6.52   10.77
145.fpppp      1.47  2.41   2.53   2.72   3.18
146.wave5      1.35  4.51   6.88   6.94   7.97
Average        1.84  8.49   9.35   9.38   16.13
Static compilation system

[Diagram: C, C++, and Fortran front ends produce a common Intermediate Language (IL, IR); an IL-to-IL Inter-Procedural Optimizer applies platform-neutral, machine-independent optimizations, aided by Profile-Directed Feedback gathered from sample input; an Optimizing Backend applies machine-dependent optimizations and emits Machine Code]
Criteria for optimizations

Must preserve the meaning of programs. Example:

  for (i = 0; i < N; i++) { A[i] += b[i] + c/N; }

Hoisting the invariant division out of the loop gives:

  T1 = c/N;
  for (i = 0; i < N; i++) { A[i] += b[i] + T1; }

What if N == 0? The original loop never evaluates c/N, but the transformed code always does, so it can trap on a division by zero. The transformation is unsafe. (X)
Example

  if (C > 0) { A += b[j] + d[j]; }

Hoisting the loads above the test gives:

  T1 = b[j];
  T2 = d[j];
  if (C > 0) { A += T1 + T2; }

What if loading b[j] raises an exception when C <= 0 (e.g., j is out of bounds)? The original code never touches b[j] in that case, so the transformation is unsafe. (X)
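The division-hoisting hazard above can be demonstrated directly. This Python sketch (function names and inputs are invented) shows the hoisted form trapping on an input the original handles fine:

```python
def original(A, b, c, N):
    # c/N is evaluated only if the loop body runs at least once
    for i in range(N):
        A[i] += b[i] + c / N
    return A

def hoisted(A, b, c, N):
    # unsafe transformation: c/N is evaluated even when N == 0
    t1 = c / N
    for i in range(N):
        A[i] += b[i] + t1
    return A

original([], [], 1.0, 0)        # fine: the loop body never runs
try:
    hoisted([], [], 1.0, 0)     # divides by zero before the loop
    trapped = False
except ZeroDivisionError:
    trapped = True
```

This is why production compilers either prove N > 0, guard the hoisted code with a runtime test, or use speculation support with recovery.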
Basic Concepts

Optimizations improve performance, but do not guarantee optimal performance.
Optimizations generally (or statistically) improve performance; they can also slow the code down. Examples: LICM, cache prefetching, procedure inlining.
Optimizations must be absolutely (not statistically!) correct, i.e., safe or conservative.
Some optimizations are more important than others in general-purpose compilers: loop optimizations, register allocation, instruction scheduling.
Optimization at different levels

Local (within a basic block)
Global (across basic blocks but within a procedure)
Inter-procedural
Cross-module (link time)
Post-link time (such as Spike/iSpike)
Runtime (as in dynamic compilation)
Tradeoffs in Optimizations

Space vs. speed
  Usually favors speed; however, on machines with small memory or I-cache, space is equally important
Compile time vs. execution time
  Usually favors execution time, but not necessarily true in recent years (e.g., JIT, large apps)
Absolutely robust vs. statistically robust
  Decrease the default optimization level in less important regions
Complexity vs. efficiency
  Select between complex-but-more-efficient and simple-but-less-efficient (easier to maintain) algorithms
Overview of Optimizations

Early optimizations
  scalar replacement, constant folding
  local/global value numbering
  local/global copy propagation
Redundancy elimination
  local/global CSE, PRE
  LICM
  code hoisting
Loop optimizations
  strength reduction
  induction variable removal
  unnecessary bounds-checking elimination
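Strength reduction, mentioned above, replaces an expensive operation in a loop with a cheaper one. A minimal Python sketch (the address computation and function names are invented): the per-iteration multiply `i * 8` becomes an induction variable bumped by 8.

```python
# Strength reduction sketch: the array-address computation base + i*8
# inside a loop is replaced by an induction variable incremented by 8.

def addresses_naive(base, n):
    # one multiply per iteration
    return [base + i * 8 for i in range(n)]

def addresses_reduced(base, n):
    out, addr = [], base      # addr is the new induction variable
    for _ in range(n):
        out.append(addr)
        addr += 8             # the multiply is replaced by an addition
    return out
```

After this transformation, the original index i may become dead, which is exactly what induction variable removal then cleans up.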
Overview of Optimizations

Procedure optimizations
  tail-recursion elimination, inline expansion, leaf-routine optimization, shrink wrapping, memoization
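Of the procedure optimizations listed, memoization is easy to sketch: cache a pure function's results so repeated calls with the same argument do the work only once. The counter is added here just to make the saving observable.

```python
# Memoization sketch: results of a pure function are cached by argument.
# `calls` counts how many times the body actually computes a value.

calls = 0
cache = {}

def fib(n):
    global calls
    if n in cache:
        return cache[n]     # hit: no recomputation
    calls += 1
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    cache[n] = result
    return result

result = fib(10)   # computes each of n = 0..10 exactly once
```

Without the cache, the naive recursion would evaluate `fib` 177 times for n = 10; with it, each subproblem is computed once.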
Register allocation
  graph coloring
Instruction scheduling
  local/global code scheduling
  software pipelining
  trace scheduling, superblock formation
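Graph-coloring register allocation, named above, can be sketched with a greedy heuristic: nodes are virtual registers, edges connect values live at the same time, and colors are physical registers. The interference graph, the color count `k`, and the degree-order heuristic are all illustrative assumptions; real allocators (Chaitin-Briggs) use simplify/spill worklists.

```python
# Greedy graph-coloring sketch for register allocation.
# interference: dict mapping each virtual register to the set of
# virtual registers it is simultaneously live with (symmetric).

def color(interference, k):
    colors = {}
    # heuristic: color the most-constrained (highest-degree) nodes first
    for node in sorted(interference, key=lambda n: -len(interference[n])):
        used = {colors[m] for m in interference[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        colors[node] = free[0] if free else None   # None means spill
    return colors

# a, b, c are pairwise live together; d overlaps only with a
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
alloc = color(graph, k=3)   # three physical registers suffice here
```

With only two colors (`k=2`) the same graph would force a spill, which is the point where a real allocator inserts spill code and retries.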
Overview of Optimizations

Memory hierarchy optimizations
  loop blocking, loop interchange
  memory padding, cache prefetching, data re-layout
Loop transformations
  reduction recognition, loop collapsing, loop reversal, strip mining, loop fusion, loop distribution
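Loop blocking (tiling), listed under memory hierarchy optimizations, can be sketched on a matrix transpose: the iteration space is walked in BxB tiles so both the source and destination arrays get cache-friendly locality. The tile size `B` is an invented tuning parameter; in practice it is chosen from the cache size.

```python
# Loop blocking sketch: transpose traversed in BxB tiles. The result is
# identical to an unblocked transpose; only the traversal order changes,
# which is what improves cache behavior on large matrices.

def transpose_blocked(M, B=2):
    n = len(M)
    T = [[0] * n for _ in range(n)]
    for ii in range(0, n, B):              # tile origin rows
        for jj in range(0, n, B):          # tile origin columns
            for i in range(ii, min(ii + B, n)):
                for j in range(jj, min(jj + B, n)):
                    T[j][i] = M[i][j]
    return T

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Since blocking only reorders independent iterations, it is always legal for a transpose; for general loop nests the dependence analysis above is what establishes legality.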
Peephole optimizations
Profile-guided optimizations
  code repositioning, I-cache prefetching, profile-guided inlining, RA, IS, …
Overview of Optimizations

More optimizations
  SIMD transformation, VLIW transformation
  Communication optimizations
(See David Bacon and Susan Graham's survey paper)
Optimization Evaluation

Is there a commonly accepted method?
  User's choice
  Benchmarks
    Livermore loops (14 kernels from scientific code)
    PERFECT club, SPLASH, NAS
    SPEC
Importance of Individual Optimizations

How much performance does an optimization contribute?
Is this optimization commonplace?
  Does it happen in only one particular instance?
  Does it happen in only one particular program?
  Does it happen for only one particular type of application?
How much difference does it make?
Does it enable other optimizations? (e.g., procedure integration, unrolling)
Ordering

Ordering is important; some dependences between optimizations exist:
  Procedure integration and loop unrolling usually enable other optimizations
  Loop transformations should be done before address linearization
There is no optimal ordering:
  Some optimizations should be applied multiple times (e.g., copy propagation, DCE)
  Some recent research advocates exhaustive search with intelligent pruning
Example Organization

[Diagram: IR → Control Flow Analysis (build the flow graph, identify loops) → Data Flow Analysis (reaching definitions, def-use chains) → Transformations (global CSE, copy propagation, code motion)]
Loops in Flow Graph

Dominators
  A node d of a flow graph dominates node n, written d dom n, if every path from the initial node of the flow graph to n goes through d.
Example (a flow graph with nodes 1-7):
  1 dom all nodes
  3 dom 4, 5, 6, 7
  4 dom 5, 6, 7
Loops in Flow Graph (cont.)

Natural loops
  1. A loop must have a single entry point, called the “header”. It dominates all nodes in the loop.
  2. There is at least one path back to the header.
Backedge
  An edge in the flow graph whose head dominates its tail. In the example, edge 4 → 3 and edge 7 → 1.
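The dominator sets and backedges above can be computed with the standard iterative data-flow algorithm. This Python sketch uses a hypothetical edge set chosen to be consistent with the 7-node example (the slide gives only the dominance facts, so the exact edges, including 1→2, 2→3, are an assumption):

```python
# Iterative dominator computation: dom[n] starts as all nodes and is
# refined to {n} union the intersection of its predecessors' dominators.

succ = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7], 7: [1]}
nodes = sorted(succ)
pred = {n: [u for u in nodes if n in succ[u]] for n in nodes}

dom = {n: set(nodes) for n in nodes}
dom[1] = {1}                      # the entry dominates only itself
changed = True
while changed:
    changed = False
    for n in nodes:
        if n == 1:
            continue
        new = set(nodes)
        for p in pred[n]:
            new &= dom[p]         # must appear on every incoming path
        new |= {n}                # every node dominates itself
        if new != dom[n]:
            dom[n] = new
            changed = True

# A backedge is an edge whose head dominates its tail
backedges = {(u, v) for u in nodes for v in succ[u] if v in dom[u]}
```

On this graph the algorithm recovers exactly the slide's facts, and the backedges come out as (4, 3) and (7, 1), identifying the two natural loops.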
Global Data Flow Analysis

To provide global information about how a procedure manipulates its data.

Example (a diamond-shaped flow graph):

        A = 3
       /     \
  B += A      B = A + 1
       \     /
        C = A

Can we propagate constant 3 for A? (Yes: neither branch redefines A.)
Data Flow Equations

A typical data flow equation has the form

  Out[S] = Gen[S] U (In[S] - Kill[S])

where S is a statement; Gen[S] is the set of definitions generated within S; Kill[S] is the set of definitions killed as control flows through S; In[S] is the set of definitions live at the beginning of S; and Out[S] is the set of definitions available at the end of S.

Reaching Definitions

A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along that path.
Example (a flow graph with blocks B1-B6 containing the definitions):

  d1: i = m - 1
  d2: j = n
  d3: a = u1
  d4: i = i + 1
  d5: j = j - 1
  d6: a = u2

d1, d2, d5 reach B2; d5 kills d2, so d2 does not reach B3, B4, B5.
Data Flow Equations for Reaching Definitions

For a single assignment S (d1: a = b + c):
  gen[S] = {d1}
  kill[S] = all other definitions of a
  out[S] = gen[S] U (in[S] - kill[S])

For a conditional S = if (e) S1 else S2:
  gen[S] = gen[S1] U gen[S2]
  kill[S] = kill[S1] ∩ kill[S2]
  out[S] = out[S1] U out[S2]

For a loop S = while (e) S1:
  gen[S] = gen[S1]
  kill[S] = kill[S1]
  in[S1] = in[S] U gen[S1]
  out[S] = out[S1]
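At the basic-block level, the same equations are solved iteratively until a fixed point. A minimal Python sketch on a hypothetical two-block CFG of my own (B2 loops on itself; d1 and d3 define the same variable, so each kills the other):

```python
# Iterative reaching-definitions solver:
#   in[B]  = union of out[P] over predecessors P
#   out[B] = gen[B] U (in[B] - kill[B])
# repeated until nothing changes.

blocks = ["B1", "B2"]
preds = {"B1": [], "B2": ["B1", "B2"]}       # B2 has a self-loop
gen = {"B1": {"d1", "d2"}, "B2": {"d3"}}
kill = {"B1": {"d3"}, "B2": {"d1"}}          # d1 and d3 define the same var

IN = {b: set() for b in blocks}
OUT = {b: set() for b in blocks}
changed = True
while changed:
    changed = False
    for b in blocks:
        new_in = set()
        for p in preds[b]:
            new_in |= OUT[p]
        IN[b] = new_in
        new_out = gen[b] | (IN[b] - kill[b])
        if new_out != OUT[b]:
            OUT[b] = new_out
            changed = True
```

The fixed point shows d1 reaching the top of B2 (around the backedge) but not its bottom, because d3 kills it, which is exactly the kind of fact LICM consumes next.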
Transformation Example: LICM

Loop Invariant Code Motion
  A loop invariant is an instruction (a load or a calculation) in a loop whose result is the same in every iteration.
  Once we have identified loops and tracked the locations at which operand values are defined (i.e., reaching definitions), we can recognize an instruction as loop-invariant if each of its operands
    1) is a constant, or
    2) has reaching definitions that all lie outside the loop, or
    3) has a single reaching definition that is itself a loop invariant.
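The three rules above can be sketched directly. In this Python sketch the loop body, operand lists, and reaching-definition facts are all invented for illustration; note the outer iteration, which is needed because rule 3 depends on invariants found in earlier passes.

```python
# LICM detection sketch: mark an instruction invariant if every operand
# is (1) a constant, (2) defined only outside the loop, or (3) defined by
# a single instruction already known to be invariant.

loop = ["i1", "i2", "i3"]
operands = {
    "i1": [("const", 10), ("var", "x")],   # t1 = 10 + x
    "i2": [("var", "t1")],                 # t2 = t1 * 2
    "i3": [("var", "i")],                  # uses the loop counter
}
# reaching defs for each used variable; "outside" marks a def outside the loop
reaching = {"x": ["outside"], "t1": ["i1"], "i": ["outside", "i3"]}

def is_invariant(inst, known):
    for kind, val in operands[inst]:
        if kind == "const":
            continue                                   # rule 1
        defs = reaching[val]
        if all(d == "outside" for d in defs):
            continue                                   # rule 2
        if len(defs) == 1 and defs[0] in known:
            continue                                   # rule 3
        return False
    return True

invariant = set()
changed = True
while changed:
    changed = False
    for inst in loop:
        if inst not in invariant and is_invariant(inst, invariant):
            invariant.add(inst)
            changed = True
```

i3 stays in the loop because the counter i has a reaching definition inside the loop; i2 becomes invariant only after i1 does, via rule 3. Actually hoisting the marked instructions still requires the safety checks from the earlier slides (e.g., the N == 0 case).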
Static Compilers

Traditional compilation model for C, C++, Fortran, …
Extremely mature technology.
The static design point allows extremely deep and accurate analyses supporting sophisticated program transformation for performance.
The ABI enables a useful level of language interoperability.
But…
Static compilation…the downsides

CPU designers are restricted by the requirement to deliver increasing performance to applications that will not be recompiled:
  – This slows the uptake of new ISA and micro-architectural features
  – This constrains the evolution of CPU design by discouraging radical changes
The model for applying feedback information from application profiles to the optimization and code generation components is awkward and not widely adopted, diluting the performance achieved on the system.
Static compilation…the downsides

Largely unable to satisfy our increasing desire to exploit dynamic traits of the application.
Even link time is too early to catch some high-value opportunities for performance improvement.
Whole classes of speculative optimizations are infeasible without heroic efforts.
Tyranny of the “Dusty Deck”

Binary compatibility is one of the crowning achievements of the early computer years, but…
It does (or at least should) make CPU architects think very carefully about adding anything new, because:
  – you can almost never get rid of anything you add
  – it takes a long time to find out for sure whether anything you add is a good idea or not
Profile-Directed Feedback (PDF)

A two-step optimization process:
  – The first pass instruments the generated code to collect statistics about the program's execution. The developer exercises this program with common inputs to collect representative data; the program may be executed multiple times to reflect a variety of common inputs.
  – The second pass re-optimizes the program based on the profile data collected.
Also called Profile-Guided Optimization (PGO) or Profile-Based Optimization (PBO).
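The two-pass workflow can be sketched in miniature. Here the "program", its blocks, and the layout decision are invented stand-ins: pass 1 runs an instrumented version that counts basic-block executions over several training inputs, and pass 2 uses the counts to lay out the hottest blocks first.

```python
# PDF sketch: pass 1 counts how often each "basic block" executes;
# pass 2 re-"optimizes" by ordering blocks hottest-first.

counts = {}

def run_instrumented(inputs):
    # the toy program: block A always runs; B on even inputs, C on odd
    for x in inputs:
        counts["A"] = counts.get("A", 0) + 1
        if x % 2 == 0:
            counts["B"] = counts.get("B", 0) + 1
        else:
            counts["C"] = counts.get("C", 0) + 1

# Pass 1: exercise with representative inputs, possibly over several runs
run_instrumented([1, 2, 3, 4, 5])
run_instrumented([7, 9, 11])

# Pass 2: lay out the hottest blocks first
layout = sorted(counts, key=counts.get, reverse=True)
```

The pitfall discussed later on these slides is visible even here: if the training inputs had been mostly even numbers, the chosen layout would favor B, whether or not production inputs look like that.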
Data collected by PDF

Basic block execution counters
  – How many times each basic block in the program is reached
  – Used to derive branch and call frequencies
Value profiling
  – Collects a histogram of values for a particular attribute of the program
  – Used for specialization
Other PDF Opportunities

Path profile
Alias profile
Cache miss profile
  – I-cache misses
  – D-cache misses
  – Miss types
  – ITLB/DTLB misses
Speculation failure profile
Event correlation profile
Optimizations affected by PDF

Inlining
  – Uses call frequencies to prioritize inlining sites
Function partitioning
  – Groups the program into cliques of routines with high call affinity
Speculation
  – Control-speculative execution, data-speculative execution, and value-speculation-based optimizations
Predication
Code layout
Superblock formation
…
Optimizations triggered by PDF (in the IBM compiler)

Specialization triggered by value profiling
  – Arithmetic ops, built-in function calls, pointer calls
Extended basic block creation
  – Organizes code to fall through frequently on branches
Specialized linkage conventions
  – Treats all registers as non-volatile for infrequent calls
Branch hinting
  – Sets branch-prediction hints available in the ISA
Dynamic memory reorganization
  – Groups frequently accessed heap storage
Impact of PDF on SpecInt 2000

[Bar chart: PDF vs. no-PDF improvement, ranging from roughly -10% to 90%, for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, and vpr]

On a PWR4 system running AIX using the latest IBM compilers, at the highest available optimization level (-O5)
Sounds great…what's the problem?

Only the die-hard performance types use it (e.g., HPC, middleware).
It's tricky to get right: you only want to train the system to recognize things that are characteristic of the application, and somehow ignore artifacts of the input set.
In the end, it's still static; runtime checks and multiple versions can only take you so far.
It undermines the usefulness of benchmark results as a predictor of application performance when upgrading hardware.
In summary: a usability issue for developers that shows no sign of going away anytime soon.
Dynamic Compilation System

[Diagram: class and jar files → Java Virtual Machine → JIT Compiler → Machine Code]
JVM Evolution

First-generation JVMs were entirely interpreted. Pure interpretation is good for a proof of concept, but too slow for executing real code.
Second-generation JVMs used JIT (just-in-time) compilers to convert bytecodes into machine code before execution, in a lazy fashion.
HotSpot is the third-generation technology. It combines interpretation, profiling, and dynamic compilation, compiling only the frequently executed code. It also comes with two compilers: the server compiler (optimized for speed) and the client compiler (optimized for start-up time and memory footprint).
Newer dynamic compilation techniques for JVMs include CPO (Continuous Program Optimization), i.e., continuous recompilation, and OSR (On-Stack Replacement), which can switch code from interpreted mode to a compiled version.
Dynamic Compilation

The traditional model for languages like Java.
Rapidly maturing technology.
Exploits current invocation behaviour on the exact CPU model.
Recompilation and other dynamic techniques enable aggressive speculation.
Profile feedback to the optimizer is performed online (transparent to the user/application).
The compile-time budget is concentrated on the hottest code with the most (perceived) opportunities.
But…
Dynamic compilation…the downsides

Some important analyses are not affordable at runtime, even if applied only to the hottest code (array data flow, global scheduling, dependency analysis, loop transformations, …).
Non-determinism in the compilation system can be problematic:
  – For some users, it severely challenges their notions of quality assurance
  – It requires new approaches to RAS and to getting reproducible defects for the compiler service team
It introduces a very complicated code base into each and every application.
The compile-time budget is concentrated on the hottest code with the most (perceived) opportunities, and not on other code, which in aggregate may be as important a contributor to performance.
  – What do you do when there's no hot code?
The best of both worlds

[Diagram: C, C++, F90, and Java/.NET front ends feed a Portable High-Level Optimizer through a common representation (bytecode, MIL, etc.), with Profile-Directed Feedback (PDF) flowing in; a Common Backend produces Static Machine Code, a JIT produces Dynamic Machine Code from class/jar files, and CPO and Binary Translation close the loop]
More boxes, but is it better?

If ubiquitous, this could enable a new era in CPU architectural innovation by reducing the load of the dusty-deck millstone:
  – Deprecated ISA features supported via binary translation or recompilation from an “IL-fattened” binary
  – No latency effect in seeing the value of a new ISA feature
  – New-feature mistakes become relatively painless to undo
There's more

Transparently bring the benefits of dynamic optimization to traditionally static languages, while still leveraging the power of static analysis and language-specific semantic information:
  – All of the advantages of dynamic profile-directed feedback (PDF) optimizations with none of the static PDF drawbacks:
      no extra build step;
      no input artifacts skewing specialization choices;
      code specialized to each invocation on the exact processor model;
      more aggressive speculative optimizations;
      recompilation as a recovery option
  – Static analyses inform value-profiling choices: a new static-analysis goal of identifying the inhibitors to optimizations for later dynamic testing and specialization
Summary

A crossover point has been reached between dynamic and static compilation technologies.
They need to be converged/combined to overcome their individual weaknesses.
Hardware designers struggle under the mounting burden of maintaining high-performance backwards compatibility.