
Post on 20-Dec-2015


Page 1: Seminar Series  Static and Dynamic Compiler Optimizations (6/28).  Speculative Compiler Optimizations (7/05)  ADORE: An Adaptive Object Code ReOptimization

Seminar Series

Static and Dynamic Compiler Optimizations (6/28)
Speculative Compiler Optimizations (7/05)
ADORE: An Adaptive Object Code ReOptimization System (7/19)
Current Trends in CMP/CMT Processors (7/26)
Static and Dynamic Helper Thread Prefetching (8/02)
Dynamic Instrumentation/Translation (8/16)
Virtual Machine Technologies and their Emerging Applications (8/23)

Page 2

Professional Background

CE BS and CE MS, NCTU
CS Ph.D., University of Wisconsin, Madison
Cray Research, 1987-1993
  Architect for Cray Y-MP, Cray C-90, FAST
  Compiler optimization for Cray X-MP, Y-MP, Cray-2, Cray-3
Hewlett Packard, 1993-1999
  Compiler technical lead for HP-7200, HP-8000, IA-64
  Lab technical lead for adaptive systems
University of Minnesota, 2000-now
  ADORE/Itanium and ADORE/Sparc systems
Sun Microsystems, 2004-2005
  Visiting professor

Page 3

Static and Dynamic Compiler Optimizations

Wei Chung Hsu
6/28/2006

Page 4

Background

Optimization: a process of making something as effective as possible.

Compiler: a computer program that translates programs written in high-level languages into machine instructions.

Compiler Optimization: the phases of compilation that generate code that uses the target machine as efficiently as possible.

Page 5

Background (cont.)

Static Optimization: compile-time optimization – a one-time, fixed optimization that will not change after distribution.

Dynamic Optimization: optimization performed at program execution time – adaptive to the execution environment.

Page 6

Some Examples

Redundancy elimination
  C = (A+B)*(A+B)  =>  t = A+B; C = t*t;

Register allocation
  Keep frequently used data items in registers.

Instruction scheduling
  To avoid pipeline bubbles.

Cache prefetching
  To minimize cache miss penalties.
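The redundancy-elimination example above can be written out in C; the function names and the before/after pairing are illustrative, not from the slides:

```c
#include <assert.h>

/* Before: the subexpression A+B is evaluated twice. */
int c_naive(int a, int b) {
    return (a + b) * (a + b);
}

/* After redundancy elimination: the common subexpression is
   evaluated once and held in a temporary (ideally a register). */
int c_optimized(int a, int b) {
    int t = a + b;
    return t * t;
}
```

Both forms compute the same value; the optimized form simply evaluates A+B once.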

Page 7

How Important Is Compiler Optimization?

In the last 15 years, computer performance has increased by ~2000 times.
  Clock rate increased by ~100X.
  Micro-architecture contributed ~5-10X (the number of transistors doubles every 18 months).
  Compiler optimization added ~2-3X for single processors.

Page 8

Have you used compiler optimization lately?

Page 9

Speedup from Compiler Optimization

SPEC95Int (running on HP PA-8000); speedup relative to the unoptimized baseline:

Benchmark      Base   O1     O2     O3     O4     Peak
099.go          1     1.29   1.94   1.93   2.03   2.6
124.m88ksim     1     1.2    2.18   2.27   2.51   4.89
126.gcc         1     1.2    1.56   1.55   1.64   2.28
129.compress    1     1.51   1.89   2.34   2.26   3.82
130.li          1     1.38   1.78   1.97   2.07   4.14
132.ijpeg       1     1.24   3.43   3.56   3.62   4.14
134.perl        1     1.07   1.24   1.23   1.29   2.4
147.vortex      1     1.05   1.41   1.43   1.47   4.62
Average         1     1.24   1.93   2.03   2.11   3.61

Page 10

Speedup from Compiler Optimization

SPEC95fp (running on HP PA-8000); speedup relative to the unoptimized baseline:

Benchmark      Base   O1     O2      O3      O4      Peak
101.tomcatv     1     1.48   4.09    6.28    6.28   10.56
102.swim        1     1.62   4.47    4.47    4.48   12.01
103.su2cor      1     1.32   3.99    3.99    3.99    5.48
104.hydro2d     1     1.3    3.51    3.61    3.62    6.32
107.mgrid       1     5.59  37.39   38.66   38.8    77.02
110.applu       1     1.67  10.57   12.5    12.44   18.93
125.turb3d      1     1.12   7.95    8.03    8.03    9.04
141.apsi        1     1.47   6.03    6.52    6.52   10.77
145.fpppp       1     1.47   2.41    2.53    2.72    3.18
146.wave5       1     1.35   4.51    6.88    6.94    7.97
Average         1     1.84   8.49    9.35    9.38   16.13

Page 11

Page 12

Static compilation system

(Diagram: C, C++, and Fortran front ends translate source into a common intermediate language (IL, IR). A platform-neutral IL-to-IL inter-procedural optimizer performs machine-independent optimizations, optionally guided by profile-directed feedback gathered from sample input. An optimizing backend then performs machine-dependent optimizations and emits machine code.)

Page 13

Criteria for optimizations

Must preserve the meaning of programs.

Example:

for (I = 0; I < N; I++) { A[I] += b[I] + c/N; }

An (invalid) transformation:

T1 = c/N;
for (I = 0; I < N; I++) { A[I] += b[I] + T1; }

What if N == 0? The original loop never evaluates c/N, but the transformed code always does.
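One way a compiler can still hoist c/N safely is to guard the hoisted computation with the loop's entry condition, so the division is evaluated only when the original loop would have evaluated it. A minimal sketch, with an illustrative function name:

```c
#include <assert.h>

/* Hoisting c/n out of the loop unguarded would divide by zero when
   n == 0, even though the original loop body never runs. Guarding the
   hoisted code with the loop's entry condition preserves the meaning. */
void add_scaled(int *a, const int *b, int c, int n) {
    if (n > 0) {              /* the loop body executes at least once */
        int t = c / n;        /* safe: only evaluated when the loop runs */
        for (int i = 0; i < n; i++)
            a[i] += b[i] + t;
    }
}
```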

Page 14

Example

if (C > 0) { A += b[j] + d[j]; }

An (invalid) transformation:

T1 = b[j]; T2 = d[j];
if (C > 0) { A += T1 + T2; }

What if the load of b[j] faults when C <= 0? The original code dereferences b[j] only under the guard; the transformed code executes both loads unconditionally.
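The conservative form keeps the loads under the guard, since they may fault when C <= 0 (for example, b may be NULL in that case). Hoisting them is only legal if the compiler proves, or speculates with hardware support, that they cannot fault. Names below are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Conservative version: the loads of b[j] and d[j] stay under the
   guard, so they never execute with an invalid pointer. */
int accumulate(int a, const int *b, const int *d, int j, int c) {
    if (c > 0) {              /* loads execute only when guarded */
        a += b[j] + d[j];
    }
    return a;
}
```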

Page 15

Basic Concepts

Optimizations improve performance, but do not give optimal performance.

Optimizations generally (or statistically) improve performance; they can also slow down the code. Examples: LICM, cache prefetching, procedure inlining.

Optimizations must be absolutely (not statistically!) correct – safe or conservative.

Some optimizations are more important in general-purpose compilers: loop optimizations, register allocation, instruction scheduling.

Page 16

Optimization at different levels

Local (within a basic block)
Global (across basic blocks but within a procedure)
Inter-procedural
Cross-module (link time)
Post-link time (such as Spike/iSpike)
Runtime (as in dynamic compilation)

Page 17

Tradeoffs in Optimizations

Space vs. speed
  Usually favors speed. However, on machines with small memory or I-cache, space is equally important.

Compile time vs. execution time
  Usually favors execution time, but that is not necessarily true in recent years (e.g. JIT, large apps).

Absolutely robust vs. statistically robust
  Decrease the default optimization level in less important regions.

Complexity vs. efficiency
  Select between complex but more efficient and simple but less efficient (easier to maintain) algorithms.

Page 18

Overview of Optimizations

Early optimizations
  scalar replacement, constant folding
  local/global value numbering
  local/global copy propagation

Redundancy elimination
  local/global CSE, PRE
  LICM
  code hoisting

Loop optimizations
  strength reduction
  induction variable removal
  unnecessary bounds-checking elimination
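Strength reduction, for instance, replaces a per-iteration multiply in an address computation with an addition on an induction variable. A sketch in C with illustrative function names (the compiler normally does this on addresses it generates itself):

```c
#include <assert.h>
#include <stddef.h>

/* Before: the address a + i*sizeof(int) needs a multiply each iteration. */
int sum_mul(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += *(const int *)((const char *)a + i * sizeof(int));
    return s;
}

/* After strength reduction: a pointer (induction variable) is advanced
   by a constant stride instead. */
int sum_reduced(const int *a, int n) {
    int s = 0;
    const char *p = (const char *)a;
    for (int i = 0; i < n; i++, p += sizeof(int))
        s += *(const int *)p;
    return s;
}
```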

Page 19

Overview of Optimizations

Procedure optimizations
  tail-recursion elimination, inline expansion, leaf-routine optimization, shrink wrapping, memoization

Register allocation
  graph coloring

Instruction scheduling
  local/global code scheduling
  software pipelining
  trace scheduling, superblock formation

Page 20

Overview of Optimizations

Memory hierarchy optimizations
  loop blocking, loop interchange
  memory padding, cache prefetching, data re-layout

Loop transformations
  reduction recognition, loop collapsing, loop reversal, strip mining, loop fusion, loop distribution

Peephole optimizations

Profile-guided optimizations
  code repositioning, I-cache prefetching, profile-guided inlining, RA, IS, …

Page 21

Overview of Optimizations

More optimizations
  SIMD transformation, VLIW transformation
  communication optimizations
  (See David Bacon and Susan Graham's survey paper.)

Optimization evaluation
  Is there a commonly accepted method?
  user's choice
  benchmarks
    Livermore loops (14 kernels from scientific code)
    PERFECT club, SPLASH, NAS
    SPEC

Page 22

Importance of Individual Optimizations

How much performance does an optimization contribute?

Is this optimization commonplace?
  Does it happen in one particular instance?
  Does it happen in one particular program?
  Does it happen for one particular type of application?
  How much difference does it make?

Does it enable other optimizations?
  procedure integration, unrolling

Page 23

Ordering

Ordering is important; some dependences between optimizations exist.
  Procedure integration and loop unrolling usually enable other optimizations.
  Loop transformations should be done before address linearization.

There is no optimal ordering.
  Some optimizations should be applied multiple times (e.g. copy propagation, DCE).
  Some recent research advocates exhaustive search with intelligent pruning.

Page 24

Example Organization

(Diagram: the IR feeds control flow analysis, which builds the flow graph and identifies loops; data flow analysis then computes reaching definitions and def-use chains; finally, transformations such as global CSE, copy propagation, and code motion are applied.)

Page 25

Loops in Flow Graphs

Dominators
  Node d of a flow graph dominates node n, written d dom n, if every path from the initial node of the flow graph to n goes through d.

Example, for a seven-node flow graph (nodes 1-7; diagram omitted):
  1 dom all
  3 dom 4, 5, 6, 7
  4 dom 5, 6, 7
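Dominators can be computed iteratively: Dom(n) = {n} plus the intersection of Dom(p) over all predecessors p, shrinking from "everything" to a fixed point. The slides do not give the full edge list for the seven-node example, so the edge set below is an assumption chosen to be consistent with the stated dominator facts:

```c
#include <assert.h>
#include <stdint.h>

#define N 8                       /* nodes 1..7; index 0 unused */

static uint32_t dom[N];           /* dom[n]: bitmask of dominators of n */

/* Iterative dominator computation:
   Dom(entry) = {entry}; Dom(n) = {n} U intersection of Dom(p) over preds p. */
void compute_dominators(const int edges[][2], int nedges) {
    uint32_t all = 0;
    for (int n = 1; n < N; n++) all |= 1u << n;
    dom[1] = 1u << 1;                     /* entry dominates only itself */
    for (int n = 2; n < N; n++) dom[n] = all;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int n = 2; n < N; n++) {
            uint32_t d = all;
            for (int e = 0; e < nedges; e++)
                if (edges[e][1] == n)
                    d &= dom[edges[e][0]];   /* intersect over predecessors */
            d |= 1u << n;
            if (d != dom[n]) { dom[n] = d; changed = 1; }
        }
    }
}
```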

Page 26

Loops in Flow Graphs (cont.)

Natural loops
  1. A loop must have a single entry point, called the "header". It dominates all nodes in the loop.
  2. There is at least one path back to the header.

Backedge
  An edge in the flow graph whose head dominates its tail. For example, edge 4->3 and edge 7->1.

Page 27

Global Data Flow Analysis

To provide global information about how a procedure manipulates its data.

Example (diagram omitted): A = 3 flows into two alternative branches, one executing B += A and the other B = A + 1, which merge at C = A.

Can we propagate the constant 3 for A?

Page 28

Data Flow Equations

A typical data flow equation has the form

  Out[S] = Gen[S] U (In[S] – Kill[S])

where
  S is a statement,
  Gen[S] is the set of definitions generated within S,
  Kill[S] is the set of definitions killed as control flows through S,
  In[S] is the set of definitions live at the beginning of S,
  Out[S] is the set of definitions available at the end of S.
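A common implementation strategy is to number the definitions and represent each set as a bit vector, so the equation above becomes one line of bit arithmetic. A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

/* With definitions numbered 0..31, the sets in
   Out[S] = Gen[S] U (In[S] - Kill[S]) become bitmasks. */
uint32_t dataflow_out(uint32_t gen, uint32_t in, uint32_t kill) {
    return gen | (in & ~kill);
}
```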

Page 29

Reaching Definitions

A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along that path.

Example: a flow graph with blocks B1-B6 (diagram omitted) containing

  d1: i = m - 1
  d2: j = n
  d3: a = u1
  d4: i = i + 1
  d5: j = j - 1
  d6: a = u2

d1, d2, d5 reach B2. d5 kills d2, so d2 does not reach B3, B4, B5.

Page 30

Data Flow Equations for Reaching Definitions

Single statement S containing d1: a = b + c
  gen[S] = {d1}
  kill[S] = all other definitions of a
  out[S] = gen[S] U (in[S] – kill[S])

S branches into alternatives S1 and S2
  gen[S] = gen[S1] U gen[S2]
  kill[S] = kill[S1] ∩ kill[S2]
  out[S] = out[S1] U out[S2]

S is a loop with body S1
  gen[S] = gen[S1]
  kill[S] = kill[S1]
  in[S1] = in[S] U gen[S1]
  out[S] = out[S1]

Page 31

Transformation example: LICM

Loop Invariant Code Motion
  A loop invariant is an instruction (a load or a calculation) in a loop whose result is always the same in every iteration.
  Once we have identified loops and tracked the locations at which operand values are defined (i.e. reaching definitions), we can recognize a loop invariant if each of its operands
    1) is a constant,
    2) has reaching definitions that all lie outside the loop, or
    3) has a single reaching definition that itself is a loop invariant.
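By those rules, k*m below is loop invariant: both operands' reaching definitions lie outside the loop. A hypothetical before/after pair showing the motion:

```c
#include <assert.h>

/* Before: k*m is recomputed in every iteration. */
int sum_before(const int *a, int n, int k, int m) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i] * (k * m);
    return s;
}

/* After LICM: the invariant computation is hoisted out of the loop.
   (Hoisting is safe here even if n == 0, since k*m cannot fault.) */
int sum_after(const int *a, int n, int k, int m) {
    int t = k * m;
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i] * t;
    return s;
}
```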

Page 32

Static Compilers

Traditional compilation model for C, C++, Fortran, …

Extremely mature technology.

The static design point allows for extremely deep and accurate analyses supporting sophisticated program transformation for performance.

The ABI enables a useful level of language interoperability.

But…

Page 33

Static compilation… the downsides

CPU designers are restricted by the requirement to deliver increasing performance to applications that will not be recompiled.
  This slows the uptake of new ISA and micro-architectural features.
  It constrains the evolution of CPU design by discouraging radical changes.

The model for applying feedback information from application profiles to the optimization and code generation components is awkward and not widely adopted, diluting the performance achieved on the system.

Page 34

Static compilation… the downsides

Largely unable to satisfy our increasing desire to exploit dynamic traits of the application.

Even link time is too early to catch some high-value opportunities for performance improvement.

Whole classes of speculative optimizations are infeasible without heroic efforts.

Page 35

Tyranny of the "Dusty Deck"

Binary compatibility is one of the crowning achievements of the early computer years, but…

It does (or at least should) make CPU architects think very carefully about adding anything new, because
  you can almost never get rid of anything you add, and
  it takes a long time to find out for sure whether anything you add is a good idea or not.

Page 36

Profile-Directed Feedback (PDF)

Two-step optimization process:
  The first pass instruments the generated code to collect statistics about the program's execution.
    The developer exercises the program with common inputs to collect representative data.
    The program may be executed multiple times to reflect a variety of common inputs.
  The second pass re-optimizes the program based on the profile data collected.

Also called Profile-Guided Optimization (PGO) or Profile-Based Optimization (PBO).
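What the first pass inserts can be pictured as a counter bump at the top of each basic block; the counter table is dumped at exit and fed back to the second pass. A hand-written sketch (block numbering and names are illustrative, not the IBM compiler's actual scheme):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t counts[4];        /* one execution counter per basic block */

int abs_sum(const int *a, int n) {
    counts[0]++;                  /* block 0: function entry */
    int s = 0;
    for (int i = 0; i < n; i++) {
        counts[1]++;              /* block 1: loop body */
        if (a[i] < 0) {
            counts[2]++;          /* block 2: taken branch */
            s -= a[i];
        } else {
            counts[3]++;          /* block 3: fall-through */
            s += a[i];
        }
    }
    return s;
}
```

Branch frequencies fall out directly: counts[2]/counts[1] is the taken probability of the branch.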

Page 37

Data collected by PDF

Basic block execution counters
  How many times each basic block in the program is reached.
  Used to derive branch and call frequencies.

Value profiling
  Collects a histogram of values for a particular attribute of the program.
  Used for specialization.

Page 38

Other PDF Opportunities

Path profile
Alias profile
Cache miss profile
  I-cache misses, D-cache misses, miss types, ITLB/DTLB misses
Speculation failure profile
Event correlation profile

Page 39

Optimizations affected by PDF

Inlining
  Uses call frequencies to prioritize inlining sites.
Function partitioning
  Groups the program into cliques of routines with high call affinity.
Speculation
  Control-speculative execution, data-speculative execution, and value-speculation-based optimizations.
Predication
Code layout
Superblock formation
…

Page 40

Optimizations triggered by PDF (in the IBM compiler)

Specialization triggered by value profiling
  Arithmetic ops, built-in function calls, pointer calls.
Extended basic block creation
  Organizes code to frequently fall through on branches.
Specialized linkage conventions
  Treats all registers as non-volatile for infrequent calls.
Branch hinting
  Sets branch-prediction hints available in the ISA.
Dynamic memory reorganization
  Groups frequently accessed heap storage.
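Value-profile-driven specialization can be pictured as follows: if profiling shows a parameter almost always takes one value, the compiler emits a fast path specialized for that value and keeps the general code as a fallback. A hand-written sketch with illustrative names (not the IBM compiler's output):

```c
#include <assert.h>

/* Suppose value profiling shows stride == 1 in nearly all calls. */
int scale_index(int base, int stride, int i) {
    if (stride == 1) {            /* specialized path for the hot value */
        return base + i;          /* multiply eliminated */
    }
    return base + stride * i;     /* general fallback */
}
```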

Page 41

Impact of PDF on SpecInt 2000

(Chart: PDF vs. no-PDF improvement, ranging roughly from -10% to 90%, for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, and vpr.)

On a PWR4 system running AIX using the latest IBM compilers, at the highest available optimization level (-O5).

Page 42

Sounds great… what's the problem?

Only the die-hard performance types use it (e.g. HPC, middleware).

It's tricky to get right… you only want to train the system to recognize things that are characteristic of the application, and somehow ignore artifacts of the input set.

In the end, it's still static, and runtime checks and multiple versions can only take you so far.

It undermines the usefulness of benchmark results as a predictor of application performance when upgrading hardware.

In summary… it's a usability issue for developers that shows no sign of going away anytime soon.

Page 43

Dynamic Compilation System

[Diagram: class and jar files are loaded into a Java Virtual Machine, whose JIT Compiler produces Machine Code]

Page 44

JVM Evolution
First-generation JVMs were entirely interpreted. Pure interpretation is good for proof-of-concept, but too slow for executing real code.
Second-generation JVMs used JIT (just-in-time) compilers to convert bytecodes into machine code before execution, in a lazy fashion.
HotSpot is the 3rd-generation technology. It combines interpretation, profiling and dynamic compilation, compiling only the frequently executed code. It also comes with two compilers: a server compiler (optimized for speed) and a client compiler (optimized for start-up time and memory footprint).
Newer dynamic compilation techniques for JVMs are CPO (Continuous Program Optimization), or continuous recompilation, and OSR (On-Stack Replacement), which can switch code from interpreted mode to a compiled version mid-execution.
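This tiered behaviour is easy to observe. The minimal sketch below (class name and loop counts are illustrative) runs a small method often enough that HotSpot promotes it from interpretation to JIT-compiled code; running with the HotSpot flag `-XX:+PrintCompilation` shows the compilation events, and since the hot loop is in `main` itself, OSR typically kicks in there too (thresholds and output format vary by JVM version):

```java
// Warm-up sketch: square() and the loop in main() become "hot" and get
// JIT-compiled at runtime. Try: java -XX:+PrintCompilation Warmup
public class Warmup {
    static int square(int x) { return x * x; }

    public static void main(String[] args) {
        long total = 0;
        // Enough iterations to pass typical interpreter invocation thresholds.
        for (int i = 0; i < 100_000; i++) total += square(i % 100);
        System.out.println(total);
    }
}
```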

Page 45

Dynamic Compilation
Traditional model for languages like Java
Rapidly maturing technology
Exploitation of current invocation behaviour on the exact CPU model
Recompilation and other dynamic techniques enable aggressive speculation
Profile feedback to the optimizer is performed online (transparent to user/application)
Compile-time budget is concentrated on the hottest code with the most (perceived) opportunities

But…
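One such aggressive speculation is devirtualization. The sketch below writes out by hand, in source form, what a JIT does internally in compiled code: having only ever observed one receiver class at a call site, it inlines that class's method behind a type guard, with the general virtual call standing in for the deoptimization/recompilation fallback (class names are illustrative):

```java
// Conceptual sketch of speculative devirtualization with a guard.
class DevirtSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static double totalArea(Shape[] shapes) {
        double t = 0;
        for (Shape s : shapes) {
            if (s instanceof Circle) {        // guard: speculated receiver class
                Circle c = (Circle) s;
                t += Math.PI * c.r * c.r;     // manually "inlined" body, no dispatch
            } else {
                t += s.area();                // fallback (stands in for deoptimization)
            }
        }
        return t;
    }
}
```

If a new `Shape` implementation later shows up, a real JIT invalidates the speculation and recompiles, which is what makes such speculation safe to attempt.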

Page 46

Dynamic compilation…the downsides
Some important analyses are not affordable at runtime, even if applied only to the hottest code (array data flow, global scheduling, dependency analysis, loop transformations, …)
Non-determinism in the compilation system can be problematic
– For some users, it severely challenges their notions of quality assurance
– Requires new approaches to RAS and to getting reproducible defects for the compiler service team
Introduces a very complicated code base into each and every application
Compile-time budget is concentrated on the hottest code with the most (perceived) opportunities and not on other code, which in aggregate may be as important a contributor to performance
– What do you do when there’s no hot code?

Page 47

The best of both worlds

[Diagram: C, C++ and F90 front ends, plus Java/.NET class and jar files, feed a Portable High Level Optimizer emitting bytecode, MIL, etc.; from there a Common Backend produces Static Machine Code (fed by Profile-Directed Feedback, PDF) while a JIT produces Dynamic Machine Code (fed by CPO); Binary Translation also feeds the dynamic path]

Page 48

More boxes, but is it better?
If ubiquitous, could enable a new era in CPU architectural innovation by reducing the load of the dusty-deck millstone
– Deprecated ISA features supported via binary translation or recompilation from an “IL-fattened” binary
– No latency effect in seeing the value of a new ISA feature
– New-feature mistakes become relatively painless to undo

Page 49

There’s more
Transparently bring the benefits of dynamic optimization to traditionally static languages while still leveraging the power of static analysis and language-specific semantic information
– All of the advantages of dynamic profile-directed feedback (PDF) optimizations with none of the static PDF drawbacks
  No extra build step
  No input artifacts skewing specialization choices
  Code specialized to each invocation on the exact processor model
  More aggressive speculative optimizations
  Recompilation as a recovery option
– Static analyses inform value profiling choices
  New static analysis goal of identifying the inhibitors to optimizations for later dynamic testing and specialization
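Value profiling, mentioned above, can be sketched minimally. The class below (entirely illustrative; a real runtime would keep such counters in compiled-code instrumentation, not a hash map) records observed values of some parameter so a recompilation can specialize for a dominant value, with a guard and recompilation as the recovery option:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value-profiling sketch: count observed values of a parameter;
// once one value dominates, a recompile could emit a guarded specialized version.
class ValueProfile {
    private final Map<Integer, Integer> counts = new HashMap<>();

    void record(int v) { counts.merge(v, 1, Integer::sum); }

    // Returns the dominant value if it accounts for more than 90% of samples,
    // or null if no value dominates (the 90% threshold is arbitrary).
    Integer dominant() {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            if (e.getValue() * 10 > total * 9) return e.getKey();
        return null;
    }
}
```

The slide's point is that static analysis can tell the runtime *which* values are worth profiling, so instrumentation like this is only paid for where specialization could actually pay off.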

Page 50

Summary
A crossover point has been reached between dynamic and static compilation technologies.
They need to be converged/combined to overcome their individual weaknesses.
Hardware designers struggle under the mounting burden of maintaining high-performance backwards compatibility.