
Interprocedural Optimization

— a much older version of the lecture —

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.

Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Comp 512, Spring 2011

COMP 512, Rice University


Enough Analysis, What Can We Do?

There are many interprocedural optimizations:

• Choosing custom procedure linkages

• Interprocedural common subexpression elimination

• Interprocedural code motion

• Interprocedural register allocation

• Memo-function implementation

• Cross-jumping

• Procedure recognition & abstraction


Linkage Tailoring

Should choose the best linkage for circumstances

• Inline, clone, semi-open, semi-closed, closed

• Estimate execution frequencies & improvements

• Assign styles to call sites

The choices interact


Linkage Tailoring

Linkage styles

Traditional closed linkage [diagram: p calls q through the full convention]

Open linkage (inlined) [diagram: q's body copied into p]

• Eliminate the call overhead

• Create a tailored private copy of the callee


Linkage Tailoring

Linkage styles

Procedure Cloning

[call-graph diagram: the call sites are partitioned between clones q0 and q1 of the original procedure]

• Partition call sites by environment

• Create a location where desired facts can be true
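The two bullets above can be made concrete with a small sketch (all names are hypothetical, not from the lecture): call sites are grouped by the constant facts known at each site, and one clone of the callee is created per group, so that inside each clone the shared facts hold and can be exploited.

```python
# A sketch of procedure cloning: partition call sites by their
# environment, then give each partition its own clone of the callee.

def clone_by_environment(call_sites):
    """Group call sites by the known-constant arguments they pass.

    Each group gets its own clone of the callee; inside that clone the
    shared facts (e.g. 'flag is always 0') are true and exploitable.
    """
    partitions = {}
    for site in call_sites:
        # The environment key: which arguments are known constants here.
        key = tuple(sorted(site["known_constants"].items()))
        partitions.setdefault(key, []).append(site)
    # One clone per distinct environment (q0, q1, ... in the slide).
    return {f"q{i}": sites for i, sites in enumerate(partitions.values())}

sites = [
    {"caller": "t", "known_constants": {"flag": 0}},
    {"caller": "s", "known_constants": {"flag": 0}},
    {"caller": "p", "known_constants": {"flag": 1}},
]
clones = clone_by_environment(sites)
# Two clones: one shared by the flag==0 callers (t, s), one for p.
```

A real implementation would bound the number of clones, since each distinct environment adds code; the point here is only the partitioning step.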


Linkage Tailoring

Linkage styles

Traditional closed linkage [diagram: p calls q through the full convention]

Semi-open linkage [diagram: parts of the linkage code moved between p and q]

• Move the prolog & epilog across the call & tailor them

• For a call in a loop, parts of the linkage are loop invariant

Think of this as an aggressive closed linkage


Linkage Tailoring

Should choose the best linkage for circumstances

• Inline, clone, semi-open, semi-closed, closed

• Estimate execution frequencies & improvements

• Assign styles to call sites

Practical approach

• Limit choices (standard, cloned, inlined)

• Clone for better information & to specialize (based on interprocedural data-flow analysis)

• Inline for high-payoff optimizations

• Adopt an aggressive standard linkage

Move parameter addressing code out of the callee (& out of the loop)

The choices interact


Linkage Tailoring

A low-level, post-compilation idea

• Compute register summaries for each procedure

How many registers are really used?

• Rotate the assignment to move values into caller-saves registers

Leave callee-saves registers unused, if possible

Avoid saving them in the callee

(e.g., a leaf routine with little demand for registers)

• Adjust the caller-saves/callee-saves boundary

• Remove the caller's stores for registers unused in the callee

Some authors call this interprocedural register allocation


Interprocedural Register Allocation

Some work has been done

• Chow’s compiler for the MIPS machine, “shrink wrapping”

Often slowed down the code

• Wall and Goodwin at DEC SRC did link-time allocation

Target machine had scads of registers

Essentially, adjusted caller-saves & callee-saves

What about full-blown, allocate the whole thing, allocation?

• Arithmetic of costs is pretty complex

• Requires good profile or frequency information

• Need a fair basis for comparing different uses for a register r_i

• Real issue would be spilling (“spill everywhere” would be a disaster)


Interprocedural Common Subexpression Elimination

Sounds good, but consider the real opportunities

• Procedures only share parameters, globals, & constants

• No local variables in an ICSE, by definition

• Not a very large set of expressions

Possible schemes

• Create a global data area to hold ICSEs

Sidesteps the register pressure issue

• Elide unnecessary parameters or pass-through parameters

“Commoning” in Cocke’s terminology

Shrinks the pre-call sequence


Invocation Invariant Expressions

Do IIEs exist? (yes)

Is moving them profitable? (it should be)

Can we engineer this into a compiler? (this is tougher)

[measurements: the fraction of invariant expressions ranged from 8.9% down to 0.2%, 1.66% on average; static counts, not dynamic counts]


Interprocedural Code Motion

What could we do? What could we find?

• Find and mark loop nests in the call graph

• Compute interprocedural AVAIL ?

• Move code across procedure boundaries

Two ideas

• Invocation invariant expressions

An expression whose value is determined at the point of call

Hoist them to the prolog, or hoist them across the call

• Loop embedding and extraction

Move the control flow for an entire nest into one procedure

Enable optimization using intraprocedural techniques

All of these are difficult
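The invariance test behind the first idea can be sketched as follows (a toy sketch over a hypothetical IR, not the lecture's implementation): an expression in the callee is invocation invariant if every operand is a formal parameter or global that the callee never redefines, so its value is fixed at the point of call.

```python
# A sketch of spotting invocation-invariant expressions (IIEs): an
# expression qualifies when all of its operands are call-time constants,
# i.e. parameters or globals never redefined inside the callee.

def invocation_invariant(exprs, params, globals_, redefined):
    """Return the expressions whose operands are all fixed at the call."""
    fixed = (params | globals_) - redefined
    return [e for e in exprs if set(e["operands"]) <= fixed]

exprs = [
    {"text": "m * n", "operands": ["m", "n"]},  # m, n: never stored to
    {"text": "i + 1", "operands": ["i"]},       # i: changed by the callee
]
iies = invocation_invariant(
    exprs,
    params={"m", "n"},
    globals_=set(),
    redefined={"i"},   # names (re)defined inside the callee
)
# Only "m * n" qualifies; it could be hoisted to the prolog or
# computed once across the call.
```

The hard parts the slide alludes to, such as aliasing and side effects through calls, are exactly what makes `redefined` expensive to compute in practice.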


Interprocedural Code Motion

Moving loops across a call

Before:

do i = 1 to 100
  do j = 1 to 100
    call fee(a,b,c,i,j)
  end
end

fee(x,y,z,i,j)
  do k = 1 to 100
    x(i,j) = x(i,j) + y(i,k) * z(k,j)
  end

After:

call fee(a,b,c)

fee(x,y,z)
  do i = 1 to 100
    do j = 1 to 100
      do k = 1 to 100
        x(i,j) = x(i,j) + y(i,k) * z(k,j)
      end
    end
  end

Loop Embedding

Think of this as dual of partial inlining (partial outlining?)


Interprocedural Code Motion

Moving loops across a call

Before:

do i = 1 to 100
  do j = 1 to 100
    call fee(a,b,c,i,j)
  end
end

fee(x,y,z,i,j)
  do k = 1 to 100
    x(i,j) = x(i,j) + y(i,k) * z(k,j)
  end

After:

do i = 1 to 100
  do j = 1 to 100
    do k = 1 to 100
      call fee(a,b,c,i,j,k)
    end
  end
end

fee(x,y,z,i,j,k)
  x(i,j) = x(i,j) + y(i,k) * z(k,j)

Loop Extraction

Think of this as partial inlining


Memo-Function Implementation

Idea

• Find pure functions & turn them into hashed lookups

Implementation

• Use interprocedural analysis to identify pure functions

• Insert stub with lookup between call & evaluation

Benefits

• Replace evaluations with table lookup

• Potential for substantial run-time savings

• Should share table implementation with other functions
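A minimal sketch of the idea in Python (using the standard library's cache as the hashed lookup table, rather than a compiler-inserted stub): because the function is pure, its result depends only on its arguments, so repeated calls become table lookups.

```python
# A sketch of a memo function: a hashed lookup sits between the call and
# the evaluation, so each distinct argument is evaluated only once.
import functools

call_count = 0  # counts actual evaluations, to show the savings

@functools.lru_cache(maxsize=None)   # the hashed lookup table
def fib(n):
    """A pure function: its result depends only on its argument."""
    global call_count
    call_count += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(20)
# Without the table, fib(20) would trigger thousands of recursive
# evaluations; with it, each argument 0..20 is evaluated exactly once.
```

The compiler's version of this inserts the lookup stub automatically once interprocedural analysis proves the function pure; sharing one table implementation across functions amortizes the cost, as the slide suggests.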


Cross-jumping

Idea

• Procedure epilogs come in two flavors: returned value & no returned value

• Eliminate duplicates & save space

Implementation

• At the start of each block, compare the ops just before each predecessor’s branch

• If they are identical, move one copy across the branch

• Repeat until the code stops changing

Presents new challenges to the debugger
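One step of the algorithm can be sketched over a toy IR (lists of instruction strings; hypothetical, not the lecture's representation): when every predecessor of a join block ends with the same instruction just before its branch, that instruction is moved across the branch into the join block and the duplicates are deleted.

```python
# A sketch of one cross-jumping step on a toy IR. Each block is a list
# of instruction strings whose last entry is its branch to the join.

def cross_jump_once(preds, join):
    """Hoist one identical trailing op from all predecessors into `join`."""
    # The op just before each predecessor's final branch.
    tails = [b[-2] for b in preds if len(b) >= 2]
    if len(tails) == len(preds) and len(set(tails)) == 1:
        for b in preds:
            del b[-2]                 # delete the duplicate copy
        join.insert(0, tails[0])      # keep one shared copy in the join
        return True
    return False

# Two epilogs that both restore a register before jumping to the return.
p1 = ["store r1", "restore r7", "jump L_ret"]
p2 = ["load r2",  "restore r7", "jump L_ret"]
ret = ["return"]
changed = cross_jump_once([p1, p2], ret)
# ret becomes ["restore r7", "return"]; each predecessor keeps its own work.
```

The driver repeats this until nothing changes, which is how whole duplicated epilogs migrate into a single shared copy; the debugger trouble the slide mentions comes from one instruction now belonging to several source locations.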


Procedure Abstraction

Idea

• Recognize common instruction sequences

• Replace them with (very) cheap calls

Experience

• Need to abstract register names & local labels

• Use suffix trees (bio-informatics)

• Produces smaller code with longer paths

~1% slower for each 2% smaller

• Wreaks havoc with the debugger

Cooper & McIntosh, PLDI 99, May 1999
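A toy sketch of the transformation (brute-force search rather than the suffix trees used in the paper, and with hypothetical names): find a repeated run of instructions, hive it off as a shared body, and replace each occurrence with a cheap call.

```python
# A sketch of procedure abstraction: replace repeated instruction
# sequences with calls to a single shared copy. Real implementations
# also abstract register names & local labels before matching.

def abstract_repeats(code, length):
    """Replace repeated runs of `length` instructions with calls."""
    for i in range(len(code) - length + 1):
        seq = code[i:i + length]
        occurrences = [j for j in range(len(code) - length + 1)
                       if code[j:j + length] == seq]
        if len(occurrences) > 1:
            body = seq + ["ret"]          # the abstracted mini-procedure
            out, j = [], 0
            while j < len(code):
                if j in occurrences:
                    out.append("call ABS0")   # hypothetical label
                    j += length
                else:
                    out.append(code[j])
                    j += 1
            return out, body
    return code, None

out, body = abstract_repeats(["a", "b", "c", "x", "a", "b", "c", "y"], 3)
# out  -> ["call ABS0", "x", "call ABS0", "y"]
# body -> ["a", "b", "c", "ret"]
```

The code shrinks, but every occurrence now pays a call and return on its path, which is exactly the ~1% slower per 2% smaller trade-off the slide reports.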


Putting It All Together

Key questions that a compiler writer must answer

• What should the optimizer & back end do?

• How should the compiler represent the program?

Broad answers

• Figure out what the real problems are

• Learn everything you can about the target machine

• Attack each problem with a specific plan

• Separate concerns as long as possible


Lessons

Experience is a tough teacher

• Understand what the code looks like

On input to the compiler and on output from the compiler

Hard to see problems in large procedures

• A good compiler need not do everything

Study the code & figure out what’s needed

Do those things well & do them thoroughly

• Hand simulation before implementation pays off

Avoid implementing things that solve no problem

• Find the right level of abstraction for each optimization

Constant propagation can be done at source level

Redundancy elimination should be done at a low level


Lessons

Experience is a tough teacher

• Intermediate representations are important

The IR determines the level of exposed detail

Every pass traverses it, so efficient manipulation is critical

• Code shape and name spaces are important

They determine opportunities & the size of data structures

They have a large impact on the effectiveness of algorithms

• Compile-time space counts

It limits the size of procedures that can be compiled

The compiler touches all that space

More complex analyses take lots of space


And that’s the end of my story ...


Lessons

Experience is a tough teacher

• Separation of concerns is vital to progress

Resource constraints from rearrangement

Allocation from optimization

Parsing from code generation

• Separation of concerns interferes with optimization

Allocation & scheduling

Evaluation order & redundancy elimination

Code shape decisions affect later passes (loop shape, case statements)

Annotations (ILOC tags)