Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end

Upload: joel-griffith

Post on 17-Dec-2015


Page 1:

Optimizing Compilers for Modern Architectures

More Interprocedural Analysis

Chapter 11, Sections 11.2.5 to end

Page 2:

Review

• We analyzed several interprocedural problems:
  — MOD: the set of variables that may be modified by a procedure.
  — ALIAS: the set of variables that may be aliased in a procedure.

• The problem of aliasing makes interprocedural problems harder than global problems.

Page 3:

SUBROUTINE FOO(N)
INTEGER N,M
CALL INIT(M,N)
DO I = 1,P
   B(M*I + 1) = 2*B(1)
ENDDO
END

SUBROUTINE INIT(M,N)
M = N
END

If N = 0 on entry to FOO, the loop is a reduction. Otherwise, we can vectorize the loop.

Constant Propagation

• Propagating constants between procedures can cause significant improvements.

• Dependence testing can be made more precise.

Page 4:

• Definition: Let s = (p,q) be a call site in procedure p, and let x be a parameter of q. Then J_s^x, the jump function for x at s, gives the value of x in terms of the parameters of p.

• The support of J_s^x is the set of p-parameters that J_s^x depends on.

Constant Propagation

Page 5:

• Instead of a def-use graph, we construct an interprocedural value graph:
  — Add a node to the graph for each jump function J_s^x.
  — If x belongs to the support of J_t^y, where t lies in procedure q, then add an edge between J_s^x and J_t^y for every call site s = (p,q) for some p.

• We can now apply the constant propagation algorithm to this graph.
  — Might want to iterate with global propagation.

Constant Propagation

Page 6:

Constant Propagation

Example:

PROGRAM MAIN
INTEGER A,B
A = 1
B = 2
α CALL S(A,B)
END

SUBROUTINE S(X,Y)
INTEGER X,Y,Z,W
Z = X + Y
W = X - Y
β CALL T(Z,W)
END

SUBROUTINE T(U,V)
PRINT U,V
END

[Value graph: nodes J_α^X, J_α^Y, J_β^U, J_β^V with values 1, 2, 3, -1.]

The constant-propagation algorithm will eventually converge to the above values (might need to iterate with global propagation).
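As a sketch of how the algorithm from the previous slide behaves on this example, the following Python fragment (an illustration, not the book's implementation; the node encoding and `meet` helper are assumptions) propagates constants over the four jump-function nodes:

```python
# Three-level constant lattice: TOP (no information yet), a constant,
# or BOTTOM (known not to be constant).
TOP, BOTTOM = object(), object()

def meet(a, b):
    if a is TOP: return b
    if b is TOP: return a
    return a if a == b else BOTTOM

# Jump-function nodes for the example: J_alpha^X = 1, J_alpha^Y = 2,
# J_beta^U = X + Y, J_beta^V = X - Y. Each node carries an evaluator
# and its support (edges from the jump functions feeding its formals).
nodes = {
    "Ja_X": (lambda env: 1, []),
    "Ja_Y": (lambda env: 2, []),
    "Jb_U": (lambda env: env["X"] + env["Y"], [("Ja_X", "X"), ("Ja_Y", "Y")]),
    "Jb_V": (lambda env: env["X"] - env["Y"], [("Ja_X", "X"), ("Ja_Y", "Y")]),
}

val = {n: TOP for n in nodes}
changed = True
while changed:
    changed = False
    for n, (fn, support) in nodes.items():
        env = {formal: val[src] for src, formal in support}
        if any(v is TOP or v is BOTTOM for v in env.values()):
            continue  # not enough information to evaluate this node yet
        new = meet(val[n], fn(env))
        if new != val[n]:
            val[n], changed = new, True

print(val["Jb_U"], val["Jb_V"])  # converges to 3 and -1
```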

Page 7:

PROGRAM MAIN
INTEGER A
α CALL PROCESS(15,A)
PRINT A
END

SUBROUTINE PROCESS(N,B)
INTEGER N,B,I
β CALL INIT(I,N)
γ CALL SOLVE(B,I)
END

SUBROUTINE INIT(X,Y)
INTEGER X,Y
X = 2*Y
END

SUBROUTINE SOLVE(C,T)
INTEGER C,T
C = T*10
END

• Need a way of building J_γ^T.
• For x an output of p, define R_p^x, the return jump function for x, to be the value of x on return from p in terms of p-parameters.
• The support of R_p^x is defined as above.

R_INIT^X = 2*Y        R_SOLVE^C = T*10

J_α^N = 15        J_β^Y = N

J_γ^T = R_INIT^X(N) if I ∈ MOD(β), undefined otherwise

R_PROCESS^B = R_SOLVE^C(J_γ^T(N)) if C ∈ MOD(γ), undefined otherwise

Jump Functions
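The compositions above can be checked with a small Python sketch (illustrative only; the lambda encoding of jump and return jump functions is an assumption):

```python
# Jump functions at the call sites.
J_alpha_N = lambda: 15          # MAIN passes the literal 15 for N
J_beta_Y = lambda n: n          # PROCESS passes N for INIT's Y

# Return jump functions: value of an output parameter on return.
R_init_X = lambda y: 2 * y      # INIT: X = 2*Y
R_solve_C = lambda t: t * 10    # SOLVE: C = T*10

# J_gamma^T: the value of I passed to SOLVE is INIT's output,
# since I is in MOD(beta).
J_gamma_T = lambda n: R_init_X(J_beta_Y(n))

# R_PROCESS^B composes SOLVE's return function with J_gamma^T.
R_process_B = lambda n: R_solve_C(J_gamma_T(n))

A = R_process_B(J_alpha_N())
print(A)  # 2 * 15 * 10 = 300, the constant MAIN will print
```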

Page 8:

• The NotKilled problem is easily solved globally:

NotKilled(b) = ∪_{c ∈ succ(b)} (THRU(b,c) ∩ NotKilled(c))

• To extend NotKilled to the procedure level, construct the reduced control-flow graph for the procedure:
  — Vertices consist of the procedure entry, the procedure exit, and the call sites.
  — Every edge (x,y) in the graph is annotated with the set THRU(x,y) of variables not killed on that edge.

Kill Analysis

Page 9:

Reduced Control Graph

ComputeReducedCFG(G)
  remove back edges from G
  for each successor s of the entry node, add (entry, s) to the worklist
  while the worklist isn't empty
    remove a ready element (b,s) from the worklist
    if s is a call site
      add (s,t) to the worklist for each successor t of s
    otherwise if s isn't the exit node
      for each successor t of s
        if THRU[b,t] is undefined then THRU[b,t] ← {}
        THRU[b,t] ← THRU[b,t] ∪ (THRU[b,s] ∩ THRU[s,t])
        add (b,t) to the worklist
end

end

Page 10:

Kill Analysis

ComputeNKILL(p)
  for each b in the reduced graph in reverse topological order
    if b is the exit node then NKILL[b] ← {all variables}
    else
      NKILL[b] ← {}
      for each successor s of b
        NKILL[b] ← NKILL[b] ∪ (NKILL[s] ∩ THRU[b,s])
      if b is a call site (p,q) then
        NKILL[b] ← NKILL[b] ∩ NKILL[q]
  NKILL[p] ← NKILL[entry node]
end
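A runnable sketch of ComputeNKILL on a tiny reduced control-flow graph (the graph shape, THRU sets, and variable names are made up for illustration):

```python
# Reduced CFG of a procedure p: entry -> call site "s" -> exit.
# Edges carry THRU sets: the variables not killed along that edge.
ALL = {"A", "B", "C"}

succ = {"entry": ["s"], "s": ["exit"], "exit": []}
thru = {("entry", "s"): {"A", "B"},      # C is killed before the call
        ("s", "exit"): {"A", "B", "C"}}  # nothing killed after it

# NKILL for the procedure q called at "s" (computed the same way).
nkill_q = {"B", "C"}                     # q kills A

nkill = {}
for b in ["exit", "s", "entry"]:         # reverse topological order
    if b == "exit":
        nkill[b] = set(ALL)
        continue
    nkill[b] = set()
    for t in succ[b]:
        nkill[b] |= nkill[t] & thru[(b, t)]
    if b == "s":                         # b is the call site (p, q)
        nkill[b] &= nkill_q

nkill_p = nkill["entry"]
print(sorted(nkill_p))  # only B survives from entry to exit
```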

Page 11:

Kill Analysis

• ComputeNKILL only works in the absence of formal parameters.

• A more complicated algorithm takes parameter binding into account using a binding graph.

• This algorithm ignores aliasing for efficiency.

Page 12:

Symbolic Analysis

• Prove facts about variables other than constancy:
  — Find a symbolic expression for a variable in terms of other variables.
  — Establish a relationship between pairs of variables at some point in the program.
  — Establish a range of values for a variable at a given point.

Page 13:

[Figure: a lattice of ranges, e.g. [1:100], [50:∞], and [-∞:60], meeting at [-∞:100] and [1:∞], and ultimately at [-∞:∞].]

Range Analysis

• Jump functions and return jump functions now return ranges.

• The meet operation is now more complicated.

• If we can bound the number of times the upper bound increases and the lower bound decreases, the finite-descending-chain property is satisfied.

Symbolic Analysis

• Range analysis and symbolic evaluation can both be solved using a lattice framework.
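One way such a range lattice can be made to satisfy the finite-descending-chain property is to widen any bound that moves up to a fixed threshold. The Python sketch below is illustrative; the threshold set and function names are assumptions:

```python
NEG, POS = float("-inf"), float("inf")

def merge(a, b):
    """Merge two ranges (lo, hi): the smallest range covering both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new, thresholds=(0, 100)):
    """If a bound moved, jump it to the next threshold (or infinity)
    rather than stepping through every value. This bounds the number
    of times each bound can change."""
    lo = old[0] if new[0] >= old[0] else max(
        [t for t in thresholds if t <= new[0]], default=NEG)
    hi = old[1] if new[1] <= old[1] else min(
        [t for t in thresholds if t >= new[1]], default=POS)
    return (lo, hi)

r = (1, 1)
for observed in [(1, 2), (1, 3), (1, 50)]:  # a variable growing in a loop
    r = widen(r, merge(r, observed))

print(r)  # the upper bound jumps straight to the threshold 100
```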

Page 14:

Array Section Analysis

• Consider the following code:

DO I = 1,N
   CALL SOURCE(A,I)
   CALL SINK(A,I)
ENDDO

We want to know if this loop carries a dependence. MOD and USE are of some use, but not much.

Let M_A(I) be the set of locations in array A modified on iteration I, and U_A(I) the set of locations used on iteration I. Then the loop carries a true dependence iff

M_A(I1) ∩ U_A(I2) ≠ ∅ for some 1 ≤ I1 < I2 ≤ N

Page 15:

Array Section Analysis

• One possible lattice representation uses sections of the form:
  – A(I,L), A(I,*), A(*,L), A(*,*)

• The depth of this lattice is on the order of the number of array subscripts, and the meet operation is efficient.

• A better representation is one in which upper and lower bounds are allowed for each subscript.
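The bounded-subscript representation can be sketched as follows (a hypothetical encoding, not the book's exact lattice): a section is one (lower, upper) pair per subscript, merged dimension-wise, with a per-dimension overlap test for dependence checking:

```python
def merge(s1, s2):
    """Merge two sections dimension-wise (the lattice meet):
    widen each subscript's bounds to cover both sections."""
    return tuple((min(a[0], b[0]), max(a[1], b[1]))
                 for a, b in zip(s1, s2))

def overlaps(s1, s2):
    """Two sections can touch the same element only if their
    bounds intersect in every dimension."""
    return all(a[0] <= b[1] and b[0] <= a[1] for a, b in zip(s1, s2))

# Sections of a 2-D array A: A(1:10, 5:5) modified, A(11:20, 1:8) used.
mod_section = ((1, 10), (5, 5))
use_section = ((11, 20), (1, 8))

print(overlaps(mod_section, use_section))  # disjoint in dimension 1
print(merge(mod_section, use_section))     # widened summary section
```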

Page 16:

Call Graph Construction

• This problem is complicated by procedure parameters:

SUBROUTINE SUB1(X,Y,P)
INTEGER X,Y
CALL P(X,Y)
END

What procedure names can be passed into P?

To avoid loss of precision, we need to record sets of procedure-name tuples when a subroutine has multiple procedure parameters.

Page 17:

Call Graph Construction

Procedure ComputeProcParms
  for each procedure p
    ProcParms[p] ← {}
  for each call site s = (p,q) passing in procedure names
    let t = <N1,N2,…,Nk> be the procedure names passed in
    worklist ← worklist ∪ {<t,q>}
  while the worklist isn't empty
    remove <t = <N1,N2,…,Nk>, p> from the worklist
    ProcParms[p] ← ProcParms[p] ∪ {t}
    let <P1,P2,…,Pk> be the parameters bound to <N1,N2,…,Nk>
    for each call site (p,q) passing in some Pi
      let u = <M1,M2,…,Ml> be the procedure names and instances of Pi passed into q
      if u is not in ProcParms[q] then
        worklist ← worklist ∪ {<u,q>}
end
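A runnable approximation of this worklist in Python (the call-site encoding and the procedure names F, G, SUB1, SUB2 are invented for illustration):

```python
# Each call site: (caller, callee, args), where args maps the callee's
# procedure formals to either a literal procedure name or a caller formal.
call_sites = [
    ("MAIN", "SUB1", {"P": "F"}),  # MAIN passes the real procedure F
    ("MAIN", "SUB1", {"P": "G"}),  # ...and elsewhere the procedure G
    ("SUB1", "SUB2", {"Q": "P"}),  # SUB1 forwards its formal P
]
formals = {"MAIN": [], "SUB1": ["P"], "SUB2": ["Q"]}
real_procs = {"F", "G"}

proc_parms = {p: set() for p in formals}
# Seed from the call sites that pass literal procedure names.
worklist = [(("F",), "SUB1"), (("G",), "SUB1")]

while worklist:
    t, p = worklist.pop()
    proc_parms[p].add(t)
    binding = dict(zip(formals[p], t))  # formal -> concrete name
    for caller, callee, args in call_sites:
        if caller != p:
            continue
        # Replace any forwarded formals with the names bound to them.
        u = tuple(binding.get(v, v) for v in args.values())
        if all(n in real_procs for n in u) and u not in proc_parms[callee]:
            worklist.append((u, callee))

print(sorted(proc_parms["SUB2"]))  # SUB2's Q may be F or G
```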

Page 18:

Inlining

• Inlining procedure calls has several advantages:
  – Eliminates procedure call overhead.
  – Allows more optimizations to take place.

• However, a study by Mary Hall showed that overuse of inlining can cause slowdowns:
  – It breaks assumptions the compiler makes about procedures.
  – The larger function bodies add register spills.
  – Changing an inlined function forces widespread recompilation.

Page 19:

Procedure Cloning

• Often specific values of function parameters result in better optimizations.

SUBROUTINE UPDATE(A,N,IS)
REAL A(N)
DO I = 1,N
   A(I*IS-IS+1) = A(I*IS-IS+1) + PI
ENDDO
END

If we knew that IS != 0 at a call, then the loop could be vectorized.

If we know that IS != 0 at specific call sites, clone a vectorized version of the procedure and use it at those sites.
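The cloning decision can be sketched in a few lines of Python (the site names, the IS values, and the `UPDATE_VEC` clone name are hypothetical):

```python
# Value of IS known at each call site of UPDATE, e.g. from
# interprocedural constant propagation (None = unknown).
call_sites = {"site1": 1, "site2": 0, "site3": None}

def pick_version(is_value):
    # Use the vectorized clone only where IS is known to be nonzero;
    # all other sites keep the general version.
    if is_value is not None and is_value != 0:
        return "UPDATE_VEC"
    return "UPDATE"

resolved = {site: pick_version(v) for site, v in call_sites.items()}
print(resolved)
```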

Page 20:

Hybrid Optimizations

• Combining optimizations across procedure boundaries can have benefit.

• One example is loop embedding, which moves a loop from the caller into the called procedure:

DO I = 1,N
   CALL FOO()
ENDDO

PROCEDURE FOO()
   …
END

becomes

CALL FOO()

PROCEDURE FOO()
   DO I = 1,N
      …
   ENDDO
END

Page 21:

Whole Program

• Want to efficiently do interprocedural analysis without constant recompilation.

• This is a software engineering problem.

• Basic idea: every time the optimizer passes over a component, calculate information telling what other procedures must be rescanned.

• Include a feedback loop that optimizes until no procedures are out of date.

Page 22:

Summary

• Solution of flow-sensitive problems:
  — Constant propagation
  — Kill analysis

• Solutions to related problems such as symbolic analysis and array section analysis.

• Other optimizations and ways to integrate these into whole-program compilation.