TRANSCRIPT
Terminology, Principles, and Concerns, III
With examples from DOM (Ch 9) and DVNT (Ch 10)
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.
Comp 512, Spring 2011
Last Lecture
• Extended Basic Blocks
• Superlocal value numbering
  > Treat each path as a single basic block
  > Use a scoped hash table & SSA names to make it efficient
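The scoped hash table can be sketched with Python's ChainMap; the names (ScopedTable, vn0) and the tiny driver are illustrative choices, not the lecture's implementation. It shows why scoping makes superlocal VN cheap: backing out of a block discards that block's entries in one step.

```python
from collections import ChainMap

class ScopedTable:
    """Scoped hash table: one scope per block on the current EBB path.
    Popping a scope discards that block's entries in one operation."""
    def __init__(self):
        self.maps = ChainMap({})
    def push(self):
        self.maps = self.maps.new_child()
    def pop(self):
        self.maps = self.maps.parents
    def lookup(self, key):
        return self.maps.get(key)
    def insert(self, key, vn):
        self.maps[key] = vn

# Walk the path A -> C of the running example:
t = ScopedTable()
t.push()                                   # enter A
t.insert(('+', 'a', 'b'), 'vn0')           # m0 <- a + b defines vn0
t.push()                                   # enter C; A's facts still visible
assert t.lookup(('+', 'a', 'b')) == 'vn0'  # so q0 <- a + b is redundant
t.insert(('+', 'c', 'd'), 'vn1')           # r1 <- c + d, local to C's scope
t.pop()                                    # back out of C to try A -> B
assert t.lookup(('+', 'c', 'd')) is None   # C's entries are gone
assert t.lookup(('+', 'a', 'b')) == 'vn0'  # A's entries survive
```

SSA names matter here: because each name is defined once, entries never need to be invalidated, only scoped.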
COMP 512, Rice University
This Lecture
• Dominator Trees
  > Computing dominator information
  > Global data-flow analysis
• Dominator-based Value Numbering
  > Enhance the Superlocal Value Numbering algorithm so that it can cover more blocks
• Optimizing a loop nest
  > Finding loop nests
  > Loop unrolling as an initial transformation
This is in SSA Form
Superlocal Value Numbering
Block A: m0 ← a + b
         n0 ← a + b
Block B: p0 ← c + d
         r0 ← c + d
Block C: q0 ← a + b
         r1 ← c + d
Block D: e0 ← b + 18
         s0 ← a + b
         u0 ← e + f
Block E: e1 ← a + 17
         t0 ← c + d
         u1 ← e + f
Block F: e3 ← Φ(e0,e1)
         u2 ← Φ(u0,u1)
         v0 ← a + b
         w0 ← c + d
         x0 ← e + f
Block G: r2 ← Φ(r0,r1)
         y0 ← a + b
         z0 ← c + d

(CFG edges: A→B, A→C; C→D, C→E; D→F, E→F; B→G, F→G)
With all the bells & whistles
• Finds more redundancy
• Pays little additional cost
• Still does nothing for F & G
Superlocal techniques
• Some local methods extend cleanly to superlocal scopes
• Value numbering does not back up
• If C’s entries went directly into A’s table, backing out of C to follow another path would be a problem; the scoped table avoids it
What About Larger Scopes?
We have not helped with F or G
• Multiple predecessors
• Must decide what facts hold in F and in G
  > For G, combine B & F? Merging state is expensive
  > Fall back on what’s known
(The example CFG and code from the previous slide, repeated.)
Dominators
Definitions
• x dominates y if and only if every path from the entry of the control-flow graph to the node for y includes x
• By definition, x dominates x
• We associate a DOM set with each node
• |DOM(x )| ≥ 1
Immediate dominators
• For any node x, there must be a y in DOM(x ) closest to x
• We call this y the immediate dominator of x
• As a matter of notation, we write this as IDOM(x )
Dominators
Dominators have many uses in analysis & transformation
• Finding loops
• Building SSA form
• Making code motion decisions
We’ll look at how to compute dominators later
Dominator tree for the example:

        A
      / | \
     B  C  G
       /|\
      D E F

Dominator sets:
  DOM(A) = {A}        DOM(B) = {A,B}      DOM(C) = {A,C}
  DOM(D) = {A,C,D}    DOM(E) = {A,C,E}    DOM(F) = {A,C,F}
  DOM(G) = {A,G}
Back to the discussion of value numbering over larger scopes ...
(The example CFG and code, repeated.)
Original idea: R.T. Prosser. “Applications of Boolean matrices to the analysis of flow diagrams,” Proceedings of the Eastern Joint Computer Conference, Spartan Books, New York, pages 133-138, 1959.
What About Larger Scopes?
We have not helped with F or G
• Multiple predecessors
• Must decide what facts hold in F and in G
  > For G, combine B & F? Merging state is expensive
  > Fall back on what’s known
• Can use table from IDOM(x) to start x
  > Use C for F and A for G
  > Imposes a dominator-based application order
Leads to Dominator VN Technique (DVNT)
(The example CFG and code, repeated.)
Dominator Value Numbering
The DVNT Algorithm
• Use superlocal algorithm on extended basic blocks
  > Retain use of scoped hash tables & SSA name space
• Start each node with table from its IDOM
  > DVNT generalizes the superlocal algorithm
• No values flow along back edges (i.e., around loops)
• Constant folding, algebraic identities as before

Larger scope leads to (potentially) better results
  > LVN + SVN + a good start for EBBs missed by SVN
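The DVNT core can be sketched in a few lines of Python: each block value-numbers against a ChainMap scope inherited from its immediate dominator, then recurses on its dominator-tree children. This is an illustrative sketch, not the lecture's code; φ-functions, constant folding, and algebraic identities are omitted, and the driver uses only the A→C→F chain from the slides' example.

```python
from collections import ChainMap

def dvnt(block, code, dom_children, table, vn, redundant):
    """Value-number `block` starting from its IDOM's table, then
    recurse down the dominator tree (phi handling omitted)."""
    scope = table.new_child()                # start from IDOM's table
    for dst, op, a, b in code[block]:
        key = (op, vn.get(a, a), vn.get(b, b))
        if key in scope:                     # redundant: reuse prior value
            vn[dst] = scope[key]
            redundant.append(dst)
        else:
            vn[dst] = dst                    # first computation names the value
            scope[key] = dst
    for child in dom_children.get(block, []):
        dvnt(child, code, dom_children, scope, vn, redundant)

# Fragment of the running example: the dominator chain A -> C -> F.
code = {
    'A': [('m0', '+', 'a', 'b'), ('n0', '+', 'a', 'b')],
    'C': [('q0', '+', 'a', 'b'), ('r1', '+', 'c', 'd')],
    'F': [('v0', '+', 'a', 'b'), ('w0', '+', 'c', 'd'), ('x0', '+', 'e', 'f')],
}
dom_children = {'A': ['C'], 'C': ['F']}
redundant = []
dvnt('A', code, dom_children, ChainMap({}), {}, redundant)
assert redundant == ['n0', 'q0', 'v0', 'w0']
```

Note how F, which superlocal VN could not help, now finds v0 and w0 redundant because it starts from C's table.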
Dominator Value Numbering
[The example code again, value numbered with DVNT: F starts from C’s table and G from A’s, so some redundancies in F and G are now found.]
DVNT advantages
• Finds more redundancy
• Little additional cost
• Retains online character

DVNT shortcomings
• Misses some opportunities
• No loop-carried CSEs or constants
Computing Dominators
Critical first step in SSA construction and in DVNT
• A node n dominates m iff n is on every path from n0 to m
  > Every node dominates itself
  > n’s immediate dominator is its closest dominator, IDOM(n)†

DOM(n0) = { n0 }
DOM(n) = { n } ∪ ( ∩p ∈ preds(n) DOM(p) )
Computing DOM
• These simultaneous set equations define a simple problem in data-flow analysis
• Equations have a unique fixed point solution
• An iterative fixed-point algorithm will solve them quickly
† IDOM(n) ≠ n, unless n is n0, by convention.
Initially, DOM(n) = N, the set of all nodes, for each n ≠ n0.
Round-robin Iterative Algorithm
Termination
• Makes sweeps over the nodes
• Halts when some sweep produces no change
DOM(b0) ← { b0 }
for i ← 1 to N
    DOM(bi) ← { all nodes in the graph }

change ← true
while (change)
    change ← false
    for i ← 1 to N
        TEMP ← { bi } ∪ ( ∩x ∈ preds(bi) DOM(x) )
        if DOM(bi) ≠ TEMP then
            change ← true
            DOM(bi) ← TEMP
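The equations and the round-robin algorithm translate directly into a short Python sketch. The function name is mine, and the CFG edges are read off the slides' A–G example (A→B, A→C; C→D, C→E; D→F, E→F; B→G, F→G):

```python
def compute_dom(preds, entry):
    """Round-robin iterative solver for the DOM equations."""
    nodes = list(preds)
    dom = {n: set(nodes) for n in nodes}   # initially DOM(n) = N ...
    dom[entry] = {entry}                   # ... except DOM(n0) = { n0 }
    change = True
    while change:
        change = False
        for n in nodes:
            if n == entry:
                continue
            # TEMP <- { n } U (intersection of DOM(p) over p in preds(n))
            temp = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if temp != dom[n]:
                dom[n] = temp
                change = True
    return dom

preds = {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['C'],
         'E': ['C'], 'F': ['D', 'E'], 'G': ['B', 'F']}
dom = compute_dom(preds, 'A')

def idom(n):
    # IDOM(n) is the closest strict dominator: the one whose own DOM set
    # is largest, since the dominators of n are totally ordered.
    return max(dom[n] - {n}, key=lambda d: len(dom[d]))

assert dom['F'] == {'A', 'C', 'F'}         # e.g. F is dominated by A and C
assert idom('F') == 'C' and idom('G') == 'A'
```

On this example the solver converges in one productive sweep plus one confirming sweep, which is typical when nodes are visited in an order close to reverse postorder.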
Example
[Figure: a flow graph with entry B0 and nodes B1 through B7, alongside tables showing the progress and results of the iterative solution for DOM.]
Example
[Figure: the dominance tree for the same flow graph (entry B0, nodes B1 through B7), alongside the progress and results of the iterative solution for DOM.]
There are asymptotically faster algorithms.
With the right data structures, the iterative algorithm can be made extremely fast.
See Cooper, Harvey, & Kennedy (on the web site), or the algorithm in Chapter 9 of EaC.
Aside on Data-Flow Analysis
The iterative DOM calculation is an example of data-flow analysis
• Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
• Data-flow analysis almost always operates on a graph
  > Problems are trivial in a basic block
  > Global problems use the control-flow graph (or a derivative)
  > Interprocedural problems use the call graph (or a derivative)
• Data-flow problems are formulated as simultaneous equations
  > Sets attached to nodes and edges
  > One solution technique is the iterative algorithm
• Desired result is usually the meet-over-all-paths (MOP) solution
  > “What is true on every path from the entry node?”
  > “Can this event happen on any path from the entry?” (related to safety)
Aside on Data-Flow Analysis
Why did the iterative algorithm work?
Termination
• The DOM sets are initialized to the (finite) set of nodes
• The DOM sets shrink monotonically
• The algorithm reaches a fixed point where they stop changing
Correctness
• We can prove that the fixed point solution is also the MOP
• That proof is beyond today’s lecture, but we’ll revisit it
Efficiency
• The round-robin algorithm is not particularly efficient
• Order in which we visit nodes is important for efficient solutions
Regional Optimization: Improving Loops
Compilers have always focused on loops
• Execution counts are higher inside loops than outside
• Repeated, related operations
• Much of the real work takes place in loops (linear algebra)

Several effects to attack in a loop or loop nest
• Overhead
  > Decrease the control-structure cost per iteration
• Locality
  > Spatial locality ⇒ use of co-resident data
  > Temporal locality ⇒ reuse of the same data item
• Parallelism
  > Move loops with independent iterations to the outer position
  > Inner positions for vector hardware & SSE
Regional Optimization: Improving Loops
Loop unrolling (the oldest trick in the book)
• To reduce overhead, replicate the loop body
Sources of improvement
• Less overhead per useful operation
• Longer basic blocks for local optimization
Doesn’t mess up spatial locality on either y or m (column-major order)
Regional Optimization: Improving Loops
With loop nest, may unroll inner loop
      do 60 j = 1, n2
        do 50 i = 1, n1
          y(i) = y(i) + x(j) * m(i,j)
   50   continue
   60 continue
Critical inner loop from dmxpy in Linpack
Doesn’t mess up reuse on x(j)
      do 60 j = 1, n2
        nextra = mod(n1,4)
        if (nextra .ge. 1) then
          do 49 i = 1, nextra, 1
            y(i) = y(i) + x(j) * m(i,j)
   49     continue
        end if
        do 50 i = nextra+1, n1, 4
          y(i)   = y(i)   + x(j) * m(i,j)
          y(i+1) = y(i+1) + x(j) * m(i+1,j)
          y(i+2) = y(i+2) + x(j) * m(i+2,j)
          y(i+3) = y(i+3) + x(j) * m(i+3,j)
   50   continue
   60 continue
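As a sanity check, here is an assumed Python translation of both versions (indices shifted to 0-based); with the remainder loop absorbing the mod(n1,4) leftover iterations, the unrolled loop computes the same y:

```python
# Straightforward dmxpy loop nest.
def dmxpy(n1, n2, y, x, m):
    for j in range(n2):
        for i in range(n1):
            y[i] += x[j] * m[i][j]

# Inner loop unrolled by 4, with a remainder loop for n1 % 4 iterations.
def dmxpy_unrolled(n1, n2, y, x, m):
    for j in range(n2):
        nextra = n1 % 4
        for i in range(nextra):          # remainder loop
            y[i] += x[j] * m[i][j]
        for i in range(nextra, n1, 4):   # body unrolled by 4
            y[i]     += x[j] * m[i][j]
            y[i + 1] += x[j] * m[i + 1][j]
            y[i + 2] += x[j] * m[i + 2][j]
            y[i + 3] += x[j] * m[i + 3][j]

n1, n2 = 7, 5                            # n1 % 4 != 0 exercises the prelude
x = [j + 1 for j in range(n2)]
m = [[i * n2 + j for j in range(n2)] for i in range(n1)]
y1, y2 = [0] * n1, [0] * n1
dmxpy(n1, n2, y1, x, m)
dmxpy_unrolled(n1, n2, y2, x, m)
assert y1 == y2
```

Each element of y sees the same additions in the same order, so the results match exactly.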
Regional Optimization: Improving Loops
With loop nest, may unroll outer loop
• Trick is to unroll the outer loop and fuse the resulting inner loops
  > Loop fusion combines the bodies of two similar loops
      do 60 j = 1, n2
        do 50 i = 1, n1
          y(i) = y(i) + x(j) * m(i,j)
   50   continue
   60 continue

Critical inner loop from dmxpy in Linpack
      do 60 j = 1, n2
        nextra = mod(n1,4)
        if (nextra .ge. 1) then
          do 49 i = 1, nextra, 1
            y(i) = y(i) + x(j) * m(i,j)
   49     continue
        end if
        do 50 i = nextra+1, n1, 4
          y(i) = y(i) + x(j) * m(i,j)
          y(i) = y(i) + x(j+1) * m(i,j+1)
          y(i) = y(i) + x(j+2) * m(i,j+2)
          y(i) = y(i) + x(j+3) * m(i,j+3)
   50   continue
   60 continue

This is clearly wrong: the j loop still steps by 1 while each iteration consumes x(j) through x(j+3), and the remainder logic still handles n1 rather than n2.
Regional Optimization: Improving Loops
With loop nest, may unroll outer loop
• Trick is to unroll outer loop and fuse resulting inner loops
      do 60 j = 1, n2
        do 50 i = 1, n1
          y(i) = y(i) + x(j) * m(i,j)
   50   continue
   60 continue

Critical inner loop from dmxpy in Linpack
      nextra = mod(n2,4)
      if (nextra .ge. 1) then
        do 49 j = 1, nextra, 1
          do 48 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
   48     continue
   49   continue
      end if
      do 60 j = nextra+1, n2, 4
        do 50 i = 1, n1
          y(i) = y(i) + x(j)   * m(i,j)   + x(j+1) * m(i,j+1)
     &                + x(j+2) * m(i,j+2) + x(j+3) * m(i,j+3)
   50   continue
   60 continue
Save on loads & stores of y(i)?
Spatial reuse in x and m
The author of Linpack, after much testing, chose outer loop unrolling.
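The outer-unroll-and-fuse version can be checked the same way in an assumed Python translation: the remainder is now taken on n2 (the j dimension) and the fused body handles four columns of m per pass over y. Integer data keeps the comparison exact despite the reassociated sum:

```python
# Straightforward dmxpy loop nest.
def dmxpy(n1, n2, y, x, m):
    for j in range(n2):
        for i in range(n1):
            y[i] += x[j] * m[i][j]

# Outer loop unrolled by 4, inner loops fused; remainder on n2 % 4.
def dmxpy_outer(n1, n2, y, x, m):
    nextra = n2 % 4
    for j in range(nextra):              # leftover j iterations
        for i in range(n1):
            y[i] += x[j] * m[i][j]
    for j in range(nextra, n2, 4):       # four columns per pass over y
        for i in range(n1):
            y[i] += (x[j] * m[i][j] + x[j + 1] * m[i][j + 1]
                     + x[j + 2] * m[i][j + 2] + x[j + 3] * m[i][j + 3])

n1, n2 = 5, 6                            # n2 % 4 != 0 exercises the prelude
x = [j + 1 for j in range(n2)]
m = [[i * n2 + j for j in range(n2)] for i in range(n1)]
y1, y2 = [0] * n1, [0] * n1
dmxpy(n1, n2, y1, x, m)
dmxpy_outer(n1, n2, y2, x, m)
assert y1 == y2                          # integer data, so equality is exact
```

This form loads and stores each y(i) once per four columns, which is the saving the slide asks about.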
Regional Optimization: Improving Loops
Other effects of loop unrolling
• Increases the number of independent operations inside the loop
  > May be good for scheduling multiple functional units
• Moves consecutive accesses into the same iteration
  > Scheduler may move them together (locality in the big loop)
• May make cross-iteration redundancies obvious
  > Exposes the address expressions in the example to LVN
• May increase demand for registers
  > Spills can overcome any benefits
• Can unroll to eliminate copies at the end of a loop
  > An often-rediscovered result from Ken Kennedy’s thesis
• Can change other optimizations
  > Weights in spill code (Das Gupta’s example)
Regional Optimization: Improving Loops
Many other loop transformations appear in the literature
• We will have a lecture devoted to them later in the course
• See also COMP 515 and the Allen-Kennedy book
Next class
• Examples of Global Optimization