
Terminology, Principles, and Concerns, III

With examples from DOM (Ch 9) and DVNT (Ch 10)

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.

Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Comp 512, Spring 2011


Last Lecture

• Extended Basic Blocks

• Superlocal value numbering
  > Treat each path as a single basic block
  > Use a scoped hash table & SSA names to make it efficient

COMP 512, Rice University


This Lecture

• Dominator Trees
  > Computing dominator information
  > Global data-flow analysis

• Dominator-based Value Numbering
  > Enhance the Superlocal Value Numbering algorithm so that it can cover more blocks

• Optimizing a loop nest
  > Finding loop nests
  > Loop unrolling as an initial transformation


Superlocal Value Numbering

Example control-flow graph (this is in SSA form):

A: m0 ← a + b; n0 ← a + b
B: p0 ← c + d; r0 ← c + d
C: q0 ← a + b; r1 ← c + d
D: e0 ← b + 18; s0 ← a + b; u0 ← e + f
E: e1 ← a + 17; t0 ← c + d; u1 ← e + f
F: e3 ← φ(e0,e1); u2 ← φ(u0,u1); v0 ← a + b; w0 ← c + d; x0 ← e + f
G: r2 ← φ(r0,r1); y0 ← a + b; z0 ← c + d

With all the bells & whistles

• Find more redundancy

• Pay little additional cost

• Still does nothing for F & G

Superlocal techniques

• Some local methods extend cleanly to superlocal scopes

• VN does not back up

• If C adds to A, it’s a problem


What About Larger Scopes?

We have not helped with F or G

• Multiple predecessors

• Must decide what facts hold in F and in G
  > For G, combine B & F?
  > Merging state is expensive
  > Fall back on what’s known

[Figure: the example control-flow graph, blocks A through G, repeated from the Superlocal Value Numbering slide]


Dominators

Definitions

x dominates y if and only if every path from the entry of the control-flow graph to the node for y includes x

• By definition, x dominates x

• We associate a DOM set with each node

• |DOM(x)| ≥ 1

Immediate dominators

• For any node x other than the entry node, there must be a y ≠ x in DOM(x) closest to x

• We call this y the immediate dominator of x

• As a matter of notation, we write this as IDOM(x)
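The definition can be checked directly on a small graph. The sketch below is a brute-force illustration, not the algorithm this lecture presents later: it treats x as dominating y when y becomes unreachable from the entry once x is removed. The edges of the seven-block example are an assumption inferred from the φ-functions and the dominator tree, and all function names are illustrative.

```python
def reachable(succs, entry, removed=None):
    """Return the nodes reachable from entry without ever entering `removed`."""
    if entry == removed:
        return set()
    seen, stack = {entry}, [entry]
    while stack:
        node = stack.pop()
        for s in succs.get(node, []):
            if s != removed and s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def dominators_by_definition(succs, entry):
    """DOM(y) holds every x that lies on all paths from entry to y."""
    nodes = reachable(succs, entry)
    dom = {}
    for y in nodes:
        dom[y] = {y}  # by definition, y dominates y
        for x in nodes:
            if x != y and y not in reachable(succs, entry, removed=x):
                dom[y].add(x)
    return dom


def immediate_dominator(dom, y):
    """IDOM(y): the strict dominator closest to y, or None for the entry."""
    strict = dom[y] - {y}
    # The closest strict dominator is dominated by all the others,
    # so it is the one with the largest DOM set.
    return max(strict, key=lambda x: len(dom[x])) if strict else None


# Assumed edges for the lecture's example CFG (not stated in the slide text).
CFG = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
       "D": ["F"], "E": ["F"], "F": ["G"], "G": []}

DOM = dominators_by_definition(CFG, "A")
IDOM = {n: immediate_dominator(DOM, n) for n in DOM}
```

Under these assumed edges, F is dominated by A and C but not by D or E, since either branch can be bypassed through the other.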


Dominators

Dominators have many uses in analysis & transformation

• Finding loops

• Building SSA form

• Making code motion decisions

We’ll look at how to compute dominators later

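Dominator tree: A is the root; B, C, and G are children of A; D, E, and F are children of C.

Dominator sets (read off the tree): DOM(A) = {A}, DOM(B) = {A,B}, DOM(C) = {A,C}, DOM(D) = {A,C,D}, DOM(E) = {A,C,E}, DOM(F) = {A,C,F}, DOM(G) = {A,G}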

Back to the discussion of value numbering over larger scopes ...

[Figure: the example control-flow graph, blocks A through G, repeated from the Superlocal Value Numbering slide]

Original idea: R.T. Prosser. “Applications of Boolean matrices to the analysis of flow diagrams,” Proceedings of the Eastern Joint Computer Conference, Spartan Books, New York, pages 133-138, 1959.


What About Larger Scopes?

We have not helped with F or G

• Multiple predecessors

• Must decide what facts hold in F and in G
  > For G, combine B & F?
  > Merging state is expensive
  > Fall back on what’s known

• Can use table from IDOM(x) to start x
  > Use C for F and A for G
  > Imposes a Dom-based application order

Leads to Dominator VN Technique (DVNT)

[Figure: the example control-flow graph, blocks A through G, repeated from the Superlocal Value Numbering slide]


Dominator Value Numbering

The DVNT Algorithm

• Use superlocal algorithm on extended basic blocks
  > Retain use of scoped hash tables & SSA name space

• Start each node with table from its IDOM
  > DVNT generalizes the superlocal algorithm

• No values flow along back edges (i.e., around loops)

• Constant folding, algebraic identities as before

Larger scope leads to (potentially) better results
  > LVN + SVN + good start for EBBs missed by SVN
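The recursive structure of DVNT can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the full algorithm from the text: φ-functions just receive fresh value numbers, constant folding and algebraic identities are omitted, and operands are sorted because every operation here is a commutative add. The block contents follow the lecture's example, except that uses of e are written with their SSA names (e0, e1, e3), which is an assumption. `ChainMap` stands in for the scoped hash table.

```python
from collections import ChainMap

# Each operation is (target, op, arg1, arg2); op "phi" marks a φ-function.
BLOCKS = {
    "A": [("m0", "+", "a", "b"), ("n0", "+", "a", "b")],
    "B": [("p0", "+", "c", "d"), ("r0", "+", "c", "d")],
    "C": [("q0", "+", "a", "b"), ("r1", "+", "c", "d")],
    "D": [("e0", "+", "b", "18"), ("s0", "+", "a", "b"), ("u0", "+", "e0", "f")],
    "E": [("e1", "+", "a", "17"), ("t0", "+", "c", "d"), ("u1", "+", "e1", "f")],
    "F": [("e3", "phi", "e0", "e1"), ("u2", "phi", "u0", "u1"),
          ("v0", "+", "a", "b"), ("w0", "+", "c", "d"), ("x0", "+", "e3", "f")],
    "G": [("r2", "phi", "r0", "r1"), ("y0", "+", "a", "b"), ("z0", "+", "c", "d")],
}
# Children of each block in the dominator tree.
DOM_TREE = {"A": ["B", "C", "G"], "B": [], "C": ["D", "E", "F"],
            "D": [], "E": [], "F": [], "G": []}


def dvnt(block, table, vn, redundant):
    """Value-number `block`, starting from the table of its IDOM."""
    scope = table.new_child()  # open a new scope on top of the IDOM's table
    for target, op, x, y in BLOCKS[block]:
        if op == "phi":
            vn[target] = target  # simplification: no analysis of φ-functions
            continue
        # Hash on the value numbers of the operands (sorted: "+" commutes).
        key = (op,) + tuple(sorted((vn.get(x, x), vn.get(y, y))))
        if key in scope:
            vn[target] = scope[key]
            redundant[target] = scope[key]
        else:
            vn[target] = target
            scope[key] = target
    for child in DOM_TREE[block]:
        dvnt(child, scope, vn, redundant)
    # Returning drops `scope`; the parent's table is left unchanged.


value_number, redundant = {}, {}
dvnt("A", ChainMap(), value_number, redundant)
```

In this sketch F starts from C's table, so v0 and w0 are found redundant, and G starts from A's table, so y0 is found. z0 in G is missed because c + d is never computed in A, which is one of the opportunities DVNT leaves behind.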


Dominator Value Numbering

[Figure: the example control-flow graph, blocks A through G]

DVNT advantages

• Find more redundancy

• Little additional cost

• Retains online character

DVNT shortcomings

• Misses some opportunities

• No loop-carried CSEs or constants


Computing Dominators

Critical first step in SSA construction and in DVNT

• A node n dominates m iff n is on every path from n0 to m
  > Every node dominates itself
  > n’s immediate dominator is its closest dominator, IDOM(n)†

$$\mathrm{DOM}(n_0) = \{\, n_0 \,\}$$

$$\mathrm{DOM}(n) = \{\, n \,\} \cup \Bigl( \bigcap_{p \in \mathit{preds}(n)} \mathrm{DOM}(p) \Bigr)$$

Computing DOM

• These simultaneous set equations define a simple problem in data-flow analysis

• Equations have a unique fixed point solution

• An iterative fixed-point algorithm will solve them quickly

† IDOM(n) ≠ n, unless n is n0, by convention.

Initially, DOM(n) = N for all n ≠ n0, where N is the set of all nodes in the graph


Round-robin Iterative Algorithm

Termination

• Makes sweeps over the nodes

• Halts when some sweep produces no change

DOM(b0) ← { b0 }

for i ← 1 to N
    DOM(bi) ← { all nodes in graph }

change ← true

while (change)
    change ← false
    for i ← 1 to N
        TEMP ← { bi } ∪ ( ∩ x ∈ pred(bi) DOM(x) )
        if DOM(bi) ≠ TEMP then
            change ← true
            DOM(bi) ← TEMP
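The round-robin algorithm translates almost line for line into executable code. The following is a minimal sketch using Python sets; the predecessor lists describe the lecture's seven-block example, with edges that are an assumption inferred from its dominator tree and φ-functions. It also assumes every node other than the entry has at least one predecessor.

```python
def iterative_dom(preds, entry):
    """Solve the DOM equations by sweeping the nodes until nothing changes."""
    nodes = list(preds)
    # Initially DOM(n) is the set of all nodes, except at the entry.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # DOM(n) = {n} ∪ (intersection of DOM(p) over predecessors p of n)
            temp = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if temp != dom[n]:
                dom[n] = temp
                changed = True
    return dom


# Assumed predecessor lists for the lecture's example CFG.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"],
         "E": ["C"], "F": ["D", "E"], "G": ["B", "F"]}

DOM = iterative_dom(PREDS, "A")
```

Because the sets start at their largest possible value and intersection can only remove elements, each sweep either shrinks some set or leaves everything unchanged, at which point the loop halts.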


Example

[Figure: flow graph with nodes B0 through B7]

[Table: progress of iterative solution for DOM]

[Table: results of iterative solution for DOM]

[Figure: dominance tree for the same flow graph]

There are asymptotically faster algorithms.

With the right data structures, the iterative algorithm can be made extremely fast.

See Cooper, Harvey, & Kennedy, on the web site, or algorithm in Chapter 9 of EaC.
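For orientation, here is a sketch in the spirit of the Cooper, Harvey, & Kennedy approach: it stores only an IDOM entry per node, visits nodes in reverse postorder, and intersects two dominator-tree paths by walking them upward. It is written from memory of the general idea and may differ in detail from the published algorithm; the example graph edges are an assumption, and the names are illustrative.

```python
def reverse_postorder(succs, entry):
    """Depth-first search from entry; return the nodes in reverse postorder."""
    seen, order = set(), []

    def visit(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                visit(s)
        order.append(n)

    visit(entry)
    return order[::-1]


def idom_by_tree_intersection(succs, entry):
    rpo = reverse_postorder(succs, entry)
    num = {n: i for i, n in enumerate(rpo)}  # smaller number = earlier in RPO
    preds = {n: [] for n in rpo}
    for n in rpo:
        for s in succs[n]:
            preds[s].append(n)
    idom = {entry: entry}  # convention: the entry is its own IDOM

    def intersect(a, b):
        # Walk both nodes up the partial dominator tree until they meet.
        while a != b:
            while num[a] > num[b]:
                a = idom[a]
            while num[b] > num[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            # Only predecessors that already have an IDOM can contribute.
            done = [p for p in preds[n] if p in idom]
            new = done[0]
            for p in done[1:]:
                new = intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom


# Assumed edges for the lecture's example CFG.
CFG = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
       "D": ["F"], "E": ["F"], "F": ["G"], "G": []}
IDOM = idom_by_tree_intersection(CFG, "A")
```

The appeal of this representation is that an array of IDOM entries replaces the per-node sets, so each intersection touches only the nodes on two tree paths.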


Aside on Data-Flow Analysis

The iterative DOM calculation is an example of data-flow analysis

• Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values

• Data-flow analysis almost always operates on a graph
  > Problems are trivial in a basic block
  > Global problems use the control-flow graph (or derivative)
  > Interprocedural problems use call graph (or derivative)

• Data-flow problems are formulated as simultaneous equations
  > Sets attached to nodes and edges
  > One solution technique is the iterative algorithm

• Desired result is usually meet over all paths (MOP) solution
  > “What is true on every path from the entry node?” (related to safety)
  > “Can this event happen on any path from the entry?”


Aside on Data-Flow Analysis

Why did the iterative algorithm work?

Termination

• The DOM sets are initialized to the (finite) set of nodes

• The DOM sets shrink monotonically

• The algorithm reaches a fixed point where they stop changing

Correctness

• We can prove that the fixed point solution is also the MOP

• That proof is beyond today’s lecture, but we’ll revisit it

Efficiency

• The round-robin algorithm is not particularly efficient

• Order in which we visit nodes is important for efficient solutions
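The effect of visit order is easy to observe by counting sweeps. The sketch below runs the same round-robin solver twice on the lecture's seven-block example (edges assumed, as they are not given in the slide text): once in a reverse postorder, where each node is visited after its predecessors, and once in the opposite order. The function name and the sweep counter are illustrative additions.

```python
def dom_with_sweep_count(preds, entry, order):
    """Round-robin DOM solver that also reports how many sweeps it made."""
    dom = {n: set(preds) for n in preds}  # every set starts as all nodes
    dom[entry] = {entry}
    sweeps, changed = 0, True
    while changed:
        changed = False
        sweeps += 1
        for n in order:
            if n == entry:
                continue
            temp = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if temp != dom[n]:
                dom[n], changed = temp, True
    return dom, sweeps


# Assumed predecessor lists for the lecture's example CFG.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"],
         "E": ["C"], "F": ["D", "E"], "G": ["B", "F"]}

rpo = list("ABCDEFG")        # one valid reverse postorder of this graph
postorder = rpo[::-1]        # successors visited before predecessors

dom_rpo, sweeps_rpo = dom_with_sweep_count(PREDS, "A", rpo)
dom_po, sweeps_po = dom_with_sweep_count(PREDS, "A", postorder)
```

Both orders reach the same fixed point. On this acyclic graph the reverse postorder needs one sweep to compute the answer and one to confirm it, while the opposite order needs five sweeps because information moves only one edge per sweep.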


Regional Optimization: Improving Loops

Compilers have always focused on loops

• Higher execution counts inside loop than outside loops

• Repeated, related operations

• Much of the real work takes place in loops (linear algebra)

Several effects to attack in a loop or loop nest

• Overhead
  > Decrease control-structure cost per iteration

• Locality
  > Spatial locality ⇒ use of co-resident data
  > Temporal locality ⇒ reuse of same data item

• Parallelism
  > Move loops with independent iterations to outer position
  > Inner positions for vector hardware & SSE


Regional Optimization: Improving Loops

Loop unrolling (the oldest trick in the book)

• To reduce overhead, replicate the loop body

Sources of improvement

• Less overhead per useful operation

• Longer basic blocks for local optimization
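The shape of the transformation can be shown on a one-dimensional loop. This is an illustrative sketch only: the loop, the unroll factor of 4, and the function names are assumptions chosen for the example, and in an interpreted language the point is the structure of the code rather than any speedup.

```python
def axpy(y, a, x):
    """Original loop: one update and one loop test per element."""
    for i in range(len(y)):
        y[i] += a * x[i]


def axpy_unrolled_by_4(y, a, x):
    """Same computation with the loop body replicated four times."""
    n = len(y)
    extra = n % 4
    # Cleanup loop: handle the first n mod 4 elements one at a time.
    for i in range(extra):
        y[i] += a * x[i]
    # Main loop: four updates per loop test and index increment.
    for i in range(extra, n, 4):
        y[i] += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]


x = list(range(10))
y_ref = [1] * 10
y_unr = [1] * 10
axpy(y_ref, 3, x)
axpy_unrolled_by_4(y_unr, 3, x)
```

The main loop executes a quarter as many tests and increments for the same useful work, and its body is a single longer basic block that local optimization and scheduling can work on.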


Regional Optimization: Improving Loops

With loop nest, may unroll inner loop

Critical inner loop from dmxpy in Linpack:

      do 60 j = 1, n2
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
   50    continue
   60 continue

Inner loop unrolled by 4:

      do 60 j = 1, n2
         nextra = mod(n1,4)
         if (nextra .ge. 1) then
            do 49 i = 1, nextra, 1
               y(i) = y(i) + x(j) * m(i,j)
   49       continue
         endif
         do 50 i = nextra+1, n1, 4
            y(i)   = y(i)   + x(j) * m(i,j)
            y(i+1) = y(i+1) + x(j) * m(i+1,j)
            y(i+2) = y(i+2) + x(j) * m(i+2,j)
            y(i+3) = y(i+3) + x(j) * m(i+3,j)
   50    continue
   60 continue

• Doesn’t mess up spatial locality on either y or m (column-major order)

• Doesn’t mess up reuse on x(j)


Regional Optimization: Improving Loops

With loop nest, may unroll outer loop

• Trick is to unroll outer loop and fuse resulting inner loops
  > Loop fusion combines the bodies of two similar loops

Critical inner loop from dmxpy in Linpack:

      do 60 j = 1, n2
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
   50    continue
   60 continue

Outer loop unrolled by 4, as shown on the slide:

      do 60 j = 1, n2
         nextra = mod(n1,4)
         if (nextra .ge. 1) then
            do 49 i = 1, nextra, 1
               y(i) = y(i) + x(j) * m(i,j)
   49       continue
         endif
         do 50 i = nextra+1, n1, 4
            y(i) = y(i) + x(j) * m(i,j)
            y(i) = y(i) + x(j+1) * m(i,j+1)
            y(i) = y(i) + x(j+2) * m(i,j+2)
            y(i) = y(i) + x(j+3) * m(i,j+3)
   50    continue
   60 continue

This is clearly wrong: j still advances by 1 while i advances by 4, so the body skips three of every four elements of y and applies columns j+1 through j+3 again on later iterations.


Regional Optimization: Improving Loops

With loop nest, may unroll outer loop

• Trick is to unroll outer loop and fuse resulting inner loops

Critical inner loop from dmxpy in Linpack:

      do 60 j = 1, n2
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
   50    continue
   60 continue

Outer loop unrolled by 4, with the resulting inner loops fused:

      nextra = mod(n2,4)
      if (nextra .ge. 1) then
         do 49 j = 1, nextra, 1
            do 48 i = 1, n1
               y(i) = y(i) + x(j) * m(i,j)
   48       continue
   49    continue
      endif
      do 60 j = nextra+1, n2, 4
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j) + x(j+1) * m(i,j+1)
     $           + x(j+2) * m(i,j+2) + x(j+3) * m(i,j+3)
   50    continue
   60 continue

• Save on loads & stores of y(i)?

• Spatial reuse in x and m

The author of Linpack, after much testing, chose outer loop unrolling.
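Outer-loop unrolling followed by fusion can be checked against the original nest on small data. The sketch below is an illustration under assumptions: it uses 0-based Python lists with m stored as m[i][j], an unroll factor of 4, and hypothetical function names. It mirrors the kernel's structure rather than the actual Linpack source.

```python
def kernel(y, x, m):
    """Original nest: y(i) = y(i) + x(j) * m(i,j) for every column j."""
    for j in range(len(x)):
        for i in range(len(y)):
            y[i] += x[j] * m[i][j]


def kernel_outer_unrolled(y, x, m):
    """Outer (j) loop unrolled by 4, with the four inner loops fused."""
    n1, n2 = len(y), len(x)
    extra = n2 % 4
    # Cleanup: the first n2 mod 4 columns go through the original nest.
    for j in range(extra):
        for i in range(n1):
            y[i] += x[j] * m[i][j]
    # Main nest: each y[i] is read and written once per four columns.
    for j in range(extra, n2, 4):
        for i in range(n1):
            y[i] = (y[i] + x[j] * m[i][j] + x[j + 1] * m[i][j + 1]
                    + x[j + 2] * m[i][j + 2] + x[j + 3] * m[i][j + 3])


x = list(range(1, 8))                                   # n2 = 7
m = [[i + 2 * j for j in range(7)] for i in range(5)]   # n1 = 5
y_ref = [0] * 5
y_unr = [0] * 5
kernel(y_ref, x, m)
kernel_outer_unrolled(y_unr, x, m)
```

Integer data keeps the comparison exact. With floating-point data the fused statement adds the four terms in the same left-to-right order as the original nest, so results should match there too, though that is worth checking for any particular compiler.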


Regional Optimization: Improving Loops

Other effects of loop unrolling

• Increases number of independent operations inside loop
  > May be good for scheduling multiple functional units

• Moving consecutive accesses into same iteration
  > Scheduler may move them together (locality in big loop)

• May make cross-iteration redundancies obvious
  > Expose address expressions in example to LVN

• May increase demand for registers
  > Spills can overcome any benefits

• Can unroll to eliminate copies at end of loop
  > Often rediscovered result of Ken Kennedy’s thesis

• Can change other optimizations
  > Weights in spill code (Das Gupta’s example)


Regional Optimization: Improving Loops

Many other loop transformations appear in the literature

• We will have a lecture devoted to them later in the course

• See also COMP 515 and the Allen-Kennedy book

Next class

• Examples of Global Optimization
