1 intermediate representation goals: –encode knowledge about the program –facilitate analysis...

27
1 Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing semantic analysis HIR intermediate code gen. HIR optim LIR code gen. LIR

Post on 20-Dec-2015

226 views

Category:

Documents


4 download

TRANSCRIPT

1

Intermediate representation

• Goals: – encode knowledge about the program– facilitate analysis– facilitate retargeting– facilitate optimization

scanningparsing

semantic analysis

HIR intermediatecode gen.

HIRoptim

LIR codegen.

LIR

2

Intermediate representation

• Components– code representation– symbol table– analysis information– string table

• Issues– Use an existing IR or design a new one?– How close should it be to the source/target?

3

IR selection

• Using an existing IR– cost savings due to reuse– it must be expressive and appropriate for

the compiler operations

• Designing an IR– decide how close to machine code it should

be– decide how expressive it should be– decide its structure– consider combining different kinds of IRs

4

IR classification: Level

• High-level – closer to source language– used early in the process– usually converted to lower form later on– Example: AST

5

IR classification: Level

• Medium-level – try to reflect the range of features in the

source language in a language-independent way

– most optimizations are performed at this level

• algebraic simplification• copy propagation• dead-code elimination• common subexpression elimination• loop-invariant code motion• etc.

6

IR classification: Level

• Low-level – very close to target-machine instructions– architecture dependent– useful for several optimizations

• loop unrolling• branch scheduling• instruction/data prefetching• register allocation• etc.

7

IR classification: Level

for i := op1 to op2 step op3

instructions

endfor

i := op1

if step < 0 goto L2

L1: if i > op2 goto L3

instructions

i := i + step

goto L1

L2: if i < op2 goto L3

instructions

i := i + step

goto L2

L3:

High-level

Medium-level

8

IR classification: Structure

• Graphical– trees, graphs– not easy to rearrange– large structures

• Linear– looks like pseudocode– easy to rearrange

• Hybrid– combine graphical and linear IRs– Example:

• low-level linear IR for basic blocks, and• graph to represent flow of control

9

(Basic blocks)

• Basic block = a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.

10

(Basic blocks)

• Partitioning a sequence of statements into BBs– Determine leaders (first statements of BBs)

• the first statement is a leader• the target of a conditional is a leader• a statement following a branch is a leader

– For each leader, its basic block consists of the leader and all the statements up to but not including the next leader.

11

Linear IRs

• Sequence of instructions that execute in order of appearance

• Control flow is represented by conditional branches and jumps

• Common representations– stack machine code– three-address code

12

Linear IRs

• stack machine code– assumes presence of operand stack– useful for stack architectures, JVM– operations typically pop operands and push

results.– advantages

• easy code generation• compact form

– disadvantages• difficult to rearrange• difficult to reuse expressions

13

Linear IRs

• three-address code– compact– generates temp variables– level of abstraction may vary– loses syntactic structure– quadruples

• operator• up to two operands• destination

– triples• similar to quadruples but the results are not named

explicitly (index of operation is implicit name)– Implement as table, array of pointers, or list

14

Linear IRs

L1: i := 2

t1:= i+1

t2 := t1>0

if t2 goto L1

(1) 2

(2) i st (1)

(3) i + 1

(4) (3) > 0

(5) if (4), (1)Quadruples

Triples

15

Graphical IRs

• Parse tree• Abstract syntax tree

– high-level– useful for source-level information– retains syntactic structure– Common uses

• source-to-source translation• semantic analysis• syntax-directed editors

16

Graphical IRs

• Tree, for basic block– root: operator– up to two children: operands– can be combined

• Uses:– algebraic simplifications– may generate locally optimal code.

L1: i := 2

t1:= i+1

t2 := t1>0

if t2 goto L1

assgn, i add, t1 gt, t2

2 i 1 t1 02

assgn, i 1

add, t1 0

gt, t2

17

Graphical IRs

• Directed acyclic graphs (DAGs)– Like compressed trees

• leaves: variables, constants available on entry• internal nodes: operators

– annotated with variable names? • distinct left/right children

– Used for basic blocks (doesn't show control flow)

– Can generate efficient code.• Note: DAGs encode common expressions

– But difficult to transform – Better for analysis

18

Graphical IRs

• Generating DAGs– check whether an operand is already

present• if not, create a leaf for it

– check whether there is a parent of the operand that represents the same operation

• if not create one, then label the node representing the result with the name of the destination variable, and remove that label from all other nodes in the DAG.

19

Graphical IRs

• Directed acyclic graphs (DAGs)– Example

m := 2 * y * z n := 3 * y * z p := 2 * y - z

20

Graphical IRs

• Control flow graphs (CFGs)– Each node corresponds to a

• basic block, or – fewer nodes– may need to determine facts at specific points within

BB

• a single statement– more space and time

– Each edge represents flow of control

21

Graphical IRs

• Dependence graphs – Encode flow of values from definition to use– Nodes represent operations– Edges connect definitions to uses– Graph represents constraints on the

sequencing of operations– Built for specific optimizations, then

discarded

22

SSA form

• Static Single Assignment Form– Encodes information about data and control flow– Two constraints:

• each definition has a unique name• each use refers to a single definition

– all uses reached by a definition are renamed

– Example:x := 5 x0 := 5 x := x+1 becomes x1 := x0 + 1 y := x *2 y0 := x1 * 2

• What if we have a loop?

23

SSA form

• The compiler inserts special join functions (called -functions) at points where different control flow paths meet.

• Example:read(x) read(x0)if (x>0) if (x0>0) y:=5 y0 := 5else becomes else y:=10 y1 := 10x := y y2 := (y0, y1) x1 := y2

24

SSA form

• Example 2: x := 0 x0 := 0i := 1 i0 := 1while (i<10) if (i0>=10) goto L2 x := x+i L1: i := i+1

25

SSA form

• Example 2: x := 0 x0 := 0i := 1 i0 := 1while (i<10) if (i0>=10) goto L2 x := x+i L1: x1:= (x0, x2) i := i+1 i1 := (i0, i2)

x2 := x1+i1 i2 := i1+1 if (i2<10) goto L1

L2: x3 := (x0, x2) i3 := (i0, i2)

26

SSA form

• Note: is not an executable function• A program is in SSA form if

– each variable is assigned a value in exactly one statement

– each use of a variable is dominated by the definition.

• point x dominates point y if every path from the start to y goes through x

27

SSA form

• Why use SSA?– explicit def-use pairs– no write-after-read and write-after-write

dependences– speeds up many dataflow optimizations

• But– too many temp variables, -functions– limited to scalars

• how to handle arrays?