spring 2014jim hogg - uw - cse - p501n-1 cse p501 – compiler construction compiler backend...

27
Spring 2014 Jim Hogg - UW - CSE - P501 N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers Allocation Instruction Selection Peephole Optimization Peephole Instruction Selection

Upload: kory-sharp

Post on 12-Jan-2016

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-1

CSE P501 – Compiler Construction

Compiler Backend Organization

Instruction Selection Instruction Scheduling Registers Allocation

Instruction Selection Peephole Optimization Peephole Instruction Selection

Page 2: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-2

Source TargetFront End Back End

Scan

chars

tokens

AST

IR

AST = Abstract Syntax Tree

IR = Intermediate Representation

‘Middle End’

Optimize

Select Instructions

Parse

Convert

Allocate Registers

Emit

Machine Code

IR

IR

IR

IR

IR

A Compiler

Instruction Selection: processors don't support IR; need to convert IR into real machine code

Page 3: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-3

The Big Picture

Compiler = lots of fast stuff, followed by hard problems

Scanner: O(n) Parser: O(n) Analysis & Optimization: ~ O(n log n) Instruction selection: fast or NP-Complete Instruction scheduling: NP-Complete Register allocation: NP-Complete

• Recall: approach in P501 to describing backend is 'survey' level

• A deeper dive would require a further full, 10-week course

Page 4: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Compiler Backend: 3 Parts

Select Instructions eg: Best x86 instruction to implement: vr17 = 12(arp) ?

Schedule Instructions eg: Given instructions a,b,c,d,e,f,g would the program run

faster if they were issued in a different order, such as d,c,a,b,g,e,f ?

Allocate Registers eg: which variables to store in registers? which to store in

memory?

Spring 2014 Jim Hogg - UW - CSE - P501 N-4

Page 5: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Instruction Selection is . . .

Spring 2014 Jim Hogg - UW - CSE - P501 N-5

Select Real Instructions

Mid-LevelIR

Low-LevelIR

• Specific to one chip/ISA• Use chip addressing modes• But still not decided on

registers

• TAC - Three Address Code• t1 a op b

• Tree or Linear• supply of temps• Array address calcs.

explicit• Optimizations all done• Storage locations

decided

Page 6: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Instruction Scheduling is . . .

Spring 2014 Jim Hogg - UW - CSE - P501 N-6

Schedule

• Execute in-order to get correct answer

abcdefgh

badfcghf

• Issue in new order• eg: memory fetch is

slow• eg: divide is slow• Overall faster• Still get correct

answer!

Originally devised for super-computersNow used everywhere:

• in-order procs - Atom, older ARM• out-of-order procs - newer x86

Compiler does 'heavy lifting' - reduce chip power

Page 7: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Register Allocation is . . .

Spring 2014 Jim Hogg - UW - CSE - P501 N-7

Allocate

• Virtual registers or temps

• supply

R1 = = R1push R1R1 = = R1pop R1

• Real machine registers• eg: EAX, EBP

• Very finite supply!• Enregister/Spill

After allocating registers, may do a second pass of scheduling to improve speed of spill code

Page 8: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Instruction selection chooses which target instructions (eg: for x86, for ARM) to use for each IR instruction

Selecting the best instructions is massively difficult because modern ISAs provide:

huge choice of instructions wide choice of addressing modes Eg: Intel's x64 Instruction Set Reference Manual = 1422

pages

Spring 2014 Jim Hogg - UW - CSE - P501 N-8

Instruction Selection

Page 9: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-9

Choices, choices ...

Most chip ISAs provide many ways to do the same thing:

eg: to set eax to 0 on x86 has several alternatives:

mov eax, 0 xor eax, eaxsub eax, eax imul eax, 0

Many machine instructions do several things at once – eg: register arithmetic and effective address calculation. Recall:

lea rdst, [rbase + rindex*scale + offset]

Page 10: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-10

Overview: Instruction Selection

Map IR into near-assembly code For MiniJava, we emitted textual assembly code Commercial compilers emit binary code directly

Assume known storage layout and code shape ie: optimization phases have already done their thing

Combine low-level IR operations into machine instructions (take advantage of addressing modes, etc)

Page 11: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-11

Criteria for Instruction Selection

Several possibilities Fastest Smallest Minimal power (eg: don’t use a function-unit if leaving it

powered-down is a win)

Sometimes not obvious eg: if one of the function-units in the processor is idle and

we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldn’t be chosen normally

(Some interaction with scheduling here…)

Page 12: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Instruction Selection: Approaches

Two main techniques:

1. Tree-based matches (eg: maximul munch algorithm)

2. Peephole-based generation

Spring 2014 Jim Hogg - UW - CSE - P501 N-12

• Note that we select instructions from the target ISA.• We do not decide on which physical registers to use. That comes

later during Register Allocation• We have a few generic registers: ARP, StackPointer, ResultReg

Page 13: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-13

Tree-Based Instruction Selection

ID<e,arp,4>

ID<f,arp,8>

e f

case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res)

How to generate target code for this simple tree?

We could use a template approach - similar to converting an AST into IR:

Resulting codegen is correct, but dumbDoesn't even cover: call-by-val, call-by-ref, enregistered, different data type, etc

Page 14: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-14

loadI 4 => r5

loadAO rarp, r5=> r6

loadI 8 => r7

loadAO rarp, r7=> r8

mult r6, r8 => r9

ID<e,arp,4>

ID<f,arp,8>

e f

loadAI rarp, 4 => r5

loadAI rarp, 8 => r6

mult r5, r6 => r7

Naive Ideal

case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res)

Template Code Generation

Page 15: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

IR (3-address Code) Tree-lets

Production ILOC Template5 ...6 Reg Lab load l => rn

8 Reg Num9 Reg Reg1 load r1 => rn

10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn

11 Reg + Reg1 Num2 loadAI r1, n2 => rn

14 Reg + Lab1 Reg2 loadAI r2, l1 => rn

15 Reg + Reg1 Reg2 add r1, r2 => rn

16 Reg + Reg1 Num2 addI r1, 22 => rn

19 Reg + Lab1 Reg2 addI r2, l1 => rn

20 ...

Spring 2014 Jim Hogg - UW - CSE - P501 N-15

Memory Dereference

+

Reg1 Num2

11

Rules for Tree-to-Target Conversion (prefix notation)

+ Reg1 Num2

Page 16: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Example: Tiling a tiny 4-node tree

Spring 2014 Jim Hogg - UW - CSE - P501 N-16

+

Lab@G

Num12

Load variable <c,@G,12>

• The tree above shows code to access variable, c, stored at offset 12 bytes from label G

• How many ways can we tile this tree into equivalent ILOC code?

Page 17: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Potential Matches

N-17

+

Lab@G

Num12

11

6+

Lab@G

Num12

14

8+

Lab@G

Num12

10

86+

Lab@G

Num12

10

86

+

Lab@G

Num12

9

6

16

+

Lab@G

Num12

9

19

8+

Lab@G

Num12

9

15

86+

Lab@G

Num12

9

15

86

<6,11> <8,14> <6,8,10> <8,6,10>

<6,16,9> <8,19,9> <6,8,15,9> <8,6,15,9>

loadI @G => ri

loadAI ri 12 => rj

loadI l2 => ri

loadAI ri @G => rj

loadI @G => ri

loadI l2 => rj

loadAO ri rj => rk

loadI l2 => ri

loadI @G => rj

loadAO ri rj => rk

Page 18: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-18

Tree-Pattern Matching

A given tiling “implements” a tree if:

covers every node in the tree, and overlap between any two tiles (trees) is limited to a single

node

If <node,op> is in the tiling, then node is also covered by a leaf in another operation tree in the tiling – unless it is the root

Where two operation trees meet, they must be compatible (ie: expect the same value in the same location)

Page 19: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

IR AST for Simple Expression

Spring 2014 Jim Hogg - UW - CSE - P501 N-19

=

-

Num4

Valarp

+

Num-16

Valarp

+

Num2

Num12

Val@G

+

a = b - 2 c

a local var, offset 4 from arpb call-by-ref paramc var, offset 12 from label @G

= + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4

Prefix form - same info as IR tree:

Page 20: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

IR AST as Prefix Text String

Spring 2014 Jim Hogg - UW - CSE - P501 N-20

= + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4

No parentheses? - don't need them: evaluate this expression from right-to-left, using simple stack machine

Production ILOC Template5 ...6 Reg Lab1 load l1 => rn

7 Reg Val18 Reg Num1 loadl n1 => rn

9 Reg Reg1 load r1 => rn

10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn

11 Reg + Reg1 Num2 loadAI r1, n2 => rn

14 Reg + Lab1 Reg2 loadAI r2, l1 => rn

15 Reg + Reg1 Reg2 add r1, r2 => rn

16 Reg + Reg1 Num2 addI r1, 22 => rn

19 Reg + Lab1 Reg2 addI r2, l1 => rn

20 ...

Rewrite Rules - Another BNF, but including ambiguity!

Page 21: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Select Instructions with LR Parser

Spring 2014 Jim Hogg - UW - CSE - P501 N-21

= + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4

Production ILOC Template5 ...6 Reg Lab1 load l1 => rn

7 Reg Val18 Reg Num1 loadl n1 => rn

9 Reg Reg1 load r1 => rn

10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn

11 Reg + Reg1 Num2 loadAI r1, n2 => rn

14 Reg + Lab1 Reg2 loadAI r2, l1 => rn

15 Reg + Reg1 Reg2 add r1, r2 => rn

16 Reg + Reg1 Num2 addI r1, 22 => rn

19 Reg + Lab1 Reg2 addI r2, l1 => rn

20 ...

• Use LR parser for Grammar to parse prefix expression• Ambiguous, so lots of parse-action conflicts: resolve with tie-

breaker rules:• Lowest cost (need to augment Grammar productions with

cost)• Favor large reductions over smaller ("maximal munch")

• reduce-reduce conflict - choose longer reduction• shift-reduce conflict - choose shift

• => largest number of prefix ops translated into machine instruction

Page 22: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Originally devised as the last optimization pass of a compiler

Examine a small, sliding window of target code (a few adjacent instructions) and optimize

Spring 2014 Jim Hogg - UW - CSE - P501 N-22

storeAI r1 => rarp, 8loadAI rarp, 8 => r15

Memory[rarp + 8] = r1

r15 = Memory[rarp + 8]

Reminder

Original

storeAI r1 => rarp, 8i2i r1 => r15

Memory[rarp + 8] = r1

r15 = r1

Optimized

Cooper&Torczon "ILOC"

Peephole Optimization

Page 23: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Peeps, 2

Spring 2014 Jim Hogg - UW - CSE - P501 N-23

addI r2, 0 => r7

mult r4, r7 => r10

r7 = r2 + 0r10 = r4 * r7

Reminder

Original

mult r4, r2 => r10 r10 = r4 * r2Optimized

jumpI L10L10: jumpI L20

Original

jumpI L20L10: jumpI L20

Optimized

Page 24: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Modern Peephole Optimizer

Spring 2014 Jim Hogg - UW - CSE - P501 N-24

Expander Simplifier Matcher

IR LIRLIR LIR

• Modern ISAs are enormous• Linear search for a match no longer fast

enough• So . . .

• Like a miniature compiler• Use for both peeps and instruction-selection

Page 25: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Instruction Selection via Peeps

Spring 2014 Jim Hogg - UW - CSE - P501 N-25

t1 = 2 ca = b - t1

r10 = 2r11 = @Gr12 = 12r13 = r11 + r12

r14 = M[r13]r15 = r10 * r14

r16 = -16r17 = rarp + r16

r18 = M[r17]r19 = M[r18]r20 = r19 - r15

r21 = 4r22 = rarp + r21

M[r22] = r20

LIR

r10 = 2r11 = @Gr14 = M(r11+r12)r15 = r10 * r14

r18 = M(rarp-16)r19 = M(r18)r20 = r19 - r15

M(rarp+4) = r20

Simplified LIR

a 4(arp) Local Variableb -16(arp) Call-by-reference parameterc 12(@G) Variable

IR

loadI 2 => r10loadI @G => r11loadAI r11, 12 => r14mult r10, r14 => r15loadAI rarp, -16 => r18load r18 => r19sub r19, r15 => r20storeAI r20 => rarp,4

LIR

14 Instructions

8 Instructions

Page 26: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

The Simplifier

Spring 2014 Jim Hogg - UW - CSE - P501 N-26

r10 = 2r11 = @Gr12 = 12

r11 = @Gr12 = 12r13 = r11 + r12

r11 = @Gr13 = r11 + 12r14 = M[r13]

r11 = @Gr14 = M[r11 + 12]r15 = r10 * r14

r14 = M[r11+12]r15 = r10 * r14

r16 = -16

r15 = r10 * r14

r16 = -16r17 = rarp + r16

r15 = r10 * r14

r17 = rarp - 16r18 = M[r17]

r15 = r10 * r14

r18 = M[rarp-16]r19 = M[r18]

r18 = M[rarp-16]r19 = M[r18]r20 = r19 - r15

r19 = M[r18]r20 = r19 - r15

r21 = 4

r20 = r19 - r15

r21 = 4r22 = rarp + r21

r20 = r19 - r15

r22 = rarp + 4M[r22] = r20

r20 = r19 - r15

M[r22] = r20

Roll

Const Prop + DCE

This example is simplified - never re-uses a virtual register: so we can delete the instruction after a constant propagation. In general, we would have to generate liveness info for each virtual register to perform safe DCE

Page 27: Spring 2014Jim Hogg - UW - CSE - P501N-1 CSE P501 – Compiler Construction Compiler Backend Organization Instruction Selection Instruction Scheduling Registers

Spring 2014 Jim Hogg - UW - CSE - P501 N-27

Instruction Scheduling

Register Allocation

And more….

Next