cs 6354: pipelining / isascr4bd/6354/f2016/slides/lec05-slides-1up.pdf · ng if. in comparison,...

54
CS 6354: Pipelining / ISAs 7 September 2016 1

Upload: others

Post on 20-Oct-2019

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

CS 6354: Pipelining / ISAs

7 September 2016

1

Page 2: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Review: Memory Hierarchy

2

Page 3: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Review: Page Tables

CR3

3239404748555663 08162431 15 723

......

4K m

emor

y pa

ge

Linear address:

64 bit PDentry

......

page directory

......

PDPentry

page-directory-pointer table

64 bit PTentry

......

page table

......

PML4entry

PML4 table99

40*

9 9 12

sign extended

*) 40 bits aligned to a 4-KByte boundary

3

Page 4: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Review: Memory HierarchyOptimizations

adjust # caches, sizes, associativity, block size, …

adjust when virtual to physical translation happens

add victim caches, prefetching, etc.

cache blocking — reorder code for more reuse

overlap memory accesses and

4

Page 5: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Human pipeline: laundry

Washer

Dryer

FoldingTable

Washer

Dryer

FoldingTable

11:00 12:00 13:00 14:00

11:00 12:00 13:00 14:00

whites

whites

whites

colors

colors

colors

whites

whites

whites

colors

colors

colors

sheets

sheets

sheets

5

Page 6: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Human pipeline: laundry

Washer

Dryer

FoldingTable

Washer

Dryer

FoldingTable

11:00 12:00 13:00 14:00

11:00 12:00 13:00 14:00

whites

whites

whites

colors

colors

colors

whites

whites

whites

colors

colors

colors

sheets

sheets

sheets5

Page 7: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

The MIPS pipeline

Copyright © 2011, Elsevier Inc. All rights Reserved. 17

Figure C.28 The stall from branch hazards can be reduced by moving the zero test and branch-target calculation into the ID phase of the pipeline. Notice that we have made two important changes, each of which removes 1 cycle from the 3-cycle stall for branches. The first change is to move both the branch-target address calculation and the branch condition decision to the ID cycle. The second change is to write the PC of the instruction in the IF phase, using either the branch-target address computed during ID or the incremented PC computed during IF. In comparison, Figure C.22 obtained the branch-target address from the EX/MEM register and wrote the result during the MEM clock cycle. As mentioned in Figure C.22, the PC can be thought of as a pipeline register (e.g., as part of ID/IF), which is written with the address of the next instruction at the end of each IF cycle.

Figure: H&P Appendix C 6

Page 8: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS instruction execution (1)

add $1, $2, $3 ; reg[1] <− reg[2] + reg[3]

Instruction Fetch: read from instruction cache

IF/ID stores: instr., PC

Instruction Decode: read registers 2 and 3

ID/EX stores: reg[2], reg[3], instr., PC

Execute: compute reg[2] + reg[3]

EX/MEM stores: reg[2] + reg[3], instr., PC

Memory: do nothing

MEM/WB stores: reg[2] + reg[3], instr., PC

Write Back: write computed value into reg[1]7

Page 9: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS instruction execution (2)

sw r1, 100(r3) ; memory[100 + reg[3]] = reg[1]

Instruction Fetch: read from instruction cache

IF/ID stores: instr., PC

Instruction Decode: read registers 1 and 3

ID/EX stores: reg[1], reg[3], instr., PC

Execute: compute 100 + reg[3]

EX/MEM stores: 100 + reg[3], reg[1], instr., PC

Memory: store reg[1] into data @ 100 + reg[3]

MEM/WB stores: instr., PC

Write Back: do nothing8

Page 10: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

The MIPS pipeline

Copyright © 2011, Elsevier Inc. All rights Reserved. 17

Figure C.28 The stall from branch hazards can be reduced by moving the zero test and branch-target calculation into the ID phase of the pipeline. Notice that we have made two important changes, each of which removes 1 cycle from the 3-cycle stall for branches. The first change is to move both the branch-target address calculation and the branch condition decision to the ID cycle. The second change is to write the PC of the instruction in the IF phase, using either the branch-target address computed during ID or the incremented PC computed during IF. In comparison, Figure C.22 obtained the branch-target address from the EX/MEM register and wrote the result during the MEM clock cycle. As mentioned in Figure C.22, the PC can be thought of as a pipeline register (e.g., as part of ID/IF), which is written with the address of the next instruction at the end of each IF cycle.

Figure: H&P Appendix C 9

Page 11: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS instruction execution (1)

add $1, $2, $3 ; reg[1] <− reg[2] + reg[3]

Instruction Fetch: read from instruction cacheIF/ID stores: instr., PC

Instruction Decode: read registers 2 and 3ID/EX stores: reg[2], reg[3], instr., PC

Execute: compute reg[2] + reg[3]EX/MEM stores: reg[2] + reg[3], instr., PC

Memory: do nothingMEM/WB stores: reg[2] + reg[3], instr., PC

Write Back: write computed value into reg[1]10

Page 12: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS instruction execution (2)

sw r1, 100(r3) ; memory[100 + reg[3]] = reg[1]

Instruction Fetch: read from instruction cacheIF/ID stores: instr., PC

Instruction Decode: read registers 1 and 3ID/EX stores: reg[1], reg[3], instr., PC

Execute: compute 100 + reg[3]EX/MEM stores: 100 + reg[3], reg[1], instr., PC

Memory: store reg[1] into data @ 100 + reg[3]MEM/WB stores: instr., PC

Write Back: do nothing11

Page 13: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS executing

Copyright © 2011, Elsevier Inc. All rights Reserved. 3

Figure C.3 A pipeline showing the pipeline registers between successive pipeline stages. Notice that the registers prevent interference between two different instructions in adjacent stages in the pipeline. The registers also play the critical role of carrying data for a given instruction from one stage to the other. The edge-triggered property of registers—that is, that the values change instantaneously on a clock edge—is critical. Otherwise, the data from one instruction could interfere with the execution of another!

12

Page 14: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Pipeline Hazards

hazards stop pipeline from executing at full rate

structural hazards — not enough hardware

data hazards — value not computed soon enough

control hazards — instruction to execute not knownsoon enough

13

Page 15: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Functional Hazards

Copyright © 2011, Elsevier Inc. All rights Reserved. 4

Figure C.4 A processor with only one memory port will generate a conflict whenever a memory reference occurs. In this example the load instruction uses the memory for a data access at the same time instruction 3 wants to fetch an instruction from memory.

Figure: H&P Appendix C 14

Page 16: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-after-Write

add r1, r2, r3 ; r1 <− r2 + r3sub r4, r1, r5 ; r5 <− r1 − r5

add r1, r2, r3 sub r4, r1, r51 IF2 ID: read r2, r3 IF3 EX : temp1 ← r2 + r3 ID: read r1, r54 MEM EX : temp2 ← r1 - r55 WB: r1 ← temp MEM6 WB: r4 ← temp2

15

Page 17: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-after-Write

add r1, r2, r3 ; r1 <− r2 + r3sub r4, r1, r5 ; r5 <− r1 − r5

add r1, r2, r3 sub r4, r1, r51 IF2 ID: read r2, r3 IF3 EX : temp1 ← r2 + r3 ID: read r1, r54 MEM EX : temp2 ← r1 - r55 WB: r1 ← temp MEM6 WB: r4 ← temp2

15

Page 18: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-after-Write — Stall

add r1, r2, r3 ; r1 <− r2 + r3sub r4, r1, r5 ; r5 <− r1 − r5

add r1, r2, r3 sub r4, r1, r51 IF2 ID: read r2, r3 IF3 EX : temp1 ← r2 + r3 stall4 MEM stall5 WB: r1 ← temp1 stall6 ID: read r1, r57 EX : temp2 ← r1 + r58 MEM9 WB: r4 ← temp2

16

Page 19: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-after-Write — Stall

add r1, r2, r3 ; r1 <− r2 + r3sub r4, r1, r5 ; r5 <− r1 − r5

add r1, r2, r3 sub r4, r1, r51 IF2 ID: read r2, r3 IF3 EX : temp1 ← r2 + r3 stall4 MEM stall5 WB: r1 ← temp1 stall6 ID: read r1, r57 EX : temp2 ← r1 + r58 MEM9 WB: r4 ← temp2

16

Page 20: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Implementing Stalls

disable writing pipeline registersneed logic to detect conflicts

function of pipeline registers (instruction values)

17

Page 21: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-After-Write

Copyright © 2011, Elsevier Inc. All rights Reserved. 5

Figure C.6 The use of the result of the DADD instruction in the next three instructions causes a hazard, since the register is not written until after those instructions read it.

Figure: H&P Appendix C 18

Page 22: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Read-after-Write — Forward

add r1, r2, r3 ; r1 <− r2 + r3sub r4, r1, r5 ; r5 <− r1 − r5

add r1, r2, r3 sub r4, r1, r51 IF2 ID: read r2, r3 IF3 EX : temp1 ← r2 + r3 ID: read r1, r54 MEM EX : temp2 ← temp1 - r55 WB: r1 ← temp MEM6 WB: r4 ← temp2

19

Page 23: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Forwarding

Copyright © 2011, Elsevier Inc. All rights Reserved. 6

Figure C.7 A set of instructions that depends on the DADD result uses forwarding paths to avoid the data hazard. The inputs for the DSUB and AND instructions forward from the pipeline registers to the first ALU input. The OR receives its result by forwarding through the register file, which is easily accomplished by reading the registers in the second half of the cycle and writing in the first half, as the dashed lines on the registers indicate. Notice that the forwarded result can go to either ALU input; in fact, both ALU inputs could use forwarded inputs from either the same pipeline register or from different pipeline registers. This would occur, for example, if the AND instruction was AND R6,R1,R4.

Figure: H&P Appendix C 20

Page 24: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Implementing Forwarding

multiplexers for operand valuesneed logic to detect which one to use

function of pipeline registers (instruction values)

21

Page 25: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Implementing Forwarding

Copyright © 2011, Elsevier Inc. All rights Reserved. 16

Figure C.27 Forwarding of results to the ALU requires the addition of three extra inputs on each ALU multiplexer and the addition of three paths to the new inputs. The paths correspond to a bypass of: (1) the ALU output at the end of the EX, (2) the ALU output at the end of the MEM stage, and (3) the memory output at the end of the MEM stage.

Figure: H&P Appendix C 22

Page 26: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Limits of Forwarding

Copyright © 2011, Elsevier Inc. All rights Reserved. 8

Figure C.9 The load instruction can bypass its results to the AND and OR instructions, but not to the DSUB, since that would mean forwarding the result in “negative time.”

Figure: H&P Appendix C 23

Page 27: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Scheduling for Pipelineslw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]add r3, r1, r2 ; r3 <− r1 + r2lw r4, 8(r20) ; r4 <− MEM[8+r20]add r4, r4, r3 ; r4 <− r4 + r3sw r4, 8(r20) ; MEM[8+r20] <− r4lw r5, 12(r20) ; r5 <− MEM[12+r20]mul r5, r5, r4 ; r5 <− r5 * r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

converts intolw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]lw r4, 8(r20) ; r4 <− MEM[8+r20]lw r5, 12(r20) ; r5 <− MEM[12+r20]add r3, r1, r2 ; r3 <− r1 + r2add r4, r4, r3 ; r4 <− r4 + r3mul r5, r5, r4 ; r5 <− r5 * r4sw r4, 8(r20) ; MEM[8+r20] <− r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

24

Page 28: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Scheduling for Pipelineslw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]add r3, r1, r2 ; r3 <− r1 + r2lw r4, 8(r20) ; r4 <− MEM[8+r20]add r4, r4, r3 ; r4 <− r4 + r3sw r4, 8(r20) ; MEM[8+r20] <− r4lw r5, 12(r20) ; r5 <− MEM[12+r20]mul r5, r5, r4 ; r5 <− r5 * r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

converts intolw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]lw r4, 8(r20) ; r4 <− MEM[8+r20]lw r5, 12(r20) ; r5 <− MEM[12+r20]add r3, r1, r2 ; r3 <− r1 + r2add r4, r4, r3 ; r4 <− r4 + r3mul r5, r5, r4 ; r5 <− r5 * r4sw r4, 8(r20) ; MEM[8+r20] <− r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

24

Page 29: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Scheduling for Pipelineslw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]add r3, r1, r2 ; r3 <− r1 + r2lw r4, 8(r20) ; r4 <− MEM[8+r20]add r4, r4, r3 ; r4 <− r4 + r3sw r4, 8(r20) ; MEM[8+r20] <− r4lw r5, 12(r20) ; r5 <− MEM[12+r20]mul r5, r5, r4 ; r5 <− r5 * r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

converts intolw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]lw r4, 8(r20) ; r4 <− MEM[8+r20]lw r5, 12(r20) ; r5 <− MEM[12+r20]add r3, r1, r2 ; r3 <− r1 + r2add r4, r4, r3 ; r4 <− r4 + r3mul r5, r5, r4 ; r5 <− r5 * r4sw r4, 8(r20) ; MEM[8+r20] <− r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

24

Page 30: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Scheduling for Pipelineslw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]add r3, r1, r2 ; r3 <− r1 + r2lw r4, 8(r20) ; r4 <− MEM[8+r20]add r4, r4, r3 ; r4 <− r4 + r3sw r4, 8(r20) ; MEM[8+r20] <− r4lw r5, 12(r20) ; r5 <− MEM[12+r20]mul r5, r5, r4 ; r5 <− r5 * r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

converts intolw r1, 0(r20) ; r1 <− MEM[0+r20]lw r2, 4(r20) ; r2 <− MEM[4+r20]lw r4, 8(r20) ; r4 <− MEM[8+r20]lw r5, 12(r20) ; r5 <− MEM[12+r20]add r3, r1, r2 ; r3 <− r1 + r2add r4, r4, r3 ; r4 <− r4 + r3mul r5, r5, r4 ; r5 <− r5 * r4sw r4, 8(r20) ; MEM[8+r20] <− r4sw r5, 12(r20) ; r5 <− MEM[12+r20]

24

Page 31: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Next time: Scheduling

Weiss and Smith, “A study of scalar compilationtechniques for pipelined supercomputers”

theme: seperate dependencies from usefocus on loops

25

Page 32: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Control Hazard

need to decode instruction to know next instruction

Copyright © 2011, Elsevier Inc. All rights Reserved. 3

Figure C.3 A pipeline showing the pipeline registers between successive pipeline stages. Notice that the registers prevent interference between two different instructions in adjacent stages in the pipeline. The registers also play the critical role of carrying data for a given instruction from one stage to the other. The edge-triggered property of registers—that is, that the values change instantaneously on a clock edge—is critical. Otherwise, the data from one instruction could interfere with the execution of another!

next instruction known

next instruction needed

26

Page 33: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

MIPS Delay Slots

avoid control hazard by delaying branchadd $3, $4, $5 ; (1)beq $1, $2, label ; (2)add $5, $6, $7 ; (3) DELAY SLOTadd $6, $7, $8add $8, $9, $10

label:add $7, $8, $9 ; (4)

27

Page 34: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Branch Prediction

branch prediction — guess whether branch is taken

start guess immediately

clear pipeline registers if wrong

28

Page 35: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Speculation

when is it okay to guess

if we can undo guess if wrong

MIPS pipeline:IF — doesn’t change stateID — doesn’t change stateEX — doesn’t change stateMEM — changes memory!WB — changes registers!

undo: clear pipeline registers before MEM, set newPC

29

Page 36: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Speculation

when is it okay to guess

if we can undo guess if wrongMIPS pipeline:

IF — doesn’t change stateID — doesn’t change stateEX — doesn’t change stateMEM — changes memory!WB — changes registers!

undo: clear pipeline registers before MEM, set newPC

29

Page 37: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Static branch prediction

forwards not taken (fetch normally)

backwards taken (fetch target)

30

Page 38: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Dynamic branch predictionPC

NNNNTTNT

low-order bits

prediction

actualresult

lookup branch address in table

1-bit: Taken/Not taken

taken before ⇒ taken again

31

Page 39: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Dynamic branch predictionPC

NNNN

TNTNT

low-order bits

prediction

actualresult

lookup branch address in table

1-bit: Taken/Not taken

taken before ⇒ taken again

31

Page 40: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Dynamic branch prediction

refinement: 2 bits

Copyright © 2011, Elsevier Inc. All rights Reserved. 11

Figure C.18 The states in a 2-bit prediction scheme. By using 2 bits rather than 1, a branch that strongly favors taken or not taken—as many branches do—will be mispredicted less often than with a 1-bit predictor. The 2 bits are used to encode the four states in the system. The 2-bit scheme is actually a specialization of a more general scheme that has an n-bit saturating counter for each entry in the prediction buffer. With an n-bit counter, the counter can take on values between 0 and 2n – 1: When the counter is greater than or equal to one-half of its maximum value (2n – 1), the branch is predicted as taken; otherwise, it is predicted as untaken. Studies of n-bit predictors have shown that the 2-bit predictors do almost as well, thus most systems rely on 2-bit branch predictors rather than the more general n-bit predictors.

Figure: H&P Appendix C 32

Page 41: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Deeper Pipelines (1)

Copyright © 2011, Elsevier Inc. All rights Reserved. 19

Figure C.35 A pipeline that supports multiple outstanding FP operations. The FP multiplier and adder are fully pipelined and have a depth of seven and four stages, respectively. The FP divider is not pipelined, but requires 24 clock cycles to complete. The latency in instructions between the issue of an FP operation and the use of the result of that operation without incurring a RAW stall is determined by the number of cycles spent in the execution stages. For example, the fourth instruction after an FP add can use the result of the FP add. For integer ALU operations, the depth of the execution pipeline is always one and the next instruction can use the results.

Figure: H&P Appendix C 33

Page 42: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Deeper Pipelines (2)

Copyright © 2011, Elsevier Inc. All rights Reserved. 24

Figure C.44 The basic branch delay is 3 cycles, since the condition evaluation is performed during EX.Figure: H&P Appendix C 34

Page 43: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Microcoded pipelined CPU

35

Page 44: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Less registers? (1)

36

Page 45: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Less registers? + Seperate I-Cache?

37

Page 46: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

RISC factors

38

Page 47: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Factors favoring MIPS

operand specifier decoding — 1 cycle per on VAX

seperate floating point registers — seperate FPU

condition code RAW hazards

needless work by, e.g., CISC CALL/RET

filled delay slots

larger page size

larger range for brganches

39

Page 48: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Addressing modes on VAX

40

Page 49: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Addressing modes on VAX

ADDL3 @(R5)+[R6], @(R1)+[R2], @(R3)+[R4]

one instructionsix memory accesses, four register readsthree register writesMEM[MEM[R5]+R6] ← MEM[MEM[R1]+R2]

+ MEM[MEM[R3]+R4]R1 ← R1 + 4R3 ← R3 + 4R5 ← R5 + 4

41

Page 50: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

ISA design

lots of non-technical factors

42

Page 51: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Notable RISC V decisions

modular ISA design

optional variable length encoding (code size)

43

Page 52: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Justifications (1)

31 general-purpose registers + 0 register + pcusually 32-bit instructions

“it is impossible to encode a complete ISA with 16registers in 16-bit instructions using a 3-addressformat. Although a 2-address format would bepossible, it would increase instruction count and lowerefficiency. … A larger number of integer registers alsohelps performance on high-performance code,…”“The optional compressed 16-bit instruction formatmostly only accesses 8 registers”

44

Page 53: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Justifications (2)

“Decoding register specifiers is usualy on the critical path … sothe instruction format was chosen to keep all registers specifiersat the same position…”

45

Page 54: CS 6354: Pipelining / ISAscr4bd/6354/F2016/slides/lec05-slides-1up.pdf · ng IF. In comparison, Figure C.22 obtained the branch-t EX/MEM register lFigure hought of(e.g., ID/IF), which

Justifications (3)

no delay slotsno condition codes

“condition codes and branch delay slots, whichcomplicate higher performance implementations”

46