designing a pipelined cpu
DESCRIPTION
Read 4.7 for next class. Designing a Pipelined CPU. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Review -- Single Cycle CPU. Review -- Multiple Cycle CPU. Ifetch. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/1.jpg)
Designing a Pipelined CPU
Read 4.7 for next class
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
![Page 2: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/2.jpg)
Review -- Single Cycle CPU
![Page 3: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/3.jpg)
Review -- Multiple Cycle CPU
![Page 4: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/4.jpg)
Review -- Instruction Latencies
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
•Single-Cycle CPU
•Multiple Cycle CPU
Ifetch Reg/Dec Exec WrAdd
![Page 5: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/5.jpg)
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
![Page 6: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/6.jpg)
Which of the statements below is true about a pipelined processor?
Selection Statement
A Instruction latency remains essentially unchanged from single-cycle (minus some overheads); Instruction throughput increases
B Instruction latency remains essentially unchanged from multi-cycle (minus some overheads); Instruction throughput increases
C Instruction latency improves by a factor of 5 over single-cycle (minus some overheads); Instruction throughput increases
D Instruction latency improves by a factor of 5 over multi-cycle (minus some overheads); Instruction throughput increases
E None of the above
Instruction Latencies and Throughput
![Page 7: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/7.jpg)
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
![Page 8: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/8.jpg)
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
![Page 9: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/9.jpg)
Instruction Latencies and Throughput•Single-Cycle CPU
•Multiple Cycle CPU
•Pipelined CPU
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem WrLoad
Ifetch Reg/Dec Exec Mem WrLoad
![Page 10: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/10.jpg)
Pipelining Advantages
• Higher maximum throughput
• Higher utilization of CPU resources
• But, more complicated datapath, more complex control(?)
PI throughputVs. latency
![Page 11: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/11.jpg)
Pipelining ThroughputPeek Throughput
1. Longest instruction2. Average instruction3. Cycle time
PI Matching
Selection SC MC Pipeline
A 1 2 3B 1 2 1C 1 3 2D 3 1 2E None of the above
![Page 12: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/12.jpg)
A Pipelined Datapath
IF: Instruction fetch
ID: Instruction decode and register fetch
EX: Execution and effective address calculation
MEM: Memory access
WB: Write back
![Page 13: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/13.jpg)
Idea for a Pipelined Datapath
![Page 14: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/14.jpg)
Execution in a Pipelined Datapath
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9
lw
lw
lw
lw
lw
IF ID EX MEM WB
IF ID EX MEM WB
![Page 15: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/15.jpg)
Execution in a Pipelined Datapath
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9
lw
lw
lw
lw
lw
steadystate
steadystate
IF ID EX MEM WB
IF ID EX MEM WB
Make sure to pointOut Steady StateCPI = 1
![Page 16: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/16.jpg)
Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?
Selection Yes/No Reason (Choose BEST answer)
A Yes Decreasing R-type to 4 cycles improves instruction throughput
B Yes Decreasing R-type to 4 cycles improves instruction latency
C No Decreasing R-type to 4 cycles causes hazards
D No Decreasing R-type to 4 cycles causes hazards and doesn’t impact throughput
E No Decreasing R-type to 4 cycles causes hazards and doesn’t impact latency
Pipeline Stages
![Page 17: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/17.jpg)
Mixed Instructions in the Pipeline
IM Reg
AL
U Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6
lw
add
![Page 18: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/18.jpg)
Pipeline Principles
• All instructions that share a pipeline must have the same stages in the same order.– therefore, add does nothing during Mem stage
– sw does nothing during WB stage
• All intermediate values must be latched each cycle.
• There is no functional block reuse
IM Reg A
LU DM Reg
IF ID EX MEM WB
![Page 19: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/19.jpg)
Pipelined DatapathInstruction Fetch Instruction Decode/
Register FetchExecute/
Address CalculationMemory Access Write Back
registers!registers!
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 20: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/20.jpg)
The Pipeline in Executionadd $10, $1, $2 Instruction Decode/
Register FetchExecute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Draw active datapath throughexample
![Page 21: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/21.jpg)
The Pipeline in Executionlw $12, 1000($4) add $10, $1, $2 Execute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 22: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/22.jpg)
The Pipeline in Executionsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Memory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 23: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/23.jpg)
The Pipeline in ExecutionInstruction Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 24: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/24.jpg)
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register Fetchsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 25: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/25.jpg)
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register Fetchsub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
Did someone spot the error??Write Register is wrong
![Page 26: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/26.jpg)
The Pipeline in ExecutionInstruction Fetch Instruction Decode/
Register FetchExecute/
Address Calculationsub $15, $4, $1 lw $12, 1000($4)
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 27: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/27.jpg)
The Pipeline, with controls But….
![Page 28: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/28.jpg)
Pipeline Control
Selection SC MC Pipeline
A X Y YB Y X XC X Y XD Y X YE None of the above
Control Logic
X. Combinational LogicY. FSM or Microprogram
![Page 29: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/29.jpg)
Pipelined Control
IF/I
D
ID/E
X
EX
/ME
M
ME
M/W
B
controlinstruction
![Page 30: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/30.jpg)
Pipelined Control Signals
Execution Stage Control Lines Memory Stage Control Lines Write Back Stage ControlLines
Instruction RegDst ALUOp1 ALUOp0 ALUSrc Branch MemRead MemWrite RegWrite MemtoRegR-Format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw x 0 0 1 0 0 1 0 xbeq x 0 1 0 1 0 0 0 x
![Page 31: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/31.jpg)
The Pipeline with Control Logic
![Page 32: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/32.jpg)
Is it really this simple?
![Page 33: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/33.jpg)
IM Reg
AL
U DM Reg
IM Reg
AL
U DM
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
What just happened here which is problematic (BEST ANSWER)?A. The register file is trying to read and write the same registerB. The ALU and data memory are both active in the same cycleC. A value is used before it is producedD. Both A and BE. Both A and C Neg edge!
![Page 34: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/34.jpg)
Data Hazards• When a result is needed in the pipeline before it is
available, a “data hazard” occurs.
IM Reg
AL
U DM Reg
IM Reg
AL
U DM
IM Reg
AL
U DM Reg
IM Reg A
LU DM Reg
IM Reg
AL
U DM Reg
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
R2 AvailableR2 Available
R2 NeededR2 Needed
Use red and black lines!
![Page 35: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/35.jpg)
Hazards continued
• What happens when...add $3, $10, $11
lw $8, 1000($3)
sub $11, $8, $7
Draw dependencies
![Page 36: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/36.jpg)
The Pipeline in Executionlw $8, 1000($3) add $3, $10, $11 Execute/
Address CalculationMemory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 37: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/37.jpg)
The Pipeline in Executionsub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Memory Access Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 38: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/38.jpg)
The Pipeline in Executionadd $10, $1, $2 sub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Write Back
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Datamemory
Address
![Page 39: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/39.jpg)
Pipelining Key Points
• ET = IC * CPI * CT
• We achieve high throughput without reducing instruction latency.
• Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles).
• Pipelining uses combinational logic to generate (and registers to propagate) control signals.
• Pipelining creates potential hazards.
![Page 40: Designing a Pipelined CPU](https://reader035.vdocuments.net/reader035/viewer/2022062421/56812af9550346895d8edfc2/html5/thumbnails/40.jpg)
Summary comparison (MIPS designs)
CPI CT
Single Cycle
Multi-cycle
Pipeline