Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
ECE232: Hardware Organization and Design
Lecture 13: Pipelining
ECE232: Pipelining 2
Overview
§ Single-cycle MIPS datapath presented so far
§ Not very efficient: components of the datapath can be used more efficiently
§ Idea!
• Put registers between stages of the datapath
• Clock used to update register values
• All stages perform an operation on every clock cycle
§ Pipelined datapath: the basis for almost all modern microprocessors!
Speeding up through pipelining
§ Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 30 minutes
• “Folder” takes 30 minutes
• “Stasher” takes 30 minutes to put clothes into drawers
Sequential Laundry
§ Sequential laundry takes 8 hours for 4 loads
[Figure: task order (A, B, C, D) vs. time, 6 PM to 2 AM, in 30-minute steps; each load finishes all four steps before the next load begins]
§ If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
§ Pipelined laundry takes 3.5 hours for 4 loads!
[Figure: task order (A, B, C, D) vs. time starting at 6 PM, in 30-minute steps; each load starts as soon as the washer is free]
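The two laundry totals can be checked with a short calculation. This is a minimal sketch that assumes every step takes exactly 30 minutes and that a new load can start as soon as the previous one leaves the washer; the function names are illustrative and not part of the lecture.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load needs all stages; each later load adds one more step."""
    return (stages + loads - 1) * stage_minutes


# Values from the laundry example: 4 loads, 4 steps, 30 minutes per step.
LOADS, STAGES, STAGE_MINUTES = 4, 4, 30

seq = sequential_minutes(LOADS, STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(LOADS, STAGES, STAGE_MINUTES)

print(seq / 60, "hours sequential")  # 8.0 hours sequential
print(pipe / 60, "hours pipelined")  # 3.5 hours pipelined
```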
Pipelining Lessons
§ Pipelining doesn’t help latency of single task, it helps throughput of entire workload
§ Multiple tasks operating simultaneously using different resources
§ Potential speedup = number of pipe stages
§ Pipeline rate is limited by the slowest pipeline stage
§ Unbalanced lengths of pipe stages reduce speedup
§ Time to “fill” the pipeline and time to “drain” it reduce speedup
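The effect of fill and drain time on speedup can be sketched numerically. The example below assumes a perfectly balanced pipeline (every stage takes one time step) and uses the four laundry stages; `speedup` is a hypothetical helper, not something defined in the lecture.

```python
def speedup(tasks, stages):
    """Speedup of a balanced pipeline over doing the tasks one at a time.

    Sequential time is tasks * stages steps. Pipelined time is
    stages + tasks - 1 steps: fill the pipe once, then finish one task per step.
    """
    return (tasks * stages) / (stages + tasks - 1)


STAGES = 4  # washer, dryer, folder, stasher

# With few tasks, fill/drain time dominates; with many tasks the speedup
# approaches (but never reaches) the number of stages.
for tasks in (1, 4, 40, 400):
    print(tasks, "tasks:", round(speedup(tasks, STAGES), 2))
```

With a single task there is no speedup at all, which matches the first lesson: pipelining improves throughput, not the latency of one task.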
MIPS Datapath
§ Datapath contains 5 stages
§ Instruction fetch (IF), Decode (ID), Execute (EX), Memory (Mem), Writeback (W)
[Figure: datapath divided into Stage 1 (IF): PC and Instruction Memory; Stage 2 (ID): Registers; Stage 3 (EX): ALU; Stage 4 (Mem): Data Memory; Stage 5 (W)]
§ Can I pipeline the MIPS stages?
Pipelining Instructions
[Figure: pipeline diagram, instructions vs. time (in cycles); six instructions, each passing through IF, ID, EX, M, W and each starting one cycle after the previous one]
Fetch = 200 ps, Decode = 100 ps, Execute = 200 ps, Memory = 200 ps, Write back = 100 ps
What is the latency for this pipeline?
Pipeline Performance
Single-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
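The two clock periods follow from the stage times given earlier (200, 100, 200, 200, 100 ps). The sketch below assumes the single-cycle clock must cover all five stages, that the pipelined clock is set by the slowest stage, and that there are no hazards or pipeline-register overheads; `total_time_ps` is an illustrative helper.

```python
# Stage times from the lecture, in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Single-cycle: one long clock period that covers every stage.
single_cycle_tc = sum(STAGE_PS.values())

# Pipelined: every stage gets one clock period, set by the slowest stage.
pipelined_tc = max(STAGE_PS.values())

# Latency of one instruction in the pipeline: five periods of pipelined_tc.
pipelined_latency = len(STAGE_PS) * pipelined_tc


def total_time_ps(n, pipelined):
    """Time to run n instructions, assuming no stalls."""
    if pipelined:
        return (len(STAGE_PS) + n - 1) * pipelined_tc
    return n * single_cycle_tc


print(single_cycle_tc, pipelined_tc, pipelined_latency)  # 800 200 1000
print(total_time_ps(3, False), total_time_ps(3, True))   # 2400 1400
```

Note that the latency of a single instruction gets worse (1000 ps instead of 800 ps) because the 100 ps stages are padded to 200 ps, while throughput for long instruction streams approaches 800 / 200 = 4 times the single-cycle rate rather than 5 times.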
Why Pipeline? Because the resources are there!
[Figure: pipeline diagram, instruction order (Inst 1 to Inst 5) vs. time (clock cycles); each instruction passes through Im, Reg, ALU, Dm, Reg]
MIPS Pipelined Datapath
§ State registers between pipeline stages to isolate them
[Figure: pipelined datapath driven by the system clock, with PC, Instruction Memory, adders, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, and Data Memory, separated by the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB]
IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack
Pipeline Hazards
§ Data hazards: an instruction uses the result of a previous instruction (RAW)
ADD R1, R2, R3
SUB R4, R1, R5
or
SW R1, 4(R2)
LW R3, 4(R2)
§ Control hazards: the address of the next instruction to be executed depends on a previous instruction
BEQ R1,R2,CONT
SUB R6,R7,R8
…
CONT: ADD R3,R4,R5
§ Structural hazards: two instructions need access to the same resource
• e.g., single memory shared for instruction fetch and load/store
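A register RAW dependence like the ADD/SUB pair above can be found mechanically by comparing each instruction's destination with the sources of the instructions that follow it. The sketch below is only an illustration: the `Instr` record, the `raw_hazards` function, and the default window of two following instructions are assumptions, and it looks at registers only (the SW/LW pair depends through memory and is not covered).

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str             # assembly text, for display only
    dest: Optional[str]   # register written, or None
    srcs: Tuple[str, ...]  # registers read


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    window is how many following instructions are checked; 2 is an
    assumed value for a 5-stage pipeline without forwarding.
    """
    found = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                found.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # register overwritten; later reads see the new value
    return found


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```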
Structural Hazard
[Figure: pipeline diagram, instruction order (lw, Inst 1 to Inst 4) vs. time (clock cycles); each instruction passes through Mem, Reg, ALU, Mem, Reg using a single memory. In one cycle, lw is reading data from memory while a later instruction is being read from memory]
§ Fix with separate instruction and data memories (I$ and D$)
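The conflict in the diagram can be reproduced by listing which cycle each instruction uses the single memory. This is a rough sketch assuming one instruction is fetched per cycle with no stalls, fetch is stage 1, and data memory access is stage 4; `memory_conflicts` and the mnemonic list are illustrative only.

```python
MEM_OPS = {"lw", "sw"}  # instructions that access data memory


def memory_conflicts(ops):
    """Find cycles where a single shared memory is needed twice.

    ops: mnemonics in program order, one issued per cycle starting at cycle 1.
    Returns (cycle, data_access_index, fetch_index) triples.
    """
    # Instruction i (0-based) is fetched in cycle i + 1 (IF is stage 1).
    fetch_cycle = {i + 1: i for i in range(len(ops))}
    conflicts = []
    for i, op in enumerate(ops):
        if op in MEM_OPS:
            mem_cycle = i + 4  # MEM is stage 4 for instruction i
            if mem_cycle in fetch_cycle:
                conflicts.append((mem_cycle, i, fetch_cycle[mem_cycle]))
    return conflicts


program = ["lw", "inst1", "inst2", "inst3", "inst4"]
print(memory_conflicts(program))  # [(4, 0, 3)]
```

Here the lw (index 0) reads data in cycle 4, the same cycle in which the instruction at index 3 is fetched. With separate instruction and data memories the two accesses no longer compete.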
Data Hazards
[Figure: pipeline diagram, instructions vs. time (in cycles), stages F D EX M W]
ADD R1, R2, R3 (write data to R1 here)
SUB R4, R1, R5 (get data from R1 here)
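Without forwarding, the consumer has to wait in decode until the producer has written the register file. The sketch below counts the stall cycles under two stated assumptions: the producer writes in stage 5 (W) and the consumer reads in stage 2 (D), and, by default, the register file is written in the first half of a cycle and read in the second half. The names are hypothetical.

```python
WB_STAGE = 5  # producer writes the register file in stage 5 (W)
ID_STAGE = 2  # consumer reads the register file in stage 2 (D)


def stalls_without_forwarding(distance, same_cycle_write_then_read=True):
    """Stall cycles for a consumer issued `distance` instructions after
    the producer (distance 1 means back-to-back, as in ADD then SUB).

    The producer is fetched in cycle 1, so its stage s falls in cycle s.
    The consumer is fetched in cycle 1 + distance, so its stage s falls
    in cycle distance + s if it never stalls.
    """
    write_cycle = WB_STAGE
    read_cycle = distance + ID_STAGE
    # Assumption: a value written in a cycle can be read in that same cycle
    # (write first half, read second half). Otherwise wait one more cycle.
    earliest_read = write_cycle if same_cycle_write_then_read else write_cycle + 1
    return max(0, earliest_read - read_cycle)


for d in (1, 2, 3):
    print("distance", d, "->", stalls_without_forwarding(d), "stall cycles")
```

For the ADD/SUB pair (distance 1) this gives two stall cycles under the split-cycle register file assumption, and three without it.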
Additional Way to “Fix” a Data Hazard
[Figure: pipeline diagram, instruction order vs. time; each instruction passes through IM, Reg, ALU, DM, Reg. Annotation: "by forwarding"]
add $1,…
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Internal data forwarding
[Figure: pipeline diagram, instruction order vs. time; each instruction passes through IM, Reg, ALU, DM, Reg]
add $1,…
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
Fix data hazards by forwarding results to where they are needed
ALU-to-ALU forwarding vs. full forwarding
Forwarding with Load-use Data Hazards
[Figure: pipeline diagram, instruction order vs. time; each instruction passes through IM, Reg, ALU, DM, Reg]
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
§ sub needs to stall
§ Will still need one stall cycle even with forwarding
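The one remaining stall cycle can be derived from when the value exists and when it is needed. This sketch assumes full forwarding, that an ALU result is ready at the end of stage 3 (EX), that a loaded value is ready at the end of stage 4 (MEM), and that the consumer needs its operand at the start of its own EX stage; the function and its parameters are illustrative.

```python
EX_STAGE = 3   # ALU results are ready at the end of EX
MEM_STAGE = 4  # loaded data is ready at the end of MEM


def stalls_with_forwarding(producer_op, distance):
    """Stall cycles for a consumer `distance` instructions after the producer.

    The producer is fetched in cycle 1, so its stage s falls in cycle s.
    The consumer's EX stage falls in cycle distance + EX_STAGE if it
    never stalls.
    """
    ready_stage = MEM_STAGE if producer_op == "lw" else EX_STAGE
    ready_cycle = ready_stage
    need_cycle = distance + EX_STAGE
    # A value produced at the end of ready_cycle is usable from the next cycle.
    return max(0, ready_cycle + 1 - need_cycle)


print(stalls_with_forwarding("add", 1))  # 0: ALU-to-ALU forwarding is enough
print(stalls_with_forwarding("lw", 1))   # 1: the load-use case above
print(stalls_with_forwarding("lw", 2))   # 0: one instruction in between hides it
```

Once sub has stalled for one cycle, the instructions behind it are delayed by the same cycle, so under these assumptions they need no extra stalls of their own.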
Control Hazard
[Figure: pipeline diagram, instructions vs. time (in cycles), stages F D EX M W]
JR R25 (destination available here)
...
XX: ADD ... (need destination here)
Simple solution: Flush instruction fetch until branch resolved
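The cost of waiting for the destination can be estimated with a small model. Everything numeric here is an assumption for illustration: the lecture does not say in which stage the destination becomes available, so `resolve_stage` is a parameter, and the 20% branch fraction in the usage lines is a made-up value, not a measurement.

```python
def fetch_bubbles(resolve_stage):
    """Wasted fetch cycles after a branch or jump when fetch waits.

    resolve_stage is the 1-indexed stage at whose end the destination is
    known. The branch is fetched in cycle 1, so the next useful fetch is
    in cycle resolve_stage + 1 instead of cycle 2.
    """
    return resolve_stage - 1


def effective_cpi(branch_fraction, resolve_stage, base_cpi=1.0):
    """Average cycles per instruction when every branch pays the bubbles."""
    return base_cpi + branch_fraction * fetch_bubbles(resolve_stage)


# Hypothetical workload: 20% of instructions are branches or jumps.
for stage in (2, 3, 4):
    print("resolved in stage", stage, "->",
          fetch_bubbles(stage), "bubbles, CPI",
          round(effective_cpi(0.2, stage), 2))
```

Resolving the destination in an earlier stage directly reduces the number of bubbles, which is why the stage in which branches are resolved matters for performance.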
Summary
§ Pipelined processors are fundamental.
• Spend the time to understand why pipelining is important
§ The use of pipelining greatly improves microprocessor performance
• The “clock” for microprocessors is about 3 GHz today
§ Hazards can be a difficult concept
• Convince yourself with examples
• Next time: Control hazards!