
Page 1: ECE 341 Lecture # 12

ECE 341

Lecture # 12

Instructor: Zeshan Chishti

[email protected]

November 10, 2014

Portland State University

Page 2: ECE 341 Lecture # 12

Lecture Topics

• Basic Processing Unit

– Control Signals

– Hardwired Control

• Datapath control signals

• Dealing with memory delay

• Pipelining

– Basic Concept

– Pipeline Organization

• References:

– Chapter 5: Sections 5.5 and 5.6 (pages 172-178 of the textbook)

– Chapter 6: Sections 6.1 and 6.2 (pages 194-196 of the textbook)

Page 3: ECE 341 Lecture # 12

Control Signals

• Control signals govern the operation of a processor’s components

• Control circuitry examines the instruction in IR and generates the control signals needed to execute the instruction

• Examples of decisions made by control signals:

– Which registers (if any) are enabled for writing?

– Which input is selected by a multiplexer?

– What operation is performed by the ALU?

• Some control signals depend only on instruction type, while others depend on both the instruction type and current processing step

Page 4: ECE 341 Lecture # 12

Control Signals for Datapath

Inter-stage registers transfer data from one stage to the next in every cycle => no need for a control signal, since these registers are always enabled

Need control signals for register file, multiplexers and ALU

[Figure: datapath with its control signals. The register file's address inputs come from the IR fields (IR31-27, IR26-22, IR21-17) and LINK via MuxC (C_select), and writes are enabled by RF_write. MuxB (B_select) selects between register RB and the immediate value, the ALU operation is selected by ALU_op and produces the condition signals, and MuxY (Y_select) selects among the ALU result in RZ, the memory data, and the return address. RA, RB, RZ, RM and RY are the inter-stage registers.]

Page 5: ECE 341 Lecture # 12

Control Signals for Memory Interface and IR

[Figure: memory interface and IR with their control signals. MuxMA (MA_select) selects RZ or PC as the memory address, MEM_read and MEM_write initiate memory operations, and MFC signals their completion. IR_enable loads the instruction register, and the Extend block extends the immediate field before it goes to MuxB and MuxINC. The processor-memory interface connects to the cache and main memory.]

Page 6: ECE 341 Lecture # 12

Control Signals for Instruction Address Generator

[Figure: instruction address generator. MuxINC (INC_select) selects either the constant 4 or the immediate branch offset as the adder input, MuxPC (PC_select) selects either the adder output or register RA as the next PC value, and PC_enable controls updates to the PC. PC-Temp holds the return address passed to MuxY.]

Page 7: ECE 341 Lecture # 12

Hardwired Control

• How does the processor ensure that the control signals needed to execute an instruction are generated in the correct sequence and at the right time?

• Two basic approaches: (i) Hardwired control, (ii) Microprogrammed control

• The control signals depend on the current processing step for an instruction

• Question: How do we keep track of the current step of an instruction?

• Answer: use a step counter

• Question: What other factors do the control signals depend on?

• Answer: – Contents of instruction register

– The result of a computation (or comparison) operation

– External input signals, such as interrupt requests

Page 8: ECE 341 Lecture # 12

Generation of Control Signals

[Figure: generation of control signals. The instruction decoder sets one of the INSi outputs based on the instruction type, and the step decoder sets one of its outputs based on the current processing step.]
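As a rough illustration (not the textbook's circuit), the step decoder's one-hot behavior can be sketched in code; the function name and Python representation below are assumptions for illustration only.

def step_decoder(step, num_steps=5):
    # One-hot outputs T1..Tn: exactly one is True, matching the current step
    return [step == i for i in range(1, num_steps + 1)]

# Example: during step 3, only T3 is asserted
T1, T2, T3, T4, T5 = step_decoder(3)   # -> False, False, True, False, False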

Page 9: ECE 341 Lecture # 12

Generation of Control Signals (cont.)

• Example: Consider the fetch stage (stage 1) of the five-stage hardware

• Step counter asserts the signal T1

• Control circuitry:

– sets the MA_select signal to 1 to select the PC contents as the memory address

– activates MEM_read to initiate a memory read operation

– activates IR_enable to load the data returned from memory into IR, once MFC is asserted

– sets INC_select to 0, PC_select to 1, and asserts PC_enable to increment PC by 4 at the end of step T1 (see the sketch after this list)
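A minimal sketch of these fetch-step settings, assuming a Python dictionary stands in for the actual wires (signal names follow the slides; the function itself is illustrative, not the textbook's implementation):

def fetch_step_signals(MFC):
    # Control settings while the step counter asserts T1 (instruction fetch)
    return {
        "MA_select": 1,       # select PC contents as the memory address
        "MEM_read": 1,        # initiate a memory read
        "IR_enable": MFC,     # load IR only once MFC is asserted
        "INC_select": 0,      # choose the constant 4 as the increment
        "PC_select": 1,       # route the incremented address back to PC
        "PC_enable": MFC,     # update PC by 4 only when the fetch completes
    }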

Page 10: ECE 341 Lecture # 12

Datapath Control Signals

• Setting of control signals can be determined by examining the actions taken in each execution step of every instruction

• Example 1: RF_write signal is set to 1 in step T5 during an instruction that writes data into the register file:

RF_write = T5.(ALU + Load + Call)

where ALU, Load and Call stand for arithmetic/logic instructions, load instructions and subroutine call instructions respectively

– RF_write is a function of both the timing and instruction signals

• Example 2: The multiplexer control signal B_select is a function of the instruction only and does not need to change from one timing step to the next (see the sketch below)

B_select = Immediate

where Immediate stands for all instructions that use an immediate operand
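A minimal sketch of these two control equations as Boolean functions; the timing and instruction-class signals (T5, ALU, Load, Call, Immediate) are assumed to be produced elsewhere by the step counter and instruction decoder:

def rf_write(T5, ALU, Load, Call):
    # RF_write = T5 . (ALU + Load + Call)
    return T5 and (ALU or Load or Call)

def b_select(Immediate):
    # B_select = Immediate (depends only on the instruction, not the step)
    return Immediate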

Page 11: ECE 341 Lecture # 12

Dealing with Memory Delay

• The step counter is usually incremented at the end of every clock cycle

• However, a step in which a MEM_read or MEM_write command is issued does not end until the MFC signal is asserted

• Step counter should not be incremented until the MFC signal is asserted

• Counter_enable signal controls whether the step counter is incremented

• Let WMFC be a control signal that represents the need to wait for memory – WMFC is activated only in those steps in which the Wait_for_MFC command is issued

Counter_enable = NOT(WMFC) + MFC

• We must also ensure that the PC is incremented only once when the instruction fetch step is extended over more than one clock cycle – the PC_enable signal controls whether the PC is incremented

PC_enable = T1.MFC + T3.BR

where BR stands for all the branch instructions
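A minimal sketch of the two equations above, again assuming the input signals are produced elsewhere by the control circuitry:

def counter_enable(WMFC, MFC):
    # Counter_enable = NOT(WMFC) + MFC: hold the step counter only while
    # waiting for memory and MFC has not yet arrived
    return (not WMFC) or MFC

def pc_enable(T1, T3, MFC, BR):
    # PC_enable = T1 . MFC + T3 . BR: increment PC once per fetch (when the
    # memory access completes) and in step T3 for branch instructions
    return (T1 and MFC) or (T3 and BR)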

Page 12: ECE 341 Lecture # 12

Pipelining

Page 13: ECE 341 Lecture # 12

Overview

• So far, we have assumed that only one instruction is being processed by the multi-stage hardware at any point in time

• How do we decrease the execution time of a program?

• One possibility is to use faster circuits to implement the processor

– This approach will decrease the execution time of each instruction

• Another possibility is to arrange the processor hardware in such a way that multiple instructions can be processed at the same time. This approach is called pipelining

Pipelining does not change the time needed to perform a single instruction, but it increases the number of instructions performed per second (the instruction completion rate, or throughput)

Page 14: ECE 341 Lecture # 12

Traditional Pipelining Concepts

• Laundry Example

• Four loads of laundry need to be washed, dried and folded

• Washer takes 30 minutes

• Dryer takes 40 minutes

• “Folder” takes 20 minutes

(The four loads are labeled A, B, C and D.)

Page 15: ECE 341 Lecture # 12

Traditional Pipelining Concepts (cont.)

• Sequential laundry takes 6 hours for 4 loads

• If pipelining is used, how long would laundry take?

[Figure: sequential laundry timeline, 6 PM to midnight. Loads A–D each pass through the washer (30 min), dryer (40 min) and folder (20 min) one after another.]

Page 16: ECE 341 Lecture # 12

Traditional Pipelining Concepts (cont.)

• Pipelined laundry takes 3.5 hours for 4 loads

[Figure: pipelined laundry timeline (loads A–D in task order vs. time, 6 PM to midnight). After the first 30-minute wash, the 40-minute dryer stage paces the pipeline, followed by a final 20-minute fold.]
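Checking the numbers: sequentially, each load needs 30 + 40 + 20 = 90 minutes, so four loads take 4 * 90 = 360 minutes (6 hours). Pipelined, the 40-minute dryer sets the pace after the first wash, so the four loads take 30 + 4 * 40 + 20 = 210 minutes (3.5 hours).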

Page 17: ECE 341 Lecture # 12

Traditional Pipelining Concepts (cont.)

• Pipelining doesn’t reduce the time taken by an individual task; it improves the throughput of the entire workload

• Task completion rate limited by slowest pipeline stage

• Potential speedup = Number of pipeline stages

• Unbalanced pipeline stage lengths reduce speedup

• Time to “fill” the pipeline and time to “drain” it further reduce speedup

[Figure: the pipelined laundry timeline repeated, with stage occupancy of 30, 40, 40, 40, 40 and 20 minutes.]

Page 18: ECE 341 Lecture # 12

Pipelining in a 5-stage Processor

At any given time, a different instruction is being processed by each pipeline stage. How do we ensure that each stage has the correct inputs that it needs to process a particular instruction?

[Figure: instruction flow in the 5-stage pipeline]

Clock cycle:  1      2       3        4        5          6          7
Ij:           Fetch  Decode  Compute  Memory   Writeback
Ij+1:                Fetch   Decode   Compute  Memory     Writeback
Ij+2:                        Fetch    Decode   Compute    Memory     Writeback

Page 19: ECE 341 Lecture # 12

Information needed by an instruction is carried through the pipeline as the instruction proceeds from one stage to the next. This information is held in inter-stage buffers. Read the details of each inter-stage buffer in Section 6.2.

[Figure: pipeline stages with inter-stage buffers holding the information needed to process each instruction.]

Page 20: ECE 341 Lecture # 12

Pipeline Performance

• Example: A program consisting of 500 instructions is executed on a 5-stage processor. How many cycles would be required to complete the program, (i) without pipelining, (ii) with pipelining? Assume ideal overlap in case of pipelining.

• Solution:

Without pipelining: Each instruction will require 5 cycles. There will be no overlap amongst successive instructions.

Number of cycles = 500 * 5 = 2500

With pipelining: Each pipeline stage will process a different instruction every cycle. The first instruction will complete in 5 cycles; after that, one instruction will complete every cycle, due to ideal overlap.

Number of cycles = 5 + ((500-1)*1) = 504

• Speedup for ideal pipelining = 2500/504 = 4.96 (or approx. 5)
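A quick check of these counts in code, assuming an ideal k-stage pipeline in which the first instruction takes k cycles and each later instruction completes one cycle after its predecessor (the function names are illustrative):

def cycles_without_pipelining(n, k):
    return n * k               # every instruction takes all k cycles

def cycles_with_pipelining(n, k):
    return k + (n - 1)         # fill the pipeline once, then one per cycle

n, k = 500, 5
print(cycles_without_pipelining(n, k))    # 2500
print(cycles_with_pipelining(n, k))       # 504
print(cycles_without_pipelining(n, k) / cycles_with_pipelining(n, k))   # ~4.96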

Page 21: ECE 341 Lecture # 12

Pipeline Performance (cont.)

• The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages

• However, this increase would be achieved only if

– all pipeline stages require the same time to complete, and

– there is no interruption throughout program execution

• Unfortunately, this is not true

– there are times when an instruction cannot proceed from one stage to the next in every clock cycle

Page 22: ECE 341 Lecture # 12

Pipeline Performance (cont.)

[Figure: pipeline stall example]

Clock cycle:  1   2   3   4   5   6   7   8   9
Ij:           F   D   C   M   W
Ij+1:             F   D   D   D   C   M   W
Ij+2:                 F   F   F   D   C   M   W

• Assume that instruction Ij+1 is stalled in the decode stage for two extra cycles

• This will cause Ij+2 to be stalled in the fetch stage, until Ij+1 proceeds

• New instructions cannot enter the pipeline until Ij+2 proceeds past the fetch stage after cycle 5 => execution time increases by two cycles

• Assume that Instruction Ij+1 is stalled in the decode stage for two extra cycles • This will cause Ij+2 to be stalled in the fetch stage, until Ij+1 proceeds • New instructions cannot enter the pipeline until Ij+2 proceeds past the fetch stage after cycle 5 => execution time increases by two cycles