eecc550 - shaaban #1 lec # 4 winter 2003 12-16-2003 cpu organization datapath design:...

55
EECC550 - Shaaban EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2 CPU Organization CPU Organization Datapath Design: Capabilities & performance characteristics of principal Functional Units (FUs): (e.g., Registers, ALU, Shifters, Logic Units, ...) Ways in which these components are interconnected (buses connections, multiplexors, etc.). How information flows between components. Control Unit Design: Logic and means by which such information flow is controlled. Control and coordination of FUs operation to realize the targeted Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram). Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).

Post on 20-Dec-2015

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#1 Lec # 4 Winter 2003 12-16-2003

CPU OrganizationCPU Organization• Datapath Design:

– Capabilities & performance characteristics of principal Functional Units (FUs):

– (e.g., Registers, ALU, Shifters, Logic Units, ...)– Ways in which these components are interconnected (buses

connections, multiplexors, etc.).– How information flows between components.

• Control Unit Design:– Logic and means by which such information flow is controlled.– Control and coordination of FUs operation to realize the targeted

Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram).

• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).

Page 2: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#2 Lec # 4 Winter 2003 12-16-2003

Major CPU Design StepsMajor CPU Design Steps1 Using independent RTN, write the micro-operations

required for target ISA instructions.

2 Construct the datapath required by the micro-operations identified in step 1.

3 Identify and define the function of all control signals needed by the datapath.

3 Control unit design, based on micro-operation timing and control signals identified:- Combinational logic: For single cycle CPU.- Hard-Wired: Finite-state machine implementation.- Microprogrammed.

(Chapter 5.1 - 5.3)

Page 3: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#3 Lec # 4 Winter 2003 12-16-2003

CPU Design & Implantation ProcessCPU Design & Implantation Process• Top-down Design:

– Specify component behavior from high-level requirements (ISA).

• Bottom-up Design:– Assemble components in target technology to establish critical timing

(hardware delays).

• Iterative refinement:– Establish a partial solution, expand and improve.

Datapath Control

ProcessorInstruction SetArchitecture (ISA):ProvidesRequirements

Reg. File Mux ALU Reg Mem Decoder Sequencer

Cells Gates

Page 4: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#4 Lec # 4 Winter 2003 12-16-2003

Datapath Design StepsDatapath Design Steps• Write the micro-operation sequences required for a number of

representative target ISA instructions using independent RTN.

• Independent RTN statements specify: the required datapath components and how they are connected.

• From the above, create an initial datapath by determining possible destinations for each data source (i.e registers, ALU).– This establishes connectivity requirements (data paths, or connections)

for datapath components.– Whenever multiple sources are connected to a single input, a

multiplexor of appropriate size is added.

• Find the worst-time propagation delay in the datapath to determine the datapath clock cycle (CPU clock cycle).

• Complete the micro-operation sequences for all remaining instructions adding datapath components + connections/multiplexors as needed.

Page 5: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#5 Lec # 4 Winter 2003 12-16-2003

MIPS Instruction FormatsMIPS Instruction Formats

• op: Opcode, operation of the instruction.

• rs, rt, rd: The source and destination register specifiers.

• shamt: Shift amount.

• funct: Selects the variant of the operation in the “op” field.

• address / immediate: Address offset or immediate value.

• target address: Target address of the jump instruction.

op target address

02631

6 bits 26 bits

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

R-Type

I-Type: ALULoad/Store, Branch

J-Type: Jumps

Page 6: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#6 Lec # 4 Winter 2003 12-16-2003

MIPS R-Type (ALU) Instruction FieldsMIPS R-Type (ALU) Instruction Fields

• op: Opcode, basic operation of the instruction. – For R-Type op = 0

• rs: The first register source operand.• rt: The second register source operand.• rd: The register destination operand.• shamt: Shift amount used in constant shift operations.• funct: Function, selects the specific variant of operation in the op

field.

OP rs rt rd shamt funct

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

R-Type: All ALU instructions that use three registers

add $1,$2,$3

sub $1,$2,$3

and $1,$2,$3or $1,$2,$3

Examples:

Destination register in rd Operand register in rt

Operand register in rs

Page 7: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#7 Lec # 4 Winter 2003 12-16-2003

MIPS ALU I-Type Instruction FieldsMIPS ALU I-Type Instruction FieldsI-Type ALU instructions that use two registers and an immediate value Loads/stores, conditional branches.

• op: Opcode, operation of the instruction.

• rs: The register source operand.

• rt: The result destination register.

• immediate: Constant second operand for ALU instruction.

OP rs rt immediate

6 bits 5 bits 5 bits 16 bits

add immediate: addi $1,$2,100

and immediate andi $1,$2,10

Examples:

Result register in rtSource operand register in rs

Constant operand in immediate

Page 8: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#8 Lec # 4 Winter 2003 12-16-2003

MIPS Load/Store I-Type Instruction FieldsMIPS Load/Store I-Type Instruction Fields

• op: Opcode, operation of the instruction.– For load op = 35, for store op = 43.

• rs: The register containing memory base address. • rt: For loads, the destination register. For stores, the source

register of value to be stored. • address: 16-bit memory address offset in bytes added to base

register.

OP rs rt address

6 bits 5 bits 5 bits 16 bits

Store word: sw 500($4), $3

Load word: lw $1, 30($2)

Examples:

Offset base register in rs

source register in rt

Destination register in rt Offsetbase register in rs

Signed addressoffset in bytes

Page 9: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#9 Lec # 4 Winter 2003 12-16-2003

MIPS Branch I-Type Instruction FieldsMIPS Branch I-Type Instruction Fields

• op: Opcode, operation of the instruction.

• rs: The first register being compared

• rt: The second register being compared.

• address: 16-bit memory address branch target offset in words added to PC to form branch address.

OP rs rt address

6 bits 5 bits 5 bits 16 bits

Branch on equal beq $1,$2,100

Branch on not equal bne $1,$2,100

Examples:

Register in rsRegister in rt offset in bytes equal to

instruction field address x 4

Signed addressoffset in words

Page 10: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#10 Lec # 4 Winter 2003 12-16-2003

MIPS J-Type Instruction FieldsMIPS J-Type Instruction Fields

• op: Opcode, operation of the instruction.– Jump j op = 2– Jump and link jal op = 3

• jump target: jump memory address in words.

J-Type: Include jump j, jump and link jal

OP jump target

6 bits 26 bits

Branch on equal j 10000

Branch on not equal jal 10000

Examples:

Jump memory address in bytes equal toinstruction field jump target x 4

jump target = 2500

4 bits 26 bits 2 bits

0 0PC(31-28)

Effective 32-bit jump address: PC(31-28),jump_target,00

FromPC+4

Page 11: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#11 Lec # 4 Winter 2003 12-16-2003

A Subset of MIPS InstructionsA Subset of MIPS InstructionsADD and SUB:

addU rd, rs, rt

subU rd, rs, rt

OR Immediate:

ori rt, rs, imm16

LOAD and STORE Word

lw rt, rs, imm16

sw rt, rs, imm16

BRANCH:

beq rs, rt, imm16

op rs rt rd shamt funct

061116212631

6 bits 6 bits5 bits5 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

Page 12: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#12 Lec # 4 Winter 2003 12-16-2003

Basic Instruction Processing StepsBasic Instruction Processing Steps

Obtain instruction from program storage

Determine instruction type

Obtain operands from registers

Compute result value or status

Store result in register/memory if needed

(usually called Write Back).

Update program counter to address

of next instruction } Commonsteps for all instructions

Instruction

Fetch

Instruction

Decode

Execute

Result

Store

Next

Instruction

Instruction Mem[PC]

PC PC + 4

Done by Control Unit

Page 13: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#13 Lec # 4 Winter 2003 12-16-2003

Overview of MIPS Instruction Micro-operationsOverview of MIPS Instruction Micro-operations• All instructions go through these common steps:

– Send program counter to instruction memory and fetch the instruction. (fetch) Instruction Mem[PC]

– Update the program counter to point to next instruction PC PC + 4– Read one or two registers, using instruction fields. (decode)

• Load reads one register only.

• Additional instruction execution actions (execution) depend on the instruction in question, but similarities exist:– All instruction classes use the ALU after reading the registers:

• Memory reference instructions use it for address calculation.• Arithmetic and logic instructions (R-Type), use it for the specified

operation.• Branches use it for comparison.

• Additional execution steps where instruction classes differ:– Memory reference instructions: Access memory for a load or store.– Arithmetic and logic instructions: Write ALU result back in register.– Branch instructions: Change next instruction address based on comparison.

Page 14: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#14 Lec # 4 Winter 2003 12-16-2003

A Single Cycle MIPS CPU ImplementationA Single Cycle MIPS CPU Implementation

Design target: A single-cycle per instruction MIPS CPU implementation

All micro-operations of an instruction are to be carried out in a single system clock cycle. Cycles Per Instruction = CPI = 1

CPI = 1

Page 15: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#15 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

R[rd] R[rs] + R[rt] Add register rs to register rt result in register rd

R-Type Example:R-Type Example:Micro-Operation Sequence For ADDUMicro-Operation Sequence For ADDU

OP rs rt rd shamt funct

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

addU rd, rs, rt

Independent RTN ?

Page 16: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#16 Lec # 4 Winter 2003 12-16-2003

Initial Datapath ComponentsInitial Datapath Components

Two state elements (memory) needed to store and access instructions:1 Instruction memory: • Only read access (by user code). No read control signal.

2 Program counter (PC): 32-bit register.• Written at end of every clock cycle: No write control signal.

3 32-bit Adder: To compute the the next instruction address.

InstructionWord

32 3232

32

3232

32

Three components needed by: Instruction Fetch: Instruction Mem[PC]

Program Counter Update: PC PC + 4

Page 17: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#17 Lec # 4 Winter 2003 12-16-2003

More Datapath ComponentsMore Datapath Components

Register File:

• Contains all ISA registers.• Two read ports and one write port.• Register writes by asserting write control signal• Writes are edge-triggered.• Can read and write to the same register in the same clock cycle.

ISA Register File Main 32-bit ALU

32

32

32

32

32

32-bit Arithmetic and Logic Unit (ALU)

Page 18: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#18 Lec # 4 Winter 2003 12-16-2003

• Register File consists of 32 registers:– Two 32-bit output busses: busA and busB– One 32-bit input bus: busW

• Register is selected by:– RA (number) selects the register to put on busA (data):

busA = R[RA]– RB (number) selects the register to put on busB (data):

busB = R[RB]– RW (number) selects the register to be written

via busW (data) when Write Enable is 1Write Enable: R[RW] busW

• Clock input (CLK) – The CLK input is a factor ONLY during write operations.– During read operation, it behaves as a combinational logic block:

• RA or RB valid => busA or busB valid after “access time.”

Register File DetailsRegister File Details

Clk

busW

Write Enable

3232

busA

32busB

5 5 5

RW RA RB

32 32-bitRegisters

Page 19: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#19 Lec # 4 Winter 2003 12-16-2003

A Possible Register File Implementation

5-to-32Decoder

Register 0Write

Data In

DataOut

Register 1Write

Data In

DataOut

Register 30Write

Data In

DataOut

Register 31Write

Data In

DataOut

......

32-to-1 MUX

01

3031

32...

5

32-to-1 MUX

01

3031

32...

5

.

..

.

32

32

32

32

.

.

.

...

0

1

3031

5

.

.

.

RegisterRead Data 1(Bus A)

RegisterRead Data 2(Bus B)

Read Register 1 (RA)

Read Register 2 (RB)

Register Write Data (Bus W)

32

Register Write Enable (RegWrite)

WriteRegister RW

Clk

busW

Write Enable

3232

busA

32

busB

5 5 5RW RA RB

32 32-bitRegisters

(Appendix B - The Basics of Logic Design)

Page 20: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#20 Lec # 4 Winter 2003 12-16-2003

Idealized MemoryIdealized Memory

• Memory (idealized)

– One input bus: Data In.

– One output bus: Data Out.

• Memory word is selected by:

– Address selects the word to put on Data Out bus.

– Write Enable = 1: address selects the memoryword to be written via the Data In bus.

• Clock input (CLK):

– The CLK input is a factor ONLY during write operation,

– During read operation, this memory behaves as a combinational logic block:

• Address valid => Data Out valid after “access time.”

• Ideal Memory = Short access time.

Clk

Data In

Write Enable

32 32DataOut

Address

Page 21: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#21 Lec # 4 Winter 2003 12-16-2003

Building The DatapathBuilding The Datapath

Portion of the datapath used for fetching instructionsand incrementing the program counter.

Instruction Fetch& PC Update:

PC PC + 4

Instruction Mem[PC]

32

32

32

32

Page 22: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#22 Lec # 4 Winter 2003 12-16-2003

Simplified Datapath For MIPS Simplified Datapath For MIPS R-Type InstructionsR-Type Instructions

Components and connections as specified by RTN statement

R[rd] R[rs] + R[rt]

32

32

32

32

rs

rt

rd

R[rs]

R[rt]

FromInstructionMemory

Page 23: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#23 Lec # 4 Winter 2003 12-16-2003

More Detailed Datapath More Detailed Datapath For R-Type InstructionsFor R-Type Instructions

With Control Points IdentifiedWith Control Points Identified

32

Result

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs RtRd

AL

U

R[rd] R[rs] + R[rt]

R[rs]

R[rt]

Page 24: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#24 Lec # 4 Winter 2003 12-16-2003

R-Type Register-Register TimingR-Type Register-Register Timing

32Result

ALUctr

Clk

busW

RegWr

3232

busA

32busB

5 5 5

Rw Ra Rb32 32-bitRegisters

Rs RtRd

AL

U

Clk

PC

Rs, Rt, Rd,Op, Func

Clk-to-Q

ALUctr

Instruction Memory Access Time

Old Value

New Value

RegWr Old Value

New Value

Delay through Control Logic

busA, B

Register File Access TimeOld Value

New Value

busWALU Delay

Old Value

New Value

Old Value

New Value

New ValueOld Value

Register WriteOccurs Here

Page 25: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#25 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

R[rt] R[rs] OR ZeroExt[imm16] OR register rs with immediate field zero extended to 32 bits, result in register rt

Logical Operations with Immediate Example Example::

Micro-Operation Sequence For ORIMicro-Operation Sequence For ORI

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

ori rt, rs, imm16

Page 26: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#26 Lec # 4 Winter 2003 12-16-2003

Datapath For Logical Datapath For Logical Instructions With ImmediateInstructions With Immediate

R[rt] R[rs] OR ZeroExt[imm16]

32

Result

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs Rt

RtRdRegDst

ZeroE

xt

Mu

x

Mux

3216imm16

ALUSrc

AL

U

01

R[rs]

R[rt]

2x1 MUX (width 5 bits)

2x1 MUX (width 32 bits)

Page 27: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#27 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

R[rt] Mem[R[rs] + SignExt[imm16]] Immediate field sign extended to 32 bits and added to register rs to form memory load address,

word at load address to register rt

Load Operations ExampleExample::

Micro-Operation Sequence For LWMicro-Operation Sequence For LW

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

lw rt, rs, imm16

Data Memory

Instruction Memory

Effective Address

Address offsetin bytes

Page 28: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#28 Lec # 4 Winter 2003 12-16-2003

Additional Datapath Components For Additional Datapath Components For Loads & StoresLoads & Stores

Inputs: for address and write (store) dataOutput for read (load) result

16-bit input sign-extended into a 32-bit value at the output

For SignExt[imm16]

32

32

32

Page 29: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#29 Lec # 4 Winter 2003 12-16-2003

Datapath For LoadsDatapath For Loads

R[rt] Mem[R[rs] + SignExt[imm16]]

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs

RtRdRegDst

Exten

der

Mu

x

Mux

3216

imm16

ALUSrc

ExtOp

Clk

Data InWrEn

32

Adr

DataMemory

32

AL

U

MemWr Mu

x

MemtoReg

1 0

0

1

0

1

R[rs]

R[rt]

Base Address register

EffectiveAddress

Offset

Page 30: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#30 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

Mem[R[rs] + SignExt[imm16]] R[rt] Immediate field sign extended to 32 bits and added to register rs

to form memory store address,register rt written to memory at store address.

Store Operations ExampleExample::

Micro-Operation Sequence For SWMicro-Operation Sequence For SW

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

sw rt, rs, imm16

Effective Address

Address offsetin bytes

Page 31: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#31 Lec # 4 Winter 2003 12-16-2003

Datapath For StoresDatapath For Stores

Mem[R[rs] + SignExt[imm16]] R[rt]

ALUSrcExtOp

32

ALUctr

Clk

busW

RegWr

32

32

busA

32

busB

55 5

Rw Ra Rb

32 32-bitRegisters

Rs

Rt

Rt

Rd

RegDst

Exten

der

Mu

x

Mux

3216imm16

Clk

Data InWrEn

32

Adr

DataMemory

MemWr

AL

U32

Mu

xMemtoReg

1 0

0

1

0

1

R[rs]

R[rt]

Base Address register

EffectiveAddress

OffsetR[rt]

Page 32: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#32 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

Zero R[rs] - R[rt] Calculate the branch condition R[rs] == R[rt]

(i.e R[rs] - R[rt] = 0 )

Zero : PC PC + 4 + ( SignExt(imm16) x 4 ) Calculate the next

instruction’s PC address

Conditional Branch ExampleExample::

Micro-Operation Sequence For BEQMicro-Operation Sequence For BEQ

op rs rt immediate

016212631

6 bits 16 bits5 bits5 bits

beq rs, rt, imm16

Branch Target

PC Offset in words

Page 33: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#33 Lec # 4 Winter 2003 12-16-2003

Datapath For Branch Instructions

ALU to evaluate branch conditionNew adder to compute branch target:

• Sum of incremented PC and the sign-extended lower 16-bits on the instruction.

Main ALU Evaluates Branch Condition (subtract)

R[rs]

R[rt]

Zero flag =1if R[rs] = R[rt] = 0

New 32-bit Adder (Third ALU)for Branch Target

Page 34: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#34 Lec # 4 Winter 2003 12-16-2003

More Detailed Datapath More Detailed Datapath For Branch OperationsFor Branch Operations

Clk

busW

RegWr

32

busA

32

busB

5 5 5

Rw Ra Rb

32 32-bitRegisters

Rs Rt

Eq

ual

?

Zero

32

imm16

PCClk

00

Ad

der

Mu

x

Ad

der

4 PCSrc

PC

Ext

Instruction Address

32

BranchZero

PC+4

BranchTarget

0

1Main ALU

(subtract)

Branch Target ALU New 2X1 32-bitMUX to select next PC valueSign extend

shift left 2

PC

Page 35: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#35 Lec # 4 Winter 2003 12-16-2003

Combining The Datapaths For Memory Combining The Datapaths For Memory Instructions and R-Type InstructionsInstructions and R-Type Instructions

Highlighted muliplexors and connections added to combine the datapaths of memory and R-Type instructions into one datapath(This is book version ORI not supported)

rs

rtR[rs]

R[rt]

R[rt]

0

1

1

0

rt/rdMUX not shown

Page 36: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#36 Lec # 4 Winter 2003 12-16-2003

Instruction Fetch Datapath Added toInstruction Fetch Datapath Added toALU R-Type and Memory Instructions DatapathALU R-Type and Memory Instructions Datapath

R[rs]

R[rt]1

0

0

1

rs

rt

PC+ 4

PC

(This is book version ORI not supported, no zero extend of immediate needed)

rt/rdMUX not shown

Page 37: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#37 Lec # 4 Winter 2003 12-16-2003

A Simple Datapath For The MIPS ArchitectureA Simple Datapath For The MIPS ArchitectureDatapath of branches and a program counter multiplexor are added.

Resulting datapath can execute in a single cycle the basic MIPS instruction:

- load/store word - ALU operations - Branches

1

0

0

1

PC +4

Branch Target

rt/rdMUX not shown

(This is book version ORI not supported, no zero extend of immediate needed)

Page 38: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#38 Lec # 4 Winter 2003 12-16-2003

Single Cycle MIPS DatapathSingle Cycle MIPS DatapathNecessary multiplexors and control lines are identified here:

(This is book version ORI not supported, no zero extend of immediate needed)

Page 39: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#39 Lec # 4 Winter 2003 12-16-2003

Putting It All Together: A Single Cycle DatapathPutting It All Together: A Single Cycle Datapathim

m16

32

ALUctr

Clk

busW

RegWr

32

32

busA

32busB

55 5

Rw Ra Rb32 32-bitRegisters

Rs

Rt

Rt

RdRegDst

Exten

der

Mu

x

3216imm16

ALUSrcExtOp

Mu

x

MemtoReg

Clk

Data InWrEn32 Adr

DataMemory

MemWrA

LU

Zero

Instruction<31:0>

0

1

0

1

01

<21:25>

<16:20>

<11:15>

<0:15>

Imm16RdRtRs

=

Ad

der

Ad

der

PC

Clk

00

Mu

x

4

PCSrc

PC

Ext

Adr

InstMemory

BranchZero

0

1

PC+4

BranchTarget

R[rs]

R[rt]

(Includes ORI)

MainALU

Page 40: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#40 Lec # 4 Winter 2003 12-16-2003

Instruction Word Mem[PC] Fetch the instruction

PC PC + 4 Increment PC

PC PC(31-28),jump_target,00 Update PC with jump address

Adding Support For Jump::

Micro-Operation Sequence For Jump: JMicro-Operation Sequence For Jump: J

OP Jump_target

6 bits 26 bits

j jump_target

jump target = 2500

4 bits 26 bits 2 bits

0 0

PC(31-28)

JumpAddress

Jump addressin words

Page 41: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#41 Lec # 4 Winter 2003 12-16-2003

Datapath For JumpDatapath For Jump

32

PC

Clk

00

Mu

x

PCSrc

imm16

Ad

der

Ad

der

4

PC

Ext

Next Instruction Address

Mu

x

JUMP

Shift left 2jump_target

Instruction(15-0)

Instruction(25-0)

32

26

PC+4(31-28)

28 32

4

32 0

1

PC+4

BranchZero

BranchTarget

JumpAddress

PC

Page 42: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#42 Lec # 4 Winter 2003 12-16-2003

RegDst ALUctrALUSrc MemtoReg

Instruction<31:0>

<21:25>

<16:20>

<11:15>

<0:15>

Imm16RdRsRt

Adr

InstructionMemory

DATA PATHDATA PATH

ExtOp MemWr

Control UnitControl Unit

Op

<21:25>

Fun

BranchRegWr

<0:25>

Jump_target

Jump

Control Lines

Page 43: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#43 Lec # 4 Winter 2003 12-16-2003

Single Cycle MIPS Datapath Extended To Handle Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit AddedJump with Control Unit Added

(This is book version ORI not supported, no zero extend of immediate needed)

Page 44: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#44 Lec # 4 Winter 2003 12-16-2003

Control Signal GenerationControl Signal Generation

add sub ori lw sw beq jump

RegDst

ALUSrc

MemtoReg

RegWrite

MemWrite

Branch

Jump

ExtOp

ALUctr<2:0>

1

0

0

1

0

0

0

x

Add

1

0

0

1

0

0

0

x

Subtract

0

1

0

1

0

0

0

0

Or

0

1

1

1

0

0

0

1

Add

x

1

x

0

1

0

0

1

Add

x

0

x

0

0

1

0

x

Subtract

x

x

x

0

0

0

1

x

xxx

func

op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010Appendix A10 0000See 10 0010 Don’t Care

Control lines for Main ALU

Page 45: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#45 Lec # 4 Winter 2003 12-16-2003

The Concept of Local DecodingThe Concept of Local Decoding

R-type ori lw sw beq jump

RegDst

ALUSrc

MemtoReg

RegWrite

MemWrite

Branch

Jump

ExtOp

ALUop<N:0>

1

0

0

1

0

0

0

x

“R-type”

0

1

0

1

0

0

0

0

Or

0

1

1

1

0

0

0

1

Add

x

1

x

0

1

0

0

1

Add

x

0

x

0

0

1

0

x

Subtract

x

x

x

0

0

0

1

x

xxx

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

MainControl

op

6

ALUControl(Local)

func

N

6ALUop

ALUctr

3

AL

U

Page 46: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#46 Lec # 4 Winter 2003 12-16-2003

Local Decoding of “func” FieldLocal Decoding of “func” Field

R-type ori lw sw beq jump

ALUop (Symbolic) “R-type” Or Add Add Subtract xxx

ALUop<2:0> 1 00 0 10 0 00 0 00 0 01 xxx

MainControl

op

6

ALUControl(Local)

func

N

6ALUop

ALUctr

3

funct<5:0> Instruction Operation

10 0000

10 0010

10 0100

10 0101

10 1010

add

subtract

and

or

set-on-less-than

ALUctr<2:0> ALU Operation

000

001

010

110

111

Add

Subtract

And

Or

Set-on-less-than

ALUctrA

LU

(From Chapter 4)

Page 47: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#47 Lec # 4 Winter 2003 12-16-2003

The Truth Table for ALUctrThe Truth Table for ALUctr

R-type ori lw sw beqALUop(Symbolic) “R-type” Or Add Add Subtract

ALUop<2:0> 1 00 0 10 0 00 0 00 0 01

ALUop func

bit<2> bit<1> bit<0> bit<2> bit<1> bit<0>bit<3>

0 0 0 x x x x

ALUctrALUOperation

Add 0 1 0

bit<2> bit<1> bit<0>

0 x 1 x x x x Subtract 1 1 0

0 1 x x x x x Or 0 0 1

1 x x 0 0 0 0 Add 0 1 0

1 x x 0 0 1 0 Subtract 1 1 0

1 x x 0 1 0 0 And 0 0 0

1 x x 0 1 0 1 Or 0 0 1

1 x x 1 0 1 0 Set on < 1 1 1

funct<3:0> Instruction Op.

0000

0010

0100

0101

1010

add

subtract

and

or

set-on-less-than

Page 48: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#48 Lec # 4 Winter 2003 12-16-2003

The Truth Table For The Main ControlThe Truth Table For The Main Control

R-type ori lw sw beq jump

RegDst

ALUSrc

MemtoReg

RegWrite

MemWrite

Branch

Jump

ExtOp

ALUop (Symbolic)

1

0

0

1

0

0

0

x

“R-type”

0

1

0

1

0

0

0

0

Or

0

1

1

1

0

0

0

1

Add

x

1

x

0

1

0

0

1

Add

x

0

x

0

0

1

0

x

Subtract

x

x

x

0

0

0

1

x

xxx

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

ALUop <2> 1 0 0 0 0 x

ALUop <1> 0 1 0 0 0 x

ALUop <0> 0 0 0 0 1 x

Page 49: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#49 Lec # 4 Winter 2003 12-16-2003

Example: RegWrite Logic EquationExample: RegWrite Logic Equation

R-type ori lw sw beq jump

RegWrite 1 1 1 0 0 0

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

RegWrite = R-type + ori + lw

= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)

+ !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)

+ op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)op<0>

op<5>. .op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

R-type ori lw sw beq jump

RegWrite

Page 50: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#50 Lec # 4 Winter 2003 12-16-2003

PLA Implementation of the Main ControlPLA Implementation of the Main Control

op<0>

op<5>. .op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

op<5>. .<0>

R-type ori lw sw beq jumpRegWrite

ALUSrc

MemtoReg

MemWrite

Branch

Jump

RegDst

ExtOp

ALUop<2>

ALUop<1>

ALUop<0>

Page 51: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#51 Lec # 4 Winter 2003 12-16-2003

Clk

PC

Rs, Rt, Rd,Op, Func

Clk-to-Q

ALUctr

Instruction Memoey Access Time

Old Value New Value

RegWr Old Value New Value

Delay through Control Logic

busA

Register File Access Time

Old Value New Value

busB

ALU Delay

Old Value New Value

Old Value New Value

New ValueOld Value

ExtOp Old Value New Value

ALUSrc Old Value New Value

MemtoReg Old Value New Value

Address Old Value New Value

busW Old Value New

Delay through Extender & Mux

RegisterWrite Occurs

Data Memory Access Time

Worst Case Timing (Load)Worst Case Timing (Load)

Page 52: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#52 Lec # 4 Winter 2003 12-16-2003

Instruction Timing ComparisonInstruction Timing Comparison

PC Inst Memory mux ALU Data Mem mux

PC Reg FileInst Memory mux ALU mux

PC Inst Memory mux ALU Data Mem

PC Inst Memory cmp mux

Reg File

Reg File

Reg File

Arithmetic & Logical

Load

Store

Branch

Critical Path

setup

setup

Page 53: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#53 Lec # 4 Winter 2003 12-16-2003

Simplified Single Cycle Datapath Timing• Assuming the following datapath/control hardware components delays:

– Memory Units: 2 ns

– ALU and adders: 2 ns

– Register File: 1 ns

– Control Unit < 1 ns

• Ignoring Mux and clk-to-Q delays, critical path analysis:

Instruction Memory

Register Read

Main ALU

Data Memory

Register Write

PC + 4 ALU

Branch Target ALU

Control Unit

Time

0 2ns 3ns 4ns 5ns 7ns 8ns

Critical Path(Load)

Obtained from low-level target technology implementation of components

Page 54: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#54 Lec # 4 Winter 2003 12-16-2003

Performance of Single-Cycle (CPI=1) CPUPerformance of Single-Cycle (CPI=1) CPU • Assuming the following datapath hardware components delays:

– Memory Units: 2 ns– ALU and adders: 2 ns– Register File: 1 ns

• The delays needed for each instruction type can be found :

• The clock cycle is determined by the instruction with longest delay: The load in this case which is 8 ns. Clock rate = 1 / 8 ns = 125 MHz• A program with I = 1,000,000 instructions executed takes:

Execution Time = T = I x CPI x C = 106 x 1 x 8x10-9 = 0.008 s = 8 msec

Instruction Instruction Register ALU Data Register Total Class Memory Read Operation Memory Write Delay

ALU 2 ns 1 ns 2 ns 1 ns 6 ns

Load 2 ns 1 ns 2 ns 2 ns 1 ns 8 ns

Store 2 ns 1 ns 2 ns 2 ns 7 ns

Branch 2 ns 1 ns 2 ns 5 ns

Jump 2 ns 2 ns

Load has longest delay of 8 nsthus determining the clock cycle of the CPU to be 8ns

Nano second, ns = 10-9 second

Page 55: EECC550 - Shaaban #1 Lec # 4 Winter 2003 12-16-2003 CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional

EECC550 - ShaabanEECC550 - Shaaban#55 Lec # 4 Winter 2003 12-16-2003

Drawback of Single Cycle ProcessorDrawback of Single Cycle Processor• Long cycle time:

– Cycle time must be long enough for the load instruction:PC’s Clock -to-Q +

Instruction Memory Access Time +

Register File Access Time +

ALU Delay (address calculation) +

Data Memory Access Time +

Register File Setup Time +

Clock Skew

• All instructions must take as much time as the slowest– Cycle time for load is longer than needed for all other instructions.

• Real memory is not as well-behaved as idealized memory– Cannot always complete data access in one (short) cycle.

• Does not allow overlap of instruction processing (instruction pipelining, chapter 6).