tailoring the 32-bit alu to mips - computer sciencezhangs/csc446/lec9.pdftailoring the 32-bit alu to...



Post on 16-Mar-2020


Page 1: Tailoring the 32-Bit ALU to MIPS - Computer Sciencezhangs/CSC446/Lec9.pdfTailoring the 32-Bit ALU to MIPS • MIPS ALU extensions • Overflow detection: • Carry into MSB XOR Carry


Tailoring the 32-Bit ALU to MIPS

• MIPS ALU extensions
• Overflow detection:
  – Carry into MSB XOR Carry out of MSB
• Branch instructions
• Shift instructions
• Slt instruction
• Immediate instructions
• ALU performance
  – Performance vs. cost
  – Carry lookahead adder
• Implementation alternatives
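The overflow rule in the outline (carry into MSB XOR carry out of MSB) can be sketched in software. This is a minimal illustration, not the slide's circuit: the function name add_with_overflow and the bits parameter are assumptions introduced here, and the two carries are recomputed arithmetically instead of being tapped from an adder.

```python
def add_with_overflow(a: int, b: int, bits: int = 32):
    """Add two two's-complement bit patterns; return (result, overflow)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    low_mask = mask >> 1                        # every bit below the MSB
    # Carry into the MSB: carry produced by adding the lower bits only.
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    # Carry out of the MSB: bit number `bits` of the full sum.
    carry_out = (a + b) >> bits
    return (a + b) & mask, bool(carry_in ^ carry_out)


# 4-bit examples: 7 + 1 overflows, (-1) + (-1) does not.
print(add_with_overflow(0b0111, 0b0001, bits=4))
print(add_with_overflow(0b1111, 0b1111, bits=4))
```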

Branch Instructions
• beq $t5, $t6, L
  – Use subtraction: (a-b) = 0 implies a = b
  – Add hardware to test if the result is 0
  – OR all 32 results and invert the OR output

ZERO = NOT (Result0 + Result1 + .. + Result31), where + is OR

• Note: Signal ZERO is a 1 when the result is zero!
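As a rough software analogue of the zero test described above, the sketch below subtracts, ORs all result bits together, and inverts the OR. The name alu_sub_zero is hypothetical and the loop stands in for what the hardware does with a single wide OR gate.

```python
def alu_sub_zero(a: int, b: int, bits: int = 32):
    """Return (a - b, ZERO) where ZERO is 1 only when every result bit is 0."""
    mask = (1 << bits) - 1
    result = (a - b) & mask
    any_bit_set = 0
    for i in range(bits):
        any_bit_set |= (result >> i) & 1    # OR of all result bits
    zero = any_bit_set ^ 1                  # invert the OR output
    return result, zero


# beq $t5, $t6, L takes the branch when ZERO is 1.
t5, t6 = 42, 42
_, zero = alu_sub_zero(t5, t6)
print("branch taken" if zero else "fall through")
```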


Branch Support

[Figure: ALU with a Zero output. Zero = 1 if A = B, 0 otherwise.]

Shift instructions
• SLL, SRL, and SRA
• We need a data line for a shifter (L and R)
• However, shifters are much more easily implemented at the transistor level (outside the ALU)
• Barrel shifters

[Figure: 4-bit barrel shifter with inputs x3 x2 x1 x0. Output x = x3 x2 x1 x0; output x<<1 = x2 x1 x0 0; output x>>1 = 0 x3 x2 x1. Diagonal closed switch pattern controlled by the control unit.]
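The three shifts named above differ only in what fills the vacated bit positions. The following sketch models their results on bit patterns; the function names and the bits parameter are assumptions for illustration, and it says nothing about how a barrel shifter is wired.

```python
def sll(x: int, n: int, bits: int = 32) -> int:
    """Shift left logical: zeros enter from the right."""
    mask = (1 << bits) - 1
    return (x << n) & mask


def srl(x: int, n: int, bits: int = 32) -> int:
    """Shift right logical: zeros enter from the left."""
    mask = (1 << bits) - 1
    return (x & mask) >> n


def sra(x: int, n: int, bits: int = 32) -> int:
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    mask = (1 << bits) - 1
    x &= mask
    shifted = x >> n
    if x >> (bits - 1):                       # sign bit set
        shifted |= (mask << (bits - n)) & mask
    return shifted


# 4-bit pattern 1011 shifted by one position in each way.
print(format(sll(0b1011, 1, 4), "04b"),
      format(srl(0b1011, 1, 4), "04b"),
      format(sra(0b1011, 1, 4), "04b"))
```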


Immediate Instructions
• First input to ALU is the first register (rs)
• Second input
  – Data from register (rt)
  – Zero- or sign-extended immediate
• Add a mux at second input of ALU
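A small sketch of the second-input selection described above: the 16-bit immediate is extended to 32 bits and a mux picks between it and the rt value. The names sign_extend16, zero_extend16, and alu_second_input are hypothetical, and the choice that select value 1 means "immediate" is an assumption, since the slide does not state the encoding.

```python
def sign_extend16(imm: int) -> int:
    """Replicate bit 15 of a 16-bit immediate into the upper 16 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def zero_extend16(imm: int) -> int:
    """Fill the upper 16 bits with zeros."""
    return imm & 0xFFFF


def alu_second_input(rt_value: int, imm16: int, alu_src: int,
                     signed: bool = True) -> int:
    """Mux at the second ALU input (assumed: 0 selects rt, 1 the immediate)."""
    extended = sign_extend16(imm16) if signed else zero_extend16(imm16)
    return extended if alu_src else (rt_value & 0xFFFFFFFF)


print(hex(alu_second_input(7, 0x8000, alu_src=1)))
```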

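[Figure: datapath. IR fields rs and rt address the Registers; the 16-bit immediate passes through Sign extend (16 to 32 bits); a 0/1 mux driven by the Control Unit selects the second ALU input; the ALU outputs Zero, Overflow, and Result (memory address).]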

Slt Instruction
• Slt rd, rs, rt
• A < B => A - B < 0
  1. Perform subtraction using full adder
  2. Check highest-order bit (sign bit)
  3. Sign bit tells us whether A < B
• New input line (Less) goes directly to mux
• New control code for slt
• Result for slt is not the output from ALU
  – Need a new 1-bit ALU for the most significant bit
    • It has a new output line (Set) used only for slt
    • (Overflow detection logic is also associated with this bit)

rd: 0000 0000 0000 0000 0000 0000 0000 000r, where r = 1 if (rs < rt), 0 else
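The three steps above can be sketched as follows. The function name slt and the bits parameter are illustrative assumptions. The sketch follows the slide's sign-bit rule literally, so it gives the wrong answer in the cases where the subtraction itself overflows; handling that would require combining the sign bit with the overflow signal.

```python
def slt(rs: int, rt: int, bits: int = 32) -> int:
    """Set-on-less-than using the sign bit of rs - rt (overflow ignored)."""
    mask = (1 << bits) - 1
    diff = ((rs & mask) - (rt & mask)) & mask   # step 1: subtract
    set_bit = diff >> (bits - 1)                # step 2: take the sign bit
    return set_bit                              # step 3: rd = 00...0 set_bit


print(slt(3, 5), slt(5, 3), slt(5, 5))
```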


Slt Support

[Figure: 32-bit ALU with slt support; labels: First bit (LSB), Sign bit.]

What is the control code for slt?

Overview

[Figure: overview diagram; labels: I-instruction, 32-bit memory address.]


ALU Performance
• Hardware executes in parallel
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Speed vs. Cost
  – Fewer sequential gates vs. number of gates
• Two extremes to do addition
  – Ripple carry and sum-of-products
• How could you get rid of the ripple?
• Carry-look-ahead adder

c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1      c2 = c2(a0,b0,c0,a1,b1)
c3 = b2c2 + a2c2 + a2b2      c3 = c3(a0,b0,c0,a1,b1,a2,b2)
c4 = b3c3 + a3c3 + a3b3      c4 = c4(a0,b0,c0,a1,b1,a2,b2,a3,b3)

Not feasible! Too many inputs to the gates
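The carry recurrence above (each carry computed from the previous one) is exactly what makes a ripple-carry adder sequential. A minimal sketch, assuming a hypothetical function ripple_carry_add, applies c(i+1) = b(i)c(i) + a(i)c(i) + a(i)b(i) one bit at a time; the loop order mirrors the gate delay chain that carry lookahead tries to shorten.

```python
def ripple_carry_add(a: int, b: int, c0: int = 0, bits: int = 32):
    """Add bit by bit; return (sum, carry_out)."""
    carry = c0
    total = 0
    for i in range(bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i                    # sum bit i
        carry = (bi & carry) | (ai & carry) | (ai & bi)    # c(i+1)
    return total, carry


print(ripple_carry_add(5, 9, 0, 4))
```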

Conclusions
• We can build an ALU to support the MIPS ISA
  – Key Idea: Use multiplexer to select ALU output
  – Subtraction uses two's complement addition
  – Replicate 1-bit ALU to produce 32-bit ALU
• Important points about hardware
  – All of the gates in the ALU work in parallel
  – The speed of a gate is affected by the number of inputs
  – Speed of a circuit is affected by the number of gates in series (on the critical path or the deepest level of logic)
• Our primary focus: (conceptual)
  – Clever changes to organization can improve performance (similar to using better algorithms in software)


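Review: 32-bit ALU

[Figure: 1-bit ALU and the 32-bit ALU built from it. Requirements: control codes → operations.]

Datapath

[Figure: datapath with IR (rs, rt), Control Unit, Registers, Sign extend (16 to 32 bits), 0/1 mux, and ALU with Zero, Overflow, and Result (memory address) outputs.]

Datapath for
• ALU instructions
• lw/sw instructions
• Imm instructions
• Branch instructions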


3.3 Multiplication
• More complicated than addition
  – Accomplished via shifting and addition
  – Requires more time and chip area
• 3 versions of pencil-and-paper algorithm

       0010   (multiplicand)
   x   1011   (multiplier)
       0010   1 -> copy & shift
      0010    1 -> copy & shift
     0000     0 -> shift
    0010      1 -> copy & shift
   00010110   Sum Partial Products

(multiplicand to left)

First Version (V.1)

        0 0 1 0
      x 1 0 1 1
0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0
0 0 0 1 0 1 1 0
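A sketch of the first version, under the assumption that it follows the usual shift-and-add loop: test the low multiplier bit, add the multiplicand into the product when it is 1, then shift the multiplicand left and the multiplier right. The name multiply_v1 and the bits parameter are introduced here for illustration; the multiplicand lives in a register twice as wide as the operands.

```python
def multiply_v1(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned shift-and-add with a left-shifting double-width multiplicand."""
    wide_mask = (1 << (2 * bits)) - 1
    mcand = multiplicand & ((1 << bits) - 1)   # starts in the right half
    mplier = multiplier & ((1 << bits) - 1)
    product = 0
    for _ in range(bits):
        if mplier & 1:                         # low multiplier bit is 1
            product = (product + mcand) & wide_mask
        mcand = (mcand << 1) & wide_mask       # shift multiplicand left
        mplier >>= 1                           # shift multiplier right
    return product


# The slide's example: 0010 x 1011.
print(format(multiply_v1(0b0010, 0b1011, bits=4), "08b"))
```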


V.1: Hardware

[Figure: Multiplicand register (64 bits, shift left), 64-bit ALU, Product register (64 bits, write), Multiplier register (32 bits, shift right), Control test on Multiplier0.]

Problems:
• Half of the bits of multiplicand are always 0
• Wasteful, slow


Steps
• Unsigned multiplication: Shift-and-add
  – Generate one partial product for each digit in the multiplier
  – Partial product = 0 if multiplier digit = 0; Multiplicand if multiplier digit = 1
  – Total product = sum of (left shifted) partial products
  – The multiplication of two n-bit binary integers results in a product of up to 2n bits in length
• Signed multiplication
  – Convert them to positive numbers and remember the original signs. Need to extend sign of the product
  – There are better techniques

Second Version (V.2)

[Figure: Multiplicand register (32 bits), 32-bit ALU, Product register (64 bits, shift right, write), Multiplier register (32 bits, shift right), Control test.]

Start
1. Test Multiplier0
   – Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   – Multiplier0 = 0: go to step 2
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   – No: < 32 repetitions: repeat from step 1
   – Yes: 32 repetitions: Done

Example: 0 0 1 0 x 1 0 1 1

Multiplier0   Product
              0000 0000   (initial)
1             0010 0000   add
              0001 0000   shift
1             0011 0000   add
              0001 1000   shift
0             0001 1000   no add
              0000 1100   shift
1             0010 1100   add
              0001 0110   shift


Final Version (V.3)

[Figure: Multiplicand register (32 bits), 32-bit ALU, Product register (64 bits, shift right, write), Control test.]

Start
1. Test Product0
   – Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   – Product0 = 0: go to step 2
2. Shift the Product register right 1 bit
32nd repetition?
   – No: < 32 repetitions: repeat from step 1
   – Yes: 32 repetitions: Done

Example: 0 0 1 0 x 1 0 1 1

Product
0000 1011   (initial, multiplier in right half)
0010 1011   add
0001 0101   shift
0011 0101   add
0001 1010   shift
0001 1010   no add
0000 1101   shift
0010 1101   add
0001 0110   shift
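The final version's flowchart can be sketched as below. The name multiply_v3 and the returned trace list are assumptions added for illustration. The multiplier starts in the right half of the product register, and each iteration records the register after the add step and after the shift, so the trace can be compared with the slide's example. A Python integer keeps the carry out of the add for free; hardware needs an extra bit for it.

```python
def multiply_v3(multiplicand: int, multiplier: int, bits: int = 32):
    """Unsigned multiply with the multiplier held in the product's right half."""
    half_mask = (1 << bits) - 1
    product = multiplier & half_mask          # right half = multiplier
    trace = [product]
    for _ in range(bits):
        if product & 1:                       # step 1: test Product0
            product += (multiplicand & half_mask) << bits   # step 1a
        trace.append(product)
        product >>= 1                         # step 2: shift right 1 bit
        trace.append(product)
    return product, trace


result, states = multiply_v3(0b0010, 0b1011, bits=4)
for state in states:
    print(format(state, "08b"))
```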

General View

[Figure: carry bit C and registers A31 . . . A0 and Q31 . . . Q0 (Multiplier) shift right under Control; register M31 . . . M0 (Multiplicand) feeds the 32-bit ALU (Add).]

C  A     Q     M
0  0000  1101  1011   Initial values
0  1011  1101  1011   Add     (cycle 1)
0  0101  1110  1011   Shift
0  0010  1111  1011   Shift   (cycle 2)
0  1101  1111  1011   Add     (cycle 3)
0  0110  1111  1011   Shift
1  0001  1111  1011   Add     (cycle 4)
0  1000  1111  1011   Shift

      1011   Multiplicand (11)
  x   1101   Multiplier (13)
  10001111   Product (143)
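The C, A, Q, M trace above makes the carry bit explicit, which the following sketch mirrors with separate variables. The function name multiply_caqm is hypothetical; it returns the product together with the (C, A, Q) rows so they can be checked against the table for 1011 x 1101.

```python
def multiply_caqm(m: int, q: int, bits: int = 4):
    """Unsigned multiply with explicit carry C, accumulator A, multiplier Q."""
    mask = (1 << bits) - 1
    m &= mask
    q &= mask
    c, a = 0, 0
    rows = [(c, a, q)]                        # initial values
    for _ in range(bits):
        if q & 1:                             # Q0 = 1: add M into A
            total = a + m
            c, a = total >> bits, total & mask
            rows.append((c, a, q))
        # Shift C, A, Q right by one bit as a single chain.
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (c << (bits - 1))
        c = 0
        rows.append((c, a, q))
    return (a << bits) | q, rows


product, rows = multiply_caqm(0b1011, 0b1101)
for c, a, q in rows:
    print(c, format(a, "04b"), format(q, "04b"))
print(product)
```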


MIPS Multiplication

• Special purpose registers for the result (Hi, Lo)
• Two multiply instructions
  – Mult: signed
  – Multu: unsigned
• mflo, mfhi – move contents from Hi, Lo to general purpose registers (GPRs)
• No overflow detection in hardware => Software overflow detection
  – Hi must be 0 for multu or the replicated sign of Lo for mult
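The software overflow check described above can be sketched as follows. All function names here are hypothetical models of the instructions' results, not an emulator: mult and multu return the (Hi, Lo) pair, and the two check functions apply the slide's rule that Hi must be 0 for multu or the replicated sign of Lo for mult.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def multu(rs: int, rt: int):
    """Unsigned multiply; return (hi, lo)."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


def mult(rs: int, rt: int):
    """Signed multiply; return (hi, lo) of the 64-bit two's-complement product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32


def multu_overflowed(hi: int, lo: int) -> bool:
    return hi != 0                           # result does not fit in 32 bits


def mult_overflowed(hi: int, lo: int) -> bool:
    expected_hi = MASK32 if lo & 0x80000000 else 0   # replicated sign of Lo
    return hi != expected_hi


print(mult_overflowed(*mult(0x40000000, 2)), mult_overflowed(*mult(MASK32, 2)))
```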

Faster Multiplier

• Uses multiple adders
  – Cost/performance tradeoff
• Can be pipelined
• Several multiplications performed in parallel


3.4 Division

• Long division of unsigned binary integers

               00001101   Quotient
Divisor 1011 ) 10010011   Dividend
                1011
               001110     Partial remainders
                 1011
                 001111
                   1011
                    100   Remainder

Dividend = Quotient * Divisor + Remainder
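The long division above can be sketched as a shift-and-subtract loop: bring down one dividend bit at a time, and subtract the divisor whenever the partial remainder is large enough. The function name divide_unsigned is an assumption for illustration, and this software loop is not a model of any particular divider hardware.

```python
def divide_unsigned(dividend: int, divisor: int, bits: int = 32):
    """Return (quotient, remainder) for unsigned operands of the given width."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero must be checked by the caller")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:              # divisor fits: quotient bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


# The slide's example: 10010011 / 1011.
q, r = divide_unsigned(0b10010011, 0b1011, bits=8)
print(format(q, "08b"), format(r, "b"))
```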

Division Hardware

[Figure: division hardware; labels: Initially dividend; Initially divisor in left half.]


Optimized Divider

• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  – Same hardware can be used for both

MIPS
• Multiply and divide use existing hardware
  – ALU and shifter
• Extra hardware: 64-bit register able to SLL/SRA
  – Hi contains the remainder (mfhi)
  – Lo contains the quotient (mflo)
• Instructions
  – Div: signed divide
  – Divu: unsigned divide
• MIPS ignores overflow?
• Division by 0 must be checked in software
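To illustrate where the results land, here is a hedged sketch of div and divu as functions returning the (Hi, Lo) pair, with Hi holding the remainder and Lo the quotient. The function names are hypothetical. The signed version assumes truncation toward zero, so the remainder takes the sign of the dividend, and the explicit zero test stands in for the software check the slide calls for.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divu(rs: int, rt: int):
    """Unsigned divide; return (hi, lo) = (remainder, quotient)."""
    rs &= MASK32
    rt &= MASK32
    if rt == 0:
        raise ZeroDivisionError("software must check for division by 0")
    return rs % rt, rs // rt


def div(rs: int, rt: int):
    """Signed divide (assumed truncating); return (hi, lo) as 32-bit patterns."""
    a, b = to_signed32(rs), to_signed32(rt)
    if b == 0:
        raise ZeroDivisionError("software must check for division by 0")
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b              # same sign as the dividend
    return remainder & MASK32, quotient & MASK32


hi, lo = div(-7 & MASK32, 2)
print(to_signed32(hi), to_signed32(lo))
```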


MIPS Processor

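[Figure: datapath extended with Hi and Lo registers and an SLL/SRA shifter; IR, Control Unit, Registers, Sign extend (16 to 32 bits), a 0/1 mux and a 0/1/2 mux, ALU with Sub and Operation control inputs and Zero and Overflow outputs; Memory address.]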

M