ee141 guest lecture-no floorplan - university of...


Upload: duongnga

Post on 12-Mar-2018


Page 1: EE141 Guest Lecture-no floorplan - University of …bwrcs.eecs.berkeley.edu/.../Lecture27-GuestLecture-6up.pdfEE141 1 EE141 1 EECS141 EE141-Spring 2006 Digital Integrated Circuits

EE141 - Spring 2006
Digital Integrated Circuits

Design of an Execution Unit

Luke Tsai, AMD

Outline

 – Introduction
 – What is the Execution Unit?
 – High Level Design Considerations
 – Circuit Design of a Barrel Shifter
 – "Real Life" Designs

Introduction

If you love EE141…
Consider a career in Microprocessor Design
 – All aspects and variety of circuit design
 – Maximum complexity
 – Leading Edge Technology

What is an Execution Unit (EX)?

A Classical Processor Block Diagram

Instruction Fetch (IF)

Decode (DE)

Scheduler (SC)

Execution Unit (EX)

Load-Store (LS)

Floating Point (FPU)

Memory (L2 Cache)

The EX Unit Implements the Integer Instruction Set

[Figure: processor block diagram (IF, DE, SC, EX, LS, FPU, Memory/L2 Cache)]

Add* R1, R2
Sub R1, R2
Mult R1, R2
Div R1, R2
ROL R1, R2
SAR R1, R2
CLZ R1

*X86 notation. The first register is both a source and the destination.
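The two-operand convention in the footnote can be sketched as a tiny interpreter. This is only an illustration: the `execute` helper, the 64-bit register width, the unsigned treatment of Mult/Div, and the reading of the second operand of ROL/SAR as the shift count are all assumptions, not part of the lecture.

```python
WIDTH = 64                      # assumed register width
MASK = (1 << WIDTH) - 1

def rol(x, n):
    """Rotate x left by n within WIDTH bits."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def sar(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    signed = x - (1 << WIDTH) if x >> (WIDTH - 1) else x
    return (signed >> min(n, WIDTH - 1)) & MASK

def clz(x):
    """Count leading zeros of a WIDTH-bit value."""
    return WIDTH - x.bit_length()

OPS = {
    "Add":  lambda a, b: a + b,
    "Sub":  lambda a, b: a - b,
    "Mult": lambda a, b: a * b,
    "Div":  lambda a, b: a // b,   # unsigned; divide-by-zero is not handled
    "ROL":  rol,
    "SAR":  sar,
}

def execute(regs, op, dst, src=None):
    """Hypothetical helper: dst is read as a source and then overwritten."""
    if op == "CLZ":
        result = clz(regs[dst])
    else:
        result = OPS[op](regs[dst], regs[src])
    regs[dst] = result & MASK

regs = {"R1": 5, "R2": 3}
execute(regs, "Add", "R1", "R2")   # R1 <- R1 + R2; R2 is unchanged
```

After the `Add`, `R1` holds the sum and `R2` keeps its value, which is the source-and-destination behavior the footnote describes.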

Interface to the SC

[Figure: processor block diagram (IF, DE, SC, EX, LS, FPU, Memory/L2 Cache)]

The SC issues instructions to the EX.
An out-of-order SC needs to check for source dependencies.

Add R1, R2
Sub R3, R1     (dependency: reads R1, which Add writes)
Mult R4, R2    (no dependency on Add; can issue in parallel)
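A minimal sketch of the source-dependency check on the three instructions from this slide. The tuple encoding and the `depends_on` name are assumptions for illustration; only read-after-write dependencies are modeled, and a real scheduler would also deal with other hazards and with many instructions in flight.

```python
def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes (read-after-write).

    Instructions are (opcode, dst, src) tuples in two-operand notation,
    so the destination register is also read as a source.
    """
    _, earlier_dst, _ = earlier
    _, later_dst, later_src = later
    return earlier_dst in (later_dst, later_src)

add = ("Add", "R1", "R2")
sub = ("Sub", "R3", "R1")
mult = ("Mult", "R4", "R2")

sub_waits_for_add = depends_on(sub, add)     # Sub reads R1, which Add writes
mult_waits_for_add = depends_on(mult, add)   # Mult reads only R4 and R2
```

Under this model `Sub` must wait for `Add`, while `Mult` can be issued alongside `Add`.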

Interface to the LS

[Figure: processor block diagram (IF, DE, SC, EX, LS, FPU, Memory/L2 Cache)]

For Load/Store ops, the EX generates the address for the LS, which in turn sends data to or receives data from the EX.

The path from address generation to load-data return is a classic critical path in processor design.

Add R1, [R2]       Load
Sub [R3], R1       Store
Mult [R4], [R2]    Load-Op-Store

A Typical Block Diagram of EX

[Figure: Execution Unit block diagram. Labeled blocks: multi-ported register file, ALU0, ALU 1..N, AGen 1..N, adder, shifter, Mult, Div/CLZ/Popcnt, bypass. Labeled buses: operand bus, result bus.]

High Level Design Considerations

Meeting the Performance Target

IPC: how each instruction is executed
 – Which EX units to build, and how many of each
Frequency
 – What type of circuit style
Power
 – How much energy per operation
Area
 – Silicon real estate is expensive
The design point is based on trade-offs among the above criteria.

Micro-Architecture Considerations

Pipeline
Interface with the Scheduler
 – How to handle out-of-order execution
Interface with the LS unit
 – How many cycles for the AGen-Data loop?
 – How to suppress speculative execution when load data is invalid?

Physical Design Considerations

Operand Bypass
 – A bypass condition occurs when an operand of an instruction scheduled to execute in cycle n is generated in the immediately preceding cycle (n-1).
 – The data for this operand does not yet reside in the register file and must be bypassed from one of the result buses.

Add* R1, R2
Sub R3, R1     (bypass condition: R1 comes from the preceding Add)
Mult R4, R2

* Actual execution sequence (not program order)
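The bypass condition can be sketched as an operand-read function that prefers a value still on a result bus over the stale copy in the register file. The `read_operand` name, the dictionary model of the result bus, and the register values are assumptions chosen for illustration.

```python
def read_operand(reg, regfile, result_bus):
    """Return the operand value for `reg` in cycle n.

    result_bus maps destination registers to values produced in cycle n-1
    that have not yet been written back to the register file.
    """
    if reg in result_bus:          # bypass condition: take it from the result bus
        return result_bus[reg]
    return regfile[reg]            # otherwise the register file copy is current

regfile = {"R1": 10, "R2": 4, "R3": 100}

# Cycle n-1: Add R1, R2 executes; its result is on the bus, not yet in regfile.
result_bus = {"R1": regfile["R1"] + regfile["R2"]}

# Cycle n: Sub R3, R1 needs the new R1, so R1 is bypassed; R3 comes from regfile.
sub_result = (read_operand("R3", regfile, result_bus)
              - read_operand("R1", regfile, result_bus))
```

Without the bypass, `Sub` would read the old `R1` from the register file and compute the wrong result.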

Physical Design Considerations

Floorplan
The floorplan of an EX unit is a crucial design decision. It impacts:

 – Bus length (frequency, power)
 – Datapath pitch (frequency, power, area)
 – Bypass scheme (area, power)

Circuit Design of a Barrel Shifter

What is a Barrel Shifter?

Performs a shift or rotate on full or partial data.

Example: 8-bit shifter

Operation                  Resulting bit positions
Input Bit Position         7 6 5 4 3 2 1 0
Rot Left 1                 6 5 4 3 2 1 0 7
Rot Right 1                0 7 6 5 4 3 2 1
Logical Shift Left 2       5 4 3 2 1 0 L L   (= mult by 4)
Arithmetic Shift Left 2    5 4 3 2 1 0 L L   (same as above)
Logical Shift Right 3      L L L 7 6 5 4 3
Arithmetic Shift Right 3   7 7 7 7 6 5 4 3

L = Low (zero)
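The 8-bit example can be reproduced with a short behavioral sketch that moves bit-position labels around instead of bit values. The function names are assumptions; the list is ordered MSB first, and `"L"` stands for a low (zero) fill as on the slide.

```python
def rot_left(bits, n):
    n %= len(bits)
    return bits[n:] + bits[:n]

def rot_right(bits, n):
    n %= len(bits)
    return bits[len(bits) - n:] + bits[:len(bits) - n]

def shift_left(bits, n):
    # Logical and arithmetic left shifts are identical: zeros fill from the right.
    return bits[n:] + ["L"] * n

def shift_right_logical(bits, n):
    return ["L"] * n + bits[:len(bits) - n]

def shift_right_arithmetic(bits, n):
    # The sign bit (leftmost label) is replicated into the vacated positions.
    return [bits[0]] * n + bits[:len(bits) - n]

positions = ["7", "6", "5", "4", "3", "2", "1", "0"]   # MSB first

rows = {
    "Rot Left 1": rot_left(positions, 1),
    "Rot Right 1": rot_right(positions, 1),
    "Shift Left 2": shift_left(positions, 2),
    "Logical Shift Right 3": shift_right_logical(positions, 3),
    "Arithmetic Shift Right 3": shift_right_arithmetic(positions, 3),
}
```

Each entry in `rows` lists which input bit ends up in each output position, matching the rows of the example.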

Barrel Shifter Design

Observe: any input bit can be routed to any output bit position.

Therefore: the shifter is nothing but a giant NxN mux, where N is the width of the data.
The mux select is the one-hot decode of the shift amount.

[Figure: mux array with bit positions 7 to 0 and select value 3 at each position]
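The NxN-mux view can be sketched behaviorally: every output bit gets its own N-input AND-OR mux, and all muxes share the same one-hot select lines. This sketch models a rotate left only; the function names and the LSB-first list layout are assumptions for illustration, not the circuit from the lecture.

```python
def one_hot(shift, n):
    """Decode a shift amount into n select lines, exactly one of which is 1."""
    return [1 if s == shift else 0 for s in range(n)]

def rotate_left_mux(bits, shift):
    """bits[i] is bit position i (index 0 = LSB). Returns the rotated bit list."""
    n = len(bits)
    select = one_hot(shift % n, n)
    out = []
    for j in range(n):                       # one N-input mux per output bit
        value = 0
        for s in range(n):                   # AND-OR mux over the select lines
            value |= select[s] & bits[(j - s) % n]
        out.append(value)
    return out

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

rotated = from_bits(rotate_left_mux(to_bits(0b10010110, 8), 3))
```

The inner loop touches N inputs for each of N outputs, which is where the NxN size comes from; the N select lines each fan out to all N muxes.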

Barrel Shifter Implementations

1. Single-stage NxN mux
 – Fewest gates between input and output
 – Largest number of select signals (largest load for the shift amount)

2. Multi-stage mux
 – More stages = more gates between input and output
 – The reduction in select signals has diminishing returns
 – For 64-bit shifts:
     1 stage = 64 selects
     2 stages (8x8) = 16 selects (75% reduction)
     3 stages (4x4x4) = 12 selects (a further 25% reduction)

3. Mux implementation
 – Low-swing passgate
 – Full-swing domino
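The select counts for the 64-bit case follow from adding up the radix of each stage, assuming each stage is driven by its own one-hot decode. A small sketch of that arithmetic (the `select_count` name and the list-of-radices encoding are assumptions):

```python
from math import prod

def select_count(radices):
    """Total select lines when each stage is a one-hot-selected mux of the given radix."""
    return sum(radices)

designs = [[64], [8, 8], [4, 4, 4]]          # 1, 2, and 3 stages

# Every design must still cover all 64 shift amounts.
covers_64 = all(prod(radices) == 64 for radices in designs)

counts = [select_count(radices) for radices in designs]

# Percentage reduction of each design relative to the previous one.
reductions = [100 * (a - b) / a for a, b in zip(counts, counts[1:])]
```

The second reduction is measured against the two-stage design, which is why the gain shrinks as stages are added while the gate depth keeps growing.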

Barrel Shifter Array

[Figure: two array layouts. One-stage mux: inputs, inputs turned 90°, selects, connections, outputs. Two-stage mux: inputs, intermediate, selects, connections, outputs.]

Barrel Shifter Additional Complexity

1. Partial shifts/rotates
 – The X86 instruction set supports 8(L/H)/16/32/64-bit shifts

2. Shift differs from rotate
 – Shifts fill in zeros or the sign bit
   => How do you build a barrel shifter that does both shift and rotate?

3. Rotate could include the carry bit
 – X86 supports RCL/RCR (Rotate with Carry Left/Right)
   => A 64-bit RCL requires a 65-bit barrel shifter!
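The 65-bit requirement can be seen by treating the carry flag as one extra bit above the data word and rotating the combined word. The sketch below is a behavioral model under that assumption; the `rcl` name and signature are made up for illustration, and the count-masking rules of the real X86 instructions are not modeled.

```python
def rcl(value, carry, count, width=64):
    """Rotate `value` left through the carry; returns (new_value, new_carry)."""
    n = width + 1                            # carry sits at bit `width` of an (N+1)-bit word
    word = (carry << width) | value
    count %= n
    word = ((word << count) | (word >> (n - count))) & ((1 << n) - 1)
    return word & ((1 << width) - 1), word >> width

# 8-bit example: the MSB moves into the carry, the old carry moves into the LSB.
msb_out = rcl(0b10000000, 0, 1, width=8)
carry_in = rcl(0b00000000, 1, 1, width=8)
```

Because the rotation period is `width + 1`, a 64-bit rotate through carry returns to its starting state only after 65 single-bit steps.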

"Real Life" Designs

Robustness and Reliability

Robustness: Higher Yield = Higher Profit Margin
 – Circuit needs to function across PVT variation
 – A chip target yield of 70% could require an EX yield of 99%
 – What works in spice (w/o PVT) may not work in real life

Reliability
In addition to simulation for speed, a real design also checks:

 – Noise
 – IR Drop
 – Electro-Migration
 – Inductive Effects
 – …
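One way to see why a 70% chip yield can demand roughly 99% from a single unit is a simple multiplicative model. The model is an assumption, not something stated in the lecture: it treats the chip as a number of blocks that fail independently, each with the same yield.

```python
import math

def chip_yield(block_yield, num_blocks):
    """Chip yield if all blocks must work and fail independently (assumed model)."""
    return block_yield ** num_blocks

def blocks_supported(chip_target, block_yield):
    """How many such blocks fit within the chip yield target under the same model."""
    return math.log(chip_target) / math.log(block_yield)

yield_35_blocks = chip_yield(0.99, 35)
max_blocks = blocks_supported(0.70, 0.99)
```

Under these assumptions, about 35 blocks at 99% each already bring the chip down to roughly 70%, so per-unit yield targets have to be far tighter than the chip-level target.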

Process Variation

Major culprits: threshold voltage, channel length, channel width
 – In 45 nm: Vth ~ ±150 mV, ΔL ~ ±15%, ΔW ~ ±10% (for minimum-size devices). (The Idsat/Idoff relationships to variation are non-linear. Try it in spice.)
Matching devices/paths: sense amps, analog, memory cell stability, clock tree
Increases leakage: 80% of chip leakage is caused by 20% of devices, which limits the use of dynamic circuits
Slows down critical paths
Worsens hold-time requirements
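The non-linearity of off-current with threshold variation can be illustrated with the usual first-order subthreshold model, in which leakage changes by one decade per subthreshold swing of threshold shift. Both the model and the 100 mV/decade swing are assumptions chosen for illustration; a spice run with real device models is the proper check.

```python
def leakage_ratio(delta_vth_v, swing_v_per_decade=0.100):
    """Factor by which subthreshold leakage grows when Vth drops by delta_vth_v.

    Assumes an exponential model: one decade of current per `swing_v_per_decade`.
    """
    return 10 ** (delta_vth_v / swing_v_per_decade)

fast_corner = leakage_ratio(0.150)    # device whose Vth is 150 mV low
slow_corner = leakage_ratio(-0.150)   # device whose Vth is 150 mV high
```

With these numbers a low-Vth device leaks about 30 times more than nominal while a high-Vth device leaks about 30 times less, which is consistent with a small fraction of devices dominating total chip leakage.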

Voltage/Temperature Variations

 – Introduce more timing variations
 – Increase noise
 – Worsen cross-chip matching (e.g. clock tree)
 – Degrade reliability

[Figure: voltage map; labeled values 1.072 V, 1.103 V, 1.134 V, 1.164 V, 1.194 V, 1.224 V]