ITEC 352 Lecture 21 Pipelining

Upload: antony-poole

Post on 23-Dec-2015


ITEC 352

Lecture 21: Pipelining

Pipelining

Review

• Questions?
• Homework 3 on Wed.
• JVM vs. Assembly: Similarities / Differences


Outline

• Pipelining
  – Motivation
  – Fits with execution model
  – Examples


CISC Vs. RISC

• A long time ago, when memory was expensive:
  – The focus of most computer architects (e.g., at Intel and Motorola) was to support programs with fewer instructions, each performing a more complicated computation.
  – E.g., addld [a], [b], [c] would be a complex instruction that replaced:
      ld [a], %r1
      ld [b], %r2
      addcc %r1, %r2, %r3
      st %r3, [c]

Why? Complex instructions → shorter programs → less memory needed.

However, memory became cheaper. So architects started thinking about techniques that would use more memory to speed up computations.


CISC Vs. RISC (2)

• Solution: RISC (Reduced Instruction Set Computer)
  – E.g., ARC instructions.
  – The instructions are similar in complexity, i.e., they take roughly the same number of clock cycles to execute.
• So architects thought: why not have every RISC instruction effectively execute in one CPU cycle, using a technique called pipelining?
• To use pipelining, instructions must be of similar complexity, i.e., every instruction must require more or less the same number of CPU cycles to execute.
• However, what is the complexity of an instruction?
• Trivia:
  – Apple’s PowerPC processors (developed with IBM and Motorola): RISC processors
  – Intel: stuck to the CISC x86 instruction set, even today!

Slide © Prem Uppuluri, Derived from Murdocca and Heuring


Pipelining and RISC

• The complexity of an instruction can be based on the number of steps it takes to execute the fetch-execute cycle.

• Recall the fetch-execute cycle (every instruction goes through a fetch-execute cycle):
  – Fetch instruction from memory to register.
  – Decode the opcode.
  – Fetch operands from memory to register.
  – Execute operation.
  – Store result back into memory.
• We said: “this is how every instruction is executed by the control unit”.
  – Ahem… this is not entirely true! It is almost true: each class of instruction has slightly different stages in the fetch-execute cycle.


Complete ARC Instruction and PSR Formats


Arithmetic Instructions

• Arithmetic instructions in RISC have the following 5 stages:
  – Fetch the instruction from memory
  – Decode the instruction
  – Fetch the operands from the register file
  – Apply the operands to the ALU
  – Write the result back to the register file.
• E.g., take addcc %r1, %r2, %r3
  – Trace the 5 stages on this instruction as an exercise.
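One possible way to work through the exercise is to model the five stages in a few lines of Python. This is only a sketch: the function name trace_addcc, the dictionary used as a register file, and the register contents are all made up for illustration, and only the Z and N condition codes are modelled.

```python
def trace_addcc(regs, rs1, rs2, rd):
    """Walk `addcc %rs1, %rs2, %rd` through the five stages and log each one."""
    log = []
    # Stage 1: fetch the instruction (modelled here as a text string).
    instr = f"addcc %{rs1}, %{rs2}, %{rd}"
    log.append(f"1 fetch instruction: {instr}")
    # Stage 2: decode the opcode.
    opcode = instr.split()[0]
    log.append(f"2 decode: opcode = {opcode}")
    # Stage 3: fetch the operands from the register file.
    a, b = regs[rs1], regs[rs2]
    log.append(f"3 fetch operands: {a}, {b}")
    # Stage 4: apply the operands to the ALU (32-bit add; only the Z and N
    # condition codes are modelled in this sketch).
    result = (a + b) & 0xFFFFFFFF
    flags = {"z": result == 0, "n": bool(result >> 31)}
    log.append(f"4 ALU: {a} + {b} = {result}")
    # Stage 5: write the result back to the register file.
    regs[rd] = result
    log.append(f"5 write back: %{rd} = {result}")
    return log, flags


regs = {"r1": 5, "r2": 7, "r3": 0}  # hypothetical register contents
log, flags = trace_addcc(regs, "r1", "r2", "r3")
for line in log:
    print(line)
```

Each log line corresponds to one stage, so the instruction occupies five steps from fetch to write-back.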


RISC branch instruction

• Branch instructions have the following stages:
  – Fetch the instruction from memory
  – Decode the instruction
  – Fetch the components of the address from the instruction or register file
  – Apply the components of the address to the ALU
  – Copy the resulting effective address into the PC (program counter).
• Exercise: Trace the stages for the instruction: be 2048


Load/Store

• Load and store instructions have the following stages:
  – Fetch the instruction from memory
  – Decode the instruction
  – Fetch the components of the address from the instruction or register file
  – Apply the components to the ALU
  – Apply the resulting effective address to memory along with a read or write signal. For a write, the data item to be written must be retrieved from the register file.
• Exercise: Trace the ld %r1, %r2, %r3 instruction through these stages.
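As a rough aid for the exercise, the address-related stages of a load can be sketched in Python. The sketch assumes that ld %r1, %r2, %r3 adds the contents of %r1 and %r2 to form the effective address and loads the word found there into %r3; the function name trace_ld, the dictionary-based memory, and all values are hypothetical.

```python
def trace_ld(regs, memory, rs1, rs2, rd):
    """Model the last three stages of a load (assumed form: ld %rs1, %rs2, %rd)."""
    # Stages 1-2 (instruction fetch and decode) are the same as for any instruction.
    # Stage 3: fetch the address components from the register file.
    base, offset = regs[rs1], regs[rs2]
    # Stage 4: the ALU adds the components to form the effective address.
    effective_address = (base + offset) & 0xFFFFFFFF
    # Stage 5: apply the address to memory with a read signal; the word that
    # comes back is placed in the destination register.
    regs[rd] = memory[effective_address]
    return effective_address


regs = {"r1": 2048, "r2": 8, "r3": 0}  # hypothetical register contents
memory = {2056: 99}                    # hypothetical memory: address -> word
address = trace_ld(regs, memory, "r1", "r2", "r3")
print(address, regs["r3"])
```

A store would follow the same path up to the effective address, then send a write signal together with the data item read from the register file.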


Summarizing…

• The fetch-execute stages differ across the different instructions…
  – But they have similarities. All the instructions have the following stages:
    • Instruction fetch
    • Decode
    • Operand fetch
    • ALU operation
    • Result writeback (to memory, from memory, or to register, depending on the type of instruction).
  – So computer architects decided to break the control unit into 5 parts, each part for one stage of the fetch-execute cycle.


RISC control unit

• RISC processors have 5 hardware units:
  – Each corresponding to one stage of the fetch-execute cycle.
• E.g., the “Fetch instruction” hardware part of the control unit fetches instructions while the “Fetch operand” hardware fetches the operands.
• These hardware units can execute in parallel.
• Each hardware unit takes 1 CPU tick to execute.
• How does this help?


RISC control unit

Fetch instr. → Decode opcode → Fetch operands → Execute instr. → Store result

A pipeline of five hardware units (which together form the control unit). An instruction moves from left to right in the pipeline. Whenever the clock ticks, each unit passes the instruction to the next.

© Prem Uppuluri. Derived from Doug Comer


RISC pipelining

• E.g., life of an instruction:

CPU cycle  Unit1  Unit2  Unit3  Unit4  Unit5
1          inst1  -      -      -      -
2          -      inst1  -      -      -
3          -      -      inst1  -      -
4          -      -      -      inst1  -
5          -      -      -      -      inst1

An instruction moves through the pipeline in 5 clock ticks. So in effect, it takes 5 CPU cycles for an instruction to execute. However, consider this: “all the units can work in parallel”. Can we use this fact to speed up execution, i.e., can we make instructions complete faster than one every 5 cycles?


RISC pipelining: speeding up instructions

CPU cycle  Unit1  Unit2  Unit3  Unit4  Unit5
1          inst1  -      -      -      -
2          inst2  inst1  -      -      -
3          -      -      inst1  -      -
4          -      -      -      inst1  -
5          -      -      -      -      inst1


When Unit2 is executing inst1 in clock cycle 2, Unit1 is idle… why not start using this unit to execute the next instruction, inst2?


Pipeline filling

• E.g.,

CPU cycle  Unit1  Unit2  Unit3  Unit4  Unit5
1          inst1  -      -      -      -
2          inst2  inst1  -      -      -
3          inst3  inst2  inst1  -      -
4          inst4  inst3  inst2  inst1  -
5          inst5  inst4  inst3  inst2  inst1
6          inst6  inst5  inst4  inst3  inst2
7          inst7  inst6  inst5  inst4  inst3
8          inst8  inst7  inst6  inst5  inst4

After the pipeline is filled (in CPU cycle 5), one instruction completes in every CPU cycle. This is instruction pipelining, a form of instruction-level parallelism (ILP).
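The fill pattern in the table can be generated mechanically: instruction i sits in unit u during cycle i + u − 1. The Python sketch below assumes an ideal pipeline with no stalls; the function name pipeline_schedule is made up for illustration, and unlike the table it also shows the cycles in which the pipeline drains.

```python
def pipeline_schedule(num_instructions, num_units=5):
    """Return one row per CPU cycle; row[u - 1] is the instruction in unit u, or None."""
    # The last instruction enters Unit1 in cycle num_instructions and needs
    # num_units - 1 further cycles to leave the last unit.
    total_cycles = num_instructions + num_units - 1
    rows = []
    for cycle in range(1, total_cycles + 1):
        row = []
        for unit in range(1, num_units + 1):
            inst = cycle - unit + 1  # instruction i is in unit u in cycle i + u - 1
            row.append(f"inst{inst}" if 1 <= inst <= num_instructions else None)
        rows.append(row)
    return rows


rows = pipeline_schedule(8)
for cycle, row in enumerate(rows, start=1):
    print(cycle, " ".join(name or "-" for name in row))
```

Under these assumptions, 8 instructions finish in 8 + 5 − 1 = 12 cycles, compared with 8 × 5 = 40 cycles if each instruction had to finish before the next one started.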


Class Discussion

• Implement the pipeline for the following:

srl %r3, %r5
addcc %r1, 10, %r1
ld %r2, %r4
subcc %r3, %r1, %r4
be label


Commercial processors

• Intel Pentium Pro: one of the first to provide speculative execution.
  – 12 pipeline stages.
• Intel Pentium 4: went from 10 to 20 pipeline stages.
  – Why did Intel do this?
    • More pipeline stages → less work per clock cycle → the clock cycle time can be reduced → clock speeds are increased.
    • Hence, Intel could now support 1.2 GHz+ speeds.
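The reasoning "more stages, shorter clock cycle" can be checked with a little arithmetic. The sketch below assumes a simple model in which the total logic delay of an instruction is split evenly across the stages and every stage adds a fixed latch overhead. The function name and all delay figures are hypothetical; they are not measurements of any Intel processor.

```python
def clock_rate_ghz(total_logic_delay_ns, num_stages, latch_overhead_ns):
    """Clock rate when the logic delay is divided evenly over num_stages stages."""
    # The clock cycle must be long enough for one stage's share of the work
    # plus the fixed per-stage latch overhead.
    cycle_time_ns = total_logic_delay_ns / num_stages + latch_overhead_ns
    return 1.0 / cycle_time_ns  # 1 / ns = GHz


# Hypothetical figures: 10 ns of logic in total, 0.1 ns latch overhead per stage.
for stages in (10, 20):
    print(stages, "stages:", round(clock_rate_ghz(10.0, stages, 0.1), 2), "GHz")
```

In this model, doubling the number of stages does not quite double the clock rate, because the latch overhead is paid in every stage.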


Commercial Processors

• Video by Apple (Apple promotional material): http://www.youtube.com/watch?v=PKF9GOE2q38


• NEXT: Things that affect a pipeline’s performance.
  – Pipeline “bubbles”.


Discussion

• Pipelining is not always efficient. Sometimes an instruction depends on its previous instruction’s results.
  – Implement the pipeline for the following:

srl %r3, %r5
addcc %r1, 10, %r1
ld %r1, %r2
subcc %r2, %r4, %r4

• E.g.,

CPU cycle  Unit1  Unit2  Unit3  Unit4  Unit5
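Before filling in the table, it can help to find which instructions depend on the one just before them. The Python sketch below assumes that the last operand of each instruction is its destination register and that the other register operands are sources. The program list and the function name adjacent_dependencies are made up for illustration.

```python
# Each entry: (instruction text, source registers, destination register).
# Assumption: the last operand is the destination, the others are sources.
program = [
    ("srl %r3, %r5",        {"r3"},       "r5"),
    ("addcc %r1, 10, %r1",  {"r1"},       "r1"),
    ("ld %r1, %r2",         {"r1"},       "r2"),
    ("subcc %r2, %r4, %r4", {"r2", "r4"}, "r4"),
]


def adjacent_dependencies(program):
    """List (producer, consumer, register) for each instruction that reads
    the register written by the instruction immediately before it."""
    deps = []
    for (prev_text, _, prev_dest), (text, sources, _) in zip(program, program[1:]):
        if prev_dest in sources:
            deps.append((prev_text, text, prev_dest))
    return deps


deps = adjacent_dependencies(program)
for producer, consumer, reg in deps:
    print(f"'{consumer}' needs %{reg} from '{producer}'")
```

Each reported pair marks a place where the later instruction may have to wait for the earlier result, which is where a bubble can appear in the pipeline.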


Review

• Pipelining