computer architecture pipelined processor
DESCRIPTION
Computer Architecture Pipelined Processor. Ola Flygt Växjö University http://w3.msi.vxu.se/users/ofl/ [email protected] +46 470 70 86 49. Outline. 5.1 Basic concept 5.2 Design space of pipelines 5.3 Overview of pipelined instruction processing - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/1.jpg)
Computer Computer ArchitectureArchitecturePipelined Processor
Ola FlygtVäxjö University
http://w3.msi.vxu.se/users/ofl/[email protected]
+46 470 70 86 49
![Page 2: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/2.jpg)
Outline
5.1 Basic concept5.2 Design space of pipelines5.3 Overview of pipelined instruction
processing5.4 Pipelined execution of integer and
Boolean instructions5.5 Pipelined processing of loads and
stores
CH01
![Page 3: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/3.jpg)
5.1.1 Principle of pipelining
![Page 4: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/4.jpg)
Principle of pipeline.
![Page 5: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/5.jpg)
Processing of a sequence of instructions using a basic
pipeline
![Page 6: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/6.jpg)
Pipelined and unpipelined processing
![Page 7: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/7.jpg)
5.1.2 General structure of pipelines
![Page 8: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/8.jpg)
Structure and pipelined operation of the Fx unit of the
IBM Power1
![Page 9: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/9.jpg)
Pipeline Performance Measures
Cycle time: tc
is determined by the worst-case processing time of the longest stage
Repetition Rate: R the shortest possible time interval between
subsequent independent instructions in the pipeline
Performance potential of a pipeline: P P = 1/(R * tc)
PowerPC603 FP double Mul. e.g. R = 2, tc = 12 nsec
P = 1/(R * tc) = 1/(2*12nec) = 44.6 MFLOPS
![Page 10: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/10.jpg)
Performance: RAW-dependent
Latency: specifies the amount of time that the result of a particular
instruction takes to become available in the pipeline for a subsequent dependent instruction.
Define-use latency (1 to 100 cycles) mul r1, r2, r3 add r5, r1, r4
Load-use latency (1 to 3 cycles, sometimes much more) load r1, x add r5, r1, r2
Stalled: the immediately following RAW-dependent instruction has to be stalled in the pipeline for n-1 cycle
![Page 11: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/11.jpg)
Improve Performance
Multiple-operation instructionsHP PA 7100
FMPYADD RM1, RM2, RM3, RA1, RA2RM3RM1*RM2 RA2RA1+RA2
PowerPCFMA for performing (A*C) + B
![Page 12: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/12.jpg)
5.1.4 Application scenarios of pipelines
![Page 13: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/13.jpg)
5.2 Design space of pipelines
key aspect of the design space of pipeline
![Page 14: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/14.jpg)
5.2.2 Basic layout of a pipeline
Design space of the overall stage layout
![Page 15: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/15.jpg)
Increasing parellelism by raising the number of pipeline stages
![Page 16: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/16.jpg)
Eight-stage pipeline
![Page 17: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/17.jpg)
Problems arise for more stages
data and control dependencies occur more frequentlystalled and wait for datareload pipe in case of branch
subtask becomes less balances (in execution time)cycle time is determined by the worst-case
processing time of the longest stage In most case
5-10 stages
![Page 18: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/18.jpg)
Pipeline example: DEC 21064
![Page 19: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/19.jpg)
Layout of the stage sequence
![Page 20: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/20.jpg)
Bypasses (data forwarding in RAW)
Unless special arrangements are made, the results of the operation instruction is written into the register file, or into the memory, and then it is fetched from there as a source operand.
![Page 21: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/21.jpg)
Principle of bypassing in define-use and load-use conflicts
![Page 22: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/22.jpg)
Possibilities for the timing of pipeline operation
![Page 23: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/23.jpg)
5.3 Overview: pipelined instruction processing
![Page 24: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/24.jpg)
Declaration of Logical Pipeline: e.g. Powerpc 601
![Page 25: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/25.jpg)
Detailed Specification of each of the pipelines
![Page 26: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/26.jpg)
Implementation of instruction pipelines (v.s. logical)
![Page 27: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/27.jpg)
Layout of physical pipelines
![Page 28: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/28.jpg)
Multiplicity of pipelines
![Page 29: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/29.jpg)
Preserveing sequential consistency
![Page 30: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/30.jpg)
Preserving sequential consistency, implementation
e.g.
![Page 31: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/31.jpg)
Preserving sequential consistency, e.g.
![Page 32: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/32.jpg)
Case studies: Pentium
Logic layout of Pentium’s pipelines
![Page 33: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/33.jpg)
Case studies: PowerPC 604
![Page 34: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/34.jpg)
5.4 (Specific) Pipelines execution:Integer and Boolean instructions
(FX)
![Page 35: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/35.jpg)
RISC pipelines 4 or 5 stages
C
![Page 36: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/36.jpg)
Traditional FX pipeline of RISC processors
![Page 37: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/37.jpg)
Logical to Physical: e.g. PowerPC601 using a single
universal FX unit
![Page 38: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/38.jpg)
Layout of the 5 stages FX and L/S pipelines in the MIPS R4200
![Page 39: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/39.jpg)
CISC pipeline 6 or 5 stages
![Page 40: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/40.jpg)
Traditional CISC pipeline: The execution of register-memory
instruction
![Page 41: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/41.jpg)
CISC pipeline: Execution of register-register and load/store
instructions
![Page 42: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/42.jpg)
CISC pipeline 5 stage: recycling E/C stage
![Page 43: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/43.jpg)
Implementation of FX units: how many
![Page 44: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/44.jpg)
Trend in increasing the performance
![Page 45: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/45.jpg)
5.5 (Specific) Pipelines execution: loads and
stores
![Page 46: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/46.jpg)
5.5.3 Load-use delay: RICS pipelines
![Page 47: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/47.jpg)
Load-use delay: MIPS
![Page 48: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/48.jpg)
Load-use delay: CISC
![Page 49: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/49.jpg)
Handling Load-use delay
Basic approaches to cope with a load-use delay
![Page 50: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/50.jpg)
Remove Load-use delay
![Page 51: Computer Architecture Pipelined Processor](https://reader030.vdocuments.net/reader030/viewer/2022033100/56814834550346895db5548b/html5/thumbnails/51.jpg)
Remove Load-use delay: bringing forward the calculation of virtual address: for slow
cache