out-of-order superscalar cpu
![Page 1: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/1.jpg)
Out-of-Order Superscalar CPU
Cliff Frey and Vicky Liu
6.884 Final Project Presentation, May 6th, 2005
![Page 2: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/2.jpg)
The Basis of Our Design
Tomasulo’s Algorithm
- Allows out-of-order execution
- Instructions wait in Reservation Stations
- Instructions execute once their operands have been computed
- Can reorder instructions across WAW and WAR hazards
![Page 3: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/3.jpg)
The Basis of Our Design
Tomasulo’s Algorithm
- In the Decode stage, each instruction's result is assigned a Tag
- Each register maps to either a Value or a Tag
- When a result is computed, the result and its Tag are broadcast
- All instances of the Tag are updated with the computed value
- The broadcast updates both the RegFile and the Reservation Stations
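The tag mechanism described above can be sketched in a few lines of Python. This is only an illustrative model, not the project's BlueSpec code: the names `Tag`, `ReservationStation`, `RenamingCore`, `decode`, `broadcast`, and `execute_ready` are all hypothetical, and only ADD and SUB are modeled.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Tag:
    """Placeholder for a result that has not been computed yet."""
    id: int


Operand = Union[int, Tag]  # a register or operand holds a Value or a Tag


@dataclass
class ReservationStation:
    op: str
    dest: Tag
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        # An instruction may execute once neither operand is still a Tag.
        return not isinstance(self.src1, Tag) and not isinstance(self.src2, Tag)


class RenamingCore:
    def __init__(self, num_regs: int = 4) -> None:
        self.regfile: List[Operand] = [0] * num_regs
        self.stations: List[ReservationStation] = []
        self.next_tag = 0

    def decode(self, op: str, rd: int, rs1: int, rs2: int) -> Tag:
        """Assign a fresh Tag to the result and park the instruction."""
        tag = Tag(self.next_tag)
        self.next_tag += 1
        # Read the sources first, then point rd at the new Tag.
        self.stations.append(
            ReservationStation(op, tag, self.regfile[rs1], self.regfile[rs2]))
        self.regfile[rd] = tag
        return tag

    def broadcast(self, tag: Tag, value: int) -> None:
        """Replace every instance of `tag` with the computed value."""
        self.regfile = [value if r == tag else r for r in self.regfile]
        for s in self.stations:
            if s.src1 == tag:
                s.src1 = value
            if s.src2 == tag:
                s.src2 = value

    def execute_ready(self) -> int:
        """Execute all currently ready stations; return how many ran."""
        ready = [s for s in self.stations if s.ready()]
        for s in ready:
            self.stations.remove(s)
            result = s.src1 + s.src2 if s.op == "ADD" else s.src1 - s.src2
            self.broadcast(s.dest, result)
        return len(ready)
```

Because a register that has been renamed again holds the newer Tag, a late broadcast of an older Tag leaves it untouched, which is how this model avoids WAW problems.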
![Page 4: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/4.jpg)
High Level Design
The major components
- Fetch Unit
- Decode
- Renaming Register File
- Reservation Stations
- Functional Units
- Common Data Bus
![Page 5: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/5.jpg)
High Level Design
Pipeline stages: Fetch, Decode, Execute, Write Back (CDB)
![Page 6: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/6.jpg)
BlueSpec Rule & Module Design
![Page 7: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/7.jpg)
High Level Design Issues
- Unresolved branches stall the decode stage
- Memory operations need to be in order
- Back-to-back dependent adds take 2 cycles
![Page 8: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/8.jpg)
Design Exploration: Supporting Precise Exceptions
Shortcomings of Tomasulo’s algorithm
- Register File contents can be lost
- External changes need to be ordered
![Page 9: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/9.jpg)
Design Exploration: Supporting Precise Exceptions
A processor supports precise exceptions if…
- … instructions before the excepting instruction execute normally
- … instructions after and including the excepting instruction do not change any programmer-visible state of the processor
![Page 11: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/11.jpg)
Original High Level Design
![Page 12: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/12.jpg)
Updated High Level Design
Our Solution
- Minimal changes to original design
- Reorder Buffer (ROB) and Commit stage
- Architectural Register File
- External changes made at commit time
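As a rough illustration of how a Reorder Buffer and Commit stage keep architectural state in program order, here is a minimal Python sketch. It is not the project's implementation: `RobEntry`, `ReorderBuffer`, and their methods are hypothetical names, and memory is modeled as a plain dictionary.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class RobEntry:
    dest: Optional[int] = None        # architectural register to write, if any
    store_addr: Optional[int] = None  # set for stores; memory is written at commit
    value: Optional[int] = None       # filled in at writeback
    done: bool = False


class ReorderBuffer:
    def __init__(self, size: int, arch_regs: List[int], memory: Dict[int, int]) -> None:
        self.size = size
        self.entries: Deque[RobEntry] = deque()
        self.arch_regs = arch_regs    # Architectural Register File
        self.memory = memory          # stand-in for externally visible state

    def allocate(self, dest: Optional[int] = None,
                 store_addr: Optional[int] = None) -> RobEntry:
        """Reserve a slot in program order (done at decode in this model)."""
        if len(self.entries) >= self.size:
            raise RuntimeError("ROB full: decode must stall")
        entry = RobEntry(dest=dest, store_addr=store_addr)
        self.entries.append(entry)
        return entry

    def writeback(self, entry: RobEntry, value: int) -> None:
        """Results may arrive out of order; they only mark the entry done."""
        entry.value = value
        entry.done = True

    def commit(self) -> int:
        """Retire finished entries from the head only; return how many retired."""
        retired = 0
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            if entry.store_addr is not None:
                self.memory[entry.store_addr] = entry.value  # external change
            elif entry.dest is not None:
                self.arch_regs[entry.dest] = entry.value
            retired += 1
        return retired
```

In this model a younger instruction that finishes early simply waits in the buffer, so neither the Architectural Register File nor memory changes until every older instruction has retired.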
![Page 13: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/13.jpg)
Updated High Level Design
![Page 14: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/14.jpg)
Handling Exceptions
ROB
Undo
- Set PC to interrupt vector (0x1100)
- Exception PC stored in coprocessor register EPC
- Correct speculative results in Rename Register File
- Clear cached information in Functional Units
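The four recovery steps listed on this slide can be modeled as below. This is a hedged sketch: only the vector address 0x1100 and the EPC register come from the slide. `CoreState` and `take_exception` are hypothetical names, and restoring the Rename Register File by copying the Architectural Register File is a simplifying assumption; the slide's "ROB" and "Undo" labels suggest the actual design undoes entries through the ROB instead.

```python
from dataclasses import dataclass, field
from typing import List, Union

INTERRUPT_VECTOR = 0x1100  # interrupt vector address given on the slide


@dataclass
class CoreState:
    pc: int
    arch_regs: List[int]                 # committed, programmer-visible values
    rename_regs: List[Union[int, str]]   # values or pending tags (tags shown as strings)
    rob: List[str] = field(default_factory=list)         # in-flight instructions
    fu_pending: List[str] = field(default_factory=list)  # work cached in functional units
    epc: int = 0                         # coprocessor register EPC


def take_exception(state: CoreState, excepting_pc: int) -> None:
    """Apply the recovery steps once the excepting instruction reaches commit."""
    state.epc = excepting_pc                   # remember where the exception occurred
    state.pc = INTERRUPT_VECTOR                # redirect fetch to the handler
    state.rename_regs = list(state.arch_regs)  # assumption: discard speculative results by copy
    state.rob.clear()                          # drop the excepting and younger instructions
    state.fu_pending.clear()                   # clear cached information in functional units
```

After the call, no instruction at or after the excepting one has left a trace in programmer-visible state, which matches the precise-exception condition stated earlier in the deck.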
![Page 15: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/15.jpg)
Other Features to Get High Performance
Implemented Features
- Speculative fetch
- External changes need to be ordered
- Memory unit can handle many requests at a time
Unimplemented Features
- Branch prediction and target buffering
- Speculative execution
![Page 16: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/16.jpg)
A Closer Look at the Load/Store Unit
result.get()
Mem
![Page 17: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/17.jpg)
BlueSpec Stories: Conflicting Rules
![Page 19: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/19.jpg)
BlueSpec Stories: The Fix
Possible Solutions
- One rule for every possible data path
- Use config regs everywhere
- Be slow and blame BlueSpec =P
Our Solutions
- Homemade completion buffer
- Make methods write to RWires
- Write a “magic” rule to handle all combinations of cases
![Page 20: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/20.jpg)
Bypassing from writeback to decode
![Page 21: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/21.jpg)
An Excerpt from our Trace Output
Back-to-back, nondependent adds
Fetch Decode Execute Writeback Commit
F | [ ] - - |
F |00001000=0 ADD [ ] - - |
F |00001004=1 ADD [ 0] - - |
F |00001008=2 ADD [ 1] A-0 -00000001 |
  |0000100c=3 ADD [ 2] A-1 -00000001 | 0
  | [ 3] A-2 -00000002 | 1
  | [ ] A-3 -00000002 | 2
  | [ ] - - | 3
![Page 22: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/22.jpg)
An Excerpt from our Trace Output
Instruction stream with reordering
Decode add mem BR WB commit
001398 LW r1, r10 [ |M | ] |
00139c ADDI r2, r2, -4 [ |M LW | ] |
0013a0 SLT r1, r11, r1 [ADDI |M | ] |
0013a4 BEQZ r1, 0x13d8 [ |M | ]ADDI |
0013a8 SUBI r3, r12, -1 [ |M LW| ] |
[SUBI |M | ]LW |
[SLT |M | ]SUBI |LW
[ |M | ]SLT |ADDI
[ |M |BEQZ] |SLT
[ |M | ]BEQZ |
[ |M | ] |BEQZ *taken!
[ |M | ] |SUBI
![Page 23: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/23.jpg)
Synthesis Results
Clock period = 4 ns (250 MHz)
Area = 0.38 mm²
![Page 25: Out-of-Order Superscalar CPU](https://reader031.vdocuments.net/reader031/viewer/2022020705/61fb71552e268c58cd5e38fe/html5/thumbnails/25.jpg)
Design Choices and Performance
Configurable Parameters
- Resizing reservation stations
- Number of slots in ROB and the Fetch Unit buffer
- Different functional unit setup
- Easily support multicycle functional units
Performance
- Branches and stores really hurt performance
- Achieved IPC ≈ 0.5 on vector-add and quicksort
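For a rough sense of scale, the IPC figure can be combined with the 4 ns clock from the Synthesis Results slide. This is a back-of-the-envelope estimate only: it assumes the synthesized clock period is the actual operating clock and that the measured IPC holds at that frequency, neither of which the slides confirm.

$$
f = \frac{1}{4\,\text{ns}} = 250\,\text{MHz},
\qquad
\text{throughput} \approx \text{IPC} \times f \approx 0.5 \times 250 \times 10^{6} = 1.25 \times 10^{8}\ \text{instructions/s}
$$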