![Page 1: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/1.jpg)
Approaching Ideal NoC Latency with Pre-Configured
Routes
George Michelogiannakis, Dionisios Pnevmatikatos and Manolis Katevenis
Institute of Computer Science (ICS)Foundation for Research & Technology - Hellas (FORTH)
P.O.Box 1385, Heraklion, Crete, GR-711-10 GREECEEmail: {mihelog,pnevmati,kateveni}@ics.forth.gr
![Page 2: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/2.jpg)
May 2007 ICS-FORTH, Greece 2
Introduction Problem: Latency NoCs impose. Motivation: Latency introduced to every
communication pair. Past work: Achieves 1 cycle/hop at 500 MHz. We extend speculation to routing decisions. Goal: Approach buffered wire latency.
Fraction of cycle/hop.
![Page 3: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/3.jpg)
May 2007 ICS-FORTH, Greece 3
Our Approach
400 ps good scenario; 1 cycle otherwise.
130 nm library
![Page 4: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/4.jpg)
May 2007 ICS-FORTH, Greece 4
Preferred Paths Each output has one preferred input. This pref. I/O pair is connected by a
single pre-enabled tri-state driver. Pre-enabling is crucial:
200 ps pre-enabled mux; 500 ps otherwise.
Later check if flits correctly forwarded. Thus, preferred paths are formed.
Reconfigurable at run-time. Custom routes (shapes) allowed.
![Page 5: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/5.jpg)
May 2007 ICS-FORTH, Greece 5
Switch Architecture - Output
400
ps1
cycleInput FIFOs. Selectable when non-
empty, or flit to be enqueued.
Pref. path pre-enabled tri-
states.
Routing logic tri-state.
Config. & arbitration logic. Stores pref. path
config. & arbitrates.
![Page 6: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/6.jpg)
May 2007 ICS-FORTH, Greece 6
Switch Architecture - Input Dead flits:
Incorrectly eagerly forwarded. Terminated at end of
preferred path. Switch resembles a
buffered crossbar.Decides if flit needs
to be enqueued.
![Page 7: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/7.jpg)
May 2007 ICS-FORTH, Greece 7
Routing Algorithm
Deterministic routing employed. Non-preferred paths follow XY
routing. We slightly modify XY routing to
handle preferred paths: Flit correctly eagerly forwarded if it
approaches the destination in any axis. Flit considered dead otherwise.
![Page 8: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/8.jpg)
May 2007 ICS-FORTH, Greece 8
Routing Characteristics Flits in preferred
paths may not follow XY routing.
Duplicate copies of a flit may be delivered.
XY routing. Pref. paths.
D
S
![Page 9: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/9.jpg)
May 2007 ICS-FORTH, Greece 9
NoC Topology – Bar Floorplan
Application: Tiled CPU and RAM blocks.
Each switch is 6x6 and serves 4 PEs.
![Page 10: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/10.jpg)
May 2007 ICS-FORTH, Greece 10
Bar Floorplan
Would be 8x12: Vertical links drive
address inputs. 2 PE data ports
served by 1 switch port.
![Page 11: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/11.jpg)
May 2007 ICS-FORTH, Greece 11
Cross Floorplan
![Page 12: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/12.jpg)
May 2007 ICS-FORTH, Greece 12
Layout Results 130 nm
implementation library. Typical case.
Pref. path latency: 300-420 ps. 450-500 ps (incl. 1mm).
1 cycle/node otherwise.
Past work: 1 cycle/node at 500 MHz.
Clock frequency 667 MHz
Flit width 39
FIFO lines 2
Number of FIFOs 30
Bar area overhead
13%
Cross area overhead
18%
Number of cells 15 K
Number of gates 45 K
Total dynamic power
80 mW
![Page 13: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/13.jpg)
May 2007 ICS-FORTH, Greece 13
Advanced Issues
Deadlock & livelock freedom. Constraints to prevent circle. Keep NoC functional in any case.
Out-of-order delivery of flits in the same packet. Apply reconfiguration at a “safe”
time. Adaptive routing.
![Page 14: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/14.jpg)
May 2007 ICS-FORTH, Greece 14
Future Work
Synchronization issues – A flit may arrive at any time. Impose preferred path constraints. Implement switch asynchronously.
Evaluation in complete system. Implement fault-tolerance.
![Page 15: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062309/56814881550346895db58da3/html5/thumbnails/15.jpg)
May 2007 ICS-FORTH, Greece 15
Conclusion We approach ideal latency.
By pre-enabled tri-state paths. Our NoC is a generalized “mad-
postman” [C. R. Jesshope et al, 1989].
Our NoC is easily generalized – topology may need to be changed.
Past NoC research can be applied for further optimizations.