Approaching Ideal NoC Latency with Pre-Configured Routes
![Page 1: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/1.jpg)
[Figure: title-slide diagram with blocks labeled Module and R.]
Approaching Ideal NoC Latency with Pre-Configured Routes
George Michelogiannakis
Master’s Thesis
Thesis Advisor: Prof. Manolis Katevenis
Computer Science Department, University of Crete
![Page 2: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/2.jpg)
The Future is CMPs -> OCINs Critical
[Figure: timeline with points labeled 2006, 2007.5, 2009, 2010.5, 2012, 2013.5?, 2015?]
![Page 3: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/3.jpg)
SoCs Grow in Complexity
June 2007 CSD - UOC, Heraklion, Greece 3
![Page 4: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/4.jpg)
What are NoCs?
On-chip structured communication infrastructure, i.e. the solution!
Composed of routers (switches), channels (wires), and network interface logic.
The NoC approach is inspired by the success of macro networks.
Some concepts are shared.
![Page 5: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/5.jpg)
NoC Cost, Channels & Workload
Resource limitation: cost is Si area and power.
Wires are plentiful: many long, wide channels.
Buffers are limited.
Worrying area & power overhead numbers!
Different constraints motivate some surprising differences in design.
NoCs are usually developed for a specific application set.
![Page 6: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/6.jpg)
Packet Format
A packet is divided into flits (wormhole routing).
The first flit carries the address or request.
Data flits may follow; they contain no address information.
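The packet format above can be illustrated with a minimal Python sketch. The `Flit` class, the `packetize` helper, the `(x, y)` destination tuple, and the 4-byte flit payload are all hypothetical choices made for illustration; the slides do not specify the field layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flit:
    is_head: bool
    dest: Optional[Tuple[int, int]]  # only the head flit carries an address
    payload: bytes


def packetize(dest: Tuple[int, int], request: bytes, data: bytes,
              flit_bytes: int = 4) -> List[Flit]:
    """Split a packet into one head flit plus address-free data flits.

    flit_bytes is an assumed payload size, not a value from the slides.
    """
    flits = [Flit(True, dest, request)]  # head flit: address or request
    for i in range(0, len(data), flit_bytes):
        flits.append(Flit(False, None, data[i:i + flit_bytes]))
    return flits


flits = packetize((2, 1), b"RD", b"abcdefgh")
print(len(flits), [f.dest for f in flits])
```

Because data flits carry no address, they can only follow the route that the head flit has opened, which is the defining property of wormhole routing.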
![Page 7: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/7.jpg)
Reference NoC Topology
2D mesh. 5x5 routers.
Relevant issues: routing, floorplan, topology, fault-tolerance, virtual channels.
![Page 8: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/8.jpg)
Virtual Channel Routers
![Page 9: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/9.jpg)
Our Work - Introduction
Problem: the latency that NoCs impose.
Motivation: this latency is introduced to every communication pair.
Past work: achieves 1 cycle/hop at 500 MHz.
We extend speculation to routing decisions.
Goal: approach buffered wire latency, i.e. a fraction of a cycle per hop.
![Page 10: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/10.jpg)
Our Approach
400 ps in the good scenario; 1 cycle otherwise.
130 nm library.
![Page 11: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/11.jpg)
Preliminary Simulation Results

Dynamic multiplexer:

| I/O wires | Typical case | Worst case |
|---|---|---|
| 5 mm | 514 ps | 808 ps |
| 2 mm | 590 ps | 893 ps |

Pre-configured multiplexer:

| I/O wires | Typical case | Worst case |
|---|---|---|
| 5 mm | 240 ps | 368 ps |
| 2 mm | 254 ps | 378 ps |

Latency: 2 to 2.5 times lower.
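The "2 to 2.5 times lower" claim can be checked directly from the figures on this slide. The short Python sketch below only divides the dynamic-multiplexer latencies by the pre-configured ones; the dictionary layout and the `speedups` helper are illustrative names, not part of the thesis.

```python
# Latencies in ps as listed on the slide: (typical case, worst case).
dynamic = {"5 mm": (514, 808), "2 mm": (590, 893)}
preconfigured = {"5 mm": (240, 368), "2 mm": (254, 378)}


def speedups(slow, fast):
    """Ratio slow/fast for every (wire length, case) combination."""
    out = {}
    for wires in slow:
        for label, s, f in zip(("typical", "worst"), slow[wires], fast[wires]):
            out[(wires, label)] = s / f
    return out


ratios = speedups(dynamic, preconfigured)
for key, ratio in ratios.items():
    print(key, round(ratio, 2))
```

All four ratios fall between 2 and 2.5, consistent with the summary line.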
![Page 12: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/12.jpg)
Preferred Paths
Each output has one preferred input.
This preferred I/O pair is connected by a single pre-enabled tri-state driver.
Pre-enabling is crucial. A later check determines whether flits were correctly forwarded.
Thus, preferred paths are formed.
Reconfigurable at run-time.
Custom routes (shapes) allowed.
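One way to picture the per-switch configuration described above is a table from each output to its single preferred input. The Python sketch below is only a software model under assumptions: the `Switch` class, the port labels, and the method names are hypothetical, and the real design is a hardware circuit of tri-state drivers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical port labels for a 5x5 mesh switch: four neighbours plus local PE.
PORTS = ("N", "E", "S", "W", "L")


@dataclass
class Switch:
    # preferred[output] = input whose tri-state driver is pre-enabled for it.
    # Keying by output enforces "each output has one preferred input".
    preferred: Dict[str, str] = field(default_factory=dict)

    def set_preferred(self, output: str, inp: str) -> None:
        if output not in PORTS or inp not in PORTS:
            raise ValueError("unknown port")
        # Overwriting an entry models run-time reconfiguration.
        self.preferred[output] = inp

    def eager_outputs(self, inp: str) -> List[str]:
        """Outputs a flit arriving on `inp` reaches before any routing check."""
        return [out for out, pref in self.preferred.items() if pref == inp]


sw = Switch()
sw.set_preferred("E", "W")      # straight-through west-to-east preferred path
print(sw.eager_outputs("W"))    # ['E']
sw.set_preferred("E", "N")      # reconfigure: east output now prefers north
print(sw.eager_outputs("W"))    # []
```

Chaining such entries across neighbouring switches yields a preferred path of arbitrary shape, which is what allows custom routes.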
![Page 13: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/13.jpg)
Opportunities for (Re-)configurability
Uniform allocation of memory blocks to processors
Non-uniform allocation of memory blocks to processors
[Figure: floorplans of memory blocks (M) and processors (P) for the two allocations.]
![Page 14: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/14.jpg)
Switch Architecture - Output
[Figure labels: 400 ps, 1 cycle]
Input FIFOs: selectable when non-empty, or when a flit is to be enqueued.
Preferred-path pre-enabled tri-states.
Routing logic tri-state.
Configuration & arbitration logic: stores the preferred-path configuration and arbitrates.
![Page 15: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/15.jpg)
Switch Architecture - Input
Dead flits: incorrectly eagerly forwarded; terminated at the end of the preferred path.
The switch resembles a buffered crossbar.
Decides if a flit needs to be enqueued.
![Page 16: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/16.jpg)
Input Queueing Suboptimal
Dead flits are enqueued in FIFOs: they impact non-preferred flits and waste power.
VCs would help.
![Page 17: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/17.jpg)
Routing Algorithm
Deterministic routing is employed.
Non-preferred paths follow XY routing.
We slightly modify XY routing to handle preferred paths:
A flit is correctly eagerly forwarded if it approaches the destination in any axis.
The flit is considered dead otherwise.
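The rule above can be sketched in Python as follows. The coordinate convention (east increases x, north increases y), the port names, and both function names are assumptions made for this sketch, and it covers only hops between switches, not delivery to the local PE.

```python
# Assumed convention: "E" increases x, "N" increases y.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_next_port(cur, dst):
    """Plain XY routing: correct the X offset first, then the Y offset."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"  # arrived: deliver to the local port


def eager_forward_ok(cur, out_port, dst):
    """True if the eager hop brings the flit closer in X or in Y.

    A flit for which this returns False is classified as dead.
    """
    sx, sy = STEP[out_port]
    nx, ny = cur[0] + sx, cur[1] + sy
    closer_x = abs(dst[0] - nx) < abs(dst[0] - cur[0])
    closer_y = abs(dst[1] - ny) < abs(dst[1] - cur[1])
    return closer_x or closer_y


print(xy_next_port((0, 0), (2, 1)))           # E
print(eager_forward_ok((0, 0), "N", (2, 1)))  # True: not the XY hop, but closer in Y
print(eager_forward_ok((0, 0), "W", (2, 1)))  # False: dead flit
```

The second call shows why flits on preferred paths may deviate from strict XY order while still being accepted.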
![Page 18: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/18.jpg)
Routing Characteristics
Flits in preferred paths may not follow XY routing.
Duplicate copies of a flit may be delivered.
[Figure: XY routing and preferred paths between source S and destination D.]
![Page 19: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/19.jpg)
Routing Characteristics
Out-of-order delivery is disallowed, by applying a new configuration at a “safe” time.
[Figure: XY routing and preferred paths between source S and destination D.]
![Page 20: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/20.jpg)
Adaptive Routing
Many benefits to offer.
Extra challenge: dead flit classification.
An output may switch to “adaptive mode”, according to application-dictated factors.
It then routes according to the adaptive algorithm. This does not apply to previously-decided packets.
Flits routed this way are not dead.
![Page 21: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/21.jpg)
Deadlock-Freedom
XY routing is deadlock-free. What we added to it:
Preferred paths: provide constraints to prevent cycles. The network remains functional in any case.
Adaptive routing: depends on the exact algorithm.
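The slides do not spell out the constraints used to prevent cycles among preferred paths. As a hedged illustration only, one possible check is a depth-first search over a graph whose nodes are channels and whose edges say "this channel is eagerly forwarded into that one"; the `has_cycle` function and the channel names below are hypothetical.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph.

    edges maps a channel name to the list of channels it eagerly feeds.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: the path closes on itself
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


# Four channels forming a closed ring of preferred paths: rejected.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# The same channels as an open chain: accepted.
chain = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"]}
print(has_cycle(ring), has_cycle(chain))  # True False
```

A configuration step could run such a check before applying a new set of preferred paths; whether the thesis does this in hardware, in software, or by construction is not stated here.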
![Page 22: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/22.jpg)
RAM Blocks
4096 x 32 chosen to balance latency & area efficiency:
Larger blocks were disproportionately slower.
Smaller blocks imposed greater area overhead.
SP is area efficient.
TP is 75% larger and DP 100% larger.
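For a sense of scale, the block size and the relative areas quoted above can be worked out in a few lines of Python. This assumes that "4096 x 32" means 4096 words of 32 bits each, and it normalises the SP block area to 1.0; both are assumptions of this sketch.

```python
WORDS, WORD_BITS = 4096, 32       # assumed reading of "4096 x 32"

bits = WORDS * WORD_BITS          # total storage per block
kib = bits / 8 / 1024             # same capacity in KiB

# Area relative to the SP block, from "TP 75% and DP 100% larger".
REL_AREA = {"SP": 1.00, "TP": 1.75, "DP": 2.00}

print(bits, "bits =", kib, "KiB per block")
for kind, area in REL_AREA.items():
    print(kind, "block area:", area, "x SP")
```

Under that reading each block holds 16 KiB, and a DP block costs as much area as two SP blocks.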
![Page 23: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/23.jpg)
Simple 2D Mesh Topology
5x5 switches, one for each PE.
Empty space between PEs.
Equally far away from A/D pins.
Does not take advantage of environment.
![Page 24: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/24.jpg)
2D Mesh with 2 Subnetworks
Divide the network: request and reply? X and Y axes?
Switches in front of pins.
Need to be interconnected.
Still 1 switch/PE.
![Page 25: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/25.jpg)
Rotated RAM Blocks
Switches every two X axes: half the number!
Switches slightly larger.
Our final topology is an optimization of this.
![Page 26: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/26.jpg)
NoC Topology – Bar Floorplan
Each switch is 6x6 and serves 4 PEs.
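The effect of the successive floorplans on the number of switches can be tallied with a small Python sketch. The PE count of 64 is a hypothetical example, and reading "half the number" as two PEs per switch is an inference from the slides rather than a stated figure.

```python
def switches_needed(num_pes: int, pes_per_switch: int) -> int:
    """Number of switches when each one serves pes_per_switch PEs (rounded up)."""
    return -(-num_pes // pes_per_switch)  # ceiling division


NUM_PES = 64  # hypothetical system size, not taken from the thesis

floorplans = [
    ("simple 2D mesh (1 switch/PE)", 1),
    ("rotated RAM blocks (half the number)", 2),
    ("bar floorplan (6x6 switch serves 4 PEs)", 4),
]
for name, pes_per_switch in floorplans:
    print(name, "->", switches_needed(NUM_PES, pes_per_switch), "switches")
```

Fewer switches means fewer hops between distant PEs, at the price of a somewhat larger switch (6x6 instead of 5x5).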
![Page 27: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/27.jpg)
Bar Floorplan
Would be 8x12:
Vertical links drive address inputs.
2 PE data ports served by 1 switch port.
![Page 28: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/28.jpg)
Cross Floorplan
![Page 29: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/29.jpg)
Switch P&R Results
130 nm implementation library. Typical case.
Preferred path latency: 300-420 ps; 450-500 ps (incl. 1 mm).
1 cycle/node otherwise.
Past work: 1 cycle/node at 500 MHz.

| Parameter | Value |
|---|---|
| Clock frequency | 667 MHz |
| Flit width | 39 |
| FIFO lines | 2 |
| Number of FIFOs | 30 |
| Bar area overhead | 13% |
| Cross area overhead | 18% |
| Number of cells | 15 K |
| Number of gates | 45 K |
| Total dynamic power | 80 mW |
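To relate the preferred-path latencies on this slide to the "fraction of a cycle per hop" goal, the picosecond figures can be converted to clock cycles. The Python sketch below uses only the numbers quoted on the slide; `cycle_ps` and the constant names are illustrative.

```python
def cycle_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


CLOCK_MHZ = 667       # this switch
PAST_WORK_MHZ = 500   # past work: 1 cycle/node at this frequency

period = cycle_ps(CLOCK_MHZ)            # roughly 1499 ps
past_hop_ps = cycle_ps(PAST_WORK_MHZ)   # 2000 ps per node in past work

# Preferred-path latencies from the slide, without and with 1 mm of wire.
for latency_ps in (300, 420, 450, 500):
    fraction = latency_ps / period
    print(latency_ps, "ps =", round(fraction, 2), "cycles;",
          round(past_hop_ps / latency_ps, 1), "x faster than 1 cycle at 500 MHz")
```

On a preferred path a hop therefore costs roughly 0.2 to 0.33 of a 667 MHz cycle, while a flit that leaves the preferred path pays a full cycle per node.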
![Page 30: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/30.jpg)
Future Work
Synchronization issues: a flit may arrive at any time.
Impose preferred path constraints.
Implement switch asynchronously.
Evaluation in a complete system.
Implement fault-tolerance.
![Page 31: Approaching Ideal NoC Latency with Pre-Con fi gured Routes](https://reader036.vdocuments.net/reader036/viewer/2022062305/56814881550346895db58e2c/html5/thumbnails/31.jpg)
Conclusion
We approach ideal latency by pre-enabled tri-state paths.
Our NoC is a generalized “mad-postman” [C. R. Jesshope et al., 1989].
Our NoC is easily generalized; the topology may need to be changed.
Past NoC research can be applied for further optimizations.