Download - FPGA Routing
![Page 1: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/1.jpg)
FPGA Routing
Dr. Philip BriskDepartment of Computer Science and Engineering
University of California, Riverside
CS 223
![Page 2: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/2.jpg)
2
Routing Resource Graph (RRG)
www.eecg.toronto.edu/~aling/ece1718/project/fang/route_rr_graph.png
2-LUT
in1 in2
out
wire1
wire2
wire3 wire4source
sink
out
wire4
wire2
in2in1
wire1
wire3
![Page 3: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/3.jpg)
3
FPGA Routing• Disjoint-path problem (on the RRG); NP-complete
• Input:– Graph G(V, E)– Set of sources S = {s1, s2, …, sm}
– Set of sets of sinks T = {T1, T2, …, Tn}, Ti = {ti1, ti
2, …, tik}
• Solution
– Finds paths from each source si to all sinks in Ti
– Paths emanating from different sinks must be disjoint (cannot shared any vertices or edges)
• Objective(s)– Minimize delay, wirelength, etc.
![Page 4: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/4.jpg)
4
Disjoint and Non-disjoint Paths
s1
s2
t21
t11
![Page 5: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/5.jpg)
5
Disjoint and Non-disjoint Paths
This route is illegal!
s1
s2
t21
t11
Two nets are routed on the same FPGA segment(remember, RRG vertices represent wires and CLB I/Os)
![Page 6: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/6.jpg)
6
Disjoint and Non-disjoint Paths
This route is legal!
s1
s2
t21
t11
![Page 7: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/7.jpg)
7
LUT Input Equivalence (1/3)
s1
s2
2-LUTin1
in2
s1
s2
2-LUTin1
in2
LUT inputs are interchangeable
![Page 8: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/8.jpg)
8
LUT Input Equivalence (2/3)
s1
s2
2-LUTin1
in2
s1
s2
2-LUTin1
in2
s1
s2 t21 = in2
t11 = in1 t2
1 = in1
t11 = in2
s1
s2
Overly restrictive disjoint-path problem formulation
![Page 9: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/9.jpg)
9
LUT Input Equivalence (3/3)
s1
s2
2-LUTin1
in2
s1
s2
2-LUTin1
in2
Represent all inputs of a LUT as one RRG sink, t
ts1
s2
![Page 10: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/10.jpg)
PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs
Larry McMurchie and Carl EbelingInternational Symposium on FPGAs, 1995
![Page 11: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/11.jpg)
Architecture and CAD for Deep Submicron FPGAs
(Section 2.2.3, pp. 25-30)
Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999
![Page 12: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/12.jpg)
Basic Plan
• Route nets one-by-one– Each net is routed by a priority-driven breadth-first
search– Multiple nets may use the same nodes and edges– Node/edge costs depend on current usage and past
usage history• Repeat for a fixed number of iterations (say 200)– Success: A disjoint routing solution for for all nets– Failure: No disjoint solution found after a fixed number
of iterations
![Page 13: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/13.jpg)
Routability Cost Function (per Node)
• C(n) = p(n) × (b(n) + h(n))
– b(n): base cost of routing through node n• Typically the intrinsic delay of the circuit element corresponding to
the node
– h(n): history cost of routing through node n, based on previous router iterations
– p(n): depends on the number of signals presently routed through node n in the current iteration
![Page 14: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/14.jpg)
Timing-Routability Cost Function
![Page 15: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/15.jpg)
First-Order Congestion
s1
s3
s2
At1
t3
t2
B
C
1
1
1
2
3
4
3
1
1
1
2
3
4
3
![Page 16: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/16.jpg)
First-Order Congestion
s1
s3
s2
At1
t3
t2
B
C
1
1
1
2
3
4
3
1
1
1
2
3
4
3
All paths route through B
• Increase the penalty cost of B per iteration due to sharing
• In a future iteration, it will be cheaper to route signal 1 through A, rather than B
• In a future iteration, it will be cheaper to route signal 2 through C, rather than B
• Requires gradual increase in cost of sharing nodes
![Page 17: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/17.jpg)
Second-order Congestion
s1
s3
s2
At1
t3
t2
B
C
1
2
2
1
1
2
1
2
1
1
Routing Order: 1, 2, 3
![Page 18: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/18.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
1
2
2
1
1
2
1
2
1
1
s1
s3
s2
t1
t3
t2
![Page 19: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/19.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
1
2
2
1
1
2
1
2
1
1
s1
s3
s2
t1
t3
t2
![Page 20: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/20.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
1
2
2
1
1
2
1
2
1
1
Two paths route through C
• Increasing the history cost through C will push signal 2 onto an alternative path through B
s1
s3
s2
t1
t3
t2
![Page 21: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/21.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
1
2
2
3
3
2
1
2
3
3
Two paths route through C
• Increasing the history cost through C will push signal 2 onto an alternative path through B
s1
s3
s2
t1
t3
t2
![Page 22: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/22.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
1
2
2
3
3
2
1
2
3
3
Two paths route through B
• Increasing the history cost through B will push signal 1 onto an alternative path through A
• Increasing the history cost through B will push signal 2 back onto the path through C
s1
s3
s2
t1
t3
t2
![Page 23: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/23.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
3
4
2
3
3
4
3
2
3
3
Two paths route through B
• Increasing the history cost through B will push signal 1 onto an alternative path through A
• Increasing the history cost through B will push signal 2 back onto the path through C
s1
s3
s2
t1
t3
t2
![Page 24: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/24.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
3
4
2
3
3
4
3
2
3
3
Two paths route through C
• Increasing the history cost through C will push signal 2 back through B
s1
s3
s2
t1
t3
t2
![Page 25: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/25.jpg)
Second-order Congestion
Routing Order: 1, 2, 3A
B
C
3
4
2
5
5
4
3
2
5
5
Two paths route through C
• Increasing the history cost through C will push signal 2 back through B
s1
s3
s2
t1
t3
t2
![Page 26: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/26.jpg)
Pseudocode
Priority driven BFS• Computed from the source
to every sink of net(i)
Connect the newly found path to the current sink to the partly-computed routing tree RT(i) for net I
![Page 27: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/27.jpg)
Timing-Routability Cost Function
Represents the estimated criticality of the path from source I to sink j on net(i)• net(i) may fan out to multiple sinks with varying criticality, depending on placement
Delay of critical path
![Page 28: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/28.jpg)
Negotiated A* Routing for FPGAs
Russell TessierFifth Canadian Workshop on Field Programmable
Devices, 1998
![Page 29: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/29.jpg)
Basic Idea
![Page 30: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/30.jpg)
A* Cost Function
• Cost at a node ni
fi = gi + di
gi = fi-1 + ci
Cost of routing from the source to ni
Estimated cost of routing from ni to the sink
Total cost of previous path
Cost of the next candidate node
![Page 31: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/31.jpg)
A* vs. BFS
A*: fi = gi + di = (fi-1 + ci) + di
BFS: fi = gi
Hybrid: fi = (1 - α)gi + αdi
= (1 - α)(fi-1 + ci) + αdi
![Page 32: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/32.jpg)
Good Ideas, But…
• Algorithm and implementation assumed disjoint switch block– i.e., disjoint global routing domains
• Key issue:– Which routing domains (reachable from the
source) are explored in what order?
![Page 33: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/33.jpg)
A Fast Routability-driven Router for FPGAs
Jordan S. Swartz, Vaughn Betz, Jonathan RoseInternational Symposium on FPGAs, 1998
![Page 34: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/34.jpg)
Architecture and CAD for Deep Submicron FPGAs
(Section 4.3, pp. 76-80)
Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999
![Page 35: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/35.jpg)
Speed Enhancements
• Directed depth-first search– Similar to A*, but arguably more aggressive– Key Issues:
• Cost function• Sink (target) ordering for multi-terminal nets
• Route classification– Low-stress– Difficult– Impossible
![Page 36: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/36.jpg)
Cost Function
Cost(n) = C(n) + Costprev + α(ΔD)– Costprev: cost of track segments used to reach the
current segment– C(n): (base) cost of using the track segment
• Increases more rapidly than the original PathFinder algorithm– (ΔD): estimated cost of routing from the current track
segment to the destination • Based on Manhattan (XY) distance)
– α: Direction factor• BFS: α = 0• Directed search: α > 0
![Page 37: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/37.jpg)
Updated Node (Base) Cost Function
• C(n) = p(n) × (b(n) + h(n))– b(n): base cost – h(n): history cost– p(n): usage cost
• C(n) = p(n) × b(n) × h(n) + Bendcost(n, m)– Multiplying b(n) and h(n) eliminates normalization– Penalize global routes that take a lot of turns• These routes are less likely to use long wires
![Page 38: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/38.jpg)
Penalty Cost Function
• Not specified in the original PathFinder paper
p(n) = 1 + max(0, [occupancy(n) – capacity(n)] × pfac)• occupancy(n): number of nets using resource n• capacity(n): maximum number of nets that can use
resource n (typically 1)• pfac: 0.5 for the first iteration; increase by 1.5x to 2x each
subsequent iteration
![Page 39: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/39.jpg)
History Cost Function
• Not specified in the original PathFinder paper
• occupancy(n): number of nets using resource n• capacity(n): maximum number of nets that can use
resource n (typically 1)• hfac: constant; any value between 0.2 and 1 works well
![Page 40: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/40.jpg)
Target Selection
Closest Sink First• Uses fewer track segments
Furthest Sink First• Uses more track segments
![Page 41: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/41.jpg)
Net Order
• Route nets in order of decreasing fan-out
• High fanout nets tend to span the whole FPGA– Easier to route when there is less congestion do to
other nets routed earlier
• Low fanout nets tend to be more localized– Relatively easy to route, even in the presence of
some congestion
![Page 42: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/42.jpg)
Binning
• Only the portions of the net close to the target sink should be expanded– E.g., only expand within
Bin 4 in this example
• Shown to be effective for sinks with more than 50 targets
![Page 43: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/43.jpg)
Bounding Box
• Define a bounding box around source and sinks• Restrict the route for each net to no more than 3
channels outside of the bounding box
![Page 44: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/44.jpg)
Routing High-fanout Nets
![Page 45: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/45.jpg)
Difficulty Prediction
• Since you can’t know Wmin first without routing, use an estimate based on a wirelength prediction model
![Page 46: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/46.jpg)
Routability Results
![Page 47: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/47.jpg)
Routing Times as W increases
![Page 48: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/48.jpg)
Routing Time vs. W (clma Benchmark)
![Page 49: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/49.jpg)
BFS vs. Directed Search (w/Binning)
![Page 50: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/50.jpg)
Difficulty Prediction
• Westimate took less than 1 second to compute
• Some prediction mistakes will happen
• Prediction accuracy was 84%
![Page 51: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/51.jpg)
Fuzzy Prediction
![Page 52: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/52.jpg)
Architecture and CAD for Deep Submicron FPGAs(Section 2.2.4, pp. 31-34,
Section 4.4, pp. 80-89, Section 4.6.4, pp. 99-102 )
Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999
VPR Timing-Driven Router
![Page 53: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/53.jpg)
Example Timing Graph
![Page 54: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/54.jpg)
Example Timing Graph
12ns
![Page 55: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/55.jpg)
Timing and Slack
Process vertices in forward topological order
Process vertices in reverse topological order(Required time for any output node is Dmax)
How much can I slow down this circuit element (node) without increasing the critical path?
![Page 56: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/56.jpg)
Elmore Delay of a Source-Sink Path
• Td,i: intrinsic delay of element i (buffer); 0 otherwise
• Ri: equivalent resistance of element i (Rwire, Rbuf, Rpass)
• C(subtreei): total downstream capacitance not isolated by buffers
![Page 57: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/57.jpg)
Equivalent Circuits
![Page 58: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/58.jpg)
Example: M Wire Segments connected by Buffers
Linear and Elmore Delay Models agree!
![Page 59: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/59.jpg)
Example: Chain of Pass Transistors
Linear
Quadratic
![Page 60: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/60.jpg)
Fanout
![Page 61: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/61.jpg)
Introduce Elmore Delay into VPR-PathFinder’s Node Cost Function
Parameters that control how slack affects congestion-delay tradeoff
![Page 62: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/62.jpg)
… And to Support Directed Search• Expected cost to route from a wire segment n to the target
sink j
– The connection will be completed using wires of the same type as node n • Length, connecting switch type, etc.
– There is no congestion along the shortest path to the target • p(m) and h(m) are both 1 for all nodes m on the path that completes the
connection.
![Page 63: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/63.jpg)
Why Optimize Elmore Delay?
![Page 64: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/64.jpg)
Loops
![Page 65: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/65.jpg)
Timing-Driven Router
![Page 66: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/66.jpg)
Timing-Driven Routing of High-Fanout Nets
Xun Chen, Jianwen Zhu, Minxuan ZhangInternational Conference on Field Programmable
Logic and Applications, 2011
![Page 67: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/67.jpg)
Motivation (for Binning, Earlier)
![Page 68: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/68.jpg)
Angle-based Pruning
Route the trunk first, then each leaf
NetDecomposition
![Page 69: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/69.jpg)
Results
![Page 70: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/70.jpg)
Routing Algorithms for FPGAs with Sparse Intra-Cluster Routing Crossbars
Yehdhih Moctar, Guy Lemieux, Philip BriskInternational Conference on Field Programmable
Logic and Applications, 2012
![Page 71: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/71.jpg)
71
Full Crossbar Intra-cluster Routing2-LUT BLE
in1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
![Page 72: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/72.jpg)
72
RRG
Full Crossbar Intra-cluster Routing
RRG
![Page 73: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/73.jpg)
73
Full Crossbar RRG
Don’t model the RRG inside each CLB!
![Page 74: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/74.jpg)
74
Sparse Crossbar Intra-cluster Routing (1/6)
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
• RRG must include the intra-cluster routing
![Page 75: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/75.jpg)
75
Sparse Crossbar Intra-cluster Routing
RRG
![Page 76: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/76.jpg)
76
Sparse Crossbar RRG
RRG encompasses the entire FPGA
![Page 77: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/77.jpg)
77
Why is this Hard?
• Key Issues relating to RRG size– May run out of memory • If you are not routing in the cloud
– Long router runtimes• Large search space leads to slow convergence• Thrashing
• Objective– Route without expanding the whole RRG
![Page 78: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/78.jpg)
78
Routing Algorithms• Extensions to PathFinder routing algorithm
– Routes one net at a time via wavefront expansion
• Baseline– Extend the RRG with intra-cluster routing at each CLB
• Selective RRG Expansion (SERRGE)– Route nets to CLB inputs– Dynamically expand the RRG to finish the route
• Partial Pre-Routing (PPR)– Pre-route all nets within CLBs– Complete global routes to CLB inputs
![Page 79: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/79.jpg)
79
Baseline Router2-LUT BLE
in1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
• Inputs of the same LUT are equivalent!
Global RRG
![Page 80: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/80.jpg)
80
SERRGE in Action (1/8)
Start with a global RRG
s
t
![Page 81: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/81.jpg)
81
SERRGE in Action (2/8)
s
t
Route from s’s CLB to t’s CLB
![Page 82: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/82.jpg)
82
SERRGE in Action (3/8)
s
t
Locally expand the RRG with part of t’s CLB
![Page 83: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/83.jpg)
83
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Maintain one copy of the intra-cluster topology
SERRGE in Action (4/8)
t
![Page 84: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/84.jpg)
84
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Expand the fanout of the CLB input being routed
SERRGE in Action (5/8)
t
![Page 85: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/85.jpg)
85
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Complete the route if possible; if not, try again
SERRGE in Action (6/8)
t
![Page 86: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/86.jpg)
86
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Track used CLB and BLE inputs; remember the route
SERRGE in Action (7/8)
t
![Page 87: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/87.jpg)
87
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Deallocate unused RRG resources
SERRGE in Action (8/8)
t
![Page 88: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/88.jpg)
88
SERRGE Summary• Global routing uses standard PathFinder
• Selectively expand portions of the RRG to complete routes from CLB inputs to BLE inputs
• Storage Requirement– Global RRG (standard PathFinder requirement)– One copy of the intra-cluster routing topology– All intra-cluster routes computed thus far
• Fairly challenging to implement!
![Page 89: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/89.jpg)
89
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Process CLBs one at a time
PPR in Action (1/4)
t1
t2
![Page 90: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/90.jpg)
90
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
Compute local routes for each BLE
PPR in Action (2/4)
t1
t2
![Page 91: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/91.jpg)
91
2-LUT BLEin1
in2
2-LUT BLEin1
in2
CLB inputs
Local feedbacks
CLB inputs now form equivalence classes.
PPR in Action (3/4)
t1
t2
![Page 92: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/92.jpg)
92
PPR in Action (4/4)
Perform global routing w/equivalence class constraints on CLB inputs
s
t
![Page 93: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/93.jpg)
93
PPR Summary
• Pre-route each CLB
• Propagate LUT equivalences to CLB inputs– A baseline router on the full-blown RRG would still need to
satisfy BLE input-pin equivalence constraints
• Route as normal
• Much easier implementation than SERRGE
![Page 94: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/94.jpg)
Critical Path Delayac
_ctr
l
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
axgeommean
012345678
p = 40%
ns
ac_c
trl
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
ax
geomean
012345678
p = 75%
ns
PPR SERRGE Baseline
![Page 95: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/95.jpg)
RRG Size (MB)ac
_ctr
l
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
axgeommean
0
10
20
30
40
50
60
p = 40%
MB
ac_c
trl
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
ax
geomean
0
10
20
30
40
50
60
p = 75%
MB
PPR SERRGE Baseline
![Page 96: FPGA Routing](https://reader031.vdocuments.net/reader031/viewer/2022012919/568156c4550346895dc455e9/html5/thumbnails/96.jpg)
Router Runtime (s)ac
_ctr
l
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
axgeomean
0
200
400
600
800
1000
1200
p = 40%
Seco
nds
ac_c
trl
aes_
core
des_
area
mem
_ctr
l
pci_
brid
ge32 sp
i
syst
emca
es
syst
emcd
es
usb_
func
t
wb_
conm
ax
geomean
0
200
400
600
800
1000
1200
Seco
nds
PPR SERRGE Baseline
p = 75%