![Page 1: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/1.jpg)
Lecture 13: Mid-term 1 Review October 22, 2013
ECE 636
Reconfigurable Computing
Lecture 13
Mid-term I Review
![Page 2: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/2.jpg)
SRAM-based FPGA
• SRAM bits can be programmed many times
• Each programming bit takes up five transistors
• Larger device area reduces speed versus EPROM and antifuse.
[Figure: SRAM programming bit with Read or Write, Data, and Q outputs; 2-input LUT with inputs I1 and I2, programming bits P1 to P4, and output Out]
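A 2-input LUT can be modeled as a four-entry table selected by the two inputs. The following is a minimal Python sketch, not taken from the slides: the function name `lut2` and the assumed mapping of P1..P4 to input combinations 00, 01, 10, 11 are illustrative choices.

```python
def lut2(program_bits, i1, i2):
    """Return the output of a 2-input LUT.

    program_bits: four 0/1 values standing in for P1..P4.
    Assumption: they are ordered for inputs (I1, I2) = 00, 01, 10, 11.
    """
    if len(program_bits) != 4:
        raise ValueError("a 2-input LUT needs exactly 4 programming bits")
    # The inputs act as the address into the stored bits.
    return program_bits[(i1 << 1) | i2]


# The same hardware implements different functions by changing stored bits.
XOR_BITS = [0, 1, 1, 0]
AND_BITS = [0, 0, 0, 1]
```

Reprogramming the function only changes the stored bits, which is why SRAM-based devices can be programmed many times.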
![Page 3: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/3.jpg)
Field Programmable Gate Array
![Page 4: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/4.jpg)
Connection Box Flexibility
• Fc -> How many tracks does an input pin connect to?
• If the logic cluster is small, Fc is large (Fc = W)
• If logic cluster is large, Fc can be less.
- Approximately 0.2W for Xilinx XC4000EX, Virtex
[Figure: logic cluster and IO pin; an output (Out) connects to tracks T0, T1, and T2, giving Fc = 3]
![Page 5: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/5.jpg)
Switchbox Flexibility
• Switch box provides optimized interconnection area.
• Flexibility found to be not as important as Fc
• Six transistors needed for Fs = 3
![Page 6: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/6.jpg)
Switchbox Issues
![Page 7: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/7.jpg)
Bidirectional vs Directional
![Page 8: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/8.jpg)
Directional Wiring: Outputs can use switch block muxes
Dir Architecture
Single-driver wiring!
New connectivity constraint
![Page 9: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/9.jpg)
Fine-grained Approach
• A 4-input LUT stores 16 bits of information
• LUTs can be chained together through the programmable network.
• Decoder and multiplexer are an issue.
• Flexibility is a key aspect.
[Figure: two 16X1 LUTs (LUT1, LUT2), each with address (A) and data (D) ports, driven by Addr]
![Page 10: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/10.jpg)
Hill Climbing Algorithms
• To avoid getting trapped in local minima, consider a “hill-climbing” approach
• Need to accept worse solutions or make “bad” moves to reach the global minimum.
• Acceptance is probabilistic. Only accept cost-increasing moves some of the time.
[Figure: cost plotted over the solution space]
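The probabilistic acceptance rule can be sketched in Python. The slide does not say which rule is used; the sketch below assumes the common simulated-annealing (Metropolis) criterion, and the names `accept_move`, `delta_cost`, and `temperature` are hypothetical.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Decide whether to accept a move that changes cost by delta_cost.

    Assumption: Metropolis-style acceptance, exp(-delta / T).
    """
    if delta_cost <= 0:
        return True  # improving (or neutral) moves are always accepted
    if temperature <= 0:
        return False  # no hill climbing left: behave greedily
    # Cost-increasing ("bad") moves are accepted only some of the time.
    return rng.random() < math.exp(-delta_cost / temperature)
```

At a high temperature most bad moves are accepted, which lets the search climb out of local minima; as the temperature falls the search becomes greedy.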
![Page 11: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/11.jpg)
Maze Routing
• Evaluate shortest feasible paths based on a cost function
• Like row-based devices, global routing allocates channel bandwidth, not specific solutions.
• Formulate cost function as needed to address desired goal.
[Figure: routing grid with blocks labeled L, C, and S]
![Page 12: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/12.jpg)
Routing Tradeoffs
• Bias router to find first, best route.
• Vary number of node expansions using:
$pcost_i = (1 - a) \times pcost_{i-1} + ncost_i + a \times dist_i$
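The cost expression above can be evaluated directly. This is a small Python sketch; the function and argument names are hypothetical, and restricting `a` to the range 0 to 1 is an assumption rather than something the slide states.

```python
def path_cost(prev_cost, node_cost, dist_to_target, a):
    """pcost_i = (1 - a) * pcost_{i-1} + ncost_i + a * dist_i."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a is assumed to lie in [0, 1]")
    return (1.0 - a) * prev_cost + node_cost + a * dist_to_target


# a = 0: pure accumulated cost, so the search expands widely.
# a = 1: history is ignored and nodes near the target are favored.
breadth_like = path_cost(10, 1, 4, 0.0)
directed = path_cost(10, 1, 4, 1.0)
```

Raising `a` weights the distance-to-target term more heavily, which tends to reduce the number of node expansions at the risk of missing the cheapest route.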
![Page 13: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/13.jpg)
Architectural Limitation
• Routing architecture necessitates domain selection.
• Bigger effect for multi-fanout nets
![Page 14: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/14.jpg)
Pathfinder
• Use a non-decreasing history value to represent congestion.
• Similarities to multi-commodity flow
• Can be implemented efficiently but does require substantial run time
• Only update after an iteration.
$c_i = (1 + h_n \cdot h_{fac}) \cdot (1 + p_n \cdot p_{fac}) + b_{n,n-1}$
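The node cost and the history update can be sketched in Python. The reading of the symbols is an assumption: `h` is taken as the history value, `p` as present congestion, and `b` as the cost term between a node and its predecessor. The update rule in `update_history` (add the overuse after each iteration) is one common choice, not necessarily the one used in the lecture.

```python
def node_cost(h, p, b, hfac, pfac):
    """c = (1 + h * hfac) * (1 + p * pfac) + b."""
    return (1 + h * hfac) * (1 + p * pfac) + b


def update_history(history, occupancy, capacity):
    """Run once after a full routing iteration.

    Each node's history grows by its overuse, so it never decreases.
    Assumption: nodes missing from `capacity` have capacity 1.
    """
    nodes = set(history) | set(occupancy)
    return {
        n: history.get(n, 0) + max(0, occupancy.get(n, 0) - capacity.get(n, 1))
        for n in nodes
    }
```

Because the history term only grows, a node that is repeatedly congested becomes steadily more expensive, pushing later iterations toward alternative routes.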
![Page 15: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/15.jpg)
Bipartitioning
• Perhaps biggest problem in multi-FPGA design is partitioning
• Partitioner must deal with logic and pin constraints.
• Could simultaneously attempt partitioning across all devices, but even “simple” algorithms are $O(n^3)$
• Better to recursively bipartition circuit.
![Page 16: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/16.jpg)
KLFM Partitioning
• Identify nodes to swap to reduce overall cut size
• Lock moved nodes
• Algorithm continues until no un-locked node can be moved without violating size constraints
[Figure: nodes partitioned between Bin 1 and Bin 2]
![Page 17: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/17.jpg)
KLFM Partitioning
• Key issue is implementing node costs in lists that can be easily accessed and updated.
• Many extensions to consider to speed up overall optimization
• Reasonably easy to implement in software
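A single KLFM pass can be sketched in Python. This is a simplified illustration under stated assumptions: the circuit is a graph with two-pin edges, bins are numbered 0 and 1, and gains are recomputed from scratch instead of being kept in the bucket lists that an efficient implementation would use. All names are hypothetical.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints sit in different bins."""
    return sum(1 for u, v in edges if side[u] != side[v])


def gain(node, edges, side):
    """Cut reduction if `node` moves to the other bin."""
    g = 0
    for u, v in edges:
        if node in (u, v) and u != v:
            other = v if u == node else u
            g += 1 if side[other] != side[node] else -1
    return g


def klfm_pass(edges, side, max_bin_size):
    """Move the best unlocked node, lock it, and keep the best prefix."""
    side = dict(side)
    best_side, best_cut = dict(side), cut_size(edges, side)
    locked = set()
    while True:
        candidates = []
        for n in side:
            if n in locked:
                continue
            target = 1 - side[n]
            target_size = sum(1 for s in side.values() if s == target)
            if target_size + 1 <= max_bin_size:  # size constraint
                candidates.append((gain(n, edges, side), n))
        if not candidates:
            break  # no unlocked node can move without violating sizes
        _, n = max(candidates, key=lambda c: c[0])
        side[n] = 1 - side[n]
        locked.add(n)
        cut = cut_size(edges, side)
        if cut < best_cut:
            best_cut, best_side = cut, dict(side)
    return best_side, best_cut


edges = [("a", "b"), ("c", "d")]
start = {"a": 0, "b": 1, "c": 1, "d": 0}
best_side, best_cut = klfm_pass(edges, start, max_bin_size=3)
```

Negative-gain moves are still taken during the pass; only the best configuration seen is returned, which is how the pass can escape a poor starting partition.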
![Page 18: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/18.jpg)
Partition Preprocessing: Clustering
• Identify bin size
• Choose a seed block (node)
• Identify node with highest connectivity to join cluster
• Terminate when cluster size met.
• In practical terms cluster size of 4 works best
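The clustering steps above can be sketched as a greedy loop in Python. The adjacency format (node to neighbor to connection count), the tie-breaking by node name, and the function name `grow_cluster` are assumptions made for illustration.

```python
def grow_cluster(seed, adjacency, cluster_size):
    """Greedily grow one cluster from `seed`.

    adjacency: dict mapping node -> {neighbor: connection count}.
    """
    cluster = [seed]
    while len(cluster) < cluster_size:
        # Score every outside node by its total connectivity to the cluster.
        scores = {}
        for member in cluster:
            for nbr, weight in adjacency.get(member, {}).items():
                if nbr not in cluster:
                    scores[nbr] = scores.get(nbr, 0) + weight
        if not scores:
            break  # nothing left that connects to the cluster
        # Highest connectivity joins; ties go to the smallest node name.
        best = max(sorted(scores), key=lambda n: scores[n])
        cluster.append(best)
    return cluster


adjacency = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 2, "d": 3},
    "c": {"a": 1},
    "d": {"b": 3},
}
```

Calling `grow_cluster("a", adjacency, 3)` first absorbs `b` and then `d`, since `d` has the strongest connection to the growing cluster.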
![Page 19: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/19.jpg)
Clustering
• Technology mapping before partitioning is typically ineffective since frequently area is secondary to interconnect
• Frequently bipartitioning continues after unclustering as well.
[Figure: flow from Cluster to KLFM to uncluster to KLFM]
• This allows for additional fine-grain moves.
![Page 20: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/20.jpg)
Logic Replication
• Attempt to reduce cutset by replicating logic.
• Every input of the original cell must also feed the replicated cell.
• Replication can either be integrated into the partitioning process or used as a post-process technique.
![Page 21: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/21.jpg)
Logic Emulation
• Emulation takes a sizable amount of resources
• Compilation time can be large due to FPGA compiles
• Emulation is one application, with direct ties to other FPGA computing applications.
![Page 22: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/22.jpg)
Are Meshes Realistic?
• The number of wires leaving a partition grows with Rent’s Rule
$P = K G^B$
• Perimeter grows as $G^{0.5}$ but unfortunately most circuits grow at $G^B$ where $B > 0.5$
• Effectively, devices are highly pin limited
• What does this mean for meshes?
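A short numeric sketch in Python makes the mismatch concrete. The values used for K and B here are purely illustrative, not measured Rent parameters, and the function names are hypothetical.

```python
def rent_pins(gates, k, b):
    """Rent's Rule: P = K * G**B."""
    return k * gates ** b


def demand_to_perimeter_ratio(gates, b):
    """How fast pin demand (G**B) outgrows perimeter (G**0.5)."""
    return gates ** (b - 0.5)


# Illustrative values only: K = 2.5, B = 0.75.
small = demand_to_perimeter_ratio(10_000, 0.75)
large = demand_to_perimeter_ratio(1_000_000, 0.75)
```

With B above 0.5 the ratio keeps growing with partition size, so a mesh, whose connections scale with perimeter, falls further behind as devices get larger.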
![Page 23: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/23.jpg)
Multi-FPGA Software
• Missing high-level synthesis
• Global placement and routing similar to intra-device CAD
![Page 24: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/24.jpg)
Virtual Wires
• Overcome pin limitations by multiplexing pins and signals
• Schedule when communication will take place.
![Page 25: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/25.jpg)
Virtual Wires Software Flow
• Global router enhanced to include scheduling and embedding.
• Multiplexing logic synthesized from FPGA logic.
![Page 26: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/26.jpg)
Why Compiling C is Hard
° General Language
° Not Designed For Describing Hardware
° Features that Make Analysis Hard
  • Pointers
  • Subroutines
  • Linear code
° C has no direct concept of time
° C (and most procedural languages) are inherently sequential
  • Most people think sequentially.
  • Opportunities primarily lie in parallel data
![Page 27: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/27.jpg)
Variables
° Handel-C has one basic type - integer
° May be signed or unsigned
° Can be any width, not limited to 8, 16, 32 etc.
Variables are mapped to hardware registers.
    void main(void)
    {
        unsigned 6 a;
        a = 45;
    }
a = 1 0 1 1 0 1 = 0x2d, with the MSB and LSB labeled
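The bit pattern on this slide can be checked with a few lines of Python. This is only an arithmetic illustration of the 6-bit register contents, not Handel-C; the helper name `to_bits` is hypothetical.

```python
def to_bits(value, width):
    """Return the unsigned `width`-bit pattern of value, MSB first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    return format(value, f"0{width}b")


pattern = to_bits(45, 6)  # the six register bits holding a = 45
as_hex = hex(45)
```

A 6-bit unsigned register holds values from 0 to 63, so 45 fits while 64 would not.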
![Page 28: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/28.jpg)
DeepC Compiler
• Consider loop-based computation to be memory limited
• Computation partitioned across small memories to form tiles
• Inter-tile communication is scheduled
• RTL synthesis performed on resulting computation and communication hardware
![Page 29: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/29.jpg)
DeepC Compiler
• Parallelizes compilation across multiple tiles
• Orchestrates communication between tiles
• Some dynamic (data dependent) routing possible.
![Page 30: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/30.jpg)
Control FSM
• Result for each tile is a datapath, state machine, and memory block
![Page 31: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/31.jpg)
Striped Architecture
• Same basic approach, pipelined communication, incremental modification
• Functions as a linear pipeline
• Each stripe is homogeneous to simplify computation
• Condition codes allow for some control flexibility
[Figure: FPGA fabric, control unit, configuration cache, configuration control and next address, address, condition codes, and microprocessor interface]
![Page 32: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/32.jpg)
PipeRench Internals
• Only multi-bit functional units used
• Very limited resources for interconnect to neighboring programming elements
• Place and route greatly simplified
![Page 33: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/33.jpg)
Convolutional Encoder
• Accepts information bits as a continuous stream
• Operates on the current b-bit input, where b ranges from 1 to 6, and some number of immediately preceding b-bit inputs to produce V output bits, V > b
[Figure: encoder built from two flip-flops (FF) and two adders; b = 1, V = 2]
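A rate 1/2 encoder with two flip-flops can be sketched in Python. The generator taps are an assumption: the slides do not list them, and the values 111 and 101 (binary) are chosen here because they reproduce the encoder output 00 11 10 for input 0 1 0 quoted later in the lecture. The function name is hypothetical.

```python
def conv_encode(bits, generators=(0b111, 0b101), k=3):
    """Encode a bit stream with b = 1 and V = len(generators).

    Assumption: generator taps 111 and 101; k is the constraint length.
    """
    state = 0  # the k-1 most recent input bits, newest in the high bit
    out = []
    for bit in bits:
        window = (bit << (k - 1)) | state  # current bit followed by memory
        # Each output bit is the parity (XOR) of the tapped positions.
        out.append(tuple(bin(window & g).count("1") % 2 for g in generators))
        state = window >> 1  # shift the current bit into the memory
    return out


encoded = conv_encode([0, 1, 0])
```

Each input bit yields two output bits, so the stream doubles in length, which is the redundancy a decoder later exploits.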
![Page 34: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/34.jpg)
Definitions
• Constraint length: number of successive b-bit groups of information bits for each encoding operation; denoted by K
• Code rate (or rate): b/V
• Typical values: K = 7; rate = 1/2, 1/3
![Page 35: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/35.jpg)
The Viterbi Algorithm
• Finds a bit-sequence in the set of all possible transmitted bit-sequences that most closely resembles the received data.
• Maximum likelihood algorithm
• Each bit received by decoder associated with a measure of correctness.
• Practical for short constraint length convolutional codes
![Page 36: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/36.jpg)
State diagram
[Figure: state diagram with states 00, 10, 11, 01 and branches labeled 0/00, 1/11, 1/01, 1/10, 0/01, 0/11, 1/00, 0/10]
• State: encoder memory
• Branch: k/ij, where i and j represent the output bits associated with input bit k
![Page 37: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/37.jpg)
Trellis Diagram
[Figure: trellis over states 00, 01, 10, 11 from T=0 to T=3, with branch outputs and accumulated metrics]
ENC IN : 0 1 0
ENC OUT : 00 11 10
RECEIVED: 00 11 11
Accumulated metric
2+2, 3+0 : 3
0+1, 3+1 : 1
2+0, 3+1 : 2
0+1, 3+1 : 1
K = 3, Rate 1/2
Total number of states = $2^{K-1}$
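The accumulated metrics in this example can be recomputed with a small hard-decision Viterbi sketch in Python. It assumes generator taps 111 and 101 (binary), Hamming distance as the branch metric, and an encoder that starts in state 00; none of these are stated explicitly on the slide, and all names are hypothetical.

```python
def branch(state, bit, generators=(0b111, 0b101), k=3):
    """Return (next_state, output bits) for one encoder transition."""
    window = (bit << (k - 1)) | state
    out = tuple(bin(window & g).count("1") % 2 for g in generators)
    return window >> 1, out


def viterbi_metrics(received, k=3):
    """Return (metrics, paths) per state after all received symbols."""
    n_states = 1 << (k - 1)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)  # assumed start in state 00
    paths = [[] for _ in range(n_states)]
    for symbol in received:
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue  # state not reachable yet
            for bit in (0, 1):
                nxt, out = branch(state, bit, k=k)
                dist = sum(a != b for a, b in zip(out, symbol))
                cand = metrics[state] + dist
                if cand < new_metrics[nxt]:  # keep only the survivor
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return metrics, paths


# RECEIVED: 00 11 11, indexed by state 00, 01, 10, 11.
metrics, paths = viterbi_metrics([(0, 0), (1, 1), (1, 1)])
```

Under these assumptions the survivor metrics at T=3 come out as 3, 1, 2, 1 for states 00, 01, 10, 11, and the survivor ending in state 01 carries the input sequence 0 1 0.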
![Page 38: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/38.jpg)
Adaptive Viterbi Algorithm
• Motivation
  • Extremely large memory and logic for Viterbi Algorithm
  • Fewer paths retained
  • Reduced memory and computation
• Definitions
  • Path: bit sequence
  • Path metric or cost: accumulated error metric of a path
  • Survivor: path which is retained for the subsequent time step
![Page 39: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/39.jpg)
Adaptive Viterbi Algorithm
Criterion for path survival
1. A threshold T is introduced such that a path is retained if and only if its current path metric is less than dm + T, where dm is the minimum cost among all survivors of the previous time step.
2. The total number of survivors per time step is limited to a critical number, Nmax, selected by the user.
• Only the best Nmax paths have to be retained at any time.
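The two survival criteria can be combined in a short Python sketch. It assumes that when too many paths pass the threshold test, T is lowered in steps of 2 and the test is repeated, which avoids sorting the paths; the stopping guard and all names are hypothetical.

```python
def select_survivors(path_metrics, dm, threshold, n_max, step=2):
    """Keep paths with metric < dm + T, shrinking T until few enough remain.

    path_metrics: dict mapping a path label to its current metric.
    Assumption: T drops by `step` per retry; the loop stops early if T
    cannot shrink further, even if more than n_max paths remain.
    """
    t = threshold
    while True:
        survivors = {p: d for p, d in path_metrics.items() if d < dm + t}
        if len(survivors) <= n_max or t - step <= 0:
            return survivors, t
        t -= step  # tighten the threshold instead of sorting paths


candidates = {"p0": 3, "p1": 1, "p2": 2, "p3": 1}
survivors, final_t = select_survivors(candidates, dm=0, threshold=4, n_max=2)
```

Here all four paths pass with T = 4, so T is lowered to 2 and only the two paths with metric 1 survive.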
![Page 40: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/40.jpg)
Trellis Diagram for AVA
![Page 41: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/41.jpg)
Architecture (contd.)
[Figure: flowchart. Two adders combine b1 and b2 into sum1 and sum2; each result is tested with di < dm + T; passing paths are counted; if Count < Nmax the memory is updated, otherwise T = T - 2]
• Elimination of sorting
![Page 42: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/42.jpg)
Virtual Router
• Independent routing policies for each virtual router
• Key challenges
  • Isolation
  • Performance
  • Flexibility
  • Scalability
[Figure: physical router hosting virtual routers A and B, each with its own routing control and forwarding table, between a DEMUX and a MUX]
![Page 43: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/43.jpg)
Virtualization using FPGAs
A novel network virtualization substrate which
• Uses FPGA to implement high performance virtual routers
• Introduces scalability through virtual routers in host software
• Exploits reconfiguration to customize hardware virtual routers
[Figure: FPGA containing Virtual Router 1 through Virtual Router 4]
![Page 44: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/44.jpg)
Partial Reconfiguration
• Use partial reconfiguration to independently configure virtual routers
![Page 45: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/45.jpg)
Full FPGA Reconfiguration
• Two virtual routers (A, B) initially in FPGA
• During reconfiguration, router A is migrated to software and the other is eliminated
• After reconfiguration, two virtual routers (A, B’) are again in FPGA
• Reduced throughput
![Page 46: Lecture 13: Mid-term 1 Review October 22, 2013 ECE 636 Reconfigurable Computing Lecture 13 Mid-term I Review](https://reader035.vdocuments.net/reader035/viewer/2022062715/56649d6d5503460f94a4d938/html5/thumbnails/46.jpg)
Partial FPGA Reconfiguration
• A remains in hardware and operates at full speed
• 20X speedup in reconfiguration down time due to partial reconfiguration
• Sustained throughput