![Page 1: Advanced algorithms and research applicationslabsticc.univ-brest.fr/~lemarch/ENG/Cours/algoP3avanENG.pdf · Covering (unit delay) Flowmap 3 High level synthesis placement routing](https://reader033.vdocuments.net/reader033/viewer/2022060810/608e90102b0bcb18c7365387/html5/thumbnails/1.jpg)
Advanced algorithms and research applications
Laurent Lemarchand, LISyC/UBO
[email protected]@univ-brest.fr
Logic synthesis for LUT-based FPGA: presentation
LUT-based FPGA
Synthesis flow
Boolean networks for circuit synthesis
Large-scale problems: parallelism and partitioning (cf. IEEE TCAD, 01/2012)
Algorithms
Logic synthesis for LUT-based FPGA: synthesis flow
LUT-based circuits: cells, routing
Logic synthesis for LUT-based FPGA: synthesis flow
Flow: high-level synthesis → logic synthesis (logic optimization, technology mapping) → placement → routing
Runtimes for a circuit of 1,000 4-LUTs:

| Step | Tool / method | Runtime (min) |
|---|---|---|
| Simplification | Mis II | 2 |
| K-bounded decomposition | Roth-Karp | 6 |
| Covering (surface) | Mis-Pga | 36 |
| Covering (unit delay) | Flowmap | 3 |
Logic synthesis for LUT-based FPGA: synthesis problem size

| Device | Virtex-II 1000 | Virtex-II 3000 | Spartan-3 1000 | Spartan-3 2000 | Virtex-5 LX30 | Virtex-5 LX50 | Virtex-5 LX85 | Virtex-5 LX110 |
|---|---|---|---|---|---|---|---|---|
| Gates | 1 million | 3 million | 1 million | 2 million | - | - | - | - |
| Flip-flops | 10240 | 28672 | 15360 | 40960 | 19200 | 28800 | 51840 | 69120 |
| LUTs | 10240 | 28672 | 15360 | 40960 | 19200 | 28800 | 51840 | 69120 |
| Multipliers | 40 | 96 | 24 | 40 | 32 | 48 | 48 | 64 |
| Block RAM (kbit) | 720 | 1728 | 432 | 720 | 1152 | 1728 | 3456 | |
Logic synthesis for LUT-based FPGA: synthesis problem size
Algorithms have at least O(n²) complexity
Combinatorial explosion
Resource problems: computations, memory
[Plot axes: size (SOP size), time (min)]
Divide and conquer: partitioning the design
Logic synthesis for LUT-based FPGA: boolean network
Directed acyclic graph
Primary inputs
Primary outputs
Logic synthesis for LUT-based FPGA: boolean network and technology mapping
Input: a directed acyclic graph (DAG) G = (V, E)
Output: a K-feasible DAG G' = (V', E'): ∀ v ∈ V', |inputs(v)| ≤ K
1 node = 1 K-LUT
Technology mapping has many objectives:
Surface: number of LUTs
Delay: critical paths
Routability: connection degrees and density
Flow: optimized boolean network → decomposition → feasible network → technology mapping → optimized feasible network
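The K-feasibility condition above can be checked directly on a fanin map. The following is a minimal Python sketch, assuming the network is stored as a dictionary from node name to the list of its input nodes. The function name `is_k_feasible` and the example network are illustrative and not taken from the course material.

```python
def is_k_feasible(fanins, k):
    """Return True if every node has at most k distinct inputs.

    fanins maps each node to the list of its predecessor nodes;
    primary inputs map to an empty list.
    """
    return all(len(set(preds)) <= k for preds in fanins.values())


# Hypothetical network: g has three inputs, h has two.
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g": ["a", "b", "c"],
    "h": ["g", "d"],
}

print(is_k_feasible(network, 3))  # every node fits in a 3-LUT
print(is_k_feasible(network, 2))  # g needs three inputs
```

A network that fails the check has to go through decomposition before each node can be mapped to one K-LUT.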
Chortle-crf FPGA (Field Programmable Gate Array)
Minimize the number of LUTs used
Place and route the LUTs
Chortle-crf Dynamic Programming
Technology mapping for LUT-based FPGA
Cluster logic nodes into k-LUTs : one LUT can implement any logic function of up to k inputs (fanin) (truth table)
[Figure: a network of logic nodes a to h; a 3-LUT with inputs a, b, c implements g = func(a, b, c)]
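A k-LUT is a truth table with 2^k entries, which is why it can implement any function of up to k inputs. The sketch below, in Python, fills such a table for an arbitrary function and then evaluates it by lookup. The names `lut_table` and `lut_eval` and the example function g are assumptions made for illustration.

```python
from itertools import product


def lut_table(func, k):
    """Enumerate func over all 2**k input combinations (MSB first)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_eval(table, inputs):
    """Look up the output bit for a tuple of input bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# Hypothetical function mapped onto a 3-LUT: g = (a AND b) OR c.
def g(a, b, c):
    return (a & b) | c


table = lut_table(g, 3)
print(table)
print(lut_eval(table, (1, 1, 0)))
```

Any other 3-input function yields a different table of the same size, so the LUT count of a mapping does not depend on how complex each covered function is.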
Chortle-crf Dynamic Programming
[Figure: nodes d, e, b1, b2]
Process from inputs to outputs
The solution for d+b1+b2 must:
minimize the number of LUTs
minimize the fanin of the head LUT
Local search: 2-way partitioning problem
Let G = (V, E) with |V| = 2n; find a partition V = V1 ∪ V2 s.t. |V1| = |V2| = n while minimizing the number of edges crossing the parts
Pairwise exchange neighborhood
[Figure: exchanging nodes a and b lowers the cut from c = 7 to c = 5 = 7 - 3 + 1]
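The pairwise exchange move can be illustrated with a few lines of Python. This is a sketch under assumptions: the partition is a dictionary from node to part (0 or 1), and the small graph is invented for the example rather than being the one drawn on the slide.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def exchange(part, a, b):
    """Return a new partition where nodes a and b have swapped parts."""
    swapped = dict(part)
    swapped[a], swapped[b] = part[b], part[a]
    return swapped


# Hypothetical 6-node graph, 3 nodes per part.
edges = [("a", "x"), ("a", "y"), ("b", "p"), ("b", "q"), ("p", "q"), ("x", "y")]
part = {"a": 0, "p": 0, "q": 0, "b": 1, "x": 1, "y": 1}

before = cut_size(edges, part)
after = cut_size(edges, exchange(part, "a", "b"))
print(before, after, before - after)  # cut before, cut after, gain
```

Because both nodes change sides together, the parts keep their sizes and the balance constraint |V1| = |V2| = n is preserved by every move.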
2-way partitioning: Kernighan-Lin heuristic
At each step, choose the exchange that maximizes the cut gain
Constraint: a node can be swapped only once
At most N/2 steps
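One pass of the heuristic can be sketched as follows in Python. This is a simplified version, not the original algorithm's implementation: it recomputes the cut for every candidate exchange instead of maintaining incremental gains, and the names `kl_pass` and `cut_size` as well as the test graph are assumptions.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def kl_pass(edges, part):
    """One simplified Kernighan-Lin pass on a 0/1 partition.

    Returns (best_partition, best_cut) seen during the pass.
    """
    current = dict(part)
    best, best_cut = dict(part), cut_size(edges, part)
    unlocked = set(part)
    while True:
        side0 = sorted(n for n in unlocked if current[n] == 0)
        side1 = sorted(n for n in unlocked if current[n] == 1)
        if not side0 or not side1:
            break

        def cut_after(pair):
            trial = dict(current)
            trial[pair[0]], trial[pair[1]] = 1, 0
            return cut_size(edges, trial)

        # Best exchange = largest gain, even when that gain is negative.
        a, b = min(((a, b) for a in side0 for b in side1), key=cut_after)
        current[a], current[b] = 1, 0
        unlocked -= {a, b}  # each node is swapped at most once
        cut = cut_size(edges, current)
        if cut < best_cut:
            best, best_cut = dict(current), cut
    return best, best_cut


# Hypothetical 6-node graph, 3 nodes per part.
edges = [("a", "x"), ("a", "y"), ("b", "p"), ("b", "q"), ("p", "q"), ("x", "y")]
start = {"a": 0, "p": 0, "q": 0, "b": 1, "x": 1, "y": 1}
result, cut = kl_pass(edges, start)
print(cut_size(edges, start), cut)
```

Accepting exchanges with negative gain and keeping the best intermediate partition is what lets the pass escape some local minima of plain pairwise exchange.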
K-way partitioning: techniques
Extending 2-way partitioning: recursively, or Kernighan-Lin based
K-way partitioning: minimize the global cut, balance the part sizes
Multi-level partitioning (METIS)
Multi-level partitioning: hMETIS
Clustering: grouping nodes
Initial partition onto the clustered graph
Unclustering, then refinement
[Figure: clustering, partitioning, after unclustering, after refinement]
Multi-level partitioning: logic synthesis
Circuit partitioning
Nodes: logic gates
Edges: connections
Create and optimize subsystems
Partition-based logic synthesis: data partitioning
Motivations: divide and conquer
Simple, multiple algorithms
Problem size, runtimes
Parallelism
Quality? Synthesis, even with multiple algorithms, is rarely optimal
Better heuristics
Limit information loss
Partition-based logic synthesis: network partitioning
Avoid maximum loss. 2-way partitioned network
Nodes are assigned to parts 1 and 2 while minimizing the cut and balancing the parts
Partition-based logic synthesis: network partitioning
Avoid information loss. Primary I/Os are generated
Partition-based logic synthesis: quality loss due to information loss
Information loss: A and B are in 2 distinct parts
Nodes are disconnected by the partitioning
Partition-based logic synthesis: load balancing
Each part must lead to the same computing load
Evaluate synthesis algorithm runtimes a priori
[Figure: two schedules over time with the same accumulated time]
Unbalanced parts: accumulated time 125, speedup 125/100 = 1.25
Balanced parts: accumulated time 125, speedup 125/35 = 3.57
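The two speedup figures follow from dividing the accumulated time by the runtime of the slowest part, since the parts run in parallel. The short Python sketch below reproduces the arithmetic. The individual part runtimes are invented so that they sum to 125 with a longest part of 100 and 35 respectively; only those totals come from the slide.

```python
def speedup(part_times):
    """Accumulated (sequential) time divided by the longest part."""
    return sum(part_times) / max(part_times)


# Hypothetical per-part runtimes, chosen to match the slide's totals.
unbalanced = [100, 10, 10, 5]
balanced = [35, 30, 30, 30]

print(round(speedup(unbalanced), 2))
print(round(speedup(balanced), 2))
```

With four parts the ideal speedup would be 4, which is only reached when all parts take exactly the same time.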
Partition-based logic synthesis: load balancing
The load depends on the synthesis algorithm and on the network structure

| Algorithm | (a) | (b) |
|---|---|---|
| Boolean simplification | O(n) | O(1) |
| Technology mapping | O(1) | O(n) |
Partition-based logic synthesis: results
Results depend on the nature of the algorithm
Local, e.g. k-feasible network decomposition
Global, e.g. delay optimization (critical path optimization)
Evaluation criteria: synthesis runtimes and speedups, quality
LUT-based FPGA technology mapping tools: Mis-PGA, Flowmap-d
Partition-based logic synthesis: Mis-PGA
Area optimization: number of LUTs (1 CLB = 2 LUTs in the xc4100 series)
Local and global decompositions (And/Or, Roth/Karp, kernels, …)
Global boolean simplifications (reinjections, substitution, ...)
Exact or heuristic covering (BCP)
Long runtimes
Automatic substitution of exact algorithms by heuristics for large problems
Impact on synthesis runtimes
Partition-based logic synthesis: Mis-PGA quality
Area optimization: number of CLBs
12 circuits from the LGSYNTH'91 benchmark
Number of LUTs (1 CLB = 2 LUTs in the xc4100 series FPGA)
Quality loss relative to global synthesis without partitioning
[Plot: loss (%) in number of CLBs versus number of partitions]
Partition-based logic synthesis: Mis-PGA runtimes
Cumulated runtime on a single processor
Speedup
[Plot: runtimes (sec) and speedup versus number of partitions]
Partition-based logic synthesis: Flowmap-d quality
Example: Flowmap-d, delay optimization
Critical paths (U) or nominal delay (N, congestion)
[Plot: loss (%) versus number of parts]
Partition-based logic synthesis: Flowmap-d runtimes
Flowmap-d: O(n²)
[Plot: time and speedup, with mean, versus number of parts (processors)]
Partition-based logic synthesis: Flowmap-d results
Quality: loss (25%) with the unit delay model (critical path)
Gain (10%) with the nominal delay model (better optimization with heuristics, since the partitioning process exposes congested areas)
Speedup: superlinear for large-scale designs
Long runtimes are required to absorb the parallelism overhead
QoS in Home Network for VBR
User demand: high Quality of Service for video broadcast
Affordable, well-known, closed network environment
Stream priority according to other network usages
Bandwidth reservation: guaranteed QoS (no delay)
[Figure: home network with gateway, STB_1, connectedTV_1, tablet_1, console_1, PC_1, PC_2, PC_3, links L_1, L_2, L_3, and Wi-Fi]
Bandwidth reservation
VBR video encoding case:
Allocate the average rate? Quality loss
Allocate the peak rate? Bandwidth loss
Allocate the exact rate? Impracticable, hard constraints
Tradeoff between peak and exact reservation
[Plots: throughput versus time]
Variable Bitrate Hull
Series of bitrates b_i (amount of data per time slot) for each time slot i (e.g. 1-sec time slots)
r_i: reservation of network bandwidth for each time slot i
For all i, r_i > b_i: reservation hull that ensures QoS
[Figure: throughputs versus time, showing bitrates and reservations r1, r2, r3, r4]
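The hull condition and the bandwidth it wastes are easy to state in code. The Python sketch below assumes bitrates and reservations are given as two lists indexed by time slot; the function names and the sample values are illustrative only.

```python
def is_hull(bitrates, reservations):
    """True if every reservation r_i strictly exceeds the bitrate b_i."""
    return len(bitrates) == len(reservations) and all(
        r > b for b, r in zip(bitrates, reservations)
    )


def lost_bandwidth(bitrates, reservations):
    """Total reserved bandwidth that the stream does not use."""
    return sum(r - b for b, r in zip(bitrates, reservations))


# Hypothetical 4-slot stream with two reservation levels.
bitrates = [3, 5, 2, 2]
reservations = [6, 6, 3, 3]

print(is_hull(bitrates, reservations))
print(lost_bandwidth(bitrates, reservations))
```

Peak-rate allocation is the special case where all reservations are equal, which maximizes the lost bandwidth for a bursty stream.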
Constraints on reservation policy
Two aspects are taken into account. Reservation consists of configuring network resources
M: bound on the number of successive different reservations r_i
P: minimal time between 2 reconfigurations (i.e., minimal reservation duration)
[Figure: throughputs versus time, showing throughputs, reservations r1 to r4, and lost bandwidth]
Graph and optimization goal
[Figure: throughputs versus time with reservations r1 to r4 and lost bandwidth; a weighted path from start to end with total cost 301]
How to minimize the total cost?
Graph paths
[Figure: throughputs versus time with reservations r1 to r4 and lost bandwidth; a path from start to end with total cost 301, or an alternative path using r'1 with a cost gain, total cost 255]
Graph building
One node per time slot
For j > i, an edge t_i → t_j: configuration at time t_i and reconfiguration at t_j
Weights correspond to overcosts
An extra node at the end
[Figure: nodes t_0 (start), t_1, t_2, …, t_{n+1} (end) with edge weights cost_{0→1}, cost_{1→2}, cost_{0→2}, cost_{0→i}, cost_{1→i}]
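The graph described above can be built with two nested loops. The Python sketch below makes two assumptions that the slides do not spell out: the reservation held over slots i to j-1 equals the peak bitrate of that segment (no safety margin), and the overcost is the reserved bandwidth minus the bandwidth actually used. Nodes are numbered 0 to n here, with n playing the role of the extra end node.

```python
def build_graph(bitrates):
    """Map each edge (i, j), i < j, to the overcost of one reservation
    held from slot i up to (but not including) slot j."""
    n = len(bitrates)
    edges = {}
    for i in range(n):
        peak = 0
        used = 0
        for j in range(i + 1, n + 1):
            peak = max(peak, bitrates[j - 1])  # assumed reservation level
            used += bitrates[j - 1]
            edges[(i, j)] = peak * (j - i) - used
    return edges


# Hypothetical 4-slot stream.
edges = build_graph([3, 5, 2, 2])
print(edges[(0, 4)])  # one reservation for the whole stream
print(edges[(0, 2)], edges[(2, 4)])  # two reservations
```

A path from node 0 to node n then corresponds to one reservation schedule, and its length is the total lost bandwidth.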
Best solution computation
Best solution: minimal total overcost, i.e. a shortest path
Constraints on bandwidth allocation:
P: minimal time between 2 reconfigurations (i.e., minimal configuration duration)
M: bound on the number of successive different configurations c_i
Bellman on a DAG:
remove edges i → j s.t. j - i < P
run the first M steps of the Ford-Bellman algorithm
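Both constraints map onto small changes to a shortest-path computation: P filters the edge set, and M caps the number of relaxation rounds so that the path uses at most M edges. The Python sketch below illustrates this under an assumed cost model (reservation equal to the segment's peak bitrate); the function name `best_overcost` and the sample stream are hypothetical.

```python
import math


def best_overcost(edges, n, M, P):
    """Cheapest path from node 0 to node n using at most M edges,
    each spanning at least P time slots."""
    usable = {(i, j): w for (i, j), w in edges.items() if j - i >= P}
    dist = [math.inf] * (n + 1)
    dist[0] = 0
    for _ in range(M):
        # Relax from the previous round only, so that round k
        # yields the best paths with at most k edges.
        new = list(dist)
        for (i, j), w in usable.items():
            if dist[i] + w < new[j]:
                new[j] = dist[i] + w
        dist = new
    return dist[n]


# Hypothetical stream; assumed overcost = peak * length - data sent.
b = [3, 5, 2, 2]
n = len(b)
edges = {
    (i, j): max(b[i:j]) * (j - i) - sum(b[i:j])
    for i in range(n)
    for j in range(i + 1, n + 1)
}

print(best_overcost(edges, n, M=1, P=1))
print(best_overcost(edges, n, M=2, P=1))
print(best_overcost(edges, n, M=3, P=1))
```

Raising M can only lower the overcost, while raising P can only increase it, which matches the tradeoff between reconfiguration effort and lost bandwidth.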
Simulation results
NS2 simulation tool
Hierarchical token bucket (HTB)
VBR sources: series of bitrates
Delay and buffer size measurements
[Plot (a) fixed size HTB: delay (s) and buffer size (kB) versus bandwidth allocation]
[Plot (b) M=149: delay (s) and buffer size (kB) versus average bandwidth allocation]
Implementation
Server (Linux Ubuntu system): streaming component (VLC), streaming observer, reservation policy, network interface
Client (Beagleboard): streaming client (mplayer), raised errors
[Figure labels: hull values, times, configurations, stream]

| Configuration | # errors | # errors / # frames |
|---|---|---|
| Reference | 8 | 5.6×10⁻⁴ |
| 1429 kbits/s | 11 | 7.7×10⁻⁴ |
| 640 kbits/s | 80 | 56.0×10⁻⁴ |
| Hull P5 | 17 | 12.0×10⁻⁴ |