fpga architecture support for heterogeneous, relocatable...
TRANSCRIPT
24th International Conference on Field Programmable Logic and Applications, September 3rd, 2014
September 3rd, 2014 C. Huriaux, O. Sentieys and R. Tessier - 1
FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams
Christophe HURIAUX (1), Olivier SENTIEYS (1, 2), Russell TESSIER (3)
(1) University of Rennes 1, France
(2) Inria, France
(3) University of Massachusetts, USA
Outline
§ Introduction
  § Overview of the FlexTiles project
  § Architecture Overview
  § Advantages of 3-D Stacking
§ Principles
  § Task Migration in an FPGA
  § Task Migration in FlexTiles
  § Heterogeneous case
§ Approach
  § Coping with Heterogeneity
  § Design Constraints
§ Results
  § Implementation in VPR
§ Conclusion
FP7 FlexTiles Project
§ FlexTiles: Self adaptive heterogeneous manycore based on Flexible Tiles
§ Provide a heterogeneous many-core architecture offering
  § Large flexibility
  § High-performance, energy efficiency
  § Raised programming efficiency
  § Self-adaptation through virtualization
Architecture Overview
§ 3D-Stacked Heterogeneous manycore
  § General Purpose Processors (GPP)
    § for flexibility and programming homogeneity
  § Network On Chip
  § Dedicated hardware accelerators mapped at run-time on a reconfigurable layer
§ Reconfigurable layer with seamless task migration capabilities
§ Virtualization layer to provide an abstraction of the manycore and self-adaptive services
§ Tool-chain for parallelization and compilation
Architecture Overview
[Figure labels: 3D interface to the NoC; DSP blocks; Memory blocks]
Task migration
§ Classical problem in dynamic reconfiguration [1]
§ Enhance resource usage
[Figure label: 4x4?]
[1] K. Compton, Z. Li, J. Cooley, S. Knol, and S. Hauck, “Configuration relocation and defragmentation for run-time reconfigurable computing,” IEEE Transactions on VLSI Systems, vol. 10, no. 3, pp. 209-220, 2002.
3D Stacking
[Figure: a reconfigurable layer and a multicore layer (grid of cores), 3D-stacked]
§ 3D-Stacked Reconfigurable Accelerators
  § Improved resource usage
  § Improved bandwidth/latency
  § Improved performance and energy efficiency
Task Migration in an FPGA
§ Predefined reconfigurable regions
§ Bit-stream depends on task location
[Figure: FPGA with I/O blocks; HW Accelerator #1 uses bitstream BS #1 at one location and bitstream BS #2 at another]
Task Migration in FlexTiles
§ A task is synthesized, placed & routed into a Virtual Bit-Stream (VBS)
  § Independent from task physical location in the fabric
  § No predefined configuration domains
§ Easier resource sharing/distribution, simplified task migration
§ Reconfiguration controller generates final BS at run-time
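
The idea of a location-independent VBS that is turned into a final bitstream at load time can be sketched as follows. This is only an illustration under assumptions: the slides do not describe the VBS format, so the `VirtualBitstream` class, the relative `(dx, dy)` addressing and the `finalize` function are all hypothetical names, not the FlexTiles implementation.

```python
from dataclasses import dataclass


@dataclass
class VirtualBitstream:
    """Hypothetical VBS: configuration words addressed relative to the task origin."""
    width: int
    height: int
    cells: dict  # maps (dx, dy) offsets to a configuration word


def finalize(vbs, origin, fabric_size):
    """Translate relative cell offsets into absolute fabric coordinates.

    Stands in for the run-time step performed by the reconfiguration
    controller; raises ValueError if the task does not fit at `origin`.
    """
    ox, oy = origin
    fabric_w, fabric_h = fabric_size
    if ox < 0 or oy < 0 or ox + vbs.width > fabric_w or oy + vbs.height > fabric_h:
        raise ValueError("task does not fit at this origin")
    return {(ox + dx, oy + dy): word for (dx, dy), word in vbs.cells.items()}


# The same VBS yields two different final bitstreams for two locations.
task = VirtualBitstream(width=2, height=2, cells={(0, 0): 0b1010, (1, 1): 0b0110})
bs_a = finalize(task, origin=(0, 0), fabric_size=(8, 8))
bs_b = finalize(task, origin=(3, 4), fabric_size=(8, 8))
```

In this sketch only one VBS is stored per task; the location-dependent bitstream exists only after `finalize` runs, which is what makes migration a matter of choosing a new origin.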
Task Migration in FlexTiles
[Figure: reconfigurable fabric with RAM and DSP blocks and 3D NIs; HW Accelerator #1 (VBS #1) and HW Accelerator #2 (VBS #2) placed in the fabric]
Heterogeneity
§ Homogeneous case
  § No constraint on task placement
  § Regular routing architecture
§ Cope with heterogeneity
  § RAM, DSP, 3D I/Os
  § Migration is limited
    § vertically to the same column
    § to the next column containing the same complex blocks
[Figure legend: Task; Configured LE; Logic Element (LE)]
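
The horizontal migration limit described above can be illustrated with a small sketch. It assumes, for illustration only, that the fabric is described by one block type per column (as in a column-based FPGA) and that a task may move only to an origin whose column types match its current footprint; the function name and the example column layout are hypothetical.

```python
def valid_column_origins(column_types, origin, width):
    """Return every column origin whose block types match the task footprint.

    column_types: one block type per fabric column, e.g. "LE", "RAM", "DSP".
    origin, width: current leftmost column of the task and its width in columns.
    """
    footprint = column_types[origin:origin + width]
    last_origin = len(column_types) - width
    return [
        x for x in range(last_origin + 1)
        if column_types[x:x + width] == footprint
    ]


# Heterogeneous fabric: a task covering LE, RAM, LE can only move to the
# next place where the same column pattern repeats.
hetero = ["LE", "LE", "RAM", "LE", "LE", "DSP",
          "LE", "LE", "RAM", "LE", "LE", "DSP"]
hetero_targets = valid_column_origins(hetero, origin=1, width=3)

# Homogeneous fabric: every origin that fits is valid.
homo_targets = valid_column_origins(["LE"] * 6, origin=0, width=2)
```

Vertical moves are not modeled here because, under the column assumption, any row offset keeps the same column types.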
Proposed architecture
§ Heterogeneous block routing is abstracted from logic routing
§ Long lines allow a trade-off between placement flexibility and routing complexity
§ A two-level routing is performed at runtime:
  § Logic routing (as in the homogeneous case)
  § Heterogeneous block routing through long lines
Design Constraints
§ I/Os are made through 3D Network Interfaces, spread over the reconfigurable fabric
[Figure: floorplan of the reconfigurable fabric with a Reconfiguration CTRL and Reconfiguration RAM, DSP and MEM blocks, and 3D NIs each paired with an AI]
Implementation in VPR
§ Versatile Place and Route (VPR), open source CAD tool for placement and routing
  § Part of the Verilog To Routing (VTR) framework
§ Source code modified to implement our techniques and deal with our constraints
  § Horizontal long lines spread over partitions
  § Separate homogeneous and heterogeneous routing
VPR and VTR: https://code.google.com/p/vtr-verilog-to-routing/
Implementation in VPR
VPR Original Routing Model
§ Logic grid
§ Block placement
  § X: simple block
  § Y: 2 blocks tall
§ Mesh routing lines
§ Switch boxes
§ Interconnect
[Figure: logic grid with X and Y blocks; labels Fc=0.5 and Fc=1]
Implementation in VPR
Enhanced Routing Model
§ Logic grid
§ Block placement
§ Block typing
  § X: homogeneous
  § Y: heterogeneous
§ Mesh routing lines
§ Long lines
§ Switch boxes
§ Interconnect
  § Homogeneous
  § Heterogeneous
[Figure: logic grid with X and Y blocks]
Results
§ Architecture based on a simplified Stratix IV with:
  § Dual-port 144k memories
  § Fracturable 36x36 multipliers
§ Evaluation on two criteria
  § Delay of the critical path
  § Minimum channel width
    § Number of tracks in the homogeneous routing channels
§ Minimum channel width determined by VPR
  § Not directly related to silicon area
Results
§ Benchmark set: VTR framework circuits [1]

| Circuit | # Mem | # Mult | # LB |
|---|---|---|---|
| bgm | 0 | 11 | 2,174 |
| boundtop | 1 | 0 | 2,977 |
| ch_intrinsics | 1 | 0 | 272 |
| diffeq1 | 0 | 5 | 41 |
| diffeq2 | 0 | 5 | 43 |
| LU8PEEng | 45 | 8 | 30 |
| mkDelayWorker32B | 41 | 0 | 497 |
| mkPktMerge | 15 | 0 | 17 |
| mkSMAdapter4B | 5 | 0 | 181 |
| or1200 | 2 | 1 | 273 |
| raygentop | 1 | 7 | 192 |
| stereovision1 | 0 | 38 | 990 |

[1] J. Rose, J. Luu, C. W. Yu, et al., “The VTR project: architecture and CAD for FPGAs from Verilog to routing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, ACM, 2012, pp. 77-86.
Results: Delay
§ Estimation of the worst-case delay
  § Impossible to predict where connections to long lines will be made
  § Some channels crossing fixed-function blocks are longer
Results: Delay
§ Only 2% delay increase (on average)
[Chart: critical path delay in ns, classic vs. enhanced, with the proposed/classic ratio on a second axis]
Results: Min. Channel Width
§ 1.8X channel width increase on average
§ Need for specific routing algorithms to deal with the heterogeneous interconnection network
[Chart: minimum channel width W in number of tracks, classic vs. enhanced, with the proposed/classic ratio on a second axis]
Conclusion
§ FPGA embedded in a 3D architecture
§ More flexibility for task placement and/or relocation
§ Low impact on delay but cost on routing resources
§ Need to find a trade-off between flexibility and area increase of additional connections
Thank you for your attention
More info on FlexTiles: http://www.flextiles.eu
Virtual Bit-Stream: Example
§ Hiding routing details
  § Full BS is 129 bits
  § Could be reduced by giving less detail
Jan. 2014 CAIRN project-team - 25
[Figure: logic block with inputs CLBIN[0] to CLBIN[3] and output CLBOUT; routing endpoints numbered 0 to 20]
Virtual Bit-Stream: Example
§ Hiding routing details
  § List of I/O and connections
    § 20 → 8
    § 1 → 9
    § 5 → 18
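
To make the connection-list idea concrete, here is a minimal sketch that packs the three connections from the slide into a bit string. The fixed-width encoding is an assumption made for illustration: the slides do not specify how the VBS stores its connection list, and `encode_connections` and `decode_connections` are hypothetical helpers. The endpoint count of 21 comes from the endpoints being numbered 0 to 20 in the example.

```python
def endpoint_bits(num_endpoints):
    """Bits needed to index one endpoint with a fixed-width code."""
    return max(1, (num_endpoints - 1).bit_length())


def encode_connections(connections, num_endpoints):
    """Pack (source, destination) pairs into a string of '0'/'1' characters."""
    bits = endpoint_bits(num_endpoints)
    out = []
    for src, dst in connections:
        for endpoint in (src, dst):
            if not 0 <= endpoint < num_endpoints:
                raise ValueError(f"endpoint {endpoint} out of range")
            out.append(format(endpoint, f"0{bits}b"))
    return "".join(out)


def decode_connections(bitstring, num_endpoints):
    """Inverse of encode_connections."""
    bits = endpoint_bits(num_endpoints)
    values = [int(bitstring[i:i + bits], 2) for i in range(0, len(bitstring), bits)]
    return list(zip(values[0::2], values[1::2]))


# Connections from the slide: 20 -> 8, 1 -> 9, 5 -> 18.
slide_connections = [(20, 8), (1, 9), (5, 18)]
encoded = encode_connections(slide_connections, num_endpoints=21)
decoded = decode_connections(encoded, num_endpoints=21)
```

With this assumed encoding each endpoint takes 5 bits, so the three connections take 30 bits. That figure describes only this sketch and is not a result reported in the slides.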
Results: BS Sizes on MCNC Benchmarks
[Chart: bitstream size in kilobits, split into routing and logic, for the benchmarks tseng, tseng, diffeq, diffeq, apex4, des, ex5p and misex3]
Results: VBS Sizes on MCNC Benchmarks
[Chart: BS size and VBS size in kilobits, with the compression ratio on a second axis, for the benchmarks tseng, tseng, diffeq, diffeq, apex4, des, ex5p and misex3. Compression ratio labels, in extraction order: 44.4%, 49.2%, 47.2%, 55.2%, 49.7%, 29.5%, 27.4%, 26.6%]
Introduction: Architecture Overview
[Figure label: 3D Access Point to the NoC]
Introduction: Architecture Overview
General Architecture Overview