Design Tradeoffs For Hard and Soft FPGA-based Networks-on-Chip Mohamed ABDELFATTAH Vaughn BETZ

Upload: angie-bagg

Post on 01-Apr-2015


Page 1: Mohamed ABDELFATTAH Vaughn BETZ. 2 Why NoCs on FPGAs? Hard/soft efficiency gap Integrating hard NoCs with FPGA 1 1 2 2 3 3

Design Tradeoffs For Hard and Soft FPGA-based Networks-on-Chip

Mohamed ABDELFATTAH, Vaughn BETZ

Page 2

Outline

1. Why NoCs on FPGAs?
2. Hard/soft efficiency gap
3. Integrating hard NoCs with FPGA

Page 3

Outline

1. Why NoCs on FPGAs?
   – Motivation
   – Previous Work
2. Hard/soft efficiency gap
3. Integrating hard NoCs with FPGA

Page 4

1. Why NoCs on FPGAs?

Motivation

[Figure labels: Interconnect, Logic Blocks, Switch Blocks, Wires]

Page 5

Motivation

[Figure labels: Logic Blocks, Switch Blocks, Wires]

Hard Blocks:
• Memory
• Multiplier
• Processor

Page 6

Motivation

[Figure labels: Logic Blocks, Switch Blocks, Wires, 1600 MHz, 200 MHz, 800 MHz]

Hard Blocks:
• Memory
• Multiplier
• Processor

Hard Interfaces: DDR/PCIe ..

Interconnect still the same


Page 10

Motivation

1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD Problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated
5. Low-level interconnect hinders modularity:
   – Parallel compilation
   – Partial reconfiguration
   – Multi-chip interconnect

[Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]

Page 11

[Figure: Barcelona and Los Angeles (Source: Google Earth). Labels: Hard Blocks, Logic Cluster]

Keep the “roads”, but add “freeways”.

Page 12

FPGA with NoC

[Figure labels: NoC, Routers, Links, DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]

• Router forwards data packet
• Router moves data to local interconnect

Page 13

FPGA with NoC

1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
   – Huge CAD Problem
   – Slow compilation
   – Power/area utilization
4. Wire speed not scaling:
   – Delay is interconnect-dominated
5. Low-level interconnect hinders modularity:
   – Parallel compilation
   – Partial reconfiguration
   – Multi-chip interconnect

• Pre-design NoC to requirements
• NoC links are “re-usable”
• Latency-tolerant communication
• NoC abstraction favors modularity
• High bandwidth endpoints known


Page 15

Hard vs. Soft

Implementation options:
• Soft Logic (LUTs, ..)
• Hard Logic (unchangeable)
• Mixed Soft/Hard

  Soft NoC                          Hard NoC
  • Build as needed out of LUTs     • Must build the whole thing
  • Tailor to application           • Must be general enough for any application
  • Slower, bigger                  • Faster, smaller
  Configurability                   Efficiency

Investigate the hard vs. soft tradeoff for NoCs (area/delay)

Page 16

Previous Work

FPGA-tuned Soft NoCs:
– LiPar (2005), NoCeM (2008), Connect (2012)

Hard NoCs:
– Francis and Moore (2008): Exploring Hard and Soft Networks-on-Chip for FPGAs

Applications that leverage NoCs:
– Chung et al. (2011): CoRAM: An In-Fabric Memory Architecture for FPGA-based Computing

Our Contributions:
1. Quantify area/performance gap of hard and soft NoCs
2. Investigate how this impacts NoC design (hard/soft)
3. Integrate hard NoC with FPGA fabric

Page 17

Outline

1. Why NoCs on FPGAs?
2. Hard/soft efficiency gap
   – NoC Architecture
   – Methodology
   – Soft NoC design
   – Results
   – Area/Speed Efficiency Gap
3. Integrating hard NoCs with FPGA

Page 18

2. Hard/Soft Efficiency

Router Microarchitecture

NoC = Routers + Links

State-of-the-art router architecture from Stanford:
1. Acknowledge that the NoC community has excelled at building a router: we just use it
2. To meet FPGA bandwidth requirements: high-performance router
3. A complex router includes a superset of NoC components that may be used: more complete analysis

Split router into 5 components

Page 19

Router – 5 Components

[Figure: router block diagram, ports numbered 1 to 5. Blocks: Input Modules, Crossbar Switch, Output Modules, Virtual Channel (VC) Allocator, Switch Allocator]

Page 20

Router – 5 Components: Input Modules

Multi-Queue Buffer = Memory + Control Logic

Parameters:
• Port Width
• Buffer depth
• Number of VCs

Page 21

Router – 5 Components: Crossbar

Multiplexers = Logic + crowded interconnect

Parameters:
• Port Width
• Number of Ports

Page 22

Router – 5 Components: Output Modules

Retiming Register = Registers + little control logic

Parameters:
• Port Width
• Number of VCs

Page 23

Router – 5 Components: Allocators

Arbiters = Logic + Registers

Parameters:
• Number of Ports
• Number of VCs

Page 24

Design Space

5 Components:
• Input Module
• Crossbar
• VC Allocator
• SW Allocator
• Output Module

4 Parameters:
• Port Width
• Number of Ports
• Number of VCs
• Buffer Depth
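The design space can be written down as a small lookup table. The sketch below is illustrative only: it encodes the component-to-parameter dependencies listed on pages 20 to 23, the identifier names (`COMPONENT_PARAMS`, `components_affected_by`) are made up for this example, and it assumes the single "Allocators" slide applies equally to the VC allocator and the switch allocator.

```python
# Which design parameters each router component depends on,
# as listed on the per-component slides (pages 20 to 23).
# Assumption: the "Allocators" slide covers both allocators identically.
COMPONENT_PARAMS = {
    "Input Module": {"port_width", "buffer_depth", "num_vcs"},
    "Crossbar": {"port_width", "num_ports"},
    "VC Allocator": {"num_ports", "num_vcs"},
    "SW Allocator": {"num_ports", "num_vcs"},
    "Output Module": {"port_width", "num_vcs"},
}

# The union of all per-component parameters gives the 4 swept parameters.
ALL_PARAMS = sorted(set().union(*COMPONENT_PARAMS.values()))


def components_affected_by(param):
    """Return the components whose implementation depends on `param`."""
    return sorted(
        name for name, params in COMPONENT_PARAMS.items() if param in params
    )


if __name__ == "__main__":
    for param in ALL_PARAMS:
        print(param, "->", components_affected_by(param))
```

Read this way, buffer depth only touches the input module, while the number of ports touches the crossbar and both allocators.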

Page 25

Methodology

• Post-routing FPGA (soft) area and delay
• Post-synthesis ASIC (hard) area and delay
• Both TSMC 65 nm technology (Stratix III)
• Verify results against previous FPGA:ASIC comparison by Kuon and Rose

Per Router Component

Page 26

3 Options for Buffer on FPGA

• Relatively small memories
• Critical component in router design
• 3 options for FPGA:

  Registers    One per LUT
  LUTRAM       640 bits
  Block RAM    9 Kbits

Area of each implementation option

Page 27

3 Options for Buffer on FPGA

[Figure labels: Width = 32 Bits; Another logic cluster used]

Page 28

3 Options for Buffer on FPGA

• Relatively small memories
• 3 options for implementation on FPGA

  Option       Capacity       Density
  Registers    One per LUT    0.77 Kbit/mm²
  LUTRAM       640 bits       23 Kbit/mm²
  Block RAM    9 Kbits        142 Kbit/mm²

• 16% utilized BRAM more area efficient than fully used LUTRAM (valid for Stratix III)
• LUTRAM could win for some points in other FPGAs

Soft: Use BRAM for FPGA (soft) implementation
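The 16% figure follows from the two densities on this slide. A minimal sketch of the arithmetic, using only the quoted Stratix III numbers; the function and dictionary names are hypothetical, and the model assumes area efficiency scales linearly with the fraction of the memory actually used.

```python
# Storage densities quoted on the slide for Stratix III (Kbit per mm^2).
DENSITY_KBIT_PER_MM2 = {"registers": 0.77, "lutram": 23.0, "bram": 142.0}


def effective_density(option, utilization=1.0):
    """Useful Kbit per mm^2 when only `utilization` of the memory is used.

    Assumption: unused bits still cost their full silicon area.
    """
    return DENSITY_KBIT_PER_MM2[option] * utilization


def break_even_utilization(sparse, full):
    """Utilization at which `sparse` matches a fully used `full`."""
    return DENSITY_KBIT_PER_MM2[full] / DENSITY_KBIT_PER_MM2[sparse]


if __name__ == "__main__":
    u = break_even_utilization("bram", "lutram")
    print(f"BRAM matches a full LUTRAM at {u:.1%} utilization")
```

Above that utilization the BRAM stores more bits per unit area than a completely filled LUTRAM, which is the basis for the BRAM recommendation.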

Page 29

Results – High Port Count

[Figure labels: 24X – 94X; 60X – 170X]

Soft: High port count inefficient in soft

Page 30

Results – Width

[Figure labels: 26X – 17X; 72X]

Soft: High port count inefficient in soft; width scales better

Page 31

Results – Deep Buffers

[Figure label: Filling up the BRAM]

Soft: Buffer depth is free on FPGAs when using BRAM

Page 32

Soft Router Design

• Design recommendations based on FPGA silicon area
• Supported by delay measurements

Soft: Buffer depth is free on FPGAs when using BRAM
Soft: High port count inefficient in soft; width scales better
Soft: Use BRAM for FPGA (soft) implementation

Page 33

Results – Area

  Router Component    Mean Area Ratio    LUT:REG
  Input Module        17                 --
  Crossbar            85                 --
  VC Allocator        48                 8:1
  Switch Allocator    56                 20:1
  Output Module       39                 0.6:1
  Router              30

[Callouts: Memory; = Logic + Registers]

Page 34

Results – Delay

  Router Component    Mean Delay Ratio
  Input Module        2.9
  Crossbar            4.4
  VC Allocator        3.9
  Switch Allocator    3.3
  Output Module       3.4
  Router              3.6

Page 35

Outline

1. Why NoCs on FPGAs?
2. Hard/soft efficiency gap
3. Integrating hard NoCs with FPGA
   – Hard NoC + FPGA Wiring
   – Conclusion
   – Future Work

Page 36

3. Hard NoC with FPGA

What to harden?

  Router Component    Area Ratio    Delay Ratio
  Input Module        17            2.9
  Crossbar            85            4.4
  VC Allocator        48            3.9
  Switch Allocator    56            3.3
  Output Module       39            3.4
  Router              30            3.6

[Figure labels: 50% Total Area; 40%; 10%; Critical Path]

• Results suggest hardening Crossbar and Allocators
• Mixed hard/soft implementation
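One way to read the table is to rank components by how much they lose in soft logic. The sketch below does this with the quoted ratios. Ranking by area ratio uses only the slide's data; the area-times-delay product is an extra illustrative metric that does not appear in the slides, and all names are hypothetical.

```python
# (area ratio, delay ratio) of soft relative to hard, from the table.
GAPS = {
    "Input Module": (17, 2.9),
    "Crossbar": (85, 4.4),
    "VC Allocator": (48, 3.9),
    "Switch Allocator": (56, 3.3),
    "Output Module": (39, 3.4),
}


def rank_by_area_gap(gaps):
    """Components ordered from largest to smallest soft:hard area ratio."""
    return sorted(gaps, key=lambda name: gaps[name][0], reverse=True)


def rank_by_area_delay_product(gaps):
    """Same ordering using area ratio x delay ratio.

    Assumption: this combined metric is for illustration only.
    """
    return sorted(
        gaps, key=lambda name: gaps[name][0] * gaps[name][1], reverse=True
    )


if __name__ == "__main__":
    print("By area gap:      ", rank_by_area_gap(GAPS))
    print("By area x delay:  ", rank_by_area_delay_product(GAPS))
```

Under both orderings the crossbar and the two allocators come out on top and the input module last, which is consistent with the suggestion to harden the crossbar and allocators.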

Page 37

Mixed Implementation

[Figure: router block diagram shown three times, labeled Soft, Hard, and Mixed]

For a typical router:
• 5 ports
• 32 bits wide
• 2 VCs
• 10 buffer words

           Soft            Hard              Mixed
  Area     4.1 mm² (1X)    0.14 mm² (30X)    2.3 mm² (1.8X)
  Speed    150 MHz (1X)    810 MHz (5X)      390 MHz (2.5X)

Mixed not worth hardening

• How to connect hard and soft?
• How efficient is mixed/hard after doing that?
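The area multipliers can be recomputed from the quoted router areas. This is a sketch with hypothetical names; the slide's own multipliers are rounded, so the recomputed values only approximately match. The "share of saving" figure is derived here for illustration and is not stated in the slides.

```python
# Router area in mm^2 as quoted for the typical router
# (5 ports, 32 bits wide, 2 VCs, 10 buffer words).
AREA_MM2 = {"soft": 4.1, "hard": 0.14, "mixed": 2.3}


def area_gain_over_soft(impl):
    """How many times smaller `impl` is than the soft router."""
    return AREA_MM2["soft"] / AREA_MM2[impl]


def share_of_hard_saving(impl):
    """Fraction of the soft-to-hard area saving that `impl` captures."""
    saved = AREA_MM2["soft"] - AREA_MM2[impl]
    best = AREA_MM2["soft"] - AREA_MM2["hard"]
    return saved / best


if __name__ == "__main__":
    for impl in ("hard", "mixed"):
        print(
            f"{impl}: {area_gain_over_soft(impl):.1f}X smaller, "
            f"{share_of_hard_saving(impl):.0%} of the full-hard saving"
        )
```

With these numbers the mixed router recovers less than half of the area that a fully hard router saves, which supports the "not worth hardening" verdict.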

Page 38

Integrating a Hard Router

• Same I/O mux structure as a logic block – 9X the area
• Conventional FPGA interconnect between routers

[Figure labels: Router Logic, Programmable Interconnect, Router, Logic clusters]

Page 39

Integrating a Hard Router

[Figure labels: Router Logic, Programmable Interconnect, FPGA, Router]

730 MHz

1/9th of FPGA vertically (2.5 mm)

Page 40

Integrating a Hard Router

• Assumed a mesh
• Can form any topology

[Figure labels: Router Logic, Programmable Interconnect, Router, FPGA]

Page 41

Integrating a Hard Router

           Soft            Hard              Hard (+ interconnect)
  Area     4.1 mm² (1X)    0.14 mm² (30X)    0.18 mm² = 9 LABs (22X)
  Speed    150 MHz (1X)    810 MHz (5X)      730 MHz (4.7X)

64-node NoC on Stratix V

           Soft            Hard (+ interconnect)
  Area     ~12,500 LABs    576 LABs
  %LABs    33 %            1.6 %
  %FPGA    12 %            0.6 %

• Hard NoC + Soft Interconnect is very compelling
• Provides 47 GB/s peak bisection bandwidth
• Very Cheap! Less than cost of 3 soft nodes
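The 64-node figures can be cross-checked with a few lines of arithmetic. The slides do not state how the 47 GB/s bisection bandwidth was derived; the sketch below assumes an 8 x 8 mesh, 32-bit links (the typical router width quoted earlier), the 730 MHz hard-router clock, and links counted in both directions across the cut. Under those assumptions the result lands close to the quoted value. All names are hypothetical.

```python
NODES = 64
LABS_PER_HARD_ROUTER = 9   # 0.18 mm^2 = 9 LABs, from the table
SOFT_NOC_LABS = 12_500     # approximate, from the table
LINK_WIDTH_BITS = 32       # assumption: typical router width
CLOCK_HZ = 730e6           # hard router with soft interconnect
MESH_SIDE = 8              # assumption: 64 nodes arranged as 8 x 8


def hard_noc_labs():
    """Total LABs taken by the hard NoC."""
    return NODES * LABS_PER_HARD_ROUTER


def equivalent_soft_routers():
    """Hard NoC area expressed as a number of soft routers."""
    labs_per_soft_router = SOFT_NOC_LABS / NODES
    return hard_noc_labs() / labs_per_soft_router


def bisection_bandwidth_gb_per_s(side, width_bits, clock_hz):
    """Peak bandwidth across the middle cut of a side x side mesh.

    Assumption: one link per row in each direction crosses the cut.
    """
    links_across_cut = 2 * side
    bytes_per_second = links_across_cut * (width_bits / 8) * clock_hz
    return bytes_per_second / 1e9


if __name__ == "__main__":
    print("Hard NoC LABs:", hard_noc_labs())
    print(f"Equivalent soft routers: {equivalent_soft_routers():.2f}")
    bw = bisection_bandwidth_gb_per_s(MESH_SIDE, LINK_WIDTH_BITS, CLOCK_HZ)
    print(f"Bisection bandwidth: {bw:.1f} GB/s")
```

The LAB count reproduces the 576 in the table, and the whole hard NoC comes out slightly under the area of three soft routers, matching the last bullet.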

Page 42

1. Why NoCs on FPGAs?
• Big city needs freeways to handle traffic
• Solve communication problems for a large/heterogeneous FPGA:
  • Timing Closure – Interconnect Scaling – Modular Design

2. Hard/soft efficiency gap
• A hard NoC is on average 30X smaller and 3.6X faster than soft
  • Crossbars and allocators worst – Input buffer best
• An efficient soft NoC:
  • Uses BRAMs – Large width, low Port Count – Deep buffers

3. Integrating hard NoCs with FPGA
• Mixed implementation does not make sense
• Integrated fully hard NoC with FPGA fabric (for NoC Links)
  • 22X area improvement over soft
  • Reaches max. FPGA frequency (4.7X faster than soft)
  • 64-node NoC = 0.6% of total FPGA area (Stratix V)

Page 43

Future Work

• Power analysis
• More hardening:
  – Dedicated inter-router links (hard wires)
  – Clock domain crossing hardware
• How do traffic hotspots (DDR/PCIe) influence NoC design?
• Latency insensitive design methodology that uses NoC
• CAD tool changes for a NoC-based FPGA

Page 44

Thank You!

[email protected]
