public seminar_final 18112014

Hardware Implementation and Evaluation of Flexible Router Architecture for NoCs

Hossam El-Sayed Abdel-Fadeel
M.Sc. Student, ECE Department, E-JUST
Research Assistant, NTI
Email: [email protected], [email protected]

Supervised by: Prof. M. Ragab, Assoc. Prof. Maha El-Sabarouty, Assoc. Prof. V. Goulart, and Assist. Prof. Mohammed Sharaf

December 2, 2013

Outline

• Motivation
• Related Work
• Base Router Architecture
• Flexible Router Architecture
• Evaluation and Experiments
• Conclusion

Why Network on Chip?

• As process technology scales, transistor densities increase.
  – Many processing elements (PEs) now fit on a single chip, BUT global wiring delays also increase because wire speed does not scale.
  – The computational performance of digital systems increases.
• Design concept
  – Many PEs need to be interconnected, which requires a structured and scalable on-chip communication architecture.
  – Design shifts from computation-centric to communication-centric.

To combat these issues, researchers have proposed the Network on Chip (NoC).

What is Network on Chip (NoC)?

• NoC = Routers + Links.
  – Network topology: how the nodes are connected to each other.
  – Routing algorithm: how packets move from source to destination.
  – Flow control: how the transmission of packets between routers is controlled.
  – Router architecture: buffers, arbiters, crossbars, etc.
• Buffer requirements in a router
  – Buffers store arriving packets or flits.

Buffers in NoC Routers

• Why buffering? A packet may have to wait for a routing decision, for contention on the same output channel to clear, or for a congested downstream router.
• Large buffers improve throughput and latency, BUT at a cost in:
  – Area: high hardware resource overhead.
  – Power: buffers are large energy consumers, accounting for about 64% of the total router leakage power.
• Buffer resources therefore need to be used efficiently,
  – through better management of the available buffers.
• Several architectures and implementations have been proposed.

Related Work

• Central Buffer Sharing Method
  – All ports share a central buffer.
  – Improves performance, but at the cost of area overhead and control complexity.
• Distributed Shared Buffer
  – Improves throughput, but at the cost of power and area overhead.

Flexible Router Approach (1/3)

• Goal: improve the performance of the overall network by modifying the router architecture.
• Use the same amount of available buffers more efficiently.
  – If there is contention at any input port, the Flexible Router tries to allocate a suitable free buffer in one of the router's other input ports.
  – There is no need to increase buffer sizes or to add virtual channels (VCs).

Flexible Router Approach (2/3)

Base Router Congestion Problem

[Figure: base router with input ports E, W, S, and N; some input buffers are marked Busy.]

Packets requesting a busy buffer are blocked.

Flexible Router Approach (3/3)

Instead of waiting for a busy buffer to become free, look for another one. This increases the number of packets moving through the router.

[Figure: router with input ports E, W, S, and N; several input buffers are marked Busy.]

Features of the Flexible Router
• The design of the Flexible Router is similar to the base router, except for the functionality and modules added to the input ports.
• Efficient buffer utilization
• Enhanced packet throughput
• Low hardware resource overhead

Base Router Architecture

Input Port Module

[Figure: base router input port. Blocks: Routing Computation (RC), FIFO buffer, FIFO Controller. Signals: PacketIn, PacketInCnt, IntPacket, ReqUpStr/GntUpStr, ReqInt(3:0)/GntInt(3:0), ReqInCnt/GntInCnt, Empty/Full, ReadEn, WriteEn, ReadAddr, WriteIAddr.]
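The slide shows only the signal names of the FIFO buffer and its controller: separate read and write addresses, read and write enables, and Empty/Full flags. The sketch below models that kind of circular packet FIFO in Python. The class and method names are hypothetical, and the assumption that the controller tracks an occupancy count is ours, not the author's Verilog.

```python
from typing import List, Optional


class CircularFifo:
    """Hypothetical model of an input-port packet FIFO with separate read/write addresses."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem: List[Optional[int]] = [None] * depth
        self.read_addr = 0   # next slot to read (plays the role of ReadAddr)
        self.write_addr = 0  # next slot to write (plays the role of the write address)
        self.count = 0       # number of stored packets (assumed occupancy counter)

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def write(self, packet: int) -> bool:
        """WriteEn: store a packet if there is room; return False when the FIFO is full."""
        if self.full:
            return False
        self.mem[self.write_addr] = packet
        self.write_addr = (self.write_addr + 1) % self.depth  # wrap around
        self.count += 1
        return True

    def read(self) -> Optional[int]:
        """ReadEn: remove and return the oldest packet, or None when the FIFO is empty."""
        if self.empty:
            return None
        packet = self.mem[self.read_addr]
        self.mem[self.read_addr] = None
        self.read_addr = (self.read_addr + 1) % self.depth  # wrap around
        self.count -= 1
        return packet
```

With a depth of 2, matching the smallest buffer size evaluated, two writes assert `full` and a third write is refused until a read frees a slot.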

Output Port Module

[Figure: base router output port. Blocks: Round Robin Arbiter and Mux. Signals: ReqInt(3:0), GntInt(3:0), gnt[1:0] (mux select), PacketIn 0 to PacketIn 3, PacketOut, ReqDnStr, GntDnStr, fullDnStr.]
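The output port arbitrates among four request lines, ReqInt(3:0), with a round-robin arbiter. The 2-bit grant index gnt[1:0] selects which PacketIn the mux forwards. Below is a minimal Python sketch of round-robin arbitration. The class name is hypothetical, and the initial priority (input 0 first) is an assumption.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter over n request lines."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.last = n - 1  # most recently granted input; start so input 0 has top priority

    def arbitrate(self, req: List[bool]) -> Optional[int]:
        """Return the granted input index (the mux select), or None if nothing is requested."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if req[idx]:
                self.last = idx  # the winner gets the lowest priority in the next round
                return idx
        return None

    def one_hot(self, idx: Optional[int]) -> List[bool]:
        """Expand a grant index into a one-hot vector, like GntInt(3:0)."""
        return [i == idx for i in range(self.n)]
```

When all four inputs request every cycle, the grants rotate 0, 1, 2, 3, 0. No requester can be starved, which is the fairness property the slides give as the reason for choosing round robin.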

Basic Operation of the Base Router

[Figure: receiving flowchart of the downstream router and sending flowchart of the upstream router, with the handshake between the Output Port of the Upstream Router (US) and the Input Port of the Downstream Router (DS). Signals: Full_US, Request_US, Grant_US, PacketIn_US; Full_DS, Request_DS, Grant_DS, PacketOut_DS.]
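The flowcharts survive only as signal names: Full, Request, Grant and the packet lines. The Python sketch below shows one plausible reading of the handshake, one step per cycle. The upstream output port asserts Request only when it holds a packet and Full is low. The downstream input port answers with Grant and stores the packet if it has room. All names are hypothetical, and the exact cycle-level protocol is an assumption.

```python
from collections import deque
from typing import Deque, Optional


class DownstreamInput:
    """Hypothetical downstream input port: grants a request only while not full."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.buffer: Deque[int] = deque()

    @property
    def full(self) -> bool:
        return len(self.buffer) >= self.depth

    def on_request(self, packet: int) -> bool:
        """Return Grant; when granted, the packet is written into the buffer."""
        if self.full:
            return False
        self.buffer.append(packet)
        return True


def send_step(pending: Optional[int], downstream: DownstreamInput) -> Optional[int]:
    """One cycle of the upstream output port; returns the packet still waiting, if any."""
    if pending is None or downstream.full:
        return pending  # nothing to send, or Full is asserted: hold the packet
    granted = downstream.on_request(pending)  # assert Request and wait for Grant
    return None if granted else pending
```

With a one-packet downstream buffer, the first packet is accepted. The next packet is held while Full stays asserted and is sent once the downstream router drains its buffer. This blocking is exactly what the Flexible Router tries to avoid.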

Flexible Router Architecture

Input Port Module

[Figure: Flexible Router input port (East port shown). It adds a FIFO Flexibility Controller (FFC), with handshake signals Req_FFCE_FIFO_W,N,S and Gnt_FFCE_FIFO_W,N,S, and a MUX whose inputs are EastPacket and packets from other ports. The remaining blocks and signals are as in the base input port: Routing Computation (RC), FIFO buffer, PacketIn, IntPacket, ReqUpStr/GntUpStr, ReqInt(3:0)/GntInt(3:0), ReqInCnt/GntInCnt, Empty/Full, ReadEn, WriteEn, ReadAddr, WriteIAddr.]

Basic Operation of the Flexible Router

The FFC requests the other FIFOs in a fixed sequential order. Pseudocode for the East FFC:

    if (FIFO West is not full)       { send Request and wait for Grant; }
    else if (FIFO North is not full) { send Request and wait for Grant; }
    else if (FIFO South is not full) { send Request and wait for Grant; }
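The East FFC pseudocode is an if / else-if chain, so only the first non-full FIFO in the order West, North, South is requested. The Python sketch below mirrors that chain. It also adds a hypothetical `place_incoming` policy, which assumes the FFC is used only when the port's own FIFO is full, following the contention rule in "Flexible Router Approach (1/3)". The slides give no request order for the other ports' FFCs. The sketch also ignores the "suitable buffer" constraint on packet directions.

```python
from typing import Dict, List, Optional

# Request order of the East FFC, taken from the pseudocode.
EAST_FFC_ORDER: List[str] = ["West", "North", "South"]


def ffc_target(fifo_full: Dict[str, bool], order: List[str]) -> Optional[str]:
    """Return the first non-full FIFO in the FFC's order, or None if all are full.

    Like the if / else-if chain, only this first candidate is requested; the FFC
    would then send Request and wait for Grant from it.
    """
    for port in order:
        if not fifo_full[port]:
            return port
    return None


def place_incoming(own_port: str, fifo_full: Dict[str, bool],
                   order: List[str]) -> Optional[str]:
    """Hypothetical policy: use the port's own FIFO first, fall back to the FFC."""
    if not fifo_full[own_port]:
        return own_port
    return ffc_target(fifo_full, order)


full = {"East": True, "West": True, "North": False, "South": False}
print(place_incoming("East", full, EAST_FFC_ORDER))  # North
```

A fixed order is cheap to build in hardware, but it always prefers the West FIFO whenever that FIFO has room.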

Possible Packet Directions

• Applying the turn model to the Flexible Router under XY routing avoids deadlock.
• Under XY routing, each buffer in the Flexible Router can hold packets headed in the following directions:
  – North buffer: Local or South.
  – South buffer: Local or North.
  – East buffer: Local, North, South, or West.
  – West buffer: Local, North, South, or East.

Packets directed to the Local port have reached their destination and are absorbed directly by the local port.
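The direction sets on this slide follow from the XY rule that a packet may turn from X into Y but never from Y back into X. The small Python check below derives them from the input port a packet enters through. The function and table names are illustrative only.

```python
from typing import Dict, Set

# Direction a packet is travelling when it enters through each input port.
TRAVEL: Dict[str, str] = {"North": "South", "South": "North",
                          "East": "West", "West": "East"}


def xy_allowed_outputs(input_port: str) -> Set[str]:
    """Outputs a packet entering through `input_port` may request under XY routing."""
    travel = TRAVEL[input_port]
    if travel in ("North", "South"):
        # Already in the Y phase: XY routing never turns back into X.
        return {"Local", travel}
    # Still in the X phase: continue in X, turn into Y, or eject locally.
    return {"Local", travel, "North", "South"}


for port in ("North", "South", "East", "West"):
    print(port, sorted(xy_allowed_outputs(port)))
```

The output reproduces the four bullets above. This is why a North or South buffer needs fewer output options than an East or West buffer.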

System Architecture

• NoC parameters used in this work: a 64-bit, 5-input-buffer router.
  – Arbitration: Round Robin, for fairness.
  – Switching: Store-and-Forward, for simplicity and proof of concept.
  – Routing algorithm: XY dimension-ordered routing (XY-DOR), which minimizes area and control overhead and is deadlock-free.
  – Topology: Mesh, the most common topology for 2D chips.
  – Packet/flit size: 64 bits (can vary from 32 to 264 bits).
  – Buffer size: 2, 4, or 8 packets, kept small to expose buffer utilization.
  – Traffic patterns: Uniform Random, Hot-Spot, and Nearest-Neighbor, for performance evaluation.

XY Dimension-Ordered Routing (XY-DOR)

[Figure: a packet routed in the mesh from source S to destination D, first along the X dimension and then along the Y dimension.]
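XY-DOR is easy to state in code. At each router, the packet first corrects its X coordinate, then its Y coordinate, and is absorbed locally once both match. This Python sketch assumes that x grows toward East and y grows toward North. That coordinate convention and the function names are illustrative, not taken from the slides.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def xy_output_port(cur: Node, dst: Node) -> str:
    """Routing computation at one router: X first, then Y, then Local."""
    if dst[0] > cur[0]:
        return "East"
    if dst[0] < cur[0]:
        return "West"
    if dst[1] > cur[1]:
        return "North"
    if dst[1] < cur[1]:
        return "South"
    return "Local"


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Sequence of routers visited from src to dst, inclusive."""
    step = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}
    path = [src]
    cur = src
    while (port := xy_output_port(cur, dst)) != "Local":
        dx, dy = step[port]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The route is deterministic and needs only coordinate comparisons. This is consistent with the slides' reason for choosing XY-DOR: low area and control overhead.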

Performance Parameters

• Latency is the time elapsed from when a packet enters the network until it reaches its destination.
• Throughput is the rate at which the network delivers packets for a particular traffic pattern.
• Many factors affect these parameters:
  – Topology: determines how the system is connected and its size, i.e., the number of nodes.
  – Injection rate: the rate at which packets are injected into the simulator, i.e., the average number of packets injected per simulation cycle per node.
  – Flow control: the number of virtual channels per physical channel and the depth of each virtual channel, measured in flits.

Evaluation Approach

• A cycle-accurate NoC simulation system was developed in Verilog HDL to evaluate the performance of the Flexible Router.
• Synthesis environment:
  – Xilinx ISE 14.1; target platform: Xilinx Virtex-5 xc5vfx70t-1ff1136 FPGA.
  – Cadence SoC Encounter® Digital Implementation System with 180 nm technology (Encounter RTL Compiler®).

Simulation Platform (1/3)

PE information recorded (information, where it is logged, function):
• Send Time (Sender Log): cycle counter value of each sent packet.
• Receive Time (Receiver Log): cycle counter value of each received packet.
• PE Module Sender ID (Sender and Receiver Logs): the PE module ID of the sender.
• PE Module Receiver ID (Receiver Log): the PE module ID of the receiver.
• Packet ID (Sender and Receiver Logs): the ID of the transmitted packet.

[Figure: Packet Injector flowchart.]
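The injector flowchart itself is not recoverable, so the sketch below is only an assumption about how a PE could inject packets at a given rate. It uses a Bernoulli process: in each cycle, a packet is injected with probability `rate`. That makes `rate` the average number of packets per cycle per PE, which matches the injection-rate definition in the slides. Each injected packet is recorded with the Sender Log fields listed above. The record type and function are hypothetical. The sketch also assumes packet IDs are unique per sender, so (sender ID, packet ID) identifies a packet.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SendRecord:
    """One Sender Log entry (hypothetical layout)."""
    packet_id: int
    sender_id: int
    send_time: int  # cycle counter value when the packet was sent


def inject(sender_id: int, rate: float, cycles: int, seed: int = 0) -> List[SendRecord]:
    """Bernoulli injector: each cycle, inject one packet with probability `rate`."""
    rng = random.Random(seed)
    log: List[SendRecord] = []
    for cycle in range(cycles):
        if rng.random() < rate:
            log.append(SendRecord(packet_id=len(log), sender_id=sender_id,
                                  send_time=cycle))
    return log


log = inject(sender_id=0, rate=0.1, cycles=10000)
print(len(log) / 10000)  # close to 0.1 packets/cycle for this PE
```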

Simulation Platform (2/3)

[Figure: simulation flow. Verilog RTL Model and Verilog Testbench → Simulation Compiler (ModelSim or ISim) → Simulation Results (log files, waveform) → Matlab → Simulation Graphs.]

Matlab calculates the following:
• Average latency over all packets in the simulation system.
• Average throughput over all packets in the simulation system.
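The post-processing step is done in Matlab in this work. The Python sketch below shows the same two averages computed from sender and receiver logs keyed by (sender ID, packet ID). It rests on two assumptions. Latency is receive time minus send time in cycles, averaged over delivered packets. Throughput is delivered packets divided by (number of PEs × measured cycles), in packets/cycle/PE as on the result plots. Packets that never arrive are simply ignored here. The function names are illustrative.

```python
from typing import Dict, Tuple

PacketKey = Tuple[int, int]  # (sender ID, packet ID), assumed unique


def average_latency(send_log: Dict[PacketKey, int],
                    recv_log: Dict[PacketKey, int]) -> float:
    """Mean of (receive time - send time) in cycles over all delivered packets."""
    if not recv_log:
        raise ValueError("no packets were delivered")
    latencies = [recv_log[key] - send_log[key] for key in recv_log]
    return sum(latencies) / len(latencies)


def throughput(recv_log: Dict[PacketKey, int], num_pes: int, cycles: int) -> float:
    """Delivered packets per cycle per PE."""
    return len(recv_log) / (num_pes * cycles)


send = {(0, 0): 10, (0, 1): 20, (1, 0): 15}
recv = {(0, 0): 40, (0, 1): 70, (1, 0): 35}
print(average_latency(send, recv))  # 33.33... cycles
print(throughput(recv, num_pes=4, cycles=100))  # 0.0075
```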

Simulation Platform (3/3)

• Most performance analyses use synthetic traffic patterns with different characteristics.
• Simulations were run under three traffic patterns:
  – Uniform (UNI): traffic is equally distributed among all nodes. This is the most commonly used pattern for network evaluation because it is straightforward to implement and makes no assumptions about the application.
  – Nearest-Neighbor (NN): each node sends only to its neighboring nodes.
  – Hotspot (HS): 90% of the traffic is directed to the hotspot node at (2, 2), and the rest is equally distributed among all other nodes.
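The three patterns differ only in how each packet's destination is drawn. This Python sketch shows one way to generate destinations for each pattern, with the mesh size as a parameter, since the slides do not state it. A few choices are assumptions. Uniform traffic never targets the source itself. For hot-spot traffic, the remaining 10% is spread over nodes other than both the source and the hotspot, and the hotspot node itself sends only uniform traffic. The function names are illustrative.

```python
import random
from typing import List, Tuple

Node = Tuple[int, int]


def _nodes(width: int, height: int) -> List[Node]:
    return [(x, y) for x in range(width) for y in range(height)]


def uniform_dest(src: Node, width: int, height: int, rng: random.Random) -> Node:
    """UNI: every node other than the source is equally likely."""
    return rng.choice([n for n in _nodes(width, height) if n != src])


def nearest_neighbor_dest(src: Node, width: int, height: int,
                          rng: random.Random) -> Node:
    """NN: one of the source's mesh neighbours (2 to 4 of them)."""
    x, y = src
    neighbours = [(x + dx, y + dy)
                  for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= x + dx < width and 0 <= y + dy < height]
    return rng.choice(neighbours)


def hotspot_dest(src: Node, width: int, height: int, rng: random.Random,
                 hotspot: Node = (2, 2), fraction: float = 0.9) -> Node:
    """HS: `fraction` of packets go to `hotspot`; the rest go uniformly elsewhere."""
    if src != hotspot and rng.random() < fraction:
        return hotspot
    others = [n for n in _nodes(width, height) if n not in (src, hotspot)]
    return rng.choice(others)


rng = random.Random(1)
print(hotspot_dest((0, 0), 5, 5, rng))  # usually (2, 2)
```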

Uniform Random Traffic

[Figure: Base Router vs. Flexible Router under uniform random traffic. Panels: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8.]

Nearest Neighbor Traffic

[Figure: Base Router vs. Flexible Router under nearest-neighbor traffic. Panels: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8.]

Under Nearest Neighbor traffic, each injector sends packets only to its neighbors. Throughput therefore grows linearly with the injection rate: the routers serve every injected packet, and no congestion arises to limit throughput.

Hot Spot Traffic

[Figure: Base Router (BR) vs. Flexible Router (FR) under hot-spot traffic. Panels: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE), for buffer sizes 2, 4, and 8.]

Under HS traffic, the improvement is slight apart from a higher saturation point. HS packets are injected faster than they can be collected, and they occupy all of the network's buffer space. Adapting the FR architecture to this type of traffic is a possible direction for future work.

Synthesis Results (1/2)

• Synthesis used the Xilinx ISE® Synthesis Tool (XST), targeting the Virtex-5 FPGA xc5vfx70t-1ff1136.
  – The area and maximum frequency results for both the Flexible and Base Routers are listed below.
  – The increase in area is acceptable given the logic added for flexibility and the FPGA resources available.

AREA RESULTS OF XILINX FPGA (number of resources used)

                Base                  Flexible
Resource    BUF2   BUF4   BUF8    BUF2   BUF4   BUF8
LUTs         657    776    836    1078   1111   1112
FFs          425    430    440     473    474    493

FREQUENCY RESULTS OF XILINX FPGA

                      Base                  Flexible
                  BUF2   BUF4   BUF8    BUF2   BUF4   BUF8
Max Freq. (MHz)    164    150    150     141    139    141

The maximum clock frequency decreases in the Flexible Router because of the flexibility units. However, the Flexible Router's gains in throughput and latency outweigh this impact.

Synthesis Results (2/2)

• Synthesis used the Cadence Encounter RTL Compiler tool and a 180 nm standard-cell library.
  – Power dissipation and area overhead were obtained for each case under typical operating conditions for 180 nm technology: 25 °C, 1.8 V, typical transistor model.
  – Both dynamic and leakage power estimates were extracted from the synthesized router implementation, assuming 50% uniform switching activity on all router input ports.

AREA AND POWER RESULTS FOR 180NM TECHNOLOGY

Configuration   Cell area    Leakage power (µW)   Switching power (µW)
Base            557963.68    1.015                51421.04
Flexible        661936.7     1.15                 56372.29
Overhead        18%          13%                  9.6%
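The overhead row is a relative increase of the Flexible Router over the Base Router. The same formula, (Flexible - Base) / Base × 100, applies to the FPGA tables, which give no percentages. The short Python script below applies it to the values copied from both synthesis tables. It adds no new measurements.

```python
from typing import Dict, Tuple


def overhead_pct(base: float, flexible: float) -> float:
    """Relative change of the Flexible Router versus the Base Router, in percent."""
    return (flexible - base) / base * 100.0


# (base, flexible) pairs copied from the synthesis result tables.
TABLES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "FPGA LUTs": {"BUF2": (657, 1078), "BUF4": (776, 1111), "BUF8": (836, 1112)},
    "FPGA FFs": {"BUF2": (425, 473), "BUF4": (430, 474), "BUF8": (440, 493)},
    "FPGA max freq": {"BUF2": (164, 141), "BUF4": (150, 139), "BUF8": (150, 141)},
    "180nm": {"cell area": (557963.68, 661936.7),
              "leakage": (1.015, 1.15),
              "switching": (51421.04, 56372.29)},
}

for table, rows in TABLES.items():
    for key, (base, flexible) in rows.items():
        print(f"{table} {key}: {overhead_pct(base, flexible):+.1f}%")
```

The results are +18.6% cell area, +13.3% leakage and +9.6% switching power, consistent with the rounded 18%, 13% and 9.6% in the table. On the FPGA, LUT usage grows by 64.1%, 43.2% and 33.0% for BUF2, BUF4 and BUF8. The maximum frequency drops by 14.0%, 7.3% and 6.0%.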

Analysis

• Experimental results show that the Flexible Router increases throughput and reduces latency.
• At low injection rates, the Base and Flexible Routers have nearly the same performance.
• At high injection rates, the Flexible Router performs better, which shows the benefit of the added flexibility.
• The Flexible Router has a higher saturation point than the Base Router.
• For UNI traffic, the Flexible Router sustains a 15% higher injection rate, in addition to better performance at higher rates.
• For HS and NN traffic, the improvement is small (a higher saturation point), especially for HS.
  – For HS, packets are injected faster than they can be collected, and HS packets occupy all of the network's buffer space.
  – For NN, each injector sends packets only to its neighbors. Throughput therefore grows linearly with the injection rate, because the routers serve every injected packet and no congestion arises to limit throughput.

Future Work

• Reduce the communication overhead introduced by the FFC.
• Support hot-spot traffic by modifying the FFC.
• Implement and evaluate the Flexible Router with:
  – Virtual channels.
  – Other switching techniques, such as Virtual Cut-Through and Wormhole.
• Explore the Flexible Router for 3-D Networks on Chip.
• Develop more real-world example implementations.
• Support dynamically reconfigurable systems.

Published Papers

[1] Hossam El-Sayed, Mohammed Ragab, Mohammed S. Sayed, and Victor Goulart, "Hardware Implementation and Evaluation of the Flexible Router Architecture for NoCs," 20th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), UAE, Dec. 2013. (Accepted as lecture.)
[2] Hossam El-Sayed, Ahmed Shalaby, Mostafa Said, Mohammed S. Sayed, Mohammed Ragab, and Victor Goulart, "Performance Evaluation of Flexible Router Architecture for NoCs," 24th International Conference on Field Programmable Logic and Applications, Munich, Germany, September 2-4, 2014. (Submitted.)

Acknowledgment

Finally, I would like to thank everyone who helped me, especially Maher Abdelrasoul, Ahmed Shalaby, and Mostafa Said.

Thank You
