Handling Global Traffic in Future CMP NoCs

Ran Manevich, Israel Cidon, and Avinoam Kolodny.

QNoC Research Group

Electrical Engineering Department, Technion – Israel Institute of Technology

Haifa, Israel

SLIP 2012

Bandwidth Version of Rent’s Rule

B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0<R<1.

$B = k\,G^{R}$

[Figure: example cluster with G = 16 modules; B is shown as a sum (∑) for that cluster.]

Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
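The rule above can be evaluated directly. The following is a minimal sketch, not code from the presentation: the function name `cluster_external_bandwidth` and the normalisation k = 1 are assumptions chosen for illustration.

```python
def cluster_external_bandwidth(k: float, G: int, R: float) -> float:
    """Bandwidth version of Rent's rule: B = k * G**R."""
    if not 0.0 < R < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    if G < 1:
        raise ValueError("a cluster needs at least one module")
    return k * G ** R


# Example: a cluster of G = 16 modules with R = 0.7.
# k = 1.0 is an assumed normalisation, not a value from the slides.
b16 = cluster_external_bandwidth(k=1.0, G=16, R=0.7)

# External bandwidth per module is B / G = k * G**(R - 1),
# which shrinks as the cluster grows because R < 1.
per_module = b16 / 16
```

Because R < 1, the external bandwidth grows more slowly than the number of modules, so a smaller R corresponds to more local traffic.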

Rent’s Exponent Reflects Traffic Locality

CMP NoC Traffic Follows Rent’s Rule

2D Mesh NoC

~Average of CMP parallel programs*

* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008

2D Mesh – Packets Classification by Distance

For illustration purposes, packets are classified according to the distance between source and destination:

Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2+K/8
Global – Dist ≥ 2+K/8

[Figure: example classifications for K = 8 and K = 16.]
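The three classes can be expressed as a small function. This is a sketch under one assumption: that Dist is the hop (Manhattan) distance between source and destination in the mesh. The names `hop_distance` and `classify_packet` are hypothetical.

```python
def hop_distance(src, dst):
    """Manhattan (hop) distance between two (x, y) mesh coordinates.

    Assumption: the slides' "Dist" is this hop distance.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def classify_packet(dist, K):
    """Return 'NN', 'Local' or 'Global' using the thresholds on the slide."""
    if dist < 1:
        raise ValueError("source and destination must differ")
    global_threshold = 2 + K / 8  # 3 for K = 8, 4 for K = 16
    if dist == 1:
        return "NN"
    if dist < global_threshold:
        return "Local"
    return "Global"


# Classes for distances 1..5 in the two mesh sizes shown on the slide.
classes_k8 = [classify_packet(d, 8) for d in range(1, 6)]
classes_k16 = [classify_packet(d, 16) for d in range(1, 6)]
```

Note that the Global threshold grows with K, so a packet travelling 3 hops is Global for K = 8 but Local for K = 16.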

Fraction of global packets decreases in large systems

Rent’s exponent (R) = 0.7


Dominance of Global Packets in BW/Router and Light Load Latency

Nearest Neighbor traffic is dominant in small systems.*

* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010

In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.

Problem!!!

In large systems, global packets (a minority):

Consume most of the network’s BW.
Significantly increase average light load latency.

Solution - PyraMesh

Hierarchical 2D mesh.
Global packets are routed through higher hierarchy levels.

Overall hop count is reduced.
Average latency is reduced.
Average BW per router is reduced.

[Figure: example route from Source to Dest. taking 8 hops instead of 14.]

PyraMesh - Architecture

K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
αi – Ratio between the sizes of levels i and i+1.
Ci – Number of routers in level i that are connected to a router in level i+1 along a single dimension.

[Figure: three example configurations.]
K = 8, NL = 2, NP = 1, αi = 4, Ci = 2
K = 8, NL = 3, NP = 1, αi = 2, Ci = 1
K = 8, NL = 2, NP = 4, αi = 4, Ci = 1
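To make the role of αi concrete, the sketch below derives the size of every level from K and the ratios. It rests on assumptions the slides do not spell out: "size" is taken to be the side length of a square level, each level is exactly αi times smaller than the one below it, and NP and Ci are ignored. The name `level_sizes` is hypothetical.

```python
def level_sizes(K, alphas):
    """Side length of each level, base mesh first.

    Assumptions (not stated on the slides): "size" means side length,
    and level i+1 has side length size_i / alpha_i with no remainder.
    NL equals len(alphas) + 1.
    """
    sizes = [K]
    for alpha in alphas:
        if alpha < 1 or sizes[-1] % alpha != 0:
            raise ValueError("each ratio must evenly divide the level below")
        sizes.append(sizes[-1] // alpha)
    return sizes


# The single-pyramid configurations listed above (NP = 1):
two_levels = level_sizes(8, [4])       # K = 8, NL = 2, alpha = 4
three_levels = level_sizes(8, [2, 2])  # K = 8, NL = 3, alpha = 2
```

Under these assumptions the NL = 2 example has an 8-wide base mesh topped by a 2-wide level, and the NL = 3 example has levels of width 8, 4 and 2.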

PyraMesh – Addressing and Routing

Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter.

[Equation: the level-i address of base-mesh node (X,Y); the formula did not survive extraction.]

Routing – XY.
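XY routing is the standard dimension-ordered scheme for 2D meshes: a packet first travels along the X dimension until its column matches the destination, then along Y. The sketch below shows it on a single flat mesh only; how PyraMesh moves packets between levels is not covered, and the name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst.

    Dimension-ordered (XY) routing on one flat 2D mesh: correct X first,
    then Y. Inter-level hops of a hierarchical mesh are not modelled.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


short_path = xy_route((0, 0), (2, 1))
# Corner to corner in an 8x8 mesh.
corner_hops = len(xy_route((0, 0), (7, 7))) - 1
```

The hop count of an XY route always equals the Manhattan distance between source and destination, which is why long-distance packets are expensive on a flat mesh.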

PyraMesh – Packets Classification

Packets are distributed among levels i according to their travel distance (D) in the base mesh.

DThi – Distance threshold of level i. If D > DThi, the packet is directed to level i+1.

Example: DThi = 6, 12, 20

Highest Level    Travel Distance
4                D > 20
3                12 < D ≤ 20
2                6 < D ≤ 12
1 (Base Mesh)    D ≤ 6
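The threshold rule amounts to counting how many thresholds the travel distance exceeds. A minimal sketch follows, using the example thresholds 6, 12, 20 from the slide; the function name `highest_level` is hypothetical.

```python
from bisect import bisect_left


def highest_level(D, thresholds):
    """Highest level used by a packet with travel distance D.

    thresholds[i - 1] is DTh_i; a packet moves from level i to level i+1
    whenever D > DTh_i, so the result is 1 + (number of thresholds < D).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return 1 + bisect_left(thresholds, D)


example_thresholds = [6, 12, 20]
levels = {D: highest_level(D, example_thresholds) for D in (6, 7, 12, 13, 20, 21)}
```

`bisect_left` is used rather than `bisect_right` so that a distance exactly equal to a threshold stays on the lower level, matching the "≤" in the table.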

PyraMesh – Optimization

Area overhead,
Wiring overhead,
Maximum bandwidth per router*,
Average light-load latency*
= F(K, NL, NP, αi, Ci, DThi*, R*)

[Figure labels: CONSTRAINTS, OPTIMIZATION OBJECTIVES]

Optimization Results Example of 16x16 System, R = 0.7

Throughput optimized PyraMesh:

Light load latency optimized PyraMesh:

Packets distance thresholds (figure labels, in slide order):
D≤5, 5<D≤8, D>8
D≤6, 6<D≤18, D>18

Light Load Latency Performance

BMesh – The baseline mesh.

Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.

HNoC –

Throughput Results, R = 0.7

Our Contributions

The observation that global packets limit scalability of large systems.

PyraMesh – A novel framework for hierarchical NoC design.

Characterization of Rentian traffic in large NoCs.

Conclusions

Global packets limit performance in large (future) CMP systems.

PyraMesh – A novel class of hierarchical 2D mesh topologies.

PyraMesh handles global traffic in future CMP NoCs.

Thank You!

Related Work

CMesh
J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006.

GigaNoC
C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD 2007.

Long Range Links
U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst., 2006.

Hierarchical Rings on a Mesh
S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.

Hierarchical 2-Levels 2D Mesh
M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.
