presenter: min-yu lo 2015/10/19 asit k. mishra, n. vijaykrishnan, chita r. das computer architecture...

Presenter: Min-Yu Lo 111/06/27 Asit K. Mishra, N. Vijaykrishnan, Chita R. Das Computer Architecture (ISCA), 2011 38th Annual International Symposium on Citation: 6


Page 1: Presenter: Min-Yu Lo 2015/10/19 Asit K. Mishra, N. Vijaykrishnan, Chita R. Das Computer Architecture (ISCA), 2011 38th Annual International Symposium on

Presenter: Min-Yu Lo

112/04/20

Asit K. Mishra, N. Vijaykrishnan, Chita R. Das Computer Architecture (ISCA), 2011 38th Annual International Symposium on 

Citation: 6

Page 2:

Network-on-chip (NoC) has become a critical shared resource in the emerging Chip Multiprocessor (CMP) era. Most prior NoC designs have used the same type of router across the entire network.

While this homogeneous network design eases the burden on a network designer, partitioning the resources equally among all routers across the network does not lead to optimal resource usage, and hence, affects the performance-power envelope. In this work, we propose to apportion the resources in an NoC to leverage the non-uniformity in network resource demand. Our proposal includes partitioning the network resources, specifically buffers and links, in an optimal manner.

Page 3:

This approach redistributes resources so that routers that require more resources are allocated more buffers and wider links than routers demanding fewer resources. The result is a novel heterogeneous network, called HeteroNoC, which is composed of two types of routers: (1) small power-efficient routers and (2) big high-performance routers.

We evaluate a number of heterogeneous network configurations, composed of big and small routers, and show that giving more resources to routers along the diagonals in a mesh network provides maximum benefits in terms of performance and power. We also show the potential benefits of the HeteroNoC design by (1) co-evaluating it with memory controllers and (2) configuring it with an asymmetric CMP consisting of heterogeneous cores.

Page 4:

A mesh architecture is composed of routers and cores.

A router architecture is composed of a crossbar, buffers, and a switch allocator (SA).

Page 5:

In a traditional NoC mesh architecture, is it reasonable to use the same router design everywhere in the network?

The parts of the mesh network with higher utilization (buffer utilization, link utilization) need to be examined:
Central
Diagonal

Page 6:

What can we do? Redistribute the network resources (called HeteroNoC). How many big and small routers are needed? How should they be placed to get the maximum benefit (performance, power)?

How else can a HeteroNoC design be leveraged to achieve better performance/power in a CMP? Combine HeteroNoC with re-placement of the memory controllers.

Page 7:

HeteroNoC buffer redistribution: the number of VCs affects power consumption and performance.
Small router (2 VCs)
Big router (6 VCs)

HeteroNoC link redistribution: flit width 192b -> 128b (flit width significantly affects router power).
Wide link: 256b
Narrow link: 128b

HeteroNoC combined link and buffer redistribution: two 128b flits can be combined and sent simultaneously over the wider link.

Page 8:

The total number of VCs and the network bisection bandwidth are kept the same in the baseline and heterogeneous networks.

VCs (virtual channels):
。4 baseline routers (VCs = 4 x 3 = 12) = 3 small routers + 1 big router (VCs = 3 x 2 + 6 = 12)

Bandwidth (link resources):
。W(homo) x N = W(hetero) x N(narrow) + 2 x W(hetero) x N(wide)

Reduction in power consumption:
。0.67 x N^2 >= 0.3 x N(s) + 1.19 x (N^2 - N(s)), which for N = 8 requires N(s) >= 38
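The three constraints on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the mesh size N = 8 and the even split of 4 narrow and 4 wide bisection links are assumptions that the slide does not state, and the function names are hypothetical. The VC counts, link widths, and power coefficients are the numbers quoted in the slides.

```python
# Worked check of the resource-conservation constraints.
# ASSUMPTION: N = 8 (an 8x8 mesh); the slide does not state N.
N = 8
ROUTERS = N * N

# Virtual channels: 4 baseline routers vs. 3 small + 1 big router.
VC_BASE, VC_SMALL, VC_BIG = 3, 2, 6
baseline_vcs = 4 * VC_BASE
hetero_vcs = 3 * VC_SMALL + 1 * VC_BIG

# Bisection bandwidth: N links of 192b vs. a mix of 128b and 256b links.
W_HOMO, W_HETERO = 192, 128


def bisection_bits(n_narrow, n_wide):
    """Total bits per cycle across the bisection of the heterogeneous mesh."""
    return W_HETERO * n_narrow + 2 * W_HETERO * n_wide


# Power coefficients for baseline, small, and big routers (from the slide).
P_BASE, P_SMALL, P_BIG = 0.67, 0.30, 1.19


def min_small_routers(total):
    """Smallest number of small routers that keeps the heterogeneous
    network at or below the baseline power budget."""
    for n_small in range(total + 1):
        hetero = P_SMALL * n_small + P_BIG * (total - n_small)
        if hetero <= P_BASE * total:
            return n_small
    return None


if __name__ == "__main__":
    print("VCs:", baseline_vcs, "vs", hetero_vcs)
    # ASSUMPTION: 4 narrow and 4 wide links cross the bisection.
    print("bisection bits:", W_HOMO * N, "vs", bisection_bits(4, 4))
    print("minimum small routers:", min_small_routers(ROUTERS))
```

Under these assumptions both sides of the VC and bandwidth equations balance, and the power inequality is first satisfied with 38 small routers out of 64.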

Page 9:

Reposition the resources (three layout types)
Buffer-only redistribution (+B).
Combined buffer and link redistribution (+BL).

Three layouts for placing the big routers:
Center layout
Row 2_5 layout
Diagonal layout
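To make the three layouts concrete, here is a minimal sketch that marks the big-router positions on a mesh. The exact geometry is an assumption, since the slide only names the layouts: an 8x8 mesh with 0-based coordinates, "Center" read as the central 4x4 block, "Row 2_5" as rows 2 and 5, and "Diagonal" as both diagonals. All function names are hypothetical.

```python
# Sketch of three big-router placements on an n x n mesh.
# ASSUMPTIONS: 0-based (row, col) coordinates and the layout
# geometry described in the lead-in; none of this is on the slide.


def center(n):
    """Big routers in the central (n/2 x n/2) block."""
    lo, hi = n // 4, n - n // 4
    return {(r, c) for r in range(lo, hi) for c in range(lo, hi)}


def row_2_5(n):
    """Big routers along rows 2 and 5."""
    return {(r, c) for r in (2, 5) for c in range(n)}


def diagonal(n):
    """Big routers along both diagonals."""
    main = {(i, i) for i in range(n)}
    anti = {(i, n - 1 - i) for i in range(n)}
    return main | anti


def render(n, big):
    """Draw the mesh with 'B' for big routers and 's' for small ones."""
    return "\n".join(
        " ".join("B" if (r, c) in big else "s" for c in range(n))
        for r in range(n)
    )


if __name__ == "__main__":
    for name, layout in (("Center", center), ("Row 2_5", row_2_5), ("Diagonal", diagonal)):
        print(name)
        print(render(8, layout(8)))
        print()
```

With these readings, each layout places the same number of big routers (16 on an 8x8 mesh), so the layouts differ only in where the extra resources sit.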

Page 10:

In the HeteroNoC architecture with combined buffer and link re-distribution, we use both 128b and 256b links with flit size being 128b.

When communication takes place between a small and a big router (or two big routers) between which a 256b link exists, two 128b flits can be combined to be simultaneously sent over the wider link.
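The effect of combining flits can be sketched as a simple packing calculation. This is a simplified illustration, not the router's actual logic: it assumes any two queued flits may be paired and that each link transfer takes one cycle, and the function names are hypothetical.

```python
FLIT_BITS = 128  # flit size from the slide


def link_cycles(num_flits, link_bits):
    """Cycles needed to push num_flits flits across one link,
    ASSUMING one transfer per cycle and free pairing of flits."""
    flits_per_cycle = link_bits // FLIT_BITS  # 1 on a 128b link, 2 on a 256b link
    return -(-num_flits // flits_per_cycle)   # ceiling division


def pack(flits, link_bits):
    """Group flits into the bundles that cross the link in each cycle."""
    per_cycle = link_bits // FLIT_BITS
    return [flits[i:i + per_cycle] for i in range(0, len(flits), per_cycle)]


if __name__ == "__main__":
    flits = ["f0", "f1", "f2", "f3", "f4"]
    print("narrow link:", link_cycles(len(flits), 128), "cycles", pack(flits, 128))
    print("wide link:  ", link_cycles(len(flits), 256), "cycles", pack(flits, 256))
```

Under these assumptions, five flits need five cycles on a 128b link and three on a 256b link; the last cycle on the wide link carries a single flit.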


Figure panels: Baseline; Small router to small router; Two small routers to big router; Big router to big router; Big router to small router

Page 11:

Impact on the buffer read/write stage (left figure): the primary overhead comes from including the second layer of smaller muxes.

Impact on the SA stage (right figure): the area overhead of the additional arbiters is around 2.5% of the router area (obtained from Synopsys synthesis).

Page 12:

Experimental methods
Synthetic traffic
。Network-only analysis
Standard application benchmarks
。System-level analysis
。IPC (instructions per cycle) improvement

Performance/power improvement metrics: average latency, average power consumption, network throughput, latency breakdown, power breakdown

Page 13:

Evaluate three placement layouts:
Diagonal (best layout)
Center (middle layout)
Row 2_5 (poor layout)

Buffer-only redistribution
Combined buffer and link redistribution

Page 14:

Reasons for the latency reduction:
Big routers are placed where the network is congested.
Two flits are combined and transmitted simultaneously over the wider link.

Reasons for the power reduction:
Buffer reduction (33%).
Lower crossbar power (from the reduced flit size).
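One plausible way to arrive at the 33% buffer figure is to combine two facts from earlier slides: the total number of VCs is unchanged, and the flit width drops from 192b to 128b. The sketch below works through that reading; it is an interpretation rather than the authors' stated derivation, and the buffer depth `DEPTH` is a hypothetical value that cancels out of the ratio.

```python
def buffer_bits(total_vcs, flits_per_vc, flit_bits):
    """Total buffer storage in bits for a group of routers."""
    return total_vcs * flits_per_vc * flit_bits


TOTAL_VCS = 12  # 4 baseline routers x 3 VCs = 3 small x 2 VCs + 1 big x 6 VCs
DEPTH = 5       # HYPOTHETICAL flits per VC; it cancels out of the ratio

baseline = buffer_bits(TOTAL_VCS, DEPTH, 192)  # baseline flit width
hetero = buffer_bits(TOTAL_VCS, DEPTH, 128)    # HeteroNoC flit width
reduction = 1 - hetero / baseline

if __name__ == "__main__":
    print(f"buffer storage reduction: {reduction:.1%}")
```

With the VC count held constant, the reduction depends only on the flit widths: 1 - 128/192 is one third.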

Page 15:

Commercial Application PARSEC Application

Page 16:

Diagonal+BL has the best results, with 12% and 10% average improvements in IPC.

We not only use 33% fewer buffers and save network power, but also see an IPC improvement.

Page 17:

Co-evaluating HeteroNoC with memory controllers:
Reduces request-response latency on a cache miss.
Reduces the latency of requests to the memory controllers.

Page 18:

Conclusion
This paper proposes HeteroNoC, a heterogeneous network composed of big and small routers, designed by redistributing the buffer and link bandwidth.

The diagonal layout performs significantly better than the traditional homogeneous network under a variety of traffic patterns.

My comment
The paper proposes a good idea for improving NoCs, and it is very simple to implement.

Page 19:

Redesign the network router and link architecture
Small power-efficient routers.
Big high-performance routers.
Narrow links (width = 128 bits).
Wide links (width = 256 bits).

Reposition the resources (three layout types)
Buffer-only redistribution.
Combined buffer and link redistribution.

Page 20:

Network-only analysis
UR (uniform random) pattern.
NN (nearest neighbor) pattern.

Use another network topology: torus.

Standard application benchmarks

Page 21:

What is the problem?
High single-threaded performance when thread parallelism is low.
High throughput when thread parallelism is high.

Modify the routing algorithm
Use table-based routing (for packets to/from the large core).

Page 22:

(1) Should we design a heterogeneous NoC with two types (small and big) of routers?

(2) If yes, how many such big/small routers do we need, and how do we redistribute the buffer and link width between these routers without changing the original bisection width and buffer resources?

(3) Is there an optimal placement of big routers that would maximize the performance and power benefits compared to the baseline homogeneous mesh?

(4) How else can a HeteroNoC design be leveraged to achieve better performance/power in a CMP?

Page 23:

Network-only analysis (synthetic)
UR (uniform random) traffic pattern
NN (nearest neighbor) traffic pattern (worse results)
Bit-complement traffic pattern
Self-similar traffic pattern

Use another network topology: torus.

Standard application benchmarks
Network analysis
IPC (instructions per cycle)

Page 24: