Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Ran Manevich, Isask’har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny
Technion – Israel Institute of Technology
May 2009

Page 1: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Best of Both Worlds: A Bus-Enhanced Network on-Chip

(BENoC)

Ran Manevich, Isask’har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny

Technion – Israel Institute of Technology

May, 2009

Page 2: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Network on-Chip: the Good News

Interconnect for SoCs, CMPs and FPGAs
  Multi-hop, packet-based communication
  Efficient resource sharing
Scalable performance and efficiency in
  Power
  Area
  Design productivity

[Figure label: System Bus]

Page 3: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Network on-Chip: the Bad News

Increased and hard-to-predict latency due to multi-hop and sharing
  Time-critical signals
Broadcast? Multicast?
  No easy solutions
  Slow (10s of cycles)

I wish I had a bus at hand...

Page 4: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Solution: Bus-Enhanced NoC (BENoC)

Bus re-introduced as a NoC “add-on”
Use NoC for data
  Optimized for high bandwidth
Use bus for short meta-data
  Low bandwidth, low latency
  Broadcast, multicast
Overhead should be justified!

[Figure: grid of routers (R) and modules]

Page 5: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Related Work

In-band support of time-critical communication and in-band multicast/broadcast
  Complex router implementation
  Suffer from multi-hop latency
Existing bus-NoC hybrids
  Form a topological hierarchy
  Bus typically used for local communication

[Figure: diagrams of modules and routers (R)]

Page 6: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

BENoC Services

Fast unicast and multicast signaling
  CMP cache example
Anycast
  Find resources that fulfill certain conditions
  E.g., “Looking for an idling DSP” or “Where are the 5 closest multipliers?”
Convergecast
  Efficient collection of feedback back to the initiator
  Barrier synchronization, …
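The anycast and convergecast services listed on this slide can be illustrated with a small behavioral sketch. It is only a model of the idea, not the authors' hardware: the `Module` record, its `kind` and `busy` fields, and the use of a sum as the convergecast combiner are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """Hypothetical on-chip module attached to the bus (illustrative fields)."""
    module_id: int
    kind: str   # e.g. "DSP" or "multiplier"
    busy: bool


def anycast(modules: List[Module], predicate: Callable[[Module], bool]) -> List[int]:
    """Broadcast a query; every module that satisfies it answers with its id."""
    return [m.module_id for m in modules if predicate(m)]


def convergecast(modules: List[Module], report: Callable[[Module], int]) -> int:
    """Collect one value per module and combine them (assumed here: a sum)."""
    return sum(report(m) for m in modules)


modules = [
    Module(0, "DSP", busy=True),
    Module(1, "DSP", busy=False),
    Module(2, "multiplier", busy=False),
]

# "Looking for an idling DSP"
idle_dsps = anycast(modules, lambda m: m.kind == "DSP" and not m.busy)

# Feedback collected back at the initiator: how many modules are busy?
busy_count = convergecast(modules, lambda m: int(m.busy))
```

In this model a single bus transaction reaches every module, which is what makes the query cheap compared with sending one NoC packet per candidate.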

Page 7: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Additional BENoC Applications

NoC control
  Router configuration
    E.g., routing table configuration
  Adapt NoC routing for load balancing
  Fault discovery and recovery
System control
  Power management
  Resource load balancing
Debug

Page 8: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Outline

Introduction
MetaBus architecture
MetaBus latency and energy analysis
CMP cache use case

Page 9: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Conventional System Buses

Figure is copied from “Amba Specifications Rev 2.0” - http://www.arm.com/products/solutions/AMBA_Spec.html

Bandwidth optimized
Poor scalability
Not suitable for tasks in BENoC

Page 10: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

MetaBus Design Requirements

Low area, low power
Low bandwidth
Low latency
Simple
Versatile
Scalable
Multicast and broadcast support
Acknowledgement

[Figure: routers (R) and modules connected by the “MetaBus”]

Page 11: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

MetaBus Architecture

Many possible implementations
Example: tree topology with distributed arbitration

[Figure: tree connecting Module #1 to Module #9 through Bus Stations and a Root]

Page 12: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Data Path

[Figure: tree connecting Module #1 to Module #9 through Bus Stations and a Root. Legend: data to root; data to receivers]

Page 13: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Example: Broadcast of Two Words

[Figure: tree connecting Module #1 to Module #9 through Bus Stations and a Root. Captions: the address word propagates to the root; data word 1 and data word 2 propagate to the modules]
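The up-then-down data path in this example can be sketched in a few lines. This is a behavioral illustration under stated assumptions, not the MetaBus design itself: the small tree in `PARENT`, the node names, and the choice to exclude the sender from the receivers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tree (child -> parent). Leaves are modules, inner nodes are bus stations.
PARENT: Dict[str, Optional[str]] = {
    "Root": None,
    "BS1": "Root",
    "BS2": "Root",
    "M1": "BS1",
    "M2": "BS1",
    "M3": "BS2",
    "M4": "BS2",
}


def path_to_root(node: str) -> List[str]:
    """Nodes visited while a word travels from the sender up to the root."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def modules() -> List[str]:
    """Leaves of the tree, i.e. nodes that are nobody's parent."""
    parents = set(PARENT.values())
    return sorted(n for n in PARENT if n not in parents)


def broadcast(source: str, words: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Send words up to the root, then down to every other module (assumption)."""
    up_path = path_to_root(source)
    delivered = {m: list(words) for m in modules() if m != source}
    return up_path, delivered


up_path, delivered = broadcast("M3", ["data word 1", "data word 2"])
```

Every word takes the same route: up the sender's branch to the root, then down all branches at once, so the number of receivers does not change the number of bus transactions.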

Page 14: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Distributed Arbitration Mechanism

[Figure: Module #1 to Module #3 connected through Bus Stations and a Root. Legend: bus request; bus grant]
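One way to picture distributed arbitration on a tree is that requests travel upward and a single grant travels back down, with each station choosing among its requesting children. The sketch below assumes a fixed-priority choice (first listed child wins) and a hypothetical three-module tree; the slides do not state the actual arbitration policy.

```python
from typing import Dict, List, Optional, Set

# Hypothetical tree (station -> children). Names are illustrative only.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["BS1", "BS2"],
    "BS1": ["M1", "M2"],
    "BS2": ["M3"],
}


def requests_below(node: str, requesting: Set[str]) -> bool:
    """True if this node, or any module beneath it, raises a bus request."""
    if node not in CHILDREN:          # a leaf, i.e. a module
        return node in requesting
    return any(requests_below(c, requesting) for c in CHILDREN[node])


def arbitrate(requesting: Set[str]) -> Optional[str]:
    """Pass the grant down from the root; each station picks its first
    requesting child (fixed priority is an assumption of this sketch)."""
    node = "Root"
    if not requests_below(node, requesting):
        return None
    while node in CHILDREN:
        node = next(c for c in CHILDREN[node] if requests_below(c, requesting))
    return node


winner_two_requests = arbitrate({"M2", "M3"})
winner_one_request = arbitrate({"M3"})
winner_no_request = arbitrate(set())
```

Each station only needs to know whether anything below it is requesting, so no central arbiter has to see every module's request line.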

Page 15: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Masking Saves Power

Unicast from Module #3 to Module #5

[Figure: tree connecting Module #1 to Module #9 through Bus Stations 1 to 5, with one mask per station (Mask1 to Mask5). Captions: the address word propagates to the root; data word 1 propagates to the modules. Mask bits shown: 1 0 1 0 1]
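The masking idea can be sketched as computing, for a unicast destination, which stations lie on the path from the root to that destination and masking all the others. The tree below is hypothetical and does not reproduce the topology or the mask values of the slide; the convention that 1 means "forward downward" and 0 means "masked" is also an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical tree (child -> parent) with five bus stations; BS1 acts as the root.
PARENT: Dict[str, Optional[str]] = {
    "BS1": None,
    "BS2": "BS1",
    "BS3": "BS1",
    "BS4": "BS2",
    "BS5": "BS2",
    "M1": "BS4",
    "M2": "BS4",
    "M3": "BS5",
    "M4": "BS5",
    "M5": "BS3",
    "M6": "BS3",
}
STATIONS: List[str] = ["BS1", "BS2", "BS3", "BS4", "BS5"]


def stations_on_path(dest: str) -> List[str]:
    """Stations between the destination module and the root, inclusive."""
    path = []
    node = PARENT[dest]
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mask_bits(dest: str) -> Dict[str, int]:
    """1 = station forwards the downward traffic, 0 = station is masked."""
    active = set(stations_on_path(dest))
    return {s: int(s in active) for s in STATIONS}


masks = mask_bits("M5")
```

Stations whose mask is 0 do not drive their downward wires, which is where the power saving on a unicast would come from in this model.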

Page 16: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

(Binary) Bus Station

Page 17: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

MetaBus Floorplan – An Example

64-module balanced binary MetaBus

Page 18: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Outline

Introduction
MetaBus architecture
MetaBus latency and energy analysis
CMP cache use case

Page 19: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

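Analysis Highlights 1/4

NoC broadcast and unicast energy per transaction:

[Equations garbled in extraction; the operators could not be recovered. Symbols that appear: E_{NoC broadcast} with V^2, N_{flits}, K, C_{NL}, C_{ND}; E_{NoC unicast} with V^2, N_{flits}, n, L_W, C_{NL}, C_{ND}]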

Page 20: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Analysis Highlights 2/4

MetaBus broadcast and unicast energy per transaction:

[Equations garbled in extraction; the operators could not be recovered. Symbols that appear: E_{MetaBus broadcast} and E_{MetaBus unicast} with V^2, N_{flits}, n, D, B_D, B_R, C_{BL}, C_{BD,up}, C_{BD,down}]

Page 21: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

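Analysis Highlights 3/4

NoC unicast and broadcast latency:

[Equations garbled in extraction; the operators could not be recovered. Symbols that appear: T_{NoC unicast} with n, N_{CiR}, T_{Nclk}, N_{flits}; T_{NoC broadcast} with n, T_{Nclk}, N_{flits}]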

Page 22: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Analysis Highlights 4/4

MetaBus unicast and broadcast latency:

[Equation garbled in extraction; the operators could not be recovered. Symbols that appear: T_{MetaBus} with N_{flits}, B_D, B_R, R and C terms with subscripts BL, "BD,up" and "BD,down", and the constants 1.5, 0.7 and 0.4]

Page 23: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Results - Energy Consumption

Energy consumption for broadcast and unicast transactions of 3 data words

[Figure: bus and NoC unicast and broadcast energy per transaction. X axis: number of modules (0 to 40). Y axis: energy per transaction [nJ] (0 to 3.5). Series: MetaBus Broadcast, Network Broadcast, MetaBus Unicast, Network Unicast]

10x10 mm chip
64 modules mesh
1 GHz NoC clock
Speed optimized bus
@ 0.18 um

Page 24: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Results - Latencies

Latencies of broadcast and unicast transactions of 3 data words in a system with a frequency and a speed optimized MetaBus.

[Figure 9: Bus and NoC broadcast latencies. X axis: number of modules (0 to 40). Y axis: latency [ns] (0 to 120). Series: MetaBus, Network Broadcast, Network Unicast]

10x10 mm chip
64 modules mesh
1 GHz NoC clock
Speed optimized bus
@ 0.18 um

Page 25: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Outline

Introduction
MetaBus architecture
MetaBus latency and energy analysis
CMP cache use case

Page 26: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Dynamic Non-Uniform Cache Access

Split a large cache into independent smaller banks
  Non-uniform cache access time (NUCA)
Cache lines are moved to shorten access time
  Dynamic NUCA
Before fetching a line into its L1$, a CPU needs to find the L2 cache bank storing the line

[Figure: CMP (Chip Multi Processor) with CPUs, each with an L1$, and an array of L2$ banks]
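The search problem in dynamic NUCA can be illustrated with a toy model: because lines migrate between banks, the CPU cannot compute the bank from the address and has to ask. The sketch below models a broadcast-style search in which every bank checks the address and the holder answers. The bank contents, addresses, and the `find_bank` and `migrate` helpers are hypothetical and stand in for whatever protocol the real system uses.

```python
from typing import Dict, Optional, Set


def find_bank(banks: Dict[int, Set[int]], line_addr: int) -> Optional[int]:
    """Model a broadcast search: all banks check the address, the holder replies."""
    for bank_id, lines in banks.items():
        if line_addr in lines:
            return bank_id
    return None  # no bank holds the line (an L2 miss in this toy model)


def migrate(banks: Dict[int, Set[int]], line_addr: int, target_bank: int) -> None:
    """Move a line to another bank, as dynamic NUCA does to shorten access time."""
    source = find_bank(banks, line_addr)
    if source is None or source == target_bank:
        return
    banks[source].remove(line_addr)
    banks[target_bank].add(line_addr)


# Three hypothetical L2 banks holding a few line addresses.
banks: Dict[int, Set[int]] = {0: {0x100}, 1: {0x200}, 2: set()}

hit = find_bank(banks, 0x200)      # line currently lives in bank 1
migrate(banks, 0x200, 2)           # line moves closer to its user
moved = find_bank(banks, 0x200)    # the same search now finds bank 2
miss = find_bank(banks, 0x300)     # nobody answers
```

The point of the example is that the answer to the same query changes over time, which is why a low-latency broadcast is attractive for the lookup.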

Page 27: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Simulation Setup

16 processors, 64 L2 cache banks
PARSEC and SPLASH-2 benchmarks
Vanilla wormhole NoC
Simulation accounts for bus latency, arbitration time, etc.

Page 28: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)


Simulation Results

Performance improvement in BENoC compared to a NoC-based CMP

(a) average read transaction latency; (b) application speed

Page 29: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Summary

Current NoCs are largely distributed
  Borrowing concepts from off-chip networks
On-chip environment provides an opportunity
  Enhancing the network with a bus gives the best of both worlds
Advanced services are easily supported
  Anycast, management and control
Cost effective
  Power and performance
  Analysis and simulation

Page 30: Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Thank you!

Questions?

[Figure: Bus-Enhanced NoC with modules]

QNoC Research Group