Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections
M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
Computer Engineering Department, Sharif University of Technology, Tehran, Iran
[email protected]




Outline

- Introduction and Motivations
- Virtual Point-to-Point (VIP) Connections
- Static VIP Construction Scheme
- Dynamic VIP Construction Scheme
- Setup Network
- Evaluation Results
- Conclusions and Future Work


On-Chip Communication Mechanisms

Packet-Switched NoCs
- Good Resource Utilization
- Modest Design Effort/Time Due to Structured and Predictable Links
- Some Power and Performance Overheads Due to Multi-Stage Pipelined Routers

Dedicated Point-to-Point Links
- Ideal Power and Performance
- Poor Scalability: Significant Area Overhead for Large Systems
- Significant Design Effort/Time Due to Non-Predictable Link Properties

Virtual Point-to-Point Connections in a Packet-Switched NoC


VIP Connections

VIP: VIrtual Point-to-Point Connections
- Over One VC (Virtual Channel) of Each Physical Channel
- Bypass Some Router Pipeline Stages

Inexpensive Extensions to a Traditional Wormhole Router
- Router Control Unit, Arbiter, Buffer of the VIP Virtual Channels


Router Architecture

Buffer at the VIP Virtual Channels Is Replaced by a Register (1-Flit Buffer)

VIP Paths Are Kept by VIP Allocator Units at the Output Ports
- Determines Which Input Is Connected to This Port Along the VIP
- Allocates the Output Port to the VIP When Control Signals Indicate That the VIP Has an Incoming Flit to Forward

A Flow-Control Mechanism Prevents Starvation of Packet-Switched Flits
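A minimal behavioral sketch of one output port's VIP allocator, in Python. The class and method names (`VipOutputAllocator`, `configure`, `grant`) are hypothetical, and the starvation guard (`max_vip_streak`, a cap on consecutive VIP grants) is an assumption: the slides only say that a flow-control mechanism prevents starvation, not how it works.

```python
from typing import Optional


class VipOutputAllocator:
    """Hypothetical model of a VIP allocator at one router output port.

    vip_input names the input port bound to this output along a VIP (None if
    no VIP passes through). max_vip_streak is an ASSUMED starvation guard;
    the slides only state that some flow-control mechanism exists.
    """

    def __init__(self, max_vip_streak: int = 4) -> None:
        self.vip_input: Optional[str] = None
        self.max_vip_streak = max_vip_streak
        self._streak = 0  # consecutive cycles in which the VIP won this port

    def configure(self, input_port: Optional[str]) -> None:
        """Bind this output to a VIP input port, or unbind it with None."""
        self.vip_input = input_port
        self._streak = 0

    def grant(self, vip_flit_ready: bool, ps_winner: Optional[str]) -> Optional[str]:
        """Return the input port that drives this output in the current cycle.

        vip_flit_ready: control signal saying the VIP register at the bound input holds a flit.
        ps_winner: input picked by the ordinary packet-switched switch allocator.
        """
        vip_active = self.vip_input is not None and vip_flit_ready
        ps_starving = ps_winner is not None and self._streak >= self.max_vip_streak
        if vip_active and not ps_starving:
            self._streak += 1
            return self.vip_input  # VIP flit goes straight through, no allocation stage
        self._streak = 0
        return ps_winner  # a pending VIP flit (if any) waits in its 1-flit register
```

With `max_vip_streak=2`, a VIP carrying a continuous flit stream wins the port for two cycles, after which a waiting packet-switched flit is granted one cycle.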


VIP Connections

A VIP Is Constructed by Chaining the VIP Registers in the Routers Between the Source And Destination Nodes of a Communication Flow

Provides a Virtual Dedicated Pipelined Link With 1-Flit VIP Buffers as Staging Registers
- Flits Only Travel Over the Crossbars and Links That Cover the Actual Physical Distance Between Their Source and Destination Nodes
- Flits Skip the Buffer Read, Buffer Write, and Allocation Operations
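As a rough illustration of why skipping buffer read/write and allocation matters, the zero-load model below compares per-hop costs in a conventional router and on a VIP. The per-hop cycle counts are assumptions (a 1-cycle link, a 5-stage router for the baseline, crossbar-only traversal on a VIP), not figures from the slides, and contention is ignored.

```python
def zero_load_latency(hops: int, flits: int, router_cycles: int, link_cycles: int = 1) -> int:
    """Contention-free latency of one packet.

    The head flit pays (router_cycles + link_cycles) at each of `hops` routers;
    the remaining flits follow one per cycle (serialization).
    """
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be positive")
    return hops * (router_cycles + link_cycles) + (flits - 1)


# ASSUMED per-hop costs, for illustration only:
CONV_ROUTER_CYCLES = 5  # 5-stage pipelined wormhole router (the evaluation baseline)
VIP_ROUTER_CYCLES = 1   # VIP flit: crossbar traversal only, no buffer read/write or allocation

conv = zero_load_latency(hops=4, flits=8, router_cycles=CONV_ROUTER_CYCLES)
vip = zero_load_latency(hops=4, flits=8, router_cycles=VIP_ROUTER_CYCLES)
print(conv, vip)  # 31 15
```

For a 4-hop route and 8-flit packets this simple model gives 31 versus 15 cycles; actual savings depend on pipeline depth, contention, and how much of the traffic the VIPs carry.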


VIP Connections

VIPs Are Not Allowed to Share a Common Link, in Order to Remove Buffering, Arbitration, …

A Limited Number of VIPs in a Network

But VIPs Cover a Significant Portion of On-Chip Traffic Due to Communication Locality
- In Most Multi-Core SoC Applications, Each Core Communicates With Only a Few Other Cores
- In CMP Workloads, Each Node Tends to Have a Small Number of Favored Destinations for Its Messages
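The no-sharing rule can be checked directly on a set of candidate VIP routes. In this sketch, routes are lists of (x, y) mesh coordinates and each direction of a bidirectional link is treated as a separate resource; both representation choices, and the function names, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]  # directed link (from, to)


def path_links(path: List[Node]) -> List[Link]:
    """Directed mesh links traversed by a node path; consecutive nodes must be adjacent."""
    links = []
    for a, b in zip(path, path[1:]):
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            raise ValueError(f"{a} and {b} are not mesh neighbours")
        links.append((a, b))
    return links


def first_shared_link(vips: Dict[str, List[Node]]) -> Optional[Tuple[str, str, Link]]:
    """Return (earlier VIP, later VIP, link) for the first link claimed twice, else None."""
    owner: Dict[Link, str] = {}
    for name, path in vips.items():
        for link in path_links(path):
            if link in owner:
                return owner[link], name, link
            owner[link] = name
    return None
```

Because links are directed here, two VIPs running in opposite directions over the same physical link are allowed; a stricter model would key `owner` on the unordered node pair instead.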


VIP Construction Algorithm - Static

Based on Application Traffic Pattern

Input Applications Are Described by a Task-Graph (TG)

A Heuristic Algorithm
- Map the TG Cores onto the Nodes of a Mesh-Based NoC
- Construct VIPs for the TG Edges in Order of Their Communication Volumes
- Find a Path Through the Packet-Switched Network for a TG Edge If There Are Not Sufficient Free Resources to Build a VIP for It
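The edge-ordering and fallback steps can be sketched as a greedy loop. This is not the authors' heuristic in detail: the core-to-node mapping is taken as given, edges are processed heaviest first, and only the two dimension-order minimal routes (XY, then YX) are tried per edge. All of these, and the function names, are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def dimension_order_path(src: Node, dst: Node, x_first: bool) -> List[Node]:
    """Minimal mesh path: XY routing if x_first, otherwise YX."""
    path = [src]
    x, y = src
    for dim in (("x", "y") if x_first else ("y", "x")):
        if dim == "x":
            while x != dst[0]:
                x += 1 if dst[0] > x else -1
                path.append((x, y))
        else:
            while y != dst[1]:
                y += 1 if dst[1] > y else -1
                path.append((x, y))
    return path


def build_static_vips(
    edges: List[Tuple[str, str, float]], placement: Dict[str, Node]
) -> Tuple[Dict[Tuple[str, str], List[Node]], List[Tuple[str, str]]]:
    """Greedy VIP construction, heaviest TG edge first (ASSUMED ordering).

    edges: (src_core, dst_core, volume); placement: core -> mesh node (assumed given).
    Returns (vips, packet_switched).
    """
    used: Set[Link] = set()
    vips: Dict[Tuple[str, str], List[Node]] = {}
    packet_switched: List[Tuple[str, str]] = []
    for src, dst, _vol in sorted(edges, key=lambda e: e[2], reverse=True):
        for x_first in (True, False):
            path = dimension_order_path(placement[src], placement[dst], x_first)
            links = list(zip(path, path[1:]))
            if not used.intersection(links):  # VIPs may not share a link
                used.update(links)
                vips[(src, dst)] = path
                break
        else:
            # Not enough free links: this edge uses the packet-switched network.
            packet_switched.append((src, dst))
    return vips, packet_switched
```

For example, with cores a, c, b, d placed at (0,0), (1,0), (2,0), (2,1) and edges a→b (100), c→b (50), c→d (30), edge a→b gets the XY route, c→b finds its only link taken and falls back to packet switching, and c→d takes its YX route through (1,1).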


VIPs for the VOPD Application

VIPs Cover 100% of the On-Chip Traffic for This Application

Static VIP Construction Scheme:
- Benchmarks: VOPD, MWD, MPEG, MP3+H263
- Up to 58% Reduction in Message Latency (39% on Average)
- Up to 65% Reduction in Power Consumption (49% on Average)

[Figure: VOPD task graph (cores V1 to V16, edges annotated with their communication volumes) and the VIPs constructed for it on the mesh]


VIPs vs. Physical Point-to-Point Connections

VIPs Offer:
- Power and Performance Close to Dedicated Physical Point-to-Point Connections
- More Flexibility: Dynamically Reconfigurable Based on the Traffic Pattern of the Running Application
- Less Design Effort: Customized Dedicated Connections Over Regular Components


Dynamic VIP Construction

An Alternative VIP Construction Scheme

Dynamically Changes the VIP Connections in Response to the Communication Requirements Imposed by the Running Application
- Monitoring the NoC Traffic
- Detecting High-Volume Communications and Constructing a VIP for Them

Selects the Best Route for a VIP Using a Simple Setup Network


Setup Network

Setup Network Structure
- A Light-Weight Control Network
- Simple Node Structure and Small Bit-Width
- The Same Topology as the Main Data Network

Setup Network Operation
- Keeps Track of the Number and Destination of Packets Sent by Each Node
- Selects Traffic Flows Weighing More Than a Threshold (bit/s)
- Finds a Path Along One of the Shortest Paths Between the Source and Destination Nodes of the Traffic Flow to Construct a VIP
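A sketch of the per-node bookkeeping described above: count the bits a node sends to each destination over a monitoring window, convert the count to bit/s, and report the flows above the threshold. The window length, clock frequency, and class name are hypothetical parameters; the slides specify only a threshold in bit/s.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FlowMonitor:
    """Hypothetical per-node traffic monitor for dynamic VIP construction."""

    def __init__(self, node: int, threshold_bps: float, window_cycles: int, clock_hz: float) -> None:
        self.node = node
        self.threshold_bps = threshold_bps
        self.window_s = window_cycles / clock_hz  # monitoring window length in seconds
        self.bits: Dict[int, int] = defaultdict(int)  # destination -> bits sent this window

    def record(self, dst: int, packet_bits: int) -> None:
        """Account for one packet sent from this node to dst."""
        self.bits[dst] += packet_bits

    def end_window(self) -> List[Tuple[int, float]]:
        """Return (destination, weight in bit/s) for flows above the threshold, heaviest first."""
        heavy = [(dst, b / self.window_s) for dst, b in self.bits.items()
                 if b / self.window_s > self.threshold_bps]
        self.bits.clear()  # start a fresh window
        return sorted(heavy, key=lambda f: f[1], reverse=True)
```

Each flow returned by `end_window` is a candidate for a VIP setup request, with its bit/s value serving as the flow's weight.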


Dynamic VIP Construction

Establishing a New VIP May Tear Down Some Existing VIPs
- Cost of a VIP: The Cumulative Weight (bit/s) of the Existing VIPs That Will Be Torn Down by This New VIP

Setup Network:
- Finds the Path With Minimum Cost
- Sends the Cost to the Source Node, Which Decides Whether to Establish the New VIP

A New VIP Is Established If the Cumulative Weight of the Torn-Down VIPs Is Less Than the Weight of the Requesting Traffic Flow
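The cost definition and the establish-or-not decision can be written as two small functions. Here each conflicting VIP is counted once, even if the new path crosses several of its links; the link/owner representation and the function names are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # (router, next router), hypothetical naming


def teardown_cost(
    new_path_links: Iterable[Link], link_owner: Dict[Link, str], vip_weight: Dict[str, float]
) -> Tuple[float, Set[str]]:
    """Cumulative weight (bit/s) of existing VIPs the new path would tear down."""
    victims = {link_owner[l] for l in new_path_links if l in link_owner}
    return sum(vip_weight[v] for v in victims), victims


def should_establish(flow_weight: float, cost: float) -> bool:
    """Build the new VIP only if the requesting flow outweighs what it displaces."""
    return cost < flow_weight
```

A 400 Mbit/s flow whose path overlaps only a 300 Mbit/s VIP is granted its VIP; if the path also overlaps a 500 Mbit/s VIP, the cost rises to 800 Mbit/s and the request is refused.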


Setup Network

VIP Setup Procedure:
- Arbitrating Among VIP Setup Requests
- Running the Distributed VIP Setup Algorithm
- Setting Up a VIP in the Data Network by Configuring the VIP Allocators of the Nodes Along the VIP Path
- Tearing Down Conflicting VIPs

Each Setup Network Node Contains the Configuration Information of Its Corresponding Data Network Node

Due to the Distributed Nature of the Algorithm: Short Reconfiguration Time
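A centralized sketch of the last two setup steps: tearing down the VIPs that conflict with a new path and configuring the VIP allocators along it. In the actual scheme this state is distributed across the setup-network nodes; here it is one Python object for clarity. The class, its methods, and the port names ("L" for the local port, plus "E", "W", "N", "S") are hypothetical.

```python
from typing import Dict, List, Tuple

Hop = Tuple[str, str, str]  # (router, input port, output port) along a VIP path


class VipConfig:
    """Hypothetical model of the VIP-allocator state held by the setup-network nodes."""

    def __init__(self) -> None:
        self.allocator: Dict[Tuple[str, str], Tuple[str, str]] = {}  # (router, out) -> (vip, in)
        self.hops: Dict[str, List[Hop]] = {}  # vip -> its hops, kept for tear-down

    def tear_down(self, vip: str) -> None:
        """Release every output port the VIP holds; its flow reverts to packet switching."""
        for router, _in, out in self.hops.pop(vip, []):
            self.allocator.pop((router, out), None)

    def set_up(self, vip: str, hops: List[Hop]) -> List[str]:
        """Tear down conflicting VIPs, then bind each hop's output port to its input port."""
        victims = {self.allocator[(r, out)][0] for r, _in, out in hops if (r, out) in self.allocator}
        for v in victims:
            self.tear_down(v)  # frees all of the victim's ports, not only the shared ones
        for router, in_port, out in hops:
            self.allocator[(router, out)] = (vip, in_port)
        self.hops[vip] = list(hops)
        return sorted(victims)
```

Tearing down a victim releases its whole path, since a VIP with a missing hop can no longer act as a dedicated connection.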


[Figure: example of the distributed minimum-cost path search from source node S to destination node D; each router port is annotated with its cost]

Port Cost: Weight of the VIP Using the Port

1. Add the Received Cost (4) to the Weights of the Ports Along the Shortest Paths (the W and N Ports) Toward the Destination Node
2. Send the New Costs (9 and 12) to the Neighboring Nodes Toward the Destination Node

Receiving Node: Select the Minimum Cost and Keep the Port From Which the Smaller Cost Is Received
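The cost propagation above can be simulated sequentially: each node adds the cost it received to the weight of each output port that leads toward D and forwards the sum, and each receiving node keeps the smaller cost and remembers where it came from. The sketch does this over the bounding box of minimal paths and then backtracks from D. The coordinate convention (+x is East, +y is North), treating a free port as weight 0, and the function name are assumptions.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def min_cost_minimal_path(
    src: Node, dst: Node, port_weight: Dict[Tuple[Node, str], float]
) -> Tuple[float, List[Node]]:
    """Cheapest of the shortest mesh paths from src to dst.

    port_weight[(node, port)] is the weight of the VIP using that output port;
    missing entries are free ports with cost 0.
    """
    sx = (dst[0] > src[0]) - (dst[0] < src[0])  # -1, 0 or +1
    sy = (dst[1] > src[1]) - (dst[1] < src[1])
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    x_port = "E" if sx > 0 else "W"  # ASSUMED: +x is East, +y is North
    y_port = "N" if sy > 0 else "S"

    def at(i: int, j: int) -> Node:
        return (src[0] + i * sx, src[1] + j * sy)

    best: Dict[Node, float] = {src: 0.0}
    prev: Dict[Node, Node] = {}
    # Row-major order over the bounding box is a topological order of the
    # minimal-path DAG, so a node's cost is final before it forwards it.
    for i in range(nx + 1):
        for j in range(ny + 1):
            node = at(i, j)
            for (ni, nj), port in (((i + 1, j), x_port), ((i, j + 1), y_port)):
                if ni > nx or nj > ny:
                    continue  # would leave the set of shortest paths
                nxt = at(ni, nj)
                # Step 1: add the received cost to the weight of the outgoing port.
                offered = best[node] + port_weight.get((node, port), 0.0)
                # Receiver keeps the smaller cost and the port it arrived from.
                if nxt not in best or offered < best[nxt]:
                    best[nxt] = offered
                    prev[nxt] = node
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

The returned cost is what would be sent back to the source node, which compares it against the requesting flow's weight.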


Dynamic VIP Construction

The Setup Network Operates in Parallel With Packet Transmission in the Packet-Switched Network, Hiding the Setup Time

The Setup Network Has a Small Bit-Width and Operates Infrequently (Only When a High-Volume Flow Is Detected): Negligible Power and Area Overhead


Evaluation Results

XMulator NoC Simulator (www.xmulator.org)
- A C#-Based Simulator
- Orion Power Library

Comparison With a Conventional NoC (5-Stage Pipelined Wormhole Switch)

Multi-Core SoC Traffic: H.263 Decoder + MP3 Decoder, H.263 Decoder + MP3 Encoder, MP3 Decoder + MP3 Encoder
- 38% Reduction in Message Latency, 46% Reduction in Power Consumption


Evaluation Results

Synthetic Traffic: N-Hot Traffic
- 80% of Messages Go to Exactly N Destinations, 20% to Randomly Chosen Nodes

[Plot: Power (nJ/cycle) vs. traffic (messages/node/cycle) for 1-hot, 2-hot, and 3-hot traffic, conventional NoC vs. VIP]

[Plot: Message Latency (cycles for 8-flit packets): average message latency (cycles) vs. traffic (messages/node/cycle) for the same configurations]
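A sketch of a generator for the N-hot synthetic pattern used in these experiments. The slides do not say how the hot destinations are chosen; here each node draws its N favored destinations once, uniformly at random and excluding itself, and the remaining 20% of messages go to a uniformly random other node. The function names are hypothetical.

```python
import random
from typing import Dict, List


def make_hot_sets(num_nodes: int, n_hot: int, rng: random.Random) -> Dict[int, List[int]]:
    """ASSUMED: each node's N hot destinations are drawn once, uniformly, excluding itself."""
    return {s: rng.sample([d for d in range(num_nodes) if d != s], n_hot)
            for s in range(num_nodes)}


def next_destination(src: int, hot: Dict[int, List[int]], num_nodes: int,
                     rng: random.Random, hot_fraction: float = 0.8) -> int:
    """80% of messages go to one of src's N hot destinations, the rest to a random node."""
    if rng.random() < hot_fraction:
        return rng.choice(hot[src])
    return rng.choice([d for d in range(num_nodes) if d != src])
```

Because a background message can also land on a hot destination by chance, slightly more than 80% of messages reach the hot set (about 82.7% for 2-hot traffic on 16 nodes).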


Summary and Future Work

Adaptable Virtual Point-to-Point Connections in a Packet-Switched NoC
- Benefit From the Advantages of Both Communication Methods
- Two VIP Construction Schemes: Static and Dynamic
- Significant Power/Latency Reduction

Future Work
- Comparing the Method With Related Work: Express Virtual Channels, Single-Cycle Routers, …
- Precise Area/Power Results by Implementing the NoC in Hardware

Analytical Models Show Small Area Overhead


Questions?
