Highly Fault-tolerant NoC Routing with Application-aware Congestion Management

Doowon Lee, Ritesh Parikh and Valeria Bertacco, University of Michigan


Post on 13-Dec-2015


Page 1: Highly Fault-tolerant NoC Routing with Application-aware Congestion Management

Doowon Lee, Ritesh Parikh and Valeria Bertacco

University of Michigan

Page 2: Wide Range of Applications

• everyday applications
• cloud computing
• physical simulation
• scientific applications
• computational chemistry
• computational biology
• semiconductor simulation

varying computation characteristics, user requirements, etc.

(picture sources) 1. N-body simulation: https://www.astro.rug.nl/~weygaert 2. semiconductor: http://spectrum.ieee.org 3. computational biology: http://csbio.cs.umn.edu/ 4. molecular structure: http://nanotechnologyuniverse.com

Page 3: Application Running on Network-on-Chip

application example: video encoder

mapping: chip multiprocessor with network-on-chip (NoC)

analysis: communication frequency

[Figure: source-by-destination matrix of communication frequency (number of flits), from a 64-thread simulation of SPLASH-2 (ocean); nodes A and B are marked on both axes]

some pairs communicate more frequently

(picture sources) 1. Video encoder: Gary Sullivan et al., Standardized Extensions of High Efficiency Video Coding (HEVC) 2. Tilera TILE-Gx8072: http://www.tilera.com
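The communication-frequency analysis above can be sketched as a simple accumulation over a traffic trace. This is a minimal illustration, not the authors' tooling: the trace format `(source, destination, flits)` and the function names are assumptions made for the example.

```python
def communication_matrix(trace, num_nodes):
    """Sum flits per (source, destination) pair into a num_nodes x num_nodes matrix."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst, flits in trace:
        matrix[src][dst] += flits
    return matrix


def top_pairs(matrix, k):
    """Return the k most frequently communicating (source, destination, flits) triples."""
    pairs = [
        (flits, src, dst)
        for src, row in enumerate(matrix)
        for dst, flits in enumerate(row)
        if flits > 0
    ]
    pairs.sort(reverse=True)
    return [(src, dst, flits) for flits, src, dst in pairs[:k]]


# Hypothetical 4-node trace; real traces would come from a simulator.
trace = [(0, 3, 5), (1, 2, 1), (0, 3, 4), (2, 1, 2)]
matrix = communication_matrix(trace, 4)
hot = top_pairs(matrix, 2)
```

Here the pair 0 → 3 dominates, which is the kind of imbalance an application-aware routing scheme can exploit.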

Page 4: Fragile Networks-on-Chip

increasing transistor density, transistor reliability ↓

tail of transistor scaling: 22 nm (Intel), 14 nm (Intel), 7 nm (IBM)

network-on-chip… possible single point of failure

permanent faults

solution: network-on-chip routing reconfiguration

Page 5: How to reduce NoC degradation from faults?

motivating experiment: fault vs. performance degradation

[Figure: saturation throughput (flits/cycle) vs. number of faults affecting the NoC, for state-of-the-art routing reconfiguration [Aisopos 11]; annotations: "minimum throughput requirement", "our goal"]

Network-on-chip reconfiguration entails performance degradation

KEY IDEA: application-aware routing optimized to application’s communication patterns

Page 6: Application-Aware Routing (1/2)

How do we find adaptive routing optimized to communication patterns?

problem: various route options (no restriction); path diversity = 6, but deadlock possible

solution 1: avoid deadlock by restricting turns; deadlock-free, but path diversity = 3

[Figure: two mesh diagrams from source S to destination D, with per-node path counts]
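Path diversity, as used on this slide, can be counted with a small recursion over minimal paths. The sketch below is only an illustration under stated assumptions: the source sits at (0, 0), the destination at (dx, dy), only east and north hops are minimal, and a turn is written as a hypothetical `(node, incoming_dir, outgoing_dir)` triple. The restriction set in the usage lines is chosen for the example and is not the one drawn on the slide.

```python
from functools import lru_cache


def path_diversity(dx, dy, disabled_turns=frozenset()):
    """Count minimal paths from (0, 0) to (dx, dy) that avoid the disabled turns."""
    moves = {"E": (1, 0), "N": (0, 1)}

    @lru_cache(maxsize=None)
    def count(x, y, incoming):
        if (x, y) == (dx, dy):
            return 1
        total = 0
        for out, (mx, my) in moves.items():
            nx, ny = x + mx, y + my
            if nx > dx or ny > dy:
                continue  # this hop would leave the minimal quadrant
            is_turn = incoming is not None and incoming != out
            if is_turn and ((x, y), incoming, out) in disabled_turns:
                continue  # turn is restricted at this node
            total += count(nx, ny, out)
        return total

    return count(0, 0, None)


unrestricted = path_diversity(2, 2)
restricted = path_diversity(2, 2, {((1, 0), "E", "N"), ((1, 1), "E", "N")})
```

With no restrictions a 2-hop by 2-hop offset yields 6 minimal paths; the two example restrictions remove three of them.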

Page 7: Application-Aware Routing (2/2)

How do we find adaptive routing optimized to communication patterns?

problem: various route options (no restriction); path diversity = 6, but deadlock possible

solution 2: avoid deadlock by restricting turns; path diversity = 6

Where to best place turn restrictions? NP-complete problem

OUR CONTRIBUTION: turn-restriction placement heuristic

[Figure: two mesh diagrams from source S to destination D, with per-node path counts]

Page 8: Presentation Outline

• FATE (Fault- and Application-aware Turn-model Extension)

(1) Turn-enabling rules: “How to reduce search?”

(2) Load estimation: “Which is the most valuable turn?”

(3) Overall routing computation algorithm

• Experimental evaluation

• Conclusions

[Figure: 3×3 mesh, nodes 0–8]

Page 9: How to reduce turn-restriction search?

To avoid unfruitful turn-restriction patterns…

pattern 1. network disconnection

pattern 2. non-minimal restriction

pattern 3. possible deadlock

[Figure: small example meshes (two with 4 nodes, one with 6 nodes) illustrating the patterns]

Page 10: Turn-Enabling Rules

To avoid unfruitful turn-restriction patterns… each time a turn is disabled, several others should be enabled

basic rules: enable adjacent turns (cycle, node, link)

[Figure: 3×3 mesh, nodes 0–8]

advanced rules: enable remote turns (horizontal, vertical, diagonal)

[Figure: 4×4 mesh, nodes 0–15]

Page 11: Traffic-Load Estimation

Which is the most valuable turn? use traffic-load estimation to decide

specific goals

(1) balancing link utilization

(2) prioritizing turns that are critical

load calculation steps: path diversity → link load → turn load → cycle load → weight scaling

take into account hop-by-hop route-decisions

Page 12: Traffic-Load Estimation Step by Step

steps: path diversity, link load, turn load, cycle load, weight scaling

weight scaling: multiply by communication frequency

[Figure: 4×4 mesh (nodes 0–15) with a source and a destination, annotated with per-node path-diversity counts; legend: high traffic, medium traffic, low traffic]

Page 13: Example: Link, Turn, Cycle Load (1/2)

traffic-load estimation 5 steps: diversity, link, turn, cycle, scale

link load (from path diversity)

[Figure: 4×4 mesh (nodes 0–15) with a source and a destination. Recoverable labels: path-diversity counts 1, 1, 1, 2, 1, 0 (total 6) alongside link loads 0.17, 0.17, 0.17, 0.33, 0.17, 0; link loads 1/2 = 0.5 and 0.25; turn loads 0.125, 0.25 and 0; cycle load given as a sum over the cycle through nodes 9, 10, 13 and 14]
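One way to read the link-load step is that a flow splits at every hop among the outputs that stay on a minimal path, which reproduces values such as 1/2 = 0.5 and 0.25. The sketch below assumes an equal split at each hop and a destination to the north-east of the source; both are simplifications made for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def hop_by_hop_link_load(dx, dy):
    """Link loads for one flow from (0, 0) to (dx, dy), split equally at every hop."""
    node_flow = defaultdict(float)
    node_flow[(0, 0)] = 1.0
    link_load = {}
    # Visit nodes by hop distance so each node's inflow is final before it is split.
    for dist in range(dx + dy):
        for x in range(max(0, dist - dy), min(dx, dist) + 1):
            y = dist - x
            outputs = []
            if x < dx:
                outputs.append((x + 1, y))
            if y < dy:
                outputs.append((x, y + 1))
            share = node_flow[(x, y)] / len(outputs)
            for nxt in outputs:
                link_load[((x, y), nxt)] = share
                node_flow[nxt] += share
    return link_load


loads = hop_by_hop_link_load(2, 2)
```

For a 2-hop by 2-hop offset the two links leaving the source each carry 0.5, and the links one hop further carry 0.25 each.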

Page 14: Example: Link, Turn, Cycle Load (2/2)

traffic-load estimation 5 steps: diversity, link, turn, cycle, scale

link load (from path diversity)

[Figure: 4×4 mesh (nodes 0–15) with a source and a destination. Recoverable labels: link loads 1/2 = 0.5 and 0.25 (also 0.17, 0.17, 0.17, 0.33, 0.17, 0); turn loads 0.25, 0.125 and 0 (no path); cycle load for the cycle through nodes 9, 10, 13 and 14, sum: 0.375]

Page 15: Example: Weight Scaling

source → destination: S1 → D1, S2 → D2

communication frequency: 20, 8

[Figure, before scaling: mesh with flow S1 → D1; cycle loads 0.125, 0.25, 0.38, 0.38; the most congested cycle is highlighted]

[Figure, after scaling: mesh with flows S1 → D1 and S2 → D2; labels 2.5, 5, 3, 3, 8, 9, 9.2, 9.8, 12.5, 13.2, 13.5]
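The scaling step multiplies each flow's load estimate by that flow's communication frequency. The sketch below uses the frequencies from the slide (20 and 8) and additionally assumes that scaled loads from different flows are summed per cycle; the resource names `cycle_a` and `cycle_b` are hypothetical.

```python
def scale_loads(per_flow_loads, frequency):
    """Weight each flow's load by its communication frequency and sum per resource."""
    total = {}
    for flow, loads in per_flow_loads.items():
        for resource, load in loads.items():
            total[resource] = total.get(resource, 0.0) + load * frequency[flow]
    return total


frequency = {"S1->D1": 20, "S2->D2": 8}
per_flow_loads = {
    "S1->D1": {"cycle_a": 0.125, "cycle_b": 0.25},  # unscaled loads of flow 1
    "S2->D2": {"cycle_b": 0.375},                   # unscaled load of flow 2
}
scaled = scale_loads(per_flow_loads, frequency)
```

With these inputs, 0.125 × 20 = 2.5 and 0.25 × 20 + 0.375 × 8 = 8, values that also appear among the labels on the slide.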

Page 16: Putting it all together

1) evaluate turns, one at a time (choose the one leading to least congestion)

2) apply turn-enabling rules

iterate this process until no undecided turn is left

[Figure: two mesh diagrams with sources S1, S2 and destinations D1, D2]

Page 17: Backtracking

deadlock possible due to greedy turn-restriction selections

turn-enabling rules do not resolve all deadlock-causing patterns

backtrack to the last decision

example placement

[Figure: 3×4 mesh, nodes 0–11]

decision tree: node 5 turn NW → node 6 turn NE → deadlock detected → backtrack → node 3 turn SW
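The slide does not say how a deadlock is detected. A common technique, used here as an assumption, is to look for a cycle in the channel dependency graph: each directed link is a channel, and a permitted transition from link (a, b) to link (b, c) adds a dependency. The triple `(a, b, c)` used to name a disabled turn is a convention of this sketch.

```python
def has_deadlock_cycle(links, disabled_turns):
    """Return True if the channel dependency graph of permitted turns has a cycle."""
    neighbors = {}
    for a, b in links:  # links are bidirectional
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    channels = [(a, b) for a in neighbors for b in neighbors[a]]

    def successors(channel):
        a, b = channel
        # U-turns are excluded; straight-through hops count as dependencies too.
        return [(b, c) for c in neighbors[b]
                if c != a and (a, b, c) not in disabled_turns]

    WHITE, GREY, BLACK = 0, 1, 2
    color = {ch: WHITE for ch in channels}
    for start in channels:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(successors(start)))]
        while stack:
            channel, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[channel] = BLACK
                stack.pop()
            elif color[nxt] == GREY:
                return True  # back edge: cyclic dependency
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(successors(nxt))))
    return False


ring = [(0, 1), (0, 2), (1, 3), (2, 3)]  # 2x2 mesh
```

On the 2×2 mesh, both the clockwise and the counter-clockwise cycle must be broken, so disabling a single turn is not enough.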

Page 18: FATE Route-Computation Procedure

trigger: (1) new application launch (2) fault occurrence

procedure flowchart: start (trigger) → estimate traffic load → choose turn to be disabled → deadlock? disconnect? (if so, backtrack) → apply turn-enabling rules → no undecided turn? (if turns remain, loop; otherwise end)

[Figure: network example; legend: disabled turn, enabled turn, undecided turn; high traffic, medium traffic, low traffic]
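The flowchart can be read as a greedy search with chronological backtracking. The skeleton below is a sketch of that control flow only: `rank`, `is_valid` and `enabling_rules` are hypothetical callbacks standing in for traffic-load estimation, the deadlock/disconnection check and the turn-enabling rules, and the exact order of the checks is an assumption.

```python
def place_turn_restrictions(undecided, rank, is_valid, enabling_rules,
                            disabled=frozenset(), enabled=frozenset()):
    """Decide every turn; return (disabled, enabled) sets, or None if no placement works."""
    if not undecided:
        return set(disabled), set(enabled)
    for turn in rank(undecided):  # best candidate first
        rest = undecided - {turn}
        newly_enabled = enabling_rules(turn) & rest
        new_disabled = disabled | {turn}
        new_enabled = enabled | newly_enabled
        if not is_valid(new_disabled, new_enabled):
            continue  # deadlock or disconnection: try the next candidate
        result = place_turn_restrictions(rest - newly_enabled, rank, is_valid,
                                         enabling_rules, new_disabled, new_enabled)
        if result is not None:
            return result
        # otherwise backtrack to this decision and try another turn
    return None


# Toy instance with three abstract turns; all rules here are made up.
rules = {"b": {"c"}, "c": {"a"}}
result = place_turn_restrictions(
    frozenset({"a", "b", "c"}),
    rank=sorted,
    is_valid=lambda disabled, enabled: not {"a", "b"} <= disabled,
    enabling_rules=lambda turn: rules.get(turn, set()),
)
```

In the toy run the first two top-level choices lead to dead ends, so the search backtracks and settles on disabling "b" and "c" while enabling "a".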

Page 19: Presentation Outline

• FATE routing

• Experimental evaluation

– Experimental setup

– Evaluation on faulty topologies

– Evaluation on fault-free topologies

– Overheads

• Conclusions

Page 20: Experimental Setup

• BookSim simulation with 8×8 mesh networks

– 3-stage router pipeline, 2 VCs/protocol class, 5 flits/VC

• Fault injection

– faults in bidirectional links

– 5 fault rates: 1 faulty link, 3%, 5%, 10%, and 15% faulty links

– 10 random fault patterns for each fault rate

• Traffic benchmarks

– 5 synthetic patterns: bit complement, bit reversal, shuffle, transpose, uniform random

– 11 traces from SPLASH-2 multi-threaded workloads (generated from gem5 simulation with MESI cache coherence; 4 memory controllers at mesh corners)
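The fault-injection setup can be sketched as random sampling over the bidirectional links of the mesh. This is an illustration rather than the authors' scripts: node numbering, rounding of the fault count and the use of a seeded `random.Random` are assumptions, and the sketch does not check whether the faulty network stays connected.

```python
import random


def mesh_links(width, height):
    """List the bidirectional links of a width x height mesh as (node, node) pairs."""
    links = []
    for y in range(height):
        for x in range(width):
            node = y * width + x
            if x + 1 < width:
                links.append((node, node + 1))      # horizontal link
            if y + 1 < height:
                links.append((node, node + width))  # vertical link
    return links


def inject_faults(links, fault_rate, rng):
    """Mark round(rate * total) links as faulty; return (healthy, faulty)."""
    count = round(len(links) * fault_rate)
    faulty = set(rng.sample(links, count))
    healthy = [link for link in links if link not in faulty]
    return healthy, faulty


links = mesh_links(8, 8)
healthy, faulty = inject_faults(links, 0.15, random.Random(0))
```

An 8×8 mesh has 112 bidirectional links, so a 15% fault rate corresponds to 17 faulty links under this rounding. Repeating the call with ten different seeds would give ten random fault patterns per rate.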

Page 21: Prior Routing Solutions

• Fault-tolerant routing

– Breadth-First Search (BFS) [Schroeder 91, Aisopos 11]

– Depth-First Search (DFS) [Sancho 04]

• Application-aware routing

– Bandwidth-Sensitive Oblivious Routing (BSOR) [Kinsy 09, Kinsy 13]

– Application-Specific Routing Algorithms (APSRA) [Palesi 08]

• Fully-adaptive routing on 2D mesh (congestion management)

– Dynamic XY (DyXY) [Li 06]

– Neighbor on Path (NoP) [Ascia 08]

– Regional Congestion Awareness (RCA) [Gratz 08]

Page 22: Saturation Throughput for Synthetic Patterns

compared schemes: fault-tolerant (BFS, DFS), application-aware (BSOR, APSRA), our solution (FATE)

[Figure: saturation throughput (packet/cycle/router) vs. number of faulty links for BFS, DFS, BSOR, APSRA and FATE; gain labels 9.5%, 10.6%, 17.7%, 23.3%, 33.3% and 5.5%, -0.5%, 0.1%, 2.9%, 9.3%]

• less performance degradation as faults increase

• 33.3% ↑ over fault-tolerant routing

• 9.3% ↑ over app.-aware routing

[Figure: saturation throughput (packet/cycle/router) per traffic pattern (bitcomp, bitrev, shuffle, transpose, uniform) at 15% fault rate, for BFS, DFS, BSOR, APSRA and FATE]

• gains maximized with unbalanced load

• still provide gain with uniform load
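The synthetic pattern names on the chart usually refer to bit permutations of the source address. The sketch below gives the common textbook definitions for a network with 2^bits nodes; whether the evaluation used exactly these variants is an assumption.

```python
def bit_complement(src, bits):
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reversal(src, bits):
    """Destination is the source address with its bit order reversed."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def shuffle(src, bits):
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def transpose(src, bits):
    """Destination swaps the upper and lower halves of the address (x and y)."""
    half = bits // 2
    mask = (1 << bits) - 1
    return ((src << half) | (src >> half)) & mask


BITS = 6  # 64 nodes, as in an 8x8 mesh
```

Uniform random traffic needs no permutation: each source simply picks a destination at random.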

Page 23: Packet Latency for SPLASH-2 Traces

[Figure: average packet latency (cycles) vs. number of faulty links (1 fault, 3%, 5%, 10%, 15% faults) for BFS, DFS, BSOR, APSRA and FATE]

• minimal increase until 5% faults

• up to 59% (13%) latency reduction over BFS (APSRA)

[Figure: average packet latency (cycles) per benchmark program at 15% fault rate; one label reads 228 cycles]

• significantly lower latency in 5 programs

Page 24: Performance on Fault-Free Meshes

[Figure: saturation throughput (packet/cycle/router) vs. number of VCs (3, 4, 6) for DOR, DyXY, NoP, RCA1D, BFS, DFS, BSOR, APSRA and FATE; groups: deterministic, fully-adaptive, fault-tolerant, application-aware, our solution]

Compared to DOR, fault-tolerant and application-aware routing, FATE always provides higher saturation throughput (better traffic-load estimation)

Compared to fully-adaptive routing, FATE outperforms at small numbers of VCs (more VCs for normal transfer)

Page 25: Overheads

• Software computation

– 2-4 sec for 8×8 meshes on Intel Xeon® processor (two orders of magnitude faster than APSRA)

– ~110 turn-placement attempts (little dependence on fault rate)

• Hardware overheads

– Area: 6% increase (routing table, route-computation logic)

– Power consumption not measured

• Better power-efficiency than APSRA

• Can be more power-efficient than application-agnostic solutions when reusing same routing multiple times

Page 26: Conclusions

• FATE provides highly fault-tolerant routing with graceful performance degradation by leveraging application traffic patterns

• Performance improvement over existing fault-tolerant routing

– 33% improvement in saturation throughput (synthetic traffic patterns)

– 59% improvement in packet latency (SPLASH-2 traces)

• Two orders of magnitude faster route-computation

Page 27: Thank you! Questions?

Page 28: Backup Slides

Page 29: Various Turn-Restriction Choices

exponential increase of turn-restriction choices as network size increases

example 1: 4 nodes, 4 possibilities

example 2: 6 nodes, 16 possibilities (the other 8 cases are not shown)

A 2-D mesh with $M$ nodes contains $4^{(\sqrt{M}-1)\times(\sqrt{M}-1)}$ possibilities
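The growth of the search space can be checked numerically. The sketch below evaluates the slide's expression for a square mesh and, as an assumption consistent with the 6-node example, generalizes it to a rows × cols mesh as 4 raised to the number of unit cycles, (rows − 1) × (cols − 1).

```python
import math


def turn_restriction_choices(rows, cols):
    """Number of turn-restriction choices: 4 per unit cycle of the mesh."""
    return 4 ** ((rows - 1) * (cols - 1))


def choices_for_square_mesh(num_nodes):
    """Evaluate 4^((sqrt(M)-1) x (sqrt(M)-1)) for a square mesh with M nodes."""
    side = math.isqrt(num_nodes)
    if side * side != num_nodes:
        raise ValueError("num_nodes must be a perfect square")
    return turn_restriction_choices(side, side)


small = turn_restriction_choices(2, 2)   # example 1: 4 nodes
medium = turn_restriction_choices(2, 3)  # example 2: 6 nodes
large = choices_for_square_mesh(64)      # 8x8 mesh
```

For an 8×8 mesh the count is 4^49, which is why exhaustive search is impractical and a placement heuristic is needed.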

Page 30: Basic Turn-Enabling Rules (Cycle, Node, Link)

Which turns should be enabled upon a turn-restriction decision?

(1) to minimize the number of restrictions

(2) to guarantee deadlock-freedom

rule 1 (cycle)

rule 2 (node)

rule 3 (link)

[Figure: one example mesh per rule (4 nodes for rule 1, 9 nodes for rules 2 and 3); turn types: undecided, enabled, disabled]

What happens if we break the rules? deadlock happens

[Figure: 6-node mesh with the violated turn marked]

Page 31: Advanced Turn-Enabling Rules (Common Link, Opposite-corner Turn)

turn types: undecided, enabled (basic), enabled (advanced), disabled, candidate

rule 4: common link

Why rule 4? Let’s apply the basic rules… (annotation on the figure: should be enabled for both candidates)

[Figure: 4×4 mesh (nodes 0–15) and a 2×4 excerpt (nodes 0–7)]

rule 5: opposite-corner turn

horizontal enabling, vertical enabling, diagonal enabling (see paper for details)

[Figure: 4×4 mesh, nodes 0–15]

Page 32: Applying Basic Turn-Enabling Rules to Faulty Topologies

rule 1: cycle

special case – no double count: counted only for one cycle

deadlock when disabling only mutual turn

[Figure: 3×3 mesh (nodes 0–8) with the mutual turn marked]

rules 2 & 3: node & link

no special change

[Figure: 3×3 mesh, nodes 0–8]

Page 33: Applying Advanced Turn-Enabling Rules to Faulty Topologies

rule 4: common link

apply only towards fault-free directions

[Figure: 4×4 mesh, nodes 0–15]

rule 5: opposite-corner turn

apply as if fault-free

[Figure: 4×4 mesh, nodes 0–15]