synchronized progress in interconnection networks (spin) · synchronized progress in...

of 39 /39
Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech ([email protected]) Tushar Krishna Georgia Tech ([email protected]) Paul V. Gratz Texas A&M University ([email protected]) ISCA 2018 Session 8B: Interconnection Networks

Upload: others

Post on 14-Jun-2020

10 views

Category:

Documents


0 download

Embed Size (px)

TRANSCRIPT

Page 1: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for

deadlock freedom

Aniruddh Ramrakhyani Georgia Tech

([email protected])

Tushar Krishna Georgia Tech

([email protected])

Paul V. Gratz Texas A&M University

([email protected])

ISCA 2018 Session 8B: Interconnection Networks

Page 2: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Network Routing2

B A

C

F

E

D

G

H

I

J

KDeadlock

Page 3: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Routing DeadlocksA Routing Deadlock is a cyclic buffer dependency chain that renders forward progress impossible.

3

A

C D

B

Deadlock

E

F

Page 4: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Routing DeadlocksA Routing Deadlock is a cyclic buffer dependency chain that renders forward progress impossible.

Deadlocks are a fundamental problem in both off-chip and on-chip interconnection networks.

Cause system breakdown and kill chips.

Deadlocks are hard to detect during functional verification.

Manifest after a long use time.

Depend on : traffic pattern, injection rate, congestion.

Show up due to system wear out faults and power-gating of network elements which are hard to simulate.

Need a solution for functional correctness !!

4

Page 5: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Solution I: Dally’s TheoryDefines a strict order in acquisition of links and/or buffers which ensures a cyclic dependency is never created.       

5

A

C D

BE

F1

2

3

45

6

Higher to Lower not

allowed

Page 6: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Solution I: Dally’s TheoryDefines a strict order in acquisition of links and/or buffers which ensures a cyclic dependency is never created.  

Implementations: Turn model [5], XY routing, Up-Down routing [20].

Limitations:

Routing Restrictions: Increased Latency, Throughput loss, Energy overhead

Require large no. of VCs for fully adaptive routing.

6

Page 7: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

A

C D

BE

F

Solution II: Duato’s TheoryAdds buffers to create a deadlock free escape path that can be used to avoid/recover from deadlocks.

Implementation: turn restrictions in escape-VC.

7

E

FVC0

VC0

Escape-VC

Escape-VC

Page 8: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Solution II: Duato’s Theory

Limitations:

Energy and Area overhead of escape VCs.

Additional routing tables/logic for routing within escape-VC.

8

Adds buffers to create a deadlock free escape path that can be used to avoid/recover from deadlocks.

Implementation: turn restrictions in escape-VC.

Page 9: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Other SolutionsSolution III: Flow Control

Restrict injection when no. of empty buffers fall below a threshold

Implementation: Bubble Flow Control [9]

Limitation: Implementation Complexity, Throughput Loss.

Solution IV: Deflection Routing

Assign every flit to some output port even if they get misrouted.

Implementation: BLESS [10], CHIPPER [35]

Limitation: Livelocks, non-minimal routing

9

Page 10: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Comparison of Deadlock Freedom Theories10

Acyclic CDG not Required

No Packet Injection

RestrictionsLivelock

Free

VC cost for Mesh Routing

Minimal Adaptive

Topology Indepen-

dent

Dally

DuatoFlow

ControlDeflection Routing

SPIN

Theory

Metric

1 61 2

2 2

1

1 1

Can we do better ??

Page 11: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

OutlineRouting Deadlocks

State of the Art

Dally’s Theory

Duato’s Theory

Flow Control Routing

Deflection Routing

SPIN : Synchronized Progress in Interconnection Networks

Evaluations

Conclusion

11

Page 12: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN : Key Idea12

A

C D

B

Deadlock

E

F

What if: We coordinate the movement of every packet to the next hop at a given time ??

Simultaneous Synchronized

Movement

Simultaneous Synchronized Movement of all deadlocked packets in the loop is called a spin.

spin complete

Page 13: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN : Key Idea13

Simultaneous Synchronized Movement of all deadlocked packets in the loop is called a spin.

Each spin leads to one hop forward movement of all deadlocked packets.

One spin may not resolve the deadlock. If so, spin can be repeated

Deadlock is guaranteed to be resolved in a finite number of spins [proof in paper, Sec. III]

Page 14: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN : Key Idea14

A

C D

BE

F

First spin complete

Second spin complete

Page 15: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN : Key Idea15

A

C

D

B

E

F

Packets E &B exit the loop

Deadlock Resolved

Page 16: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

OutlineRouting Deadlocks

State of the Art

SPIN : Synchronized Progress in Interconnection Networks

Key Idea

Implementation Example

Micro-architecture

FAvORS

Evaluations

Conclusion

16

Page 17: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN: Implementation ExampleSPIN is a generic deadlock freedom theory that can have multiple implementations.

We choose a recovery approach as deadlocks are rare scenarios (See Sec. II-F).

Our Implementation:

Detect the Deadlock.

Coordinate a time for spin.

Execute the spin.

17

Page 18: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Detect Deadlocks

Use counters.

Placed at every node at design time. Optimize by exploiting topology symmetry (See Static Bubble [6]).

If packet does not leave in threshold time (configurable), it indicates a potential deadlock.

Counter expired ? Send probe to verify deadlock.

18

Page 19: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Probe Msg.19

A

C D

BE

FCounter

Expires at Node 5

Probe

Send Probe

1. Deadlock Detection

2. Coordinating the spin.

3. Executing the spin.

Probe Returns: Deadlock Confirmed

Page 20: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Probe Msg.Probe is a special message that tracks the buffer dependency.

Probe returns to sender: Cyclic buffer dependence, hence deadlock.

Next, send a move msg. to convey the spin time Upon receiving move msg., router sets its counter to count to spin cyle.

20

Page 21: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Move Msg.21

A

C D

BE

F

Move

Send Move

Set counter to count to spin cycle

Move returns

1. Deadlock Detection

2. Coordinating the spin.

3. Executing the spin.

Page 22: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : spin22

A

C D

BE

F

Counters expire together in the spin

cycle

1. Deadlock Detection

2. Coordinating the spin.

3. Executing the spin.

Page 23: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : spin23

A

C

D

B

EF1. Deadlock Detection

2. Coordinating the spin.

3. Executing the spin.

Page 24: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Multiple SPIN OptimizationResolving a deadlock may require multiple spins

After spin, router can resume normal operation.

Counter expires again, process repeated.

Optimization: send probe_move after spin is complete.

probe_move checks if deadlock still exists and if so, sets the time for the next spin.

Details in paper (Sec. IV-B).

24

Page 25: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

OutlineRouting Deadlocks

State of the Art

SPIN : Synchronized Progress in Interconnection Networks

Key Idea

Implementation Example

Micro-architecture

FAvORS

Evaluations

Conclusion

25

Page 26: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Micro-architectureNo additional links: Spl. Msgs. use the same links as regular flits.

Spl. Msgs. have higher priority in link usage over regular flits.

Links are anyways idle during deadlocks.

Bufferless Forwarding: Spl. Msgs. are not buffered anywhere (either forwarded or dropped).

Distributed Design: any router can initiate the recovery.

4% area overhead compared to traditional mesh router in 15nm Nangate [42].

26

Page 27: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

OutlineRouting Deadlocks

State of the Art

SPIN : Synchronized Progress in Interconnection Networks

Key Idea

Walkthrough Example

Micro-architecture

FAvORS

Evaluations

Conclusion

27

Page 28: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

FAvORS Routing AlgorithmSPIN is the first scheme that enables true one-VC fully adaptive deadlock-free routing for any topology.

FAvORS : Fully Adaptive One-vc Routing with SPIN.

Algorithm has two flavors: Minimal Adaptive Non-minimal Adaptive.

Route Selection Metrics: Credit turn-around time Hop Count

More details in paper (Sec. V).

28

Page 29: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

OutlineRouting Deadlocks

State of the Art

SPIN : Synchronized Progress in Interconnection Networks

Evaluations

Conclusion

29

Page 30: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Evaluations 30

Simulator gem5 simulator + Garnet 2.0 Network model

Topologies 8x8 Mesh 1024 node Off-chip Dragon-fly

Link Latency

1-cycle Inter-group: 3-cycleIntra-group: 1-cycle

Traffic Synthetic + Multi-threaded (PARSEC)

Synthetic

Network Configuration:

Page 31: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Evaluations : Baselines8x8 Mesh:

31

Design Routing Adaptivity

Minimal Theory Deadlock Freedom Type

West-first Routing Partial Yes Dally Avoidance

Escape-VC Full Yes Duato Avoidance

Static-Bubble [6] Full Yes Flow-Control Recovery

1024 Node Off-chip Dragon-fly:

Design Routing Adaptivity

Minimal Theory Deadlock Freedom Type

UGAL [37] Full No Dally Avoidance

Page 32: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Saturation Throughput 1024-node Off-chip Dragon-fly:

32

0255075

100

0.01 0.03 0.05 0.07 0.09 0.11

Bit-complement

Late

ncy

(cyc

les)

Inj. Rate (flits/node/cycle)

0255075

100

0.01 0.08 0.15 0.22 0.29 0.36Inj. Rate (flits/node/cycle)

Neighbor

Late

ncy

(cyc

les)

UGAL_3VCSPIN

UGAL_3VCDally

FAvORS_NMin_1VCSPIN

Minimal_1VCSPIN

50% higher throughput compared to UGAL_Dally

62% higher throughput compared to

Minimal Routing 1-VC

25% higher throughput compared to UGAL_Dally

Page 33: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Saturation Throughput 33

8x8 On-chip Mesh:

Transpose (3-VC)

Inj. Rate (flits/node/cycle)

Late

ncy

(cyc

les)

West-First_3VCDally

0

25

50

75

100

0.001 0.031 0.061 0.091 0.121

Static_Bubble_3VCFlow-Control

EscapeVC_3VCDuato

FAvORS_Min_3VCSPIN

0

25

50

75

100

0.001 0.031 0.061 0.091 0.121

Transpose (1-VC)

Inj. Rate (flits/node/cycle)

Late

ncy

(cyc

les)

FAvORS_Min_1VCSPIN

West-First_1VCDally

68% higher throughput compared

to West-first 3-VC

8% higher throughput compared

to Escape-VC 3-VC

80% higher throughput compared

to West-First 1-VC

10% higher throughput

compared to Static-Bubble 3-VC

Page 34: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

ConclusionDeadlocks are a fundamental problem in Interconnection Networks.

SPIN is a new deadlock freedom theory

Simultaneous packet movement for deadlock recovery

No routing restrictions or escape-VCs required.

Enables true one-VC fully adaptive routing for any topology

Salient Features of our Implementation:

Scalable: Distributed Deadlock Resolution

Plug-n-Play: topology agnostic

68% higher (Mesh) & 62% higher (dragon-fly) saturation throughput.

34

Page 35: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

ConclusionPractical Applications:

35

On-Chip Mesh (Intel Xeon Phi, Cavium Thunder X2)

Super-computers Dragon-fly (Cray XC Networks)

Datacenters JellyFish (HP), Fat Tree (Google)

Irregular Topologies Faults (Static Bubble [6])Power-gating (Router Parking[29])

NoC Generators FlexNoc (ARTERIS),Sonics GN (SONICS)

Domain specific Accelerators

Eyeriss [15]

Thank you !!

Page 36: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Back-up36

Page 37: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

SPIN : ApplicationsSPIN is a generic deadlock freedom theory

Scalable: distributed deadlock resolution

Plug-n-Play: doesn’t require knowledge of topology

SPIN can thus be used in :

On-chip networks: Mesh (Intel SCC, Tilera Tile64)

Supercomputers: Dragon-fly (Cray XC Networks)

Datacenters: Jellyfish (HP), Fat Tree (Google)

Static & Dynamically Changing Irregular topologies due to faults (Static Bubble [6]) & power-gating (Router Parking [29])

NoC Generators (Opensmart [13]) & Domain specific accelerator (Eyeriss[15])

37

Page 38: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Probe Msg.38

A

C D

BE

FCounter

Expires at Node 5

Probe

Send Probe

Probe Returns: Deadlock Confirmed

Page 39: Synchronized Progress in Interconnection Networks (SPIN) · Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia

Implementation Example : Move Msg.39

A

C D

BE

F

Move

Send Move

Set counter to count to spin cycle

Cntr

Move returns

1. Counter Expires2. Send Probe3. Send Move4. Counter expires

in spin cycle5. Spin