synchronized progress in interconnection networks (spin) · synchronized progress in...
Embed Size (px)
TRANSCRIPT

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for
deadlock freedom
Aniruddh Ramrakhyani Georgia Tech
Tushar Krishna Georgia Tech
Paul V. Gratz Texas A&M University
ISCA 2018 Session 8B: Interconnection Networks

Network Routing2
B A
C
F
E
D
G
H
I
J
KDeadlock

Routing DeadlocksA Routing Deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
3
A
C D
B
Deadlock
E
F

Routing DeadlocksA Routing Deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
Deadlocks are a fundamental problem in both off-chip and on-chip interconnection networks.
Cause system breakdown and kill chips.
Deadlocks are hard to detect during functional verification.
Manifest after a long use time.
Depend on : traffic pattern, injection rate, congestion.
Show up due to system wear out faults and power-gating of network elements which are hard to simulate.
Need a solution for functional correctness !!
4

Solution I: Dally’s TheoryDefines a strict order in acquisition of links and/or buffers which ensures a cyclic dependency is never created.
5
A
C D
BE
F1
2
3
45
6
Higher to Lower not
allowed

Solution I: Dally’s TheoryDefines a strict order in acquisition of links and/or buffers which ensures a cyclic dependency is never created.
Implementations: Turn model [5], XY routing, Up-Down routing [20].
Limitations:
Routing Restrictions: Increased Latency, Throughput loss, Energy overhead
Require large no. of VCs for fully adaptive routing.
6

A
C D
BE
F
Solution II: Duato’s TheoryAdds buffers to create a deadlock free escape path that can be used to avoid/recover from deadlocks.
Implementation: turn restrictions in escape-VC.
7
E
FVC0
VC0
Escape-VC
Escape-VC

Solution II: Duato’s Theory
Limitations:
Energy and Area overhead of escape VCs.
Additional routing tables/logic for routing within escape-VC.
8
Adds buffers to create a deadlock free escape path that can be used to avoid/recover from deadlocks.
Implementation: turn restrictions in escape-VC.

Other SolutionsSolution III: Flow Control
Restrict injection when no. of empty buffers fall below a threshold
Implementation: Bubble Flow Control [9]
Limitation: Implementation Complexity, Throughput Loss.
Solution IV: Deflection Routing
Assign every flit to some output port even if they get misrouted.
Implementation: BLESS [10], CHIPPER [35]
Limitation: Livelocks, non-minimal routing
9

Comparison of Deadlock Freedom Theories10
Acyclic CDG not Required
No Packet Injection
RestrictionsLivelock
Free
VC cost for Mesh Routing
Minimal Adaptive
Topology Indepen-
dent
Dally
DuatoFlow
ControlDeflection Routing
SPIN
Theory
Metric
1 61 2
2 2
1
1 1
Can we do better ??

OutlineRouting Deadlocks
State of the Art
Dally’s Theory
Duato’s Theory
Flow Control Routing
Deflection Routing
SPIN : Synchronized Progress in Interconnection Networks
Evaluations
Conclusion
11

SPIN : Key Idea12
A
C D
B
Deadlock
E
F
What if: We coordinate the movement of every packet to the next hop at a given time ??
Simultaneous Synchronized
Movement
Simultaneous Synchronized Movement of all deadlocked packets in the loop is called a spin.
spin complete

SPIN : Key Idea13
Simultaneous Synchronized Movement of all deadlocked packets in the loop is called a spin.
Each spin leads to one hop forward movement of all deadlocked packets.
One spin may not resolve the deadlock. If so, spin can be repeated
Deadlock is guaranteed to be resolved in a finite number of spins [proof in paper, Sec. III]

SPIN : Key Idea14
A
C D
BE
F
First spin complete
Second spin complete

SPIN : Key Idea15
A
C
D
B
E
F
Packets E &B exit the loop
Deadlock Resolved

OutlineRouting Deadlocks
State of the Art
SPIN : Synchronized Progress in Interconnection Networks
Key Idea
Implementation Example
Micro-architecture
FAvORS
Evaluations
Conclusion
16

SPIN: Implementation ExampleSPIN is a generic deadlock freedom theory that can have multiple implementations.
We choose a recovery approach as deadlocks are rare scenarios (See Sec. II-F).
Our Implementation:
Detect the Deadlock.
Coordinate a time for spin.
Execute the spin.
17

Implementation Example : Detect Deadlocks
Use counters.
Placed at every node at design time. Optimize by exploiting topology symmetry (See Static Bubble [6]).
If packet does not leave in threshold time (configurable), it indicates a potential deadlock.
Counter expired ? Send probe to verify deadlock.
18

Implementation Example : Probe Msg.19
A
C D
BE
FCounter
Expires at Node 5
Probe
Send Probe
1. Deadlock Detection
2. Coordinating the spin.
3. Executing the spin.
Probe Returns: Deadlock Confirmed

Implementation Example : Probe Msg.Probe is a special message that tracks the buffer dependency.
Probe returns to sender: Cyclic buffer dependence, hence deadlock.
Next, send a move msg. to convey the spin time Upon receiving move msg., router sets its counter to count to spin cyle.
20

Implementation Example : Move Msg.21
A
C D
BE
F
Move
Send Move
Set counter to count to spin cycle
Move returns
1. Deadlock Detection
2. Coordinating the spin.
3. Executing the spin.

Implementation Example : spin22
A
C D
BE
F
Counters expire together in the spin
cycle
1. Deadlock Detection
2. Coordinating the spin.
3. Executing the spin.

Implementation Example : spin23
A
C
D
B
EF1. Deadlock Detection
2. Coordinating the spin.
3. Executing the spin.

Multiple SPIN OptimizationResolving a deadlock may require multiple spins
After spin, router can resume normal operation.
Counter expires again, process repeated.
Optimization: send probe_move after spin is complete.
probe_move checks if deadlock still exists and if so, sets the time for the next spin.
Details in paper (Sec. IV-B).
24

OutlineRouting Deadlocks
State of the Art
SPIN : Synchronized Progress in Interconnection Networks
Key Idea
Implementation Example
Micro-architecture
FAvORS
Evaluations
Conclusion
25

Implementation Micro-architectureNo additional links: Spl. Msgs. use the same links as regular flits.
Spl. Msgs. have higher priority in link usage over regular flits.
Links are anyways idle during deadlocks.
Bufferless Forwarding: Spl. Msgs. are not buffered anywhere (either forwarded or dropped).
Distributed Design: any router can initiate the recovery.
4% area overhead compared to traditional mesh router in 15nm Nangate [42].
26

OutlineRouting Deadlocks
State of the Art
SPIN : Synchronized Progress in Interconnection Networks
Key Idea
Walkthrough Example
Micro-architecture
FAvORS
Evaluations
Conclusion
27

FAvORS Routing AlgorithmSPIN is the first scheme that enables true one-VC fully adaptive deadlock-free routing for any topology.
FAvORS : Fully Adaptive One-vc Routing with SPIN.
Algorithm has two flavors: Minimal Adaptive Non-minimal Adaptive.
Route Selection Metrics: Credit turn-around time Hop Count
More details in paper (Sec. V).
28

OutlineRouting Deadlocks
State of the Art
SPIN : Synchronized Progress in Interconnection Networks
Evaluations
Conclusion
29

Evaluations 30
Simulator gem5 simulator + Garnet 2.0 Network model
Topologies 8x8 Mesh 1024 node Off-chip Dragon-fly
Link Latency
1-cycle Inter-group: 3-cycleIntra-group: 1-cycle
Traffic Synthetic + Multi-threaded (PARSEC)
Synthetic
Network Configuration:

Evaluations : Baselines8x8 Mesh:
31
Design Routing Adaptivity
Minimal Theory Deadlock Freedom Type
West-first Routing Partial Yes Dally Avoidance
Escape-VC Full Yes Duato Avoidance
Static-Bubble [6] Full Yes Flow-Control Recovery
1024 Node Off-chip Dragon-fly:
Design Routing Adaptivity
Minimal Theory Deadlock Freedom Type
UGAL [37] Full No Dally Avoidance

Saturation Throughput 1024-node Off-chip Dragon-fly:
32
0255075
100
0.01 0.03 0.05 0.07 0.09 0.11
Bit-complement
Late
ncy
(cyc
les)
Inj. Rate (flits/node/cycle)
0255075
100
0.01 0.08 0.15 0.22 0.29 0.36Inj. Rate (flits/node/cycle)
Neighbor
Late
ncy
(cyc
les)
UGAL_3VCSPIN
UGAL_3VCDally
FAvORS_NMin_1VCSPIN
Minimal_1VCSPIN
50% higher throughput compared to UGAL_Dally
62% higher throughput compared to
Minimal Routing 1-VC
25% higher throughput compared to UGAL_Dally

Saturation Throughput 33
8x8 On-chip Mesh:
Transpose (3-VC)
Inj. Rate (flits/node/cycle)
Late
ncy
(cyc
les)
West-First_3VCDally
0
25
50
75
100
0.001 0.031 0.061 0.091 0.121
Static_Bubble_3VCFlow-Control
EscapeVC_3VCDuato
FAvORS_Min_3VCSPIN
0
25
50
75
100
0.001 0.031 0.061 0.091 0.121
Transpose (1-VC)
Inj. Rate (flits/node/cycle)
Late
ncy
(cyc
les)
FAvORS_Min_1VCSPIN
West-First_1VCDally
68% higher throughput compared
to West-first 3-VC
8% higher throughput compared
to Escape-VC 3-VC
80% higher throughput compared
to West-First 1-VC
10% higher throughput
compared to Static-Bubble 3-VC

ConclusionDeadlocks are a fundamental problem in Interconnection Networks.
SPIN is a new deadlock freedom theory
Simultaneous packet movement for deadlock recovery
No routing restrictions or escape-VCs required.
Enables true one-VC fully adaptive routing for any topology
Salient Features of our Implementation:
Scalable: Distributed Deadlock Resolution
Plug-n-Play: topology agnostic
68% higher (Mesh) & 62% higher (dragon-fly) saturation throughput.
34

ConclusionPractical Applications:
35
On-Chip Mesh (Intel Xeon Phi, Cavium Thunder X2)
Super-computers Dragon-fly (Cray XC Networks)
Datacenters JellyFish (HP), Fat Tree (Google)
Irregular Topologies Faults (Static Bubble [6])Power-gating (Router Parking[29])
NoC Generators FlexNoc (ARTERIS),Sonics GN (SONICS)
Domain specific Accelerators
Eyeriss [15]
Thank you !!

Back-up36

SPIN : ApplicationsSPIN is a generic deadlock freedom theory
Scalable: distributed deadlock resolution
Plug-n-Play: doesn’t require knowledge of topology
SPIN can thus be used in :
On-chip networks: Mesh (Intel SCC, Tilera Tile64)
Supercomputers: Dragon-fly (Cray XC Networks)
Datacenters: Jellyfish (HP), Fat Tree (Google)
Static & Dynamically Changing Irregular topologies due to faults (Static Bubble [6]) & power-gating (Router Parking [29])
NoC Generators (Opensmart [13]) & Domain specific accelerator (Eyeriss[15])
37

Implementation Example : Probe Msg.38
A
C D
BE
FCounter
Expires at Node 5
Probe
Send Probe
Probe Returns: Deadlock Confirmed

Implementation Example : Move Msg.39
A
C D
BE
F
Move
Send Move
Set counter to count to spin cycle
Cntr
Move returns
1. Counter Expires2. Send Probe3. Send Move4. Counter expires
in spin cycle5. Spin