phastlane: a rapid transit optical routing network · computer systems laboratory cornell...
TRANSCRIPT
-
Phastlane: A Rapid Transit Optical Routing Network
Mark Cianchetti, Joseph Kerekes, and David Albonesi
Computer Systems Laboratory
Cornell University
-
O/E
Translation
E/O
Translation
Optical
Coupler
Photonic System Components
The Interconnect Bottleneck
� Future processors: tens to hundreds of cores
� Dire need for fast and power efficient on-chip interconnect
� Nanophotonics
• Ultra-fast signal propagation
• High bandwidth density
• Low power
10001101111100101010111100001
11011101110011100000110101000
11111001010101000111000111000
11101010101001110101010001110
Electrical Input Data
+ Vinput -
-
Limitations of Nanophotonics
� Lack of fundamental building blocks
• Logic gates
• Memory structures
� Single routing layer
• Power loss of waveguide crossings
Upon formation, unimpeded data
blasted from source to destination
Traditional Approach To Exploiting Photonics
Source
NodeDestination
Node
Direct Bus Path
-
Respecting the Limitations…
� Previous proposals largely bus-based
� Direct photonic links between source and destination
� Data blasted in a single optical transmission
Corona ArchitectureColumbia Architecture FireFly ArchitectureCornell Ring Architecture
Dest Node
Me
sh
Co
lum
n
Electrical Network
Optical Network
-
Overview of Previous Work
Cache Line
Token
Destination Bus
Fully Connected XBAR
Snake XBAR
Corona
Cache Line>> Cache LineCache LineUnit of Data Transfer
Per HopPer HopWDMShared Resource Arb.
Router ChannelsRouter ChannelsNoneShared Resources
Packet SwitchedElectrical SetupSnoopy BusNetwork Operation
MeshTorusRing-busNetwork Topology
PhastlaneColumbiaCornell
RingFeature
-
Phastlane Contributions
� Novel nanophotonic router architecture
� Hybrid optical/electrical packet-switched, mesh network
• Nanophotonics: High-speed, multi-hop packet transfers
• Electrical: Logic and buffering to handle packet conflicts
-
� Minimally impede flow of light from source to destination
• Simplified routing, arbitration
� Dual die configuration
Phastlane Architecture
Transmit: Node 0 to Node 10
Arbitration
Routing
-
Optical Node Operation
Crossbar Resonators
Optical Router
-
Electrical Node Operation
� Blocked packets are buffered locally
• Per port buffers, single processor buffer
� Output multiplexers connect to output port optical transmitters
Electrical Node
-
Simplified Packet Routing
� Packets are dimension-order, source routed
• Portion of packet holds pre-computed routing bits
� Per hop control bits enable near-instant switch traversal
Source Routing
110010111010011100011111011111110001100101010100011
100011100010010100011100001110001110001111100000000
000101000111000111000101010101101010100111000001111
+
Data/Address/Other
Per Hop Packet Control
01000
Straight
R5
01000
Straight
R4
00100
Left
R3
01000
Straight
R2
01000
Straight
10000
Local
Direction
Encoding
R1R6Hop Number
XY Dimension-Order
-
Per Hop Control Bits
� Per hop control groups set switch resonators
• Left, right, straight, local, broadcast
� Frequency translation
• Enables hop oblivious routing
Left
Rig
ht
Stra
ight
Local
Bro
adcast
Control Waveguide
R 3R 4
Empty
01000 Straight
R5
01000 Straight
R4
01000 Straight
R3
00100 Left
R2
01000 Straight
10000 Local
Direction
Encoding
R1R6Hop Number
Router R1 examines
control bits
Frequency shift
-
Per Hop Control Bits Continued
� Control waveguide split into C0 and C1
• WDM limitations
� Frequency and physical translation
• Enables hop oblivious routing
C1
C0C1
C0
-
Competing Packets: Resolution
� Packet collisions avoided by packet blocking
� Collision resolution performed “on the fly”
N
S
W E
Two packets collide
Packet
Received
Collision resolution
-
High-Speed Drop Network
� Drop path gradually formed as packet passes through network
� Packet dropped if downstream buffer full
� Drop signal travels opposite data packet
Source
Drop NodeN
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
N
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
N
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
N
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
N
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
N
EW
D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0i
D1i
C0i
C1i
D00
D1o
C0o
C1o
D2i
D2o
D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o
D00C0oC1o D0 i C0 i C1 i
D0o
C0o
C1o
D0i
C0i
C1i
D00
C0o
C1o
D0i
C0i
C1i
D00 C0o C1oD0 iC0 iC1 i
-
Router Design Space Exploration
-
Evaluation Methodology
� 64 node, 8x8 mesh network topology
� Low latency, high bandwidth electrical network baseline
• Multi-port receive for destination packets
� Packet size: 80 bytes, single flit
� Synthetic and Splash benchmarks
Baseline Configuration
XBAR
Baseline Node
-
Synthetic Benchmark Results
0
10
20
30
40
50
0 0.05 0.1 0.15 0.2 0.25 0.3
Offered Traffic
Late
ncy .
0
10
20
30
40
50
0 0.05 0.1 0.15 0.2
Offered Traffic
Late
ncy .
0
10
20
30
40
50
0 0.05 0.1 0.15 0.2 0.25 0.3
Offered Traffic
Late
ncy .
0
10
20
30
40
50
0 0.05 0.1 0.15 0.2
Offered Traffic
Late
ncy .
8X6X
5X 6X
Bit Comp Bit Rev
Shuffle Transpose
-
0
1
2
3
4
5
Bar
nes
Cho
lesk
y
FFT Lu
Oce
an
Rad
ixR
aytra
ceW
ater
NSq
Wat
ersp
atia
l
FMM
Ne
two
rk S
pe
ed
up
Electrical
Optical
Optical 64 Entry Buffers
Optical Inf Buffers
Splash Performance Analysis
� 2X speedup across all benchmarks
-
Splash Power Analysis
� 80% reduction in power across all benchmarks
0
0.25
0.5
0.75
1
Barn
esC
hole
sky
FFT Lu
Oce
an
Rad
ixR
aytra
ceW
ater
NSq
Wat
ersp
atia
l
FMM
Re
lati
ve
Po
we
r
Electrical
Optical
Optical 64 Entry Buffers
-
Conclusions
� Novel nanophotonic router architecture
� Packet-switched, hybrid optical/electrical mesh network
� Up to 8X performance improvement for synthetic workloads
� Up to 4X performance improvement, 80% power reduction, for Splash
� Future work
• Lower power broadcast scheme
• Improved allocation and flow control