Post on 12-Jul-2018
Reconfigurable Computing
On-line communication strategies
Chapter 7
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design
On-line connection - Motivation
• Routing-conscious temporal placement algorithms consider the distance among components during placement.
• However, they do not consider the implementation of the dynamic connection mechanism required for communication among components.
• In this section, we investigate existing approaches for solving the communication problem between components that are dynamically placed on and removed from the device, namely:
  – Bus-based approaches
  – Circuit routing
  – Network-on-Chip (NoC) approaches
Bus-oriented communication
• Many components connected at fixed locations
• One arbiter for bus management
• SoC (System on Chip): buses can be used to connect different modules
  – ARM AMBA: Advanced High-performance Bus (AHB), Advanced Peripheral Bus (APB)
  – IBM CoreConnect: Processor Local Bus (PLB), On-chip Peripheral Bus (OPB)
  – Silicore Wishbone
[Figure: modules Mod1 to Mod4 and an arbiter on a shared bus]
Bus-oriented communication
Using a standard bus arbiter (Becker)
• The device is divided into slots
• Each task must be placed in a slot
• Each component implements the bus transaction
• Each component can be a master
• An arbiter manages the bus assignment
[Figure: OS frame with decompressor, control, ICAP, bus macro, master module and controller; Modules 0 to 4, each attached through a ModCom interface. Source: ITIV, Uni Karlsruhe (TH)]
Communication via the OS
Encapsulating the bus transaction in a wrapper (Platzner, Walder)
• Divide the device into slots
• Each task must be placed in a given slot
• A slot is enveloped in a wrapper that hides the bus-transaction process
• Communication takes place through a fixed module called the OS
  – Each module sends a message by writing it into its send buffer
  – The OS copies messages from the send buffers to the receive buffers of the destination modules
  – The receiving modules read their messages from their receive buffers
[Figure: OS frame and four task slots connected by Inter Frame Communication Channels (IFCC)]
Communication via the OS
• Communication with off-chip modules is also done via the OS
[Figure: OS frame]
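The buffer-based exchange through the OS can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the implementation from the slides: the class names `Slot` and `OSFrame` and the method `transfer` are hypothetical, and each buffer is modelled as a simple unbounded queue.

```python
from collections import deque


class Slot:
    """One task slot with a send buffer and a receive buffer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.send_buffer = deque()
        self.receive_buffer = deque()

    def send(self, dest, payload):
        # A module only writes into its own send buffer.
        self.send_buffer.append((dest, payload))

    def receive(self):
        # A module only reads from its own receive buffer.
        return self.receive_buffer.popleft() if self.receive_buffer else None


class OSFrame:
    """Fixed OS module that moves messages between slots (hypothetical model)."""

    def __init__(self, slots):
        self.slots = {slot.name: slot for slot in slots}

    def transfer(self):
        """Copy all pending messages to the receive buffers of their destinations."""
        moved = 0
        for slot in self.slots.values():
            while slot.send_buffer:
                dest, payload = slot.send_buffer.popleft()
                self.slots[dest].receive_buffer.append((slot.name, payload))
                moved += 1
        return moved


a, b = Slot("A"), Slot("B")
os_frame = OSFrame([a, b])
a.send("B", "hello")
pending = b.receive()           # nothing delivered before the OS runs
moved = os_frame.transfer()
delivered = b.receive()         # message now tagged with its sender
```

The point of the design is that modules never address each other directly: every message passes through the OS, which is also why off-chip communication can be handled by the same mechanism.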
Dynamic Networks – circuit routing
• Architecture:
  – Set of processing elements (PEs)
  – Communication signals are set between two PEs using a set of switches on a path from the source to the destination
• Advantage:
  – Direct communication; no need to process packets
• Drawbacks:
  – Computing a route is expensive and difficult to do on-line
  – Routed lines create a large amount of prohibited area
• The prohibited-area problem can be overcome by using an extra layer exclusively for circuit routing
[Figure: routed connections and the resulting prohibited area]
The reconfigurable multiple bus (RMB) approach
• A set of n processing elements and k segmented buses
• Crosspoints (switches) are used to set the connection between the segments at run-time
[Figure: PE 1 to PE 5 connected through switches]
The reconfigurable multiple bus (RMB) approach
• The sender always initiates a communication request and terminates (frees) an established communication path
• Each communication path is granted until the end of the communication
[Figure: OS frame and PE 1 to PE 5]
The reconfigurable multiple bus (RMB) approach
• On a column-wise reconfigurable device, the RMB provides a modular communication infrastructure
• All the switches in one column are grouped together
• The separation of horizontal reconfigurable regions is done via bus macros
[Figure: OS frame and PE 1 to PE 5 separated by bus macros]
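The request and release behaviour of the RMB can be illustrated with a small Python model. It is a sketch under assumptions that the slides do not spell out: PEs sit in a row, each of the k buses has one segment between neighbouring PEs, and a crosspoint may switch a connection to any free bus at every PE. The class name `RMB` and its methods are hypothetical.

```python
class RMB:
    """Toy model of n PEs in a row connected by k segmented buses."""

    def __init__(self, n_pes, k_buses):
        # owner[s][b] is the connection id using bus b on the segment
        # between PE s and PE s+1, or None if that segment is free.
        self.owner = [[None] * k_buses for _ in range(n_pes - 1)]
        self.next_id = 0

    def request(self, src, dst):
        """Try to set up a path; return a connection id or None if blocked."""
        lo, hi = sorted((src, dst))
        chosen = []
        for seg in range(lo, hi):
            free = [b for b, o in enumerate(self.owner[seg]) if o is None]
            if not free:
                return None          # no free bus on this segment
            chosen.append((seg, free[0]))
        conn = self.next_id
        self.next_id += 1
        for seg, bus in chosen:
            self.owner[seg][bus] = conn
        return conn

    def release(self, conn):
        """The sender frees every segment held by its connection."""
        for segment in self.owner:
            for bus, o in enumerate(segment):
                if o == conn:
                    segment[bus] = None


rmb = RMB(n_pes=5, k_buses=2)
c1 = rmb.request(0, 4)      # occupies one bus on all four segments
c2 = rmb.request(1, 3)      # occupies the second bus on segments 1 and 2
c3 = rmb.request(2, 3)      # segment 2 is full, so the request is refused
rmb.release(c2)
c4 = rmb.request(2, 3)      # succeeds after the release
```

Because a path is held until the sender releases it, the number of parallel buses k bounds how many connections may cross any one segment at the same time.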
Algorithms for Reconfiguration
[Figure sequence: task graph with tasks T1 to T9; tasks T1 to T3 mapped to modules M1 to M3, which are placed on the FPGA and connected through the RMB]
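Algorithms for Reconfiguration
[Figure: three arrangements of modules M1, M2 and M3 on the FPGA, connected through the RMB]
Let σ assign each of the n modules a position in {1, …, n}, and let E be the set of connections (i, j) between modules.
• Minimum Cutwidth Linear Arrangement (MCLA):
  $\min_{\sigma} \max_{k \in \{1,\dots,n\}} \left| \{ (i,j) \in E : \sigma(i) \le k \le \sigma(j) \} \right|$
• Minimum Bandwidth (MBW):
  $\min_{\sigma} \max_{(i,j) \in E} \left| \sigma(i) - \sigma(j) \right|$
• Optimal Linear Arrangement (OLA):
  $\min_{\sigma} \sum_{(i,j) \in E} \left| \sigma(i) - \sigma(j) \right|$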
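The three objectives can be compared on a small example by enumerating all arrangements. The sketch below is only an illustration: it uses brute force rather than the Integer Linear Program mentioned in the slides, treats connections as undirected, and all function names and the example graph (a path a-b-c-d) are hypothetical. Enumeration is feasible only for a handful of modules.

```python
from itertools import permutations


def bandwidth(edges, sigma):
    """MBW cost: length of the longest connection."""
    return max(abs(sigma[i] - sigma[j]) for i, j in edges)


def cutwidth(edges, sigma):
    """MCLA cost: most connections touching or crossing any position k."""
    n = len(sigma)
    return max(
        sum(
            1
            for i, j in edges
            if min(sigma[i], sigma[j]) <= k <= max(sigma[i], sigma[j])
        )
        for k in range(1, n + 1)
    )


def total_length(edges, sigma):
    """OLA cost: sum of all connection lengths."""
    return sum(abs(sigma[i] - sigma[j]) for i, j in edges)


def best_arrangement(nodes, edges, cost):
    """Return (cost, order) of a cheapest arrangement by exhaustive search."""
    best = None
    for order in permutations(nodes):
        sigma = {v: pos for pos, v in enumerate(order, start=1)}
        c = cost(edges, sigma)
        if best is None or c < best[0]:
            best = (c, order)
    return best


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]   # hypothetical path graph
mbw = best_arrangement(nodes, edges, bandwidth)[0]
mcla = best_arrangement(nodes, edges, cutwidth)[0]
ola = best_arrangement(nodes, edges, total_length)[0]
```

For a slot-based device, the cutwidth corresponds to the number of parallel bus segments needed at the busiest slot, and the bandwidth to the length of the longest connection.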
Video game: Module Relocation
[Figure: communication graph of the modules User Input, Racket Position, Ball Position and Visualization, with edge labels 4, 20, 20 and 38]
Video game: Module Relocation
[Figure: communication graph (edge labels 4, 20, 20 and 38) and the four modules User Input, Racket Position, Ball Position and Visualization in slots, each with a CP]
Task:
• Place modules such that the least number of bus segments is required
Solution:
• Integer Linear Program (FPL'06)
Video game: Module Relocation
[Figure: communication graph and two slot arrangements of the four modules; annotation: 58 parallel segments]
Video game: Module Relocation
[Figure: slot arrangement of the four modules; annotations: length of longest connection is 3, 58 parallel segments]
Task:
• Place modules such that, for a given maximal number of parallel bus segments, the length of the longest connection is minimized
Solution:
• Integer Linear Program (FPL'06)
Video game: Module Relocation
[Figures: two slot arrangements of the four modules; annotation on both: length of longest connection is 2]
Video game: Erlangen Slot Machine (ESM)
Implementation
[Figure: modules Racket Position, User Input, Display and Ball Position attached to CP0 to CP3]
References
[1] S. Fekete, J. van der Veen, M. Majer, J. Teich. Minimizing Communication Costs for Reconfigurable Slot Modules. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), Madrid, Spain, August 28-30, 2006.
[2] A. Ahmadinia, C. Bobda, J. Ding, M. Majer, J. Teich, J. van der Veen, S. Fekete. A Practical Approach for Circuit Routing on Dynamic Reconfigurable Devices. In Proceedings of the IEEE International Workshop on Rapid System Prototyping (RSP), Montreal, Canada, pp. 84-90, June 8-10, 2005.
[3] J. Angermeier, D. Göhringer, M. Majer, J. Teich. The Erlangen Slot Machine: A flexible FPGA-platform for partially reconfigurable applications at run-time. Tutorial, 20th International Conference on Architecture of Computing Systems (ARCS 2007), Springer LNCS series, Zurich, Switzerland, March 12-15, 2007.
[4] M. Majer, J. Teich, A. Ahmadinia, C. Bobda. The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-Based Computer. Journal of VLSI Signal Processing Systems, Springer, vol. 46(2), March 2007.
[5] J. Angermeier, D. Göhringer, M. Majer, J. Teich, S. Fekete, J. van der Veen. The Erlangen Slot Machine - A Platform for Interdisciplinary Research in Reconfigurable Computing. it - Information Technology, Heft 3/2007, Oldenbourg, München, 2007.
[6] A. Ahmadinia, C. Bobda, S. Fekete, J. Teich, J. van der Veen. Optimal free-space management and routing-conscious dynamic placement for reconfigurable computing. IEEE Transactions on Computers, volume 56, number 3, 2007.
NoC (Network on Chip) based communication
• A Network on Chip consists of
  – A set of processing elements (PEs)
  – A set of network elements, also called routers
• Each PE is connected to a network element
• Each PE is assigned the same address as its corresponding network element
• Communication is packet-based
  – Each packet contains the destination address and some data
• Routers forward packets in the right direction according to the destination address
• A router contains little logic; it may have some buffers for storing packets in case of high traffic
[Figure: NoC with routers]
Dynamic Networks
• Limitations of fixed NoC communication
  – Fixed positions for modules
  – Larger modules must be split
  – Packet-based communication inside a component is not efficient
  – Direct communication must be used on a module boundary
• We seek a network infrastructure which
  – allows modules placed at a given location to use all the resources in their area
  – changes according to the placement of modules on the device
  – ensures that each component can always access other components and pins for communication
Dynamic Networks – DyNoC (Dynamic NoC)
• Architecture: like the NoC architecture
  – Set of processing elements
  – Set of network elements implementing routers in their basic configuration
  – Each PE is connected to a network element
  – Direct communication among neighbouring PEs
  – Communication is packet-based
  – Each packet contains the destination address and some data
  – The ratio of router size to module size must be kept small
Dynamic Networks – DyNoC
• Dynamics in the NoC
  – Each module is represented as a rectangular box encapsulating a given function
  – All resources (routers and PEs) in the placement area of a module are assigned to the module
  – Therefore, the network logic should be flexible enough to be used as logic in a given module
  – Upon completion, each module restores its routers to their basic configuration
  – Except for one selected router, all the routers in the area of a component are no longer accessible from the network
  – Each placed component accesses the network using the router attached to its north-east (NE) PE
  – The network varies with the temporal placement of modules on the device
Dynamic Networks – DyNoC – Reachability
• Module and pin reachability: a module (pin) is reachable iff all the messages sent to this module (pin) can reach their destination
• We define the component graph G = (V, E) as follows:
  – V is the set of components and pins
  – An edge (u, v) belongs to E iff a path exists between u and v
• If G is connected, then all components and pins are reachable
• Guaranteeing this increases the architectural requirements
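Reachability can be checked in software with a breadth-first search over the routers that are still active. The sketch below is an illustration under assumptions not taken from the slides: routers form a 4-neighbour mesh with bidirectional links, `active` is the set of router coordinates not covered by a component, and `access_points` are the routers through which components and pins access the network. All names are hypothetical.

```python
from collections import deque


def reachable(active, start):
    """Return all active routers reachable from start over mesh links."""
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in active and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def all_reachable(active, access_points):
    """True if every access point can be reached from every other one."""
    points = list(access_points)
    if not points:
        return True
    # Links are bidirectional, so one search from any access point suffices.
    return set(points) <= reachable(active, points[0])


mesh = {(x, y) for x in range(3) for y in range(3)}

# A component covering only the centre router leaves a ring around it.
ring = mesh - {(1, 1)}
ok = all_reachable(ring, [(0, 0), (2, 2)])

# A component covering the whole middle column cuts the mesh in two.
split = mesh - {(1, 0), (1, 1), (1, 2)}
cut = all_reachable(split, [(0, 0), (2, 2)])
```

The second case shows why extra architectural guarantees are needed: without free routers around a component, a placement can disconnect the network.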
Dynamic Networks – DyNoC – Reachability
• Additional architectural requirements
  – A ring of network elements must be available around the chip
  – The PEs at the chip boundary must be connected to the routers at the chip boundary
  – Each placed component accesses the network using the router associated with its north-east (NE) PE
  – Only PEs are allowed to be at the boundary of a component
Dynamic Networks – DyNoC – Reachability
Theorem (Bobda et al.): If each component is synthesized in such a way that it is internally surrounded only by processing elements, then each placement on the reconfigurable device results in a strongly connected component graph.
Proof:
• Assume that the corresponding component graph is not strongly connected. Then
  1) at least two components abut, or
  2) one component abuts the device boundary.
• Consider, for example, case 1):
  – Either the two components overlap,
  – or one component uses some routers on its boundary.
  The first is not a valid placement, and the second contradicts the assumption that the component is internally surrounded only by PEs.
[Figure: component A internally surrounded by PEs; blocked routers marked X]
Dynamic Networks – DyNoC – Reachability
• Example of a feasible placement
Dynamic Networks – DyNoC – Routing
• Routing in a mesh without obstacles: the XY router
  – Fast and efficient
  – Decisions are made locally
  – 5 input and 5 output channels
  – Input FIFO on each channel
• The router compares its address to the destination address of a packet
  – If X-router < X-packet, the packet is sent east
  – If X-router > X-packet, the packet is sent west
  – If X-router = X-packet and Y-router < Y-packet, the packet is sent north
  – If X-router = X-packet and Y-router > Y-packet, the packet is sent south
  – If X-router = X-packet and Y-router = Y-packet, the packet is copied to the local FIFO
[Figure: x and y axes of the mesh]
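The XY decision rule translates directly into a small function. The sketch below is a software illustration, not the hardware router: addresses are assumed to be (x, y) tuples with x growing to the east and y growing to the north, and the names `xy_route`, `STEP` and `trace` are hypothetical.

```python
def xy_route(router, dest):
    """Return the output channel an XY router picks for a packet."""
    rx, ry = router
    dx, dy = dest
    if rx < dx:
        return "east"
    if rx > dx:
        return "west"
    # X coordinates match, so only the Y direction is left to correct.
    if ry < dy:
        return "north"
    if ry > dy:
        return "south"
    return "local"


# Coordinate change for each output channel (assumed orientation).
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def trace(src, dest):
    """Follow a packet hop by hop through an obstacle-free mesh."""
    path = [src]
    pos = src
    while True:
        channel = xy_route(pos, dest)
        if channel == "local":
            return path
        sx, sy = STEP[channel]
        pos = (pos[0] + sx, pos[1] + sy)
        path.append(pos)


path = trace((0, 0), (2, 1))
```

Each router needs only its own address and the destination address in the packet, which is why the decision is purely local. The packet first travels along X and then along Y.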
Dynamic Networks – DyNoC – Routing
• The dynamic placement of components creates obstacles in the network
• The routing must be able to recognize obstacles and to route packets around components
• Vertical and horizontal obstacles are treated differently
[Figure: obstacles in the network]
Dynamic Networks – DyNoC – Routing – S-XY
• Dealing with obstacles
  – XY router
  – Additionally: an activate signal to the neighbours. If the router is available, this control signal is high; otherwise, it is low
• Component-surrounding strategies are required
• The S-XY (Surround XY) router operates in three modes
  – N-XY: normal operating mode. Packets are routed according to the XY strategy
  – SH-XY: the router enters this mode when a horizontal obstacle is found
  – SV-XY: the router enters this mode when a vertical obstacle is found
Dynamic Networks – DyNoC – Routing – S-XY
• The SH-XY mode: surrounding obstacles in the horizontal direction
• Stamp packets to avoid a "ping-pong" game
[Figure: obstacle component between router and destination, with Routing Path 1 and Routing Path 2; condition labels: YDest > YRouter; XDest < XRouter; XDest = XRouter, YDest < YRouter; XDest = XRouter, YDest = YRouter]
Implementation
• Virtex II 6000
  – 4x4 DyNoC
  – 7% of FPGA usage
  – Router latency: 2.5 ns
  – 32-bit data bus and 6x4x32-bit FIFO per router
Dynamic Networks – DyNoC – Routing – S-XY
• Surrounding obstacles in the vertical direction
• Place a stamp on packets to avoid a "ping-pong" game
[Figure: obstacle component and destination component with Routing Path 1 and Routing Path 2; a ping-pong game is indicated]
Dynamic Networks – DyNoC – Routing – S-XY
Theorem (Bobda et al.): The S-XY algorithm is deadlock-free, i.e., each packet reaches its destination after a finite number of steps.
Proof: Exercise. Prove that
• Each component is reachable, i.e., a path is always available from source to destination
• A packet is never blocked in the network (Theorem 1)
• Since a packet can never be blocked, it can fail to reach its destination only if it loops around a component forever. Prove that this never happens!
Dynamic Networks – DyNoC – Routing – S-XY
• The decision to go left/right or up/down is taken arbitrarily
• In the worst case, the path can be very long
• To avoid this, consider guiding the router by the components
[Figure: guided routing from source S to destination D around components C1 to C4; routers carry 2-bit labels 00 and 01]