systolic array
![Page 1: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/1.jpg)
Fault Tolerance in Systolic Arrays
Presented on : 09 March 2012
![Page 2: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/2.jpg)
Presentation Overview
• Systolic arrays
– Introduction
– Structures
– Matrix Multiplication
– Applications
• Fault Tolerance in Systolic Arrays
– Hardware Schemes
– Software/Algorithm based
– Reconfigurable SA
![Page 3: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/3.jpg)
What is Systolic Computing?
A set of simple processing elements with regular, local connections that takes external inputs and processes them in a predetermined, pipelined manner.
![Page 4: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/4.jpg)
• Systolic computers pump data through the array of processing elements
• The architectures are not general but tied to specific algorithms
• Systolic computers show both pipelining and parallel computation
Figure – Generalization of pipelined array architecture: a memory attached to a single PE, and a memory attached to a chain of PEs (PE ----- PE)
![Page 5: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/5.jpg)
Functions of a Cell in a Systolic System
• Systolic systems consist of an array of PEs (Processing Elements)
– the processors are called cells,
– each cell is connected to a small number of nearest neighbours in a mesh-like topology.
• Each cell performs a sequence of operations on the data that flows between cells.
• Generally the operations are the same in each cell.
• Each cell performs one operation, or a small number of operations, on a data item and then passes it to its neighbour.
• Systolic arrays compute in “lock-step”, with each cell alternating between compute and communicate phases.
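The lock-step behaviour of the cells can be illustrated with a small simulation. The sketch below is only an assumption-laden example, not something taken from the slides: it uses a linear chain of identical cells that evaluates a polynomial by Horner's rule, and the names `systolic_horner`, `coeffs` and `xs` are hypothetical. Each tick has a communicate phase (every cell latches its left neighbour's output) followed by a compute phase (every cell applies the same operation).

```python
def systolic_horner(coeffs, xs):
    """Evaluate a polynomial at every x in xs on a simulated linear array.

    coeffs lists the coefficients from the highest degree down; there is one
    cell per coefficient. Each cell receives (x, acc) from its left neighbour,
    computes acc * x + c, and hands (x, new_acc) to its right neighbour.
    """
    n = len(coeffs)
    regs = [None] * n                # output register of each cell
    results = []
    stream = list(xs) + [None] * n   # trailing bubbles flush the pipeline
    for item in stream:
        # Communicate phase: every cell latches its left neighbour's output;
        # the first cell latches the next external input (or a bubble).
        inputs = [None if item is None else (item, 0)] + regs[:-1]
        # Compute phase: all cells perform the same operation in lock-step.
        regs = [
            None if inp is None else (inp[0], inp[1] * inp[0] + c)
            for inp, c in zip(inputs, coeffs)
        ]
        # The last cell pipes finished results back out to the host.
        if regs[-1] is not None:
            results.append(regs[-1][1])
    return results


# 2x^2 + 1 evaluated at x = 0, 1, 3
print(systolic_horner([2, 0, 1], [0, 1, 3]))  # [1, 3, 19]
```

Once the pipeline is full, one result leaves the array per tick even though each input passes through every cell, which is the combination of pipelining and parallelism described above.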
![Page 6: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/6.jpg)
SIMD Array Vs Systolic Array [1]
• PEs under supervision of one control unit
• All PEs receive same instruction broadcast by control unit
• PEs operate on different data sets from distinct data streams.
Figure – SIMD array: processing units on a local interconnection network, attached to a single control unit by a data bus and a control bus
![Page 7: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/7.jpg)
Systolic Array.
• SIMD array usually loads data into its local memories before starting the computation.
• Systolic arrays usually pipe data from an outside host and also pipe the results back to the host.
Figure – Systolic array: processing units on a local interconnection network, each with its own control unit. Ref [1] Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf
![Page 8: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/8.jpg)
Variations of Systolic Arrays
• Systolic arrays can be built with variations in:
– Connection topology
  • 2D meshes
  • Hypercubes
– Processor capability, ranging through:
  • Trivial: just an ALU
  • ALU with several registers
  • Simple CPU: registers, runs its own program
  • Powerful CPU: local memory as well
![Page 9: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/9.jpg)
Typical Structures of a Systolic Architecture
• Linear array with 1D I/O.
• Linear array with 2D I/O.
Figure – 1D Linear Array and 2D Linear Array
![Page 10: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/10.jpg)
Matrix Multiplication [2]
• Consider multiplying a 3×2 matrix by a 2×1 matrix:
![Page 11: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/11.jpg)
Systolic Arrays [2]
![Page 12: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/12.jpg)
Figure – Snapshots of the array at time steps T0 to T7: Y values go left, X values go right, A values fan in. Ref [2] Jason HandUber, Systolic Arrays, February 12, 2003, http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
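A systolic matrix product can be simulated in a few lines. The sketch below does not reproduce the dataflow of the figure (Y left, X right, A fanning in); it assumes the simpler and widely used output-stationary mesh instead, in which rows of A enter from the left edge, columns of B enter from the top edge, both are skewed by one cycle per row or column, and each PE accumulates one element of the product. The function name `systolic_matmul` and the sample values are hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on a simulated n x m mesh of PEs."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]        # one accumulator per PE
    a_reg = [[None] * m for _ in range(n)]   # A value each PE passes right
    b_reg = [[None] * m for _ in range(n)]   # B value each PE passes down
    for t in range(n + m + k - 2):           # cycles until the last PE is done
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left-edge PEs read the skewed input stream of row i of A;
                # other PEs read what their left neighbour held last cycle.
                if j == 0:
                    a_in = A[i][t - i] if 0 <= t - i < k else None
                else:
                    a_in = a_reg[i][j - 1]
                # Top-edge PEs read the skewed input stream of column j of B;
                # other PEs read what their upper neighbour held last cycle.
                if i == 0:
                    b_in = B[t - j][j] if 0 <= t - j < k else None
                else:
                    b_in = b_reg[i - 1][j]
                # Multiply-accumulate when both operands have arrived.
                if a_in is not None and b_in is not None:
                    acc[i][j] += a_in * b_in
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    return acc


# A 3x2 matrix times a 2x1 matrix, as on the slide (values are made up).
print(systolic_matmul([[1, 2], [3, 4], [5, 6]], [[7], [8]]))  # [[23], [53], [83]]
```

With this skew, PE (i, j) sees A[i][s] and B[s][j] in the same cycle t = s + i + j, so no PE ever needs data from anywhere but its immediate neighbours.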
![Page 13: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/13.jpg)
Structures of a Systolic Architecture [1]
• Bi-directional two-dimensional network
![Page 14: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/14.jpg)
• Planar array with perimeter I/O. This configuration allows I/O only through its boundary cells.
• Focal Plane array with 3D I/O. This configuration allows I/O to each systolic cell.
![Page 15: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/15.jpg)
Figure – Ref [3] Jop Sibeyn, Systolic Matrix Product, http://users.informatik.uni-halle.de/~jopsi/dpar03/chap3.shtml
![Page 16: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/16.jpg)
Figure – Ref [4] Shaaban, Systolic Architectures, http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf
![Page 17: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/17.jpg)
![Page 18: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/18.jpg)
![Page 19: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/19.jpg)
![Page 20: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/20.jpg)
![Page 21: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/21.jpg)
![Page 22: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/22.jpg)
![Page 23: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/23.jpg)
Structures of a Systolic Architecture
![Page 24: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/24.jpg)
Structures of a Systolic Architecture [1]
• Hexagonal network
![Page 25: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/25.jpg)
Structures of a Systolic Architecture
![Page 26: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/26.jpg)
Structures of a Systolic Architecture
• trees
![Page 27: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/27.jpg)
Applications Of Systolic Arrays
• Matrix inversion and decomposition
• Solution of difference and differential equations
• Linear programming
• Sorting and searching
• Polynomial evaluation
• Convolution
• Matrix multiplication
• Image processing
• Image recognition
• Computational geometry
• CAD
• Systolic lattice filters used for speech and seismic signal processing
• Artificial neural networks
• Robotics
• Equation solving
• Combinatorial problems
![Page 28: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/28.jpg)
Features of Systolic Arrays
• Synchrony - data is rhythmically computed (timed by a global clock) and passed through the network.
• Modularity - the array (finite or infinite) consists of modular processing units.
• Regularity - the processing units are interconnected homogeneously.
• Spatial Locality - cells have a local communication interconnection.
• Temporal Locality - signals transmitted from one cell to another incur at least one unit of time delay.
• Pipelinability - the array can achieve high speed through pipelining.
![Page 29: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/29.jpg)
Systolic Disadvantages
• Complicated
– Both in hardware and software.
– In fact, entire volumes exist outlining systolic array verification.
• Expensive in comparison to uni-processor systems, although much faster.
• A systolic array is used as an attached array processor, integrated into an existing host as a back-end processor
– it receives data and outputs results through the attached host computer.
![Page 30: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/30.jpg)
Fault Tolerance
• One-for-One Redundancy
– each PE of the SA has a redundant (standby) PE
– the standby PE monitors the active one at all times
– it becomes active if the active PE fails
– it has to keep itself synchronized with the active unit's operations
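The point of keeping the standby synchronized in one-for-one redundancy can be shown with a toy model. This is a minimal sketch under stated assumptions: the PE is reduced to a running-sum accumulator, the failure signal `primary_failed` is supplied from outside (how the standby actually detects the failure is not modelled), and the class name `StandbyPair` is hypothetical.

```python
class StandbyPair:
    """An active PE and a hot standby that shadows its state every step."""

    def __init__(self):
        self.state = {"primary": 0, "standby": 0}
        self.active = "primary"

    def step(self, x, primary_failed=False):
        # Switch-over: the standby becomes active once the primary fails.
        if primary_failed and self.active == "primary":
            self.active = "standby"
        for name in self.state:
            # A failed primary is out of service and no longer updated.
            if name == "primary" and self.active != "primary":
                continue
            # Both healthy units apply the same operation to the same input,
            # which is what keeps the standby synchronized.
            self.state[name] += x
        return self.state[self.active]


pair = StandbyPair()
print(pair.step(1))                       # 1
print(pair.step(2))                       # 3
print(pair.step(3, primary_failed=True))  # 6, served by the standby
print(pair.step(4))                       # 10
```

Because the standby has processed every earlier input, the output sequence continues without a gap after the switch-over; an unsynchronized spare would have to rebuild the accumulated state first.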
![Page 31: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/31.jpg)
Fault Tolerance
• N + X redundancy
– consists of N+X PEs, where X is typically much smaller than N.
– whenever any of the N modules fails, one of the X modules takes over its functions.
– health monitoring of the N units by the X units at all times is not practical, so a higher-level module monitors the health of the N units.
– if one of the N units fails, the higher-level module selects one of the X units to replace it.
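The role of the higher-level module in N + X redundancy can be sketched as a small bookkeeping class. Everything here is an illustrative assumption: units are identified by integers, the first N are initially active, the remaining X are idle spares, and the names `SparePool` and `report_failure` are hypothetical.

```python
class SparePool:
    """Higher-level module that tracks N active units and X spares."""

    def __init__(self, n, x):
        self.assignment = list(range(n))     # slot -> id of the unit serving it
        self.spares = list(range(n, n + x))  # ids of idle spare units

    def report_failure(self, unit):
        """Replace a failed active unit with the next spare; return its id."""
        slot = self.assignment.index(unit)   # ValueError if unit is not active
        if not self.spares:
            raise RuntimeError("no spare unit left")
        self.assignment[slot] = self.spares.pop(0)
        return self.assignment[slot]


pool = SparePool(n=4, x=2)
print(pool.report_failure(2))  # 4
print(pool.assignment)         # [0, 1, 4, 3]
```

The array keeps full capacity until more than X units have failed, at which point `report_failure` raises instead of silently running short.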
![Page 32: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/32.jpg)
Fault Tolerance
• Load Sharing
– all the PEs that are equipped to perform the SA function share the load.
– a higher-level module performs load distribution and maintains the health status of the PEs.
– if one load-sharing PE fails, the higher-level module starts distributing the load among the rest of the units.
– performance degrades gracefully with hardware failure.
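Graceful degradation under load sharing can be made concrete with a toy distributor. This sketch assumes a simple round-robin policy, which the slides do not specify; the function name `distribute` and the PE labels are hypothetical.

```python
def distribute(tasks, healthy_pes):
    """Round-robin the tasks over the PEs currently marked healthy."""
    if not healthy_pes:
        raise RuntimeError("no healthy PE left")
    shares = {pe: [] for pe in healthy_pes}
    for i, task in enumerate(tasks):
        shares[healthy_pes[i % len(healthy_pes)]].append(task)
    return shares


tasks = list(range(12))
before = distribute(tasks, ["PE0", "PE1", "PE2", "PE3"])
after = distribute(tasks, ["PE0", "PE1", "PE3"])  # PE2 has failed
print(max(len(v) for v in before.values()))  # 3 tasks per PE
print(max(len(v) for v in after.values()))   # 4 tasks per PE
```

All twelve tasks are still served after the failure; each surviving PE simply carries a larger share, so throughput drops rather than the function being lost.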
![Page 33: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/33.jpg)
SA with N + 1 Redundancy [5]
Regular SA (N=4)
• N PEs
• N+1 interconnections
SA with N + 1 redundancy (N=4)
• N+1 PEs
• N mux
• N demux
• 2N+1 interconnections
Figures – Ref [5] I. N. Tselepis and M. P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
![Page 34: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/34.jpg)
Three Versions of the Computation Structure
• An SA with pipeline period α = 3 can perform an original algorithm and two redundant algorithms concurrently.
• Redundant computations can be performed by the idle PEs at idle clock cycles.
• Redundancies are introduced at the computational level by deriving three equivalent algorithms with disjoint index spaces.
![Page 35: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/35.jpg)
Re-computing with Shifted Operands [6]
Figure – Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
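The idea behind re-computing with shifted operands can be sketched for a single adder. The fault model below is a deliberate simplification and an assumption of this example: one output bit of the adder is stuck at 0. The operation is run twice on the same hardware, once normally and once with both operands shifted left, so the faulty bit position lands on a different bit of the logical result and the two answers disagree. The names `make_adder` and `reso_add` are hypothetical, and operands are assumed to be non-negative integers.

```python
def make_adder(stuck_at_zero_bit=None):
    """Return an adder; optionally one output bit is permanently stuck at 0."""
    def add(a, b):
        s = a + b
        if stuck_at_zero_bit is not None:
            s &= ~(1 << stuck_at_zero_bit)   # hypothetical stuck-at-0 fault
        return s
    return add


def reso_add(add, a, b, shift=1):
    """Add twice on the same unit and report whether the results agree."""
    first = add(a, b)                                # normal pass
    second = add(a << shift, b << shift) >> shift    # shifted pass, shifted back
    return first, first == second


print(reso_add(make_adder(), 5, 3))                     # (8, True)
print(reso_add(make_adder(stuck_at_zero_bit=3), 5, 3))  # (0, False)
```

In the faulty case the first pass loses bit 3 of the sum, while the shifted pass computes 16 (where bit 3 is already 0) and shifts back to 8, so the mismatch exposes the fault. The price is time rather than extra hardware: every operation is performed twice.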
![Page 36: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/36.jpg)
Triple Modular Redundancy (TMR) [5]
Three processors all work on the same problem and compare results
• 3N PEs
• 2 Voters
• 3(N+1) interconnections
Figures – Ref [5] I. N. Tselepis and M. P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
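The voting step of TMR is simple enough to sketch directly. This is a generic majority voter, not the specific voter circuit of Ref [5]; the names `vote` and `tmr` and the sample PE functions are hypothetical.

```python
from collections import Counter


def vote(a, b, c):
    """Return the value that at least two of the three copies agree on."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three copies disagree")
    return value


def tmr(pes, x):
    """Run the same input through three PE replicas and vote on the result."""
    return vote(*(pe(x) for pe in pes))


healthy = lambda x: x * x
faulty = lambda x: x * x + 1      # hypothetical faulty replica
print(tmr([healthy, faulty, healthy], 4))  # 16
```

A single faulty replica is outvoted, so the error is masked without any interruption; two simultaneous faults that agree with each other would defeat the scheme.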
![Page 37: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/37.jpg)
Triple Time Redundancy [7]
• Gracefully degradable linear systolic arrays
• TMR
• Fault detection/correction
• Time redundancy achieved: concurrent error correction/detection
Figure – Ref [7] A. Majumdar, C. S. Raghavendra, and M. A. Breuer, "Fault tolerance in linear systolic arrays using time redundancy," Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, Vol. I: Architecture Track, 1988, pp. 311-320
![Page 38: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/38.jpg)
Algorithm-based Error Detection & Fault Location [6]
Figure – Checksum matrix multiplication on Array A and Array B
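Checksum matrix multiplication can be sketched in plain Python. The example assumes the usual checksum encoding: A is extended with a row holding its column sums, B with a column holding its row sums, and their product then carries both row and column checksums, so a single wrong element shows up as exactly one inconsistent row and one inconsistent column. The helper names are hypothetical, and integer data is used so the checks are exact.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def column_checksum(A):
    """Append a row holding the sum of each column of A."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]


def row_checksum(B):
    """Append a column holding the sum of each row of B."""
    return [row + [sum(row)] for row in B]


def locate_errors(C):
    """Return the data rows and columns whose checksums do not match."""
    n, m = len(C) - 1, len(C[0]) - 1
    bad_rows = [i for i in range(n) if sum(C[i][:m]) != C[i][m]]
    bad_cols = [j for j in range(m) if sum(C[i][j] for i in range(n)) != C[n][j]]
    return bad_rows, bad_cols


def correct_single_error(C):
    """Repair one faulty data element in place; return its position or None."""
    bad_rows, bad_cols = locate_errors(C)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j, m = bad_rows[0], bad_cols[0], len(C[0]) - 1
        C[i][j] = C[i][m] - sum(C[i][k] for k in range(m) if k != j)
        return (i, j)
    return None


C = matmul(column_checksum([[1, 2], [3, 4]]), row_checksum([[5, 6], [7, 8]]))
C[1][0] += 5                    # inject a fault into one result element
print(locate_errors(C))         # ([1], [0])
print(correct_single_error(C))  # (1, 0)
print(C[1][0])                  # 43
```

The intersection of the inconsistent row and column locates the faulty element, and the row checksum supplies the value needed to correct it. With floating-point data the equality tests would need a tolerance.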
![Page 39: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/39.jpg)
Reconfigurable Systolic Structures
• Independent switches: switches are separated from the PEs and treated as independent elements instead of as part of a PE.
• Local switches: switches are placed immediately around each PE. Information entering a faulty PE can be directed to one of its neighbours without processing.
• Bus-structured switches: PEs are in a collinear layout, with bundles of communication lines parallel to the row to which the PEs are connected.
• Address renaming: each processor has a modifiable address, and redundant processors and links are provided. Once a faulty PE is detected, the processor addresses are rearranged so that the faulty PE is excluded and a redundant PE is included.
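Of the four approaches, address renaming is the easiest to sketch in software. The model below is an assumption for illustration only: physical PEs are numbered from 0, logical addresses are handed out to healthy PEs in physical order, and the function name `rename_addresses` is hypothetical.

```python
def rename_addresses(n_logical, n_physical, faulty):
    """Map each logical address to a healthy physical PE, skipping faulty ones."""
    healthy = [p for p in range(n_physical) if p not in faulty]
    if len(healthy) < n_logical:
        raise RuntimeError("not enough healthy PEs for the logical array")
    return {addr: healthy[addr] for addr in range(n_logical)}


# Four logical PEs on six physical PEs (two redundant).
print(rename_addresses(4, 6, faulty=set()))  # {0: 0, 1: 1, 2: 2, 3: 3}
print(rename_addresses(4, 6, faulty={1}))    # {0: 0, 1: 2, 2: 3, 3: 4}
```

After PE 1 fails, every logical address from 1 upward shifts to the next physical PE, so the faulty unit drops out and one redundant unit is drawn in, while the algorithm still sees a four-PE array.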
![Page 40: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/40.jpg)
Processor-Switch Lattice
Figure – Ref [6] Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
![Page 41: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/41.jpg)
References
1. Systolic Computing Fundamentals, http://web.cecs.pdx.edu/~mperkows/temp/May13/systolic.pdf
2. Jason HandUber, Systolic Arrays, February 12, 2003, http://web.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf
3. Jop Sibeyn, Systolic Matrix Product, http://users.informatik.uni-halle.de/~jopsi/dpar03/chap3.shtml
4. Shaaban, Systolic Architectures, http://web.cecs.pdx.edu/~mperkows/temp/May22/0020.Matrix-multiplication-systolic.pdf
5. I. N. Tselepis and M. P. Bekakos, Fault-Tolerant Implementation of Systolic Arrays, http://www.aueb.gr/pympe/hercma/proceedings2009/H09-FULL-PAPERS-1/TSELEPIS-BEKAKOS-1.pdf
6. Jacob A. Abraham, Prithviraj Banerjee, Chien-Yi Chen, W. Kent Fuchs, Sy-Yen Kuo, and A. L. Narasimha Reddy. 1987. Fault Tolerance Techniques for Systolic Arrays. Computer 20, 7 (July 1987), 65-75
7. A. Majumdar, C. S. Raghavendra, and M. A. Breuer, "Fault tolerance in linear systolic arrays using time redundancy," Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences, Vol. I: Architecture Track, 1988, pp. 311-320
![Page 42: Systolic Array](https://reader033.vdocuments.net/reader033/viewer/2022061613/553399554a7959e6558b4970/html5/thumbnails/42.jpg)
Thank you