christian steinle, university of heidelberg, institute of computer engineering1 l1 hough tracking on...
TRANSCRIPT
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
1
L1 Hough Tracking on a Sony Playstation III
Christian Steinle, Andreas Kugel, Reinhard MännerComputer Engineering, University of Heidelberg
• Contents
– Motivation for the Playstation implementation– Algorithm adaption– Job restrictions– Algorithm implementation– Status
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
2
• Hough Space: 1 dimension for 1 parameter of a track– bending 1/Pz, angle (Px/Pz) and Py/Pz)– Hough Space dimensions: 95 x 31 (cells) x 191 (layers)
– Detector slice with constant angle corresponds to one 2D Hough histogram
– Detector slices are overlapping (multiple scattering)
Z
X
Y
Py/Pz
Px/P z
1 /Pz
Motivation for the Playstation implementation
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
3
Motivation for the Playstation implementation
• L1 Hough Tracking algorithm development in HDL for FPGA HDL descriptions for all critical functional units
• Synthesis result of a single histogram layer implemented with registers shows the requirement of about 30.000 logic cells
50% of FPGA (XC4VLX60: 59,904) registers used for layer
• Conclusion: Develop a multi-chip solution
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
4
Motivation for the Playstation implementation
• Conclusion: Use the CellBE of a Sony Playstation III as cheap and
flexible rapid prototyping system
Multi-chip algorithm Multi-core platform
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
5
Algorithm adaption
• Conclusion: Two parallelism levels:
multiple processors work on input hit data in parallel vector capable ALUs process multiple layers in parallel
Creation of input data packages as so-called jobs
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
6
Job restrictions
1. Balance workload for the processors
• Number of histogram layers: 191• Number of processors for a Sony Playstation III: 6
– Cell BE: 8 but 1 disabled and 1 reserved for OS
yersForJobnumberOfLa32:n1Restrictio
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
7
Job restrictions
2. Optimal use of the processor‘s vector capable ALU
• Each histogram cell contains a 5 bit signature (1bit/detector)
• The vector capable ALU is 128 bit 128 bit / 5 bit / signature = 25 signatures in parallel
• Speed gain by using an implicite type instead of bit manipulation
unsigned char (8 bit) is ALU and memory applicable Each histogram cell contains actually 3 unused bits 128 bit / 8 bit / signature = 16 signatures in parallel
• Random access requires the composing of the processing ALU vector with 16 elements from the memory
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
8
Job restrictions
• Hough transformed Hit contains γmin and γmax
Hit is inserted into γmax - γmin + 1 consecutive layers
• Speed gain by optimizing the histogram memory structure while using the consecutive histogram layer entries for a single hit
Direct memory access on up to 16 consecutive layers in parallel
Each hit in a job is systolically processed just once in parallel A single hit can contribute to more than one job (γmax - γmin >
16, γmax of hit > γmax of actual job) yersForJobnumberOfLa16:n2Restrictio
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
9
Job restrictions
3. Optimal use of the processor‘s local storage
• Local storage contains program code, static data, heap and stack
• Conclusion: Used memory for job = codingTable + histogram + input data Available memory = stackPointer – heapPointer -
securityRegion ForJobusedMemoryemoryavailableM:n3Restrictio
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
10
Algorithm implementation
• Configuration part on the PPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
11
Algorithm implementation
• Configuration part on the SPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
12
Algorithm implementation
• Different versions of the job creation are possible
– CODEVERSION 1:• Create all jobs with a loop over all hits before the SPU processing
– CODEVERSION 2:• Create all jobs with a loop over all jobs before the SPU processing
– CODEVERSION 3:• Create the jobs parallely pipelined with the SPU processing
– CODEVERSION 4:• Create the jobs for dedicated SPUs when they are ready
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
13
Algorithm implementation
• Processing part one on the PPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
14
Algorithm implementation
• Processing part two on the PPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
15
Algorithm implementation
• Processing part three on the PPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
16
Algorithm implementation
• Different versions of the histogram memory usage
– MEMORYVERSION 1:• Use vector type for memory and computation
– MEMORYVERSION 2:• Use smallest type for a single histogram cell in the memory and
build vector type for computation
– MEMORYVERSION 3:• Use smallest type for histogram cells in parallel layers and build
vector type for computation
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
17
Algorithm implementation
• Proccessing part on the SPU
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
18
Status
• Algorithm adaption is implemented
• Hough processing has to be implemented:– Actual step in scalar version: peakfinding2D– Missing steps in scalar version: peakfinding3D– Actual step in vector version: diagonalization– Missing steps in vector version: peakfinding2D, peakfinding3D
• Measure the timing
• Compare the timing with– the timing of the C++ framework implementation– the estimated timing of the FGPA implementation