Elad Hadar, Omer Norkin
Supervisor: Mike Sumszyk
Winter 2010/11, single-semester project. Date: 22/4/12
Technion – Israel Institute of Technology, Faculty of Electrical Engineering, High Speed Digital System Lab (HS DSL)
Exploring new implementation tools for GIDEL PROCSTAR platform
(PART II - PROCAPI)
Project motivation
• Implementing video analysis designs on the GIDEL PROCSTAR III platform, to enable the use and exploration of a new development platform (PART I – PROCHILs; PART II – PROCWIZARD, PROCAPI, PROCMegaFIFO).
• Proper usage of the development tools throughout all stages of implementation, from algorithm to hardware.
• Preparing a clear user guide that enables a fast and simple ramp-up of the tools and the appropriate flow.
First part - PROCHILs
• PROCHILs is a Hardware-In-the-Loop acceleration tool for running Simulink designs on FPGAs.
• Automatically translates Simulink designs into FPGA code (compatible with the PROC board installed on the target PC) and runs them under Simulink.
PROCHILs main advantages
• Speed - Dramatically improves simulation speed, with a dedicated accelerator for Simulink designs. Direct hardware burn. Direct generation of HDL code that matches the target board. Fast HW simulation using the Simulink/Matlab interface.
• Simplicity - Enables building a design visually and uploading it directly, with minimal effort, into the PROC board.
• Efficiency - Enables concurrent engineering at an early stage. Cuts development cycle time (and costs). Extremely efficient on resource-consuming processing algorithms.
• Reliability - Improves design reliability.
PROCHILs Performance
• We measured the input data processing time of the hardware generated by ProcHIL and of the equivalent software simulation for data vectors of different lengths.
• The generated hardware accelerates processing significantly, especially for longer data vectors.
• Ratio between the hardware and software running times for input vectors of different lengths.
• An exponential curve fit shows that the ratio converges to ~128.
• Even when considering the highest processing ratio, the resulting pixel rate is very low.
• For a 512x512-pixel image it amounts to 1.04 [frames/sec], which is insufficient for video streaming.
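The frame-rate figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the two numbers that appear in the slide's rate calculation (15,000,000 and 54.83) and the 512x512 image size at face value, and the function name is hypothetical.

```python
def frames_per_second(pixel_rate, width, height):
    """Convert a pixel throughput [pixels/sec] into a frame rate [frames/sec]."""
    return pixel_rate / (width * height)

# Numbers taken from the slide: 15,000,000 / 54.83 [pixels/sec].
pixel_rate = 15_000_000 / 54.83
fps = frames_per_second(pixel_rate, 512, 512)

print(f"{pixel_rate:.0f} pixels/sec -> {fps:.2f} frames/sec")
```

This reproduces the roughly 273,573 [pixels/sec] and 1.04 [frames/sec] quoted on the slides.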
PROCHILs main weakness
• Rate at the highest processing ratio: $\frac{15{,}000{,}000}{54.83}\ [\text{pixels/sec}] = 273{,}573\ [\text{pixels/sec}]$
• Not suited for streaming-data (real-time) designs.
PROCAPI
PROCAPI introduction
A set of functions that provide the means to access PROC boards by supplying methods that enable real-time configuration and querying of the board.
Motivation: Learning and practice of effective debug methodology using PROCAPI while streaming video through an image processing design.
PROCAPI allows the user to control data transfer between the PC and the PROC board (using controllable DMA channels). In PROCHIL this ability is transparent to the user.
Main goals and phases of work
1) Learning PROCAPI and PROCMegaFIFO
2) Defining and building an integrated DSPBuilder design combining PROCAPI video streaming functions, data channels and PROCMegaFIFO memories.
Hardware and Development environment
• GiDEL PROCAPI (Version 8.8)
• ALTERA's DSPBuilder blockset for Simulink (Version 10.1)
• ProcWizard (Version 8.8)
• Quartus II (Version 10.1)
• Matlab (Version 2009a)
• OpenCV (Version 2.1)
• GiDEL PROCStar III (Altera Stratix III) board (4-FPGA)
System block diagram
Project flow
Prewitt edge detector implementation
Controller
Prewitt edge detector
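$$A = \begin{bmatrix} 222 & 255 & 201 & 180 \\ 155 & 111 & 143 & 96 \\ 87 & 87 & 55 & 27 \\ 34 & 67 & 0 & 3 \end{bmatrix} \qquad \text{Result} = \begin{bmatrix} 0 & 0 & 255 & 0 \\ 255 & 255 & 255 & 0 \\ 0 & 0 & 255 & 0 \\ 0 & 0 & 255 & 0 \end{bmatrix}$$
$$\text{Result} = \begin{cases} 0 & \text{if } G < \text{threshold} \\ 255 & \text{else} \end{cases}$$
$$G_X = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} * A \qquad G_Y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} * A$$
$$G = |G_X| + |G_Y|$$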
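The Prewitt example can be tried out in software. The following is a minimal sketch, not the DSPBuilder design itself: it applies the two standard Prewitt kernels to the 4x4 matrix A from the slide, combines the gradients as |G_X| + |G_Y| (an assumption, since the slide's operators did not survive), and uses a threshold of 400 chosen purely for illustration. Only pixels with a full 3x3 neighborhood are computed, because border handling is not specified in the source.

```python
# Example matrix A from the slide.
A = [
    [222, 255, 201, 180],
    [155, 111, 143,  96],
    [ 87,  87,  55,  27],
    [ 34,  67,   0,   3],
]

# Standard Prewitt kernels (horizontal and vertical gradient).
KX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
KY = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def weighted_sum_at(img, row, col, kernel):
    """Weighted sum of the 3x3 neighborhood centred on (row, col).

    This is a sliding-window correlation; a true convolution only flips the
    sign for these kernels, which the absolute value below removes.
    """
    return sum(kernel[i][j] * img[row - 1 + i][col - 1 + j]
               for i in range(3) for j in range(3))


def prewitt_interior(img, threshold):
    """Thresholded edge map for pixels that have a full 3x3 neighborhood."""
    result = []
    for row in range(1, len(img) - 1):
        line = []
        for col in range(1, len(img[0]) - 1):
            gx = weighted_sum_at(img, row, col, KX)
            gy = weighted_sum_at(img, row, col, KY)
            g = abs(gx) + abs(gy)          # assumed gradient magnitude
            line.append(0 if g < threshold else 255)
        result.append(line)
    return result


edges = prewitt_interior(A, threshold=400)  # threshold is an assumption
print(edges)
```

With this assumed threshold the four interior pixels come out as 255, 255, 0, 255, which agrees with the centre of the Result matrix on the slide.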
Pixel neighborhood storing
Controller
Preventing pipe contamination
• enableBit and clken are both connected to read_acknowledge from FIFO IN.
• The data pipeline of the Prewitt edge detector is 512+4 stages long.
• When FIFO IN is empty:
1. All data propagation stops, to avoid garbage in the pipe that would affect the correctness of the algorithm.
2. Writing resumes only when a new valid pixel arrives at the end of the data pipeline.
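The stalling behaviour can be illustrated with a small software model. This is a sketch under simplifying assumptions, not the hardware design: the pipeline is modelled as a plain shift register, `None` stands for a cycle in which FIFO IN is empty (read_acknowledge low), and a depth of 3 replaces the real 512+4 stages to keep the example short. The function name is hypothetical.

```python
from collections import deque


def run_pipeline(fifo_in_cycles, depth):
    """Model a clock-enabled shift-register pipeline.

    fifo_in_cycles: one entry per clock cycle, a pixel value or None when
    FIFO IN is empty. Returns the pixels written to the output.
    """
    pipe = deque([None] * depth)   # None marks a stage that holds no pixel yet
    written = []
    for pixel in fifo_in_cycles:
        read_ack = pixel is not None
        if not read_ack:
            continue               # clock enable low: the whole pipe freezes
        oldest = pipe.popleft()    # stage leaving the pipeline this cycle
        pipe.append(pixel)
        if oldest is not None:     # write only once valid data reaches the end
            written.append(oldest)
    return written


stream = [1, 2, None, None, 3, 4, 5, None, 6, 7, 8]
print(run_pipeline(stream, depth=3))
```

Because the pipe only advances on valid reads, the empty cycles never inject garbage and the output is simply the input pixels in order, delayed by the pipeline depth.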
Interrupt control
• Two inputs:
1. WE - Is FIFO IN empty?
2. AD – Is the arriving pixel the last pixel of a frame?
• One output:
1. dmaInterrupt
Controller implementation
• Data may be propagating or stopped; the pixel is not the last pixel of a frame.
• One-cycle state: the interrupt is sent.
• The frame is finished but FIFO IN is empty; AD stays on until a new pixel resets the counter.
$$F_1(n+1) = f_1\big(F_1(n),\ F_2(n),\ WE(n)\big)$$
$$F_2(n+1) = f_2\big(F_1(n),\ F_2(n),\ WE(n)\big)$$
$$DMAI(n) = g\big(F_1(n),\ F_2(n),\ AD(n)\big)$$
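A three-state controller of this kind can be modelled in a few lines. The sketch below is an assumption-laden illustration rather than the implemented logic: the state names and the exact transitions are hypothetical, chosen only so that the interrupt is raised for exactly one cycle per frame even while AD stays high.

```python
from enum import Enum


class State(Enum):
    RUN = 0    # data propagating or stopped, not the last pixel of a frame
    IRQ = 1    # one-cycle state in which dmaInterrupt is sent
    WAIT = 2   # frame finished, FIFO IN empty, AD still on


def step(state, we, ad):
    """Return (next_state, dma_interrupt) for one clock cycle.

    we: True when FIFO IN is empty; ad: True on the last pixel of a frame.
    The transitions are assumed for illustration.
    """
    if state is State.RUN:
        return (State.IRQ if ad else State.RUN), False
    if state is State.IRQ:
        # The interrupt lasts a single cycle, then wait for new data if needed.
        return (State.WAIT if we else State.RUN), True
    # WAIT: leave only when a new pixel is available in FIFO IN.
    return (State.WAIT if we else State.RUN), False


# (WE, AD) per cycle: the frame ends in cycle 2 and AD stays high while FIFO IN is empty.
cycles = [(False, False), (False, False), (False, True),
          (True, True), (True, True), (True, True),
          (False, False), (False, False)]

state, interrupts = State.RUN, []
for we, ad in cycles:
    state, irq = step(state, we, ad)
    interrupts.append(irq)

print(interrupts)
```

The single `True` in the trace shows why the one-cycle state is useful: without it, AD remaining high during the wait would keep re-triggering the DMA interrupt.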
Interrupt controller
[Flow diagram: Signal Compiler, VHDL library, Quartus, .rbf, PROCWizard, source & header files, PROC API, read frame, display frame]
C code structure
Main function (pre processing):
• Opening the camera
• Setting DMA channels
• Defining new buffers
• Capturing a new frame
• Writing the frame from the input buffer to the input FIFO
• Interrupt
• Showing the original frame
Second thread function (post processing), loop:
• Writing the frame from the output FIFO to the output buffer
• Showing the processed frame
Legend: blue - OpenCV function, purple - API function, red - interrupt
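The two-thread structure can be sketched as a producer/consumer pattern. This is only a schematic model: it uses no PROCAPI or OpenCV calls, the queues stand in for FIFO IN and FIFO OUT, the `hardware_stub` thread stands in for the FPGA design, and all function names are hypothetical.

```python
import queue
import threading


def hardware_stub(fifo_in, fifo_out):
    """Stand-in for the FPGA design: inverts each pixel as placeholder processing."""
    while True:
        frame = fifo_in.get()
        if frame is None:              # end-of-stream marker
            fifo_out.put(None)
            return
        fifo_out.put([255 - p for p in frame])


def post_processing(fifo_out, shown):
    """Second thread: move frames from the output FIFO to the output buffer."""
    while True:
        frame = fifo_out.get()
        if frame is None:
            return
        shown.append(frame)            # stands in for "show processed frame"


def stream_frames(frames):
    """Main function: set up the channels, then write each captured frame."""
    fifo_in, fifo_out = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    shown = []
    workers = [
        threading.Thread(target=hardware_stub, args=(fifo_in, fifo_out)),
        threading.Thread(target=post_processing, args=(fifo_out, shown)),
    ]
    for worker in workers:
        worker.start()
    for frame in frames:               # "capture" loop
        fifo_in.put(frame)             # write input buffer to input FIFO
    fifo_in.put(None)
    for worker in workers:
        worker.join()
    return shown


processed = stream_frames([[0, 255], [10, 20]])
print(processed)
```

Keeping the read-back in its own thread lets the main loop continue capturing and writing frames while processed frames are drained independently.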
Frame Rate
• We used TIC & TOC macros, built on OpenCV functions, to assess the frame rate of the video output stream.
• We measured the time elapsed while presenting 30 frames and derived the frame rate from it.
• Full Prewitt edge detector design: 12 [fps]
• Empty design (image capture and present), hardware: 25 [fps]
• Empty design (image capture and present), Simulink: 23 [fps]
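The tic/toc measurement can be sketched as follows. This is a generic illustration and not the project's macros: `measure_fps` and its parameters are hypothetical names, and the clock is injectable so the example can be run deterministically without a camera.

```python
import time


def measure_fps(present_frame, n_frames=30, clock=time.perf_counter):
    """Time the presentation of n_frames frames and return frames per second."""
    start = clock()                    # "tic"
    for _ in range(n_frames):
        present_frame()
    elapsed = clock() - start          # "toc"
    return n_frames / elapsed


# Deterministic demo: a simulated clock that advances 1/12 s per frame.
simulated_time = [0.0]


def fake_present():
    simulated_time[0] += 1.0 / 12.0


fps = measure_fps(fake_present, n_frames=30, clock=lambda: simulated_time[0])
print(f"{fps:.1f} fps")
```

Averaging over 30 frames smooths out per-frame jitter, which is why the measurement is taken over a batch rather than a single frame.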
Frame loss detection
[Figure: convolution with $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$, amp; convolution with $\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$, amp]
• By creating another version of the Prewitt edge detector, we managed to divide the doubled number back to its value
Time delay
• We also noticed a time delay between the original input stream and the filtered one. We had two hypotheses for its cause:
  • The delay was caused by the code complexity: multiple loops and memory copying.
  • The delay was caused by the FIFO and its size, due to an accumulation of frames in the FIFO waiting to be extracted.
• Although we made many optimizations in the C++ code, the delay was not reduced at all.
• When we reduced the FIFO size from 8 MB to 1 MB, the time delay decreased dramatically, from about 3 seconds to less than half a second.
[Figure: Stratix III FPGA containing FIFO IN, the DSPBuilder-based edge detection design, and FIFO OUT]
• We assumed that FIFO OUT is always nearly empty, because extraction of the processed images is not limited by the hardware's rate (12 [fps]) and is performed at the highest rate the DMA can sustain.
• The delay is mainly the time it takes the images to pass through the 1 MB FIFO IN at a rate of 12 [fps]. This amounts to a delay of 4 images, i.e. 4/12 ≈ 0.333 seconds, as we observed.
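The FIFO delay estimate can be reproduced with a short calculation. The sketch assumes 512x512 frames with one byte per pixel (262,144 bytes per frame), which is consistent with the 4-image figure but is not stated explicitly on the slide; the function name is hypothetical.

```python
FRAME_BYTES = 512 * 512      # assumed: 512x512 pixels, 1 byte per pixel
MB = 1024 * 1024


def fifo_delay(fifo_bytes, frame_bytes, fps):
    """Return (frames held in a full FIFO, delay in seconds at the given rate)."""
    frames_in_fifo = fifo_bytes // frame_bytes
    return frames_in_fifo, frames_in_fifo / fps


for size_mb in (1, 8):
    frames, delay = fifo_delay(size_mb * MB, FRAME_BYTES, fps=12)
    print(f"{size_mb} MB FIFO: {frames} frames, {delay:.3f} s")
```

Under this assumption the 1 MB FIFO gives 4 frames and about 0.333 s, and the 8 MB FIFO gives 32 frames and about 2.7 s, in line with the roughly 3 seconds observed before the FIFO was reduced.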
Results & Conclusions
• Frame rate is satisfactory: 12 [frames/sec]
• No frames are lost!
• Time delay is very low: less than half a second
Useful outputs
• An implemented template for all video streaming processing algorithms. Only minimal effort is needed to integrate a new algorithm and run it!
• A full user guide that enables a fast and simple ramp-up of the tools, sums up all the conclusions reached and includes the necessary background knowledge.
• The Lab team
• Special thanks to Mike Sumszyk, who guided us with devotion.