Elad Hadar
Omer Norkin
Supervisor: Mike Sumszyk

Winter 2010/11, single semester project. Date: 22/4/12

Technion – Israel Institute of Technology
Faculty of Electrical Engineering
High Speed Digital System Lab (HS DSL)

Exploring new implementation tools for GIDEL PROCSTAR platform

(PART II - PROCAPI)

Project motivation

Implementing video analysis designs on the GIDEL PROCSTAR III platform, enabling the use and exploration of a new development platform (PART I – PROCHILs, PART II – PROCWIZARD, PROCAPI, PROCMegaFIFO).

Proper usage of development tools throughout all stages of implementation from algorithm to hardware.

Preparing a clear user-guide that will enable a fast and simple ramp-up of the tools and the appropriate flow.

First part - PROCHILs

• PROCHILs is a Hardware-In-the-Loop acceleration tool for running Simulink designs on FPGAs.

• Automatically translates Simulink designs into FPGA code (compatible with the PROC board installed on the target PC) and runs them under Simulink.

PROCHILs main advantages

• Speed - Dramatically improves simulation speed, with a dedicated accelerator for Simulink designs. Direct hardware burn. Direct generation of HDL code that matches the target board. Fast HW simulation using the Simulink/Matlab interface.

• Simplicity - Enables building a design visually and uploading it directly, with minimal effort, into the PROC board.

• Efficiency - Enables concurrent engineering at an early stage. Cuts development cycle time (and costs). Extremely efficient on resource-consuming processing algorithms.

• Reliability - Improves design reliability.

• We measured the input-data processing time of the hardware generated by PROCHILs and of the equivalent software simulation, for data vectors of different lengths.

• There is a significant acceleration of the processing time using the generated Hardware, especially for longer data vectors.

PROCHILs Performance

• Ratio between the Hardware & Software running time for different length input vectors

• Fitting an exponential curve to the measurements shows that the ratio converges to ~128.

PROCHILs Performance

PROCHILs main weakness

• Even when considering the highest processing ratio, we get a rate of

$$\frac{15{,}000{,}000}{54.83}\ [\text{pixels/sec}] \approx 273{,}573\ [\text{pixels/sec}]$$

which is very low.

• For a 512x512-pixel image this means 1.04 [frames/sec], which is insufficient for video streaming.
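The arithmetic on this slide can be checked with a few lines of C++. This is only a sketch: the function names `scaledRate` and `framesPerSecond` are ours (hypothetical), and the only numbers used are the ones quoted on the slide.

```cpp
#include <cstdint>

// Rate obtained by dividing a base rate by a factor,
// as in 15,000,000 / 54.83 on the slide (result in pixels/sec).
double scaledRate(double baseRate, double divisor) {
    return baseRate / divisor;
}

// Frames per second that a given pixel throughput can sustain
// for frames of width x height pixels.
double framesPerSecond(double pixelsPerSecond,
                       std::uint32_t width, std::uint32_t height) {
    const double pixelsPerFrame =
        static_cast<double>(width) * static_cast<double>(height);
    return pixelsPerSecond / pixelsPerFrame;
}
```

Calling `framesPerSecond(scaledRate(15000000.0, 54.83), 512, 512)` gives a value just above 1.04, matching the frame rate quoted for a 512x512 image.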

PROCAPI

• PROCHILs is not suited to streaming-data (real-time) designs.

PROCAPI introduction

A set of functions that provide the means to access PROC boards by supplying methods that enable real-time configuration and querying of the board.

Motivation: Learning and practice of effective debug methodology using PROCAPI while streaming video through an image processing design.

PROCAPI allows the user to control data transfer between the PC and the PROC board (using controllable DMA channels). In PROCHILs this transfer is handled transparently and is not exposed to the user.

Main goals and phases of work

1) Learning PROCAPI and PROCMegaFIFO
2) Defining and building an integrated DSP Builder design combining PROCAPI video streaming functions, data channels and PROCMegaFIFO memories.

Hardware and Development environment

• GiDEL PROCAPI (Version 8.8)
• ALTERA’s DSPBuilder blockset for Simulink (Version 10.1)
• ProcWizard (Version 8.8)
• Quartus II (Version 10.1)
• Matlab (Version 2009a)
• OpenCV (Version 2.1)
• GiDEL PROCStar III (Altera Stratix III) board (4-FPGA)

System block diagram

Project flow

Prewitt edge detector implementation

Controller

Prewitt edge detector

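$$A = \begin{bmatrix} 222 & 255 & 201 & 180 \\ 155 & 111 & 143 & 96 \\ 87 & 87 & 55 & 27 \\ 34 & 67 & 0 & 3 \end{bmatrix} \qquad \text{Result} = \begin{bmatrix} 0 & 0 & 255 & 0 \\ 255 & 255 & 255 & 0 \\ 0 & 0 & 255 & 0 \\ 0 & 0 & 255 & 0 \end{bmatrix}$$

$$G_X = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} * A \qquad G_Y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} * A$$

$$G = |G_X| + |G_Y|$$

$$\text{Result} = \begin{cases} 0 & \text{if } G < \text{threshold} \\ 255 & \text{else} \end{cases}$$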
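As a software reference for the operator on this slide, here is a minimal C++ sketch. It rests on assumptions that the slide does not state: the gradient magnitude is taken as |G_X| + |G_Y|, border pixels (which have no full 3x3 neighborhood) are set to 0, and the names `Image` and `prewittEdges` are hypothetical. It is not the DSP Builder design itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Row-major 8-bit grayscale image (hypothetical helper type).
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Thresholded Prewitt edge map: 255 where the gradient magnitude
// reaches the threshold, 0 elsewhere. Border pixels stay 0 (assumption).
Image prewittEdges(const Image& in, int threshold) {
    Image out{in.width, in.height,
              std::vector<std::uint8_t>(in.pixels.size(), 0)};
    for (int y = 1; y + 1 < in.height; ++y) {
        for (int x = 1; x + 1 < in.width; ++x) {
            int gx = 0;  // right column minus left column
            int gy = 0;  // bottom row minus top row
            for (int k = -1; k <= 1; ++k) {
                gx += in.at(x + 1, y + k) - in.at(x - 1, y + k);
                gy += in.at(x + k, y + 1) - in.at(x + k, y - 1);
            }
            const int g = std::abs(gx) + std::abs(gy);
            out.pixels[static_cast<std::size_t>(y) * in.width + x] =
                (g < threshold) ? 0 : 255;
        }
    }
    return out;
}
```

Because the magnitude uses absolute values, the sign convention of the two kernels does not affect the result. A vertical step edge (dark left half, bright right half) is marked 255 along the step, while a uniform image yields all zeros.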

Pixel neighborhood storing

Controller

Preventing pipe contamination

• enableBit and clken are both connected to read_acknowledge from FIFO IN.
• The data pipeline of the Prewitt edge detector is 512+4 stages long.
• When FIFO IN is empty:

1. All data propagation stops, so that no garbage enters the pipe and corrupts the algorithm's output.

2. Writing resumes only when a new valid pixel arrives at the end of the data pipeline.
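The stall behaviour described above can be modelled in software. The sketch below is a behavioural model under our own assumptions, not the hardware: `GatedPipeline` and `clock` are hypothetical names, the `enable` argument plays the role of read_acknowledge, and the depth is a constructor parameter (516 would correspond to the 512+4 stages).

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Fixed-depth pixel pipeline with a clock-enable input.
class GatedPipeline {
public:
    explicit GatedPipeline(std::size_t depth) : stages_(depth, 0), filled_(0) {}

    // One clock cycle. When `enable` is false (nothing was read from
    // FIFO IN) the pipeline freezes: nothing shifts, nothing is written.
    // Returns true and sets `out` only when a valid pixel leaves the
    // last stage.
    bool clock(bool enable, std::uint8_t in, std::uint8_t& out) {
        if (!enable) return false;
        const std::uint8_t leaving = stages_.back();
        stages_.pop_back();
        stages_.push_front(in);
        if (filled_ < stages_.size()) {  // still filling: tail holds no valid data
            ++filled_;
            return false;
        }
        out = leaving;
        return true;
    }

private:
    std::deque<std::uint8_t> stages_;
    std::size_t filled_;
};
```

With a depth of 4, the first pixel comes out on the fifth enabled cycle no matter how many stalled cycles are interleaved, which is the property that keeps garbage out of the output stream.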

Interrupt control

• Two inputs:
1. WE - Is FIFO IN empty?
2. AD - Is the arriving pixel the last pixel of a frame?

• One output:
1. dmaInterrupt

Controller implementation

• Data may be propagating or stopped.
• The pixel is not the last pixel of a frame.

• One-cycle state.
• The interrupt is sent.

• The frame is finished but FIFO IN is empty. AD stays on until a new pixel resets the counter.

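State bits $F_1$, $F_2$ and the interrupt output $DMAI$, as functions of the current state and the inputs:

$$F_1(n+1) = f_1\big(F_1(n),\ F_2(n),\ WE(n)\big)$$

$$F_2(n+1) = f_2\big(F_1(n),\ F_2(n),\ WE(n)\big)$$

$$DMAI = g\big(F_1(n),\ F_2(n),\ AD(n)\big)$$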
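A three-state controller of this kind can be sketched as a small state machine. The exact transition conditions are not recoverable from the slides, so everything below is an assumption made for illustration: the state names, the transitions, and the choice to assert dmaInterrupt for exactly one cycle when AD is first seen. Two state bits such as F1 and F2 are enough to encode the three states.

```cpp
// Hypothetical state names; the slides describe the states only in words.
enum class State { Streaming, Interrupt, WaitForPixel };

class InterruptController {
public:
    // Advances one clock cycle.
    //   we: FIFO IN is empty.
    //   ad: the arriving pixel is the last pixel of a frame.
    // Returns true while dmaInterrupt is asserted (the one-cycle state).
    bool step(bool we, bool ad) {
        switch (state_) {
            case State::Streaming:      // data propagating or stopped
                if (ad) state_ = State::Interrupt;
                break;
            case State::Interrupt:      // lasts exactly one cycle
                state_ = we ? State::WaitForPixel : State::Streaming;
                break;
            case State::WaitForPixel:   // frame done, FIFO IN still empty
                if (!we) state_ = State::Streaming;
                break;
        }
        return state_ == State::Interrupt;
    }

    State state() const { return state_; }

private:
    State state_ = State::Streaming;
};
```

The point of the third state is that AD may stay high for many cycles while FIFO IN is empty; the machine waits there, so a single frame produces a single interrupt.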

Interrupt controller

[Figure: tool flow with the blocks Signal Compiler, VHDL Library, Quartus, .rbf, PROCWizard, Source & Header files, PROC API, Read Frame, Display Frame]

C code structure

Main function (pre-processing)

• Opening the camera
• Setting the DMA channels
• Defining new buffers

Loop:

• Capturing a new frame
• Writing the frame from the input buffer to the input FIFO
• Interrupt
• Showing the original frame

Second thread function (post-processing)

• Writing the frame from the output FIFO to the output buffer
• Showing the processed frame

Color legend: blue - OpenCV function, purple - API function, red - interrupt
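The two-thread structure can be illustrated without the board. In the sketch below nothing is taken from PROCAPI or OpenCV: a thread-safe queue stands in for the path FIFO IN, design, FIFO OUT, a condition variable stands in for the DMA interrupt, and all names (`FrameChannel`, `runPipeline`) are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Frame = std::vector<unsigned char>;

// Stand-in for the board: frames pushed by the main thread become
// available to the second thread, which is woken like an interrupt.
class FrameChannel {
public:
    void push(Frame f) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(f)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_one();
    }
    // Blocks until a frame is ready; returns false once closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Frame> q_;
    bool closed_ = false;
};

// Streams `frameCount` synthetic frames and returns how many of them
// the post-processing thread received.
int runPipeline(int frameCount) {
    FrameChannel channel;
    int processed = 0;
    std::thread post([&] {                    // second thread: post-processing
        Frame f;
        while (channel.pop(f)) ++processed;   // read frame, "show" it
    });
    for (int i = 0; i < frameCount; ++i) {    // main thread: pre-processing loop
        Frame f(16, static_cast<unsigned char>(i));  // "capture" a frame
        channel.push(std::move(f));                  // "write to input FIFO"
    }
    channel.close();
    post.join();
    return processed;
}
```

Counting the frames on the receiving side, as `runPipeline` does, is also a simple way to confirm that no frame is dropped between the two threads.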

Frame Rate

• We used TIC and TOC macros, built on OpenCV functions, to assess the frame rate of the video output stream.

• We measured the time elapsed while presenting 30 frames and derived the frame rate from it.

• Full Prewitt edge detector design: 12 [fps]

• Empty design (image capture and display only), hardware: 25 [fps]

• Empty design (image capture and display only), Simulink: 23 [fps]
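The measurement method described above (time a fixed number of frames, then divide) can be sketched as follows. The slides used TIC/TOC macros built on OpenCV; this sketch swaps in `std::chrono` so that it has no dependencies, and the names `fpsFromElapsed` and `measureFps` are hypothetical.

```cpp
#include <chrono>

// Frame rate from a frame count and the elapsed time in seconds.
double fpsFromElapsed(int frames, double seconds) {
    return frames / seconds;
}

// Times `frames` calls of showFrame() and returns the average frame rate.
// showFrame stands in for "capture, process and display one frame".
template <typename ShowFrame>
double measureFps(int frames, ShowFrame showFrame) {
    const auto tic = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) showFrame();
    const auto toc = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = toc - tic;
    return fpsFromElapsed(frames, elapsed.count());
}
```

For scale, a rate of 12 [fps] corresponds to 30 frames taking 2.5 seconds, since `fpsFromElapsed(30, 2.5)` is 12.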

Frame loss detection

[Figure: design diagram with two "convolution with" blocks, one per 3x3 Prewitt kernel, each followed by an "amp" block]

Frame loss detection

• By creating another version of the Prewitt edge detector, we managed to divide the doubled number back to its value

Time delay

• We also noticed a time delay between the original input stream and the filtered one. We had two hypotheses about its cause:
  • The delay is caused by code complexity: multiple loops and memory copying.
  • The delay is caused by the FIFO and its size, due to an accumulation of frames in the FIFO waiting to be extracted.

• Although we made many optimizations in the C++ code, the delay was not reduced at all.

• When we reduced the FIFO size from 8 MB to 1 MB, the time delay decreased dramatically, from about 3 seconds to less than half a second.

[Figure: Stratix III FPGA containing FIFO IN, the DSP Builder-based edge detection design, and FIFO OUT]

Time delay

• We assumed that FIFO OUT is always nearly empty, because extraction of the processed images is not limited by the hardware's rate (12 [fps]) and is performed at the high rate that the DMA can sustain.

• The delay is mainly the time it takes the images to pass through the 1 MB FIFO IN at a rate of 12 [fps]. The FIFO holds 4 images, so the delay is 4/12 ≈ 0.333 seconds, which matches what we observed.
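The delay estimate can be reproduced with a short calculation. The sketch assumes 512x512 frames at one byte per pixel (262,144 bytes per frame), which is what makes a 1 MB FIFO hold 4 images; the function names are hypothetical.

```cpp
#include <cstdint>

// Number of whole frames that fit in a FIFO of the given size.
std::uint64_t framesInFifo(std::uint64_t fifoBytes, std::uint64_t frameBytes) {
    return fifoBytes / frameBytes;
}

// Delay, in seconds, added by a full FIFO that is drained at `fps`
// frames per second.
double fifoDelaySeconds(std::uint64_t fifoBytes, std::uint64_t frameBytes,
                        double fps) {
    return static_cast<double>(framesInFifo(fifoBytes, frameBytes)) / fps;
}
```

With `frameBytes = 512 * 512` and `fps = 12`, a 1 MB FIFO (1,048,576 bytes) gives 4 frames and about 0.333 seconds, while an 8 MB FIFO gives 32 frames and about 2.67 seconds, in line with the "about 3 seconds" seen before the FIFO was shrunk.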

Results & Conclusions

• Frame rate is satisfactory: 12 [frames/sec]
• No frames are lost!
• Time delay is very low: less than half a second

Useful outputs

• An implemented template for video streaming processing algorithms. Only minimal effort is needed to integrate a new algorithm and run it!

• A full user guide that enables a fast and simple ramp-up of the tools, sums up all the conclusions reached, and includes the necessary background knowledge.

• The Lab team
• Special thanks to Mike Sumszyk, who guided us with devotion.
