
Reconfigurable FPGA Architecture for Computer Vision applications in Smart Camera Network

Luca Maggiani

Supervisors:
Prof. Roberto Saletti (Università di Pisa)
Dr. Paolo Pagano (Scuola Superiore Sant’Anna)
Prof. François Berry (Blaise Pascal University, Clermont Ferrand)

Project work done at TeCIP Institute, Scuola Superiore Sant’Anna

Outline

● Smart Camera
● Architecture of a Smart Camera Network node
● Reconfigurable FPGA architecture
● Image processing in SCN
● Implementation and results
○ Smart Camera in the “Internet of Things”
○ Histogram of Oriented Gradients
● Conclusion

Smart Camera

What is a Smart Camera?

A Smart Camera combines:

➔ sensing
➔ processing
➔ communications

Smart Camera as evolution of WSN

image processing & understanding

Centralised architecture

Static infrastructure

Single point of failure

Limited in-node processing

Why Smart Camera?

Billions of cameras are deployed in public and private environments

➔ Video surveillance
➔ Transportation
➔ Entertainment
➔ Security
➔ Autonomous vehicles

● Raw or compressed data is sent to a central server

● High data bandwidth is required (wired connection)

Making cameras smarter

Standard camera: image sampling & enhancement on board → raw data (high bandwidth)

Smart Camera: image sampling & enhancement plus image processing & understanding on board → events (limited bandwidth)

“Process data where it is captured” B. Rinner, Pervasive Smart Cameras, PECCS 2011

Smart Camera Network

Distributed processing in a distributed network

Distributed processing:
★ decentralised approach
★ events notification
★ reduced data transfer

Distributed network:
★ pervasive, collaborative, dynamic network
★ low power infrastructure
★ reliability

SCN scenarios

Pedestrian and vehicle tracking, distributed video surveillance

Smart Cities

Smart Camera issues

Events

limited bandwidth

local image processing operation

Local image processing

● heavy image processing tasks
○ limited hardware resources
● constrained power consumption
● processing and understanding

Limited bandwidth

● Low power wireless protocol
○ IEEE 802.15.4
● Aggregated data packets
● Unreliable medium
● Event-driven communication

Smart Camera: image sampling & enhancement and image processing & understanding, both on board

Local image processing

QVGA 320x240 (76800 pixels, 8 bit/pixel) → Filtering → Output image

➢ 76800 pixels per frame x 25 FPS = 1.92 Mpix/s
➢ the simplest filtering kernel uses a 3x3 convolution matrix
○ 8 iterations per pixel
○ plus overhead for sum/subtraction: ~4 iterations
➢ at least 20 MOPS are required for filtering alone
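The budget above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the per-pixel costs are the figures quoted on the slide, and the variable names are illustrative.

```python
# Processing budget for a 3x3 filter on a QVGA stream.
WIDTH, HEIGHT = 320, 240   # QVGA resolution (from the slide)
FPS = 25                   # frame rate (from the slide)
KERNEL_OPS = 8             # iterations per pixel for the 3x3 kernel (from the slide)
OVERHEAD_OPS = 4           # approximate sum/subtraction overhead (from the slide)

pixels_per_frame = WIDTH * HEIGHT
pixel_rate = pixels_per_frame * FPS
ops_per_second = pixel_rate * (KERNEL_OPS + OVERHEAD_OPS)

print(f"{pixels_per_frame} pixels per frame")
print(f"{pixel_rate / 1e6:.2f} Mpix/s")
print(f"{ops_per_second / 1e6:.2f} MOPS")
```

With these figures the product comes to about 23 MOPS, consistent with the "at least 20 MOPS" bound.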

State of the art solutions:
★ High frequency DSP or CPU - SLR engineering
★ Reduced image resolution - LittleSister project
★ Custom processor (ASIC) - Xetal-Pro

http://www.slr-engineering.at/

http://www.iminds.be/en/research/overview-projects/p/detail/littlesister

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04443179

Are FPGAs suited for SCN?

Usual FPGA drawbacks:
➢ require highly specialised knowledge of electronic design
○ HDL, methodology, different from the software approach
➢ custom IPs are targeted to a specific application
○ limited hardware reuse
➢ static hardware architecture

FPGAs combine:
★ parallel processing
★ flexible architecture
★ integrated solution
★ limited power consumption

Thesis’ goals

➢ Smart Camera architecture definition
○ low-cost (pervasiveness)
○ easy to use
➢ Development of a flexible FPGA architecture
○ optimised processing (parallelism)
○ ubiquitous applications (reconfigurable)
➢ Exploiting CV tasks with in-node processing
○ Image processing hardware IP
○ Hardware Library

Architecture of a Smart Camera Network node

Middleware abstraction

Architecture of a SCN node (1/3)

FPGA + microcontroller architecture:
● FPGA + SoftCore: image capture and heavy processing
● uC: network and middleware handling

The FPGA provides:
1. Camera interface
2. Hardware image pre-processing (streaming paradigm)
3. Feature extraction and object recognition

The FPGA extracts image features and sends aggregated data to the microcontroller.

The microcontroller implements the resource abstraction on the network and handles configuration settings.

RS232

Reconfigurable FPGA architecture

FPGA-based solution: the Hw-Sw codesign approach

Hardware-Software codesign: a joint hardware and software development technique that exploits both HDL optimisation and code flexibility

● Optimised solutions
● Dedicated architecture
● Power efficiency

● Code flexibility
● Dynamic configuration
● Sequential operations

Why a mixed Hw/Sw solution?

"Pure-hardware" approach: optimised and power-efficient, but a dedicated and monolithic solution focused only on a particular operation

Software

Hardware

A mixed Hw/Sw solution, by contrast, provides:
● Optimised computer vision pipeline
○ HW modules
● Configurable parameters
● Easy debug
● Constant output latency

FPGA architecture proposal

L. Maggiani, C. Salvadori, M. Petracca, P. Pagano, R. Saletti, “Reconfigurable FPGA Architecture for Computer Vision applications in Smart Camera Network”, in Proceedings of the 7th International Conference on Distributed Smart Cameras (ICDSC), Palm Springs, Oct 2013

HDL abstraction

➢ Hardware architecture
○ optimised performance
➢ Model-based oriented
○ limited HDL knowledge required
○ hardware IP reuse

What do we mean by “reconfigurable”?

Single in/Single out, Double in/Double out, Single in/Double out, Double in/Single out

Data flow redirection

Multiple inputs are allowed, but they have to be on different output buses (stream collisions are avoided).
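The collision rule can be illustrated with a small software model. This is a toy sketch, not the actual RouteMatrix IP: the class, its methods and the port names ("cam", "remote", "bus0", "bus1") are all hypothetical. Each output bus records the single input stream that drives it, so a second stream on the same bus is refused.

```python
class RouteMatrix:
    """Toy model of data flow redirection between input streams and output buses."""

    def __init__(self, inputs, outputs):
        self.inputs = set(inputs)
        self.outputs = set(outputs)
        self.routes = {}  # output bus -> input stream currently driving it

    def connect(self, source, bus):
        if source not in self.inputs or bus not in self.outputs:
            raise KeyError(f"unknown port: {source!r} -> {bus!r}")
        current = self.routes.get(bus)
        if current is not None and current != source:
            # Two streams on the same bus would collide, so the request is refused.
            raise ValueError(f"bus {bus!r} is already driven by {current!r}")
        self.routes[bus] = source

    def disconnect(self, bus):
        self.routes.pop(bus, None)


matrix = RouteMatrix(inputs=["cam", "remote"], outputs=["bus0", "bus1"])
matrix.connect("cam", "bus0")      # single in / single out
matrix.connect("cam", "bus1")      # single in / double out: one stream fans out
matrix.disconnect("bus1")
matrix.connect("remote", "bus1")   # double in / double out: one stream per bus

try:
    matrix.connect("remote", "bus0")   # would collide with "cam" on bus0
    collided = False
except ValueError:
    collided = True

print(matrix.routes, collided)
```

Keying the table by output bus rather than by input makes a collision impossible to represent: a bus can hold only one source at a time, while one source may still feed several buses.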

Software addressable IP

Each hardware IP is configurable through the SoftCore databus

Altera NIOSII CPU

Why are we using a softcore CPU?
1. Memory mapped peripheral
2. Useful during debug
3. Integrated development tool

50 MHz, 32 MB SDRAM, 600 Logic Elements, royalty free
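From the software side, a memory mapped peripheral is just a block of registers at a known address. The sketch below simulates that idea in plain Python. Everything in it is an assumption made for illustration: the base address, the register offsets and the register names are hypothetical and are not taken from the actual design.

```python
class RegisterBus:
    """Stand-in for the SoftCore data bus: a flat address space of 32-bit words."""

    def __init__(self):
        self.memory = {}

    def write(self, address, value):
        self.memory[address] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def read(self, address):
        return self.memory.get(address, 0)


class MappedIP:
    """A hardware IP seen as named registers at offsets from a base address."""

    def __init__(self, bus, base, registers):
        self.bus = bus
        self.base = base
        self.registers = registers  # register name -> byte offset

    def configure(self, **settings):
        for name, value in settings.items():
            self.bus.write(self.base + self.registers[name], value)

    def get(self, name):
        return self.bus.read(self.base + self.registers[name])


bus = RegisterBus()
# Hypothetical register map for one IP block.
histogram_ip = MappedIP(bus, base=0x1000,
                        registers={"cell_width": 0x0, "bins": 0x4, "threshold": 0x8})
histogram_ip.configure(cell_width=8, bins=9, threshold=16)

print(hex(0x1000 + 0x4), histogram_ip.get("bins"))
```

Because configuration is only a sequence of bus writes, the same software can reconfigure any IP at run time without touching the HDL.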

Image processing in a Smart Camera Network

Our approach to Computer Vision

Where hardware optimisation meets software flexibility.

Figure: computer vision pipeline composed of HDL modules (HDL instances) driven by software configuration; embedded CV dataflow.

Hardware Library tool

1. reduced development time

2. easy to use (drag&drop)

3. complete IP reuse

Hardware Library

● RouteMatrix

● VideoSampler

● RemoteImg

● GradientHW

● HistogramHW

● StreamStore

● NormHW (testing)

● StereoHW (design phase)

● HoughHW (design phase)

Hardware Library instance: GradientHW

Performs a spatial gradient extraction, with a fixed result latency (2 clock cycles)
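As a software reference for what such a block computes, here is a minimal sketch of spatial gradient extraction. The slides do not state which kernel GradientHW uses; the centred difference [-1, 0, 1] below is an assumption borrowed from the Dalal and Triggs HOG paper, and the function name is illustrative.

```python
import math


def spatial_gradient(image):
    """Return (magnitude, orientation) maps using centred differences [-1, 0, 1].

    `image` is a list of rows of grey levels. Border pixels are left at zero.
    Orientation is unsigned, in degrees, folded into the 0 to 180 range.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    orientation = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal difference
            gy = image[y + 1][x] - image[y - 1][x]   # vertical difference
            magnitude[y][x] = math.hypot(gx, gy)
            orientation[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation


# A vertical edge: dark on the left, bright on the right.
image = [[0, 0, 100, 100] for _ in range(4)]
magnitude, orientation = spatial_gradient(image)
print(magnitude[1][1], orientation[1][1])
```

Each output pixel depends only on its four neighbours, which is what allows a hardware version to work on the video stream with a small row buffer and a fixed latency.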

Hardware Library instance: HistogramHW

Performs the histogram extraction over a configurable-size window

Software flexibility
● configurable cell width: 5x5, 6x6, 8x8, full scale
● configurable number of bins
● threshold

Hardware optimisation
● video stream
● low memory use
● parallel read-modify-write operations
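The configurable parameters can be pictured with a software sketch of per-cell histogram extraction. Two points are assumptions, since the slides do not specify them: the threshold is taken to act on gradient magnitude, and each pixel votes with its magnitude into a single bin. The function and argument names are illustrative.

```python
def cell_histograms(magnitude, orientation, cell=8, bins=9, threshold=0.0):
    """Accumulate one orientation histogram per cell x cell window.

    Pixels whose magnitude is at or below `threshold` are skipped (assumed
    meaning of the threshold). Orientations are in degrees, 0 to 180.
    """
    height, width = len(magnitude), len(magnitude[0])
    rows, cols = height // cell, width // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins
    for y in range(rows * cell):
        for x in range(cols * cell):
            m = magnitude[y][x]
            if m <= threshold:
                continue
            b = min(int(orientation[y][x] / bin_width), bins - 1)
            hist[y // cell][x // cell][b] += m   # read-modify-write on one bin
    return hist


# An 8x8 patch of unit-magnitude gradients, all oriented at 90 degrees.
mag = [[1.0] * 8 for _ in range(8)]
ori = [[90.0] * 8 for _ in range(8)]
whole = cell_histograms(mag, ori, cell=8, bins=9)
quarters = cell_histograms(mag, ori, cell=4, bins=9)
print(whole[0][0][4], quarters[1][1][4])
```

Changing `cell` or `bins` only changes indexing, not the accumulation loop, which suggests why these can be exposed as run-time parameters.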

HistogramHW (internal view)

Implementation and results

The boards

Smart Camera for the IoT

L. Maggiani, G. M. Iodice, A. Gassani, C. Salvadori, A. Azzarà, R. Saletti, P. Pagano, “A novel architecture of a Smart Camera Networks tailored to the IoT”, Workshop on Architecture of Smart Camera, Seville, June 2013

Hardware pre-processing → SoftCore elaboration: about 1 fps @ QVGA resolution

In our demo we show:

● local processing algorithm (GBHT)
● a smart camera network based on the IPv6 protocol

Hardware based elaboration: VideoSampler, GradientHW, StreamStore

Software based elaboration (SoftCore): line intercept, probabilistic shape recognition

Smart Camera for the IoT: Goals

Smart Camera Network

○ In-node heavy processing

○ Every node connected to the SCN is addressable using IPv6

○ The SCN node makes the triangle coordinates available to the network as a resource (when a triangle is detected)

SmartCamera1 IPv6 address: 2001::a:a:ff:fe00:1

SmartCamera2 IPv6 address: 2001::a:a:ff:fe00:22

Smart Camera for the IoT: Demo

SmartCamera1

SmartCamera2

Border Router

Histogram of Oriented Gradients

State of the art for pedestrian and vehicle detection

Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, INRIA, 2005

HOG pipeline

Gradient extraction → Histogram → Normalization
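The last stage of the pipeline can be sketched as follows. The slides do not say which norm is used; plain L2 normalisation over a block of cell histograms, one of the schemes evaluated by Dalal and Triggs, is assumed here. The function name and the epsilon value are illustrative.

```python
import math


def l2_normalise(block, epsilon=1e-6):
    """Flatten a block (a list of cell histograms) and scale it to unit L2 norm.

    `epsilon` avoids a division by zero for blocks with no gradient energy.
    """
    flat = [value for cell in block for value in cell]
    norm = math.sqrt(sum(value * value for value in flat) + epsilon ** 2)
    return [value / norm for value in flat]


# Two tiny 2-bin cell histograms standing in for a block.
block = [[3.0, 0.0], [0.0, 4.0]]
descriptor = l2_normalise(block)
print([round(value, 3) for value in descriptor])
```

Normalising per block makes the descriptor less sensitive to local changes in illumination and contrast, which is its purpose in the original HOG formulation.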

FPGA implementation

HOG results

L.Maggiani, C.Salvadori, P.Pagano, “FPGA implementations of Histograms of Oriented Gradients for pedestrian detection - Hw-Sw codesign approach”, Workshop on Architecture of Smart Camera, Seville, June 2013

HOG performances

VideoSampler: 200 LUT, 512 Byte (FIFO buffering)

GradientHW: 1200 LUT, 960 Byte (Row buffering), 32 DSP modules 9x9

HistogramHW: 850 LUT, 16 kByte (Cell buffering)

NormHW: 400 LUT, 2 kByte (Block buffering), module still in testing

Only 12% of the EP4CE22 on the DE0-Nano

All the blocks inside the pipeline are implemented using the streaming paradigm:

● constant latency: the maximum value among all the block latencies
○ Maximum latency = Histogram latency (2560 clock cycles)
● works at the same fps as the input
○ the maximum manageable frame rate depends on technological constraints
○ contingent case: 12 fps (~83 ms)
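To put these numbers in perspective, the latency can be compared with the frame period. This is a rough sketch: the slides do not give the pipeline clock, so the 50 MHz quoted for the Nios II is assumed here purely for illustration.

```python
LATENCY_CYCLES = 2560      # histogram latency in clock cycles (from the slide)
FRAME_RATE = 12            # frames per second in the reported case (from the slide)
CLOCK_HZ = 50_000_000      # ASSUMPTION: pipeline runs at the 50 MHz quoted for the Nios II

frame_period_ms = 1000 / FRAME_RATE
latency_us = LATENCY_CYCLES / CLOCK_HZ * 1e6
latency_share = (latency_us / 1000) / frame_period_ms

print(f"frame period: {frame_period_ms:.1f} ms")
print(f"pipeline latency: {latency_us:.1f} us")
print(f"latency as a share of the frame period: {latency_share:.4%}")
```

Under that clock assumption the pipeline latency is a small fraction of a frame period, in line with the claim that the pipeline keeps up with the input frame rate.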

Next step: Support Vector Machine

Conclusion

● A Smart Camera Network node architecture has been designed and implemented

● The proposed reconfigurable FPGA architecture has been deployed in image processing tasks

● Thanks to the hardware abstraction, only limited knowledge of FPGA programming is required

● With the current Hardware Library, only simple operations are available

Future work

● Realise a one-chip solution, where the SoftCore manages the configuration and the network communications

● New Hardware Library modules

● Hardware classifier for HOG pipeline

● Deploy a Smart Camera Network test bed, where each node is able to exchange information with the others

Smart Camera SoPC: video, network, NIOS
