Reconfigurable FPGA Architecture for Computer Vision applications in
Smart Camera Network
Luca Maggiani
Supervisors: Prof. Roberto Saletti (Università di Pisa)
Dr. Paolo Pagano (Scuola Superiore Sant’Anna)
Prof. François Berry (Blaise Pascal University, Clermont-Ferrand)
Project work done at TeCIP Institute, Scuola Superiore Sant’Anna
Outline
● Smart Camera
● Architecture of a Smart Camera Network node
● Reconfigurable FPGA architecture
● Image processing in SCN
● Implementation and results
○ Smart Camera in the “Internet of Things”
○ Histogram of Oriented Gradients
● Conclusion
What is a Smart Camera?
A Smart Camera combines:
➔ sensing
➔ processing
➔ communications
Smart Camera as evolution of WSN
image processing & understanding
Centralised architecture
Static infrastructure
Single point of failure
Limited in-node processing
Why Smart Camera?
Billions of cameras are deployed in public and private environments
➔ Video surveillance
➔ Transportation
➔ Entertainment
➔ Security
➔ Autonomous vehicles
● Raw or compressed data is sent to a central server
● High data bandwidth required (wired connection)
Making cameras smarter
Standard camera: image sampling & enhancement on board, raw data sent over high bandwidth
Smart Camera: image sampling & enhancement and image processing & understanding on board, events sent over limited bandwidth
“Process data where it is captured” B. Rinner, Pervasive Smart Cameras, PECCS 2011
Smart Camera Network
Distributed processing in a distributed network
Distributed processing:
★ decentralised approach
★ events notification
★ reduced data transfer
Distributed network:
★ pervasive, collaborative, dynamic network
★ low power infrastructure
★ reliability
Smart Camera issues
Events
limited bandwidth
local image processing operation
Local image processing
● heavy image processing tasks
○ limited hardware resources
● power consumption constrained
● processing and understanding
Limited bandwidth
● Low power wireless protocol
○ IEEE802.15.4
● Aggregated data packet
● Unreliable medium
● Event driven communication
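To put the bandwidth constraint in numbers, the sketch below compares a raw QVGA stream with the nominal IEEE 802.15.4 data rate. The 250 kbit/s figure is the nominal rate of the 2.4 GHz PHY; the 25 FPS frame rate and the 32-byte event payload are assumptions chosen only for illustration.

```python
# Rough bandwidth budget: raw video vs. event notification.
WIDTH, HEIGHT = 320, 240          # QVGA
BITS_PER_PIXEL = 8
FPS = 25                          # assumed frame rate
LINK_BPS = 250_000                # nominal IEEE 802.15.4 rate (2.4 GHz PHY)
EVENT_BYTES = 32                  # hypothetical event payload, one per frame

raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
event_bps = EVENT_BYTES * 8 * FPS


def fits(bps, link_bps=LINK_BPS):
    """True when the stream fits in the nominal link rate."""
    return bps <= link_bps


print(f"raw stream:   {raw_bps / 1e6:.2f} Mbit/s, fits: {fits(raw_bps)}")
print(f"event stream: {event_bps / 1e3:.1f} kbit/s, fits: {fits(event_bps)}")
```

Protocol overhead and the unreliable medium lower the usable rate further, so the real margin is smaller than this estimate suggests.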
Smart Camera (on board): image sampling & enhancement, image processing & understanding
Local image processing
Input: QVGA 320x240, 76800 pixels, 8 bit / pixel
Filtering
Output image
➢ 76800 pixels per frame x 25 FPS = 1.92 Mpix / s
➢ the simplest filtering kernel uses a 3x3 convolution matrix
○ 8 iterations per pixel
○ plus overhead for sum/subtraction, ~4 iterations
➢ at least 20 MOPS are required for filtering alone
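The estimate above can be reproduced in a few lines. This is only the arithmetic from the slide; the variable names are illustrative.

```python
# Operation count for a 3x3 filter on a QVGA stream (figures from the slide).
PIXELS_PER_FRAME = 320 * 240      # 76800
FPS = 25
KERNEL_OPS = 8                    # per-pixel iterations for the 3x3 kernel
OVERHEAD_OPS = 4                  # approximate sum/subtraction overhead

pixel_rate = PIXELS_PER_FRAME * FPS
ops_per_second = pixel_rate * (KERNEL_OPS + OVERHEAD_OPS)

print(f"{pixel_rate / 1e6:.2f} Mpix/s")
print(f"{ops_per_second / 1e6:.2f} MOPS")
```

With 12 iterations per pixel the total is slightly above 23 MOPS, consistent with the "at least 20 MOPS" bound.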
State of the art solutions:
★ High frequency DSP or CPU - SLR engineering
★ Reduced image resolution - LittleSister project
★ Custom processor - ASIC - Xetal-Pro
Are FPGAs suited for SCN?
Usual FPGA drawbacks:
➢ require highly specialised knowledge of electronic design
○ HDL and methodology, different from the software approach
➢ custom IPs are targeted to a specific application
○ limited hardware reuse
➢ static hardware architecture
FPGAs combine:
★ parallel processing
★ flexible architecture
★ integrated solution
★ limited power consumption
Thesis’ goals
➢ Smart Camera architecture definition
○ low-cost (pervasiveness)
○ easy to use
➢ Development of a flexible FPGA architecture
○ optimised processing (parallelism)
○ ubiquitous applications (reconfigurable)
➢ Exploiting CV tasks with in-node processing
○ Image processing hardware IP
○ Hardware Library
Architecture of a SCN node (1/3)
FPGA + microcontroller architecture:
● FPGA+SoftCore: image capture and heavy processing
● uC: network and middleware handling
Architecture of a SCN node (2/3)
The FPGA provides:
1. Camera interface
2. Hardware image pre-processing (streaming paradigm)
3. Feature extraction and object recognition
Architecture of a SCN node (3/3)
The FPGA extracts image features and sends aggregated data to the microcontroller
The microcontroller implements the resource abstraction on the network and handles configuration settings.
RS232
FPGA-based solution: The Hw-Sw codesign approach
Hardware-Software codesign: a joint hardware and software development technique that exploits both HDL optimisation and code flexibility
Hardware:
● Optimised solutions
● Dedicated architecture
● Power efficiency
Software:
● Code flexibility
● Dynamic configuration
● Sequential operations
Why a mixed Hw/Sw solution?
"Pure-hardware" approach: dedicated and monolithic solution, focused only on a particular operation
● optimised solution
● power efficiency
A mixed Hw/Sw solution instead provides:
● Optimised computer vision pipeline
○ HW modules
● Configurable parameters
● Easy debug
● Constant output latency
FPGA architecture proposal
L. Maggiani, C. Salvadori, M. Petracca, P. Pagano, R. Saletti, “Reconfigurable FPGA Architecture for Computer Vision applications in Smart Camera Network”, in Proceedings of the 7th International Conference on Distributed Smart Camera, ICDSC, Palm Springs, Oct 2013
HDL abstraction
➢ Hardware architecture
○ optimised performance
➢ Model-based approach
○ Limited HDL knowledge
○ Hardware IP reuse
What do we mean by “reconfigurable”?
● Single in / Single out
● Double in / Double out
● Single in / Double out
● Double in / Single out
Data flow redirection
Multiple inputs are allowed, but they have to be on different output buses (stream collisions are avoided).
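The routing rule can be illustrated with a small software model. This is a sketch under assumptions, not the actual hardware behaviour: the class name `RouteMatrixModel`, the port names and the error handling are all hypothetical. It only encodes the stated constraint that an output bus carries at most one stream, while one input may feed several outputs.

```python
class RouteMatrixModel:
    """Toy model of stream routing: each output bus has at most one source."""

    def __init__(self, inputs, outputs):
        self.inputs = set(inputs)
        self.outputs = set(outputs)
        self.source_of = {}           # output bus -> input stream driving it

    def connect(self, src, dst):
        if src not in self.inputs or dst not in self.outputs:
            raise KeyError(f"unknown port: {src} -> {dst}")
        current = self.source_of.get(dst)
        if current is not None and current != src:
            # Two streams on the same output bus would collide.
            raise ValueError(f"{dst} is already driven by {current}")
        self.source_of[dst] = src

    def fanout(self, src):
        """Output buses currently fed by the given input."""
        return sorted(d for d, s in self.source_of.items() if s == src)


rm = RouteMatrixModel(inputs=["in0", "in1"], outputs=["out0", "out1"])
rm.connect("in0", "out0")      # single in / double out: in0 feeds both buses
rm.connect("in0", "out1")
print(rm.fanout("in0"))
try:
    rm.connect("in1", "out1")  # rejected: out1 already carries in0
except ValueError as err:
    print("collision:", err)
```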
Software addressable IP
Each hardware IP is configurable through the SoftCore databus
Altera NIOSII CPU
Why are we using a softcore CPU?
1. Memory mapped peripherals
2. Useful during debug
3. Integrated development tool
50MHz, 32MB SDRAM, 600 Logic Elements, royalty free
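What "memory mapped" configuration means can be sketched with a toy bus model. Everything here is hypothetical: the base address, the register names and the 32-bit word layout are assumptions for illustration and do not describe the real register map of any Hardware Library IP.

```python
class RegisterBlock:
    """Hypothetical register file of one hardware IP."""

    def __init__(self, names):
        self.names = list(names)
        self.values = [0] * len(self.names)


class SoftCoreBus:
    """Toy memory-mapped bus: 32-bit registers at word-aligned addresses."""

    def __init__(self):
        self.blocks = []               # (base address, RegisterBlock)

    def attach(self, base, block):
        self.blocks.append((base, block))

    def _locate(self, addr):
        for base, block in self.blocks:
            index, rem = divmod(addr - base, 4)
            if rem == 0 and 0 <= index < len(block.values):
                return block, index
        raise ValueError(f"no register at 0x{addr:08x}")

    def write(self, addr, value):
        block, index = self._locate(addr)
        block.values[index] = value & 0xFFFFFFFF

    def read(self, addr):
        block, index = self._locate(addr)
        return block.values[index]


HIST_BASE = 0x1000                     # hypothetical base address
bus = SoftCoreBus()
hist = RegisterBlock(["cell_width", "bins", "threshold"])
bus.attach(HIST_BASE, hist)
bus.write(HIST_BASE + 0, 8)            # cell width
bus.write(HIST_BASE + 4, 9)            # number of bins
print(bus.read(HIST_BASE + 4))
```

On the softcore itself the same idea would reduce to plain loads and stores at the peripheral addresses, which is why a memory-mapped IP is easy to configure and inspect during debug.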
Our approach to Computer Vision
Where hardware optimisation meets software flexibility
Computer vision pipeline: a chain of HDL module instances under software configuration
Embedded CV dataflow
Hardware Library tool
1. reduced development time
2. easy to use (drag&drop)
3. complete IP reuse
Hardware Library
● RouteMatrix
● VideoSampler
● RemoteImg
● GradientHW
● HistogramHW
● StreamStore
● NormHW (testing)
● StereoHW (design phase)
● HoughHW (design phase)
Hardware Library instance: GradientHW
Performs a spatial gradient extraction, with a fixed result latency (2 clock cycles)
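A software reference for what a spatial gradient computes, as a minimal sketch. The slide does not name the operator, so the centred difference [-1, 0, 1] used here is an assumption; the function name and the zeroed borders are also illustrative choices.

```python
def spatial_gradient(image):
    """Centred-difference gradient of a 2D list of pixel values.

    Returns (gx, gy) with the same size as the input; border pixels are
    left at zero because they lack a full neighbourhood.
    """
    rows, cols = len(image), len(image[0])
    gx = [[0] * cols for _ in range(rows)]
    gy = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx[r][c] = image[r][c + 1] - image[r][c - 1]
            gy[r][c] = image[r + 1][c] - image[r - 1][c]
    return gx, gy


# Horizontal ramp: intensity grows by 10 per column, constant along rows.
ramp = [[10 * c for c in range(5)] for _ in range(4)]
gx, gy = spatial_gradient(ramp)
print(gx[1][1:4], gy[1][1:4])
```

The vertical difference needs the rows above and below the current pixel, which is why a streaming implementation has to buffer image rows.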
Hardware Library instance: HistogramHW
Performs the histogram extraction over a configurable size window
Software flexibility
● configurable cell width: 5x5, 6x6, 8x8, full scale
● configurable number of bins
● threshold
Hardware optimisation
● video stream
● low memory use
● parallel read-modify-write operations
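The three configurable parameters can be seen at work in a sequential software sketch. It is not the hardware design: unsigned orientation (0 to 180 degrees), magnitude-weighted votes and a magnitude threshold are assumptions borrowed from common HOG practice, and the function and argument names are hypothetical.

```python
import math


def cell_histograms(gx, gy, cell=8, bins=9, threshold=0.0):
    """Orientation histograms per cell, with magnitude-weighted votes.

    Orientation is unsigned (0..180 degrees); gradients whose magnitude
    is at or below `threshold` do not vote. Partial cells at the right and
    bottom edges are ignored.
    """
    rows, cols = len(gx), len(gx[0])
    cell_rows, cell_cols = rows // cell, cols // cell
    hist = [[[0.0] * bins for _ in range(cell_cols)] for _ in range(cell_rows)]
    bin_width = 180.0 / bins
    for r in range(cell_rows * cell):
        for c in range(cell_cols * cell):
            mag = math.hypot(gx[r][c], gy[r][c])
            if mag <= threshold:
                continue
            angle = math.degrees(math.atan2(gy[r][c], gx[r][c])) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            hist[r // cell][c // cell][b] += mag   # read-modify-write of a bin
    return hist


# One 8x8 cell with a purely horizontal gradient.
gx = [[3] * 8 for _ in range(8)]
gy = [[0] * 8 for _ in range(8)]
print(cell_histograms(gx, gy)[0][0])
```

Each vote is a read-modify-write on one bin; the hardware module performs these updates in parallel on the video stream, whereas this loop does them one pixel at a time.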
Smart Camera for the IoT
L. Maggiani, G. M. Iodice, A. Gassani, C. Salvadori, A. Azzarà, R. Saletti, P. Pagano, “A novel architecture of a Smart Camera Networks tailored to the IoT”, Workshop on Architecture of Smart Camera, Seville, June 2013
Hardware pre-processing
SoftCore elaboration
about 1 fps @ Q-VGA resolution
In our demo we show:
● local processing algorithm (GBHT)
● a smart camera network based on IPv6 protocol
Hardware based elaboration: VideoSampler, GradientHW, StreamStore
Software based elaboration (SoftCore): line intercept, probabilistic shape recognition
Smart Camera for the IoT: Goals
Smart Camera Network
SmartCamera1, IPv6 address: 2001::a:a:ff:fe00:1
SmartCamera2, IPv6 address: 2001::a:a:ff:fe00:22
○ In-node heavy processing
○ Every node connected to the SCN is addressable using IPv6
○ The SCN node makes the triangle coordinates available to the network as a resource (when a triangle is detected)
Histogram of Oriented Gradients
State of the art for pedestrian and vehicle detection
Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, INRIA, 2005
HOG results
L. Maggiani, C. Salvadori, P. Pagano, “FPGA implementations of Histograms of Oriented Gradients for pedestrian detection - Hw-Sw codesign approach”, Workshop on Architecture of Smart Camera, Seville, June 2013
HOG performances
VideoSampler: 200 LUT, 512 Byte (FIFO buffering)
GradientHW: 1200 LUT, 960 Byte (Row buffering), 32 DSP modules 9x9
HistogramHW: 850 LUT, 16 kByte (Cell buffering)
NormHW: 400 LUT, 2 kByte (Block buffering), module still in testing
only 12% of EP4CE22 within DE0-nano
All the blocks inside the pipeline are implemented using the streaming paradigm:
● constant latency: the maximum among all the block latencies
○ Maximum latency = Histogram latency (2560 clock cycles)
● works at the same fps as the input
○ the maximum manageable frame rate depends on technological constraints
○ present case: 12 fps (~83 ms)
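The figures on this slide can be cross-checked with a little arithmetic. Two values below are assumptions not stated on the slide: the 22320 logic elements of the EP4CE22 (with one LUT counted as one logic element) and a 50 MHz pipeline clock, borrowed from the SoftCore clock.

```python
# Figures from the slide.
LUTS = {"VideoSampler": 200, "GradientHW": 1200, "HistogramHW": 850, "NormHW": 400}
HIST_LATENCY_CYCLES = 2560
FPS = 12
QVGA_WIDTH = 320

# Assumptions, not stated on the slide.
DEVICE_LES = 22_320               # logic elements of the EP4CE22
PIXEL_CLOCK_HZ = 50_000_000       # assumed pipeline clock

total_luts = sum(LUTS.values())
utilisation = total_luts / DEVICE_LES
latency_us = HIST_LATENCY_CYCLES / PIXEL_CLOCK_HZ * 1e6
frame_period_ms = 1000 / FPS
rows_buffered = HIST_LATENCY_CYCLES / QVGA_WIDTH

print(f"{total_luts} LUT, {utilisation:.1%} of the device")
print(f"latency: {latency_us:.1f} us, frame period: {frame_period_ms:.1f} ms")
print(f"latency in QVGA rows: {rows_buffered:.0f}")
```

Under these assumptions the module total lands close to the quoted 12%, and 2560 cycles correspond to 8 QVGA rows, which would match an 8x8 cell if one pixel is processed per clock cycle.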
Conclusion
● A Smart Camera Network node architecture has been designed and implemented
● The proposed reconfigurable FPGA architecture has been deployed in image processing tasks
● Thanks to the hardware abstraction, only limited knowledge of FPGA programming is required
● With the current Hardware Library, only simple operations are available
Future work
● Realise a one-chip solution, where the SoftCore manages the configuration and the network communications
● New Hardware Library modules
● Hardware classifier for HOG pipeline
● Deploy a Smart Camera Network test bed, where each node is able to exchange information with the others
Smart Camera SoPC: video, network, NIOS