

Journal of Systems Architecture 59 (2013) 826–832


Use of wavelet for image processing in smart cameras with low hardware resources

1383-7621/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sysarc.2013.07.007

* Corresponding author. Tel.: +33 169082746.
E-mail addresses: [email protected] (S. Chevobbe), [email protected] (M. Darouich), [email protected] (M. Paindavoine).

Sébastien Courroux a, Stéphane Chevobbe a,*, Mehdi Darouich a, Michel Paindavoine b

a CEA, LIST, CEA Saclay Nano-INNOV, PC 172, F-91191 Gif-sur-Yvette, France
b LEAD Laboratory, Université de Bourgogne, 21000 Dijon, France

Article info

Article history: Available online 7 August 2013

Keywords: Wavelet, DWT, Demosaicing, Denoising, Recognition, Embedded systems

Abstract

Images from embedded sensors need digital processing to recover high-quality images and to extract features of a scene. Depending on the properties of the sensor and on the application, the designer fits together different algorithms to process images. In the context of embedded devices, the hardware supporting those applications is severely constrained in terms of power consumption and silicon area. The algorithms therefore have to comply with the embedded specifications, i.e. reduced computational complexity and low memory requirements. We investigate the opportunity to use the wavelet representation to perform good-quality image processing algorithms at a lower computational complexity than with the spatial representation. To reproduce such conditions, demosaicing, denoising, contrast correction and classification algorithms are executed on several well-known embedded cores (Leon3, Cortex A9 and DSP C6x). Wavelet-based image reconstruction shows higher image quality and lower computational complexity (3×) than usual spatial reconstruction. The use of wavelet decomposition also increases the face recognition rate while decreasing computational complexity by a factor of 25.


1. Introduction

Images from embedded sensors are often noisy and spatially distributed over the three primary color channels. Digital processing is mandatory to recover full-resolution, high-quality images or to extract features from a scene. Depending on the properties of the sensor and on the application, the designer combines digital processing algorithms into a complete processing pipeline dedicated to the targeted application. Preprocessing operations such as vignetting or fixed-noise reduction are executed in the analog domain, prior to the ADC operation. Reconstruction and enhancement operations such as demosaicing, non-fixed noise reduction or gamma correction are usually part of the processing pipeline. The demosaicing operation recovers full-resolution color images from a subsampled Color Filter Array (CFA) image. Basic methods interpolate each channel separately [1]; advanced spatial and wavelet demosaicing methods exploit inter-channel correlation [2–5]. Denoising methods are employed to compensate for the noise present in images. Most spatial denoising methods use statistical properties of the image [6,7], while wavelet-based denoising methods exploit the ability of the DWT to decompose an image into frequency subbands and rely on statistical properties of the wavelet subbands [8–11]. Contrast correction methods based on Histogram Equalization (HE) [12] have been proved to give good results on unevenly illuminated scenes. Advanced implementations of the HE method are Regional Histogram Equalization (RHE) [13] and Block-based Histogram Equalization (BHE) [14]. RHE on approximation wavelet coefficients (RWHE) [15] is also employed in the literature. High-level algorithms such as image compression can be implemented within the processing pipeline or outside the sensor, as pedestrian detection or 3D reconstruction methods require a large amount of memory as well as high computational capacities. Object detection and recognition is another example of high-level algorithms, and object classification is studied in this work. General k-NN object classification [16] or the more specific Eigenfaces algorithm [17,18] are among the simplest classification methods and can be implemented on general embedded processors.

Some hardware implementations of these algorithms can be found in the literature. In [19], a dedicated architecture performs bilinear demosaicing at high resolution, while the architecture presented in [20] performs a demosaicing operation based on inter-channel correlation. Dedicated denoising architectures, mostly operating in the wavelet domain, can be found in [21,22]. These architectures are designed for a single type of operation. They offer high performance but low flexibility, and require a large quantity of memory. Hardware architectures dedicated to Eigenfaces face recognition [23,24] also require a large quantity of memory and thus have a limited learning database. Some architectures use the wavelet representation to classify faces [25–27]. They employ DWT architectures which require a large quantity of memory and floating-point operations.

In the context of embedded devices, the hardware that supports those applications is severely constrained in terms of power consumption and silicon area. Thus, the algorithms have to comply with the embedded specifications, i.e. reduced computational complexity and low memory requirements (for instance, the Icyflex core [28]: 50 MHz, 110 KGates in 180 nm TSMC technology, 128 KBytes of data/program memory). Consequently, new methods are studied to reduce the computational complexity of the different algorithms while preserving as much quality as possible. Compared to the spatial domain, where raw data are processed, alternative domains make it possible to extract "features" from an image. For example, the complexity of the denoising operation is reduced when it is computed in the frequency domain. The wavelet representation is useful to extract information from a signal as it splits an image into several frequency, orientation and scale subbands. This paper investigates the opportunity to use the wavelet representation instead of the spatial representation in image processing to obtain good-quality algorithms at low computational complexity. The comparison methodology is presented in Section 2. Quality and computational complexity analyses are presented in Sections 3 and 4 respectively. Both criteria are discussed in Section 5 and the paper ends with a brief conclusion.
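The subband split mentioned above can be illustrated with a minimal one-level 2D Haar transform. This is a hedged sketch, not the paper's filter bank (which is not specified in the text): the integer averaging used below is one normalization convention among several, chosen to match the integer-only embedded context.

```python
# One-level 2D Haar DWT: splits an image into four subbands
# (LL: approximation; LH, HL, HH: detail coefficients).
# Integer-friendly sketch using floor divisions (no FPU needed).

def haar_dwt2(img):
    """img: list of lists (H x W), H and W even. Returns (LL, LH, HL, HH)."""
    h, w = len(img), len(img[0])

    # 1D pass over rows: low-pass = average, high-pass = half-difference.
    rows = [[(r[2*j] + r[2*j+1]) // 2 for j in range(w // 2)] +
            [(r[2*j] - r[2*j+1]) // 2 for j in range(w // 2)]
            for r in img]

    # Same 1D pass over the columns of the row-transformed image.
    LL = [[(rows[2*i][j] + rows[2*i+1][j]) // 2 for j in range(w // 2)]
          for i in range(h // 2)]
    HL = [[(rows[2*i][j] - rows[2*i+1][j]) // 2 for j in range(w // 2)]
          for i in range(h // 2)]
    LH = [[(rows[2*i][j + w//2] + rows[2*i+1][j + w//2]) // 2 for j in range(w // 2)]
          for i in range(h // 2)]
    HH = [[(rows[2*i][j + w//2] - rows[2*i+1][j + w//2]) // 2 for j in range(w // 2)]
          for i in range(h // 2)]
    return LL, LH, HL, HH
```

Applied recursively to the LL subband, this yields the multi-level decompositions used later in the paper (each level divides the approximation size by 4).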

2. Comparison methodology

A rendering application needs an RGB image, while for scene analysis a luminance image is sufficient. Thus, it makes no sense to design a single processing pipeline to handle such various applications. Consequently, depending on the application, different spatial and wavelet-based processing pipelines are considered. Both the objective quality and the computational complexity of the processing pipelines are evaluated in order to compare the use of the spatial and wavelet domains. The two studied applications are CFA image reconstruction at the output of the sensor and face recognition.

2.1. Reconstruction of CFA images

For reconstruction purposes, a demosaicing algorithm is mandatory to recover the three color channels. A denoising algorithm is added to this process to reduce the noise corruption due to the poor quality of the image sensors used in embedded devices. The case of "Salt & Pepper" noise is not considered in this study; consequently, we focus on Gaussian noise corruption. Two spatial and two wavelet reconstruction and enhancement processing chains with different complexity/quality trade-offs are considered for the first application. Spatial1 and Wavelet1 represent low-quality processing chains while Spatial2 and Wavelet2 stand for mid-quality processing chains. Table 1 presents the four processing pipelines and the algorithms used at each step of the two stages. As the study focuses on the use of both spatial and wavelet representations in real time, in the embedded domain, high-quality

Table 1
Spatial and wavelet-based reconstruction processing pipelines, composed of a demosaicing and a denoising algorithm.

Domain   Quality  Name      Stage 1 (demosaicing)  Stage 2 (denoising)
Spatial  Low      Spatial1  Hamilton et al. [2]    Wiener et al. [6]
Spatial  Middle   Spatial2  Hirakawa et al. [3]    Wiener et al. [6]
Wavelet  Low      Wavelet1  Courroux et al. [5]    Donoho et al. [8]
Wavelet  Middle   Wavelet2  Courroux et al. [5]    Chang et al. [9]

algorithms of the state of the art are not considered, since they take a few tens of seconds to execute on x86 processors.

The Laplacian second-order color correction algorithm [2] is a two-step algorithm. Green interpolation is based on the evaluation of a linear combination of the gradient on green samples and the Laplacian differential operator on red and blue pixels, in both vertical and horizontal directions. Red and blue samples are interpolated using the same process, based on the previously computed green samples and the original red and blue channels, respectively. The AHD algorithm [3] recovers the full-resolution color image based on horizontal and vertical interpolations combined with a local homogeneity criterion. Three main steps can be identified in the algorithm. First, frequency-based filters are designed to compute horizontal and vertical color images. Then, the CIELAB color space is used as a distance metric measuring the similarity between pixels to establish a homogeneity map, and the horizontal and vertical images are combined depending on the local value of the homogeneity maps. Finally, an artifact reduction algorithm is applied to enhance the final full-resolution images. Concerning the spatial denoising algorithm, for each sample the Wiener filter [6] computes the mean (μ) and variance (σ²) in a given N × N neighborhood, the sample being its center. In the following, a 3 × 3 neighborhood is considered, to keep as much detail as possible in the denoised image and to reduce computational complexity. The denoised sample is based on the mean value of the neighborhood and refined by a factor depending on the theoretical and estimated noise standard deviations.
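The per-pixel rule just described can be sketched as follows. This is a generic adaptive local Wiener sketch, not the paper's C implementation; the border handling (clamping) and the exact form of the refinement gain are assumptions.

```python
def wiener3x3(img, noise_var):
    """Adaptive local Wiener filter on a grayscale image (list of lists).

    For each pixel: local mean and variance over a 3x3 neighborhood, then
    the deviation from the mean is shrunk by an estimated signal ratio.
    noise_var is the (assumed known) noise variance.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the 3x3 neighborhood, clamping indices at the borders.
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(vals) / 9.0
            var = sum((v - mean) ** 2 for v in vals) / 9.0
            # Flat (pure-noise) areas collapse to the local mean; detailed
            # areas keep most of their deviation from the mean.
            gain = max(var - noise_var, 0.0) / var if var > 0 else 0.0
            out[y][x] = mean + gain * (img[y][x] - mean)
    return out
```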

The wavelet-based demosaicing algorithm used in this study [5] exploits inter-channel correlation to reconstruct the three channels. It can be roughly divided into two steps after an initial bilinear and/or edge-directed interpolation. The green component is updated based on the detail wavelet coefficients of the red and blue channels. Once the green channel has been updated, its detail wavelet coefficients are used to refine the initial red and blue interpolations. The two wavelet-based denoising algorithms studied are based on thresholding of the wavelet detail coefficients. The first algorithm [8] is non-adaptive and estimates the global noise variance. In the wavelet representation, this can easily be done using the HH coefficients corresponding to the highest scale (e.g., HH1), as done in (1).

σ = median(|W_ij|) / K,   W_ij ∈ HH1,   K = 0.6745    (1)
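Eq. (1) is the standard robust median estimator of the noise standard deviation. A direct Python transcription (the function name and the `hh1` list-of-lists layout are illustrative choices):

```python
import statistics

def estimate_noise_sigma(hh1):
    """Robust noise estimate of Eq. (1): sigma = median(|W_ij|) / 0.6745,
    where W_ij are the diagonal detail (HH1) coefficients of the first
    decomposition level."""
    flat = [abs(w) for row in hh1 for w in row]
    return statistics.median(flat) / 0.6745
```

The median of absolute detail coefficients is insensitive to the few large coefficients produced by image edges, which is why it estimates the noise rather than the signal.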

The second wavelet-based denoising algorithm [9] performs soft thresholding with a subband-adaptive threshold. The variance of each subband of the noisy image can be found using

σ_y² = (1/N_s²) Σ_{i,j=1}^{N_s} W_ij²,   N_s: size of the subband    (2)

Thus, it is possible to estimate the threshold that minimizes the Bayesian risk as follows:

T_b = σ² / √(max((σ_y² − σ²), 0))  if σ_y² > σ²,  ∞ otherwise    (3)
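Eqs. (2) and (3) can be transcribed directly. The sketch below combines them with the soft-thresholding rule the paragraph mentions; treating the ∞ case as "set every coefficient to zero" is the usual BayesShrink convention, assumed here.

```python
import math

def bayes_threshold(subband, noise_var):
    """Subband-adaptive threshold of Eqs. (2)-(3).
    noise_var is sigma^2 from Eq. (1)."""
    n = sum(len(row) for row in subband)                       # N_s^2 coefficients
    sigma_y2 = sum(w * w for row in subband for w in row) / n  # Eq. (2)
    if sigma_y2 > noise_var:
        return noise_var / math.sqrt(sigma_y2 - noise_var)     # Eq. (3)
    return float('inf')  # subband weaker than the noise: discard it entirely

def soft_threshold(w, t):
    """Soft thresholding: shrink a detail coefficient toward zero by t."""
    if math.isinf(t):
        return 0.0
    return math.copysign(max(abs(w) - t, 0.0), w)
```

Applying `soft_threshold` to every detail coefficient of a subband with that subband's `bayes_threshold`, then inverting the DWT, yields the denoised image.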

2.2. Face recognition

The Eigenfaces face classifier is used to achieve face recognition in both the spatial and wavelet domains. In the latter case, only



Fig. 1. Example of RGB images from the Kodak database. Images are 768 × 512 pixels. Images are mosaiced according to the Bayer pattern and artificially noised.

Table 2
Face recognition processing chains. Classification is processed on approximation coefficients in the wavelet processing chains, while it is processed on full-resolution or subsampled images in the spatial case (N × M: image resolution, S: subsampling factor, K: number of wavelet decomposition levels).

Domain   Subsampling or DWT  Preprocessing  Eigenfaces classification
Spatial  –                   –              on N × M pixels
Spatial  ↓S                  –              on (N × M)/S² pixels
Spatial  –                   RHE            on N × M pixels
Spatial  ↓S                  RHE            on (N × M)/S² pixels
Wavelet  DWT, K levels       –              on (N × M)/2^(2K) pixels
Wavelet  DWT, K levels       RWHE           on (N × M)/2^(2K) pixels


approximation coefficients are classified instead of the whole image. Unevenly illuminated faces highly degrade the recognition rate; consequently, a contrast correction algorithm is applied prior to the recognition task. These chains are composed of a preprocessing part and the Eigenfaces face classifier. Preprocessing operations can modify the image resolution, the contrast or both. The naïve spatial approach is to classify the whole image, without any preprocessing. For the wavelet-based processing pipelines, a 2D DWT is applied before any operation. Detail coefficients are discarded to reduce the vector size and consequently the computational complexity of the following operations, resulting in a lower-resolution approximation face image [15], whose size depends on the number of decomposition levels. Regional Wavelet Histogram Equalization (RWHE) can then be applied on the low-resolution image, before the classification stage. To reproduce the subsampling effect of the 2D DWT, a downsampling operation is also applied on the spatial image, at several scales, to reduce the vector length at the input of the classifier. The result of the Eigenfaces classification depends on the Euclidean distance between pixels, or a set of pixels, of the test image and the different entries of the database.
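The final distance-based decision can be sketched as a nearest-neighbor search over feature vectors. Note that this omits the PCA projection step of the full Eigenfaces method; the function and label names are illustrative.

```python
def nearest_identity(test_vec, database):
    """Return the label of the database entry closest (in squared
    Euclidean distance) to the test feature vector.

    In the wavelet pipelines the vectors would be flattened LL
    approximation coefficients; in the full Eigenfaces method they
    would be projections onto the principal components.
    """
    best_label, best_d2 = None, float('inf')
    for label, ref_vec in database:
        d2 = sum((a - b) ** 2 for a, b in zip(test_vec, ref_vec))
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label
```

Shorter vectors (higher decomposition levels) directly reduce both the per-entry distance cost and the database storage, which is the trade-off measured in Sections 3 and 4.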

2.3. Objective quality criteria

Concerning the CFA image reconstruction, the 24 standard color images from the Kodak database1 are artificially mosaiced and noised to reproduce embedded sensor conditions. A sample of the original images is shown in Fig. 1. Original images and images processed through the demosaicing and denoising pipelines are compared with an objective index to estimate the ability of these processing chains to produce high-quality images. The common Color-PSNR index is not able to detect local changes between two images. Instead, the Mean Structural Similarity Index Measure (MSSIM, [29]) is used as the objective quality index. The MSSIM index detects luminance, contrast and structure changes in images (see Table 2).
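For reference, the structural similarity measure underlying MSSIM has a closed form. The sketch below evaluates it over a single global window with the usual C1/C2 stabilization constants; the actual MSSIM of [29] averages this measure over local sliding windows (typically with Gaussian weighting), which is omitted here to keep the sketch short.

```python
def ssim_global(x, y, L=255):
    """Single-window SSIM between two equal-length pixel lists.

    Compares luminance (means), contrast (variances) and structure
    (covariance); equals 1.0 for identical signals. L is the dynamic
    range of the pixel values.
    """
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilization constants
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx * mx + my * my + C1) * (vx + vy + C2))
```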

Concerning the face recognition application, unevenly illuminated cropped faces from the YaleB database [30] are used. A sample of these face images is presented in Fig. 2. Images are processed through a processing pipeline composed of a contrast correction

1 http://r0k.us/graphics/kodak/.

algorithm followed by a face classifier. The corresponding recognition rates permit the evaluation of the interest of each representation. The training database is composed of a single well-illuminated pose per subject. The test database contains 64 poses per subject, gathered into 5 sets. Set1 and Set2 stand for well-illuminated faces while Set3, Set4 and Set5 represent badly illuminated faces. In our experiment, we only considered the test images from Sets 1 to 3.

2.4. Computational complexity criteria

The algorithms of the study are all described in the C language and all the data are coded in integer variables (no Floating-Point Unit is needed). Four different targets are used to estimate the computational complexity of the different processing algorithms and to assess their ability to run such algorithms in real time. For each target, the same instance is used to perform the tests. The first processor is AntX [31], a small-footprint processor (8 KGates in 40 nm TSMC) developed by the Embedded Computing Laboratory at CEA LIST. It is a 32-bit architecture designed specifically to be used as a low-cost control core; consequently, it has no hardware multiplier or branch prediction mechanism. We also consider the GAISLER LEON3 softcore processor [32]. The LEON3 is a multipurpose 7-stage 32-bit RISC processor. It can handle hardware multiplications and software


Fig. 2. Example of face images. The left image is part of the learning database. The others are test images, from Set1 on the left to Set3 on the right.



divisions. The third processor of the study is a DSP-like processor based on the C6x instruction set; consequently, it offers some optimizations for image processing operations and can exploit 8 parallel ways, including two that handle hardware multiplications. Finally, we also used the ARM CortexA9 processor to estimate the computational complexity of the selected algorithms. The CortexA9 is a multipurpose processor which uses the ARM v7 instruction set. It offers single-cycle MAC operations as well as a DSP/SIMD extension for audio and video processing, and optimized cache management. It can operate from 400 MHz to 2 GHz, while its core consumes 0.4 W and occupies 1.5 mm² in TSMC 65 nm technology. To allow performance comparison, the operating frequency of each processor is set at 400 MHz, which is the maximum operating frequency of the slowest processor, the AntX. C codes are compiled using each dedicated compilation tool chain and the "-O2" optimization.

3. Quality comparison study

In this section, the objective quality of the different processing chains is compared. MSSIM is used to compare original and processed images in the first application. The recognition rate resulting from the Eigenfaces classifier is used to compare spatial and wavelet pre-processing pipelines in the second application.

3.1. 1st application: demosaicing and denoising of CFA images

Fig. 3 presents the objective quality results of the four considered processing chains. Three different noise conditions can be distinguished in this figure. When the noise is very low (σ ≤ 2), both wavelet chains outperform the spatial processing chains. However, when the noise increases (2 < σ ≤ 10), the quality of Wavelet1 decreases sharply while Wavelet2 produces the best-quality images. In highly noisy conditions (σ > 10), one would choose the Spatial2 processing chain; the image quality of Spatial1 and Wavelet2 is still acceptable.

3.2. 2nd application: recognition of unevenly illuminated faces

Table 3 presents the recognition rate results of the Eigenfaces classification method. Several observations can be made. First, in the spatial domain, images spatially subsampled by a factor 2 (↓2)

Fig. 3. MSSIM index for the different spatial and wavelet processing chains, σ ∈ [0, 20].

result in better recognition rates than the naïve approach and a reduction of the vector length to a quarter of its original size. This observation holds for the 3 different illumination conditions. Better recognition rates are also obtained when the RHE contrast correction algorithm is applied, even for well-illuminated images, but without vector size reduction. A combination of the two preprocessing methods gives the best spatial results. In bad illumination conditions, results are improved from 50% to 81% of successful identifications. However, the recognition rate drops if a higher subsampling factor is applied (with or without RHE). Several wavelet decomposition levels are also applied on the test images. One decomposition level reduces the image size to 25% of its original size and improves recognition rates. A significant improvement is observed when RHE is applied on wavelet approximation images with a single-level transform. A three-level wavelet decomposition offers a good recognition rate, comparable to the best results obtained in the spatial domain, in both good and bad illumination conditions. The best results are observed when applying a three-level wavelet decomposition and RHE preprocessing on face images. Compared to the optimal spatial case and the naïve approach, 4% and 34% performance improvements are made respectively. Moreover, the input vector is reduced by a ratio of 64, compared to the reduction by 4 in the optimal spatial case.

4. Computational complexity study

In this part, the computational complexities of the different processing chains of the two applications are compared. Results, in cycles per pixel, are obtained from executions on the different processors: LEON3, CortexA9, DSP-like and AntX. An analysis of the memory needs is also performed for the second application.

4.1. 1st application: demosaicing and denoising of CFA images

The computational complexity of the four different reconstruction and enhancement digital pipelines is presented in Fig. 4. Results are available for the four processors of the study. Cursors indicate whether or not the real-time requirements are met for different image resolutions (25 fps, 400 MHz). The first observation is

Table 3
Recognition rate comparison between spatial and wavelet recognition pipelines using the YaleB face database and the Eigenfaces classifier. Test images are gathered into 3 groups, from Set1 (good illumination) to Set3 (bad illumination).

Domain   Subsampling or DWT  Preproc.  %Img_i  Set1  Set2  Set3
Spatial  –                   –         100     96    80    50
Spatial  ↓2                  –         25      99    82    58
Spatial  ↓8                  –         1.56    98    79    50
Spatial  –                   RHE       100     96    92    74
Spatial  ↓2                  RHE       25      99    93    81
Spatial  ↓8                  RHE       1.56    98    91    76
Wavelet  DWT (K = 1)         –         25      99    83    61
Wavelet  DWT (K = 3)         –         1.56    99    89    65
Wavelet  DWT (K = 1)         RWHE      25      99    94    84
Wavelet  DWT (K = 3)         RWHE      1.56    99    94    84


Fig. 4. Computational complexity of the different demosaicing and denoising processing chains executed over AntX, LEON3, CortexA9 and the DSP-like processor. Cursors indicate the real-time requirements for different image resolutions at 400 MHz, 25 fps.


that, except for the CortexA9, wavelet-based algorithms require far fewer cycles to execute than spatial-based processing chains (2× to 7× for the low-quality chains, 5× to 7× for the mid-quality chains). AntX, which is a control-oriented processor, can process Wavelet1 in real time for QVGA images but cannot meet the real-time requirements for the other processing pipelines or resolutions. Spatial-based pipelines require a high number of cycles on AntX since it takes more than 50 operations to emulate a multiplication. The DSP-like processor is the only one which can process VGA images using the Spatial1 pipeline. However, it is not possible to meet the real-time requirements at any resolution using the spatial-based processing pipelines, even though the multiplication operation takes only 2 cycles to execute. The CortexA9 is the only processor able to run a spatial pipeline at QVGA resolution in real time. It is also the only case where Spatial1 runs faster than the other processing pipelines: single-cycle MAC operations, an optimized division operation and the SIMD extension explain this result. However, the frequency of the CortexA9 was set to 400 MHz during the execution; this processor can consequently handle much larger images in real time when running at higher frequencies (up to 2 GHz). Executions on the LEON3 also demonstrate that both wavelet-based chains require fewer operations per pixel than the spatial-based ones. Indeed, they produce denoised, full-resolution images more than twice as fast as the low-complexity spatial method. Moreover, they are about six times faster than the mid-complexity spatial chain.
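The real-time cursors follow from simple arithmetic: at a fixed clock and frame rate, the available cycle budget per pixel is frequency / (fps × pixels). A quick check of the 400 MHz, 25 fps operating point used in the paper:

```python
def cycle_budget_per_pixel(freq_hz, fps, width, height):
    """Maximum cycles available per pixel to sustain a given frame rate."""
    return freq_hz / (fps * width * height)

# At 400 MHz and 25 fps:
#   QVGA (320x240) leaves about 208 cycles per pixel,
#   VGA  (640x480) only about 52.
qvga_budget = cycle_budget_per_pixel(400e6, 25, 320, 240)
vga_budget = cycle_budget_per_pixel(400e6, 25, 640, 480)
```

This makes the AntX result plausible: a single emulated multiplication (over 50 operations) already consumes a quarter of the QVGA budget.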

4.2. 2nd application: recognition of unevenly illuminated faces

Fig. 5. Computational complexity of the different recognition systems, executed over LEON3, CortexA9 and the DSP-like processor.

The computational complexity of the naïve, optimal spatial and optimal wavelet recognition processing chains over three processors is presented in Fig. 5. Real-time cursors are not shown in this figure because real time is not a mandatory requirement in these

kinds of systems. The classification step requires one multiplication per cycle and per individual in the learning database. Results on AntX are discarded since several thousands of cycles are required on this processor. Executions on the general-purpose processors (LEON3 and CortexA9) show that the wavelet-based method requires far fewer cycles than both the naïve (15×) and optimal (4×) spatial processing pipelines. The same trends are observed for these two processors since the spatial and wavelet processing chains require the same number of multiplications, operations that are well optimized on the CortexA9. Indeed, the length of the input vector has been reduced to 1.56% of the original image size, while it is 100% in the naïve case and 25% in the optimal spatial case. Even with a larger vector size, the spatial processing chains require fewer cycles to execute on the DSP-like processor than the wavelet processing chain. Concerning the two spatial chains, the compiler is able to provide a high instruction-level parallelism (ILP), which increases performance: the ILP of the classification step is about 7.8, and about 2.8 for the RHE algorithm. Consequently, while RHE and classification occupy respectively 9% and 91% of the total execution time on the LEON3, these operations represent respectively 40% and 60% of the total execution time on the CortexA9. As the ILP of the 2D DWT operation is only about 1.6, the wavelet-based recognition chains do not take advantage of the speedup due to the instruction-level parallelism of the DSP-like processor. The vector size reduction has a non-negligible impact on the memory footprint of the database storage: for the optimal spatial case and the naïve approach the memory needed is respectively 600 KBytes and 2.5 MBytes, while it is only 40 KBytes in the wavelet case with 3 levels of decomposition.
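The reported footprints are consistent with the vector-length ratios, assuming the database size scales linearly with the stored vector length: a 3-level DWT keeps 1/2^(2·3) = 1/64 of the pixels, and 2.5 MBytes / 64 is exactly 40 KBytes (taking 1 MByte = 1024 KBytes), while a quarter of 2.5 MBytes is close to the 600 KBytes quoted for the ↓2 spatial case.

```python
# Consistency check of the reported database footprints.
naive_kbytes = 2.5 * 1024            # naive case: full images, 2.5 MBytes
wavelet_kbytes = naive_kbytes / 64   # 3-level DWT keeps 1/2**(2*3) of the pixels
spatial_kbytes = naive_kbytes / 4    # down-sampling by 2 keeps a quarter

print(wavelet_kbytes)  # 40.0  -> matches the reported 40 KBytes
print(spatial_kbytes)  # 640.0 -> close to the reported 600 KBytes
```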

5. Conclusion

Fig. 6 represents the quality versus complexity space of the first application. Rectangular areas define the minimum and maximum



Fig. 6. Quality versus computational complexity space of the first application. An ideal processing chain with high quality and low complexity would be placed at the top left of the figure. Spatial implementations are highly dependent on the target but offer good image quality when the noise level is high. Wavelet implementations are quite insensitive to the target.

Fig. 7. Quality versus computational complexity space of the second application. An ideal processing chain with high quality and low complexity would be placed at the top left of the figure. The wavelet-based processing chain has a high recognition rate and requires a low number of cycles, whatever the processor.

S. Courroux et al. / Journal of Systems Architecture 59 (2013) 826–832 831

values of each implementation. The width of each rectangle de-pends of the execution results on the four processors. The heightof each rectangle depends of the quality of each processing chain.In the first application, top edge of the rectangle represent imagequality in low noise conditions while bottom edge represent imagequality in high noise conditions. Concerning the second applica-tion, top edge of the rectangle represent recognition results in goodillumination conditions while bottom edge represent recognitionresults in bad illumination conditions. Ideal processing with higherobjective quality and lower computational complexity would beplaced at the top left of each figure.

The figure shows that wavelet-based processing chains offersthe best trade-off between image quality and computational com-plexity as they produce high quality images at a reasonable com-putational complexity cost, especially in low noise conditions.Wavelet-based pipelines is less dependent on the processor sincethey only use simple operations but are more sensitive to noise,especially for Wavelet1. Spatial-based algorithms require multiplemultiplications and divisions per pixel. When these operationsare well optimized (CortexA9), it is possible to process images fas-ter using the spatial domain. In the opposite case, they requiremuch more cycles to process.

Quality versus complexity space of the second application ispresented in Fig. 7. When illumination conditions are good, bothspatial and wavelet chains achieve the same quality but computa-tional complexity of spatial chain is highly dependant on the pro-cessor. When a DSP-like processor is used and VLIW capabilitiesare well exploited, optimal spatial case is a good solution. Other-wise, wavelet-based processing chains offer the best trade-off since

they have good recognition rate, especially when the illuminationis bad, with a low number of required cycles, whatever the tar-geted processor is.

Reconstruction and enhancement of a CFA image as well as therecognition of badly illuminated faces in the context of low re-sources cameras have been addressed in this document. Low andmid-quality processing chains have been designed and executedover different kind embedded processors such as control-oriented,general and DSP-like processors. Wavelet-based processing chainsoutperform regular spatial algorithms both in terms of objectivequality and computational complexity on low footprint generalembedded processors as well as on DSP-like processors. However,in some special cases that have been highlighted, it is possible to pro-cess regular spatial-based processing chains faster but at lower qual-ity than wavelet-based processing chains. Consequently, the use ofthe wavelet representation can help the designer to fit the require-ments of the embedded domain. Future work aims to decrease thecost of the wavelet transform, one of the most time-consumingoperation in the wavelet processing pipeline, to meet the real timerequirements. This can be done using a hardware accelerator dedi-cated to this operation. Power consumption is also an important as-pect of embedded systems. Wavelet transform does a high quantityof memory access, having a high impact on power consumption.Consequently, this topic will be addressed in future work.
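As noted above, the 2D DWT dominates the cost of the wavelet pipelines. A minimal one-level 2D analysis step, written here with the unnormalized Haar (average/difference) filter purely for illustration (the paper does not commit to this particular filter), shows the structure that drives both the cycle count and the memory traffic: a full pass over the rows followed by a full pass over the columns, producing the four subbands LL, LH, HL and HH. Iterating it on LL yields the deeper levels used for the reduced feature vectors.

```python
# One 2D Haar analysis level (unnormalized average/difference variant,
# for illustration only). Image dimensions are assumed even.

def haar_1d(signal):
    """One Haar analysis level: (approximation, detail) half-bands."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_2d_level(image):
    """One 2D level: rows first, then columns; returns LL, LH, HL, HH."""
    # Row pass: every row is read and rewritten once.
    lo_rows, hi_rows = [], []
    for row in image:
        a, d = haar_1d(row)
        lo_rows.append(a)
        hi_rows.append(d)

    # Column pass: every intermediate column is read and rewritten once,
    # which is where the strided (cache-unfriendly) accesses come from.
    def column_pass(rows):
        cols = list(zip(*rows))
        a_cols, d_cols = zip(*(haar_1d(list(c)) for c in cols))
        top = [list(r) for r in zip(*a_cols)]
        bot = [list(r) for r in zip(*d_cols)]
        return top, bot

    ll, lh = column_pass(lo_rows)
    hl, hh = column_pass(hi_rows)
    return ll, lh, hl, hh

# A constant 4x4 image has all its energy in LL; the details are zero.
img = [[8, 8, 8, 8]] * 4
ll, lh, hl, hh = haar_2d_level(img)
print(ll)  # [[8.0, 8.0], [8.0, 8.0]]
print(hh)  # [[0.0, 0.0], [0.0, 0.0]]
```

Each level touches every remaining coefficient twice (row pass, then column pass), which illustrates why the transform is both time-consuming and memory-access heavy, as discussed above.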

References

[1] D. Cok, Signal processing method and apparatus for producing interpolated chrominance values in a sampled color image signal, US Patent, 1987.


[2] J. Hamilton, J. Adams, Adaptive color plan interpolation in single sensor color electronic camera, US Patent 5629734 to Eastman Kodak Company, Patent and Trademark Office, 1997.

[3] K. Hirakawa, T.W. Parks, Adaptive homogeneity-directed demosaicing algorithm, IEEE Trans. Image Process. 14 (2005) 360–369.

[4] B.K. Gunturk, Y. Altunbasak, R.M. Mersereau, Color plane interpolation using alternating projections, IEEE Trans. Image Process. 11 (2002) 997–1013.

[5] S. Courroux, S. Guyetant, S. Chevobbe, M. Paindavoine, A wavelet-based demosaicking algorithm for embedded applications, in: Conference on Design and Architectures for Signal and Image Processing, 2010.

[6] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications, MIT Press, Cambridge, MA, 1949.

[7] A. Buades, B. Coll, J.-M. Morel, A non-local algorithm for image denoising, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.

[8] D.L. Donoho, I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (1994) 425–455.

[9] S. Chang, B. Yu, M. Vetterli, Adaptive wavelet thresholding for image denoising and compression, IEEE Trans. Image Process. 9 (2000) 1532–1546.

[10] J. Portilla, V. Strela, M. Wainwright, E. Simoncelli, Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE Trans. Image Process. 12 (2003) 1338–1351.

[11] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process. 16 (2007) 2080–2095.

[12] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989.

[13] S. Shan, W. Gao, B. Cao, D. Zhao, Illumination normalization for robust face recognition against varying lighting conditions, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG 2003, IEEE, 2003, pp. 157–164.

[14] X. Xie, W. Zheng, J. Lai, P. Yuen, Face illumination normalization on large and small scale features, Pattern Recognit. 1 (2008) 1–8.

[15] S. Du, R. Ward, Adaptive region-based image enhancement method for robust face recognition under variable illumination conditions, IEEE Trans. Circuits Syst. Video Technol. 20 (2010) 1165–1175.

[16] D. Li, K. Wong, Y. Hu, A. Sayeed, Detection, classification, and tracking of targets, IEEE Signal Process. Mag. 19 (2002) 17–29.

[17] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.

[18] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1991) 71–86.

[19] J. Garcia-Lamont, M. Aleman-Arce, J. Waissman-Vilanova, A digital real time image demosaicking implementation for high definition video cameras, in: Electronics, Robotics and Automotive Mechanics Conference, CERMA'08, IEEE, 2008, pp. 565–569.

[20] S. Hsia, M. Chen, P. Tsai, VLSI implementation of low-power high-quality color interpolation processor for CCD camera, IEEE Trans. Very Large Scale Integration (VLSI) Syst. 14 (2006) 361–369.

[21] M. Katona, A. Pizurica, N. Teslic, V. Kovacevic, W. Philips, A real-time wavelet-domain video denoising implementation in FPGA, EURASIP J. Embedded Syst. 2006 (2006) 6–6.

[22] J. Joshi, N. Nabar, R. Adyanthaya, P. Batra, An efficient pipelined architecture for multilevel wavelet based image denoising, in: IET International Conference on Visual Information Engineering, VIE 2006, 2006, pp. 351–355.

[23] A. Pavan Kumar, V. Kamakoti, S. Das, System-on-programmable-chip implementation for on-line face recognition, Pattern Recognit. Lett. 28 (2007) 342–349.

[24] R. Gottumukkal, H. Ngo, V. Asari, Multi-lane architecture for eigenface based real-time face recognition, Microprocessors Microsyst. 30 (2006) 216–224.

[25] Y. Yeh, H. Li, W. Hwang, C. Fang, FPGA implementation of k-NN classifier based on wavelet transform and partial distance search, Image Anal. (2007) 512–521.

[26] A. Jammoussi, S. Ghribi, D. Masmoudi, Implementation of face recognition system in Virtex II Pro platform, in: 3rd International Conference on Signals, Circuits and Systems (SCS), 2009, pp. 1–6.

[27] N. Shams, I. Hosseini, M. Sadri, E. Azarnasab, Low cost FPGA-based highly accurate face recognition system using combined wavelets with subspace methods, in: IEEE International Conference on Image Processing, 2006, pp. 2077–2080.

[28] C. Arm, S. Gyger, J. Masgonty, M. Morgan, J. Nagel, C. Piguet, F. Rampogna, P. Volet, Low-power 32-bit dual-MAC 120 µW/MHz 1.0 V icyflex1 DSP/MCU core, IEEE J. Solid-State Circuits 44 (2009) 2055–2064.

[29] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (2004) 600–612.

[30] A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 643–660.

[31] C. Bechara, A. Berhault, N. Ventroux, S. Chevobbe, Y. Lhuillier, R. David, D. Etiemble, A small footprint interleaved multithreaded processor for embedded systems, in: 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS), IEEE, 2011, pp. 685–690.

[32] Aeroflex Gaisler, LEON3 Processor User's Manual, 2004.

Sébastien Courroux is a PhD student at CEA LIST. His work focuses on embedded architectures for real-time vision applications. He has a Master's degree in electrical engineering from Polytech Nice-Sophia. During his PhD thesis, he studied the opportunity to use alternative representations such as the wavelet domain to handle image processing in embedded vision architectures.

Dr. Stéphane Chevobbe is an expert and research engineer at the CEA LIST institute in the domain of embedded computing architecture. He has a Master's degree in electrical engineering from INSA Lyon and a PhD in microelectronics and signal processing from the University of Rennes 1, for the design of a dynamically reconfigurable architecture in asynchronous technology. From 2006 to 2009, he participated in several national and European research projects that led to the realization of ASICs and reconfigurable architectures for embedded systems. Since 2009, he has participated in the design of computing architectures (reconfigurable, programmable, and dedicated) in the domain of embedded vision, for applications going from image reconstruction (image enhancement, denoising, demosaicing, filtering, ...) to image analysis (human detection, movement detection, key point extraction). His research interests include reconfigurable, programmable and dedicated embedded architectures and embedded architectures for image processing.

Mehdi Darouich is a research engineer at CEA LIST. His work focuses on embedded architectures for real-time vision applications. During his PhD thesis, he studied the opportunity to use embedded reconfigurable architectures for real-time stereovision in advanced driver assistance systems.

Michel Paindavoine received his PhD in electronics andsignal processing from Montpellier University, France,in 1982. He was with Fairchild CCD Company for twoyears as an engineer specializing in CCD sensors. Hejoined Burgundy University in 1985 as maitre de con-férence and is currently full professor at LE2I UMR-CNRS, Laboratory of Electronic, Computing and ImagingSciences, Burgundy University, France. His mainresearch topics are image acquisition and real-timeimage processing. He is also one of the main managersof ISIS (a research group in signal and image processingof the French National Scientific Research Committee).