[IEEE 2006 IEEE International Symposium on Signal Processing and Information Technology - Vancouver, BC, Canada (2006.08.27-2006.08.30)]

Adaptive Video Motion Estimation Algorithm via Estimation of Motion Length Distribution and Bayesian Classification

Mahdi Asefi and Mohamed-Yahia Dabbagh
Department of Electrical and Computer Engineering

University of Waterloo
Waterloo, Ontario, Canada

Email: masefi@uwaterloo.ca, [email protected]

Abstract: Real videos contain a mixture of slow and fast motion. No fixed fast block matching algorithm can efficiently remove the temporal redundancy of video sequences with a wide range of motion content. In this paper, an adaptive fast block matching algorithm, called classification based adaptive search (CBAS), is proposed. A Bayes classifier is applied to classify motions into slow and fast categories, and an appropriate search strategy is then applied for each class. The algorithm thus switches between different search patterns according to the motion content within video frames. Experimental results show that the proposed technique outperforms conventional standalone fast block matching methods in terms of both peak signal-to-noise ratio (PSNR) and computational complexity.

I. INTRODUCTION

Block matching is widely used for stereo vision, visual tracking, and video compression. Video coding standards such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263, and H.264 use block-based motion estimation because of its effectiveness and its simplicity for hardware implementation. The main idea behind block matching estimation is to partition the target (predicted) frame into square blocks of pixels and to find the best match for each block in a reference (anchor) frame. To find the best match, a search is performed inside a previously coded frame, and the matching criterion is evaluated on the candidate matching blocks. The displacement between a block in the target frame and its best match in the reference frame defines a motion vector. The encoder then only needs to send the motion vector and a residual block, defined as the difference between the current block and its predictor.

The matching criterion is typically the mean absolute difference (MAD) or the mean square difference (MSD), given respectively by

$$MAD = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \qquad (1)$$

$$MSD = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( C_{ij} - R_{ij} \right)^2 \qquad (2)$$

where N × N is the size of the block, and C_ij and R_ij are the pixel values in the current and reference blocks, respectively. The peak signal-to-noise ratio (PSNR) characterizes the performance of the motion estimation algorithm:

$$PSNR = 10 \log_{10} \left[ \frac{(\text{peak-to-peak value of the original signal})^2}{MSD} \right] = 10 \log_{10} \left[ \frac{255 \times 255}{MSD} \right] \qquad (3)$$
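As a minimal sketch of Eqs. (1)-(3), assuming 8-bit luminance samples and blocks stored as equal-sized lists of rows, the three measures can be computed as follows. The function names are illustrative and not taken from the paper; `psnr` simply takes whatever MSD value it is given (block or whole frame).

```python
import math

def mad(cur, ref):
    """Mean absolute difference, Eq. (1), between two N x N blocks."""
    n = len(cur)
    total = sum(abs(c - r) for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r))
    return total / (n * n)

def msd(cur, ref):
    """Mean square difference, Eq. (2), between two N x N blocks."""
    n = len(cur)
    total = sum((c - r) ** 2 for row_c, row_r in zip(cur, ref) for c, r in zip(row_c, row_r))
    return total / (n * n)

def psnr(msd_value, peak=255):
    """PSNR in dB, Eq. (3); peak = 255 assumes 8-bit samples."""
    if msd_value == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(peak * peak / msd_value)

# Illustrative 2 x 2 blocks (hypothetical values).
cur = [[10, 20], [30, 40]]
ref = [[12, 18], [30, 44]]
print(mad(cur, ref), msd(cur, ref))  # 2.0 6.0
print(round(psnr(msd(cur, ref)), 2))
```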

The computational complexity of the exhaustive search algorithm is prohibitive [1]. To reduce the computational requirements, several fast block matching algorithms (FBMAs) have been developed in past years, including the three-step search (TSS) [1], new three-step search (NTSS) [2], four-step search (FSS) [3], and diamond search (DS) [4]. Real video sequences combine various motions with a variety of content, and the performance of FBMAs depends strongly on the motion content. For example, TSS is suitable for blocks with rapid motion, while FSS and DS perform better for moderate and slow (quasi-stationary and stationary) block motions, and NTSS can handle moderate motions [2]. Hence, rather than relying on a fixed FBMA, an intelligent video encoder should be able to switch between search patterns according to the motion content. An example of this approach is the adaptive fast block matching search proposed in [5], which develops an adaptive scheme called A-TDB according to the characteristics of the predicted profit list. A-TDB dynamically selects among the TSS, DS, and block-based gradient descent search (BBGDS) patterns to remove the temporal redundancy of sequences with slow, moderate, and fast motion.
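To make the cost of exhaustive search concrete, the sketch below performs a full search with MAD over a square window of ±p pixels and counts the checking points; for an interior block this is (2p + 1)^2 points, e.g. 225 for p = 7. The search range, the synthetic frames, and the helper names are assumptions for illustration; the paper does not state its search range here.

```python
def get_block(frame, x, y, n):
    """Return the n x n block whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]

def mad(a, b):
    """Mean absolute difference between two equal-sized square blocks."""
    n = len(a)
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / (n * n)

def full_search(cur, ref, bx, by, n, p):
    """Exhaustive search over every displacement (dx, dy) with |dx|, |dy| <= p.

    Returns (best motion vector, its MAD, number of checking points)."""
    height, width = len(ref), len(ref[0])
    target = get_block(cur, bx, by, n)
    best_mv, best_cost, points = (0, 0), float("inf"), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if not (0 <= x <= width - n and 0 <= y <= height - n):
                continue  # candidate block would fall outside the reference frame
            points += 1
            cost = mad(target, get_block(ref, x, y, n))
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, points

# Hypothetical synthetic frames: cur is ref displaced so the true motion vector is (2, -1).
def pattern(x, y):
    return (7 * x + 13 * y) % 256

ref = [[pattern(x, y) for x in range(16)] for y in range(16)]
cur = [[pattern(x + 2, y - 1) for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, n=4, p=3))  # ((2, -1), 0.0, 49)
```

Blocks near the frame border have fewer valid candidates, so the average number of checking points per motion vector for a full search is somewhat below (2p + 1)^2.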

In this paper, we propose a new adaptive method based on a Bayesian classification technique, which classifies the predicted motion of each block within an image into either the slow or the fast category. The slow category includes motions of both small and moderate length, since the same search method is applied to predict small and moderate motions. To apply the Bayesian classifier, the class-conditional probability density functions (PDFs) P(x|C_slow) and P(x|C_fast) must be estimated, where x is the length of the block's motion vector. We use the Parzen window method with a Gaussian kernel to estimate the required PDFs. After the motion class is determined, the appropriate search pattern is employed to find the best matching block within the frame. Experimental results on video sequences containing a variety of motion content are provided to demonstrate the performance of the proposed approach in comparison with standard fast block matching algorithms.

2006 IEEE International Symposium on Signal Processing and Information Technology

0-7803-9754-1/06/$20.00©2006 IEEE 807


The remainder of this paper is organized as follows. Section II describes the proposed adaptive method. Section III demonstrates its performance with experimental results. Finally, concluding remarks are drawn in Section IV.

II. THE PROPOSED ADAPTIVE ALGORITHM

A. Overview

We formulate the design of the adaptive scheme as a two-category classification problem. The motion length of each macroblock is predicted from neighboring blocks, a Bayes classifier labels the motion as slow or fast, and the search pattern associated with that label is then applied to find the best matching block within the image frame. The adaptive rood pattern proposed in [6] is selected for large motions, and the diamond pattern is selected for slow motions, for the reasons given in Subsection II-C.

B. Estimation and Learning

In a classification problem, the Bayes classifier achieves the minimum probability of error [7]. It is therefore a suitable classifier when the class-conditional probability densities P(x|C_i) are known. If the densities are not known a priori, an approximation can still be estimated from labeled sample data. Since the functional form of the class probability density functions is not known here, non-parametric estimation is applied. Several non-parametric approaches are available: histogram estimation, k-nearest-neighbor (KNN) estimation, and Parzen windowing, which employs kernel smoothing functions to estimate the PDF. The Parzen windowing method can be summarized as follows: given a sample X_1, ..., X_n drawn from a continuous, univariate density f, the Parzen density estimate is

$$\hat{f}(x, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right) \qquad (4)$$

where K is the kernel and h is the bandwidth. Under mild conditions (h → 0 while nh → ∞ as n increases), the estimate converges in probability to the true density. The histogram method involves a tradeoff: good resolution along x requires small bins, but small bins contain few samples, which makes the estimate noisy. In KNN methods, the resolution along the x-axis is data dependent, since each region grows until it contains k samples, whereas the resolution along the PDF axis is explicitly controlled by k. The principal virtue of the KNN scheme is that it avoids setting p(x) identically to zero in regions that happen to contain no samples; instead, it yields a more realistic non-zero probability. Its principal drawback is that the estimated PDF is highly peaked and not normalized. In addition, the KNN method is usually time consuming and complex, which may be undesirable in practical implementations with online access and limited storage space. We have applied Parzen windowing with a Gaussian smoothing function to estimate the conditional PDFs from labeled sample data. The main advantages of the Parzen window method are its simplicity and fast implementation.
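A minimal sketch of the Gaussian-kernel Parzen estimate of Eq. (4) follows. The paper does not say how h is chosen, so the normal-reference rule of thumb h = 1.06·σ·n^(-1/5) used here, with a small floor so that identical samples do not give h = 0, is an assumption, as are the sample motion lengths.

```python
import math
import statistics

def gaussian_kernel(u):
    """Standard normal density, used as the Parzen kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def parzen_pdf(x, samples, h):
    """Eq. (4): f_hat(x, h) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    n = len(samples)
    if n == 0:
        return 0.0  # no labelled samples, so no density mass
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

def rule_of_thumb_bandwidth(samples, min_h=0.25):
    """Assumed bandwidth rule h = 1.06 * sigma * n^(-1/5), floored at min_h."""
    n = len(samples)
    if n < 2:
        return min_h
    sigma = statistics.pstdev(samples)
    return max(1.06 * sigma * n ** -0.2, min_h)

# Hypothetical motion lengths (pixels) labelled "slow" in a previous frame.
slow_lengths = [0.0, 1.0, 1.0, 1.4, 2.2, 3.0]
h = rule_of_thumb_bandwidth(slow_lengths)
print(f"h = {h:.3f}, f(1.0) = {parzen_pdf(1.0, slow_lengths, h):.3f}")
```

Because the estimate is an average of scaled Gaussians, it integrates to one for any h > 0, unlike the KNN estimate discussed above.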

The feature selected for classification is the length of the motion vectors (MVs). For the first image frame, since no previous data are available for PDF estimation, the algorithm follows a fixed thresholding approach: the motion lengths are compared with a predefined threshold, and the vectors are classified into two groups, slow and fast. After the motion of a macroblock has been classified, the corresponding search scheme is employed for that macroblock. Once all motion vectors and their class labels have been computed, they are used by the Parzen window method to estimate the class-conditional PDFs. Starting from the second frame, the motion vector of each macroblock within the image is predicted from the motion vector of the immediately left macroblock. Then, using the class-conditional PDFs estimated from the previous frame, the Bayesian classifier classifies the predicted motion length x as either slow or fast, and the relevant search scheme is applied.
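The per-frame procedure just described can be sketched as below, with the classifier and the two search routines passed in as callables so that any decision rule and search patterns can be plugged in. Several details are assumptions rather than statements from the paper: the motion length is the Euclidean norm, first-column blocks use a predicted MV of (0, 0), the threshold value is hypothetical, the fixed threshold is also used as a fallback whenever a class has no samples from the previous frame, and each block's final motion length is stored under the label that selected its search.

```python
import math

def mv_length(mv):
    """Classification feature: Euclidean length of a motion vector (assumed norm)."""
    return math.hypot(mv[0], mv[1])

def estimate_frame(rows, cols, search, classify, prev_samples, threshold=4.0):
    """Estimate one frame's motion field, macroblock by macroblock.

    search(label, row, col, predicted_mv) -> mv      (hypothetical callable)
    classify(x, slow_lengths, fast_lengths) -> label (hypothetical callable)
    prev_samples: {"slow": [...], "fast": [...]} from the previous frame, or None
    threshold: hypothetical fixed threshold (pixels) used when PDFs are unavailable
    Returns the motion field and the labelled lengths for the next frame's PDFs.
    """
    field = [[(0, 0)] * cols for _ in range(rows)]
    samples = {"slow": [], "fast": []}
    have_pdfs = bool(prev_samples and prev_samples["slow"] and prev_samples["fast"])
    for r in range(rows):
        for c in range(cols):
            # Region of support: the immediate left macroblock; (0, 0) in the first column.
            predicted = field[r][c - 1] if c > 0 else (0, 0)
            x = mv_length(predicted)
            if have_pdfs:
                label = classify(x, prev_samples["slow"], prev_samples["fast"])
            else:
                label = "fast" if x > threshold else "slow"  # fixed-threshold fallback
            mv = search(label, r, c, predicted)
            field[r][c] = mv
            samples[label].append(mv_length(mv))
    return field, samples

# Stand-in callables, for illustration only.
def toy_search(label, r, c, predicted):
    return (1, 0) if label == "slow" else (6, 0)

def toy_classify(x, slow_lengths, fast_lengths):
    return "fast" if x > 2 else "slow"

field, samples = estimate_frame(1, 3, toy_search, toy_classify, None, threshold=0.5)
print(field, samples)
```

Feeding `samples` back in as `prev_samples` for the next frame switches the decision from the threshold to `classify`, mirroring the frame-to-frame PDF update described in this section.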

Given the class-conditional probability density functions P(x|C_Fast) and P(x|C_Slow), the Bayes classifier can be expressed as follows.

The motion is classified as fast motion if

$$P(C_{Fast} \mid x) > P(C_{Slow} \mid x) \qquad (5)$$

From Bayes' formula we have

$$P(C_i \mid x) = \frac{P(x \mid C_i)\, P(C_i)}{P(x)} \qquad (6)$$

Substituting (6) into (5), we obtain

$$\frac{P(x \mid C_{Fast})\, P(C_{Fast})}{P(x)} > \frac{P(x \mid C_{Slow})\, P(C_{Slow})}{P(x)} \qquad (7)$$

or, since P(x) > 0,

$$P(x \mid C_{Fast})\, P(C_{Fast}) > P(x \mid C_{Slow})\, P(C_{Slow}) \qquad (8)$$

Moreover, if the prior probabilities of fast and slow motion, P(C_Fast) and P(C_Slow), are assumed to be equal, the classification criterion simplifies to

$$P(x \mid C_{Fast}) > P(x \mid C_{Slow}) \qquad (9)$$
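The decision rule of Eqs. (8) and (9) can be sketched with the Gaussian Parzen estimate of Eq. (4). The common bandwidth h = 1.0, the optional `prior_fast` parameter, and the sample motion lengths are illustrative assumptions, not values from the paper.

```python
import math

def parzen_pdf(x, samples, h):
    """Gaussian-kernel Parzen estimate, Eq. (4)."""
    if not samples:
        return 0.0
    norm = len(samples) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def bayes_label(x, slow_samples, fast_samples, h=1.0, prior_fast=None):
    """Label a predicted motion length x as "fast" or "slow" using Eq. (8).

    prior_fast=None means equal priors, which reduces Eq. (8) to Eq. (9);
    otherwise P(C_Fast) = prior_fast and P(C_Slow) = 1 - prior_fast.
    h = 1.0 is an assumed bandwidth shared by both classes."""
    p_fast = 0.5 if prior_fast is None else prior_fast
    score_fast = parzen_pdf(x, fast_samples, h) * p_fast
    score_slow = parzen_pdf(x, slow_samples, h) * (1.0 - p_fast)
    return "fast" if score_fast > score_slow else "slow"

# Hypothetical labelled motion lengths (pixels) from a previous frame.
slow = [0.0, 1.0, 1.0, 1.4, 2.0, 3.0]
fast = [6.0, 7.1, 8.0, 9.5]
print(bayes_label(1.5, slow, fast), bayes_label(8.0, slow, fast))  # slow fast
```

With equal priors, a predicted length of 4.5 falls on the fast side for these samples; setting `prior_fast=0.3` moves it to the slow side, which is the effect of keeping the priors in Eq. (8).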

An important design choice is the region of support (ROS), defined as the set of neighboring blocks whose MVs are used to predict the motion vector of the current block, together with the rule used to compute the predicted motion vector. Extensive experiments in [6] considered different combinations of the blocks immediately to the left, above-left, above, and above-right of the current block, and two prediction rules: the mean and the median of the motion lengths in the ROS. These experiments show fairly similar PSNR performance for all choices. Hence, we apply the least complex choice, i.e., the immediate left block is used to predict the motion vector of the current block. After all the motion vectors of the current frame have been computed, the PDFs are updated for motion classification in the next frame, and the procedure is repeated for subsequent frames. In this way, the technique adapts itself to the motion content and achieves higher performance than standalone fast block matching algorithms. The simulation results in Section III support the proposed algorithm.

C. Selection of Search Patterns

A series of experiments with standard block matching techniques was conducted on selected video sequences containing a variety of motion content. The performance of each algorithm on each sequence was recorded and compared, using PSNR and computational complexity as the evaluation criteria. The observations show that for small motions (less than 3 pixels), algorithms with compactly spaced search points give more accurate motion vector estimates. Among the tested search patterns, DS shows superior results for sequences with small motions. Its ability to avoid being trapped in local minima, together with its accuracy, led us to select DS as the block matching search when the motion is classified as slow. DS is also successful in predicting motions of moderate length (3 to 4 pixels). As mentioned before, both small and moderate motion vectors are classified into the Slow category, and the same search algorithm, DS, is used for all motions in this class.
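The diamond search selected for the slow class can be sketched as follows, using the large and small diamond patterns of [4]. The `cost` callable stands in for the block matching error (for example, the MAD of the displaced block), the window p = 7 is an assumed search range, and a quadratic toy cost replaces real frames.

```python
# Large diamond (LDSP) and small diamond (SDSP) offsets, centre listed first.
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def diamond_search(cost, p=7):
    """Diamond search starting at (0, 0); p = 7 is an assumed search range.

    cost(dx, dy) returns the matching error of displacement (dx, dy).
    Returns (motion vector, number of distinct checking points)."""
    seen = {}

    def evaluate(point):
        if point not in seen:
            seen[point] = cost(*point)  # each point is evaluated only once
        return seen[point]

    def best_around(center, offsets):
        candidates = [(center[0] + dx, center[1] + dy) for dx, dy in offsets]
        candidates = [q for q in candidates if abs(q[0]) <= p and abs(q[1]) <= p]
        return min(candidates, key=evaluate)  # ties keep the earlier point (centre first)

    center = (0, 0)
    while True:  # large diamond steps until the minimum stays at the centre
        best = best_around(center, LDSP)
        if best == center:
            break
        center = best
    mv = best_around(center, SDSP)  # one final small diamond refinement
    return mv, len(seen)

# Toy cost with its minimum at (3, -2) (hypothetical, replaces a real MAD surface).
print(diamond_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 2) ** 2))  # ((3, -2), 22)
```

Repeated points are cached, so the returned count is the number of distinct checking points, the complexity measure used in Section III.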

In [6], a rood pattern with one search point at the center and four search points located at its four vertices was proposed. The pattern has a symmetrical rood shape, and its size is the distance between the vertices and the center point. The choice of the rood shape is based on observations of real-world video sequences: motion vectors occur more often in the horizontal and vertical directions than in other directions [6], since most camera movements occur in these directions. The size of the rood adapts to the predicted length of the current block's motion vector, where the target MV is predicted from the MVs of neighboring blocks. This flexible size helps prevent the search from being trapped in local minima, which is important when searching for blocks with fast motion. The rood search is followed by small diamond search pattern (SDSP) steps until the best match occurs at the center of the pattern.
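A comparable sketch of the adaptive rood pattern search used for the fast class is given below. Following our reading of [6], the arm length is max(|px|, |py|) for a predicted MV (px, py), and the predicted MV position is checked in addition to the center and the four vertices; the search window p = 7, the `cost` callable, and the toy cost are assumptions, and this is not the reference implementation of [6].

```python
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def rood_search(cost, predicted, p=7):
    """Adaptive rood pattern search sketch; p = 7 is an assumed search range.

    cost(dx, dy): matching error of displacement (dx, dy), e.g. MAD.
    predicted: MV predicted from the left neighbour; it sets the rood arm length.
    Returns (motion vector, number of distinct checking points)."""
    seen = {}

    def evaluate(point):
        if point not in seen:
            seen[point] = cost(*point)
        return seen[point]

    def best_of(points):
        inside = [q for q in points if abs(q[0]) <= p and abs(q[1]) <= p]
        return min(inside, key=evaluate)  # ties keep the earliest point

    arm = max(abs(predicted[0]), abs(predicted[1]))
    first_step = [(0, 0)]
    if arm > 0:  # rood vertices plus the predicted MV itself
        first_step += [(0, -arm), (0, arm), (-arm, 0), (arm, 0), tuple(predicted)]
    center = best_of(first_step)
    while True:  # small diamond steps until the best point is the centre
        best = best_of([(center[0] + dx, center[1] + dy) for dx, dy in SDSP])
        if best == center:
            return center, len(seen)
        center = best

# Toy cost with its minimum at (3, -2); predicted MV (2, -1) is hypothetical.
print(rood_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 2) ** 2, predicted=(2, -1)))  # ((3, -2), 14)
```

Because the first step jumps directly to the neighborhood of the predicted motion, a long motion can be reached with few checking points, which is why the rood pattern suits the fast class.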

III. EXPERIMENTAL RESULTS

The simulations are based on an encoding platform under MPEG-4 test conditions, where each sequence contains 100 frames in QCIF or CIF format. For comparison, the average peak signal-to-noise ratio of our classification based adaptive search (CBAS) has been computed for various video sequences and compared with other standard block matching motion estimation methods, including exhaustive search (ES), DS, TSS, NTSS, and FSS. Computational complexity is measured by the average number of checking points per motion vector, which also determines the speed of match finding. The computational gain is defined as the ratio of the average number of checking points of ES to that of our algorithm.

All comparison results for the Diskus and flower garden sequences in terms of PSNR performance and computational complexity are provided in Table I. The Diskus sequence, shown in Fig. 1(a), contains a wide range of motion content, from slow to fast movement. The CBAS algorithm was also tested on sequences with less variation in motion content, such as the flower garden sequence, and compared with standard algorithms. The PSNR and computational results shown in Figs. 1 and 2 and Table I illustrate that CBAS achieves better PSNR performance with less computation than the other fast algorithms, including the state-of-the-art DS algorithm.

In comparison with ES, our algorithm greatly improves the search speed. Based on the average numbers of search points in Table I, CBAS is about 12.75 times faster than ES for Diskus (and about 13.67 times for flower garden), while its PSNR closely follows that of ES, with an average degradation of 0.58 dB for Diskus and 0.27 dB for flower garden. The algorithm is able to maintain a fairly constant PSNR performance. The efficiency of our algorithm depends largely on the precision of the estimated probability functions and on the selection of a suitable search scheme for each class. The computational gain could be improved further by applying computation reduction techniques such as zero motion prejudgment [8].
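As a worked check of the comparison with ES, the computational gain defined above and the PSNR loss relative to ES can be computed directly from the Table I averages:

```python
# Average PSNR (dB) and search points per motion vector, copied from Table I.
TABLE_I = {
    "Diskus": {"ES": (29.1827, 204.2828), "CBAS": (28.6038, 16.0245)},
    "Flower Garden": {"ES": (20.1102, 202.048), "CBAS": (19.8445, 14.7756)},
}

def computational_gain(es_points, alg_points):
    """Average checking points of ES divided by those of the algorithm."""
    return es_points / alg_points

for sequence, rows in TABLE_I.items():
    es_psnr, es_points = rows["ES"]
    cbas_psnr, cbas_points = rows["CBAS"]
    gain = computational_gain(es_points, cbas_points)
    print(f"{sequence}: gain {gain:.2f}, PSNR loss {es_psnr - cbas_psnr:.2f} dB")
```

Running it gives a gain of about 12.75 with a PSNR loss of about 0.58 dB for Diskus, and a gain of about 13.67 with a loss of about 0.27 dB for flower garden.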

IV. CONCLUSIONS

In this paper, a statistical pattern classification scheme for the design of an adaptive motion estimation algorithm was proposed. The simulation results demonstrate that the proposed technique outperforms conventional fast block matching methods, achieving higher PSNR with lower computational complexity. In summary, an intelligent encoder should apply adaptive motion estimation techniques instead of relying on fixed patterns, and ideas from machine learning and pattern recognition can be applied to the design of such adaptive motion estimation techniques.

REFERENCES

[1] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. Nat. Telecommun. Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1-G5.3.5.

[2] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 8, pp. 438-442, Aug. 1994.

[3] L. M. Po and W. C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 6, pp. 313-317, Jun. 1996.

[4] S. Zhu and K.-K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Process., vol. 9, no. 2, pp. 287-290, Feb. 2000.

[5] S.-Y. Huang, C.-Y. Cho, and J.-S. Wang, "Adaptive fast block-matching algorithm by switching search patterns for sequences with wide-range motion content," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1373-1384, Nov. 2005.

[6] Y. Nie and K.-K. Ma, "Adaptive rood pattern search for fast block-matching motion estimation," IEEE Trans. Image Process., vol. 11, no. 12, pp. 1442-1449, Dec. 2002.

[7] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley-Interscience, 2001.

[8] J.-F. Yang, S.-C. Chang, and C.-Y. Chen, "Computation reduction for motion search in low rate video coders," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 10, pp. 948-951, Oct. 2002.


[Figure 1 panels: (a) Diskus frame sequence (predicted frame); (b) computations per frame vs. frame number (0-100); (c), (d) PSNR (dB) vs. frame number (0-100)]

Fig. 1. Frame-based PSNR performance and computational complexity of ES, DS, NTSS, 4SS, and the adaptive algorithm. (a) Diskus frame sequence. (b) Comparison of computational complexity per frame for ES, DS, and CBAS. (c) Comparison of frame-based PSNR for ES, DS, and CBAS. (d) Comparison of frame-based PSNR for NTSS, 4SS, and CBAS.

TABLE I
AVERAGE PSNR AND NUMBER OF SEARCH POINTS PER MOTION VECTOR FOR STANDARD MOTION ESTIMATION ALGORITHMS AND CBAS

| Algorithm | Diskus PSNR (dB) | Diskus search points per MV | Flower Garden PSNR (dB) | Flower Garden search points per MV |
|---|---|---|---|---|
| ES | 29.1827 | 204.2828 | 20.1102 | 202.048 |
| CBAS | 28.6038 | 16.0245 | 19.8445 | 14.7756 |
| DS | 28.077 | 20.4175 | 18.9605 | 19.6835 |
| NTSS | 28.1470 | 24.4261 | 19.6148 | 25.7251 |
| FSS | 28.024 | 20.8600 | 19.6148 | 25.7251 |
| TSS | 28.1785 | 23.3588 | 19.3537 | 23.3791 |

[Figure 2: PSNR (dB) vs. frame number (0-100) for CBAS (Adaptive), ES, and DS]

Fig. 2. PSNR performance for ES, DS, and classification based adaptive search (CBAS) for the flower garden sequence.
