
Optimization of Quadtree Segmentation and Hybrid 2D and 3D

Motion Estimation in a Rate-Distortion Framework *

Dimitrios Tzovaras, Stavros Vachtsevanos

and Michael G. Strintzis, Senior Member, IEEE

Electrical and Computer Engineering Department
Information Processing Laboratory
Aristotle University of Thessaloniki

Thessaloniki 54006, Greece
phone: (+30-31) 996-359, fax: (+30-31) 996-398

e-mail : [email protected]

Manuscript originally submitted on 31/8/96 and revised on 28/2/97

Abstract

A rate-distortion framework is used to define a very low bit rate coding scheme based on quadtree segmentation and optimized selection of motion estimators. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector field and segmentation information. First, a complete scheme is proposed for hybrid 2D and 3D motion estimation and compensation. The quadtree object segmentation is optimized for hybrid motion estimation in the rate-distortion sense. This scheme adapts the depth of the quadtree and the motion estimation technique used for each leaf of the tree. A more sophisticated technique, adapted to the requirements of a very low bit rate coder, is also proposed, which additionally considers the transmission of the prediction error corresponding to the particular choice of motion estimator. Based on these coding schemes, two versions of a very low bit rate image sequence coder are developed. Experimental results illustrating the performance of the proposed techniques in very low bit rate image sequence coding applications are presented and evaluated.

Keywords: Very low bit-rate coding; rate-distortion theory; quadtree segmentation; hybrid

2D and 3D motion estimation.

*This work was supported by the EU CEC Projects PANORAMA (Package for New Autostereoscopic Multiview Systems and Applications, ACTS project 092) and VIDAS (Video Assisted with Audio Coding and Representation, ACTS project 057). The assistance of COST 211ter is also gratefully acknowledged.


1 INTRODUCTION

The transmission of full motion video through limited capacity channels is critically dependent on

the ability of the compression schemes to achieve target bit rates while still maintaining acceptable

visual quality [1]. In order to achieve this, motion estimation and motion compensated prediction are

frequently used, so as to reduce temporal redundancies in image sequences [2].

Block-based motion estimation techniques have been extensively studied and applied to very low bitrate coding [3]. However, the performance of these techniques at such low bitrates is restricted by well-known limitations such as blocking and mosquito artifacts. Object-based techniques for image sequence coding [4, 5, 6, 7, 8] have been proposed to solve these problems. Affine 2D and 3D motion estimation models are used for motion compensation in object-based techniques.

While much attention has been devoted to the coding of the intraframe and prediction error images, the displacement vector fields or the parameters of the motion models are usually coded losslessly using DPCM/Huffman coding, resulting in limited compression. The reason for this is that digital video coding systems for many applications have at their disposal rates ranging from 1 Mbit/sec to 25 Mbits/sec. At such rates, only a minor part of the global rate is devoted to the transmission of the motion information, hence the bitrate overhead produced by lossless encoding of the vector fields or motion model parameters is negligible. In many emerging application areas, however, lossy compression of the vector fields is often highly desirable, and sometimes unavoidable. For example, mobile videophone or multimedia transmission channels are often limited to capacities of 4.8 - 64 kbps. In such cases, it is clearly desirable to reduce as much as possible the bitrate needed to transmit the motion vector fields, provided that this reduction does not produce intolerable distortion in the reconstructed image. It is also desirable to allocate the bitrate devoted to the coding of motion fields adaptively, depending on the complexity of the sequence and also on the overall bitrate availability when the latter varies with time.

Furthermore, it is desirable to select both the image segmentation into objects and the motion estimation method for each object adaptively, so as to best represent the motion of each part of the image. For example, in a typical videophone sequence, rough object subdivision combined with a block-based or affine 2D motion estimation model would suffice for the description of the motion of most parts of the foreground object, while much finer object subdivision, perhaps down to the size of single blocks, and more sophisticated 3D motion models would be best suited for the description of mouth and eye motion.


An elegant framework for the definition of such a strategy is provided by the classical rate-distortion constrained minimization procedure. This has been recently used in many coding applications including bit allocation for vector quantization [9], wavelet packet image coding [10], quadtree still image coding [11] and generic video compression [12]. In [13] the rate-distortion function was evaluated for image sequence coding under the assumption of a Gaussian intensity distribution. Recently, rate-distortion optimization was also used for the development of efficient motion and disparity estimation strategies [14, 15]. In these schemes a rate-distortion framework is used to define a displacement vector-field estimation technique for use in video coding.

The present paper investigates the use of this methodology for quadtree segmentation and hybrid motion field estimation under the constraint of a target bitrate for the coding of the vector information. Quadtree segmentation is performed using rate-distortion criteria and is fused with motion estimation by selecting, for each node of the quadtree, the optimum motion estimator from a predetermined set of candidate motion estimators. As an extension, the rate-distortion optimization scheme is also used to optimize the allocation of the prediction error corresponding to the motion estimation procedures in the transmitted information. Two possible codecs are also proposed and evaluated experimentally. In the first (rate-distortion optimized hybrid codec, RDHC), the image sequence is divided into groups of frames (GOF) of ten frames each and rate-distortion optimization is applied to each GOF separately. The first frame of each GOF is transmitted as a still image (intra-coded) and the succeeding frames are coded using motion compensation from the reconstructed version of the previous frame (see Figure 1). The prediction error is not transmitted. In the second (rate-distortion optimized hybrid codec with error transmission, RDHCE), the optimization algorithm is applied to a much longer sequence of up to 100 frames (see Figure 2). In this case the first frame is coded as a still image and all other frames are coded using rate-distortion optimization of the quadtree segmentation, the motion estimation and, additionally, the prediction error transmission.

The paper is organized as follows. The hybrid technique used for motion estimation is described in Section 2 and a brief review is given of each candidate technique. The rate-distortion optimization that identifies the optimal quadtree segmentation and the optimal motion estimator for each leaf of the quadtree is described in Section 3. In Section 4 the proposed technique is extended to include the transmission of the prediction error corresponding to the motion compensated estimates. Finally, experimental results demonstrating the performance of the proposed algorithm for the coding of typical videophone and videoconferencing sequences are given in Section 5 and conclusions are drawn in Section 6.


2 HYBRID MOTION ESTIMATION

Several schemes have been proposed in the literature for the coding of videophone or videoconference image sequences [1, 2, 3]. Motion estimation and compensation is the basic approach used in all these schemes. Modeling of the motion information by translation, zoom and pan, or a 3D rotation and translation, has been used in block-based, affine and 3D motion estimators. Experimental results have shown that affine 2D motion or 3D motion models may efficiently represent the displacement occurring in typical scenes; however, most parts of the image (e.g. the background) may be coded very satisfactorily using only translational motion. Moreover, the complexity of the affine and 3D motion estimation algorithms is higher than the complexity of the block-based scheme.

Based on the above observations we propose the use of all these models for the motion compensated

coding of the objects of a scene, within a rate-distortion framework optimizing both the segmentation

and the motion estimation (see Figure 3). The alternatives are :

1. The motion of the object is insignificant. No motion vector is transmitted and the previous estimate for this frame is considered sufficient.

2. Translational motion is used to compensate the motion of an object. A two-component motion vector is transmitted.

3. An affine 2D motion model is used to represent the motion of an object. The six model parameters are transmitted.

4. A 3D motion model best represents the motion of an object in the scene. The eight motion model parameters are transmitted.

5. The 3D motion corresponding to the same block in the temporally preceding frame is used.

In other words, the optimum image segmentation together with the optimum of the above motion estimator candidates are selected so as to minimize a distortion index subject to a ceiling on the available rate. Classical DFD (Displaced Frame Difference) minimization defines the block-based motion estimator. To define the remaining motion estimator candidates, a brief review is given below of the affine and the 3D motion estimation methods [16, 17].


2.1 Affine 2D Motion Estimation

The general representation of an affine transformation is

$$[\, x \;\; y \;\; 1 \,] = [\, u \;\; v \;\; 1 \,] \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix} , \qquad (1)$$

or, equivalently,

$$x = a_{11} u + a_{21} v + a_{31} , \qquad y = a_{12} u + a_{22} v + a_{32} .$$

If $w = (a_{11}, a_{21}, a_{31}, a_{12}, a_{22}, a_{32})^T$ is the vector of the motion parameters, the following system of equations must be solved for each object in the scene [16]:

$$A w = b ,$$

where

$$A = \begin{bmatrix} \Phi & 0 \\ 0 & \Phi \end{bmatrix} , \qquad \Phi = \begin{bmatrix} X_1 & Y_1 & 1 \\ \vdots & \vdots & \vdots \\ X_N & Y_N & 1 \end{bmatrix} ,$$

and

$$b = [\, b_x \;\; b_y \,]^T = [\, X_1 + dx_1, \ldots, X_N + dx_N, \; Y_1 + dy_1, \ldots, Y_N + dy_N \,]^T ,$$

where $N$ is the number of points with coordinates $(X_i, Y_i)$ in the working object and $(dx_i, dy_i)$ is the displacement at point $i$.

The solution to the above overdetermined set of equations may be obtained by use of a least-squares method, or alternatively by the robust least median of squares technique described in detail in [18].
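The least-squares route can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes plain least squares via the normal equations (the robust least median of squares variant of [18] is not shown), and the names `solve_linear` and `fit_affine` are hypothetical. Because $A$ is block diagonal, the $x$ and $y$ sub-problems share the same $3 \times 3$ normal matrix.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_affine(points, displacements):
    """Least-squares affine fit.

    points:        list of (X_i, Y_i)
    displacements: list of (dx_i, dy_i)
    Returns ((a11, a21, a31), (a12, a22, a32)), i.e. the coefficients of
    x = a11*X + a21*Y + a31 and y = a12*X + a22*Y + a32.
    """
    rows = [(X, Y, 1.0) for X, Y in points]
    # Normal matrix Phi^T Phi, shared by the x and y sub-problems.
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bx = [X + dx for (X, _), (dx, _) in zip(points, displacements)]
    by = [Y + dy for (_, Y), (_, dy) in zip(points, displacements)]
    ptbx = [sum(r[i] * b for r, b in zip(rows, bx)) for i in range(3)]
    ptby = [sum(r[i] * b for r, b in zip(rows, by)) for i in range(3)]
    return solve_linear(ptp, ptbx), solve_linear(ptp, ptby)
```

At least three non-collinear points are needed for the normal matrix to be invertible; the sketch does not check for this.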

2.2 3D Motion Estimation

In order to identify the objects in the scene, the original image is segmented into areas having uniform motion characteristics. The 3-D motion of each object in the scene is modeled using a six-parameter model. More specifically, we assume that if $(x(t), y(t), z(t))$ are the coordinates of a point at time instant $t$, its coordinates $(x(t-1), y(t-1), z(t-1))$ at time instant $t-1$ are given by

$$\begin{bmatrix} x(t-1) \\ y(t-1) \\ z(t-1) \end{bmatrix} = \begin{bmatrix} 1 & -w_z & w_y \\ w_z & 1 & -w_x \\ -w_y & w_x & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} , \qquad (2)$$

where three translational parameters $(t_x, t_y, t_z)$ and three rotational parameters $(w_x, w_y, w_z)$ are used to describe the motion of the underlying object. The goal of the 3-D motion estimation procedure is to compute the motion parameter vector $(w_x, w_y, w_z, t_x, t_y, t_z)$ for each object in the scene.

If $(X, Y)$ are the coordinates of the perspective projection of the 3-D point $(x(t), y(t), z(t))$ on the image plane at time $t$, then:

$$X = f \, \frac{x(t)}{z(t)} \quad \text{and} \quad Y = f \, \frac{y(t)}{z(t)} . \qquad (3)$$

From (2), (3) the 2-D motion vectors $v_m(X, Y)$ that correspond to the pixels $(X, Y)$ of each object are defined by projection of the 3-D motion on the 2-D image plane, as follows:

$$v_{mx}(X, Y) = X(t-1) - X(t) = f \, \frac{X - w_z Y + f w_y + f t_x / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - X ,$$

$$v_{my}(X, Y) = Y(t-1) - Y(t) = f \, \frac{w_z X + Y - f w_x + f t_y / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - Y . \qquad (4)$$

If $|t_z / z| \ll 1$, $f = 1$, and the field of view of the camera, i.e. the visual angle corresponding to the whole image, is not very large, the displacement vector may be approximated by

$$v_{mx}(X, Y) = -w_x X Y + w_y (1 + X^2) - w_z Y + \frac{t_x - t_z X}{z(t)} ,$$

$$v_{my}(X, Y) = -w_x (1 + Y^2) + w_y X Y + w_z X + \frac{t_y - t_z Y}{z(t)} . \qquad (5)$$

Furthermore, by making the assumption that the object surface is a plane, i.e.

$$k_x x(t) + k_y y(t) + k_z z(t) = 1 ,$$

or, equivalently,

$$\frac{1}{z(t)} = k_x X + k_y Y + k_z ,$$

and substituting in (5),

$$v_{mx}(X, Y) = a_1 + a_2 X + a_3 Y + a_7 X^2 + a_8 X Y ,$$

$$v_{my}(X, Y) = a_4 + a_5 X + a_6 Y + a_7 X Y + a_8 Y^2 , \qquad (6)$$

where

$$a_1 = w_y + k_z t_x , \quad a_2 = k_x t_x - k_z t_z , \quad a_3 = -w_z + k_y t_x ,$$

$$a_4 = -w_x + k_z t_y , \quad a_5 = w_z + k_x t_y , \quad a_6 = k_y t_y - k_z t_z ,$$

$$a_7 = w_y - k_x t_z , \quad a_8 = -w_x - k_y t_z . \qquad (7)$$
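The eight coefficients follow from substituting the planar-surface expression for $1/z(t)$ into the small-motion approximation (5) and collecting powers of $X$ and $Y$. The short sketch below checks that expansion numerically. The function names and the sample parameter values are illustrative assumptions, not part of the original method.

```python
def plane_coefficients(w, t, k):
    """Return (a1, ..., a8) for rotation w, translation t and plane k."""
    wx, wy, wz = w
    tx, ty, tz = t
    kx, ky, kz = k
    return (wy + kz * tx, kx * tx - kz * tz, -wz + ky * tx,
            -wx + kz * ty, wz + kx * ty, ky * ty - kz * tz,
            wy - kx * tz, -wx - ky * tz)


def flow_quadratic(a, X, Y):
    """Eight-parameter quadratic motion model evaluated at pixel (X, Y)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    vx = a1 + a2 * X + a3 * Y + a7 * X * X + a8 * X * Y
    vy = a4 + a5 * X + a6 * Y + a7 * X * Y + a8 * Y * Y
    return vx, vy


def flow_small_motion(w, t, k, X, Y):
    """Small-motion approximation with 1/z = kx*X + ky*Y + kz and f = 1."""
    wx, wy, wz = w
    tx, ty, tz = t
    kx, ky, kz = k
    inv_z = kx * X + ky * Y + kz
    vx = -wx * X * Y + wy * (1 + X * X) - wz * Y + (tx - tz * X) * inv_z
    vy = -wx * (1 + Y * Y) + wy * X * Y + wz * X + (ty - tz * Y) * inv_z
    return vx, vy
```

Evaluating both functions at a few pixel positions with the same parameters should give identical vectors up to rounding, which is a quick way to catch sign or index slips in the coefficient list.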


Equation (6) can be evaluated for $N$ of the initially estimated 2D vectors, forming a system of $2N$ equations in 8 unknowns. With $N \geq 4$ this system has at least as many equations as unknowns and can be solved using least-squares methods or, alternatively, by the robust least median of squares motion estimation algorithm described in detail in [18].

3 QUADTREE SEGMENTATION USING RATE DISTORTION OPTIMIZATION

Let $S$ be a segmentation of the image plane consisting of $N_s$ objects $s_i$: $S = \{s_i,\; i = 1, \ldots, N_s\}$. For each candidate motion estimator $M_j \in \mathcal{M}$, $\mathcal{M} = \{M_j,\; j = 1, \ldots, M\}$, let $V^{(j)} = \{v_i^{(j)},\; j = 1, \ldots, M;\; i = 1, \ldots, N_s\}$ be the corresponding set of object motion vectors. The general joint vector field estimation and quadtree segmentation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence, under a constraint $R_{budget}$ on the rate for the transmission of the vector field and the corresponding segmentation information. This corresponds to the following constrained optimization problem:

$$\min_{\{v_i^{(j)} \in V^{(j)},\; j = 1, \ldots, M;\; s_i \in S\}} \; \sum_{i=1}^{N_s} D(v_i, s_i) , \qquad (8)$$

subject to

$$\sum_{i=1}^{N_s} R(v_i, s_i) \leq R_{budget} ,$$

where $N_s$ is the total number of objects in the image, $D(v_i, s_i)$ is the contribution of the decision $(v_i, s_i)$ to the distortion function and $R(v_i, s_i)$ is the contribution of the same decision to the total rate, or cost, of the transmission of the motion vectors and the segmentation information.

The methodology in [19] permits the transformation of the above into an unconstrained optimization problem. In fact, as shown in [19] (the proof is also contained in [9]), the solution $\{v_i^\star(\lambda), s_i^\star(\lambda),\; i = 1, \ldots, N_s\}$ of the problem of unconstrained minimization of

$$J(\lambda) = \sum_{i=1}^{N_s} J(v_i(\lambda), s_i(\lambda)) = \sum_{i=1}^{N_s} D(v_i(\lambda), s_i(\lambda)) + \lambda \sum_{i=1}^{N_s} R(v_i(\lambda), s_i(\lambda)) , \qquad (9)$$

is also a solution of (8) if

$$R_{budget} = \sum_{i=1}^{N_s} R(v_i^\star(\lambda), s_i^\star(\lambda)) . \qquad (10)$$

The problem therefore reduces to ensuring that (10) has a solution for $\{(v_i(\lambda), s_i(\lambda)),\; i = 1, \ldots, N_s\}$ and determining this solution. This was investigated from a general viewpoint in [9], where it was shown that $R(v(\lambda), s(\lambda))$ and $D(v(\lambda), s(\lambda))$ are monotonic functions of the Lagrange multiplier $\lambda$, which may be interpreted as a quality index, with values ranging from $0$ (highest rate, lowest distortion) to $\infty$ (lowest rate, highest distortion). Further investigation in [10] proved that the solution of (10) may be obtained using any fast convex search algorithm such as the bisection algorithm [20]. One such algorithm, which gave very good results in both [10] and [11], is also adopted in the present paper (Section 3.3). The algorithm for the determination of the optimal segmentation and motion estimator consists of the following steps.

3.1 Initialization

In the present work the segmentation $S$ is assumed to be completely described by a quadtree $T = \{n_j^i,\; i = 0, \ldots, \log(D);\; j = 1, \ldots, 2^i\}$ (i.e. $n_j^i$ is the $j$th node at scale $i$), where $D \times D$ is the dimension of the image. Associated with each node $n_j^i$ of the quadtree $T$ is a data structure of the form $\{\tilde{R}_j^i, \tilde{D}_j^i, \tilde{J}_j^i, \mathrm{split}(n_j^i)\}$, where $\tilde{R}_j^i$ is the entropy associated with the current node, $\tilde{D}_j^i$ is the distortion corresponding to the reconstruction of the block associated with the current quadtree node, $\tilde{J}_j^i$ is the corresponding Lagrangian cost and $\mathrm{split}(n_j^i)$ is a bit indicating whether the current node is split or not. In the following, for the sake of simplicity, node $n_j^i$ will be denoted as node $n$.

The proposed joint motion estimation and quadtree segmentation algorithm begins by gathering the motion-estimator-dependent $(R_m(n), D_m(n))$ values for each node $n \in T$ and for each $m \in \mathcal{M}(n)$, to generate the $R$ versus $D$ values for each node. In the following, the two stages of the rate-distortion optimization algorithm are presented, i.e. (a) finding the optimal solution for a given operating slope $\lambda$ and (b) determining the optimal slope $\lambda$. Stage (a) of the algorithm is run for a given slope value $\lambda$ and could be considered a subroutine called by stage (b) of the algorithm.

3.2 Finding the optimal solution for a given operating slope $\lambda$

For the current $\lambda$, populate all the nodes $n$ of the tree with their minimum Lagrangian costs $J_n(\lambda)$, equivalently $J_j^i(\lambda)$ when referring to the $j$th node at quadtree scale $i$, i.e.

$$J_n(\lambda) = \min_{m \in \mathcal{M}} \left[ D_m(\lambda) + \lambda R_m(\lambda) \right] .$$

The cost is minimized with respect to the choice of motion estimator for node $n$.

• Step 1 Initialize $i \leftarrow 0$. Let $d = \log(N)$ be the maximum tree depth. For $n = n_j^0$, if $m_n^\star$ is the value of $m$ that minimizes $J_n(\lambda)$, initialize

$$\tilde{R}_j^i \leftarrow R_{m_n^\star} , \quad \tilde{D}_j^i \leftarrow D_{m_n^\star} , \quad \tilde{J}_j^i \leftarrow J_{m_n^\star} . \qquad (11)$$

• Step 2 Set $i \leftarrow i + 1$. If $i = d$ go to Step 5.

• Step 3 If

$$J_j^i(\lambda) > \tilde{J}_{2j}^{i+1}(\lambda) + \tilde{J}_{2j-1}^{i+1}(\lambda) + \tilde{J}_{2j-2}^{i+1}(\lambda) + \tilde{J}_{2j-3}^{i+1}(\lambda)$$

then set $\mathrm{split}(n_j^i) = 1$ and

$$\tilde{D}_j^i = D_{2j}^{i+1} + D_{2j-1}^{i+1} + D_{2j-2}^{i+1} + D_{2j-3}^{i+1} ,$$

$$\tilde{R}_j^i = R_{2j}^{i+1} + R_{2j-1}^{i+1} + R_{2j-2}^{i+1} + R_{2j-3}^{i+1} ,$$

$$\tilde{J}_j^i = J_{2j}^{i+1} + J_{2j-1}^{i+1} + J_{2j-2}^{i+1} + J_{2j-3}^{i+1} . \qquad (12)$$

• Step 4 Go to Step 2.

• Step 5 Starting from the root node $n^0$ and following, in linked-list fashion, the node data-structure element $\mathrm{split}(n_j^i)$ selected optimally for all the nodes of $T$, construct the optimal quadtree $S^\star(\lambda)$ and its associated optimal motion estimator set choice $\mathcal{M}^\star(\lambda)$.
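The split decision for a fixed slope can be illustrated with a small recursive sketch. This is a simplified reading of the steps above, not the authors' code: it assumes each node carries a precomputed list of (distortion, rate) pairs, one per candidate motion estimator, that each internal node has exactly four children, and that the cost of the split bit is already folded into the rates. The `Node` class and the `prune` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # (D_m, R_m) for every candidate motion estimator m of this block.
    options: List[Tuple[float, float]]
    # Either empty (smallest block size) or the four sub-blocks.
    children: List["Node"] = field(default_factory=list)
    split: bool = False   # set by prune()
    best: int = -1        # index of the chosen estimator, set by prune()


def prune(node: Node, lam: float) -> Tuple[float, float, float]:
    """Return (D, R, J) of the cheapest subtree rooted at node for slope lam."""
    costs = [d + lam * r for d, r in node.options]
    node.best = min(range(len(costs)), key=costs.__getitem__)
    d, r = node.options[node.best]
    j = costs[node.best]
    node.split = False
    if node.children:
        parts = [prune(child, lam) for child in node.children]
        d_c = sum(p[0] for p in parts)
        r_c = sum(p[1] for p in parts)
        j_c = sum(p[2] for p in parts)
        # Split only when the four children are strictly cheaper.
        if j > j_c:
            node.split = True
            d, r, j = d_c, r_c, j_c
    return d, r, j
```

After `prune(root, lam)` returns, walking the tree from the root and descending only through nodes with `split` set yields the segmentation and the estimator chosen for each leaf.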

3.3 Determining the optimal slope $\lambda$

First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found so that

$$\sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u)) \leq R_{budget} \leq \sum_{i=1}^{N_s} R(v_i(\lambda_l), s_i(\lambda_l)) .$$

Note that the initial segmentation $S(\lambda_u)$ of the first frame of the image sequence selects the whole image to be a single object, while for the subsequent frames $S(\lambda_u)$ is the optimal segmentation corresponding to the temporally preceding frame. Similarly, $S(\lambda_l)$ corresponds to the segmentation resulting from full splitting of the quadtree until the minimum allowed object (block) size is reached.

For the coding of a sequence of frames, the values of $\lambda_l$, $\lambda_u$ are chosen to be $\lambda_l = 0$, $\lambda_u = \infty$ for the initial frame and $\lambda_l = 0.8 \lambda_{prev}$, $\lambda_u = 1.2 \lambda_{prev}$ for subsequent frames, where $\lambda_{prev}$ is the solution of (10) for the previous frame. The bracketing interval is then successively decreased in size by the following procedure:

• Step 1 For each object $i$, $i = 1, \ldots, N_s$, compute $D(v_i(\lambda_l), s_i(\lambda_l))$ and $D(v_i(\lambda_u), s_i(\lambda_u))$ and the corresponding $R(v_i(\lambda_l), s_i(\lambda_l))$ and $R(v_i(\lambda_u), s_i(\lambda_u))$.

• Step 2 Set

$$\lambda_{new} = \left| \frac{\sum_{i=1}^{N_s} \left[ D(v_i(\lambda_l), s_i(\lambda_l)) - D(v_i(\lambda_u), s_i(\lambda_u)) \right]}{\sum_{i=1}^{N_s} \left[ R(v_i(\lambda_l), s_i(\lambda_l)) - R(v_i(\lambda_u), s_i(\lambda_u)) \right]} \right| + \epsilon ,$$

where $\epsilon$ is a vanishingly small positive number.

• Step 3 Compute the $\{R(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ and $\{D(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ minimizing $J(\lambda)$ for $\lambda = \lambda_{new}$.

• Step 4 If $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) = \sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u))$, then stop with $\lambda = \lambda_u$.
Else if $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) > R_{budget}$, set $\lambda_l \leftarrow \lambda_{new}$ and go to Step 2.
Else set $\lambda_u \leftarrow \lambda_{new}$ and go to Step 2.

Note that the distortion corresponding to each motion vector in a specific search area is computed only once, at the first iteration of the algorithm. Thus the computational load of the algorithm consists of updating the entropy of the vector field and finding the minimum $J(\lambda)$.
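The slope search can be sketched on a toy problem. In the sketch below the inner minimization is replaced by a deliberately simple stand-in, `solve`, which picks the cheapest (distortion, rate) pair independently for each block; in the actual coder this role is played by the quadtree optimization of Section 3.2. The function names, the finite upper slope standing in for an unbounded one, and the iteration cap are assumptions made for illustration.

```python
def solve(blocks, lam):
    """Stand-in inner optimizer: per block, pick the (D, R) pair minimizing D + lam * R."""
    d_tot = r_tot = 0.0
    for options in blocks:
        d, r = min(options, key=lambda o: o[0] + lam * o[1])
        d_tot += d
        r_tot += r
    return d_tot, r_tot


def find_slope(blocks, r_budget, lam_l=0.0, lam_u=1e6, eps=1e-9, max_iter=50):
    """Shrink the bracket [lam_l, lam_u] until the rate at lam_u stops changing."""
    d_l, r_l = solve(blocks, lam_l)
    d_u, r_u = solve(blocks, lam_u)
    if r_l <= r_budget:
        return lam_l                      # the budget is not binding
    for _ in range(max_iter):
        if r_l == r_u:
            break
        # Slope of the chord between the two bracketing operating points.
        lam_new = abs((d_l - d_u) / (r_l - r_u)) + eps
        d_n, r_n = solve(blocks, lam_new)
        if r_n == r_u:
            return lam_u                  # no operating point lies between them
        if r_n > r_budget:
            lam_l, d_l, r_l = lam_new, d_n, r_n
        else:
            lam_u, d_u, r_u = lam_new, d_n, r_n
    return lam_u
```

The returned slope corresponds to the operating point on the convex hull whose rate is closest to the budget from below, so the budget may not be met with equality when the set of operating points is discrete.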

3.4 Computation of the Entropy and Distortion Functions

The specific way the vector field affects the quality of the reconstructed image will determine the distortion index $D(v_i(\lambda), s_i(\lambda))$. A number of such distortion measures have been proposed in the literature. In the case of quadtree-based segmentation, the simplest and most commonly used is the temporally displaced frame difference

$$D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| \mathrm{im}_t(m + k, n + l) - \mathrm{im}_{t-1}(m + k + v_x^{(i)}, n + l + v_y^{(i)}) \right| ,$$

where $(m, n)$ are the coordinates of the upper left-hand corner of block $i$ corresponding to node $n$, $\mathrm{im}_t()$ and $\mathrm{im}_{t-1}()$ are the images at time instants $t$ and $t-1$, respectively, $(v_x^{(i)}, v_y^{(i)})$ is the projected 2D motion vector corresponding to the motion estimation method used and $b_x, b_y$ are the dimensions of the working object (block).
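For a purely translational vector, the displaced frame difference of one block reduces to a double loop. The sketch below is illustrative only and rests on several assumptions: images are nested lists indexed as `image[x][y]`, the vector has integer components, a block of dimensions $b_x \times b_y$ covers offsets $0, \ldots, b_x - 1$ and $0, \ldots, b_y - 1$, and the displaced block stays inside the previous frame. The function name `dfd` is hypothetical.

```python
def dfd(cur, prev, m, n, bx, by, vx, vy):
    """Sum of absolute differences between a block of cur and its displaced match in prev.

    (m, n)   upper left-hand corner of the block
    (bx, by) block dimensions
    (vx, vy) integer motion vector pointing into the previous frame
    """
    total = 0
    for k in range(bx):
        for l in range(by):
            total += abs(cur[m + k][n + l] - prev[m + k + vx][n + l + vy])
    return total
```

For the affine and 3D models the same sum applies, with `(vx, vy)` replaced by the per-pixel vector projected from the model parameters, which generally requires interpolation at non-integer positions.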

The transmission cost $R(v_i(\lambda), s_i(\lambda))$ will also depend on the specific method used for the coding of the vector fields. The motion parameter vectors corresponding to either the affine or the 3D motion estimation methods are first quantized uniformly and the corresponding entropy is computed with respect to the quantized parameters. The distortion is also computed based on the quantized motion parameter vectors. The entropy of the current node, $R(v_i(\lambda), s_i(\lambda))$, is computed by summing the entropy of the already coded motion or motion parameter vectors with the entropy of the $\mathrm{split}(n_j^i)$ bit and the entropy of the parameter indicating which motion estimator is chosen for the already coded quadtree objects.

In the present work, the use of entropy coding (e.g. Huffman or arithmetic coding) with an adaptive probability model is assumed for the computation of the entropy of each component of the motion or motion parameter vectors. Thus, the entropy $R_d(v_i(\lambda), s_i(\lambda))$ for the coding of the component $d$ of $v_i$ computed using a specific motion estimator (i.e. the x- or y-component of the 2D motion vector field in the case of block-matching motion estimation, or a parameter of the quantized motion vector field in the case of affine or 3D motion estimation) is computed as

$$R_d(v_i(\lambda), s_i(\lambda)) = - \sum_{x = -v_{d\min}}^{v_{d\max}} p_x(v_i(\lambda)) \log_2 \left( p_x(v_i(\lambda)) \right) ,$$

where $p_x(v_i(\lambda))$ is the probability that the vector field minimizing the index $J(v_i(\lambda), s_i(\lambda))$ satisfies $v_d^{(i)} = x$, and $v_{d\min}$, $v_{d\max}$ are the minimum and maximum allowed values for the specific component $d$ of the motion or motion parameter vector. The probability $p_x(v_i(\lambda))$ is computed for each operating point $(v_i(\lambda), s_i(\lambda))$ of the algorithm using the information of all previously encoded parameters corresponding to the specific motion estimator as follows:

$$p_x(v_i(\lambda)) = \frac{1}{i} \sum_{k=1}^{i} d_x(v_k(\lambda)) , \qquad (13)$$

where

$$d_x(v_k(\lambda)) = \begin{cases} 1 & \text{if } v_d^{(k)}(\lambda) = x , \\ 0 & \text{otherwise.} \end{cases}$$

Note that (13) is equivalent to the following efficient formula for the incremental computation of $p_x(v_i(\lambda))$:

$$p_x(v_{i+1}(\lambda)) = \frac{i}{i+1} \, p_x(v_i(\lambda)) + \frac{1}{i+1} \, d_x(v_{i+1}(\lambda)) . \qquad (14)$$
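The running-average definition (13) lends itself to a small adaptive model that is updated after each coded value. The sketch below is one possible arrangement, assuming symbols are hashable quantized values and that the entropy is taken over the symbols seen so far; the class name `AdaptiveModel` is hypothetical.

```python
import math


class AdaptiveModel:
    """Running estimate of symbol probabilities and their first-order entropy."""

    def __init__(self):
        self.count = 0     # number of symbols seen so far (i)
        self.prob = {}     # symbol -> current probability estimate

    def update(self, x):
        """Fold one more observed symbol into the running averages."""
        i = self.count
        for key in self.prob:
            self.prob[key] *= i / (i + 1)            # rescale old estimates
        self.prob[x] = self.prob.get(x, 0.0) + 1.0 / (i + 1)
        self.count = i + 1

    def entropy(self):
        """Entropy in bits per symbol under the current estimates."""
        return -sum(p * math.log2(p) for p in self.prob.values() if p > 0)
```

One such model would be kept per vector component and per motion estimator, so that each candidate's rate is measured against the statistics of the parameters already coded with that estimator.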

A more computationally efficient approach, which does not involve incremental computation of the probability density of the vector field or of the first-order vector field differences, is to assume a model for this probability density function. Specifically, a Gauss-Markov random field model for the motion vector differences [21, 22] could be used so as to accelerate the rate-distortion minimization procedure.

4 TRANSMISSION OF PREDICTION ERROR INFORMATION

In many applications the transmission of motion and segmentation information alone is insufficient for the reconstruction of an image sequence with acceptable quality. In such cases the coder must be allowed to transmit the prediction error corresponding to the motion estimator, especially for blocks containing artifacts in the reconstructed image. The optimization technique described in detail in the previous sections is easily extended so as to accommodate the choice of the transmission of prediction error. It will be assumed that the prediction error is coded using the DCT and Huffman entropy coding, as is the case in JPEG.


Let again $S$ be a segmentation of the image plane consisting of $N_s$ objects $s_i$: $S = \{s_i,\; i = 1, \ldots, N_s\}$. For each candidate motion estimator $M_j$, $j = 1, \ldots, M$, let $V^{(j)} = \{v_i^{(j)},\; j = 1, \ldots, M;\; i = 1, \ldots, N_s\}$ be the corresponding set of object motion vectors and $E^{(j)} = \{e_i^{(j)},\; j = 1, \ldots, M;\; i = 1, \ldots, N_s\}$ be the corresponding set of prediction errors. The general joint vector field estimation and quadtree segmentation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence, under a constraint $R_{budget}$ on the rate for the transmission of the vector field, the corresponding segmentation and the prediction error information. This corresponds to the following constrained optimization problem:

$$\min_{\{v_i^{(j)} \in V^{(j)},\; j = 1, \ldots, M;\; s_i \in S;\; e_i \in E\}} \; \sum_{i=1}^{N_s} D(v_i, s_i, e_i) , \qquad (15)$$

subject to

$$\sum_{i=1}^{N_s} R(v_i, s_i, e_i) \leq R_{budget} ,$$

where $N_s$ is the total number of objects in the image, $D(v_i, s_i, e_i)$ is the contribution of the decision $(v_i, s_i, e_i)$ to the distortion function and $R(v_i, s_i, e_i)$ is the contribution of the same decision to the total rate, or cost, of the transmission of the motion vectors, the segmentation map and the prediction error information. As discussed in the previous section, the solution $\{v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda),\; i = 1, \ldots, N_s\}$ of the problem of unconstrained minimization of

$$J(\lambda) = \sum_{i=1}^{N_s} J(v_i(\lambda), s_i(\lambda), e_i(\lambda)) = \sum_{i=1}^{N_s} D(v_i(\lambda), s_i(\lambda), e_i(\lambda)) + \lambda \sum_{i=1}^{N_s} R(v_i(\lambda), s_i(\lambda), e_i(\lambda)) , \qquad (16)$$

is also a solution of (15) if

$$R_{budget} = \sum_{i=1}^{N_s} R(v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda)) . \qquad (17)$$

The problem therefore reduces to ensuring that (17) has a solution for $\{(v_i(\lambda), s_i(\lambda), e_i(\lambda)),\; i = 1, \ldots, N_s\}$ and determining this solution.

The rate-distortion optimization algorithm presented in the previous section is used again for the

computation of the optimal segmentation and the corresponding motion estimator for each object.

In the case of error transmission the distortion function used is

$$D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| \mathrm{im}_t(m + k, n + l) - \widehat{\mathrm{im}}_t(m + k + v_x^{(i)}, n + l + v_y^{(i)}) \right| ,$$

where

$$\widehat{\mathrm{im}}_t(m + k, n + l) = \mathrm{im}_{t-1}(m + k + v_x^{(i)}, n + l + v_y^{(i)}) + e(m + k + v_x^{(i)}, n + l + v_y^{(i)})$$

and $e(m, n)$ is the decoded prediction error corresponding to pixel $(m, n)$.


5 APPLICATION TO VERY LOW BIT RATE IMAGE SEQUENCE CODING

5.1 Computational Complexity of the Proposed Approach

The proposed algorithm consists of the initialization and the optimization stages. During the computationally involved initialization stage, all candidate algorithms for motion estimation are tested and their performance is stored in memory. Note that the distortion function is computed only in the first iteration of the algorithm and thus the computational load of the remainder of the algorithm reduces to updating the entropy of the vector field and finding the minimum $J(\lambda)$. Note also that following the first frame, the search for $\lambda$ is confined to narrower intervals and hence fewer iterations are needed for the completion of the optimization stage. Also, the choice of the segmentation map corresponding to the previous frame as an initial segmentation for the current frame further reduces the computational load of the proposed algorithm.

The execution time of the encoding phase of the algorithm on an R4400 INDIGO II Silicon Graphics workstation is approximately 1 minute per frame. Most of this time (about 60%) is devoted to the initialization stage where the distortion functions are computed. The remaining 40% is used to complete the optimization procedure. As elaborated in the sequel, the optimization algorithm was run for many values of $R_{budget}$ and target bitrates and was seen to converge very rapidly, never requiring more than fifteen iterations in any of our experiments. This convergence to the desired bitrate for a videoconference scene ("Claire") and a non-videoconference scene ("Tunnel") is depicted in Figures 4 and 5, respectively.

The computational complexity of the decoding phase of the proposed approach is very low and even a software decoder may be implemented in real time. This makes the proposed scheme an attractive candidate for use in asymmetrical coding applications such as multimedia communication, teleshopping and fixed-location-to-mobile or broadcast video communication.

5.2 Experimental Results

In order to evaluate the performance of the proposed approach for very low bitrate coding, the algorithm was applied to the typical QCIF sequences "Claire", "Miss America", "Salesman", "Foreman" and a QCIF version of the MPEG-4 test sequence "Tunnel". The frame rate of the sequences was 10 frames/sec.

Objects were defined using the segmentation procedure described in Section 3. The construction of the quadtree representation for the first frame of the video sequence may start with the hypothesis that the whole image may be represented by only one node (root) and proceed with tests deciding if further splitting is necessary, as described in Section 3. However, it was found experimentally that in practice it is preferable to start by testing smaller blocks (typically $32 \times 32$ pixels each) instead of the entire image, so as to expedite the identification of the optimal segmentation. Similar constraints were imposed on the size of the smallest blocks in order to maintain the segmentation overhead information within acceptable limits. The size of the smallest block in our experiments was chosen to be $x_{min} \times y_{min} = 4 \times 4$. As noted in Section 3, after the first frame a good choice for the initial segmentation mask is the segmentation mask corresponding to the immediately preceding frame.

As a first experiment, the optimized motion estimation and quadtree segmentation algorithm was applied to the coding of specific frames only of the "Miss America" and "Claire" sequences. More specifically, the algorithm was applied between the zeroth and fifth frames of "Miss America" and the zeroth and second frames of "Claire". The original zeroth and fifth frames of "Miss America" and zeroth and second frames of "Claire" are shown in Figures 6a and 6b and 7a and 7b, respectively, while the reconstructed ones and the corresponding prediction errors are illustrated in Figures 6c, 6d and 7c, 7d, respectively. The resulting quadtree segmentation is also shown in Figure 8. Figure 9 shows the MSE (mean-square error) versus bitrate for the coding of the fifth frame of "Miss America". This curve was obtained by running the proposed algorithm for various values of $R_{budget}$ and computing the MSE after convergence.

For very low bitrate video coding applications the proposed algorithm may also be applied to groups of frames (GOF), with the bit allocation assigned adaptively to each frame of the sequence in order to optimize the transmission bitstream. More specifically, the total bitrate is given for the coding of the whole GOF and the rate is allocated to each frame according to the frame difference between the current frame and the temporally preceding frame. Thus, if $R_{target}$ is the target bitrate for the coding of the whole GOF and

$$FD_i = \sum_{k_1}^{N_1} \sum_{k_2}^{N_2} \left| \mathrm{im}_i(k_1, k_2) - \mathrm{im}_{i-1}(k_1, k_2) \right|$$

is the frame difference between frames $i$ and $i-1$, and $N_1, N_2$ are the image dimensions, the bitrate is allocated as follows:

$$R_{budget}^{(i)} = R_{target} \, \frac{FD_i}{\sum_{i=1}^{10} FD_i} .$$

In this way, the coding of the motion and segmentation information is optimized for the whole GOF.
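The proportional allocation rule is easy to state in code. The following sketch assumes frames are equally sized nested lists of pixel intensities and that the first frame of the list is the intra-coded reference, which receives no share of the motion budget. The function names and the even split used when all frame differences are zero are illustrative choices, not taken from the original.

```python
def frame_difference(a, b):
    """Sum of absolute pixel differences between two equally sized frames."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def allocate_budget(frames, r_target):
    """Split r_target over frames 1..len(frames)-1 in proportion to their frame difference."""
    fd = [frame_difference(frames[i], frames[i - 1]) for i in range(1, len(frames))]
    total = sum(fd)
    if total == 0:
        # Static sequence: fall back to an even split (an assumption).
        return [r_target / len(fd)] * len(fd)
    return [r_target * d / total for d in fd]
```

Frames that differ more from their predecessor receive a proportionally larger share, and the shares always sum to `r_target`.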


For the coding of a group of frames, the first version of the optimized motion estimation algorithm
(RDHC) was applied to the first ten frames of "Miss America" and "Claire" using
motion compensation. Figures 10 and 11 compare the proposed algorithm with
the simple block matching algorithm used in the existing standards (MPEG, H.261), with block sizes
of 8 × 8 and 16 × 16, in terms of PSNR versus frame number, for the coding of the first ten
frames of the two sequences, respectively. The simple block matching approach consists of
absolute displaced frame difference minimization, by searching exhaustively within a search area of
−15, ..., 15 half-pixels in the temporally preceding frame, centered at the position of the examined block.
In both coders, it was assumed that each frame was predicted using the reconstructed previous frame,
and that the prediction error was not transmitted. The $R_{budget}$ was selected to be 24 Kbits and 12
Kbits for "Miss America" and 24 Kbits and 10 Kbits for "Claire", respectively, for the coding of
the nine frames following the initial frame. As seen, the performance of the proposed algorithm is
markedly higher than that of the standard block matching technique, since both segmentation and
motion estimation are optimized for a specific bitrate.
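The reference block matching method can be illustrated with the sketch below. It is a simplification made here for brevity: the search runs over integer-pixel displacements only (the text specifies a half-pixel grid), candidates falling outside the reference frame are skipped, and the names `sad` and `block_match` are hypothetical.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
    return total

def block_match(cur, ref, top, left, size=8, search=7):
    """Exhaustive search for the displacement (dy, dx) minimizing the SAD.

    Assumption: integer-pixel search; the block at zero displacement must lie
    inside ref. Returns ((dy, dx), cost).
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, top, left, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate block leaves the reference frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost

# Toy 16 x 16 reference frame and a current frame shifted by (1, 2) pixels.
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
result = block_match(cur, ref, 4, 4, size=8, search=3)
print(result)  # ((1, 2), 0)
```

Every block costs the same number of bits for its vector in this scheme, which is why a fixed block size cannot trade segmentation detail against rate the way the quadtree does.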

The second version of the proposed coder (RDHCE), which includes error transmission, was also
used for the coding of the above QCIF sequences. The technique described earlier, of adapting the bit
allocation to frame differences, is also used in this coder version; however, the transmission of the
error allows efficient communication of much longer groups of frames. Three target bit rates were
tested: 14.4 Kbits/sec, 28.8 Kbits/sec and 64 Kbits/sec. Figures 12, 13, 14, 15 and 16 illustrate
the performance of the proposed scheme in terms of PSNR versus bitrate for the coding of the first
fifty frames of "Miss America", "Claire", "Salesman" and "Foreman" and the first twenty frames of
"Tunnel". Note that the coder performance remains stable, providing a high quality image without
the intervention of a new intra-coded image before the end of the whole sequence. Figures 17, 18
and 19, 20, respectively, show original and reconstructed frames of the sequences "Salesman" and
"Claire" coded at 64 Kbits/sec.

Regarding the performance of the optimization, Figures 21a and 21c show the resulting quadtrees
corresponding to the tenth frame of "Salesman" and the eighth frame of "Foreman". The complexity of
the quadtree for the representation of both sequences is relatively high, since "Salesman" is a sequence
with large rigid and flexible motion, while in the "Foreman" sequence both object motion and camera
motion exist. Figures 21b and 21d present the motion estimator index maps corresponding to
the segmentation maps of Figures 21a and 21c. In these figures each object is colored according to
the motion estimator chosen: white corresponds to no motion and predicted motion, dark gray
to affine 2D motion, light gray to block-matching and black to 3D motion. Table
1 shows the average percentage of the motion estimator choices for the coding of all the QCIF image
sequences.
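A common way to make such a per-object choice under a rate budget is Lagrangian selection, where each leaf picks the candidate minimizing $D + \lambda R$ and $\lambda$ is adjusted until the total rate fits the budget. The sketch below is a generic illustration under that assumption and is not the authors' implementation: the `(name, rate, distortion)` triples, the bisection bounds and the toy numbers are all hypothetical.

```python
def pick(candidates, lam):
    """Return the (name, rate, distortion) candidate minimizing D + lam * R."""
    return min(candidates, key=lambda c: c[2] + lam * c[1])

def select_estimators(leaves, r_budget, iters=50, lam_max=1e6):
    """Bisect on lam so that the summed rate of the per-leaf choices fits r_budget.

    Assumption: lam_max is large enough to force the cheapest candidates; if even
    those exceed r_budget, the returned choice is simply the cheapest one found.
    """
    lo, hi = 0.0, lam_max
    choice = [pick(c, hi) for c in leaves]
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        trial = [pick(c, lam) for c in leaves]
        if sum(c[1] for c in trial) > r_budget:
            lo = lam  # over budget: penalize rate more strongly
        else:
            hi, choice = lam, trial  # feasible: try a smaller penalty
    return choice

# Two hypothetical leaves with candidate estimators (name, rate in bits, distortion).
leaves = [
    [("no_motion", 1, 100.0), ("block_matching", 10, 20.0), ("affine_2d", 30, 5.0)],
    [("no_motion", 1, 8.0), ("block_matching", 10, 6.0)],
]
chosen = select_estimators(leaves, r_budget=15)
print([c[0] for c in chosen])  # ['block_matching', 'no_motion']
```

In this toy case the leaf with large potential gain receives the motion vector, while the nearly static leaf falls back to the cheapest estimator, which mirrors the mix of choices visible in an index map.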

In addition to videoconferencing sequences, the algorithm was also tested on the more complicated
MPEG-4 test sequence "Tunnel". The results are comparable to those obtained using
videophone-related sequences. Figure 22 shows original and reconstructed frames of the sequence
"Tunnel" coded at 64 Kbits/sec. The segmentation map and the motion estimator index map
corresponding to this sequence are shown in Figures 23a and 23b. The performance in terms of PSNR
versus bitrate for the coding of the first twenty frames of "Tunnel" is illustrated in Figure 16.

6 CONCLUSIONS

A rate-distortion framework was used to define a very low bit rate coding scheme based on quadtree
segmentation and optimized selection of motion estimators. This technique achieved maximum
reconstructed image quality under the constraint of a target bitrate for the coding of the vector field and
segmentation information. Joint optimization of the quadtree object segmentation and the motion
estimation method for each leaf of the tree, i.e. each object, was achieved subject to this target bitrate
restriction. For experimental evaluation, the proposed algorithm was combined with an appropriate
rate control strategy to optimize the coding of the motion vectors corresponding to all the frames of
a group of frames of an image sequence. Experimental results on the coding of typical
videophone sequences have demonstrated the performance of the proposed very low bit-rate video
coding scheme.


List of Figures

1 The first version (RDHC) of the proposed coding scheme (no option of error transmission).
2 The second version (RDHCE) of the proposed coding scheme (error transmission is an option).
3 Rate distortion framework for the selection of the optimal motion estimator.
4 Convergence of the algorithm for the coding of frame 20 of "Claire" at $R^{(20)}_{budget}$ = 0.064 bits/pixel.
5 Convergence of the algorithm for the coding of frame 14 of "Tunnel" at $R^{(14)}_{budget}$ = 0.24 bits/pixel.
6 (a) Original frame 0 of "Miss America". (b) Original frame 5 of "Miss America". (c) Reconstructed frame 5 of "Miss America" from frame 0 using the proposed algorithm at 41.31 dB PSNR. (d) The corresponding prediction error image.
7 (a) Original frame 0 of "Claire". (b) Original frame 2 of "Claire". (c) Reconstructed frame 2 of "Claire" from frame 0 using the proposed algorithm at 40.2 dB PSNR. (d) The corresponding prediction error image.
8 Quadtree segmentation corresponding to frame 5 of "Miss America" when coded with frame 0 as reference using the proposed algorithm.
9 MSE versus bitrate (in bits/pixel) for the coding of the fifth frame of "Miss America".
10 Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Miss America" at 12 Kbits/sec and 24 Kbits/sec.
11 Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Claire" at 10 Kbits/sec and 24 Kbits/sec.
12 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Miss America" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
13 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Claire" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
14 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Salesman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
15 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Foreman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
16 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 20 frames of the image sequence "Tunnel" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
17 Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Salesman".
18 Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Salesman" coded at 64 Kbits/sec.
19 Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Claire".
20 Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Claire" coded at 64 Kbits/sec.
21 Segmentation and motion estimator index maps of (a), (b) the tenth frame of "Salesman" and (c), (d) the eighth frame of "Foreman". White color corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block-matching and black corresponds to 3D motion.
22 Original frames 5 (a), 10 (b) and 15 (c) of "Tunnel". Reconstructed frames 5 (d), 10 (e) and 15 (f) of "Tunnel" coded at 64 Kbits/sec.
23 (a) Segmentation map of the fifth frame of "Tunnel" interleaved with the image. (b) The corresponding motion estimator index map. White color corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block-matching and black corresponds to 3D motion.

List of Tables

1 Average percentage of selection of the candidate motion estimation methods used for the coding of "Miss America", "Claire", "Salesman", "Foreman" and "Tunnel".

[Diagram; labels: GOF 1, GOF 2, ..., GOF N.]

Figure 1: The first version (RDHC) of the proposed coding scheme (no option of error transmission).

[Diagram; labels: I-frame, Predicted frames.]

Figure 2: The second version (RDHCE) of the proposed coding scheme (error transmission is an option).

[Block diagram; labels: Block Matching Motion Estimation, Affine 2D Motion Estimation, No Motion, Previous Motion, 3D Motion Estimation, Rate Distortion Optimization Framework, Decision.]

Figure 3: Rate distortion framework for the selection of the optimal motion estimator.

[Plot: bitrate (bpp) versus iteration (1 to 12); curve label: Claire, 28.8 Kbits/sec.]

Figure 4: Convergence of the algorithm for the coding of frame 20 of "Claire" at $R^{(20)}_{budget}$ = 0.064 bits/pixel.

[Plot: bitrate (bpp) versus iteration (1 to 15); curve label: Tunnel, 64 Kbits/sec.]

Figure 5: Convergence of the algorithm for the coding of frame 14 of "Tunnel" at $R^{(14)}_{budget}$ = 0.24 bits/pixel.

[Four image panels.]

Figure 6: (a) Original frame 0 of "Miss America". (b) Original frame 5 of "Miss America". (c) Reconstructed frame 5 of "Miss America" from frame 0 using the proposed algorithm at 41.31 dB PSNR. (d) The corresponding prediction error image.

[Four image panels.]

Figure 7: (a) Original frame 0 of "Claire". (b) Original frame 2 of "Claire". (c) Reconstructed frame 2 of "Claire" from frame 0 using the proposed algorithm at 40.2 dB PSNR. (d) The corresponding prediction error image.

Figure 8: Quadtree segmentation corresponding to frame 5 of "Miss America" when coded with frame 0 as reference using the proposed algorithm.

[Plot: MSE versus bitrate (bits/pixel); curve label: "Miss America".]

Figure 9: MSE versus bitrate (in bits/pixel) for the coding of the fifth frame of "Miss America".

[Plot: PSNR (dB) versus frame number (1 to 9); curves: RDHC-24, RDHC-12, BM-8x8, BM-16x16.]

Figure 10: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Miss America" at 12 Kbits/sec and 24 Kbits/sec.

[Plot: PSNR (dB) versus frame number (1 to 9); curves: RDHC-24, RDHC-10, BM-8x8, BM-16x16.]

Figure 11: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Claire" at 10 Kbits/sec and 24 Kbits/sec.

[Plot: PSNR (dB) versus frame number (1 to 51); curves: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 12: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Miss America" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR (dB) versus frame number.]

Figure 13: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Claire" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR (dB) versus frame number (1 to 51); curves: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 14: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Salesman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR (dB) versus frame number (1 to 51); curves: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 15: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Foreman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR (dB) versus frame number (1 to 21); curves: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 16: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 20 frames of the image sequence "Tunnel" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

Method              Percentage of Selection (%)
No motion           37.11
Predicted motion    12.86
Block-Matching      30.38
Affine 2D           13.1
3D motion           6.55

Table 1: Average percentage of selection of the candidate motion estimation methods used for the coding of "Miss America", "Claire", "Salesman", "Foreman" and "Tunnel".

[Six image panels (a) to (f).]

Figure 17: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Salesman".

[Six image panels (a) to (f).]

Figure 18: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Salesman" coded at 64 Kbits/sec.

[Six image panels (a) to (f).]

Figure 19: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Claire".

[Six image panels (a) to (f).]

Figure 20: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e), 30 (f) of "Claire" coded at 64 Kbits/sec.

[Four image panels (a) to (d).]

Figure 21: Segmentation and motion estimator index maps of (a), (b) the tenth frame of "Salesman" and (c), (d) the eighth frame of "Foreman". White color corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block-matching and black corresponds to 3D motion.

[Six image panels (a) to (f).]

Figure 22: Original frames 5 (a), 10 (b) and 15 (c) of "Tunnel". Reconstructed frames 5 (d), 10 (e) and 15 (f) of "Tunnel" coded at 64 Kbits/sec.

[Two image panels (a) and (b).]

Figure 23: (a) Segmentation map of the fifth frame of "Tunnel" interleaved with the image. (b) The corresponding motion estimator index map. White color corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block-matching and black corresponds to 3D motion.