Optimization of Quadtree Segmentation and Hybrid 2D and 3D
Motion Estimation in a Rate-Distortion Framework*
Dimitrios Tzovaras, Stavros Vachtsevanos
and Michael G. Strintzis, Senior Member, IEEE
Electrical and Computer Engineering Department
Information Processing Laboratory
Aristotle University of Thessaloniki
Thessaloniki 54006, Greece
phone: (+30-31) 996-359, fax: (+30-31) 996-398
e-mail : [email protected]
Manuscript originally submitted on 31/8/96 and revised on 28/2/97
Abstract
A rate-distortion framework is used to define a very low bit rate coding scheme based on quadtree segmentation and optimized selection of motion estimators. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector field and segmentation information. First, a complete scheme is proposed for hybrid 2D and 3D motion estimation and compensation. The quadtree object segmentation is optimized for hybrid motion estimation in the rate-distortion sense. This scheme adapts to the depth of the quadtree and the technique used for motion estimation for each leaf of the tree. A more sophisticated technique, adapted to the requirements of a very low bit rate coder, is also proposed, which additionally considers the transmission of the prediction error corresponding to the particular choice of motion estimator. Based on these coding schemes, two versions of a very low bit rate image sequence coder are developed. Experimental results illustrating the performance of the proposed techniques in very low bit rate image sequence coding application areas are presented and evaluated.
Keywords: Very low bit-rate coding; rate-distortion theory; quadtree segmentation; hybrid
2D and 3D motion estimation.
*This work was supported by the EU CEC Projects PANORAMA (Package for New Autostereoscopic Multiview Systems and Applications, ACTS project 092) and VIDAS (Video Assisted with Audio Coding and Representation, ACTS project 057). The assistance of COST 211ter is also gratefully acknowledged.
1 INTRODUCTION
The transmission of full motion video through limited capacity channels is critically dependent on
the ability of the compression schemes to achieve target bit rates while still maintaining acceptable
visual quality [1]. In order to achieve this, motion estimation and motion compensated prediction are
frequently used, so as to reduce temporal redundancies in image sequences [2].
Block-based motion estimation techniques have been extensively studied and applied for very low
bitrate coding [3]. However, the performance of these techniques at such low bitrates is restricted by
well-known limitations such as blocking and mosquito artifacts. Object-based techniques for image
sequence coding [4, 5, 6, 7, 8] have been proposed to solve these problems. Affine 2D and 3D motion
estimation models are used for motion compensation in object-based techniques.
While much attention has been devoted to the coding of the intraframe and prediction error images,
the displacement vector fields or the parameters of the motion models are usually coded losslessly
using DPCM/Huffman coding, resulting in limited compression. The reason for this is that digital
video coding systems for many applications have at their disposal rates ranging from 1 Mbit/sec to
25 Mbits/sec. At such rates, only a minor part of the global rate is devoted to the transmission
of the motion information, hence the bitrate overhead produced by lossless encoding of the vector
fields or motion model parameters is negligible. In many emerging application areas, however, lossy
compression of the vector fields is often highly desirable, and sometimes unavoidable. For example,
mobile videophone or multimedia transmission channels are often limited to capacities of 4.8 to 64
kbps. In such cases, it is clearly desirable to reduce as much as possible the bitrate needed to transmit
the motion vector fields, provided that this reduction does not produce intolerable distortion in the
reconstructed image. It is also desirable to allocate the bitrate devoted to the coding of motion fields
adaptively, depending on the complexity of the sequence and also on the overall bitrate availability
when the latter varies with time.
Furthermore, it is desirable to select both the image segmentation into objects and the motion
estimation method for each object adaptively, so as to best represent the motion of each part of the
image. For example, in a typical videophone sequence, rough object subdivision combined with a
block-based or affine 2D motion estimation model would suffice for the description of the motion of
most parts of the foreground object, while much finer object subdivision, perhaps down to the size of
single blocks, and more sophisticated 3D motion models would be best suited for the description of
mouth and eye motion.
An elegant framework for the definition of such a strategy is provided by the classical
rate-distortion constrained minimization procedure. This has recently been used in many coding
applications, including bit allocation for vector quantization [9], wavelet packet image coding [10],
quadtree still image coding [11] and generic video compression [12]. In [13] the rate-distortion
function was evaluated for image sequence coding under the assumption of a Gaussian intensity
distribution. Recently, rate-distortion optimization was also used for the development of efficient
motion and disparity estimation strategies [14, 15]. In these schemes a rate-distortion framework is
used to define a displacement vector-field estimation technique for use in video coding.
The present paper investigates the use of this methodology for quadtree segmentation and hybrid
motion field estimation under the constraint of a target bitrate for the coding of the vector information.
Quadtree segmentation is performed using rate-distortion criteria and is fused with motion estimation
by selecting, for each node of the quadtree, the optimum motion estimator from a predetermined set of
candidate motion estimators. As an extension, the rate-distortion optimization scheme is also used
to optimize the allocation of the prediction error corresponding to the motion estimation procedures
in the transmitted information. Two possible codecs are proposed and evaluated experimentally.
In the first (rate-distortion optimized hybrid codec, RDHC), the image sequence is divided into
groups of frames (GOF) of ten frames each and rate-distortion optimization is applied to each GOF
separately. The first frame of each GOF is transmitted as a still image (intra-coded) and the succeeding
frames are coded using motion compensation from the reconstructed version of the previous frame
(see Figure 1). The prediction error is not transmitted. In the second (rate-distortion optimized hybrid
codec with error transmission, RDHCE), the optimization algorithm is applied to a much longer
sequence of up to 100 frames (see Figure 2). In this case the first frame is coded as a still image and all
other frames are coded using rate-distortion optimization of the quadtree segmentation, the motion
estimation and, additionally, the prediction error transmission.
The paper is organized as follows. The hybrid technique used for motion estimation is described in
Section 2, where a brief review of each candidate technique is also given. The rate-distortion
optimization that identifies the optimal quadtree segmentation and the optimal motion estimator for
each leaf of the quadtree is described in Section 3. In Section 4 the proposed technique is extended to
include the transmission of the prediction error corresponding to the motion compensated estimates.
Finally, experimental results demonstrating the performance of the proposed algorithm for the coding
of typical videophone and videoconferencing sequences are given in Section 5 and conclusions are
drawn in Section 6.
2 HYBRID MOTION ESTIMATION
Several schemes have been proposed in the literature for the coding of videophone or videoconference
image sequences [1, 2, 3]. Motion estimation and compensation is the basic approach used in all these
schemes. Modeling of the motion information by translation, zoom and pan, or a 3D rotation and
translation, has been used in block-based, affine and 3D motion estimators. Experimental results
have shown that affine 2D motion or 3D motion models may efficiently represent the displacement
occurring in typical scenes; however, most parts of the image (e.g. the background) may be coded very
satisfactorily using only translational motion. Moreover, the complexity of the affine and 3D motion
estimation algorithms is higher than the complexity of the block-based scheme.
Based on the above observations we propose the use of all these models for the motion compensated
coding of the objects of a scene, within a rate-distortion framework optimizing both the segmentation
and the motion estimation (see Figure 3). The alternatives are :
1. The motion of the object is insignificant. No motion vector is transmitted and the previous
estimate for this frame is considered sufficient.
2. Translational motion is used to compensate the motion of an object. A two-component motion
vector is transmitted.
3. An affine 2D motion model is used to represent the motion of an object. The six model parameters
are transmitted.
4. A 3D motion model represents best the motion of an object in the scene. The eight motion
model parameters are transmitted.
5. The 3D motion corresponding to the same block in the temporally preceding frame is used.
In other words, the optimum image segmentation together with the optimum of the above motion
estimator candidates are selected so as to minimize a distortion index subject to a ceiling on the
available rate. Classical DFD (Displaced Frame Difference) minimization defines the block-based
motion estimator. To define the remaining motion estimator candidates, a brief review is given below
of the affine and the 3D motion estimation methods [16, 17].
2.1 Affine 2D Motion Estimation
The general representation of an affine transformation is
$$[x \;\; y \;\; 1] = [u \;\; v \;\; 1] \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix} , \tag{1}$$
equivalently,
$$x = a_{11} u + a_{21} v + a_{31} , \qquad y = a_{12} u + a_{22} v + a_{32} .$$
If $w = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23})$ is the vector of the motion parameters, the following system of
equations must be solved for each object in the scene [16]:
$$A w = b ,$$
where
$$A = \begin{bmatrix} \Phi & 0 \\ 0 & \Phi \end{bmatrix} ,$$
and
$$\Phi = \begin{bmatrix} X_1 & Y_1 & 1 \\ \vdots & \vdots & \vdots \\ X_N & Y_N & 1 \end{bmatrix} ,$$
and
$$b = [b_x \;\; b_y]^T = [X_1 + dx_1, \ldots, X_N + dx_N, Y_1 + dy_1, \ldots, Y_N + dy_N]^T ,$$
where $N$ is the number of points with coordinates $(X_i, Y_i)$ in the working object.
The solution to the above overdetermined set of equations may be obtained by a least-squares
method or, alternatively, by the robust least median of squares technique described in detail
in [18].
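As a minimal sketch of the least-squares route, the Python below fits the six affine parameters to a set of points and their displacement vectors. It relies on the block-diagonal structure of A, which lets the x and y target coordinates be solved as two independent three-unknown problems sharing one normal matrix. The names `solve3` and `fit_affine` and the list-of-tuples data layout are illustrative assumptions rather than part of the method in [16], and the robust least median of squares alternative of [18] is not shown.

```python
def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    n = 3
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / a[r][r]
    return x


def fit_affine(points, displacements):
    """Least-squares affine fit (hypothetical helper).

    points:        list of (X, Y) pixel coordinates in the object
    displacements: list of (dx, dy) initial motion vectors at those pixels
    Returns ((a11, a21, a31), (a12, a22, a32)) such that
    X + dx ~ a11*X + a21*Y + a31 and Y + dy ~ a12*X + a22*Y + a32.
    """
    rows = [(X, Y, 1.0) for X, Y in points]
    # Normal matrix (3x3), shared by the x and y problems.
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bx = [X + dx for (X, _), (dx, _) in zip(points, displacements)]
    by = [Y + dy for (_, Y), (_, dy) in zip(points, displacements)]
    ptbx = [sum(r[i] * b for r, b in zip(rows, bx)) for i in range(3)]
    ptby = [sum(r[i] * b for r, b in zip(rows, by)) for i in range(3)]
    return solve3(ptp, ptbx), solve3(ptp, ptby)
```

At least three non-collinear points are needed for the normal matrix to be invertible; the sketch does not guard against the degenerate case.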
2.2 3D Motion Estimation
In order to identify the objects in the scene, the original image is segmented into areas having uniform
motion characteristics. The 3-D motion of each object in the scene is modeled using a six-parameter
model. More specifically, we assume that if $(x(t), y(t), z(t))$ are the coordinates of a point at time
instant $t$, its coordinates $(x(t-1), y(t-1), z(t-1))$ at time instant $t-1$ are given by
$$\begin{bmatrix} x(t-1) \\ y(t-1) \\ z(t-1) \end{bmatrix} = \begin{bmatrix} 1 & -w_z & w_y \\ w_z & 1 & -w_x \\ -w_y & w_x & 1 \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} , \tag{2}$$
where three translational parameters $(t_x, t_y, t_z)$ and three rotational parameters $(w_x, w_y, w_z)$ are used
to describe the motion of the underlying object. The goal of the 3-D motion estimation procedure is
to compute the motion parameter vector $(w_x, w_y, w_z, t_x, t_y, t_z)$ for each object in the scene.
If $(X, Y)$ are the coordinates of the perspective projection of the 3-D point $(x(t), y(t), z(t))$ on the
image plane at time $t$, then
$$X = f \frac{x(t)}{z(t)} \quad \text{and} \quad Y = f \frac{y(t)}{z(t)} . \tag{3}$$
From (2), (3) the 2-D motion vectors $v_m(X, Y)$ that correspond to the pixels $(X, Y)$ of each object
are defined by projection of the 3-D motion on the 2-D image plane, as follows:
$$\begin{aligned} v_{mx}(X, Y) &= X(t-1) - X(t) = f \frac{X - w_z Y + f w_y + f t_x / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - X , \\ v_{my}(X, Y) &= Y(t-1) - Y(t) = f \frac{w_z X + Y - f w_x + f t_y / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - Y . \end{aligned} \tag{4}$$
If $|t_z / z| \ll 1$, $f = 1$, and the field of view of the camera, i.e. the visual angle corresponding to the
whole image, is not very large, the displacement vector may be approximated by
$$\begin{aligned} v_{mx}(X, Y) &= -w_x X Y + w_y (1 + X^2) - w_z Y + \frac{t_x - t_z X}{z(t)} , \\ v_{my}(X, Y) &= -w_x (1 + Y^2) + w_y X Y + w_z X + \frac{t_y - t_z Y}{z(t)} . \end{aligned} \tag{5}$$
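Furthermore, by making the assumption that the object surface is a plane, i.e.
$$k_x x + k_y y + k_z z(t) = 1 ,$$
or, equivalently,
$$\frac{1}{z(t)} = k_x X + k_y Y + k_z ,$$
and substituting in (5), we obtain
$$\begin{aligned} v_{mx}(X, Y) &= a_1 + a_2 X + a_3 Y + a_7 X^2 + a_8 X Y , \\ v_{my}(X, Y) &= a_4 + a_5 X + a_6 Y + a_7 X Y + a_8 Y^2 , \end{aligned} \tag{6}$$
where
$$\begin{aligned} a_1 &= w_y + k_z t_x , & a_2 &= k_x t_x - k_z t_z , & a_3 &= -w_z + k_y t_x , \\ a_4 &= -w_x + k_z t_y , & a_5 &= w_z + k_x t_y , & a_6 &= k_y t_y - k_z t_z , \\ a_7 &= w_y - k_x t_z , & a_8 &= -w_x - k_y t_z . \end{aligned} \tag{7}$$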
Equation (6) can be evaluated for $N$ of the initially estimated 2D vectors, forming a system of $2N$
equations in 8 unknowns. With $N \geq 4$ this system has at least as many equations as unknowns and
can be solved using least-squares methods or, alternatively, by the robust least median of squares
motion estimation algorithm described in detail in [18].
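The relation between the six rigid-motion parameters, the three plane parameters and the eight coefficients of the quadratic model can be checked numerically. The sketch below is illustrative only: the function names are assumptions, and the coefficients $a_1, \ldots, a_8$ are written exactly as they come out of expanding the small-motion approximation (5) with $1/z(t) = k_x X + k_y Y + k_z$, so that the two evaluations must agree at every pixel.

```python
def planar_params(w, t, k):
    """Eight coefficients of the planar-surface motion model (hypothetical helper).

    w = (wx, wy, wz) rotation, t = (tx, ty, tz) translation,
    k = (kx, ky, kz) plane parameters, with focal length f = 1.
    """
    wx, wy, wz = w
    tx, ty, tz = t
    kx, ky, kz = k
    a1 = wy + kz * tx
    a2 = kx * tx - kz * tz
    a3 = -wz + ky * tx
    a4 = -wx + kz * ty
    a5 = wz + kx * ty
    a6 = ky * ty - kz * tz
    a7 = wy - kx * tz
    a8 = -wx - ky * tz
    return a1, a2, a3, a4, a5, a6, a7, a8


def flow_from_params(a, X, Y):
    """Evaluate the eight-parameter model at image point (X, Y)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    vx = a1 + a2 * X + a3 * Y + a7 * X * X + a8 * X * Y
    vy = a4 + a5 * X + a6 * Y + a7 * X * Y + a8 * Y * Y
    return vx, vy


def flow_small_motion(w, t, k, X, Y):
    """Evaluate the small-motion approximation with 1/z = kx*X + ky*Y + kz."""
    wx, wy, wz = w
    tx, ty, tz = t
    kx, ky, kz = k
    inv_z = kx * X + ky * Y + kz
    vx = -wx * X * Y + wy * (1 + X * X) - wz * Y + (tx - tz * X) * inv_z
    vy = -wx * (1 + Y * Y) + wy * X * Y + wz * X + (ty - tz * Y) * inv_z
    return vx, vy
```

In an estimator the direction is reversed: the eight coefficients are the unknowns, and each initial 2D vector contributes the two linear equations given by `flow_from_params`.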
3 QUADTREE SEGMENTATION USING RATE DISTORTION OPTIMIZATION
Let $S$ be a segmentation of the image plane consisting of $N_s$ objects $s_i$: $S = \{s_i, i = 1, \ldots, N_s\}$. For each
candidate motion estimator $M_j \in \mathcal{M}$, $\mathcal{M} = \{M_j, j = 1, \ldots, M\}$, let $V^{(j)} = \{v_i^{(j)}, j = 1, \ldots, M, i = 1, \ldots, N_s\}$
be the corresponding set of object motion vectors. The general joint vector field estimation
and quadtree segmentation algorithm aims to minimize the distortion $D$ of the reconstructed image
sequence, under a constraint $R_{\mathrm{budget}}$ on the rate for the transmission of the vector field and the
corresponding segmentation information. This corresponds to the following constrained optimization
problem:
$$\min_{\{v_i^{(j)} \in V^{(j)},\; j = 1, \ldots, M \;\&\; s_i \in S\}} \; \sum_{i=1}^{N_s} D(v_i, s_i) , \tag{8}$$
subject to
$$\sum_{i=1}^{N_s} R(v_i, s_i) \leq R_{\mathrm{budget}} ,$$
where $N_s$ is the total number of objects in the image, $D(v_i, s_i)$ is the contribution of the decision
$(v_i, s_i)$ to the distortion function and $R(v_i, s_i)$ is the contribution of the same decision to the total
rate, or cost, of the transmission of the motion vectors and the segmentation information.
The methodology in [19] permits the transformation of the above into an unconstrained
optimization problem. In fact, as shown in [19] (the proof is also contained in [9]), the solution
$\{v_i^\star(\lambda), s_i^\star(\lambda), i = 1, \ldots, N_s\}$ of the problem of unconstrained minimization of
$$J(\lambda) = \sum_{i=1}^{N_s} J(v_i(\lambda), s_i(\lambda)) = \sum_{i=1}^{N_s} D(v_i(\lambda), s_i(\lambda)) + \lambda \sum_{i=1}^{N_s} R(v_i(\lambda), s_i(\lambda)) , \tag{9}$$
is also a solution of (8) if
$$R_{\mathrm{budget}} = \sum_{i=1}^{N_s} R(v_i^\star(\lambda), s_i^\star(\lambda)) . \tag{10}$$
The problem therefore reduces to ensuring that (10) has a solution for $\{(v_i(\lambda), s_i(\lambda)), i = 1, \ldots, N_s\}$
and determining this solution. This was investigated from a general viewpoint in [9], where it was
shown that $R(v(\lambda), s(\lambda))$ and $D(v(\lambda), s(\lambda))$ are monotonic functions of the Lagrange multiplier
$\lambda$, which may be interpreted as a quality index, with values ranging from 0 (highest rate, lowest
distortion) to $\infty$ (lowest rate, highest distortion). Further investigation in [10] proved that the
solution of (10) may be obtained using any fast convex search algorithm such as the bisection algorithm [20].
One such algorithm, which gave very good results in both [10] and [11], is also adopted in the present
paper (Section 3.3). The algorithm for the determination of the optimal segmentation and motion
estimator consists of the following steps.
3.1 Initialization
In the present work the segmentation $S$ is assumed to be completely described by a quadtree
$T = \{n_j^i, i = 0, \ldots, \log(D), j = 1, \ldots, 2^i\}$ (i.e. $n_j^i$ is the $j$th node at scale $i$), where $D \times D$ is the
dimension of the image. Associated with each node $n_j^i$ of the quadtree $T$ is a data structure of the form
$\{\tilde{R}_j^i, \tilde{D}_j^i, \tilde{J}_j^i, \mathrm{split}(n_j^i)\}$, where $\tilde{R}_j^i$ is the entropy associated with the current node, $\tilde{D}_j^i$ is the distortion
corresponding to the reconstruction of the block associated with the current quadtree node, $\tilde{J}_j^i$ is the
corresponding Lagrangian cost and $\mathrm{split}(n_j^i)$ is a bit indicating whether the current node is split or not.
In the following, for the sake of simplicity, node $n_j^i$ will be denoted as node $n$.
The proposed joint motion estimation and quadtree segmentation algorithm begins by gathering
the motion-estimator-dependent $(R_m(n), D_m(n))$ values for each node $n \in T$ and for each
$m \in \mathcal{M}(n)$, to generate the $R$ versus $D$ values for each node. In the following, the two stages of the
rate-distortion optimization algorithm are presented, i.e. (a) finding the optimal solution for a given
operating slope $\lambda$ and (b) determining the optimal slope $\lambda$. Stage (a) of the algorithm is run for a
given slope value $\lambda$ and could be considered a subroutine called by stage (b) of the algorithm.
3.2 Finding the optimal solution for a given operating slope $\lambda$
For the current $\lambda$, populate all the nodes $n$ of the tree with their minimum Lagrangian costs $J_n(\lambda)$,
equivalently $J_j^i(\lambda)$ when referring to the $j$th node at quadtree scale $i$, i.e.
$$J_n(\lambda) = \min_{m \in \mathcal{M}} \left[ D_m(\lambda) + \lambda R_m(\lambda) \right] .$$
The cost is minimized with respect to the choice of motion estimator for node $n$.
• Step 1 Initialize $i \leftarrow 0$. Let $d = \log(N)$ be the maximum tree depth. For $n = n_j^0$, if $m_n^\star$ is the
value of $m$ that minimizes $J_n(\lambda)$, initialize
$$\tilde{R}_j^i \leftarrow R_{m_n^\star} , \quad \tilde{D}_j^i \leftarrow D_{m_n^\star} , \quad \tilde{J}_j^i \leftarrow J_{m_n^\star} \tag{11}$$
• Step 2 Set $i \leftarrow i + 1$. If $i = d$ go to Step 5.
• Step 3 If
$$J_j^i(\lambda) > \tilde{J}_{2j}^{i+1}(\lambda) + \tilde{J}_{2j-1}^{i+1}(\lambda) + \tilde{J}_{2j-2}^{i+1}(\lambda) + \tilde{J}_{2j-3}^{i+1}(\lambda)$$
then set $\mathrm{split}(n_j^i) = 1$ and
$$\begin{aligned} \tilde{D}_j^i &= D_{2j}^{i+1} + D_{2j-1}^{i+1} + D_{2j-2}^{i+1} + D_{2j-3}^{i+1} \\ \tilde{R}_j^i &= R_{2j}^{i+1} + R_{2j-1}^{i+1} + R_{2j-2}^{i+1} + R_{2j-3}^{i+1} \\ \tilde{J}_j^i &= J_{2j}^{i+1} + J_{2j-1}^{i+1} + J_{2j-2}^{i+1} + J_{2j-3}^{i+1} \end{aligned} \tag{12}$$
• Step 4 Go to Step 2.
• Step 5 Starting from the root node $n^0$ and following, in linked-list fashion, the node data-structure
element $\mathrm{split}(n_j^i)$ selected optimally for all the nodes of $T$, construct the optimal quadtree $S^\star(\lambda)$
and its associated optimal motion estimator set choice $\mathcal{M}^\star(\lambda)$.
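The split decision for a fixed slope can be sketched in a few lines of Python. The version below is a simplified illustration, not the exact procedure of the steps above: it is written as a bottom-up recursion over an explicit tree in which every internal node has four children, and it charges one flag bit per visited node for the split decision. The `Node` class, the `candidates` dictionary layout, the estimator labels and the `split_bits` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # candidates: estimator label -> (rate in bits, distortion); hypothetical layout
    candidates: Dict[str, Tuple[float, float]]
    children: List["Node"] = field(default_factory=list)  # empty, or four sub-blocks
    split: bool = False
    best: Optional[str] = None
    rate: float = 0.0
    dist: float = 0.0


def prune(node, lam, split_bits=1.0):
    """Set split/best/rate/dist on every node for slope lam; return the node's cost."""
    # Best single estimator for this block: minimize D_m + lam * R_m over m.
    node.best = min(node.candidates,
                    key=lambda m: node.candidates[m][1] + lam * node.candidates[m][0])
    r, d = node.candidates[node.best]
    r += split_bits  # assumed cost of signalling the split flag
    node.split, node.rate, node.dist = False, r, d
    if node.children:
        for c in node.children:
            prune(c, lam, split_bits)
        cr = split_bits + sum(c.rate for c in node.children)
        cd = sum(c.dist for c in node.children)
        # Split when the children together have a lower Lagrangian cost.
        if cd + lam * cr < d + lam * r:
            node.split, node.rate, node.dist = True, cr, cd
    return node.dist + lam * node.rate


def leaves(node):
    """Blocks of the pruned quadtree, read off by following the split flags."""
    if not node.split:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]
```

Calling `prune(root, lam)` and then `leaves(root)` yields the segmentation and, through each leaf's `best` field, the estimator chosen for every block at that slope. A small `lam` favours fine subdivision; a large `lam` collapses the tree toward the root.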
3.3 Determining the optimal slope $\lambda$
First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found so that
$$\sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u)) \leq R_{\mathrm{budget}} \leq \sum_{i=1}^{N_s} R(v_i(\lambda_l), s_i(\lambda_l)) .$$
Note that the initial segmentation $S(\lambda_u)$ of the first frame of the image sequence selects the whole
image to be a single object, while for the subsequent frames $S(\lambda_u)$ is the optimal segmentation
corresponding to the temporally preceding frame. Similarly, $S(\lambda_l)$ corresponds to the segmentation
resulting from full splitting of the quadtree until the minimum allowed object (block) size is reached.
For the coding of a sequence of frames, the values of $\lambda_l$, $\lambda_u$ are chosen to be $\lambda_l = 0$, $\lambda_u = \infty$ for
the initial frame and $\lambda_l = 0.8 \lambda^{*}$, $\lambda_u = 1.2 \lambda^{*}$ for subsequent frames, where $\lambda^{*}$ is the solution of (10)
for the previous frame. The bracketing interval is then successively decreased in size by the following
procedure:
• Step 1 For each object $i$, $i = 1, \ldots, N_s$, compute $D(v_i(\lambda_l), s_i(\lambda_l))$ and $D(v_i(\lambda_u), s_i(\lambda_u))$ and
the corresponding $R(v_i(\lambda_l), s_i(\lambda_l))$ and $R(v_i(\lambda_u), s_i(\lambda_u))$.
• Step 2 Set
$$\lambda_{new} = \left| \frac{\sum_{i=1}^{N_s} \left[ D(v_i(\lambda_l), s_i(\lambda_l)) - D(v_i(\lambda_u), s_i(\lambda_u)) \right]}{\sum_{i=1}^{N_s} \left[ R(v_i(\lambda_l), s_i(\lambda_l)) - R(v_i(\lambda_u), s_i(\lambda_u)) \right]} \right| + \epsilon$$
where $\epsilon$ is a vanishingly small positive number.
• Step 3 Compute the $\{R(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ and $\{D(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ minimizing $J(\lambda)$
for $\lambda = \lambda_{new}$.
• Step 4 If $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) = \sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u))$, then stop, $\lambda = \lambda_u$.
Else if $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) > R_{\mathrm{budget}}$, $\lambda_l \leftarrow \lambda_{new}$. Go to Step 2.
Else $\lambda_u \leftarrow \lambda_{new}$. Go to Step 2.
Note that the distortion corresponding to each motion vector in a specific search area is computed
only once, at the first iteration of the algorithm. Thus the computational load of the algorithm
consists of updating the entropy of the vector field and finding the minimum $J(\lambda)$.
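The bracketing procedure can be sketched as follows. This is an illustration under simplifying assumptions: `evaluate(lam)` is a hypothetical callback returning the total rate and total distortion of the minimizer of $J(\lambda)$ at the given slope, a large finite number stands in for the infinite upper slope, and the toy evaluator treats blocks as independent with fixed rate and distortion per candidate (whereas in the text the rate depends on an adaptively updated entropy).

```python
def find_slope(evaluate, r_budget, lam_lo=0.0, lam_hi=1e6, eps=1e-9, max_iter=50):
    """Shrink [lam_lo, lam_hi] until the upper end gives the best rate <= r_budget.

    Returns (slope, total_rate, total_distortion) for the selected operating point.
    """
    r_lo, d_lo = evaluate(lam_lo)
    r_hi, d_hi = evaluate(lam_hi)
    if r_lo <= r_budget:
        return lam_lo, r_lo, d_lo  # even the highest-rate solution fits
    if r_hi > r_budget:
        raise ValueError("budget is below the minimum achievable rate")
    for _ in range(max_iter):
        if r_lo == r_hi:
            break
        # Slope of the line joining the two current operating points.
        lam_new = abs((d_lo - d_hi) / (r_lo - r_hi)) + eps
        r_new, d_new = evaluate(lam_new)
        if r_new == r_hi:
            break  # no operating point lies between the two ends
        if r_new > r_budget:
            lam_lo, r_lo, d_lo = lam_new, r_new, d_new
        else:
            lam_hi, r_hi, d_hi = lam_new, r_new, d_new
    return lam_hi, r_hi, d_hi


def make_evaluator(blocks):
    """Toy evaluator: blocks is a list of candidate lists [(rate, distortion), ...]."""
    def evaluate(lam):
        rate = dist = 0.0
        for cands in blocks:
            r, d = min(cands, key=lambda rd: rd[1] + lam * rd[0])
            rate += r
            dist += d
        return rate, dist
    return evaluate
```

For subsequent frames the same routine can be started from the narrower interval around the previous frame's slope by passing `lam_lo` and `lam_hi` explicitly.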
3.4 Computation of the Entropy and Distortion Functions
The specific way the vector field affects the quality of the reconstructed image will determine the
distortion index $D(v_i(\lambda), s_i(\lambda))$. A number of such distortion measures have been proposed in the
literature. In the case of quadtree-based segmentation, the simplest and most commonly used is the
temporally displaced frame difference
$$D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| \mathrm{im}_t(m + k, n + l) - \mathrm{im}_{t-1}(m + k + v_x^{(i)}, n + l + v_y^{(i)}) \right| ,$$
where $(m, n)$ are the upper left-hand corner coordinates of block $i$ corresponding to node $n$, $\mathrm{im}_t(\cdot)$
and $\mathrm{im}_{t-1}(\cdot)$ are the images at time instants $t$ and $t - 1$, respectively, $(v_x^{(i)}, v_y^{(i)})$ is the projected 2D
motion vector corresponding to the motion estimation method used and $b_x$, $b_y$ are the dimensions of
the working object (block).
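A direct implementation of this distortion measure might look like the sketch below. Several details are assumptions made for the example: images are lists of rows indexed as `img[row][column]`, the first coordinate in the formula is taken to be the column, the loops cover $b_x \times b_y$ pixels, motion vectors are integer-valued, and references outside the previous frame are clamped to the nearest border pixel.

```python
def block_sad(cur, prev, m, n, bx, by, vx, vy):
    """Sum of absolute displaced frame differences for one block.

    cur, prev: current and previous frames as lists of rows (img[row][col])
    (m, n):    column and row of the block's upper left-hand corner
    bx, by:    block width and height in pixels
    (vx, vy):  integer displacement applied in the previous frame
    """
    h, w = len(prev), len(prev[0])
    total = 0
    for l in range(by):
        for k in range(bx):
            # Clamp the displaced position to the frame (assumed border policy).
            x = min(max(m + k + vx, 0), w - 1)
            y = min(max(n + l + vy, 0), h - 1)
            total += abs(cur[n + l][m + k] - prev[y][x])
    return total
```

For the affine and 3D estimators the per-pixel vector varies inside the block, so `vx` and `vy` would be evaluated from the quantized model parameters at each pixel rather than passed as constants.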
Also, the transmission cost $R(v_i(\lambda), s_i(\lambda))$ will depend on the specific method used for the coding
of the vector fields. The motion parameter vectors corresponding to either the affine or the 3D motion
estimation methods are first quantized uniformly and the corresponding entropy is then computed
with respect to the quantized parameters. The distortion is also computed based on the quantized
motion parameter vectors. The entropy of the current node, $R(v_i(\lambda), s_i(\lambda))$, is computed by summing
the entropy of the already coded motion or motion parameter vectors with the entropy of the $\mathrm{split}(n_j^i)$
bit and the entropy of the parameter indicating which motion estimator is chosen for the already
coded quadtree objects.
In the present work, the use of entropy coding (e.g. Huffman or arithmetic coding) is assumed, with
an adaptive probability model, for the computation of the entropy of each component of the motion
or motion parameter vectors. Thus, the entropy $R_d(v_i(\lambda), s_i(\lambda))$ for the coding of the component $d$
of $v_i$ computed using a specific motion estimator (i.e. the x- or y-component of the 2D motion vector
field in the case of block-matching motion estimation, or a parameter of the quantized motion vector
field in the case of affine or 3D motion estimation) is computed as
$$R_d(v_i(\lambda), s_i(\lambda)) = - \sum_{x = -v_{dmin}}^{v_{dmax}} p_x(v_i(\lambda)) \log_2 \left( p_x(v_i(\lambda)) \right) ,$$
where $p_x(v_i(\lambda))$ is the probability that the vector field minimizing the index $J(v_i(\lambda), s_i(\lambda))$ satisfies
$v_d^{(i)} = x$, and $v_{dmin}$, $v_{dmax}$ are the minimum and maximum allowed values for the specific component
$d$ of the motion or motion parameter vector. The probability $p_x(v_i(\lambda))$ is computed for each operating
point $(v_i(\lambda), s_i(\lambda))$ of the algorithm using the information of all previously encoded parameters
corresponding to the specific motion estimator as follows:
$$p_x(v_i(\lambda)) = \frac{1}{i} \sum_{k=1}^{i} d_x(v_k(\lambda)) , \tag{13}$$
where
$$d_x(v_k(\lambda)) = \begin{cases} 1 & \text{if } v_d^{(k)}(\lambda) = x \\ 0 & \text{otherwise.} \end{cases}$$
Note that (13) is equivalent to the following efficient formula for the incremental computation of
$p_x(v_i(\lambda))$:
$$p_x(v_{i+1}(\lambda)) = \frac{i}{i+1} \, p_x(v_i(\lambda)) + \frac{1}{i+1} \, d_x(v_{i+1}(\lambda)) . \tag{14}$$
A more computationally efficient approach, which does not involve incremental computation of the
probability density of the vector field or of the first-order vector field differences, is to assume a model
for this probability density function. Specifically, the assumption of a Gauss-Markov random field
to describe motion vector differences [21, 22] could be used so as to accelerate the rate-distortion
minimization procedure.
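The relative-frequency probability estimate of (13) and the resulting first-order entropy are easy to express in code. The sketch below is illustrative only; the function names are assumptions. The incremental update weights the new indicator by $1/(i+1)$, which is what averaging $i + 1$ indicators in (13) implies.

```python
from collections import Counter
from math import log2


def entropy_bits(values):
    """First-order entropy, in bits per symbol, of a list of quantized values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def update_probability(p_x, i, hit):
    """Running estimate of p_x after symbol i+1, given the estimate after i symbols.

    hit is 1 if the new symbol equals x and 0 otherwise.
    """
    return (i * p_x + hit) / (i + 1)
```

Multiplying `entropy_bits` by the number of coded values gives an estimate of the bits needed for one vector component; summing over components, plus the split flags and estimator indices, gives the rate term used in the Lagrangian cost.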
4 TRANSMISSION OF PREDICTION ERROR INFORMATION
In many applications the transmission of motion and segmentation information alone is insufficient for
the reconstruction of an image sequence with acceptable quality. The option must then be provided
of transmitting the prediction error corresponding to the motion estimator, especially for blocks
containing artifacts in the reconstructed image. The optimization technique described in detail
in the previous sections is easily extended so as to accommodate the choice of transmitting the
prediction error. It will be assumed that the prediction error is coded using the DCT and
Huffman entropy coding, as is the case in JPEG.
Let again $S$ be a segmentation of the image plane consisting of $N_s$ objects $s_i$: $S = \{s_i, i = 1, \ldots, N_s\}$.
For each candidate motion estimator $M_j$, $j = 1, \ldots, M$, let $V^{(j)} = \{v_i^{(j)}, j = 1, \ldots, M, i = 1, \ldots, N_s\}$
be the corresponding set of object motion vectors and $E^{(j)} = \{e_i^{(j)}, j = 1, \ldots, M, i = 1, \ldots, N_s\}$
be the corresponding set of prediction errors. The general joint vector field estimation and quadtree
segmentation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence, under
a constraint $R_{\mathrm{budget}}$ on the rate for the transmission of the vector field, the corresponding segmentation
and the prediction error information. This corresponds to the following constrained optimization
problem:
$$\min_{\{v_i^{(j)} \in V^{(j)},\; j = 1, \ldots, M \;\&\; s_i \in S \;\&\; e_i \in E\}} \; \sum_{i=1}^{N_s} D(v_i, s_i, e_i) , \tag{15}$$
subject to
$$\sum_{i=1}^{N_s} R(v_i, s_i, e_i) \leq R_{\mathrm{budget}} ,$$
where $N_s$ is the total number of objects in the image, $D(v_i, s_i, e_i)$ is the contribution of the decision
$(v_i, s_i, e_i)$ to the distortion function and $R(v_i, s_i, e_i)$ is the contribution of the same decision to the
total rate, or cost, of the transmission of the motion vectors, the segmentation map and the prediction
error information. As discussed in the previous section, the solution $\{v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda), i = 1, \ldots, N_s\}$ of
the problem of unconstrained minimization of
$$J(\lambda) = \sum_{i=1}^{N_s} J(v_i(\lambda), s_i(\lambda), e_i(\lambda)) = \sum_{i=1}^{N_s} D(v_i(\lambda), s_i(\lambda), e_i(\lambda)) + \lambda \sum_{i=1}^{N_s} R(v_i(\lambda), s_i(\lambda), e_i(\lambda)) , \tag{16}$$
is also a solution of (15) if
$$R_{\mathrm{budget}} = \sum_{i=1}^{N_s} R(v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda)) . \tag{17}$$
The problem therefore reduces to ensuring that (17) has a solution for $\{(v_i(\lambda), s_i(\lambda), e_i(\lambda)), i = 1, \ldots, N_s\}$
and determining this solution.
The rate-distortion optimization algorithm presented in the previous section is used again for the
computation of the optimal segmentation and the corresponding motion estimator for each object.
In the case of error transmission the distortion function used is
$$D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| \mathrm{im}_t(m + k, n + l) - \widehat{\mathrm{im}}_t(m + k + v_x^{(i)}, n + l + v_y^{(i)}) \right| ,$$
where
$$\widehat{\mathrm{im}}_t(m + k, n + l) = \mathrm{im}_{t-1}(m + k + v_x^{(i)}, n + l + v_y^{(i)}) + e(m + k + v_x^{(i)}, n + l + v_y^{(i)})$$
and $e(m, n)$ is the decoded prediction error corresponding to pixel $(m, n)$.
5 APPLICATION TO VERY LOW BIT RATE IMAGE SEQUENCE CODING
5.1 Computational Complexity of the Proposed Approach
The proposed algorithm consists of the initialization and the optimization stages. During the
computationally involved initialization stage, all candidate algorithms for motion estimation are tested
and their performance is stored in memory. Note that the distortion function is computed only in
the first iteration of the algorithm and thus the computational load of the remainder of the algorithm
reduces to updating the entropy of the vector field and finding the minimum $J(\lambda)$. Note also that
following the first frame, the search for $\lambda$ is confined to narrower intervals and hence fewer iterations
are needed for the completion of the optimization stage. Also, the choice of the segmentation map
corresponding to the previous frame as an initial segmentation for the current frame further reduces
the computational load of the proposed algorithm.
The execution time of the encoding phase of the algorithm on an R4400 INDIGO II Silicon Graphics
workstation is approximately 1 minute for each frame. Most of this time (about 60%) is devoted to
the initialization stage, where the distortion functions are computed. The remaining 40% is used to
complete the optimization procedure. As elaborated in the sequel, the optimization algorithm was run
for many values of $R_{\mathrm{budget}}$ and target bitrates and was seen to converge very rapidly, never requiring
more than fifteen iterations in any of our experiments. This convergence to the desired bitrate for a
videoconference scene ("Claire") and a non-videoconference scene ("Tunnel") is depicted in Figures
4 and 5, respectively.
The computational complexity of the decoding phase of the proposed approach is very low and
even a software decoder may be implemented in real time. This makes the proposed scheme an
attractive candidate for use in asymmetrical coding applications such as multimedia communication,
teleshopping and fixed-location-to-mobile or broadcast video communication.
5.2 Experimental Results
In order to evaluate the performance of the proposed approach for very low bitrate coding, the
algorithm was applied to the typical QCIF sequences "Claire", "Miss America", "Salesman", "Foreman"
and a QCIF version of the MPEG-4 test sequence "Tunnel". The frame rate of the sequences was
10 frames/sec.
Objects were defined using the segmentation procedure described in Section 3. The construction
of the quadtree representation for the first frame of the video sequence may start with the hypothesis
that the whole image may be represented by only one node (root) and proceed with tests deciding if
further splitting is necessary, as described in Section 3. However, it was found experimentally that
in practice it is preferable to start by testing smaller blocks (typically $32 \times 32$ pixels each) instead of
the entire image, so as to expedite the identification of the optimal segmentation. Similar constraints
were imposed on the size of the smallest blocks in order to maintain the segmentation overhead
information within acceptable limits. The size of the smallest block in our experiments was chosen
to be $x_{min} \times y_{min} = 4 \times 4$. As noted in Section 3, after the first frame a good choice for the initial
segmentation mask is the segmentation mask corresponding to the immediately preceding frame.
As a first experiment, the optimized motion estimation and quadtree segmentation algorithm was
applied for the coding of specific frames only of the "Miss America" and "Claire" sequences. More
specifically, the algorithm was applied between the zeroth and fifth frames of "Miss America" and the
zeroth and second frames of "Claire". The original zeroth and fifth frames of "Miss America" and
zeroth and second frames of "Claire" are shown in Figures 6a and 6b and 7a and 7b, respectively,
while the reconstructed ones and the corresponding prediction errors are illustrated in Figures 6c, 6d
and 7c, 7d, respectively. The resulting quadtree segmentation is also shown in Figure 8. Figure 9
shows the MSE (mean-square error) versus bitrate for the coding of the fifth frame of "Miss America".
This curve was obtained by running the proposed algorithm for various $R_{\mathrm{budget}}$ and computing the
MSE after convergence.
For very low bitrate video coding applications the proposed algorithm may also be applied to
groups of frames (GOF), with the bit allocation assigned adaptively to each frame of the sequence
in order to optimize the transmission bitstream. More specifically, the total bitrate is given for the
coding of the whole GOF and the rate is allocated to each frame according to the frame difference
between the current frame and the temporally preceding frame. Thus, if $R_{\mathrm{target}}$ is the target bitrate for
the coding of the whole GOF and
$$FD_i = \sum_{k_1}^{N_1} \sum_{k_2}^{N_2} \left| \mathrm{im}_i(k_1, k_2) - \mathrm{im}_{i-1}(k_1, k_2) \right|$$
is the frame difference between frames $i$ and $i - 1$, and $N_1$, $N_2$ are the image dimensions, the bitrate
is allocated as follows:
$$R_{\mathrm{budget}}^{(i)} = R_{\mathrm{target}} \frac{FD_i}{\sum_{j=1}^{10} FD_j} .$$
In this way, the coding of the motion and segmentation information is optimized for the whole GOF.
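The proportional allocation can be sketched as below. The function names and the list-of-rows image layout are assumptions for the example, and the even split used when every frame difference is zero is an added convention to avoid dividing by zero, not something stated in the text.

```python
def frame_difference(cur, prev):
    """Sum of absolute pixel differences between two frames (lists of rows)."""
    return sum(abs(a - b)
               for row_c, row_p in zip(cur, prev)
               for a, b in zip(row_c, row_p))


def allocate_budget(frame_diffs, r_target):
    """Split r_target over the frames of a GOF in proportion to frame_diffs."""
    total = sum(frame_diffs)
    if total == 0:
        # Assumed fallback for a completely static GOF: share the budget evenly.
        return [r_target / len(frame_diffs)] * len(frame_diffs)
    return [r_target * fd / total for fd in frame_diffs]
```

Each resulting value then serves as the rate constraint for the slope search on the corresponding frame.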
For the coding of a group of frames, the first version of the optimized motion estimation algorithm
(RDHC) was applied to the first ten frames of "Miss America" and "Claire" using motion
compensation. Figures 10 and 11 compare the proposed algorithm with the simple block matching
algorithm used in the existing standards (MPEG, H.261), with block sizes of 8 × 8 and 16 × 16,
in terms of PSNR versus frame number for the first ten frames of the two sequences, respectively.
The simple block matching approach minimizes the absolute displaced frame difference by searching
exhaustively within a search area of -15, ..., 15 half-pixels in the temporally preceding frame,
centered at the position of the examined block. In both coders, it was assumed that each frame was
predicted using the reconstructed previous frame and that the prediction error was not transmitted.
$R_{budget}$ was selected to be 24 Kbits and 12 Kbits for "Miss America" and 24 Kbits and 10 Kbits
for "Claire", for the coding of the nine frames following the initial frame. As seen, the proposed
algorithm performs considerably better than the standard block matching technique, since both
segmentation and motion estimation are optimized for a specific bitrate.
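The reference block matching method can be sketched as an exhaustive search that minimizes the sum of absolute displaced frame differences. This is only an illustration: the function name `full_search_block_match` is hypothetical, and the sketch searches integer-pixel displacements, whereas the experiments above use a half-pixel search grid (which would additionally require interpolating the reference frame).

```python
import numpy as np

def full_search_block_match(curr, prev, top, left, block=8, search=15):
    """Return (dy, dx, sad) minimizing the absolute displaced frame difference.

    Assumption: integer-pixel displacements in [-search, search]; candidates
    that fall outside the reference frame are skipped.
    """
    h, w = prev.shape
    target = curr[top:top + block, left:left + block].astype(np.int64)
    best = (0, 0, None)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block is not fully inside the previous frame
            cand = prev[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best[2] is None or sad < best[2]:
                best = (dy, dx, sad)
    return best

# Synthetic check: the current frame is the previous frame shifted by (2, -3).
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
curr = np.roll(prev, shift=(2, -3), axis=(0, 1))
dy, dx, sad = full_search_block_match(curr, prev, top=12, left=12)
```

In this synthetic case the block at (12, 12) in the current frame is found at displacement (-2, 3) in the previous frame with zero matching error.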
The second version of the proposed coder (RDHCE), which includes error transmission, was also
used for the coding of the above QCIF sequences. The technique described earlier, of adapting the
bit allocation to frame differences, is also used in this coder version; however, the transmission
of the error allows efficient communication of much longer groups of frames. Three target bit rates
were tested: 14.4 Kbits/sec, 28.8 Kbits/sec and 64 Kbits/sec. Figures 12, 13, 14, 15 and 16 illustrate
the performance of the proposed scheme in terms of PSNR versus bitrate for the coding of the first
fifty frames of "Miss America", "Claire", "Salesman" and "Foreman" and the first twenty frames of
"Tunnel". Note that the coder performance remains stable, providing a high quality image without
the intervention of a new intra-coded image before the end of the whole sequence. Figures 17 and 18
show original and reconstructed frames of the sequence "Salesman", and Figures 19 and 20 those of
"Claire", coded at 64 Kbits/sec.
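For reference, the PSNR figures quoted in these experiments can be related to the MSE by the standard definition. The sketch below assumes 8-bit images with a peak value of 255, which the text does not state explicitly; the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized images (assumed peak value: 255)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Worst case for 8-bit data: every pixel is off by the full range, so PSNR is 0 dB.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
worst = psnr(black, white)
```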
Regarding the performance of the optimization, Figures 21a and 21c show the resulting quadtrees
for the tenth frame of "Salesman" and the eighth frame of "Foreman". The complexity of the
quadtree for the representation of both sequences is relatively high, since "Salesman" is a sequence
with large rigid and flexible motion, while in the "Foreman" sequence both object motion and camera
motion exist. Figures 21b and 21d present the motion estimator index maps corresponding to
the segmentation maps of Figures 21a and 21c. In these figures each object is colored according to
the motion estimator chosen: white corresponds to no motion and predicted motion, dark gray
to affine 2D motion, light gray to block matching and black to 3D motion. Table 1 shows the
average percentage of selection of each motion estimator for the coding of all the QCIF image
sequences.
In addition to videoconferencing sequences, the algorithm was also tested on the more complicated
MPEG-4 test sequence "Tunnel". The results are comparable to those obtained using
videophone-related sequences. Figure 22 shows original and reconstructed frames of the sequence
"Tunnel" coded at 64 Kbits/sec. The segmentation map and the motion estimator index map
corresponding to this sequence are shown in Figures 23a and 23b. The performance in terms of PSNR
versus bitrate for the coding of the first twenty frames of "Tunnel" is illustrated in Fig. 16.
6 CONCLUSIONS
A rate-distortion framework was used to define a very low bit rate coding scheme based on quadtree
segmentation and optimized selection of motion estimators. This technique achieved maximum
reconstructed image quality under the constraint of a target bitrate for the coding of the vector
field and segmentation information. Joint optimization of the quadtree object segmentation and of
the motion estimation method for each leaf of the tree, i.e. each object, was achieved subject to
this target bitrate restriction. For experimental evaluation, the proposed algorithm was combined
with an appropriate rate control strategy to optimize the coding of the motion vectors corresponding
to all the frames of a group of frames of an image sequence. Experimental results for the coding of
typical videophone sequences have demonstrated the performance of the proposed very low bit-rate
video coding scheme.
References
[1] H. Li, A. Lundmark, and R. Forchheimer, \Image Sequence Coding at Very Low Bitrates - A
Review," IEEE Trans. on Image Processing, vol. 3, pp. 589{609, Sep. 1994.
[2] H. G. Musmann, P. Pirsch, and H. J. Grallert, \Advances in Picture Coding," Proc. IEEE,
vol. 73, pp. 523{548, Apr. 1985.
[3] M. I. Sezan and R. L. Lagendijk, Motion Analysis and Image Sequence Processing. Kluwer
Academic Publishers, Boston, 1993.
[4] H. G. Mussman, M. Hotter, and J. Ostermann, \Object-oriented analysis-synthesis coding of
moving images," Signal Processing : Image Communication, vol. 1, pp. 117{138, Oct. 1989.
[5] M. Hotter, \Optimization and E�ciency of an Object-Oriented Analysis-Synthesis Coder," Signal
Processing : Image Communication, vol. 4, pp. 181{194, Apr. 1994.
16
[6] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, \Stereo Image Sequence
Coding Based on 3-D Motion Estimation and Compensation," Signal Processing : Image Com-
munication, vol. 7, pp. 129{145, Jan. 1995.
[7] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \3-D Motion/Disparity Segmentation for
Object-Based Image Sequence Coding," Optical Engineering, special issue on Visual Communi-
cations and Image Processing, vol. 35, pp. 137{145, Jan. 1996.
[8] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \Object-Based Coding of Stereo Image Se-
quences using Joint 3-D Motion/Disparity Compensation," IEEE Trans. on Circuits and Systems
for Video Technology, vol. 7, No. 2, pp. 312-328, Apr. 1997.
[9] Y. Shoham and A. Gersho, \E�cient Bit Allocation for an Arbitrary Set of Quantizers," IEEE
Trans. on Acoust., Speech, Signal Processing, vol. 36, pp. 1445{1453, Sep. 1988.
[10] K. Ramchandran and M. Vetterli, \Best Wavelet Packet Bases in a Rate-Distortion Sense," IEEE
Trans. on Image Processing, vol. 2, pp. 160{175, Apr. 1993.
[11] G. J. Sullivan and R. Baker, \E�cient Quadtree Coding of Images and Video," IEEE Trans. on
Image Processing, vol. 3, pp. 327{331, May 1994.
[12] E. Reusens, \Joint Optimization of Representation Model and Frame Segmentation for Generic
Video Compression," Signal Processing, vol. 46, pp. 105{117, Sep. 1995.
[13] G. Tziritas, \Rate Distortion Theory for Image and Video Coding," in 27th Int'l Conf. on Digital
Signal Processing, (Limassol, Cyprus), Jun. 1995.
[14] D. Tzovaras and M. G. Strintzis, \Motion Estimation Using Rate Distortion Theory for Very
Low Bit Rate Image Sequence Coding," in Proc. Int'l Conf. Telecommunications '96, (Istanbul,
Turkey), Apr. 1996.
[15] D. Tzovaras and M. G. Strintzis, \Motion and Disparity Estimation Using Rate Distortion
Theory for Very Low Bit Rate and Multiview Image Sequence Coding," in VCIP '97, (San Jose,
California), Feb. 1997.
[16] G. Wolberg, Digital Image Warping. IEEE Computer Society Press, Los Alamitos, California,
1988.
[17] G. Adiv, \Determining Three-Dimensional Motion and Structure from Optical Flow Generated
by Several Moving Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 7,
pp. 384{401, Jul. 1985.
[18] S. S. Sinha and B. G. Schunck, \A Two-Stage Algorithm for Discontinuity-Preserving Surface
Reconstruction," IEEE Trans. on PAMI, vol. 14, Jan. 1992.
[19] H. Everett, \Generalized Langrange Multiplier Method for Solving Problems of Optimum Allo-
cation of Resources," Operation Res., vol. 11, pp. 399{417, 1963.
[20] W. K. Press, B. P. Flannery, S. A. Tenkolsky, and W. T. Vetterling, \Numerical Recipes in C :
The Art of Scienti�c Computing," tech. rep., Cambridge, U.K., Cambridge Univ. Press, 1988.
[21] J. Konrad and E. Dubois, \Bayesian estimation of motion vector �elds," IEEE Trans. Pattern
Anal. and Mach. Intell., vol. 14, pp. 910{927, September 1992.
[22] S. Malassiotis and M. G. Strintzis, \Joint Motion / Disparity MAP Estimation for Stereo Image
Sequences," IEE Proceedings: Vision, Image & Signal Processing, vol. 143, pp. 101{108, Apr.
1996.
17
Figure 1: The first version (RDHC) of the proposed coding scheme (no option of error transmission). [Diagram labels: GOF 1, GOF 2, ..., GOF N.]
Figure 2: The second version (RDHCE) of the proposed coding scheme (error transmission is an option). [Diagram labels: I-frame, predicted frames.]
Figure 3: Rate distortion framework for the selection of the optimal motion estimator. [Diagram blocks: Block Matching Motion Estimation, Affine 2D Motion Estimation, No Motion, Previous Motion, 3D Motion Estimation, Rate Distortion Optimization Framework, Decision.]
Figure 4: Convergence of the algorithm for the coding of frame 20 of "Claire" at $R^{(20)}_{budget}$ = 0.064 bits/pixel. [Plot: bitrate (bpp) versus iteration; panel title "Claire, 28.8 Kbits/sec".]
Figure 5: Convergence of the algorithm for the coding of frame 14 of "Tunnel" at $R^{(14)}_{budget}$ = 0.24 bits/pixel. [Plot: bitrate (bpp) versus iteration; panel title "Tunnel, 64 Kbits/sec".]
Figure 6: (a) Original frame 0 of "Miss America". (b) Original frame 5 of "Miss America". (c) Reconstructed frame 5 of "Miss America" from frame 0 using the proposed algorithm at 41.31 dB PSNR. (d) The corresponding prediction error image.
Figure 7: (a) Original frame 0 of "Claire". (b) Original frame 2 of "Claire". (c) Reconstructed frame 2 of "Claire" from frame 0 using the proposed algorithm at 40.2 dB PSNR. (d) The corresponding prediction error image.
Figure 8: Quadtree segmentation corresponding to frame 5 of "Miss America" when coded with frame 0 as reference using the proposed algorithm.
Figure 9: MSE versus bitrate (in bits/pixel) for the coding of the fifth frame of "Miss America". [Plot: MSE versus bitrate; panel title "Miss America".]
Figure 10: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Miss America" at 12 Kbits/sec and 24 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves RDHC-24, RDHC-12, BM-8x8, BM-16x16.]
Figure 11: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Claire" at 10 Kbits/sec and 24 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves RDHC-24, RDHC-10, BM-8x8, BM-16x16.]
Figure 12: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Miss America" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]
Figure 13: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Claire" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
Figure 14: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Salesman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]
Figure 15: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Foreman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]
Figure 16: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 20 frames of the image sequence "Tunnel" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec. [Plot: PSNR (dB) versus frame number; curves 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Method              Percentage of selection (%)
No motion           37.11
Predicted motion    12.86
Block-Matching      30.38
Affine 2D           13.1
3D motion           6.55

Table 1: Average percentage of selection of the candidate motion estimation methods used for the coding of "Miss America", "Claire", "Salesman", "Foreman" and "Tunnel".
Figure 17: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman".
Figure 18: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman" coded at 64 Kbits/sec.
Figure 19: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire".
Figure 20: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire" coded at 64 Kbits/sec.
Figure 21: Segmentation and motion estimator index maps of (a), (b) the tenth frame of "Salesman" and (c), (d) the eighth frame of "Foreman". White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching and black to 3D motion.
Figure 22: Original frames 5 (a), 10 (b) and 15 (c) of "Tunnel". Reconstructed frames 5 (d), 10 (e) and 15 (f) of "Tunnel" coded at 64 Kbits/sec.
Figure 23: (a) Segmentation map of the fifth frame of "Tunnel" interleaved with the image. (b) The corresponding motion estimator index map. White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching and black to 3D motion.