
Dynamic Texture Synthesis for Video Compression

Shruti Bansal, Santanu Chaudhury and Brejesh Lall

Electrical Engineering Department, Indian Institute of Technology Delhi, New Delhi, India

Abstract—In this paper, a dynamic texture based compression scheme is devised for videos. Correspondence analysis is explored for the analysis of motion patterns in a video on the basis of optic flow data, and clusters of different motion patterns are created. Dynamic textures tend to disobey Horn and Schunck's assumption of brightness constancy, and hence the optic flow residual is used as an indicator of their presence. The correspondence analysis results and the optic flow residual are combined in a new segmentation scheme. The optic flow data tracks the motion of groups of pixels to generate flow lines, which are used in a synthesis scheme to create the illusion of a continuously flowing texture. Integrating this synthesis scheme into the compression format yields a considerable bit-stream reduction for the dynamic texture regions. The scheme is integrated with the standard H.264/AVC model; texture regions that do not fall within the above class of dynamic textures, as well as non-texture regions, are encoded and decoded directly by H.264.

Keywords—correspondence analysis; optic flow residue; flow line; particle matrix

I. INTRODUCTION

A dynamic texture is a spatially repetitive, time-varying visual pattern that forms an image sequence. Such textures are difficult to analyze because the underlying motion and pattern are complex and stochastic in nature. We propose a novel correspondence analysis scheme for the segmentation of dynamic textures in videos. We use correspondence analysis to understand the motion patterns in a video, which may contain single or multiple motions along with stationary objects. Optic flow represents the motion of pixels between two adjacent frames of a video; however, dynamic textures disobey Horn and Schunck's [3] principle of brightness constancy, and we use this property to identify the motion pattern corresponding to the dynamic textures.

For the representation of fluid flow, Schodl et al. [5] find smooth transitions between parts of a video and concatenate subsequences at these transitions to create longer, even infinitely long, videos. The flow based synthesis used here [1, 2] extends this idea along the flow lines of fluids, describing both their motion pattern and the texture evolution along the flow lines. The concept of using a smaller portion of a video to generate a longer version is significant for compression: we build on this idea by transmitting the portions of the video corresponding to textures for only a few frames and generating the texture in the subsequent frames from them.

II. DYNAMIC TEXTURE SEGMENTATION

The segmentation of dynamic textures is implemented as a two step process:

1. Correspondence analysis
2. Study of the optic flow residue

A. Correspondence analysis

Correspondence analysis [4] is an exploratory data analysis technique used to identify systematic relations between the rows and columns of a data table. It collapses a much larger set of data into a compact form that better expresses these associations. The chi-square statistic is calculated, and this information allows us to see the correspondence of the row and column data. In our case, we apply correspondence analysis to the optic flow data obtained over a number of frames. We obtain the optic flow components u and v along the x and y directions using Horn and Schunck's method [3], then examine the correspondence between the rows of the resulting data table, where each row represents the motion of one pixel over the different frames, and find the pixels that have a similar motion pattern. The algorithm is as follows:

1. The optic flow is calculated for F frames of the video, with F typically taken as 15. This data is representative of the motion in the video.

2. For an M × N image, F frames produce two sets of 3D data of dimensions M × N × F, corresponding to the components u and v of the optical flow in the x and y directions. These are reduced to a single 3D matrix whose elements are the L2 norm of the u and v matrices:

w(i, j, k) = √( u(i, j, k)² + v(i, j, k)² )    (1)

3. The 3D matrix obtained above is converted to a two-dimensional matrix of the form Z × F, where Z = M × N is the number of rows and represents the pixels in a frame, and the columns hold the optic flow information of these pixels across the frames. This matrix is fed to the correspondence analysis to find the association between the motions of the pixels of the image.


4. The output of the correspondence analysis is submitted to k-means clustering to identify the regions that have distinct motion patterns or are stationary (a code sketch of the full pipeline is given below).
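To make the pipeline concrete, the following is a minimal Python sketch of steps 1-4. It is not the authors' implementation: OpenCV's Farneback flow stands in for Horn and Schunck's method [3], the correspondence analysis is the standard SVD-of-standardized-residuals formulation, and F and k are free parameters.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def flow_magnitude_matrix(frames):
    """Stack per-pixel flow magnitudes into a Z x (F-1) table;
    F frames of 8-bit grayscale give F-1 successive flow fields."""
    mags = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        # Farneback dense flow as a stand-in for Horn-Schunck [3]
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        u, v = flow[..., 0], flow[..., 1]
        mags.append(np.sqrt(u ** 2 + v ** 2).ravel())    # Eq. (1)
    return np.column_stack(mags)                          # Z = M*N rows

def ca_row_coordinates(table, n_components=2):
    """Correspondence analysis: row coordinates from the SVD of the
    standardized residual matrix of the (nonnegative) data table."""
    P = table / (table.sum() + 1e-12)
    r = P.sum(axis=1)                                     # row masses
    c = P.sum(axis=0)                                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c) + 1e-12)
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :n_components] * sigma[:n_components] / np.sqrt(r + 1e-12)[:, None]

def motion_clusters(frames, k=3):
    """Per-pixel cluster labels for the distinct motion patterns (step 4)."""
    coords = ca_row_coordinates(flow_magnitude_matrix(frames))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(coords)
    return labels.reshape(frames[0].shape)
```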

Figure 1. Correspondence analysis based clustering for dynamic textures: (a) and (c) show an original frame from the sequences 'waterfall' and 'industry smoke'; (b) and (d) show the segmentation obtained after applying k-means clustering to the correspondence analysis results.

B. Study of optic flow residue

The optic flow derived in section II.A is based on the brightness constancy assumption, which holds only for rigid objects with Lambertian surfaces. The fluid and gaseous substances that give rise to dynamic textures, such as smoke or flowing water, tend to disobey it, and we exploit this to recognize dynamic textures [6]. We obtain the optic flow residue; the regions where the residue levels are highest are potential dynamic texture regions. However, the optic flow estimation can also be erroneous for other reasons, such as fast motion, so the optic flow residual alone is not sufficient for the detection of dynamic texture. The residual is compared against a threshold, and the result is combined with the output of the correspondence analysis to identify regions that belong to a common motion cluster (obtained by correspondence analysis) and have a high optical residue:

r(x, y, t) = | I(x, y, t) − Î(x, y, t) |    (2)

where I(x, y, t) is the original intensity and Î(x, y, t) is the predicted intensity at pixel position (x, y) at time t. The optic flow residual is calculated at regular frame intervals and thresholded to obtain the final residue. On the basis of this residue, we choose the motion cluster from the correspondence analysis results for which the optic flow residue is maximum. Fig. 2 shows the optic flow residue obtained according to equation (2). A change in the segmented regions indicates a rapidly changing trajectory of the dynamic texture. We obtain the optic flow residue between pairs of frames at fixed intervals, e.g. between the first and second frames and then between the eleventh and twelfth frames. If the optic flow residues obtained do not coincide with each other, we classify the incoming video as a random flow video. The synthesis scheme we propose works well for textures with approximately continuous flow patterns, and therefore rapidly changing textures are not considered.
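The following sketch illustrates the residue test of equation (2) and the cluster selection, using the same stand-in flow estimator as before; the threshold tau and the backward-warping direction are assumptions, not the paper's exact procedure.

```python
import cv2
import numpy as np

def optic_flow_residue(prev, nxt, tau=12.0):
    """Threshold the brightness-constancy residual of Eq. (2). The backward
    flow (nxt -> prev) is estimated so the previous frame can be warped onto
    the grid of the next one; tau is a hand-tuned threshold."""
    back = cv2.calcOpticalFlowFarneback(nxt, prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev.shape
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    predicted = cv2.remap(prev.astype(np.float32),
                          gx + back[..., 0], gy + back[..., 1],
                          cv2.INTER_LINEAR)                # I_hat(x, y, t+1)
    residue = np.abs(nxt.astype(np.float32) - predicted)   # Eq. (2)
    return residue > tau                                    # binary residue mask

def texture_cluster(labels, residue_mask):
    """Pick the motion cluster with the highest mean residue (Sec. II.B)."""
    ids = np.unique(labels)
    scores = [residue_mask[labels == i].mean() for i in ids]
    return ids[int(np.argmax(scores))]
```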

Figure 2. Optic flow residue for the 'waterfall' (a) and 'industry smoke' (b) sequences.

Figure 3. The changing optical flow residue for a rapidly changing dynamic texture, 'fire' (affected by wind): (a)-(d) give the optic flow residue based segmentation after frames 1, 11, 21 and 31. The segmented region changes at each evaluation; hence the trajectory is not constant and the flow is random.

III. DYNAMIC TEXTURE SYNTHESIS

The flow based synthesis [1, 2] treats a dynamic texture as a series of particles moving along a constant path. These paths can be represented by flow lines, along which we consider patches of pixels as texture particles. The flow lines in [1, 2] are user defined; in our scheme, however, we derive the flow lines as described below.

A. Flow line generation

For continuous flow patterns, we synthesize flow lines that approximate the paths along which the texture particles move. A flow line describes the journey of a particle from the start position, where the particle originates, to the end position, where it ceases to exist. We use the optic flow to generate the flow lines: the u and v values give the advection of the particles (or pixels) over time, and plotting this advection yields the flow line. The particle advection is obtained according to:

x(t+1) = x(t) + u(x(t), y(t))    (3)

y(t+1) = y(t) + v(x(t), y(t))    (4)
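A minimal sketch of this advection, assuming per-frame dense flow fields are available; boundary handling is simplified.

```python
def trace_flow_line(seed, flows):
    """Advect one seed point through successive flow fields (Eqs. (3)-(4)).
    flows: list of (H, W, 2) optic-flow arrays; returns the visited positions."""
    x, y = float(seed[0]), float(seed[1])
    line = [(x, y)]
    for flow in flows:
        h, w = flow.shape[:2]
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= xi < w and 0 <= yi < h):
            break                              # the particle has left the frame
        x += float(flow[yi, xi, 0])            # Eq. (3): x(t+1) = x(t) + u
        y += float(flow[yi, xi, 1])            # Eq. (4): y(t+1) = y(t) + v
        line.append((x, y))
    return line
```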

Page 3: [IEEE 2013 National Conference on Communications (NCC) - New Delhi, India (2013.2.15-2013.2.17)] 2013 National Conference on Communications (NCC) - Dynamic texture synthesis for video

Figure 5 illustrates some flow lines generated by this technique for the sequences 'waterfall' and 'industry smoke'.

Figure 4. Flow lines and particle textures along the flow line (image courtesy reference [2])

All the particles along a particular flow line are assumed to follow the same trajectory. For a group of pixels, the median of their optic flow values gives a more robust estimate of the net optic flow, so the optic flow at the centre of a group of pixels is taken as the median of the group's optic flow values. We randomly select pixels in the segmented dynamic texture region and plot their motion over subsequent frames; the positions obtained are stored to define the flow lines. A sketch of this step follows.
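A small variant of the advection lookup reflecting the median estimate just described; the neighbourhood radius is an assumption.

```python
import numpy as np

def median_flow_at(flow, x, y, radius=2):
    """Median u and v over a small pixel group centred at (x, y); this
    replaces the single-pixel lookup in the advection sketch above."""
    h, w = flow.shape[:2]
    x0, x1 = max(0, int(x) - radius), min(w, int(x) + radius + 1)
    y0, y1 = max(0, int(y) - radius), min(h, int(y) + radius + 1)
    patch = flow[y0:y1, x0:x1].reshape(-1, 2)
    return float(np.median(patch[:, 0])), float(np.median(patch[:, 1]))
```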

Figure 5. Flow lines generated for the sequences 'waterfall' and 'industry smoke'.

B. Flow based synthesis

Once the flow lines are generated, a texture synthesis scheme is applied to textures such as waterfalls and chimney smoke that have a time-varying appearance but roughly stationary temporal dynamics. Such textures can be represented by the continuous motion of texture particles along the flow lines obtained in III.A. The texture particles capture the dynamics of the flow and the texture variation at the particle positions along the flow line. A particle begins at position d1 and then passes through a series of positions d2, d3, .... At each position, a texture particle is defined as a patch of size m × n, and the next position is obtained by patch matching along the flow line, with the minimum SAD between the RGB values of the patches as the selection criterion (a sketch of this criterion follows below). Figure 4 shows a flow line, its particle positions and the texture particles extracted along the flow line at those positions.

A particle matrix is defined as in [1, 2]; figure 6 illustrates the synthesis along a single flow line with 7 particle positions, d1 to d7. M(d, t) = (p, f) is a matrix in which p specifies a particle and f gives the frame of the input video in which the particle is located at position d. The number in each cell of the image corresponds to f, and the color of the number to p. The first column, for example, shows the particles along this flow line in frame 1. The black particle begins at position d1 in frame 1 and dies at the last position, d7, in frame 7, while the green particle begins at d1 in frame 2 and reaches d7 in frame 8. If the sequence of particles were simply looped after the 8th frame, an abrupt discontinuity would appear at every pixel of the flow line, since all the particles would change their positions together, producing a highly noticeable artifact. The modified technique described by figure 6 instead repeats the k diagonals of the matrix; no temporal discontinuity is observed, because every particle completes its journey from start to end, and there is only one discontinuity per frame, which is difficult to detect.
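The matching machinery in [1, 2] is more elaborate; this sketch only illustrates the minimum-SAD selection criterion named above, with a hypothetical 16 × 16 patch size, and assumes positions lie at least a patch away from the frame border.

```python
import numpy as np

def patch_at(frame, pos, m=16, n=16):
    """m x n RGB patch with its top-left corner at integer position pos."""
    x, y = int(pos[0]), int(pos[1])
    return frame[y:y + m, x:x + n].astype(np.int32)

def best_match_frame(particle, frames, pos, m=16, n=16):
    """Index of the input frame whose patch at `pos` has minimum SAD with the
    particle's current texture, used to follow it along the flow line."""
    sads = [np.abs(particle - patch_at(f, pos, m, n)).sum() for f in frames]
    return int(np.argmin(sads))
```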

Figure 6. Plot of the matrix M(d, t) (image courtesy reference [1]).

Given input particles from a fixed number of frames, the particles in the following frames are generated by sequentially selecting them and moving them along the flow line. Once the particle positions are available, the texture patches are drawn from the particle matrix and placed in the frames at the particle positions in accordance with the matrix M. In overlapping regions, the particle texture is blended with the overlapping texture patches using a weighted mean approach (a sketch follows). Thus the dynamic texture regions in the frames after the first F frames are reconstructed from the particle textures and particle positions derived from the first F frames, and no additional information is required. Table I grades the quality of video sequences whose dynamic texture regions were synthesized by flow based synthesis. The SSIM score [7] is used as the quality measure; it gives the structural similarity of the synthesized video with respect to the original video. Since the dynamic texture scheme uses samples from the same video sequence, albeit from previous frames, SSIM is a suitable performance measure here. The visual quality of the results is good; the reconstruction uses texture particles from only the 15 initial frames.
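A sketch of one plausible weighted-mean blend: the paper does not specify the weights, so flat weights are used here; dividing the accumulated canvas by the weight map afterwards gives the weighted mean in overlap regions.

```python
import numpy as np

def paste_blended(canvas, weight, patch, pos, w=None):
    """Accumulate a particle texture into the float32 frame canvas."""
    m, n = patch.shape[:2]
    x, y = int(pos[0]), int(pos[1])
    if w is None:
        w = np.ones((m, n, 1), dtype=np.float32)   # flat weights (assumption)
    canvas[y:y + m, x:x + n] += patch.astype(np.float32) * w
    weight[y:y + m, x:x + n] += w

# After all particles of a frame are pasted:
#   frame = canvas / np.maximum(weight, 1e-6)
```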

TABLE I. QUALITY ANALYSIS FOR DYNAMIC TEXTURE SYNTHESIS

Sequence         Resolution   SSIM (our scheme)   SSIM (H.264, QP 27)
Waterfall        352 × 288    0.9937              0.9855
Industry Smoke   320 × 180    0.9820              0.9824
Burner           240 × 135    0.9701              0.9911
River            320 × 180    0.9792              0.9998
Tap              176 × 144    0.9388              0.9903


IV. INTEGRATION IN H.264

The dynamic texture analysis and synthesis scheme discussed in sections II and III is combined with the existing H.264 encoder-decoder to exploit the temporal redundancy of the dynamic textures.

A. Proposed encoding scheme

Figure 7. Proposed encoding scheme

We have already seen how the dynamic textures are segmented by the correspondence analysis based scheme combined with the optical residue information. Once the dynamic texture regions suited to our scheme, i.e. those with a continuous flow pattern, are identified and separated, we derive their flow lines and the particle positions along these flow lines (section III). The texture particles along the flow lines are taken as rectangular blocks centered at the particle positions, and these positions are stored. As explained in section III.B, the texture particles of some initial frames must be stored for regeneration of the texture, so a particle matrix is created in which the texture blocks at the flow line particle positions are stored for F frames. To integrate with the block based encoding of H.264, only the macroblocks that lie completely inside the texture particles are handled by this scheme; the remaining blocks, the non-texture regions and the randomly flowing dynamic textures are coded by standard H.264/AVC at a low QP for good quality reconstruction. Thus the output bit stream consists of the encoded bits from the H.264 encoder, the particle positions and the particle matrix (see the sketch below).
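An illustrative sketch of how this side information might be serialized; the byte layout is an assumption, chosen only to match the size formula quoted in section V.

```python
import numpy as np

def pack_side_info(positions, particle_matrix):
    """positions: (nol, nop, 2) integer pixel coordinates of the particle
    positions; particle_matrix: (nol, nop, m, n, 3) uint8 RGB texture blocks.
    Returns the two raw byte strings appended to the H.264 bit stream.
    One byte per coordinate matches the nop x nol x 2 byte overhead quoted
    in section V; larger frames would need wider integers."""
    pos_bytes = np.asarray(positions, dtype=np.uint8).tobytes()
    tex_bytes = np.asarray(particle_matrix, dtype=np.uint8).tobytes()
    return pos_bytes, tex_bytes
```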

B. Proposed decoding scheme

Figure 8. Proposed decoding scheme

The decoding of dynamic textures is relatively straightforward. The input bit stream is divided into its components. The non-texture and random flow texture regions are reconstructed by the H.264/AVC decoder. For the dynamic texture regions, the particle positions are identified and the regions are reconstructed using the known matrix M(d, t) (Figure 6): the texture blocks are taken from the particle matrix in accordance with the particle positions, both obtained from the incoming bit stream. The texture and non-texture regions are combined in the picture buffer to form the reconstructed video sequence. A sketch of this placement step follows.
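A hedged sketch of the placement step, assuming the indexing conventions of figure 6; overlap blending is simplified to a uniform average.

```python
import numpy as np

def reconstruct_texture(M, stored, positions, t, shape):
    """Rebuild the dynamic texture region of output frame t, shape = (H, W, 3).
    M[l][d][t] = (p, f) as in figure 6; stored[l][d][f] is the texture block
    saved at flow line l, position index d, input frame f; positions[l][d] is
    the pixel position of index d."""
    canvas = np.zeros(shape, dtype=np.float32)
    weight = np.zeros(shape[:2] + (1,), dtype=np.float32)
    for l, line in enumerate(positions):
        for d, pos in enumerate(line):
            p, f = M[l][d][t]             # particle id p, source input frame f
            patch = stored[l][d][f].astype(np.float32)
            m, n = patch.shape[:2]
            x, y = int(pos[0]), int(pos[1])
            canvas[y:y + m, x:x + n] += patch
            weight[y:y + m, x:x + n] += 1.0
    return canvas / np.maximum(weight, 1e-6)   # uniform average in overlaps
```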

V. RESULTS

The quality of reconstruction and the bit stream reduction achieved through our dynamic texture scheme are discussed by means of the results in this section.

Figure 9. Industry smoke sequence: four frames of the reconstructed 'industry smoke' sequence; (a) original frame, (b) frame 6, (c) frame 11, (d) frame 16, (e) frame 21. The black dots visible in the chimney smoke mark the flow lines along which the texture has been reconstructed; the smoke enclosed in the red boundary in (a) is the reconstructed region.

Figure 10. Tap sequence: the entire dynamic texture architecture is traversed for this sequence; (a) original frame of the 'tap' sequence, (b) correspondence analysis motion clusters, (c) the optical residue, which leads to the isolation of the 'blue' cluster in (b) as dynamic texture, (d) the derived particle matrix, (e)-(h) frames 10, 20, 30 and 40 with the derived texture.

Figure 11. Fireplace sequence: (a) original frame from the 'fireplace' sequence; (b)-(e) frames 35, 45, 55 and 65 of the 'fireplace' sequence reconstructed by our scheme; the marked red areas are erroneous reconstructions caused by the random nature of the sequence.

Table II gives the compression results for the synthesized dynamic texture regions in the videos. The original bit stream size generated by H.264 is compared with the size of the particle matrix (nop × nol × blksize × 3 bytes) plus the overhead of the particle position information (nop × nol × 2 bytes), where nop is the number of particle positions along a flow line, nol the number of flow lines, and blksize the size of the texture particles that are stored in the particle matrix and used for reconstruction. The size of our bit stream remains constant; hence, as the number of generated frames increases, the compression achieved increases. Figure 11 shows a failure case of our scheme, as discussed in section II.B: the dynamic texture motion is random in nature, so error is introduced during reconstruction, and such a sequence can instead be encoded using H.264. The comparative bit stream sizes for both techniques are given in Table II. Only the texture regions reconstructed by the proposed scheme have been used for the calculation of both bit streams, and QP 27 has been used for H.264. The results are for 20 frames; the compression increases with the number of frames, as the proposed scheme's bit stream size remains constant. A worked example of the size formula is given after Table II.

TABLE II. COMPRESSION RESULTS
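A worked example of the size formula, with hypothetical parameter values (these are not the parameters of the paper's experiments):

```python
nop, nol = 20, 30                        # positions per flow line, flow lines
blksize = 16 * 16                        # pixels per texture particle
matrix_bytes = nop * nol * blksize * 3   # RGB particle matrix: 460 800 B
position_bytes = nop * nol * 2           # particle positions:    1 200 B
print(matrix_bytes + position_bytes)     # 462 000 B, fixed for any frame count
```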

VI. CONCLUSION

We have successfully developed a scheme for the identification and segmentation of dynamic textures present in a video. The basic ingredients of the entire scheme are the optic flow properties of the dynamic textures, which have been used in a novel combination of correspondence analysis and optical flow residue based analysis. The flow based synthesis technique is computationally less complex than other dynamic texture synthesis schemes in use, and we have enhanced this existing scheme with the derivation of flow lines, again using optic flow information. The scheme has been successfully integrated with a standard encoder-decoder model for video compression.

REFERENCES

[1] Jiayu Chen, Jinguo Liu, Jianzhi Li, Dezhu Kong and Da Yu, "A Video Synthesis Method for Flow Patterns", Proc. 2nd IEEE International Conference on Network Infrastructure and Digital Content, 2010, pp. 303-307.

[2] K. S. Bhat, S. M. Seitz, et al., "Flow-based Video Synthesis and Editing", ACM Transactions on Graphics, vol. 23, no. 3, pp. 360-363, 2004.

[3] B.K.P. Horn and B.G. Schunck, “Determining Optical Flow”, Artificial Intelligence, 16:185–203, Aug. 1981.

[4] http://www.statsoft.com/textbook/correspondence-analysis/.

[5] Arno Schodl, Richard Szeliski, David H. Salesin and Irfan Essa, "Video Textures", in Computer Graphics (SIGGRAPH 2000 Proceedings), pp. 489-498, New Orleans, July 2000. ACM SIGGRAPH.

[6] Sándor Fazekas, Tomer Amiaz, Dmitry Chetverikov and Nahum Kiryati, “Dynamic Texture Detection Based on Motion Analysis”, Int. J. Comput. Vis. 82(1), 48–63 (2009).

[7] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.