
RATE DISTORTION OPTIMIZATION USING SSIM IN H.264

MULTIMEDIA PROCESSING

Babu Hemanth Kumar [email protected]

Guidance: Dr. K. R. Rao

In rate-distortion optimization for the H.264 I-frame encoder, the distortion (D) is measured as the sum of squared differences (SSD) between the reconstructed and the original blocks, which equals the MSE up to a constant factor.

Although PSNR and MSE are currently the most widely used objective metrics because of their low complexity and clear physical meaning, they have long been criticized for correlating poorly with the Human Visual System (HVS) [2].

Previous studies show that the structural similarity (SSIM) metric provides better image quality assessment than pixel-error-based metrics (mean squared error and peak signal-to-noise ratio).


Introduction

Mean Squared Error: Love It or Leave It?


So what is the secret of the MSE—why is it still so popular?

What is wrong with the MSE when it does not work well?

Just how wrong is the MSE in these cases?

If not the MSE, what else can be used?

What is MSE?


MSE is a signal fidelity measure. The goal of a signal fidelity measure is to compare two signals by providing a quantitative score that describes the degree of similarity/fidelity or, conversely, the level of error/distortion between them.

Suppose that x = {x_i | i = 1, 2, ..., N} and y = {y_i | i = 1, 2, ..., N} are two finite-length, discrete signals, where N is the number of signal samples and x_i and y_i are the values of the i-th samples in x and y, respectively. The MSE between the signals is

$$\mathrm{MSE}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$
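The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names `mse` and `psnr`, the sample values, and the 8-bit peak value of 255 are all assumptions made for the example. PSNR is included because it is derived directly from the MSE.

```python
import math

def mse(x, y):
    """Mean squared error between two equal-length sample sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def psnr(x, y, peak=255.0):
    """PSNR in dB; peak=255 assumes 8-bit samples (an assumption)."""
    err = mse(x, y)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 4-sample signals, chosen only for illustration.
x = [52, 55, 61, 66]
y = [50, 55, 63, 65]
print(mse(x, y))  # (4 + 0 + 4 + 1) / 4 = 2.25
print(psnr(x, y))
```

Each sample contributes one subtraction, one multiplication, and one accumulation, which matches the low per-sample cost noted for the MSE.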

The MSE has many attractive features:

It is simple. It is parameter free and inexpensive to compute, with a complexity of only one multiply and two additions per sample. It is also memoryless: the squared error can be evaluated at each sample, independent of other samples.

It has a clear physical meaning: it is the natural way to define the energy of the error signal.

The MSE is an excellent metric in the context of optimization.

MSE is widely used simply because it is a convention.


Why do we love MSE?

[FIG1] Comparison of image fidelity measures for "Einstein" image altered with different types of distortions. (a) Reference image. (b) Mean contrast stretch. (c) Luminance shift. (d) Gaussian noise contamination. (e) Impulsive noise contamination. (f) JPEG compression. (g) Blurring. (h) Spatial scaling (zooming out). (i) Spatial shift (to the right). (j) Spatial shift (to the left). (k) Rotation (counter-clockwise). (l) Rotation (clockwise). [2]

What is wrong with MSE?


Implicit Assumptions when using MSE

Signal fidelity is independent of temporal or spatial relationships between the samples of the original signal. If the original and distorted signals are randomly re-ordered in the same way, then the MSE between them will be unchanged.

Signal fidelity is independent of any relationship between the original signal and the error signal. For a given error signal, the MSE remains unchanged, regardless of which original signal it is added to.

Signal fidelity is independent of the signs of the error signal samples.

All signal samples are equally important to signal fidelity.
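The first three assumptions can be checked numerically. The sketch below uses randomly generated integer signals (hypothetical data, seeded for repeatability) and shows that the MSE is unchanged when both signals are re-ordered in the same way, when the same error is added to a different original, and when the error signs are flipped. The fourth assumption is visible in the definition itself: every sample carries the same weight 1/N.

```python
import random

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

rng = random.Random(0)                          # fixed seed, illustration only
n = 64
x = [rng.randint(0, 255) for _ in range(n)]     # hypothetical original signal
e = [rng.randint(-10, 10) for _ in range(n)]    # hypothetical error signal
y = [a + b for a, b in zip(x, e)]               # distorted signal
base = mse(x, y)

# 1) Apply the same random re-ordering to both signals.
perm = list(range(n))
rng.shuffle(perm)
shuffled = mse([x[i] for i in perm], [y[i] for i in perm])

# 2) Add the same error signal to a different original signal.
x2 = [rng.randint(0, 255) for _ in range(n)]
other = mse(x2, [a + b for a, b in zip(x2, e)])

# 3) Flip the sign of every error sample.
flipped = mse(x, [a - b for a, b in zip(x, e)])

print(base == shuffled == other == flipped)
```

Because the samples are integers, the sums are exact and the four values agree exactly, even though the four distorted signals could look very different to a viewer.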


Failures of MSE Metric

[FIG2] Failures of MSE Metric [2]

Alternative Approach

[FIG3] Examples of structural versus nonstructural distortions. [2]

If we view the HVS as an ideal information extractor that seeks to identify and recognize objects in the visual scene, then it must be highly sensitive to structural distortions and automatically compensate for nonstructural distortions. Consequently, an effective objective signal fidelity measure should simulate this functionality.

A recently proposed approach for image quality assessment.

A method for measuring the similarity between two images. It is a full-reference metric: the distorted image is compared against the original.

The SSIM is designed to improve on traditional metrics like PSNR and MSE, which have proved to be inconsistent with human eye perception.


SSIM

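Properties of SSIM: for image pairs met in practice the value typically lies in [0, 1]; in general it can be negative when the two signals are negatively correlated.

Symmetry: S(x, y) = S(y, x)

Boundedness: S(x, y) <= 1

Unique maximum: S(x, y) = 1 if and only if x = y (in discrete representations, x_i = y_i for all i = 1, 2, ..., N).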
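These properties can be checked on a small example. The sketch below implements the single-window SSIM formula from [4] with uniform weights and the commonly used constants K1 = 0.01 and K2 = 0.03. It is a simplification: the reference implementation uses a sliding Gaussian-weighted window and averages the local values. The function name `ssim` and the sample data are assumptions made for illustration.

```python
def ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM with uniform weights (simplified sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length signals with at least 2 samples")
    mu_x, mu_y = sum(x) / n, sum(y) / n

    def cov(a, mu_a, b, mu_b):
        # Unbiased (n - 1) estimate of the covariance.
        return sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / (n - 1)

    var_x = cov(x, mu_x, x, mu_x)
    var_y = cov(y, mu_y, y, mu_y)
    cov_xy = cov(x, mu_x, y, mu_y)
    c1 = (k1 * dynamic_range) ** 2   # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2   # stabilizes the contrast/structure term
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return num / den

# Hypothetical 8-sample signal and two altered versions of it.
x = [52, 55, 61, 66, 70, 61, 64, 73]
shifted = [v + 10 for v in x]        # luminance shift (nonstructural)
reversed_x = x[::-1]                 # samples re-ordered (structural change)

print(ssim(x, x))                              # maximum, attained only when x == y
print(ssim(x, shifted), ssim(shifted, x))      # symmetric, slightly below 1
print(ssim(x, reversed_x))                     # lower: structure is altered
```

The luminance shift has an MSE of 100 yet keeps SSIM close to 1, which is the behaviour the structural approach is designed to provide.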


SSIM Measurement System


[FIG4] Block Diagram of Structural Similarity measurement system[4]


H.264

[FIG 5] Block Diagram of H.264 encoder

[FIG 6]. Intra 4 x 4 prediction mode directions (vertical : 0, horizontal : 1, DC : 2, diagonal down left : 3, diagonal down right : 4, vertical right : 5, horizontal down : 6, vertical left : 7, horizontal up : 8)[5]


Intra-prediction

H.264 gains much of its efficiency by removing redundant data not only across a series of frames but also within a single frame, a technique called intraframe prediction [FIG 6].

The H.264 encoder uses intraframe prediction with more ways to reference neighboring pixels, so it compresses details and gradients better than previous codecs.

H.264 I-Frame Encoder


The best prediction mode(s) are chosen using R-D optimization, which minimizes the Lagrangian cost

$$J(s, c, \mathrm{MODE} \mid \mathrm{QP}) = D(s, c, \mathrm{MODE} \mid \mathrm{QP}) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE} \mid \mathrm{QP})$$

The distortion D(s,c,MODE|QP) is measured as the SSD between the original block s and the reconstructed block c. QP is the quantization parameter, MODE is the prediction mode, λ_MODE is the Lagrange multiplier, and R(s,c,MODE|QP) is the number of bits needed to code the block.

The mode(s) with the minimum J(s,c,MODE|QP) are chosen as the prediction mode(s) of the macroblock.
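The mode decision described above can be sketched as a minimization over candidate modes. This is an illustrative sketch, not the JM code: the helper names, the tiny 4-sample "blocks", the bit counts, and the Lagrange multiplier values are all hypothetical. The mode numbers follow the Intra 4 x 4 numbering in [FIG 6] (0 = vertical, 1 = horizontal, 2 = DC).

```python
def ssd(s, c):
    """Sum of squared differences between original s and reconstruction c."""
    return sum((a - b) ** 2 for a, b in zip(s, c))

def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def best_mode(original, candidates, lam):
    """candidates maps mode -> (reconstructed block, bits used to code it)."""
    costs = {
        mode: rd_cost(ssd(original, recon), bits, lam)
        for mode, (recon, bits) in candidates.items()
    }
    return min(costs, key=costs.get), costs

# Hypothetical data: a 4-sample block and three candidate reconstructions.
original = [10, 12, 14, 16]
candidates = {
    0: ([10, 12, 14, 15], 12),   # vertical: accurate but expensive
    1: ([11, 12, 13, 16], 6),    # horizontal: slightly worse, cheaper
    2: ([13, 13, 13, 13], 3),    # DC: cheap but inaccurate
}

print(best_mode(original, candidates, lam=0.5)[0])  # costs 7.0, 5.0, 21.5 -> mode 1
print(best_mode(original, candidates, lam=0.1)[0])  # rate matters less   -> mode 0
```

A larger multiplier penalizes bits more heavily, so the decision shifts toward cheaper modes. In the encoder the multiplier is tied to QP.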

Proposal


The main idea of this project is to employ SSIM in the rate-distortion optimizations of H.264 I-frame encoder to choose the best prediction mode(s).

The required modifications will be done on the JVT reference software JM92 program.

The H.264-JM92 software and the new method will be compared in terms of the total number of bits of the compressed image and the SSIM of the whole reconstructed image.

A greater SSIM index indicates higher quality of the reconstructed picture, whereas a greater SSD indicates lower quality. The distortion in this method is therefore measured as

$$D(s, c, \mathrm{MODE} \mid \mathrm{QP}) = 1 - \mathrm{SSIM}(s, c)$$

where s and c are the original and reconstructed image blocks, respectively. The new rate-distortion cost can now be written as

$$J(s, c, \mathrm{MODE} \mid \mathrm{QP}) = 1 - \mathrm{SSIM}(s, c) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE} \mid \mathrm{QP})$$

The algorithm uses SSIM index instead of SSD as the distortion measure in RDCost_for_4x4IntraBlock, RDCost_for_8x8IntraBlock and RDCost_for_macroblocks of H.264-JM92 software.
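The modified cost can be sketched as follows. This is not the JM92 code: the function names, the single-window uniform-weight SSIM, the candidate blocks, the bit counts, and the multiplier values are all assumptions for illustration. Because 1 - SSIM is on a far smaller scale than SSD, a multiplier tuned for SSD cannot be reused unchanged; the values below are arbitrary.

```python
def ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM with uniform weights (simplified sketch)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_rd_cost(s, c, rate_bits, lam):
    """J = (1 - SSIM(s, c)) + lambda * R."""
    return (1.0 - ssim(s, c)) + lam * rate_bits

def best_mode(s, candidates, lam):
    """candidates maps mode -> (reconstructed block, bits); returns min-cost mode."""
    return min(candidates,
               key=lambda m: ssim_rd_cost(s, candidates[m][0], candidates[m][1], lam))

# Hypothetical 4x4 block (flattened) holding a smooth gradient.
s = [10 * i for i in range(1, 17)]
candidates = {
    0: ([v + (1 if i % 2 == 0 else -1) for i, v in enumerate(s)], 40),  # close, costly
    2: ([85] * 16, 5),                                                  # flat, cheap
}

print(best_mode(s, candidates, lam=0.01))  # structure preserved at higher rate
print(best_mode(s, candidates, lam=0.05))  # rate dominates, flat block wins
```

The flat candidate has the same mean as the original, so its luminance term is perfect, but it destroys the gradient and its SSIM is close to zero. It is only chosen once the rate penalty is large enough.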

Proposed Method


Test Sequences

Coastguard, Akiyo, Bridge-close, Carphone, Claire, Container, Grandma, Miss-America


Simulation Results

[TABLE 1] Results of comparison between H.264 JM92 and H.264 JM92-SSIM method for QP=30


[TABLE 2] Results of comparison between H.264 JM92 and H.264 JM92-SSIM method for QP=20


[TABLE 3] Results of comparison between H.264 JM92 and H.264 JM92-SSIM method for QP=10


Fig 7. The reconstructed images produced by the two methods for Coastguard. Panels: original; encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30. MSSIM values as printed: 0.90197, 0.89390.


Fig 8. The reconstructed images produced by the two methods for Akiyo. Panels: original; encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30. MSSIM values as printed: 0.96416, 0.96067.


Fig 9. The reconstructed images produced by the two methods for Suzie. Panels: original; encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30. MSSIM values as printed: 0.93649, 0.93370.

Simulations show that the proposed method reduces the bit rate by approximately 2 to 9% for QP=30, 4 to 20% for QP=20, and 18 to 35% for QP=10, while maintaining almost the same perceptual quality and costing almost the same encoding time.
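The slides do not state how the bit-rate reduction is computed. A common convention, assumed here, is the relative saving in total coded bits with respect to the JM92 baseline, where R_JM92 and R_SSIM are hypothetical symbols for the total bits produced by the original and the SSIM-based encoder at the same QP:

$$\Delta R = \frac{R_{\mathrm{JM92}} - R_{\mathrm{SSIM}}}{R_{\mathrm{JM92}}} \times 100\%$$

Under this convention a positive value means the SSIM-based encoder uses fewer bits than the baseline.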


Conclusions

References


[1] Zhi-Yi Mai, Chun-Ling Yang, Lai-Man Po, and Sheng-Li Xie, "A New Rate-Distortion Optimization using Structural Information in H.264 I-Frame Encoder," ACIVS 2005, LNCS 3708, pp. 435-441, 2005.

[2] Z. Wang and A. C. Bovik, "Mean squared error: love it or leave it? - A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009.

[3] JM Software website: http://iphome.hhi.de/suehring/tml/

[4] Z. Wang, et al., "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004. [Online] Available: www.cns.nyu.edu/~lcv/ssim/

[5] S. K. Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264 / MPEG-4 Part 10," J. VCIR, vol. 17, pp. 186-216, April 2006, Special Issue on "Emerging H.264/AVC Video Coding Standard."

[6] T. Wiegand and B. Girod, "Lagrange multiplier selection in hybrid video coder control," in IEEE Int. Conf. on Image Processing, vol. 3, pp. 542-545, 2001.

[7] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video coding Standard," IEEE Trans. on CAS for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.

[8] Z. Wang, A. C. Bovik and L. Lu, "Why is image quality assessment so difficult?" IEEE International Conference on Acoustics, Speech, & Signal Processing, May 2002.

[9] Z. Wang, L. Lu and A. C. Bovik, "Video quality assessment using structural distortion measurement," IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004.

[10] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998.

[11] The SSIM Index for Image Quality Assessment: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/

[12] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81-84, March 2002.

