
Coding Technologies for Video: acquisition, compression and display

Dunai Fuentes
Maria José Giner
Pablo Lanaspa
Zhiheng Xu
1 CAPTURE AND EDITING

We were told about our first group work, which mainly consisted of recording three videos. They were meant to highlight three different aspects: brightness, darkness and motion. Regarding the bright video, we decided to capture it outside in the sunlight. Afterwards one of us captured a high-motion video riding a bike around DTU's site. Concerning the dark one, as we had to include contrast with brightness in it, we decided to use the lab room with the lights off and a few rays of sunshine enclosed by the dark curtains.

Having finished this part, we were introduced to Avidemux, a free video editor designed for simple cutting, filtering and encoding tasks. After loading the clips, which had 1440 x 1080 resolution and 50 frames per second, we started editing them, applying filters and encoding them. We noticed that for coding .yuv videos with Avidemux it was necessary to apply a UV colour swap as a filter. The next step was to compare our expectations against the file sizes we had just created. It was shocking how heavy the .yuv clips (with the YUV codec) were compared to the ones encoded with the H.264 MPEG-4 AVC (x264) codec. We also needed to take into account which video players we were supposed to use for each coding type.

.yuv files describe the colour of each pixel in the YUV colour space (Y for the luminance, U and V for the chrominance). Although the file contains the appropriate values for the required number of pixels, it carries no information about resolution or frame rate, just which pixel comes first and which comes next. That is why we had to specify the resolution and fps of the video when trying to play it. For instance, if we had indicated a resolution of 1440 x 540 we would be shown the top half of the image first and, in the next frame, the bottom half.
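As a quick sanity check (a sketch of our own; we assume 4:2:0 chroma subsampling with 1 byte per sample, which matches the extractyuv420.m function used later), the size of a raw clip follows directly from the format:

% Raw YUV 4:2:0: a full-resolution Y plane plus quarter-resolution U and V planes.
width  = 1440; height = 1080;
fps    = 50;   seconds = 10;
bytes_per_frame = width*height*1.5;        % 1 (Y) + 0.25 (U) + 0.25 (V) bytes per pixel
clip_bytes = bytes_per_frame*fps*seconds   % ~1.17 GB for a 10 s clip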

We were also introduced to another cross-platform solution to convert and stream audio and video, FFMPEG, which had to be used from the Windows command line. We mainly used it to convert a video file to H.264 or any other coding type such as .yuv, and to change its frame count, frame size and bitrate. Its extended use is discussed in Annex 1.

Finally, using the given Matlab function extractyuv420.m, we could extract the Y, U and V components of one frame of the clip as big arrays representing every pixel in the frame. Position is indicated by the matrix indexes, and colour properties by the Y, U and V values. There is a luminance value for every pixel, but only a quarter of the total amount of pixels has its own U value, and the same happens with the V value (4:2:0 subsampling).

To display the luminance part of the frame (using imshow();) we are only interested in Y. Therefore the command to be used is imshow(y,[0,255]). As we are only representing the luminance, the image is grayscale.


For calculating the mean luminance value we just used the Matlab function mean, first to calculate the mean of every column and afterwards the mean of the resulting row:

Mean_y = mean(mean(y));

What remains is a single value with the mean luminance of the frame. For the frame above, Mean_y = 62.6867. Knowing this feature tells us how bright or dark the image is on a scale from 0 to 255.

Displaying the image in colour with imshow(); is easier if we have the RGB colour space representation of it. The other given function, yuv2rgb.m, simply converts our selected frame to RGB:

RGB = yuv2rgb(y,u,v);
imshow(RGB);

2 VIDEO COMPRESSION

Firstly we are going to describe the processing of digital video from capture to display, and then explain and illustrate H.264 coding.


Compression takes place in the camera while recording the video, but further processing can be executed on a computer. It is essential to understand the changes the video suffers while we compress it, so that we can minimize the losses incurred during the process. During video compression, a video stream is analyzed and unnecessary parts of the data are discarded in order to make a large video file smaller. There are essentially two ways to compress data in a video file: intraframe and interframe.

Intraframe (I-frame) compression compresses each individual frame of the video (similar to JPEG compression of a still image). Every frame is treated as a still image. With intraframe compression the complete frame is only slightly compressed, so the file size isn't that much smaller, because each individual frame is included in the newly-compressed version.

Interframe compression looks at each frame in a video file, compares it to the previous frame and stores only the data that changed from frame to frame, so the file size is much smaller than with intraframe compression.
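A minimal sketch of the idea (our own illustration, reusing extractyuv420.m from Section 1 on two consecutive frames): the frame-to-frame residual is mostly near zero, which is what makes it so compressible.

% Inter-frame residual between two consecutive luminance frames (a sketch).
[y1,~,~] = extractyuv420('video3.yuv',1920,1080,1);
[y2,~,~] = extractyuv420('video3.yuv',1920,1080,2);
residual = double(y2) - double(y1);
% Fraction of pixels that barely changed; for static scenes this is close to 1.
still = nnz(abs(residual) < 4) / numel(residual)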

In the diagram below we summarize the changes that every frame undergoes during video compression:


The original image is divided into a set of square blocks, usually 8x8 pixels. The image data are transformed using the DCT to a new set of coefficients. The transform coefficients are quantized using a simple multiply-round-divide operation. Many of the quantized coefficients become zero in this operation, making the image well-suited for efficient lossless compression applied before storage or transmission. To reconstruct the image, the quantized coefficients are converted back by the inverse DCT, creating a new image that approximates the original. The error is the difference between the original and the reconstruction, and it consists mainly of high-frequency texture. (Reference: https://www.stanford.edu/group/vista/cgi-bin/FOV/chapter-8-multiresolution-image-representations/)
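A minimal sketch of this per-block pipeline (our own illustration, not H.264 itself; we assume the Image Processing Toolbox's blockproc, a single quantization step Q = 16, and y being a luminance frame extracted as in Section 1):

% 8x8 block DCT, quantize, dequantize and inverse DCT (illustrative only).
Q = 16;
quant   = @(block) round(dct2(block.data)/Q);   % transform + quantize
dequant = @(block) idct2(block.data*Q);         % rescale + inverse transform
coeffs = blockproc(double(y), [8 8], quant);    % most entries end up zero
recon  = blockproc(coeffs,    [8 8], dequant);
err = double(y) - recon;                        % mainly high-frequency texture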

Seeing the effects of these parameters in your own video is always instructive, so we created some clips where we could see the differences (the how is in Annex 1). We adjusted the encoding settings in different ways, which produced diverse outputs:

Producing streams with constant bitrate:

- High bitrate, low compression
- Low bitrate, high compression

Producing streams with variable bitrate:

To produce streams with variable bitrate through ffmpeg you need to set a constant quantization factor (q). Choosing a q-factor of 5 we created a low-compressed video with good quality. Then we changed the value to 40 in order to create a high-compressed video, and we easily noticed a huge difference in video quality.

In conclusion, we found that the lower the bitrate we chose, the lower the quality of the video. After trying several bitrate levels we decided that for Full HD resolution (1920x1080) the minimum bitrate that produced an acceptable image quality was 0.6 Mb/s.

3 VIDEO ANALYSIS

In this part of the work we had to change tools. We used Elecard StreamEye to open and analyse coded H.264 streams. The programme shows useful information for analysing the video, such as GOP structure, picture size, Motion Vectors (MV)...

- The frames that normally consume more space are the intra-coded (original) ones, and the frames that normally consume less space are the bi-directionally predicted ones.

- The motion vectors predict which way the motion is going. For each block, the encoder searches its surroundings in the previous frame, aiming to find the area that matches the block best. In this way the next frame can be predicted without losing quality in the video (a sketch follows this list).


- The 4x4 prediction is normally used in blocks at edges or where there is a lot of detail, in order to maintain as much quality as possible in the picture, whereas the 16x16 prediction is usually used in parts of the frame where no big change is expected or where the pixels in the zone are very similar.
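A hypothetical full-search sketch of how one motion vector could be found for a 16x16 block (our own illustration; the ±8 search range and the SAD cost are assumptions, not StreamEye's or H.264's exact procedure):

% Full-search block matching for the block at (r0,c0) in frame y2 against y1.
[y1,~,~] = extractyuv420('video3.yuv',1920,1080,1);
[y2,~,~] = extractyuv420('video3.yuv',1920,1080,2);
r0 = 513; c0 = 769; B = 16; range = 8;         % example block position
target = double(y2(r0:r0+B-1, c0:c0+B-1));     % block to predict in the new frame
best = inf; mv = [0 0];
for dr = -range:range
    for dc = -range:range
        cand = double(y1(r0+dr:r0+dr+B-1, c0+dc:c0+dc+B-1));
        sad = sum(abs(cand(:) - target(:)));   % sum of absolute differences
        if sad < best
            best = sad; mv = [dr dc];          % best motion vector so far
        end
    end
end
mv                                             % displacement of the best match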

Later we wrote a Matlab script to calculate the average PSNR of the luminance component of our videos (Annex 2). On the one hand, we observed that PSNR is not always a good measure of quality: the PSNR value didn't change much when we compared the constant-bitrate low- and high-compressed videos, whereas the quality difference was very obvious. On the other hand, PSNR was a good quality measure when we had to compare the variable-bitrate compressions (see the following table).

This is probably caused by the optimization algorithms of H.264: whenever you indicate a low constant bitrate it tries to do the best compression it can, which keeps the PSNR relatively high. However, when dealing with an imposed high q there is not much it can do to limit the damage.

Bitrate                                 Q-quantization coefficient     Average PSNR
Constant, low compression (5 Mb/s)      Automatically set by FFMPEG    28.1844
Constant, high compression (0.6 Mb/s)   Automatically set by FFMPEG    27.5440
Variable, high compression              40                             42.3248
Variable, low compression               5                              61.0033

4 ENTROPY CODING

Exercise 1: [worked solution included as an image in the original]


Exercise 2: [worked solution included as an image in the original]


4.1 HOMEMADE ENTROPY ENCODER

After some rest we had our minds structured, and soon we achieved a working (just working) script that goes from the 1080x1920 matrix, 1 byte per pixel, to a huge string of binary states (0,1) which encodes the information stored in the previous matrix using Huffman coding.

function [SYM,PROB,sig_a,DICT,sig_encoded] = encoder(filename,width,height,framenumber)

COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
sig_a = zeros(1,1080*1920);                  % preallocate the flattened signal
for i = 1:1:1080
    for j = 1:1:1920
        number = y(i,j);                     % luminance value, 0..255
        sig_a((i-1)*1920+j) = number;
        COUNT(number+1) = COUNT(number+1)+1; % +1 because Matlab arrays are 1-indexed
    end
end
PROB = COUNT/(1080*1920);                    % relative frequency of each symbol
SYM = [0:1:255];
DICT = huffmandict(SYM,PROB);                % Huffman codebook for this frame
sig_encoded = huffmanenco(sig_a,DICT);

For frame 50 of our video (framenumber = 50), sig_encoded had a total of 13428574 bits, way less than the original, which needed 1920*1080*8 = 16588800 bits to represent the luminance of the same frame.

Getting the original image back from the coded one is as simple as typing DECO = huffmandeco(sig_encoded, DICT), where DECO is the original signal again (because there is no loss).
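A quick round-trip check (a sketch using the variables returned by the encoder above):

% Decode and verify that the Huffman round trip is exact.
DECO = huffmandeco(sig_encoded, DICT);
isequal(DECO(:), sig_a(:))        % returns 1 (true): nothing was lost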

    To calculate the entropy and the average number of bits used per symbol we implemented some new

    scripts (entropy.m and average.m) that worked with the information we already had.


The results for this frame were: Entropy = 6.14083; Average = 6.4083.

With an efficiency (entropy divided by average bits per symbol) of 95.97%, we can undoubtedly say that we have a really good compression. But we had been missing something all along: the fact that we used a make-it-yourself dictionary which fitted the frame we were working on perfectly.

This process makes no sense for a versatile encoder. Another dictionary with the probabilities of each representable luminance (or colour), extracted from experience with different kinds of video environments, won't perform as well as ours did on our working frame, but it will get better results over a wider range of videos, and as it can be defined in the codec itself it wouldn't be necessary to send this information to the decoder.

H.264 uses one of a few tables (dictionaries) depending on the properties of the video. That way it keeps its flexibility while improving its accuracy.

Benchmarking against H.264 wasn't even in our expectations after seeing that it took almost 20 minutes to do the Huffman encoding of a single frame.

5 TRANSFORM AND QUANTIZATION

To get started with this part we began by coding the frame with a constant Q for every coefficient of the DCT:

y_0 = dct2(y);
y_q40 = round(y_0/40);
y_q5 = round(y_0/5);
y_display40 = idct2(y_q40*40);
imshow(y_display40,[0,255]);
figure;
y_display5 = idct2(y_q5*5);
imshow(y_display5,[0,255]);

Here y is the luminance of the second frame of video3.yuv.
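To put a number on the loss (our own sketch, not part of the original scripts), the two reconstructions can be compared against the original luminance using the same PSNR formula as in Annex 2:

% Reconstruction error for the coarse (Q=40) and fine (Q=5) quantization.
mse40 = mean((double(y(:)) - y_display40(:)).^2);
mse5  = mean((double(y(:)) - y_display5(:)).^2);
psnr40 = 20*log10(255/sqrt(mse40))    % low quality, high compression
psnr5  = 20*log10(255/sqrt(mse5))     % high quality, low compression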

The results were as expected:

- High quantization level [image]
- Low quantization level [image]

We may optimize the algorithm by doing a scaled quantization: the high frequencies (responsible for the details and not easily perceived by the human eye) are eliminated or heavily quantized, while the low frequencies are carefully quantized so as not to introduce losses in them. The pattern we followed can be seen below:


Scripts used: quantization.m and reverse_quantization.m (see Annex 4, Q).

The resulting image:


    6 CONTROLLING THE BACKLIGHT

For all the testing in this part we will also be using the second frame of the video:

    [y,u,v]=extractyuv420('video3.yuv',1920,1080,2);

    RGB = yuv2rgb(y,u,v);

    imshow(RGB);

    Original

When displaying an image on a screen we have to set some backlight value. By decreasing this value we can save energy and get better blacks (because a strongly backlit black looks grey). On the other hand, if we decrease the backlight too much we get a darker image, and this is not what we want.

Our goal is to reduce the backlight, especially in dark areas where it also improves the image quality, while keeping the display looking like the original image. We will begin by calculating a single backlight value for the whole picture.

    This can be done in four ways:

1 Maximum LED values:

In the RGB colour space three values determine the colour of each pixel. These values are scaled from 0 to 1 but are rarely pushed to their limits. In our case the maximum value in the second frame is 0.5847, found in the blue channel.

[y,u,v] = extractyuv420('video3.yuv',1920,1080,2);
RGB = yuv2rgb(y,u,v);
red = RGB(:,:,1);
green = RGB(:,:,2);
blue = RGB(:,:,3);
R = max(max(red));      % per-channel maxima over the whole frame
G = max(max(green));
B = max(max(blue));

We can take advantage of this headroom to scale up the LCD values, decrease the backlight and keep the same final result. Summing up, the new image for a backlight of 58.47% would be:

    RGB2 = RGB/0.5847;
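In general the backlight value is simply the largest of the three channel maxima. A sketch (our assumption: saveSIM2frame1Value is the same DTU helper used in the other methods):

BackLight1 = max([R G B]);    % 0.5847 for this frame (the blue channel)
RGB2 = RGB/BackLight1;        % boost the LCD values to compensate
outframe1 = saveSIM2frame1Value(255*RGB2, BackLight1, 'testing1');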


2 Maximum luminance of the image:

For this we will introduce a new colour space more suitable for the task, YCbCr. It is quite similar to YUV because it has one luminance (Y) and two chrominance components, but here they are all scaled from 0 to 1 instead of 0 to 255. Again we can push the maximum luminance to 1 and divide every other value by the same factor, so the backlight becomes this very same factor, less than 100%:

YCBCRMAP = rgb2ycbcr(RGB);
Y = YCBCRMAP(:,:,1);
Ymax = max(max(Y));
Ynew = Y/Ymax;
BackLight2 = Ymax;                 % the backlight is the old maximum
YCBCRMAP2 = Ynew;
YCBCRMAP2(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP2(:,:,3) = YCBCRMAP(:,:,3);
RGB3 = ycbcr2rgb(YCBCRMAP2);
outframe2 = saveSIM2frame1Value(255*RGB3, BackLight2, 'testing2');

    The resulting value for the backlight is 0.8273


3 Average luminance of the image:

Now we are scaling the average luminance up to 1. This will make some values (every one bigger than the average) larger than 1, which is not possible because the LCD can't generate more light. We clip every value higher than 1 to 1 and lose some information.

Ymean = Y/mean(mean(Y));
Ymean(Ymean>1) = 1;                % clip values the LCD cannot reach
BackLight3 = mean(mean(Y));
YCBCRMAP3 = Ymean;
YCBCRMAP3(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP3(:,:,3) = YCBCRMAP(:,:,3);
RGB4 = ycbcr2rgb(YCBCRMAP3);
outframe3 = saveSIM2frame1Value(255*RGB4, BackLight3, 'testing3');

    BackLight3 = 0.3759


4 Square root of the average luminance:

Repeat the steps in 3 but with the square root:

Yroot = Y/sqrt(mean(mean(Y)));
Yroot(Yroot>1) = 1;
BackLight4 = sqrt(mean(mean(Y)));
YCBCRMAP4 = Yroot;
YCBCRMAP4(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP4(:,:,3) = YCBCRMAP(:,:,3);
RGB5 = ycbcr2rgb(YCBCRMAP4);
outframe4 = saveSIM2frame1Value(255*RGB5, BackLight4, 'testing4');

    BackLight4 = 0.6131


The square-root version has better quality because there were fewer higher-than-1 values after the division. The backlights for the last two, for the average and the square root respectively, are:


    Combining the pictures with their backlights:


To actually see the differences between the last two images and the original one, we can subtract each from the original, square the difference and display the result as a grayscale picture.

MSE1 = (RGB - RGB4*BackLight3).^2;
Grey1 = max(MSE1,[],3);        % collapse the three colour channels into one plane
max(max(Grey1))                % 0.0482
imshow(Grey1, [0, 0.0482]);


MSE2 = (RGB - RGB5*BackLight4).^2;
Grey2 = max(MSE2,[],3);
max(max(Grey2))                % 0.0817
imshow(Grey2, [0, 0.0817]);

The higher maximum of Grey2 (square-root method) compared with Grey1 (average method) indicates that the largest single-pixel difference from the original is actually bigger for the square-root version. Even so, we can clearly see that the second difference image is overall darker than the first one, which means that on the whole the differences with the original are lower (zero difference displays as black).
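To back up "overall darker" with a single number (a sketch of our own; these values were not computed in the original report):

% Overall (mean) squared error for each backlight choice.
mean(MSE1(:))    % average-luminance backlight
mean(MSE2(:))    % square-root backlight; the overall-darker image should give the smaller value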

7 PROFESSIONAL STUFF (ALL-NIGHT LAUNCHING)

Once we had learnt the whole video processing chain (acquisition, compression and backlight dimming), we had to prepare our videos for display. We used some functions made at DTU to compare outputs created through different algorithms. We could use two different structures for the modelled backlight and diverse algorithms: full backlight, maximum luminance value, average luminance value, square-root of the average luminance value and the homemade algorithm.

The DTU Matlab algorithms could only work with uncompressed AVI format. Thus we had to change the format of the videos, and we created two different ones. The first was just an uncompressed AVI version of the original, and the second was a little trickier: we compressed the original with a constant bitrate of 0.6 Mb/s and then transformed it into an uncompressed AVI version in order to be able to work with it in Matlab. In this way we had two videos in the same format, the first with good quality and the second with a worse one.

Some post-processing calculations were required:

- Avg, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 9.626
- Avg, 2202 LEDs, previously high-compressed video: PSNR = 10.2525
- Bbgd, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 8.522

We can notice some improvement in that very low quality video when using precise backlight dimming.


ANNEX 1 FFMPEG

For the sake of simplicity we created several videos equal to the third original video (the one with more contrast) but with some particular differences in format and compression:

1 Original, reduced to 10 seconds, 250 frames, Full HD, uncompressed .avi

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec rawvideo -s 1920x1080 uncompressed.avi

2 Original, reduced to 10 seconds, 250 frames, Full HD, (uncompressed) .yuv

ffmpeg -i uncompressed.avi video3.yuv

3 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, low bitrate .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 0.6M lowbitrate.mp4

4 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, high bitrate .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 5M highbitrate.mp4

5 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, variable bitrate, low q .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmax 5 lowq.mp4

6 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, variable bitrate, high q .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmin 40 highq.mp4

7 Original, reduced to 10 seconds, 250 frames, Full HD, from the previously high-compressed video, uncompressed .avi

ffmpeg -i lowbitrate.mp4 -vcodec rawvideo uncompressedfromcompressed.avi

    NOTE: many more videos were created but those are the ones remaining and used in this report.


ANNEX 2 PSNR

function [PSNR] = PSNR(filename,width,height,filename_o)

blbl = zeros(1,250);                     % per-frame PSNR values
framenumber = 1;
while framenumber < 251
    [y,u,v] = extractyuv420(filename,width,height,framenumber);
    [y_o,u_o,v_o] = extractyuv420(filename_o,width,height,framenumber);
    lum2 = (y - y_o).^2;                 % squared luminance error
    count = 1080*1920;
    sumX = sum((sum(lum2))');
    mse = sumX/count;
    blbl(framenumber) = 20*log10(255/sqrt(mse));
    framenumber = framenumber + 1;
end
PSNR = mean(blbl);                       % average PSNR over the 250 frames
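A hypothetical call (the first filename is our assumption; the function expects two .yuv streams of the same size with at least 250 frames):

% Average luminance PSNR of a decoded stream against the original.
avg_psnr = PSNR('decoded.yuv',1920,1080,'video3.yuv');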

ANNEX 3 ENTROPY

function [average] = average(filename,width,height,framenumber,DICT,PROB)

average = 0;
for i = 1:1:256
    number = size(DICT{i,2});            % DICT{i,2} is the codeword for symbol i-1
    n_bits = number(2);                  % codeword length in bits
    average = average + PROB(i)*n_bits;  % expected bits per symbol
end


function [SYM, PROB, DICT, result] = entropy(filename,width,height,framenumber)

COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
for i = 1:1:1080
    for j = 1:1:1920
        number = y(i,j);
        COUNT(number+1) = COUNT(number+1)+1;   % +1 because Matlab arrays are 1-indexed
    end
end
PROB = COUNT/(1080*1920);
SYM = [0:1:255];
DICT = huffmandict(SYM,PROB);
result = 0;
for i = 1:1:256
    if (PROB(i) ~= 0)
        result = result - PROB(i)*log2(PROB(i));   % Shannon entropy in bits
    end
end

ANNEX 4 Q

function [Matrix] = quantization(filename,width,height,framenumber)

[y,u,v] = extractyuv420(filename,width,height,framenumber);
y_0 = dct2(y);
Matrix = zeros(height,width);
% Quantize the DCT coefficients more coarsely as the frequency grows,
% and drop the highest-frequency zone completely.
for i = 1:1:height/4
    for j = 1:1:width/4
        Matrix(i,j) = round(y_0(i,j)/5);     % low frequencies: fine step
    end
end
for i = height/4:1:height/2
    for j = width/4:1:width/2
        Matrix(i,j) = round(y_0(i,j)/10);
    end
end
for i = height/2:1:height*3/4
    for j = width/2:1:width*3/4
        Matrix(i,j) = round(y_0(i,j)/30);    % high frequencies: coarse step
    end
end
for i = height*3/4:1:height
    for j = width*3/4:1:width
        Matrix(i,j) = 0;                     % highest frequencies discarded
    end
end

function [Matrix_R] = reverse_quantization(Matrix,width,height)

Matrix_R = zeros(height,width);
for i = 1:1:height/4
    for j = 1:1:width/4
        Matrix_R(i,j) = Matrix(i,j)*5;       % undo the fine step
    end
end
for i = height/4:1:height/2
    for j = width/4:1:width/2
        Matrix_R(i,j) = Matrix(i,j)*10;
    end
end
for i = height/2:1:height*3/4
    for j = width/2:1:width*3/4
        Matrix_R(i,j) = Matrix(i,j)*30;
    end
end
Matrix_R = idct2(Matrix_R);                  % back to the pixel domain