7/27/2019 Coding Technologies for Video
Coding Technologies for Video: Acquisition, Compression and Display

Dunai Fuentes
María José Giner
Pablo Lanaspa
Zhiheng Xu
1 CAPTURE AND EDITING

We were told about our first group assignment, which mainly consisted of recording three videos. They were meant to highlight three different aspects: brightness, darkness and motion. For the bright video, we decided to shoot outside in the sunlight. Afterwards, one of us captured a high-motion video riding a bike around the DTU campus. For the dark one, since it had to include contrast with brightness, we used the lab room with the lights off and a few rays of sunlight let in past the dark curtains.
With this part finished, we were introduced to Avidemux, a free video editor designed for simple cutting, filtering and encoding tasks. After loading the clips, which had a resolution of 1440 x 1080 at 50 frames per second, we started editing them, applying filters and encoding them. We noticed that to encode .yuv videos with Avidemux it was necessary to apply a U/V colour swap as a filter. The next step was to compare our expectations about the sizes of the files we had just created. It was striking how heavy the .yuv clips (raw YUV) were compared to the ones encoded with the H.264 / MPEG-4 AVC (x264) codec. We also had to take into account which video players we were supposed to use for each coding type.
A .yuv file describes the colour of each pixel in the YUV colour space (Y for luminance, U and V for chrominance). Although it contains the values for all the pixels, it carries no information about resolution or frame rate, only which pixel comes first and which comes next. That is why we had to specify the resolution and fps of the video when trying to play it. For instance, if we had indicated a resolution of 1440 x 540, we would be shown the top half of the image first and, in the next frame, the bottom half.
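The wrong-resolution effect is easy to demonstrate: a raw stream carries no size metadata, so a reader simply chops the byte stream into frames of whatever dimensions it is told. A small NumPy sketch (a stand-in for our Matlab workflow, using a synthetic 4x4 luminance frame instead of real footage):

```python
import numpy as np

# A raw .yuv stream is just bytes: the player must be told width and height.
frame = np.arange(16, dtype=np.uint8).reshape(4, 4)
stream = frame.tobytes()

# Declaring half the true height splits each real frame into two "frames":
wrong = np.frombuffer(stream, dtype=np.uint8).reshape(-1, 2, 4)

print(np.array_equal(wrong[0], frame[:2]))  # True: first "frame" is the top half
print(np.array_equal(wrong[1], frame[2:]))  # True: next "frame" is the bottom half
```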
We were also introduced to FFmpeg, another cross-platform solution to convert and stream audio and video, used from the Windows command line. We mainly used it to convert a video file to H.264 or another coding type such as .yuv, and to change its frame count, frame size and bitrate. Its extended use is discussed in Annex 1.
Finally, using the given Matlab function extractyuv420.m, we could extract the Y, U and V components of one frame of the clip as large arrays representing every pixel in the frame. Position is indicated by the matrix indices, and colour by the Y, U and V values. There is a luminance value for every pixel, but in 4:2:0 sampling only a quarter of the pixels have their own U value, and the same holds for V.

To display the luminance part of the frame (using imshow()) we are only interested in Y. Therefore the command to use is imshow(y,[0,255]). As we are only representing the luminance, the image is greyscaled.
To calculate the mean luminance value we used the Matlab function mean, first on every column and then on the resulting row:

Mean_y = mean(mean(y));

What remains is the mean luminance of the frame; for the frame above, Mean_y = 62.6867. This tells us how bright or dark the image is on a scale from 0 to 255.

Displaying the image in colour with imshow() is easier if we have its RGB representation. The other given function, yuv2rgb.m, simply converts our selected frame to RGB:

RGB = yuv2rgb(y,u,v);
imshow(RGB);
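For readers without the DTU Matlab functions, a rough NumPy equivalent of extractyuv420.m plus the mean-luminance step might look like the sketch below. The planar 4:2:0 layout and the 1-based frame index are assumptions based on the description above, and the frame here is synthetic rather than read from video3.yuv:

```python
import numpy as np

def extract_yuv420(buf, width, height, frame_number):
    # Sketch of extractyuv420.m: planar YUV 4:2:0, 8 bits per sample.
    # Each frame occupies width*height*3/2 bytes; U and V are quarter-size.
    frame_size = width * height * 3 // 2
    offset = (frame_number - 1) * frame_size   # Matlab-style 1-based frame index
    data = np.frombuffer(buf, dtype=np.uint8, count=frame_size, offset=offset)
    y = data[:width * height].reshape(height, width)
    u = data[width * height:width * height * 5 // 4].reshape(height // 2, width // 2)
    v = data[width * height * 5 // 4:].reshape(height // 2, width // 2)
    return y, u, v

# Tiny synthetic 4x2 frame: 8 luma bytes + 2 U + 2 V = 12 bytes
w, h = 4, 2
buf = bytes(range(w * h * 3 // 2))
y, u, v = extract_yuv420(buf, w, h, 1)
mean_y = y.mean()              # the mean(mean(y)) of the Matlab version
print(mean_y)                  # 3.5 for luma values 0..7
```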
2 VIDEO COMPRESSION

First we describe the processing of digital video from capture to display, and then explain and illustrate H.264 coding.
Compression takes place in the camera while recording, but further processing can be executed on a computer. It is essential to understand the changes the video undergoes during compression so that we can minimise the losses incurred in the process. During video compression, a video stream is analysed and unnecessary parts of the data are discarded in order to make a large video file smaller. There are essentially two ways to compress data in a video file: intraframe and interframe.
Intraframe (I-frame) compression compresses each individual frame of the video on its own (similar to JPEG compression of a still image): every frame is treated as a still image. With intraframe compression the file is only moderately smaller, because every individual frame is still included in the newly compressed version.

Interframe compression looks at each frame in a video file, compares it to the previous frame and stores only the data that changed from frame to frame, so the file size is much smaller than with intraframe compression.
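A toy sketch of the interframe idea (Python/NumPy rather than the report's Matlab): storing only the difference against the previous frame leaves mostly zeros when little changes, and the decoder recovers the frame exactly from reference plus residual:

```python
import numpy as np

# Two consecutive "frames" that differ in a single pixel.
prev = np.zeros((8, 8), dtype=np.int16)
curr = prev.copy()
curr[3, 3] = 100

residual = curr - prev                 # what an inter-predicted frame would carry
changed = np.count_nonzero(residual)
print(changed, "of", residual.size)    # only 1 of 64 values needs to be stored

# The decoder rebuilds the frame losslessly from reference + residual.
rebuilt = prev + residual
print(np.array_equal(rebuilt, curr))   # True
```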
The diagram below summarises the changes that every frame undergoes during video compression:
The original image is divided into a set of square blocks, usually 8x8 pixels. The image data are transformed using the DCT into a new set of coefficients. The transform coefficients are quantized using a simple multiply-round-divide operation. Many of the quantized coefficients are zeroed by this operation, making the image well suited for the efficient lossless compression applied before storage or transmission. To reconstruct the image, the quantized coefficients are converted back by the inverse DCT, creating a new image that approximates the original. The error is the difference between the original and the reconstruction, and it consists mainly of high-frequency texture. (Reference: https://www.stanford.edu/group/vista/cgi-bin/FOV/chapter-8-multiresolution-image-representations/)
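The multiply-round-divide pipeline can be sketched in a few lines of NumPy. The orthonormal DCT-II matrix is built by hand so the example is self-contained; the 8x8 block is synthetic, and the step size of 40 mirrors a value used later in the report:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: coeffs = C @ block @ C.T,
    # inverse = C.T @ coeffs @ C (C is orthogonal).
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

C = dct_matrix(8)
block = np.outer(np.linspace(0, 255, 8), np.ones(8))  # smooth synthetic 8x8 block

coeffs = C @ block @ C.T              # forward DCT
q = 40                                # quantizer step
quantized = np.round(coeffs / q)      # the multiply-round-divide step: many zeros
restored = C.T @ (quantized * q) @ C  # inverse DCT of dequantized coefficients

print(np.count_nonzero(quantized), "of 64 coefficients survive quantization")
print(np.abs(block - restored).max())  # small reconstruction error
```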
Seeing the effects of these parameters in your own video is always instructive, so we created some clips where we could see the differences (the how is in Annex 1). We adjusted the encoding settings in different ways, which produced diverse outputs:

Producing streams with constant bitrate:
- High bitrate, low compression
- Low bitrate, high compression

Producing streams with variable bitrate:
To produce streams with variable bitrate through ffmpeg you set a constant quantization factor q. Choosing a q-factor of 5, we created a lightly compressed video with good quality. We then changed the value to 40 to create a highly compressed video, and the difference in video quality was immediately obvious.

In conclusion, we found that the lower the bitrate we chose, the lower the quality of the video. After trying several bitrate levels, we decided that for Full HD resolution (1920x1080) the minimum bitrate that produced an acceptable image quality was 0.6 Mb/s.
3 VIDEO ANALYSIS

In this part of the work we had to change tools. We used Elecard StreamEye to open and analyse coded H.264 streams. The programme shows information useful for analysing the video, such as GOP structure, picture size and motion vectors (MV).

- The frames that normally consume the most space are the intra-coded (I) frames, and the frames that normally consume the least are the bi-directionally predicted (B) frames.
- The motion vectors describe where the motion is going. The encoder searches, block by block, the area around each block for the best-matching block in a reference frame, so that the next frame can be predicted without losing quality in the video.
- 16x16 block prediction is normally used in parts of the frame where little change is expected or where the pixels in the area are very similar, whereas 4x4 prediction is usually used near edges or where there is a lot of detail, in order to preserve as much picture quality as possible.
Later we wrote a Matlab script to calculate the average PSNR of the luminance component of our videos (Annex 2). On the one hand, we observed that PSNR is not always a good measure of quality: the PSNR value barely changed when we compared the constant-bitrate low- and high-compression videos, whereas the quality difference was very obvious. On the other hand, PSNR was a good quality measure when we compared the variable-bitrate compressions (see the table below).

This is probably caused by the optimisation algorithms of H.264. Whenever we indicated a low constant bitrate, it tried to make the best of the compression, which kept the PSNR relatively high. However, when an artificially high q was imposed, there was not much it could do to limit the damage.
Bitrate                                   Q-quantization coefficient     Average PSNR
Constant, low compression (5 Mb/s)        Automatically set by FFmpeg    28.1844
Constant, high compression (0.6 Mb/s)     Automatically set by FFmpeg    27.5440
Variable, high compression                40                             42.3248
Variable, low compression                 5                              61.0033
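The PSNR of a luminance frame, as computed by the Annex 2 script, reduces to 20·log10(255/√MSE). A NumPy sketch with synthetic frames instead of the real videos (a uniform error of 1 grey level gives MSE = 1):

```python
import numpy as np

def psnr(y_ref, y_test, peak=255.0):
    # PSNR of one 8-bit luminance frame, as in the report's Annex 2 script.
    diff = y_ref.astype(np.float64) - y_test.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 20 * np.log10(peak / np.sqrt(mse))

ref = np.full((1080, 1920), 128, dtype=np.uint8)
test = ref + 1                       # every pixel off by one grey level
print(round(psnr(ref, test), 2))     # 48.13 dB; identical frames would give infinity
```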
4 ENTROPY CODING

Exercise 1:
Exercise 2:
4.1 HOMEMADE ENTROPY ENCODER

After some rest we had our ideas structured, and soon we had a working (just working) script that goes from the 1080x1920 matrix, 1 byte per pixel, to a huge string of binary states (0,1) encoding the information stored in the matrix using Huffman coding.

function [SYM,PROB,sig_a,DICT,sig_encoded] = encoder(filename,width,height,framenumber)
COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
sig_a = zeros(1,1080*1920);
for i = 1:1080
    for j = 1:1920
        number = y(i,j);
        sig_a((i-1)*1920+j) = number;
        COUNT(number+1) = COUNT(number+1)+1;   % Matlab indices start at 1, luma at 0
    end
end
PROB = COUNT/(1080*1920);
SYM = 0:255;
DICT = huffmandict(SYM,PROB);
sig_encoded = huffmanenco(sig_a,DICT);
For the fiftieth frame of our video (framenumber = 50), sig_encoded had a total of 13,428,574 bits, far fewer than the original, which needed 1920*1080*8 = 16,588,800 bits to represent the luminance of the same frame.

Getting the original image back from the coded one is as simple as typing DECO = huffmandeco(sig_encoded, DICT), where DECO is the original image again (because the coding is lossless).

To calculate the entropy and the average number of bits used per symbol, we implemented two more scripts (entropy.m and average.m) that worked with the information we already had.
The results for this frame were: Entropy = 6.14083 ; Average = 6.4083.

With an efficiency of roughly 96% we can say we achieved a really good compression. But we had been missing something all along: we used a do-it-yourself dictionary that fitted perfectly the frame we were working on.
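The two quantities compare like this in a small Python sketch, with a hand-rolled Huffman length calculation standing in for Matlab's huffmandict and a toy 4-symbol source instead of a real luminance histogram:

```python
import heapq
import numpy as np

def huffman_lengths(prob):
    # Code length per symbol of a Huffman code over `prob` (enough to get
    # the average rate; a sketch of what Matlab's huffmandict provides).
    heap = [(p, [s]) for s, p in enumerate(prob) if p > 0]
    heapq.heapify(heap)
    length = np.zeros(len(prob))
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            length[s] += 1          # each merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return length

prob = np.array([0.5, 0.25, 0.125, 0.125])   # skewed toy source
entropy = -np.sum(prob * np.log2(prob))
average = np.sum(prob * huffman_lengths(prob))
print(entropy, average)   # both 1.75: Huffman is optimal for dyadic probabilities
```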
This process makes no sense for a versatile encoder. A dictionary built from the probabilities of each representable luminance (or colour) value, gathered from experience with many kinds of video content, won't perform as well as ours did on our particular frame, but it will get better results over a wider range of videos, and since it can be defined in the codec itself, it is not necessary to send this information to the decoder.

H.264 uses one of a few tables (dictionaries) depending on the properties of the video. That way it keeps its flexibility while improving its accuracy.
Benchmarking against H.264 was not even within our expectations after seeing that the Huffman encoding of a single frame took almost 20 minutes.
5 TRANSFORM AND QUANTIZATION

To get started with this part we began by coding the frame with a constant Q for every coefficient of the DCT:

y_0 = dct2(y);
y_q40 = round(y_0/40);
y_q5 = round(y_0/5);
y_display40 = idct2(y_q40*40);
imshow(y_display40,[0,255]);
figure;
y_display5 = idct2(y_q5*5);
imshow(y_display5,[0,255]);

where y is the luminance of the second frame of video3.yuv. The results were as expected:
- High quantization level
- Low quantization level
We can optimise the algorithm with a scaled quantization: the high frequencies (responsible for fine detail and not easily perceived by the human eye) are eliminated or heavily quantized, while the low frequencies are quantized gently so as not to introduce losses there. The pattern we followed is shown below:
Scripts used: quantization.m and reverse_quantization.m (see Annex 4: Q)
The resulting image:
6 CONTROLLING THE BACKLIGHT
For all the testing in this part we will also be using the second frame of the video:
[y,u,v]=extractyuv420('video3.yuv',1920,1080,2);
RGB = yuv2rgb(y,u,v);
imshow(RGB);
Original
When displaying an image on a screen we have to set some backlight value. By decreasing this value we can save energy and get better blacks (a strongly backlit black looks grey). On the other hand, if we decrease the backlight too much we get a darker image, which is not what we want.

Our goal is to reduce the backlight, especially in dark scenes where it also improves image quality, while keeping the displayed image as close to the original as possible. We will begin by calculating a single backlight value for the whole picture. This can be done in four ways:
1 Maximum LED value:

In RGB colour space, three values determine the colour of each pixel. These values are scaled from 0 to 1 but are rarely pushed to their limits. In our case, the maximum value in the second frame is 0.5847, on the blue channel.
[y,u,v]=extractyuv420('video3.yuv',1920,1080,2);
RGB = yuv2rgb(y,u,v);
red=RGB(:,:,1);
green=RGB(:,:,2);
blue=RGB(:,:,3);
R=max(max(red));
G=max(max(green));
B=max(max(blue));
We can take advantage of this headroom to increase the pixel values, decrease the backlight and keep the same final displayed result. Summing up, the new image for a backlight of 58.47% would be:

RGB2 = RGB/0.5847;
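A NumPy sketch of method 1, with a random stand-in frame instead of video3.yuv: dividing the pixel values by the frame maximum and using that maximum as the backlight leaves the displayed product unchanged:

```python
import numpy as np

# Stand-in frame: RGB channels in [0, 0.6], like the report's 0.5847 maximum.
rng = np.random.default_rng(0)
rgb = rng.uniform(0.0, 0.6, size=(4, 4, 3))

backlight = rgb.max()           # the largest R, G or B value in the frame
rgb_scaled = rgb / backlight    # LCD pixel values pushed up to compensate

# Displayed image = pixel value x backlight, which is unchanged:
print(np.allclose(rgb_scaled * backlight, rgb))   # True
print(rgb_scaled.max())                            # 1.0: full LCD range is used
```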
2 Maximum luminance of the image:

For this we introduce a new colour space more suitable for the task, YCbCr. It is quite similar to YUV in that it has a luminance (Y) and two chrominance components, but here they are all scaled from 0 to 1 instead of 0 to 255.

Again we can push the maximum value to 1 and divide every other value by the same factor, so the backlight becomes this very factor, below 100%:

YCBCRMAP = rgb2ycbcr(RGB);
Y = YCBCRMAP(:,:,1);
BackLight2 = max(max(Y));
Ynew = Y/BackLight2;
YCBCRMAP2 = YCBCRMAP;
YCBCRMAP2(:,:,1) = Ynew;
RGB3 = ycbcr2rgb(YCBCRMAP2);
outframe2 = saveSIM2frame1Value(255*RGB3, BackLight2, 'testing2');

The resulting backlight value is 0.8273.
3 Average luminance of the image:

Now we scale the average luminance to 1. This makes every value above the average larger than 1, which is not possible because the LCD cannot pass more light, so we clip every value higher than 1 to 1 and lose some information.

BackLight3 = mean(mean(Y));
Ymean = Y/BackLight3;
Ymean(Ymean>1) = 1;
YCBCRMAP3 = YCBCRMAP;
YCBCRMAP3(:,:,1) = Ymean;
RGB4 = ycbcr2rgb(YCBCRMAP3);
outframe3 = saveSIM2frame1Value(255*RGB4, BackLight3, 'testing3');

BackLight3 = 0.3759
4 Square root of the average luminance:

Repeat the steps in 3 but dividing by the square root of the average:

BackLight4 = sqrt(mean(mean(Y)));
Yroot = Y/BackLight4;
Yroot(Yroot>1) = 1;
YCBCRMAP4 = YCBCRMAP;
YCBCRMAP4(:,:,1) = Yroot;
RGB5 = ycbcr2rgb(YCBCRMAP4);
outframe4 = saveSIM2frame1Value(255*RGB5, BackLight4, 'testing4');

BackLight4 = 0.6131
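Methods 3 and 4 can be contrasted in a short NumPy sketch (random stand-in luminance in [0,1]; the clipping step is the round-to-1 described above):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.uniform(0.0, 1.0, size=(100, 100))    # stand-in luminance plane

bl_avg = Y.mean()                              # method 3 backlight
Y_avg = np.minimum(Y / bl_avg, 1.0)            # many values clip: information lost

bl_sqrt = np.sqrt(Y.mean())                    # method 4 backlight
Y_sqrt = np.minimum(Y / bl_sqrt, 1.0)          # milder boost, fewer clipped pixels

clipped_avg = np.count_nonzero(Y / bl_avg > 1.0)
clipped_sqrt = np.count_nonzero(Y / bl_sqrt > 1.0)
print(clipped_sqrt < clipped_avg)   # True: the square root clips fewer pixels
print(bl_sqrt > bl_avg)             # True, since the mean luminance is below 1
```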
The square-root version has better quality because there were fewer higher-than-1 values after dividing. The backlights for the last two methods are, for the average and the square root respectively:
Combining the pictures with their backlights:
To actually see the differences between the last two images and the original, we can subtract each from the original, square the difference and display the result as a greyscale picture.

MSE1 = (RGB - RGB4*BackLight3).^2;
Grey1 = max(MSE1,[],3);
max(max(Grey1))   % 0.0482
imshow(Grey1, [0 , 0.0482]);
MSE2 = (RGB - RGB5*BackLight4).^2;
Grey2 = max(MSE2,[],3);
max(max(Grey2))   % 0.0817
imshow(Grey2, [0 , 0.0817]);

The higher maximum of Grey2 (square root of the average) versus Grey1 (average) means the worst-case difference from the original is larger for the square-root method. Even so, we can clearly see that the second image is overall darker than the first, which means the differences from the original are lower on average (0 difference = black).
7 PROFESSIONAL STUFF (ALL-NIGHT LAUNCHING)

Once we had learnt the whole video processing chain (acquisition, compression and backlight dimming), we had to prepare our videos for display. We used some functions made at DTU to compare outputs created through different algorithms. We could use two different structures for the modelled backlight and several algorithms: full backlight, maximum luminance value, average luminance value, square root of the average luminance value and our homemade algorithm.

The DTU Matlab functions could only work with uncompressed AVI, so we had to change the format of the videos, and we created two different ones. The first was simply an uncompressed AVI version of the original; the second was a little trickier. We compressed the original
with a constant bitrate of 0.6 Mb/s and then converted it to an uncompressed AVI version so that we could work with it in Matlab. This way we had two videos in the same format: the first with good quality and the second with worse.

Some post-processing calculations were required:

Avg, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 9.626
Avg, 2202 LEDs, previously high-compressed video: PSNR = 10.2525
Bbgd, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 8.522

We can notice some improvement in that very low-quality video when using precise backlight dimming.
ANNEX 1: FFMPEG

For the sake of simplicity we created several videos equal to the third original video (the one with the most contrast) but with particular differences in format and compression:

1 Original, reduced to 10 seconds, 250 frames, Full HD, uncompressed .avi
ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec rawvideo -s 1920x1080 uncompressed.avi

2 Original, reduced to 10 seconds, 250 frames, Full HD, (uncompressed) .yuv
ffmpeg -i uncompressed.avi video3.yuv

3 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 / MPEG-4 AVC codec, low bitrate .mp4
ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 0.6M lowbitrate.mp4

4 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 / MPEG-4 AVC codec, high bitrate .mp4
ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 5M highbitrate.mp4

5 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 / MPEG-4 AVC codec, variable bitrate, low q .mp4
ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmax 5 lowq.mp4

6 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 / MPEG-4 AVC codec, variable bitrate, high q .mp4
ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmin 40 highq.mp4

7 Original, reduced to 10 seconds, 250 frames, Full HD, from the previously high-compressed video, uncompressed .avi
ffmpeg -i lowbitrate.mp4 -vcodec rawvideo uncompressedfromcompressed.avi

NOTE: many more videos were created, but these are the ones that remain and that are used in this report.
ANNEX 2: PSNR

function [PSNR] = PSNR(filename,width,height,filename_o)
blbl = zeros(1,250);
framenumber = 1;
while framenumber < 251
    [y,u,v] = extractyuv420(filename,width,height,framenumber);
    [y_o,u_o,v_o] = extractyuv420(filename_o,width,height,framenumber);
    lum2 = (y - y_o).^2;
    count = 1080*1920;
    sumX = sum(sum(lum2));
    mse = sumX/count;
    blbl(framenumber) = 20*log10(255/sqrt(mse));
    framenumber = framenumber + 1;
end
PSNR = mean(blbl);
ANNEX 3: ENTROPY

function [average] = average(filename,width,height,framenumber,DICT,PROB)
average = 0;
for i = 1:256
    number = size(DICT{i,2});
    n_bits = number(2);
    average = average + PROB(i)*n_bits;
end
function [SYM, PROB, DICT, result] = entropy(filename,width,height,framenumber)
COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
for i = 1:1080
    for j = 1:1920
        number = y(i,j);
        COUNT(number+1) = COUNT(number+1)+1;   % Matlab indices start at 1, luma at 0
    end
end
PROB = COUNT/(1080*1920);
SYM = 0:255;
DICT = huffmandict(SYM,PROB);
result = 0;
for i = 1:256
    if (PROB(i) ~= 0)
        result = result - PROB(i)*log2(PROB(i));
    end
end
ANNEX 4: Q

function [Matrix] = quantization(filename,width,height,framenumber)
[y,u,v] = extractyuv420(filename,width,height,framenumber);
y_0 = dct2(y);
Matrix = zeros(height,width);
for i = 1:height/4
    for j = 1:width/4
        Matrix(i,j) = round(y_0(i,j)/5);
    end
end
for i = height/4:height/2
    for j = width/4:width/2
        Matrix(i,j) = round(y_0(i,j)/10);
    end
end
for i = height/2:height*3/4
    for j = width/2:width*3/4
        Matrix(i,j) = round(y_0(i,j)/30);
    end
end
for i = height*3/4:height
    for j = width*3/4:width
        Matrix(i,j) = 0;
    end
end
function [Matrix_R] = reverse_quantization(Matrix,width,height)
Matrix_R = zeros(height,width);
for i = 1:height/4
    for j = 1:width/4
        Matrix_R(i,j) = Matrix(i,j)*5;
    end
end
for i = height/4:height/2
    for j = width/4:width/2
        Matrix_R(i,j) = Matrix(i,j)*10;
    end
end
for i = height/2:height*3/4
    for j = width/2:width*3/4
        Matrix_R(i,j) = Matrix(i,j)*30;
    end
end
Matrix_R = idct2(Matrix_R);