
Coding Technologies for Video: acquisition, compression and display

Dunai Fuentes
Maria José Giner
Pablo Lanaspa
Zhiheng Xu
1 CAPTURE AND EDITING

We were told about our first group work, which mainly consisted of recording three videos. They were meant to highlight three different aspects: brightness, darkness and motion. Regarding the bright video, we decided to capture it outside in the sunlight. Afterwards one of us captured a high-motion video riding a bike around DTU's site. Concerning the dark one, as we had to include contrast with brightness in it, we decided to use the lab room with the lights off and a few rays of sunshine enclosed by the dark curtains.

Having finished this part, we were introduced to Avidemux, a free video editor designed for simple cutting, filtering and encoding tasks. After loading the clips, which had 1440 x 1080 resolution and 50 frames per second, we started editing them, applying filters and encoding them. We noticed that for coding .yuv videos with Avidemux it was necessary to apply a UV colour swap as a filter. The next step was to compare our expectations against the file sizes we had just created. It was shocking how heavy the .yuv clips (with the YUV codec) were compared to the ones encoded with the H.264 MPEG-4 AVC (x264) codec. We also needed to take into account which video players we were supposed to use for each coding type.

.yuv files describe the colour of each pixel in the YUV colour space (Y for the luminance, U and V for the chrominance). Although the file contains the appropriate values for the required number of pixels, it carries no information about resolution or frame rate, just which pixel comes first and which comes next. That is why we had to specify the resolution and fps of the video when trying to play it. For instance, if we had indicated a resolution of 1440 x 540 we would be shown the top half of the image first and, in the next frame, the bottom half.
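As a quick sanity check (a sketch of our own; we assume 4:2:0 chroma subsampling with 1 byte per sample, which matches the extractyuv420.m function used later), the size of a raw clip follows directly from the format:

% Raw YUV 4:2:0: a full-resolution Y plane plus quarter-resolution U and V planes.
width  = 1440; height = 1080;
fps    = 50;   seconds = 10;
bytes_per_frame = width*height*1.5;        % 1 (Y) + 0.25 (U) + 0.25 (V) bytes per pixel
clip_bytes = bytes_per_frame*fps*seconds   % ~1.17 GB for a 10 s clip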

We were also introduced to another cross-platform solution to convert and stream audio and video, FFMPEG, which had to be used from the Windows command line. We mainly used it to convert a video file to H.264 or any other coding type such as .yuv, and to change its frame count, frame size and bitrate. Its extended use is discussed in Annex 1.

Finally, using the given Matlab function extractyuv420.m, we could extract the Y, U and V components of one frame of the clip as big arrays representing every pixel in the frame. Position is indicated by the matrix indexes, and colour properties by the Y, U and V values. There is a luminance value for every pixel, but only a quarter of the total amount of pixels has its own U value, and the same happens with the V value (4:2:0 subsampling).

To display the luminance part of the frame (using imshow();) we are only interested in Y. Therefore the command to be used is imshow(y,[0,255]). As we are only representing the luminance, the image is grayscale.


For calculating the mean luminance value we just used the Matlab function mean, first to calculate the mean of every column and afterwards the mean of the resulting row:

Mean_y = mean(mean(y));

What remains is a single value with the mean luminance of the frame. For the frame above, Mean_y = 62.6867. Knowing this feature tells us how bright or dark the image is on a scale from 0 to 255.

Displaying the image in colour with imshow(); is easier if we have the RGB colour space representation of it. The other given function, yuv2rgb.m, simply converts our selected frame to RGB:

RGB = yuv2rgb(y,u,v);
imshow(RGB);

2 VIDEO COMPRESSION

Firstly we are going to describe the processing of digital video from capture to display, and then explain and illustrate H.264 coding.


Compression takes place in the camera while recording the video, but further processing can be executed on a computer. It is essential to understand the changes the video suffers while we compress it, so that we can minimize the losses incurred during the process. During video compression, a video stream is analyzed and unnecessary parts of the data are discarded in order to make a large video file smaller. There are essentially two ways to compress data in a video file: intraframe and interframe.

Intraframe (I-frame) compression compresses each individual frame of the video (similar to JPEG compression of a still image). Every frame is treated as a still image. With intraframe compression the complete frame is only slightly compressed, so the file size isn't that much smaller, because each individual frame is included in the newly-compressed version.

Interframe compression looks at each frame in a video file, compares it to the previous frame and stores only the data that changed from frame to frame, so the file size is much smaller than with intraframe compression.
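A minimal sketch of the idea (our own illustration, reusing extractyuv420.m from Section 1 on two consecutive frames): the frame-to-frame residual is mostly near zero, which is what makes it so compressible.

% Inter-frame residual between two consecutive luminance frames (a sketch).
[y1,~,~] = extractyuv420('video3.yuv',1920,1080,1);
[y2,~,~] = extractyuv420('video3.yuv',1920,1080,2);
residual = double(y2) - double(y1);
% Fraction of pixels that barely changed; for static scenes this is close to 1.
still = nnz(abs(residual) < 4) / numel(residual)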

In the diagram below we summarize the changes that every frame undergoes during video compression:


The original image is divided into a set of square blocks, usually 8x8 pixels. The image data are transformed using the DCT to a new set of coefficients. The transform coefficients are quantized using a simple multiply-round-divide operation. Many of the quantized coefficients become zero in this operation, making the image well-suited for efficient lossless compression applied before storage or transmission. To reconstruct the image, the quantized coefficients are converted back by the inverse DCT, creating a new image that approximates the original. The error is the difference between the original and the reconstruction, and it consists mainly of high-frequency texture. (Reference: https://www.stanford.edu/group/vista/cgi-bin/FOV/chapter-8-multiresolution-image-representations/)
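A minimal sketch of this per-block pipeline (our own illustration, not H.264 itself; we assume the Image Processing Toolbox's blockproc, a single quantization step Q = 16, and y being a luminance frame extracted as in Section 1):

% 8x8 block DCT, quantize, dequantize and inverse DCT (illustrative only).
Q = 16;
quant   = @(block) round(dct2(block.data)/Q);   % transform + quantize
dequant = @(block) idct2(block.data*Q);         % rescale + inverse transform
coeffs = blockproc(double(y), [8 8], quant);    % most entries end up zero
recon  = blockproc(coeffs,    [8 8], dequant);
err = double(y) - recon;                        % mainly high-frequency texture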

Seeing the effects of these parameters in your own video is always instructive, so we created some clips where we could see the differences (the how is in Annex 1). We adjusted the encoding settings in different ways, which produced diverse outputs:

Producing streams with constant bitrate:

- High bitrate, low compression
- Low bitrate, high compression

Producing streams with variable bitrate:

To produce streams with variable bitrate through ffmpeg you need to set a constant quantization factor (q). Choosing a q-factor of 5 we created a low-compressed video with good quality. Then we changed the value to 40 in order to create a high-compressed video, and we easily noticed a huge difference in video quality.

In conclusion, we found that the lower the bitrate we chose, the lower the quality of the video. After trying several bitrate levels we decided that for Full HD resolution (1920x1080) the minimum bitrate that produced an acceptable image quality was 0.6 Mb/s.

3 VIDEO ANALYSIS

In this part of the work we had to change tools. We used Elecard StreamEye to open and analyse coded H.264 streams. The programme shows useful information for analysing the video, such as GOP structure, picture size, Motion Vectors (MV)...

- The frames that normally consume more space are the intra-coded (original) ones, and the frames that normally consume less space are the bi-directionally predicted ones.

- The motion vectors predict which way the motion is going. For each block, the encoder searches its surroundings in the previous frame, aiming to find the area that matches the block best. In this way the next frame can be predicted without losing quality in the video (a sketch follows this list).


- The 4x4 prediction is normally used in blocks at edges or where there is a lot of detail, in order to maintain as much quality as possible in the picture, whereas the 16x16 prediction is usually used in parts of the frame where no big change is expected or where the pixels in the zone are very similar.
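A hypothetical full-search sketch of how one motion vector could be found for a 16x16 block (our own illustration; the ±8 search range and the SAD cost are assumptions, not StreamEye's or H.264's exact procedure):

% Full-search block matching for the block at (r0,c0) in frame y2 against y1.
[y1,~,~] = extractyuv420('video3.yuv',1920,1080,1);
[y2,~,~] = extractyuv420('video3.yuv',1920,1080,2);
r0 = 513; c0 = 769; B = 16; range = 8;         % example block position
target = double(y2(r0:r0+B-1, c0:c0+B-1));     % block to predict in the new frame
best = inf; mv = [0 0];
for dr = -range:range
    for dc = -range:range
        cand = double(y1(r0+dr:r0+dr+B-1, c0+dc:c0+dc+B-1));
        sad = sum(abs(cand(:) - target(:)));   % sum of absolute differences
        if sad < best
            best = sad; mv = [dr dc];          % best motion vector so far
        end
    end
end
mv                                             % displacement of the best match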

Later we wrote a Matlab script to calculate the average PSNR of the luminance component of our videos (Annex 2). On the one hand, we observed that PSNR is not always a good measure of quality: the PSNR value didn't change much when we compared the constant-bitrate low- and high-compressed videos, whereas the quality difference was very obvious. On the other hand, PSNR was a good quality measure when we had to compare the variable-bitrate compressions (see the following table).

This is probably caused by the optimization algorithms of H.264: whenever you indicate a low constant bitrate it tries to do the best compression it can, which keeps the PSNR relatively high. However, when dealing with an imposed high q there is not much it can do to limit the damage.

Bitrate                                 Q-quantization coefficient     Average PSNR
Constant, low compression (5 Mb/s)      Automatically set by FFMPEG    28.1844
Constant, high compression (0.6 Mb/s)   Automatically set by FFMPEG    27.5440
Variable, high compression              40                             42.3248
Variable, low compression               5                              61.0033

4 ENTROPY CODING

Exercise 1: [worked solution included as an image in the original]


Exercise 2: [worked solution included as an image in the original]


4.1 HOMEMADE ENTROPY ENCODER

After some rest we had our minds structured, and soon we achieved a working (just working) script that goes from the 1080x1920 matrix, 1 byte per pixel, to a huge string of binary states (0,1) which encodes the information stored in the previous matrix using Huffman coding.

function [SYM,PROB,sig_a,DICT,sig_encoded] = encoder(filename,width,height,framenumber)

COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
sig_a = zeros(1,1080*1920);                  % preallocate the flattened signal
for i = 1:1:1080
    for j = 1:1:1920
        number = y(i,j);                     % luminance value, 0..255
        sig_a((i-1)*1920+j) = number;
        COUNT(number+1) = COUNT(number+1)+1; % +1 because Matlab arrays are 1-indexed
    end
end
PROB = COUNT/(1080*1920);                    % relative frequency of each symbol
SYM = [0:1:255];
DICT = huffmandict(SYM,PROB);                % Huffman codebook for this frame
sig_encoded = huffmanenco(sig_a,DICT);

For frame 50 of our video (framenumber = 50), sig_encoded had a total of 13428574 bits, way less than the original, which needed 1920*1080*8 = 16588800 bits to represent the luminance of the same frame.

Getting the original image back from the coded one is as simple as typing DECO = huffmandeco(sig_encoded, DICT), where DECO is the original signal again (because there is no loss).
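A quick round-trip check (a sketch using the variables returned by the encoder above):

% Decode and verify that the Huffman round trip is exact.
DECO = huffmandeco(sig_encoded, DICT);
isequal(DECO(:), sig_a(:))        % returns 1 (true): nothing was lost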

    To calculate the entropy and the average number of bits used per symbol we implemented some new

    scripts (entropy.m and average.m) that worked with the information we already had.


The results for this frame were: Entropy = 6.14083; Average = 6.4083.

With an efficiency (entropy divided by average bits per symbol) of 95.97%, we can undoubtedly say that we have a really good compression. But we had been missing something all along: the fact that we used a make-it-yourself dictionary which fitted the frame we were working on perfectly.

This process makes no sense for a versatile encoder. Another dictionary with the probabilities of each representable luminance (or colour), extracted from experience with different kinds of video environments, won't perform as well as ours did on our working frame, but it will get better results over a wider range of videos, and as it can be defined in the codec itself it wouldn't be necessary to send this information to the decoder.

H.264 uses one of a few tables (dictionaries) depending on the properties of the video. That way it keeps its flexibility while improving its accuracy.

Benchmarking against H.264 wasn't even in our expectations after seeing that it took almost 20 minutes to do the Huffman encoding of a single frame.

5 TRANSFORM AND QUANTIZATION

To get started with this part we began by coding the frame with a constant Q for every coefficient of the DCT:

y_0 = dct2(y);
y_q40 = round(y_0/40);
y_q5 = round(y_0/5);
y_display40 = idct2(y_q40*40);
imshow(y_display40,[0,255]);
figure;
y_display5 = idct2(y_q5*5);
imshow(y_display5,[0,255]);

Here y is the luminance of the second frame of video3.yuv.
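To put a number on the loss (our own sketch, not part of the original scripts), the two reconstructions can be compared against the original luminance using the same PSNR formula as in Annex 2:

% Reconstruction error for the coarse (Q=40) and fine (Q=5) quantization.
mse40 = mean((double(y(:)) - y_display40(:)).^2);
mse5  = mean((double(y(:)) - y_display5(:)).^2);
psnr40 = 20*log10(255/sqrt(mse40))    % low quality, high compression
psnr5  = 20*log10(255/sqrt(mse5))     % high quality, low compression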

The results were as expected:

- High quantization level [image]
- Low quantization level [image]

We may optimize the algorithm by doing a scaled quantization: the high frequencies (responsible for the details and not easily perceived by the human eye) are eliminated or heavily quantized, while the low frequencies are carefully quantized so as not to introduce losses in them. The pattern we followed can be seen below:


Scripts used: quantization.m and reverse_quantization.m (see Annex 4, Q).

The resulting image:


    6 CONTROLLING THE BACKLIGHT

For all the testing in this part we will also be using the second frame of the video:

    [y,u,v]=extractyuv420('video3.yuv',1920,1080,2);

    RGB = yuv2rgb(y,u,v);

    imshow(RGB);

    Original

When displaying an image on a screen we have to set some backlight value. By decreasing this value we can save energy and get better blacks (because a strongly backlit black looks grey). On the other hand, if we decrease the backlight too much we get a darker image, and this is not what we want.

Our goal is to reduce the backlight, especially in dark areas where it also improves the image quality, while keeping the display looking like the original image. We will begin by calculating a single backlight value for the whole picture.

    This can be done in four ways:

1 Maximum LED values:

In the RGB colour space three values determine the colour of each pixel. These values are scaled from 0 to 1 but are rarely pushed to their limits. In our case the maximum value in the second frame is 0.5847, found in the blue channel.

[y,u,v] = extractyuv420('video3.yuv',1920,1080,2);
RGB = yuv2rgb(y,u,v);
red = RGB(:,:,1);
green = RGB(:,:,2);
blue = RGB(:,:,3);
R = max(max(red));      % per-channel maxima over the whole frame
G = max(max(green));
B = max(max(blue));

We can take advantage of this headroom to scale up the LCD values, decrease the backlight and keep the same final result. Summing up, the new image for a backlight of 58.47% would be:

    RGB2 = RGB/0.5847;
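In general the backlight value is simply the largest of the three channel maxima. A sketch (our assumption: saveSIM2frame1Value is the same DTU helper used in the other methods):

BackLight1 = max([R G B]);    % 0.5847 for this frame (the blue channel)
RGB2 = RGB/BackLight1;        % boost the LCD values to compensate
outframe1 = saveSIM2frame1Value(255*RGB2, BackLight1, 'testing1');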


2 Maximum luminance of the image:

For this we will introduce a new colour space more suitable for the task, YCbCr. It is quite similar to YUV because it has one luminance (Y) and two chrominance components, but here they are all scaled from 0 to 1 instead of 0 to 255. Again we can push the maximum luminance to 1 and divide every other value by the same factor, so the backlight becomes this very same factor, less than 100%:

YCBCRMAP = rgb2ycbcr(RGB);
Y = YCBCRMAP(:,:,1);
Ymax = max(max(Y));
Ynew = Y/Ymax;
BackLight2 = Ymax;                 % the backlight is the old maximum
YCBCRMAP2 = Ynew;
YCBCRMAP2(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP2(:,:,3) = YCBCRMAP(:,:,3);
RGB3 = ycbcr2rgb(YCBCRMAP2);
outframe2 = saveSIM2frame1Value(255*RGB3, BackLight2, 'testing2');

    The resulting value for the backlight is 0.8273


3 Average luminance of the image:

Now we are scaling the average luminance up to 1. This will make some values (every one bigger than the average) larger than 1, which is not possible because the LCD can't generate more light. We clip every value higher than 1 to 1 and lose some information.

Ymean = Y/mean(mean(Y));
Ymean(Ymean>1) = 1;                % clip values the LCD cannot reach
BackLight3 = mean(mean(Y));
YCBCRMAP3 = Ymean;
YCBCRMAP3(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP3(:,:,3) = YCBCRMAP(:,:,3);
RGB4 = ycbcr2rgb(YCBCRMAP3);
outframe3 = saveSIM2frame1Value(255*RGB4, BackLight3, 'testing3');

    BackLight3 = 0.3759


4 Square root of the average luminance:

Repeat the steps in 3 but with the square root:

Yroot = Y/sqrt(mean(mean(Y)));
Yroot(Yroot>1) = 1;
BackLight4 = sqrt(mean(mean(Y)));
YCBCRMAP4 = Yroot;
YCBCRMAP4(:,:,2) = YCBCRMAP(:,:,2);
YCBCRMAP4(:,:,3) = YCBCRMAP(:,:,3);
RGB5 = ycbcr2rgb(YCBCRMAP4);
outframe4 = saveSIM2frame1Value(255*RGB5, BackLight4, 'testing4');

    BackLight4 = 0.6131


The square-root version has better quality because there were fewer higher-than-1 values after the division. The backlights for the last two, for the average and the square root respectively, are:


    Combining the pictures with their backlights:


To actually see the differences between the last two images and the original one, we can subtract each from the original, square the difference and display the result as a grayscale picture.

MSE1 = (RGB - RGB4*BackLight3).^2;
Grey1 = max(MSE1,[],3);        % collapse the three colour channels into one plane
max(max(Grey1))                % 0.0482
imshow(Grey1, [0, 0.0482]);


MSE2 = (RGB - RGB5*BackLight4).^2;
Grey2 = max(MSE2,[],3);
max(max(Grey2))                % 0.0817
imshow(Grey2, [0, 0.0817]);

The higher maximum of Grey2 (square-root method) compared with Grey1 (average method) indicates that the largest single-pixel difference from the original is actually bigger for the square-root version. Even so, we can clearly see that the second difference image is overall darker than the first one, which means that on the whole the differences with the original are lower (zero difference displays as black).
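To back up "overall darker" with a single number (a sketch of our own; these values were not computed in the original report):

% Overall (mean) squared error for each backlight choice.
mean(MSE1(:))    % average-luminance backlight
mean(MSE2(:))    % square-root backlight; the overall-darker image should give the smaller value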

7 PROFESSIONAL STUFF (ALL-NIGHT LAUNCHING)

Once we had learnt the whole video processing chain (acquisition, compression and backlight dimming), we had to prepare our videos for display. We used some functions made at DTU to compare outputs created through different algorithms. We could use two different structures for the modelled backlight and diverse algorithms: full backlight, maximum luminance value, average luminance value, square-root of the average luminance value and the homemade algorithm.

The DTU Matlab algorithms could only work with uncompressed AVI format. Thus we had to change the format of the videos, and we created two different ones. The first was just an uncompressed AVI version of the original, and the second was a little trickier: we compressed the original with a constant bitrate of 0.6 Mb/s and then transformed it into an uncompressed AVI version in order to be able to work with it in Matlab. In this way we had two videos in the same format, the first with good quality and the second with a worse one.

Some post-processing calculations were required:

- Avg, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 9.626
- Avg, 2202 LEDs, previously high-compressed video: PSNR = 10.2525
- Bbgd, 8 rows x 2 columns backlight, previously high-compressed video: PSNR = 8.522

We can notice some improvement in that very low quality video when using precise backlight dimming.


ANNEX 1 FFMPEG

For the sake of simplicity we created several videos equal to the third original video (the one with more contrast) but with some particular differences in format and compression:

1 Original, reduced to 10 seconds, 250 frames, Full HD, uncompressed .avi

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec rawvideo -s 1920x1080 uncompressed.avi

2 Original, reduced to 10 seconds, 250 frames, Full HD, (uncompressed) .yuv

ffmpeg -i uncompressed.avi video3.yuv

3 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, low bitrate .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 0.6M lowbitrate.mp4

4 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, high bitrate .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -b 5M highbitrate.mp4

5 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, variable bitrate, low q .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmax 5 lowq.mp4

6 Original, reduced to 10 seconds, 250 frames, Full HD, H.264 MPEG-4 AVC codec, variable bitrate, high q .mp4

ffmpeg -i 00003.MTS -ss 00:00:10 -vframes 250 -vf yadif -vcodec libx264 -s 1920x1080 -qmin 40 highq.mp4

7 Original, reduced to 10 seconds, 250 frames, Full HD, from the previously high-compressed video, uncompressed .avi

ffmpeg -i lowbitrate.mp4 -vcodec rawvideo uncompressedfromcompressed.avi

    NOTE: many more videos were created but those are the ones remaining and used in this report.


ANNEX 2 PSNR

function [PSNR] = PSNR(filename,width,height,filename_o)

blbl = zeros(1,250);                     % per-frame PSNR values
framenumber = 1;
while framenumber < 251
    [y,u,v] = extractyuv420(filename,width,height,framenumber);
    [y_o,u_o,v_o] = extractyuv420(filename_o,width,height,framenumber);
    lum2 = (y - y_o).^2;                 % squared luminance error
    count = 1080*1920;
    sumX = sum((sum(lum2))');
    mse = sumX/count;
    blbl(framenumber) = 20*log10(255/sqrt(mse));
    framenumber = framenumber + 1;
end
PSNR = mean(blbl);                       % average PSNR over the 250 frames
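A hypothetical call (the first filename is our assumption; the function expects two .yuv streams of the same size with at least 250 frames):

% Average luminance PSNR of a decoded stream against the original.
avg_psnr = PSNR('decoded.yuv',1920,1080,'video3.yuv');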

ANNEX 3 ENTROPY

function [average] = average(filename,width,height,framenumber,DICT,PROB)

average = 0;
for i = 1:1:256
    number = size(DICT{i,2});            % DICT{i,2} is the codeword for symbol i-1
    n_bits = number(2);                  % codeword length in bits
    average = average + PROB(i)*n_bits;  % expected bits per symbol
end


function [SYM, PROB, DICT, result] = entropy(filename,width,height,framenumber)

COUNT = zeros(256,1);
[y,u,v] = extractyuv420(filename,width,height,framenumber);
for i = 1:1:1080
    for j = 1:1:1920
        number = y(i,j);
        COUNT(number+1) = COUNT(number+1)+1;   % +1 because Matlab arrays are 1-indexed
    end
end
PROB = COUNT/(1080*1920);
SYM = [0:1:255];
DICT = huffmandict(SYM,PROB);
result = 0;
for i = 1:1:256
    if (PROB(i) ~= 0)
        result = result - PROB(i)*log2(PROB(i));   % Shannon entropy in bits
    end
end

ANNEX 4 Q

function [Matrix] = quantization(filename,width,height,framenumber)

[y,u,v] = extractyuv420(filename,width,height,framenumber);
y_0 = dct2(y);
Matrix = zeros(height,width);
% Quantize the DCT coefficients more coarsely as the frequency grows,
% and drop the highest-frequency zone completely.
for i = 1:1:height/4
    for j = 1:1:width/4
        Matrix(i,j) = round(y_0(i,j)/5);     % low frequencies: fine step
    end
end
for i = height/4:1:height/2
    for j = width/4:1:width/2
        Matrix(i,j) = round(y_0(i,j)/10);
    end
end
for i = height/2:1:height*3/4
    for j = width/2:1:width*3/4
        Matrix(i,j) = round(y_0(i,j)/30);    % high frequencies: coarse step
    end
end
for i = height*3/4:1:height
    for j = width*3/4:1:width
        Matrix(i,j) = 0;                     % highest frequencies discarded
    end
end

function [Matrix_R] = reverse_quantization(Matrix,width,height)

Matrix_R = zeros(height,width);
for i = 1:1:height/4
    for j = 1:1:width/4
        Matrix_R(i,j) = Matrix(i,j)*5;       % undo the fine step
    end
end
for i = height/4:1:height/2
    for j = width/4:1:width/2
        Matrix_R(i,j) = Matrix(i,j)*10;
    end
end
for i = height/2:1:height*3/4
    for j = width/2:1:width*3/4
        Matrix_R(i,j) = Matrix(i,j)*30;
    end
end
Matrix_R = idct2(Matrix_R);                  % back to the pixel domain