T305: Digital Communications
Arab Open University-Lebanon, Tutorial 12
Block III – Video Coding


Page 1

Block III – Video Coding

Page 2

Introduction

Digital video has a number of advantages over analogue for broadcast TV:

- The effect of transmission impairments on picture quality is far less than in the analogue case. In particular, 'ghost' pictures due to the presence of multiple signal transmission paths and reflections are eliminated.
- Bandwidth is nearly always at a premium, and digital television allows more channels to be accommodated in a given bandwidth.
- Different types of programme material, such as teletext or sub-titles in several languages, can be accommodated much more flexibly with digital coding.

Page 3

Introduction

The message coding technique adopted for Digital Video Broadcasting (DVB) was MPEG-2, a pre-existing standard appropriate for a wide range of video applications. MPEG stands for Moving Picture Experts Group, the body that has defined a number of standards for the compression of moving pictures.

Page 4

Introduction

Both in films and television, moving scenes are shown as a series of fixed pictures, usually generated at a rate of about 25 per second. There is often very little change between consecutive pictures, and MPEG-2 coding takes advantage of this to achieve high degrees of compression.

Page 5

The structure of video pictures

Fig. A simple raster scanning pattern. The trace is blanked out during the flyback.

Monochrome pictures

The electron beam in the cathode ray tube (CRT) is made to scan the whole visible surface of the screen in a zig-zag pattern called a raster, shown schematically in the figure.

Page 6

Monochrome pictures

During this forward motion, the beam current is modulated, that is, its magnitude is varied so as to produce a spot of varying brightness on the screen in such a way as to build up a display of the transmitted image. The beam moves much faster and is blanked (effectively zero current) as it flies back from right to left. When a scan is complete, the beam flies back to the top of the screen and starts producing the next picture. This is very much like the cinema, in which the illusion of motion is created by a succession of still pictures. In early cinemas, mechanical constraints on the film projectors led to there being too few pictures or frames per second. This produced a flickering sensation, hence the name 'flicks'.

Page 7

Monochrome pictures

In the case of digital picture coding, the analogue variation of brightness along each line has to be converted into a series of discrete digital samples. The picture is therefore coded as a series of dots called picture elements or pixels. Thus, the resolution of digital displays is normally expressed in terms of the number of pixels as, for instance, 800 × 600 pixels. Here, 800 is the number of pixels in one horizontal line and there are 600 lines on the visible portion of the screen. Other common standards include 640 × 480, the 'cheap and cheerful' standard, and 1024 × 768, or even 1600 × 1200 for detailed graphic work.

Page 8

Adding color

Vision is a very complex subject, but it is sufficient to appreciate that, in the human eye, there are three types of receptors sensitive to red, green and blue light. By combining sources of these three primary colors and choosing appropriate intensities for each, we can effectively produce any of the shades of color the human eye can perceive. Color television tubes, for instance, have three electron beams focused in such a way that each beam can only land on one of three separate sets of phosphor spots on the screen, each set producing one of the three primary colors when energized. The spots lie close together, so that the eye perceives a single color resulting from the combination of the local intensities of the three primaries.

Page 9

Adding color

The black and white (that is, shades of grey) signal, known as the luminance signal, can be obtained from a suitably weighted sum of the three color signals. The luminance signal, Y, can be expressed as:

Y = 0.587G + 0.299R + 0.114B

Two more signals are needed to enable the R, G, B color signals to be reconstituted before feeding to the display device. The signals used are:

Cb = 0.564(B − Y)
Cr = 0.713(R − Y)

Cb and Cr are known as the color difference, chrominance or chroma signals. They are defined in ITU recommendations for digital video systems.
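As a quick illustration, the weighted sum and the two color-difference formulas above can be applied directly. This is a minimal sketch; the function name and the use of normalized 0..1 intensities are choices made here, not part of the standard:

```python
# Sketch of the luminance/chrominance conversion described above,
# using the coefficients from the text. R, G and B are normalized
# intensities in the range 0..1 (an assumption for this example).

def rgb_to_ycbcr(r, g, b):
    """Convert primary color intensities to luminance and chroma."""
    y = 0.587 * g + 0.299 * r + 0.114 * b   # weighted sum for luminance
    cb = 0.564 * (b - y)                    # blue color difference
    cr = 0.713 * (r - y)                    # red color difference
    return y, cb, cr

# Pure white (R = G = B = 1) has full luminance and zero chroma,
# since the three weights sum to 1.
y, cb, cr = rgb_to_ycbcr(1.0, 1.0, 1.0)
```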

Page 10

Digital conversion

In a typical video camera, the image of the scene is scanned electronically to produce the chrominance and luminance signals for each picture line.

The output of a video camera is essentially analogue and has to be converted to digital form in order to take advantage of the many digital processing techniques which are available.

The output bandwidths of analogue broadcast TV cameras and many other analogue video system sources are typically about 5 to 6 MHz for the luminance signal and about half of this for each of the chrominance signals.

Page 11

Digital conversion

Sampling rates numerically equal to at least twice the bandwidths are required, and the ITU has standardized on sampling frequencies of 13.5 MHz for the luminance signal and 6.75 MHz for each of the chrominance signals. Both types of signal are quantized using eight bits per sample and linear quantization.

The compression rates that have been achieved for video coding are most impressive: MPEG-2 coding allows broadcast quality pictures to be transmitted at around 4 Mbit/s.
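The arithmetic behind these figures is easy to check. Assuming all samples are transmitted (i.e. 4:2:2 sampling) with the rates and 8-bit quantization given above, the uncompressed rate and the implied compression ratio work out as follows:

```python
# Uncompressed bit rate implied by the ITU sampling parameters quoted
# above (4:2:2 sampling assumed: luminance plus two chrominance signals).

luma_rate = 13.5e6 * 8           # 13.5 MHz luminance, 8 bits per sample
chroma_rate = 2 * 6.75e6 * 8     # two chroma signals at 6.75 MHz each
uncompressed = luma_rate + chroma_rate   # 216 Mbit/s in total

ratio = uncompressed / 4e6       # relative to ~4 Mbit/s MPEG-2 output
```

So MPEG-2 must squeeze roughly a factor of 50 out of the raw sampled data.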

Page 12

Sampling formats

Since the chrominance or color sampling rate is half the luminance sampling rate, the question arises as to when the chrominance samples are taken relative to the luminance samples. Figure (a) represents the luminance sampling. The figure represents part of a camera scanning raster and the circles show the times when the camera output luminance signal is sampled. The samples are taken consecutively along each line at the sampling rate (that is, a sample is taken every 1/(13.5 × 10⁶) s ≈ 0.074 μs).

Page 13

Sampling formats

Fig. Video sampling.

The Cb and Cr chrominance signals are sampled at half the luminance rate, and an obvious way of doing this is to take chrominance samples which coincide with alternate luminance ones. This is shown in Figure (b), and is known as 4:2:2 sampling. In many cases it is possible to reduce the number of chrominance samples while still producing acceptable pictures. There are various ways of doing this, one of which can be used for MPEG coding. It is known as 4:2:0 sampling and is shown in Figure (c), with each Cb, Cr pair represented by a cross.
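The idea of 4:2:0 sampling can be sketched numerically: each chroma component keeps one value per 2 × 2 block of luminance positions. Averaging the block, as below, is just one possible choice of filter; the exact sample positions used by MPEG-2 differ, so this is an illustration rather than the standard's definition:

```python
import numpy as np

def subsample_420(chroma):
    """Halve chroma resolution in both directions by averaging 2x2 blocks."""
    h, w = chroma.shape
    # Split into 2x2 tiles, then average within each tile.
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A 4x4 grid of chroma samples becomes a 2x2 grid: one value per 2x2 block,
# i.e. a quarter as many chroma samples as luminance samples.
cb = np.arange(16, dtype=float).reshape(4, 4)
small = subsample_420(cb)
```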

Page 14

Sampling formats

Where a lower resolution is acceptable, the source intermediate format (SIF) can be used.

Page 15

The coding of still pictures

MPEG is designed to squeeze out as much redundancy as possible in order to achieve high levels of compression. This is done in two stages:

- Spatial compression uses the fact that, in most pictures, there is considerable correlation between neighboring areas in a picture (and, hence, a high degree of redundancy in the data directly obtained by sampling) to compress each picture in a video sequence separately.
- Temporal compression uses the fact that, in most picture sequences, there is normally very little change during the 1/25 s interval between one picture and the next. The resulting high degree of correlation between consecutive pictures allows a considerable amount of further compression.

Page 16

The discrete cosine transform

The first stage of spatial compression applies a variant of the Fourier transform known as the discrete cosine transform (DCT) to 8 × 8 blocks of data.

Page 17

The discrete cosine transform

The discrete cosine transform (DCT) is the one used for JPEG and MPEG coding. It is a reversible transform: applied to n original samples, it yields n amplitude values, and applying the reverse transform to these n amplitudes enables one to recover the original sample values. If the high-frequency components are sufficiently small, then setting them to zero before carrying out the reverse transform will produce a picture which, to a human observer, is effectively the same as the original one. This is the essence of the compression process.

Page 18

The discrete cosine transform

The DCT can be used to take advantage of correlations between pixels horizontally along a picture line, and the same technique could equally be applied to vertical columns of pixels. A much higher degree of compression, however, can be achieved by using it simultaneously in both directions. This is done by applying a two-dimensional DCT to rectangular 8 × 8 blocks of pixels. The two-dimensional DCT applied to the 64 luminance values of an 8 × 8 block yields 64 amplitudes of two-dimensional spatial cosine functions, shown below. The spatial frequencies range from 0 (the dc term) to 7 in both directions, each component varying as a cosine function in both the horizontal and vertical directions.
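A minimal two-dimensional DCT can be written directly from the cosine basis functions. This is a sketch for illustration; a real codec would use a fast algorithm and the integer arithmetic required by the standard:

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis matrix: row k samples a cosine of frequency k.
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] *= 1 / np.sqrt(2)   # dc row rescaled so that C is orthogonal

def dct2(block):
    """Two-dimensional DCT: transform the rows, then the columns."""
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse transform (C is orthogonal, so its transpose inverts it)."""
    return C.T @ coeffs @ C

# The transform is reversible: the round trip recovers the original samples
# (up to floating-point rounding), as described in the text.
block = np.random.default_rng(0).uniform(0, 255, (N, N))
recovered = idct2(dct2(block))
```

For an all-constant block, every amplitude except the top-left dc term comes out as zero, which matches the interpretation of the dc term as the block average.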

Page 19

The discrete cosine transform

Fig. Example of an 8 × 8 DCT.

In the figure above, each amplitude applies to a different component; the way the amplitudes are ordered in the right-hand transform output block is shown below.

Fig. Block notation.

Page 20

The discrete cosine transform

The output block is organized so that the horizontal frequencies increase from left to right and the vertical frequencies increase from top to bottom. The top-left component, with zero vertical and horizontal frequencies, is the dc term, which represents the average luminance of the block.

Page 21

Thresholding and requantization

Humans are not very sensitive to fine detail at low luminance levels. This allows higher spatial frequency components to be eliminated.

Also, in general, humans are less sensitive to the contribution of high-frequency components compared with lower ones. This is taken into account by using requantization: fewer bits are used for the higher-frequency components than for the low-frequency ones. The DCT output block of amplitudes is reproduced in Table 4.1.

Page 22

Thresholding and requantization

A requantization table stored in the encoder is used; one of the tables used for luminance coding in MPEG-2 is shown in Table 4.2. Each amplitude value in the DCT output table is divided by the corresponding number in the quantization table and the result, rounded to the nearest integer, is used to replace the original amplitude.
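The divide-and-round step can be sketched as follows. The 4 × 4 tables here are made-up illustrative numbers, not the actual Table 4.1/4.2 values from the course text:

```python
import numpy as np

# Requantization: each DCT amplitude is divided by the corresponding
# quantization-table entry and rounded to the nearest integer.
# Both tables below are invented for illustration only.

amplitudes = np.array([[824,  60, -12,  3],
                       [ 45, -30,   6,  1],
                       [-10,   7,   2,  0],
                       [  4,   1,   0,  0]])

quant = np.array([[ 8, 16, 19, 22],
                  [16, 16, 22, 24],
                  [19, 22, 26, 27],
                  [22, 24, 27, 29]])

requantized = np.rint(amplitudes / quant).astype(int)
# Small high-frequency amplitudes round to zero, which is where much of
# the compression comes from.
```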

Page 23

Thresholding and requantization

Page 24

Zig-zag scan and run-length encoding

In general, the higher the frequencies, the more zeros there are in the requantized block. In order to take advantage of this, the requantized values are rearranged for further processing in the order shown below, which places them in order of ascending frequency for the horizontal and vertical directions combined.

The result of this is that there are relatively long sequences consisting entirely of zeros. This is rather like the sequences of pels of the same 'colour' (i.e. black or white) in faxed documents and, as in that case, run-length encoding leads to useful compression.

Page 25

Zig-zag scan and run-length encoding

The dc term is coded separately using differential coding. This just involves sending the difference between the value of the dc term and the value of the dc term of the contiguous block that was encoded immediately before. The small jumps in average luminance from one block to the next can cause the block structure used for coding to become apparent; this effect is known as blocking. Blocking is minimized by differential coding of the dc term because the small difference between consecutive values allows greater accuracy for the dc terms and thus reduces the size of the steps arising from the quantization process.
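Differential coding of the dc terms can be sketched as below. The choice of 0 as the initial predictor is an assumption for illustration; the standard defines its own prediction and reset rules:

```python
def dc_differences(dc_terms):
    """Replace each dc term by its difference from the previous one."""
    diffs, prev = [], 0   # initial predictor of 0 is an assumption here
    for dc in dc_terms:
        diffs.append(dc - prev)
        prev = dc
    return diffs

# Consecutive blocks with similar average luminance give small differences,
# which can be coded more accurately in few bits.
diffs = dc_differences([99, 103, 104, 102])
```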

Page 26

Zig-zag scan and run-length encoding

For the Table 4.3 values, zig-zag scanning gives 103, followed by two zeros, −2, 1, 1, two zeros, 1, 42 zeros, 1, and finally 12 zeros. Assuming that the difference between the current and previous dc terms was 4, the data after zig-zag scanning and run-length coding would be sent as:

4, 2, −2, 0, 1, 0, 1, 2, 1, 42, 1, 12
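The run-length step can be reproduced from the numbers above. The convention sketched here (each nonzero value preceded by its zero-run count, with a bare count for the trailing zeros) is inferred from the example sequence, not taken from the MPEG-2 syntax itself:

```python
def run_length_encode(ac_values):
    """Encode as (zero-run, value) pairs plus a trailing-zero count."""
    out, zeros = [], 0
    for v in ac_values:
        if v == 0:
            zeros += 1
        else:
            out.extend([zeros, v])   # run of zeros, then the value itself
            zeros = 0
    if zeros:
        out.append(zeros)            # zeros at the end sent as a bare count
    return out

dc_difference = 4
# Zig-zag output from the example, with the dc term (103) removed because
# it is replaced by its differential value before transmission.
ac = [0, 0, -2, 1, 1, 0, 0, 1] + [0] * 42 + [1] + [0] * 12
encoded = [dc_difference] + run_length_encode(ac)
```

Running this reproduces the twelve-number sequence given in the example.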

Page 27

Huffman coding

The final step in the coding of single pictures is to use Huffman coding for the pairs of numbers resulting from the run-length encoding. Separate tables, stored in the encoder, are used for the run lengths and for the luminance values. The tables have been constructed from the statistics of typical luminance data.

Summary of spatial coding

The coding for the chrominance is essentially the same as for the luminance, but different quantization tables and Huffman code tables are used, based on the statistics of typical sets of chrominance data and on relevant features of our perception of color. The overall coding process for a single picture is summarized below. The decoder, at the receiving end, carries out the same processes in reverse using the same set of tables as the encoder.
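The principle behind the Huffman tables (frequent symbols get short codewords) can be sketched with a small tree-building routine. Note that this builds a code from the data itself, whereas MPEG-2 uses fixed tables derived from typical video statistics:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    # Heap entries: (frequency, tie-break index, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        # Merge them: prefix one side's codes with 0, the other's with 1.
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Zero runs and small values dominate typical DCT output, so in a code
# built from such statistics they get the shortest codewords.
code = huffman_code([0, 0, 0, 0, 1, 1, 2, 3])
```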

Page 28

Summary of spatial coding

The coding techniques can be divided into two categories:

- Reversible or lossless coding, for which the exact data can be recovered after decoding. Huffman and run-length encoding are examples. The DCT is also effectively reversible, although some errors are, in fact, introduced through rounding and other effects. Reversible coding preserves all the information contained in the signal.
- Non-reversible or lossy coding, which causes some information to be lost irrecoverably. The requantization, which reduces the number of bits per sample, is non-reversible.

Page 29

Fig. Summary of spatial coding.

Page 30

The coding of moving pictures – MPEG

When pictures are produced at a rate of 25 per second, there cannot be much change between one picture and the next as a scene evolves in time, except, occasionally, when the camera or video editor cuts abruptly from one scene to another. There is, therefore, a considerable amount of temporal redundancy, which is exploited in MPEG by, in effect, only transmitting the differences between one picture and the next. MPEG-1 was designed for use in conjunction with compact disks (CDs) used for multimedia material; although it cannot cope with the much higher video bit-rates required for broadcast television, it can cope with the much lower rates required for audio coding.