
Page 1: DVF 2 Digital Image Video Definitions

1

Digital Still Image and Digital Video Definitions DCT compression

Jouko Kurki, 6.1.2014

Copyright © Jouko Kurki, 2005-2014

References: Michael Robin and Michael Poulin, Digital Television Fundamentals, 2nd ed., 2000, ISBN 0-07-135581-2; Iain E.G. Richardson, H.264 and MPEG-4 Video Compression, Wiley, England, 2003, ISBN 0-470-84837-5; Jerry D. Gibson (ed.), Multimedia Communications, Academic Press, 2001, ISBN 0-12-282160-2; information from the Web and other documents.

DVF_2_DigitalImage_Video_Definitions_Compression_DCT.ppt

Page 2: DVF 2 Digital Image Video Definitions

2

Digital image • A digital image is formed by an array of equal-size picture elements, PIXELS (or PELs) • Pixels can be square or non-square (non-square is typical in video) • Each pixel holds a color value and a brightness value. • The color value is expressed in the color space used, e.g. RGB • In a typical RGB video system each color component (R/G/B) is coded with 8 bits and thus has 256 possible values, 0..255. There can therefore be 256*256*256 = 2^(3*8) ≈ 16.7 million different colors, and the color depth is said to be 3*8 = 24 bits.

[Figure: pixel array with horizontal resolution N pixels and vertical resolution M pixels; one pixel highlighted]

Page 3: DVF 2 Digital Image Video Definitions

3

Image size and resolution • A picture can have N horizontal and M vertical pixels, so in total there are

N x M pixels

• For a 10x13 cm print from a digital camera, a good-quality snapshot would need e.g. 1300 horizontal and 1000 vertical pixels, so in total N x M = 1,300,000 pixels = 1.3 megapixels. If each pixel is coded with the 8-bit RGB color scheme, the image holds 1.3 M * 24 bits of information = 1.3 M * 3 bytes ≈ 3.9 MB of information.

• However, the typical file size from such a camera is approximately 0.5 MB, so where is the error? (See the sketch below.)

• The answer is that the image information is strongly compressed, e.g. by the JPEG compression algorithm, to a fraction of about 10 %!

• The density of pixels in the (printed) picture is called resolution. A larger number of pixels in a given picture size makes the picture look more detailed. The unit of resolution is pixels per inch (ppi). Printer resolution is defined as dots per inch (dpi), which gives the number of ink spots per inch. A typical value for inkjet printers is 300 dpi, and a suitable picture resolution is then about 100 ppi.
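As a sanity check of the arithmetic above, a minimal sketch in Python (the numbers are the slide's example figures, not measured values):

```python
# Raw size of a 1300 x 1000 pixel, 24-bit RGB image, and the implied JPEG compression ratio.
width, height = 1300, 1000          # pixels (example from the slide)
bits_per_pixel = 3 * 8              # 8 bits per R, G and B component

raw_bytes = width * height * bits_per_pixel // 8
raw_megabytes = raw_bytes / 1e6

jpeg_megabytes = 0.5                # typical camera file size quoted on the slide
ratio = raw_megabytes / jpeg_megabytes

print(f"raw size  : {raw_megabytes:.1f} MB")                                   # ~3.9 MB
print(f"compressed: {jpeg_megabytes} MB -> compression ratio ~{ratio:.0f}:1")  # ~8:1
```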

Page 4: DVF 2 Digital Image Video Definitions

4

Picture sizes, TV Monitor (CIF) and Computer Monitor (VGA) Resolution Formats

CIF formats (no. of pixels):
Sub-QCIF  128 x 96
QCIF      176 x 144
CIF       352 x 288

VGA formats (no. of pixels):
QQVGA     160 x 120
QVGA      320 x 240
VGA       640 x 480
SVGA      800 x 600
XGA       1024 x 768
SXGA      1280 x 1024
UXGA      1600 x 1200
HDTV      1920 x 1080
QXGA      2048 x 1536

• The old PC monitor had a pixel size of 640x480 and an aspect ratio of 4:3. This still serves as the basis for many NTSC video applications. In Europe (the PAL world) the numbers are slightly different for TV and many video applications.

• Above is a summary of pixel sizes for many popular formats. Note that for TV (CIF) the numbers may differ between North America and Europe.

• CIF = Common Intermediate Format: 352 x 288 picture size, with a 30 fps frame rate for video. Used e.g. for videoconferencing. 4CIF = 2x pixel count in both dimensions -> 704 x 576 (approximately the digital TV resolution of 720 x 576); Quarter CIF (QCIF) = ½ pixel count in both dimensions.

Page 5: DVF 2 Digital Image Video Definitions

5

Color systems and Color Spaces

Color space is the system under which colors are defined; e.g. the RGB system is used in TVs and PC monitors to define color. Two regions in our visual field that appear to have the same color need not have the same spectrum.

Color reproduction schemes rely on the fact that any color visible by humans can be approximated by the combination of a limited subset of visible light frequencies. The main color spaces have at least three dimensions:

• RGB (red, green, blue)
• CMY (cyan, magenta, yellow)
• HSB (hue, saturation, brightness)
• HLV (hue, lightness, value)
• XYZ (tristimulus)

Some printing schemes use more than three colors of ink:
• CMYK (cyan, magenta, yellow, key)
• CMYK+spot (cyan, magenta, yellow, key, special color)
• Hexachrome™ (cyan, magenta, yellow, black, green, orange)

Page 6: DVF 2 Digital Image Video Definitions

6

The additive color wheel (RGB color space)

The basic rules of additive color mixing:
red + green = yellow
green + blue = cyan
blue + red = magenta
red + green + blue = white

Additive color mixing:
• Used in TV, PC monitors etc.
• Used to express the color: all possible colors are presented on the circle. Different colors (hue or tint) are expressed as degrees on the color wheel: 0° = red, 120° = green, 240° = blue. These are the primary colors (in Finnish päävärit); the colors between the primary colors are the secondary colors (Fi välivärit).
• The absence of light is darkness; light is added to it to create the desired color.
• Colors are superposed (overlapping lamps) or mixed by small elements (TV pixels, halftones).
• Usually 8 bits / color (24 bits in total), thus 2^24 ≈ 16.7 million colors.

Page 7: DVF 2 Digital Image Video Definitions

7

Subtractive color mixing

The basic rules of subtractive color mixing:
cyan + magenta = blue
magenta + yellow = red
yellow + cyan = green
cyan + magenta + yellow = black

[Figure: the subtractive color wheel]

• Used in printing, where inks are added on paper
• The applied inks reflect certain wavelengths to give the appearance of the desired color
• In most cases, each of the four channels is a value between 0 and 255. While this provides up to 4,294,967,296 different colour combinations (32-bit), you actually only get the same number (16,777,216) of discrete colours as with 24-bit colour, because many of the possible colour combinations duplicate each other. E.g. grey can be presented as Cyan=128, Magenta=128, Yellow=128, Black=0, or as Cyan=0, Magenta=0, Yellow=0, Black=128.

• CMYK is not able to present as many colours as RGB. • When most computer programs display a graphic that uses CMYK, they first convert the image to CMY by adding the value of the black channel to each of the other three channels and then removing the black channel. This CMY can easily be converted to RGB.

Page 8: DVF 2 Digital Image Video Definitions

8

HSL Color Space (Hue, Saturation, and Luminance)

The acronym stands for hue, saturation, and luminance. This method of describing colors is also known as HSB (hue, saturation, and brightness), HSI (hue, saturation, and intensity), or HSV (hue, saturation, and value).

The hue describes the position on the spectrum where the color is located (the angle on the color wheel), with red at the low end of the spectrum and violet at the high end. This number can be an 8-bit value (0-255), a percentage (0-100 %), or a number between 0 and 359 (degrees on the color wheel). The saturation describes how pure (vivid) the color is, from grey at the low end to a fully saturated color at the high end. This number can be an 8-bit value or a percentage. The luminance (intensity or brightness) describes where the color falls on the scale between black and white.

This method of describing color is easy for many artists to use, and it is usually used only in the interface of a graphics program. Once the graphic is saved, it is converted to RGB, palettized, or CMYK color. The only place this kind of color definition is used natively is in color television, where it is referred to as YUV (Y, U and V signals). The Y signal represents the intensity and is the only part of the signal a black-and-white television set uses. The U and V signals carry the color information that a color television uses to choose which color to display for each pixel.
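To make the hue/saturation/value description concrete, a minimal sketch using Python's standard colorsys module (which works with HSV; all channels are fractions 0..1 rather than the 8-bit or degree scales mentioned above):

```python
import colorsys

# Hue as a fraction of the color wheel (0.0 = red, 1/3 = green, 2/3 = blue),
# full saturation, full value/brightness.
h, s, v = 120 / 360, 1.0, 1.0                 # 120 degrees = pure green
r, g, b = colorsys.hsv_to_rgb(h, s, v)
print([round(c * 255) for c in (r, g, b)])    # [0, 255, 0]

# The description is reversible:
print(colorsys.rgb_to_hsv(r, g, b))           # (0.333..., 1.0, 1.0)
```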

Page 9: DVF 2 Digital Image Video Definitions

9

CAMERA PICTURE / TV PICTURE FORMATION AND INTERFACES

• In cameras and the color TV system the picture is split into RGB components in the still / video camera. This can be done with filters or prisms. The number of CCDs also varies: there can be 3 in high-quality / professional video cameras. In digital still cameras and lower-cost video cameras there is one CCD with a color filter in front of it for the different color components of the picture.

Page 10: DVF 2 Digital Image Video Definitions

10

RGB <=> YCrCb signal processing

The matrix converts the RGB signal to luminance (brightness) Y:
Y = 0.299R + 0.587G + 0.114B
and to two color difference signals (Cr, Cb):
Cr = 0.713 (R − Y) = 0.500R − 0.419G − 0.081B   (V signal)
Cb = 0.564 (B − Y) = −0.169R − 0.331G + 0.500B   (U signal)

The Y, Cr and Cb signals are carried by the TV system to the TV receiver, which converts them back to RGB signals for display.

• The human visual system (HVS) is less sensitive to missing detail in color (chrominance = chroma) than in brightness (luminance = luma). To take advantage of this, the picture signal is divided into luminance and chrominance signals for separate treatment in compression, storage and transport.

• The conversion RGB <-> YUV is linear and works in both directions (a small sketch follows below). The display uses RGB.
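A minimal sketch of the matrix conversion above, using the slide's BT.601 coefficients; R, G and B are assumed to be in the range 0..255, giving Y in 0..255 and Cb, Cr roughly in −128..+127 before any offset is added:

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 RGB -> Y, Cb, Cr using the coefficients on the slide."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b     # 0.564 * (B - Y), the U signal
    cr =  0.500 * r - 0.419 * g - 0.081 * b     # 0.713 * (R - Y), the V signal
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """The conversion is linear, so it can be inverted exactly."""
    r = y + cr / 0.713                          # from Cr = 0.713 * (R - Y)
    b = y + cb / 0.564                          # from Cb = 0.564 * (B - Y)
    g = (y - 0.299 * r - 0.114 * b) / 0.587     # solve the Y equation for G
    return r, g, b

print(rgb_to_ycbcr(255, 255, 255))              # white: Y = 255, Cb = Cr = 0
```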

Page 11: DVF 2 Digital Image Video Definitions

11

[Figure: block diagrams of analogue video interfaces. A matrix converts the R, G, B inputs (plus stereo audio L, R) to luminance Y and the color difference signals B−Y (U) and R−Y (V). The outputs shown are: composite video, where luminance and chroma are QAM-modulated onto one cable and then onto an IF carrier; Y/C (S-Video), with separate luma and chroma; and component video, i.e. the YUV signal or Y, Cr, Cb signals on separate cables.]

Page 12: DVF 2 Digital Image Video Definitions

12

Digital Still Picture Standards

• Most important standards – JPEG (JPEG = Joint Photographic Experts Group)

• Picture compression to about 10 % by DCT compression and quantization

• Some degradation of quality • Very widely spread: Digital cameras, E-mail attachment etc.

– JPEG2000 • Picture compression by wavelets • Improves picture quality as compared to JPEG • Special applications, mapping, satellite pictures etc.

– GIF format • Used especially in animation and graphics on the web • 8-bit color (palette of colors), saves space

Page 13: DVF 2 Digital Image Video Definitions

13

JPEG

Page 14: DVF 2 Digital Image Video Definitions

14

Syntax of Non-hierarchical JPEG data

Page 15: DVF 2 Digital Image Video Definitions

15

Picture and Video Transform Compression steps

1. In picture and video compression, compression is applied to the component signals Y, Cr and Cb, so this transform needs to be done first on the picture data. The picture is also divided into blocks of typically 8x8 samples (BLOCKs).

2. The next step is color sub-sampling. This takes advantage of the human eye's lower ability to see fast color variations, so the color signals Cr and Cb can be presented with less accuracy. The process is called chroma subsampling.

3. After that, the Y, Cr and Cb signals are transform coded with the Discrete Cosine Transform (DCT).

– The DCT is a discrete form of the Fourier cosine transform. The cosine transform equals the Fourier transform when the signal to be transformed is symmetric (even).

– In the Fourier transform the signal is usually in the time domain (t) and the transform domain is expressed as frequency f (Hz). In a picture transform the picture is in the spatial domain (e.g. marked with x), and instead of frequency we talk about spatial frequencies: a lot of variation within a small area of the picture means that there are high spatial frequencies. Nevertheless the mathematics is largely the same!

Page 16: DVF 2 Digital Image Video Definitions

16

Chroma (color) sub-sampling

4:2:2 video (studio video); 4:2:0 video (European digital TV system, DVD video disc, DV tape)

In colour (chroma) sub-sampling the chroma signals are sampled at a lower frequency (i.e. there are fewer chroma pixels). The most used systems are:
4:4:4  Four Cr and Cb samples for every 4 Y samples, i.e. no chroma sub-sampling. Best quality for the most demanding studio use.
4:2:2  Two Cr and Cb samples for every 4 Y samples. This is the normal system for much studio work.
4:2:0  One Cr and Cb sample for every 4 Y samples (the name is odd for historical/technical reasons). Used in DV video compression, digital TV and on DVD video discs. Also recommended for JPEG digital still images. A small sub-sampling sketch follows below.
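A minimal sketch of 4:2:0 chroma sub-sampling, assuming the frame has already been converted to Y, Cb, Cr planes with even dimensions; each 2x2 block of chroma samples is simply averaged (real encoders may use filtered or co-sited sampling):

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane -> one sample per 4 luma samples."""
    c = np.asarray(chroma, dtype=float)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4

y  = np.random.randint(0, 256, (576, 720))       # luma kept at full resolution
cb = np.random.randint(0, 256, (576, 720))
cr = np.random.randint(0, 256, (576, 720))
cb_420, cr_420 = subsample_420(cb), subsample_420(cr)
print(y.shape, cb_420.shape)                     # (576, 720) (288, 360): half the total data
```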

Page 17: DVF 2 Digital Image Video Definitions

17

DCT for JPEG & MPEG-2 compression
• JPEG, MPEG-1/2 and MPEG-4 use the Discrete Cosine Transform (DCT) for compression.
• In the method an 8x8 pixel area of the picture is taken at a time. This is called a BLOCK. For chroma we consider 4:2:0 subsampling (used in JPEG, DTV and DVD).
• A block of 8x8 pixels consists of (in 4:2:0): 8x8 luminance pixels at 8 bits each, and 4x4 chroma points at 8 bits each for Cr and Cb. Other color sampling structures and block sizes (like 4x4) are also possible.
• In 4:2:0 format, 4 luminance blocks and one 8x8 chroma block each for Cr and Cb form a MACROBLOCK. This is arranged so that the color signal data can also be handled in 8x8 blocks for the DCT, and the same HW/SW can be used for luma and chroma.

[Figure: example 8x8 pixel BLOCK of pixel values (−128…+127)]

Page 18: DVF 2 Digital Image Video Definitions

18

Macroblocks and Scaling of Luminance Data

DCT is the basic method used in JPEG still image compression as well as in the MPEG-1/2 and MPEG-4 Visual profile video compression standards.
• For compression the image is split into 8x8 pixel blocks (BLOCKs). DCT is applied to these blocks to yield 8x8 blocks of DCT coefficients.
• The DCT coefficients are calculated for Y, Cr and Cb separately.

Scaling of pixel data before and after DCT:
• RGB signal values are normally presented with 8 bits. Some professional video applications use 10 bits.
• When applying the DCT to a component video signal, Cr and Cb already have binary values between −127 and +127 (due to how these components are calculated).
• To simplify the DCT encoder / decoder, the luminance signal Y is downshifted by subtracting 128 from the luminance pixel values. At the decoder, 128 is added back to the pixel data.
• The DC coefficient after the DCT has a value range of −1024…+1016 (8 * the binary value); AC coefficients have a range of approximately ±1020.
• Note that, corresponding to the analogue video signal range of 0…700 mV, the binary signal has a range of 16..235 according to CCIR-601 (ITU-R BT.601). This allows some headroom. The exact range may vary between standards and systems, and compression / decompression systems should take this into account.
• The DCT process is reversible. The accuracy of the calculation should be 13-14 bits to avoid round-off errors.

Page 19: DVF 2 Digital Image Video Definitions

19

DCT coefficient calculation for JPEG and MPEG-1/2

The Discrete Cosine Transform (DCT) operates on an NxN pixel block (the X matrix) and produces an NxN matrix of coefficients (the Y matrix); N = 8 for JPEG and MPEG-1/2. The forward DCT (FDCT) for JPEG and MPEG-1/2 is given by:

F(u,v) = [C(u) C(v)] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16]

f(j,k) = pixel values (luminance or chrominance) in the 8x8 pixel block
F(u,v) = coefficients of the 8x8 DCT block
u = normalized horizontal frequency (0 ≤ u ≤ 7)
v = normalized vertical frequency (0 ≤ v ≤ 7)
The scaling factors C(u) and C(v) are 1/√2 for u, v = 0, and 1 for u, v ≠ 0. E.g. when calculating F(0,7), we have C(0) = 1/√2 and C(7) = 1.

F(0,0) is the DC coefficient: it is the sum of all pixel values in the block multiplied by 1/8,

F(0,0) = 1/8 * Σ(j=0..7) Σ(k=0..7) f(j,k)

Pixel values range from −128 to +127, so the DC coefficient ranges from 1/8 * 64 * (−128) = −1024 to 1/8 * 64 * 127 = 1016. The DC coefficient represents the average value of the block; the other coefficients represent the variation within the block. (A small implementation sketch follows below.)
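A minimal NumPy sketch of the forward DCT formula above, assuming the 8x8 block has already been level-shifted to the −128…+127 range (this is a direct, slow textbook implementation; real codecs use fast factored algorithms):

```python
import numpy as np

def fdct_8x8(block):
    """Forward 8x8 DCT exactly as in the formula above.
    block: 8x8 array of level-shifted pixel values f(j,k) in -128..127."""
    f = np.asarray(block, dtype=float)
    F = np.zeros((8, 8))
    C = lambda i: 1 / np.sqrt(2) if i == 0 else 1.0
    for u in range(8):
        for v in range(8):
            s = 0.0
            for j in range(8):
                for k in range(8):
                    s += (f[j, k]
                          * np.cos((2 * j + 1) * u * np.pi / 16)
                          * np.cos((2 * k + 1) * v * np.pi / 16))
            F[u, v] = C(u) * C(v) / 4 * s
    return F

# A flat block of constant value 100 gives only a DC coefficient: F(0,0) = 8 * 100 = 800.
flat = np.full((8, 8), 100)
print(round(fdct_8x8(flat)[0, 0], 1))   # 800.0
```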

Page 20: DVF 2 Digital Image Video Definitions

20

Weighting Table

• JPEG standard weighting table for luminance, Q(u,v). In MPEG-1/2 the weighting tables can be varied. Different weighting tables are used for luminance and chrominance.

• After calculating the DCT coefficients, a weighting table is applied: each DCT coefficient is divided by the corresponding value (same u and v indices) in the weighting table.

• Weighting: the DCT coefficient table is divided element by element by the weighting table (different for luma and chroma) and the result is rounded to the nearest integer.

Page 21: DVF 2 Digital Image Video Definitions

21

Quantizing & Rounding
• The DCT itself is not lossy! But it enables lossy compression: the high-frequency components are weighted with (divided by) larger values than the low-frequency components, which in effect is low-pass filtering!

• As a result the detail in the picture is reduced, but so is the amount of data. Using the properties of the Human Visual System (HVS), this can be done so that the degradation of visual quality is only small while a very large reduction of data rate is achieved (of the order of 5:1).

• Weighting: the DCT coefficient table is divided element by element by the weighting table (different for luma and chroma) and the result is rounded to the nearest integer (a small sketch follows below).

• After weighting, the high-frequency components become even smaller and many are rounded to zero.
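A minimal sketch of the weighting/quantization step. The table Q below is only an illustrative placeholder (the real JPEG luminance table is a specific 8x8 matrix defined in the standard); fdct_8x8 refers to the sketch given earlier:

```python
import numpy as np

# Illustrative weighting table: larger divisors for higher spatial frequencies.
Q = 16 + 2 * (np.arange(8)[:, None] + np.arange(8)[None, :])

def quantize(F, Q):
    """Divide each DCT coefficient by the corresponding weighting value and round."""
    return np.round(F / Q).astype(int)

def dequantize(Fq, Q):
    """Decoder side (inverse quantization): multiply back by the weighting table."""
    return Fq * Q
```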

Page 22: DVF 2 Digital Image Video Definitions

22

Weighting Table Examples

Examples of JPEG standard weighting tables.
• In MPEG-2 the weighting tables can be varied at frame level; the tables then need to be transmitted with the video data.
• The size of the weighting-table values can be used to adjust the picture data rate.

Page 23: DVF 2 Digital Image Video Definitions

23

DCT example (1) DCT coefficient calculation and weighting

Page 24: DVF 2 Digital Image Video Definitions

24

DCT in detail – indexing (2)

Original scaled luminance data of an 8x8 pixel block, f(j,k)

Form of the matrix of resulting DCT coefficients, F(u,v); note the capital letter, as in the Fourier transform

Page 25: DVF 2 Digital Image Video Definitions

25

DCT coefficient calculation

The general DCT equation for JPEG and MPEG-1/2 reads:

F(u,v) = [C(u) C(v)] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16]

The scaling factors C(u) and C(v) are 1/√2 for u, v = 0, and 1 for u, v ≠ 0. E.g. when calculating F(0,7), we have C(0) = 1/√2 and C(7) = 1.

Example procedure for the calculation: when calculating ONE DCT coefficient F(u,v) we first set j = 0 and calculate the inner summation. We take the pixel value f(j,k) = f(0,0) [the top-left value in the pixel block] and multiply it by the two cosine terms with the given u and v values, with j = 0 and k first 0. This is repeated for k = 1…7 and all these values are summed together: that is the first inner summation, a sum of 8 terms. Next we set j = 1 and again calculate all 8 terms and sum them. This is repeated for all j = 0..7, i.e. 8 times. Then all eight inner sums are added together to get the sum of all 64 terms, and the result is multiplied by the scaling factor [C(u) C(v)] / 4. All this is repeated for each of the 64 coefficients of the 8x8 DCT coefficient matrix.

Written with brackets around the inner sum:

F(u,v) = [C(u) C(v)] / 4 * Σ(j=0..7) [ Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16] ]

Page 26: DVF 2 Digital Image Video Definitions

26

DCT coefficient calculation – DC coefficient

The DCT calculation of an 8x8 pixel block produces an 8x8 DCT coefficient matrix.

DC coefficient: this is the DCT coefficient F(0,0), in the upper-left corner of the DCT matrix. Starting from the general equation

F(u,v) = [C(u) C(v)] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16],
where C(u) and C(v) are 1/√2 for u, v = 0 and 1 for u, v ≠ 0,

for F(0,0) both u and v are 0, so both cosines are cos(0) = 1, and what is left in the inner sum is the sum of the pixel values f(j,k). Repeating this for all rows gives the sum of all pixel values, scaled:

F(0,0) = [1/√2 * 1/√2] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) = 1/8 * Σ(j=0..7) Σ(k=0..7) f(j,k)

i.e. F(0,0) is the sum of all pixel values in the block multiplied by 1/8. Pixel values are at most 127, so the maximum DC coefficient value is 1/8 * 64 * 127 = 1016. The DC coefficient represents the (scaled) average value of the block; the other coefficients represent the variation within the block.

Page 27: DVF 2 Digital Image Video Definitions

27

Calculation of coefficient for u or v=0 (1/2)

F(u,v) = [C(u) C(v)] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16]
The scaling factors C(u) and C(v) are 1/√2 for u, v = 0, and 1 for u, v ≠ 0.

For u = 0 or v = 0, one of the cosine terms becomes constant ( cos(0) = 1 ). This allows a simplification. E.g. if v = 0, we get:

F(u,0) = [C(u) C(0)] / 4 * Σ(j=0..7) Σ(k=0..7) f(j,k) * cos[(2j+1)uπ/16]

In the calculation of the inner sum, j is constant, so the cosine term is the same for all terms (and can be taken out as a common factor); we are left with the sum of the pixel values for k = 0..7, i.e. the sum of one column's pixel values multiplied by one cosine factor. This is repeated for all 8 rows (j = 0..7), and the sums are then added together and scaled.

Page 28: DVF 2 Digital Image Video Definitions

28

Calculation of coefficient for u or v=0 (2/2)

Mathematically we can exchange the summation order in the DCT equation (we calculate the same 64 terms, and the summation order does not matter), so:

F(u,v) = [C(u) C(v)] / 4 * Σ(k=0..7) Σ(j=0..7) f(j,k) * cos[(2j+1)uπ/16] * cos[(2k+1)vπ/16]

This becomes useful when calculating coefficients with u = 0:

F(0,v) = [C(0) C(v)] / 4 * Σ(k=0..7) Σ(j=0..7) f(j,k) * cos[(2k+1)vπ/16]

Again, in the calculation of the inner sum k is now constant, so the cosine term is the same for all terms (and can be taken out as a common factor); we are left with the sum of the pixel values for j = 0..7, i.e. the sum of one row's pixel values multiplied by one cosine factor. This is repeated for all 8 columns (k = 0..7), and the sums are then added together and scaled. When u and v are both ≠ 0, these shortcuts cannot be used.

Page 29: DVF 2 Digital Image Video Definitions

29

Review of DCT process

DCT coefficient calculation and weighting

1. Take an 8x8 pixel data block of Y, Cb or Cr values f(j,k).

2. Calculate the DCT coefficient matrix F(u,v), 8x8 values. This can be considered a two-dimensional (cosine) transform of the pixel data block, analogous to a Fourier transform. A calculation accuracy of around 14 bits is needed; values are rounded to the nearest integer.

3. Divide each value of the DCT matrix by the corresponding value in the weighting table and round to the nearest integer -> normalized and quantized DCT coefficient values. These are what is stored or transmitted in applications.

4. The receiver (decoder) does the reverse process, using the inverse DCT instead of the forward DCT. The IDCT equation has the same form as the forward DCT, with the roles of pixel values and coefficients swapped.

Page 30: DVF 2 Digital Image Video Definitions

30

Example of calculation of one DCT coefficient Calculation of F(3,0) = F(u,v) => u=3, v=0 Pixel data f(j,k)

Weighting Table Q(u,v)

After DCT process we get F(u,v) = 157.6, as rounded 158. This is divided by weighting table number Q(3,0) = 16 to give normalized and quantized DCT coefficient of Fq(3,0) = 2

The scaling factors: C(u) or C(v) = 1/√2 when the corresponding index (u or v) = 0, and 1 otherwise.

Page 31: DVF 2 Digital Image Video Definitions

31

Zig-Zag scanning, RLC, VLC coding

After the DCT and quantization, the remaining data is reduced further:

1. RLC = Run Length Coding. Runs of zeros are represented by the number of zeros followed by the next non-zero value. Instead of transmitting the zeros at the end of the block, an End Of Block (EOB) marker is inserted.

2. VLC = Variable Length Coding. The more frequently occurring combinations are coded with fewer bits.

Note: the zig-zag scan shown here is the one used in JPEG! (A small sketch of the scan and run-length step follows below.)
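A minimal sketch of the zig-zag scan plus run-length step for one quantized block; the scan order follows the usual JPEG convention (DC coefficient first, then anti-diagonals traversed in alternating direction), and the variable-length (Huffman) coding of the (run, value) pairs is left out:

```python
import numpy as np

def zigzag_indices(n=8):
    """(j, k) index pairs in zig-zag order: anti-diagonals, alternating direction."""
    return sorted(((j, k) for j in range(n) for k in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_encode(Fq):
    """Run-length code a quantized 8x8 block: (zero_run, value) pairs plus 'EOB'."""
    seq = [Fq[j][k] for j, k in zigzag_indices()]
    out, run = [], 0
    for v in seq[1:]:            # the DC coefficient seq[0] is usually coded separately (DPCM)
        if v == 0:
            run += 1
        else:
            out.append((run, int(v)))
            run = 0
    out.append("EOB")            # trailing zeros are replaced by the End Of Block marker
    return seq[0], out

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 12, 3, -2
print(run_length_encode(block))  # (12, [(0, 3), (3, -2), 'EOB'])
```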

Page 32: DVF 2 Digital Image Video Definitions

32

Zig-Zag scanning, RLC, VLC coding

Note: Here Zig-Zag scan is for JPEG !

Page 33: DVF 2 Digital Image Video Definitions

33

VLC coding

Page 34: DVF 2 Digital Image Video Definitions

34

What happens in the DCT?
• We calculate the F(u,v) coefficients.
• The top-left coefficient is F(0,0). Here u = v = 0, so both cosines in the DCT equation equal one, and by the DCT equation F(0,0) = 1/8 * (sum of all pixel values). This represents the average luminance over the whole block (for luminance; over the macroblock area for chrominance) multiplied by 8, so the range is −1024…+1016. F(0,0) is the DC coefficient of the block / macroblock (DC = Direct Current, compare with the Fourier transform).
• The AC coefficients (AC = Alternating Current) in turn measure the correlation of the pixel pattern of each row or column against the cosine wave with the given u and v values. If the picture varies according to this cosine pattern we get a large DCT coefficient value; if not, the value is small. If the variation is in antiphase, the coefficient is negative.
• Typically most of the AC coefficients are (close to) zero even before rounding. What does that mean? It means that the picture does not have a lot of fast variation. An even picture area of a single color gives just a DC coefficient for luma (Y) and for each of the two chrominance components (Cr and Cb): the matrix of 3 x 64 values is transformed into 3 values, and no error is made! This is because the spatial data is represented by frequency components, and for that kind of picture only the DC coefficients are non-zero. In such cases we achieve 64:1 compression without making any error!
• In quantization and rounding, however, we deliberately cause some error: in weighting we divide the usually small AC coefficients by relatively large values, which often yields zero after rounding to the nearest integer. The error introduced is a reduction of the high-frequency components; in effect this is low-pass filtering, where the detail of the picture is reduced.

Page 35: DVF 2 Digital Image Video Definitions

35

Illustration of the DCT-coefficients with DCT patterns

4x4 DCT basis patterns, used in Advanced Video Coding (AVC), MPEG-4 Part 10 / H.264.

8x8 DCT basis patterns

Page 36: DVF 2 Digital Image Video Definitions

36

DCT based compression process, continued
• Depending on the amount of detail in the picture, the DCT process gives a varying amount of data. To make things work over a fixed-rate transmission channel, a buffer is needed, together with feedback control that adjusts the weighting (quantization) parameters. E.g. for a picture with a lot of detail, larger quantization values would be used, resulting in reduced picture quality.

• Example might be sports programs where quality could be clearly degraded if not enough transmission capacity is available.

• In MPEG-2 the weighting tables can be varied on frame level. Naturally the weighting table values need to be transmitted within the data.

Page 37: DVF 2 Digital Image Video Definitions

37

Full chain of MPEG-1/2 compression

Page 38: DVF 2 Digital Image Video Definitions

38

DCT decoder

The inverse DCT has the same form as the forward DCT, with the DCT coefficients taking the place of the sample values – so the same hardware/software structure can be used for both operations.

First an inverse quantization is done: every DCT coefficient is multiplied by the corresponding weighting table value Q(u,v). The second step is the inverse DCT (a small sketch follows below).
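A minimal decoder-side sketch matching the fdct_8x8 and dequantize sketches given earlier; note that in the inverse direction the C(u)C(v) factors sit inside the sums, which is how the inverse formula differs in detail from the forward one:

```python
import numpy as np

def idct_8x8(F):
    """Inverse 8x8 DCT: reconstruct pixel values f(j,k) from (dequantized) coefficients F(u,v)."""
    F = np.asarray(F, dtype=float)
    f = np.zeros((8, 8))
    C = lambda i: 1 / np.sqrt(2) if i == 0 else 1.0
    for j in range(8):
        for k in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (C(u) * C(v) * F[u, v]
                          * np.cos((2 * j + 1) * u * np.pi / 16)
                          * np.cos((2 * k + 1) * v * np.pi / 16))
            f[j, k] = s / 4
    return f

# Round trip: idct_8x8(fdct_8x8(block)) reproduces the block up to floating-point rounding.
```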

Page 39: DVF 2 Digital Image Video Definitions

39

Errors in video compression

Errors can be calculated as differences between the original picture and the encoded-and-decoded video, as follows.

In practice the Root Mean Square Error (RMSE) and the Peak Signal to Noise Ratio (PSNR) are calculated.

Here the error is calculated for each pixel value, the RMS error is computed from these, and the result is scaled to the maximum value of 255 (for 8-bit video).
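A minimal sketch of the two error measures named above, for 8-bit frames (peak value 255):

```python
import numpy as np

def rmse_psnr(original, decoded, peak=255.0):
    """Root-mean-square error and peak signal-to-noise ratio between two frames."""
    diff = np.asarray(original, dtype=float) - np.asarray(decoded, dtype=float)
    rmse = np.sqrt(np.mean(diff ** 2))
    psnr = float("inf") if rmse == 0 else 20 * np.log10(peak / rmse)
    return rmse, psnr
```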

Page 40: DVF 2 Digital Image Video Definitions

40

JPEG-2000 compression: Wavelet (1) The wavelet transform is the newer transform technique used e.g. in JPEG 2000. The idea is based on successively decomposing the picture data into higher and lower frequency bands, i.e. forming spectral sub-bands.
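To illustrate the sub-band idea, a minimal sketch of one level of the simplest (Haar) 2D wavelet decomposition; JPEG 2000 itself uses longer 5/3 and 9/7 filter banks, so this only shows the band-splitting principle (even image dimensions assumed):

```python
import numpy as np

def haar_level(img):
    """One level of 2D Haar decomposition into LL, LH, HL, HH sub-bands."""
    img = np.asarray(img, dtype=float)
    a, b = img[:, 0::2], img[:, 1::2]           # split columns into pairs
    lo, hi = (a + b) / 2, (a - b) / 2           # horizontal low/high bands
    def vsplit(x):
        c, d = x[0::2, :], x[1::2, :]           # split rows into pairs
        return (c + d) / 2, (c - d) / 2
    LL, LH = vsplit(lo)
    HL, HH = vsplit(hi)
    return LL, LH, HL, HH                       # LL is fed to the next decomposition level
```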

Page 41: DVF 2 Digital Image Video Definitions

41

Wavelet (2)

First level decomposition

Sub-band structures in verification model

Refinement of decomposition

Page 42: DVF 2 Digital Image Video Definitions

42

Visualization of Wavelet Transform

Page 43: DVF 2 Digital Image Video Definitions

43

Picture after wavelet decomposition

Page 44: DVF 2 Digital Image Video Definitions

44

VIDEO DEFINITIONS

Page 45: DVF 2 Digital Image Video Definitions

45

Video is sequence of pictures

•The aspect ratio is the physical picture width / picture height. For normal TV this is 4:3 and for widescreen TV 16:9. The shape of the pixels can be nearly square (normal 4:3 TV) or clearly rectangular (widescreen TV), so the pixel aspect ratio does not need to be the same as the physical aspect ratio.

•Picture size can also be given as a physical size, e.g. 50 cm x 67 cm. Resolution is then sometimes defined as pixels/inch or pixels/cm. For printers the common definition is dots/inch (dpi).

In video we need to define the spatial resolution of one picture of the sequence and the temporal resolution, i.e. how many pictures per second are transmitted. The accuracy of the picture is called resolution, and it is defined as the number of horizontal and vertical picture elements (pixels). E.g. the resolution of a European (PAL) TV picture is 576x720 (576 lines, each line containing 720 pixels), so there are in total about 0.415*10^6 pixels = 0.415 megapixels.

Page 46: DVF 2 Digital Image Video Definitions

46

Progressive (p) and Interlaced video (i) •The other definition of video concerns repetition of the frames. Possibilities for transmission of frames are progressive mode (p) and interlaced mode (i)

•Frame rate is the number of full frames transmitted and displayed per second, unit frames/second (fps or f/s). In European (PAL) TV the frame rate is 25 fps, and in the North American TV system (NTSC) about 30 fps (29.97 fps, 60 fields/s).

•The video frames can be full frames containing all information of the frame. This is called progressive video, abbreviated p. E.g. marking 25p means progressive video with frame repetition rate of 25 fps.

•In interlaced video the frame is split into two fields: a top field containing e.g. the odd-numbered lines, i.e. lines 1, 3, 5…, and a bottom field containing the even-numbered lines, i.e. 2, 4, 6…. Fields are transmitted at twice the frame rate. In this way the visual repetition rate appears to the eye as 50 Hz (PAL), while full frames are produced at a frequency of 25 Hz.

Page 47: DVF 2 Digital Image Video Definitions

47

PAL- and NTSC-TV resolutions

PAL = European standard, also used in the Far East. NTSC: North America and Japan. PAL and NTSC differ in the number of lines, the frame rate, and in how the color signal is modulated. Different versions of PAL differ mainly in the sound carrier frequency. The digital sampling frequency is the same for both, 13.5 MHz, and both have the same number of active samples per line (720), so no horizontal interpolation is needed in conversion, but careful design of the line blanking is needed. Conversion remains a problem for the consumer: vertical interpolation (or equivalent) is needed because the line counts differ.

Widescreen TV (aspect ratio 16:9) uses the same 720 horizontal pixels as 4:3 TV, but the pixels are stretched (non-square): pixel aspect ratio 1.422 vs. 1.067 in 4:3 TV.

[Figure: PAL and NTSC raster structures]
NTSC: 525 total lines, of which 480 lines of picture information; 59.94 fields/s = 29.97 fps; 858 total samples per line, 720 active pixels per line; aspect ratio 4:3 (clean aperture pixel array 708x480).
PAL: 625 total lines, of which 576 lines of picture information; 50 fields/s = 25 fps; 864 total samples per line, 720 active pixels per line; aspect ratio 4:3 (clean aperture pixel array 690x566).

Page 48: DVF 2 Digital Image Video Definitions

48

ITU-R BT.601-5 specification for NTSC (30 Hz) and PAL (25 Hz) video

• Uncompressed SD-quality digital video has a bit rate of 216 Mb/s. This does not fit through any ordinary transmission channel, so reduction of the data is very necessary.

• HD video in uncompressed format takes > 1 Gb/s. Very efficient compression is key for making this practical for consumers.

• BUT: Technologies now exist to do this – and at low cost !

Page 49: DVF 2 Digital Image Video Definitions

49

Video bitrate and compression
• Video is a sequence of multiple pictures. The picture (or frame) rate is typically 10-30 frames per second (fps).
• In PAL TV the frame rate is 25 fps and the resolution 576x720 pixels (the eye sees this as continuous motion). Low-quality videoconferencing uses some 10-15 fps (jerky picture) at a resolution of 176x144 pixels (QCIF).
• Consider TV resolution. The amount of picture data is 576*720 pixels x 3 x 8 bits (each RGB color component coded with 8 bits) = 0.415 Mpixels x 24 bits/pixel = 9,953,280 bits ≈ 1.244 MB.
• When this is repeated at 25 fps the bitrate is 1.244 MB * 25 /s = (1.244 * 8) * 25 ≈ 250 Mbit/s (see the sketch after this list).
• This is huge, so compression is necessary. The current technology is MPEG-2 (MPEG = Moving Picture Experts Group) compression in the digital TV system (and also on DVD). It compresses this data to less than 5 Mb/s – a ratio of about 50:1!
• Improvements in compression (to about 100:1) have been obtained with newer codecs such as MPEG-4 AVC (= H.264) and Windows Media. Note, however, that compression efficiency is not the only variable: e.g. the processor power needed for compression and decoding (decompression) is also an important parameter for mobile devices.
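As a numeric check of the figures above, a small sketch: the ~250 Mb/s figure follows from the active 576x720 RGB picture repeated 25 times per second, and the 216 Mb/s figure (quoted on the previous slide) from the ITU-R BT.601 4:2:2 sampling structure of 27 MHz total sample rate at 8 bits per sample:

```python
# Uncompressed PAL SD bitrates, counted two ways.

# 1) Active-picture RGB data, as on the slide: 576 x 720 pixels, 24 bits, 25 fps.
bits_per_frame = 576 * 720 * 24
rgb_bitrate = bits_per_frame * 25 / 1e6
print(f"active RGB picture: {bits_per_frame} bits/frame -> {rgb_bitrate:.0f} Mb/s")  # ~249 Mb/s

# 2) ITU-R BT.601 4:2:2 component video: 13.5 MHz luma + 2 x 6.75 MHz chroma, 8 bits/sample.
bt601_bitrate = (13.5e6 + 2 * 6.75e6) * 8 / 1e6
print(f"BT.601 4:2:2 stream: {bt601_bitrate:.0f} Mb/s")                               # 216 Mb/s

# MPEG-2 digital TV squeezes this to < 5 Mb/s, i.e. a compression ratio of roughly 50:1.
print(f"compression ratio at 5 Mb/s: ~{rgb_bitrate / 5:.0f}:1")
```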

Page 50: DVF 2 Digital Image Video Definitions

50

Digital Video Compression – key enabler for digital video applications.

• Reference point: Uncompressed video in PAL Standard TV resolution (SD) is 216 Mb/s and High Definition video about 1.2 Gbit /s !

• Thus efficient compression of video without major degradation of quality is key for applications.

• Picture quality vs. compression ratio – basic estimates:

– 10:1 20 Mbits High definition (MPEG-2)

• 20:1 10 Mbits Enhanced definition.

• 40:1 5 Mbits PAL Digital TV, and DVD (MPEG-2)

• 100:1 2 Mbits VHS-quality

• Newer technologies, e.g. H.264, Microsoft Windows Media, Apple QuickTime and Real Video, achieve even better compression efficiencies.

• In compression, efficiency and cost of hardware, battery power consumption, quality of video and transfer bitrate /required storage space are related. In general the older methods (MPEG-1/2), are not so efficient, but playback also works on older computers with less CPU power.

• In audio the compression ratios are around 10:1 (e.g. MP3). The standard stereo formats used today in digital TV and on DVD are MPEG-1 Layer II (224 kb/s) and Dolby Digital (AC-3) at around 192 kb/s. Some newer technologies (MPEG-4 audio, Windows Media etc.) achieve even better efficiencies.

• However, for wireless transmission high compression ratio is the key thing due to the limited bandwidth of the transmission channel.

Page 51: DVF 2 Digital Image Video Definitions

51

Video compression - workflow and tools

Key things: spatial and temporal correlation of the picture data; flow of motion -> motion vectors

Video -> Encoding -> Transmission or storage -> Decoding -> ~original video

Page 52: DVF 2 Digital Image Video Definitions

52

VIDEO COMPRESSION STEPS
1. Data reduction by colour sub-sampling. Because of the human visual system, the resolution of the colour signals needs to be only roughly 50 % of that of the luminance signal. The common format is 4:2:0, which takes studio video of approximately 216-249 Mb/s down to about 124 Mb/s: 2:1 compression.

2. Data reduction by intraframe compression using the Discrete Cosine Transform (DCT): 124 Mb/s -> approximately 25 Mb/s (the data rate of DV video): 5:1 compression.

3. Data reduction using the similarity between successive frames: e.g. in PAL MPEG-2 a Group Of Pictures (GOP) block of 12 frames is coded using motion estimation and interpolation. Data is reduced from approx. 25 Mb/s -> 5 Mb/s: 5:1 compression.

• Total compression 2 * 5 * 5 : 1 = 50:1!
• Newer compression tools (MPEG-4 Part 10 / H.264, Windows Media, Real Media, QuickTime) achieve even better efficiency, up to about 100:1.

Page 53: DVF 2 Digital Image Video Definitions

53

Temporal and spatial video coding

• Spatial compression (Intraframe) is based on Discrete Cosine Transformation (DCT) in most video standards

• Temporal compression is based on motion estimation and transmission of motion vectors

• The coding process for MPEG-1 and MPEG-2 is in many respects similar to that of JPEG: the same DCT process, quantization, RLC and VLC. For video, temporal redundancy coding by motion estimation is the main new element.

Page 54: DVF 2 Digital Image Video Definitions

54

Chroma (color) sub-sampling

4:2:2 video (studio video); 4:2:0 video (European digital TV system, DVD video disc, DV tape)

In colour (chroma) sub-sampling the chroma signals are sampled at a lower frequency (i.e. there are fewer chroma pixels). The most used systems are:
4:4:4  Four Cr and Cb samples for every 4 Y samples, i.e. no chroma sub-sampling. Best quality for the most demanding studio use.
4:2:2  Two Cr and Cb samples for every 4 Y samples. This is the normal system for much studio work.
4:2:0  One Cr and Cb sample for every 4 Y samples (the name is odd for historical/technical reasons). Used in DV video compression, digital TV and on DVD video discs. Also recommended for JPEG digital still images.

Page 55: DVF 2 Digital Image Video Definitions

55

CODING OF COMPOSITE VIDEO SIGNALS AND CHROMA SUBSAMPLING STRUCTURES

[Figure: Y, Cr and Cb sampling grids for each structure]

4:4:4  For each four Y sample points, four Cr and Cb sample points
4:2:2  For each four Y sample points, two Cr and Cb sample points
4:1:1  For each four Y sample points, one Cr and Cb sample point
4:2:0  For each four Y sample points, two Cr or Cb sample points in turn; the Cr and Cb sample points are taken from the underlying (alternating) lines

Page 56: DVF 2 Digital Image Video Definitions

56

Intraframe compression - DCT

• The first key thing in video compression is Intraframe Compression, i.e. reducing picture data within one video frame.

• Technology Used is Discrete Cosine Transform (DCT). This is the same that is used for JPEG.

• The difference in the MPEG methods is that different weighting tables and a different zig-zag scan may be used.

• Video compression systems also generally include a feedback system for adjusting quantization level to maintain constant bitrate.

• For DCT – see previous slides on the DCT process

Page 57: DVF 2 Digital Image Video Definitions

57

Removing Temporal Redundancy
• The next step in compression: reduction of the temporal variation between successive frames.
• Idea: in video the successive pictures often change relatively little. We therefore transmit only the first full frame, and for the following frames only the change is transmitted.
• In MPEG-2 this is typically done in groups of 12 frames for PAL (15 frames for NTSC). Such a group is called a Group of Pictures (GOP) and corresponds to about 0.5 seconds of video.
• In Windows Media the GOP length can be several seconds, e.g. 3-5 seconds.
• What do we lose? In quality theoretically nothing, since the differences to the previous pictures can be transmitted nearly perfectly. However, the associated motion search takes a lot of computing power.
• But: if there are transmission errors, several of the pictures in a GOP will be degraded, resulting in severe degradation of picture quality. This gets worse with longer GOPs (compare Windows Media and MPEG).
• To reduce these effects, very strong error correction is needed in the transmission system: a Quasi Error Free (QEF) channel is needed (BER ~ 10^-12), resulting in < 1 error per hour.

Page 58: DVF 2 Digital Image Video Definitions

58

Temporal data reduction

[Figure: Frame 1 and Frame 2 of a sequence, and their difference image]

• Successive video frames typically differ from each other only to some extent. The biggest change is often movement, i.e. parts of the picture have moved to different positions.

• The example shows that simply subtracting successive pictures results in a dramatically reduced amount of data.

• This process is called Differential PCM (DPCM). Many modern codecs use a combination of DPCM and DCT to achieve good coding efficiency (MPEG-1/2, MPEG-4 etc.), and add a technique called motion estimation to improve efficiency further.

Page 59: DVF 2 Digital Image Video Definitions

59

Differential PCM for video: calculate the difference between successive frames; the difference is transmitted. Straightforward, but on its own it does not meet today's efficiency requirements. However, DPCM is used for the DC coefficients of successive macroblocks. (A small sketch follows below.)
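A minimal sketch of the frame-differencing (DPCM) idea described above, for 8-bit luminance frames:

```python
import numpy as np

def frame_difference(prev_frame, curr_frame):
    """DPCM between frames: only the per-pixel difference is coded/transmitted."""
    return np.asarray(curr_frame, dtype=np.int16) - np.asarray(prev_frame, dtype=np.int16)

def reconstruct(prev_frame, difference):
    """Decoder: add the received difference back to its copy of the previous frame."""
    return (np.asarray(prev_frame, dtype=np.int16) + difference).astype(np.uint8)
```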

Page 60: DVF 2 Digital Image Video Definitions

60

Motion estimation
• Motion estimation is the key process for temporal data reduction; it takes advantage of the similarity of successive video frames.
• The idea of motion estimation is to form a new picture based on the previous picture, in which every macroblock (16x16 pixels in MPEG-1/2) has been moved to its new position in the current picture. This forms an improved estimate (prediction) of the new picture.
• Motion estimation is applied to the luminance signal only; the same movement is assumed for chroma.
• The process results in motion vectors for every macroblock, indicating the position of each macroblock in the current frame relative to its position in the previous frame. The prediction is then subtracted from the current frame, and the residual is transmitted (in the form of DCT coefficients) together with the motion vectors to the receiver.

Page 61: DVF 2 Digital Image Video Definitions

61

Model for DPCM / DCT video compression with motion compensation

This is the main method used e.g. in MPEG-1/2 and MPEG-4.
Step 1: Intra-frames every n frames (typically every 12th frame in PAL video)
Step 2: Motion estimation for each macroblock -> prediction of the next frame
Step 3: Subtract the prediction from the current frame; the residual contains only little information
Step 4: Apply the DCT to the residual
Step 5: Transmit the residual as DCT coefficients, together with the motion vectors
The decoder does the reverse.

Page 62: DVF 2 Digital Image Video Definitions

62

Motion vector search
• An effective search for motion is a key element of a codec.

• Effectiveness depends on the search area and on the accuracy of the motion estimation.

• As an example, in MPEG-2 the search area can be 64x64 pixels for a 16x16 pixel macroblock, and the accuracy of motion estimation is 0.5 pixels (half-pel).

• Motion estimation can be based on block matching, where the current block is compared with all possible search positions in the previous frame.

• The RMS error at each position is calculated, and the best position is the one with the smallest error (a simplified sketch follows below).

• Intelligence in motion estimation is a differentiating factor between codecs.
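A minimal sketch of exhaustive block matching as described above; for simplicity it uses the sum of absolute differences (SAD) instead of the RMS error mentioned on the slide, integer-pixel accuracy only, and a ±8 pixel search window (MPEG-2 encoders use larger windows and half-pixel refinement):

```python
import numpy as np

def best_match(prev_frame, block, top, left, search=8):
    """Find the motion vector (dy, dx) into prev_frame that best predicts the
    16x16 luminance block located at (top, left) in the current frame."""
    prev = np.asarray(prev_frame, dtype=int)
    blk = np.asarray(block, dtype=int)
    h, w = prev.shape
    bh, bw = blk.shape
    best_vec, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > h or x + bw > w:
                continue                          # candidate falls outside the previous frame
            sad = np.abs(prev[y:y + bh, x:x + bw] - blk).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad
```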

Page 63: DVF 2 Digital Image Video Definitions

63

Types of MPEG Video Frames

• Intra frames (I): compression is based on the DCT; only the one frame itself is used. These frames take the most space: e.g. in PAL digital TV, if the video consisted only of I-frames, the bitrate would be about 25 Mb/s.

• Predicted frames (P): these frames are based on motion estimation. The prediction is based on the previous I or P frame.

• Bidirectional frames (B): compression is based on motion estimation (a kind of interpolation) between the previous and the next I or P frame. These frames take the least space.

• Combined in a 12-picture Group of Pictures (GOP), this results in a bitrate of about 5 Mb/s in PAL digital TV.

Page 64: DVF 2 Digital Image Video Definitions

64

Group of Pictures (GOP) structures in MPEG-2 •In MPEG-2 broadcast video and DVD pictures are compressed in groups. One such group is called Group Of Pictures (GOP). Typically there are 12 pictures in the GOP for PAL and 15 for NTSC.

•The first frame is always an intra frame (I). The successive frames are based on predictions from the intra frame (P or B pictures).

•There are several possibilities for the GOP structures depending on many factors. Typical for PAL TV/DVD is 12 frame GOP:

IBB PBB PBB PBB

Page 65: DVF 2 Digital Image Video Definitions

65

DCT based compression process, continued
• Depending on the amount of detail in the picture, the DCT process gives a varying amount of data. To make things work over a fixed-rate transmission channel, a buffer is needed, together with feedback control that adjusts the weighting (quantization) parameters. E.g. for a picture with a lot of detail, larger quantization values would be used, resulting in reduced picture quality.

• Example might be sports programs where quality could be clearly degraded if not enough transmission capacity is available.

• In MPEG-2 the weighting tables can be varied on frame level. Naturally the weighting table values need to be transmitted within the data.

Page 66: DVF 2 Digital Image Video Definitions

66

Full chain of MPEG-1/2 compression

Page 67: DVF 2 Digital Image Video Definitions

67

Overview of Digital Video Standards

Copyright © Jouko Kurki, 2005-2006

Based on: Michael Robin and Michael Poulin, Digital Television Fundamentals, 2nd ed., 2000, ISBN 0-07-135581-2; Iain E.G. Richardson, H.264 and MPEG-4 Video Compression, Wiley, England, 2003, ISBN 0-470-84837-5; Jerry D. Gibson (ed.), Multimedia Communications, Academic Press, 2001, ISBN 0-12-282160-2; information from the Web and other documents.

Page 68: DVF 2 Digital Image Video Definitions

68

Summary of TV Monitor (CIF) and Computer Monitor (VGA) Resolution Formats

CIF formats (no. of pixels):
Sub-QCIF  128 x 96
QCIF      176 x 144
CIF       352 x 288

VGA formats (no. of pixels):
QQVGA     160 x 120
QVGA      320 x 240
VGA       640 x 480
SVGA      800 x 600
XGA       1024 x 768
SXGA      1280 x 1024
UXGA      1600 x 1200
HDTV      1920 x 1080
QXGA      2048 x 1536

• The size of the picture determines the amount of data, so together with the compression scheme it defines the needed transmission bitrate.

• The old PC monitor had a pixel size of 640x480 and an aspect ratio of 4:3. This still serves as the basis for many NTSC video applications. In Europe (the PAL world) the numbers are slightly different for TV and many video applications.

• Above is a summary of pixel sizes for many popular formats. Note that for TV (CIF) the numbers may differ between North America and Europe.

• CIF = Common Intermediate Format: 352 x 288 picture size, with a 30 fps frame rate for video. Used e.g. for videoconferencing. 4CIF = 2x pixel count in both dimensions -> 704 x 576 (approximately the digital TV resolution of 720 x 576); Quarter CIF (QCIF) = ½ pixel count in both dimensions.

Page 69: DVF 2 Digital Image Video Definitions

69

H.261 and H.263 (ITU-T) video standards

• H.261: the first widely used standard for videoconferencing over the circuit-switched ISDN network. Bitrate n x 64 kb/s. Hybrid DPCM/DCT compression model with integer-accuracy motion compensation.

• H.263: better compression than H.261; supports basic-quality video at around 30 kb/s. Designed to operate over circuit- and packet-switched networks. Uses a hybrid DPCM/DCT compression model with half-pixel motion compensation.

• The baseline H.263 coding model was adopted as the core of the MPEG-4 Visual Simple profile.

• There are also H.263+ and H.263++ standards with enhanced characteristics (see Richardson).

Page 70: DVF 2 Digital Image Video Definitions

70

MPEG-1 and MPEG-2

• Use DCT for compression. The process is basically the same for JPEG still images and MPEG-1/2 video.

• Good compression efficiency with low processor power. • MPEG-2 dominating in Digital TV, DVD, and High Definition TV (HDTV)

• However, newer compression standards, like MPEG-4 Part 2 and Part 10, achieve better compression and have other properties, like error resilience, digital rights management etc., that make them better suited to mobile applications.

• Due to better compression efficiency MPEG-4 may become the standard also for HDTV.

Page 71: DVF 2 Digital Image Video Definitions

71

MPEG Set of Standards

MPEG = Moving Picture Experts Group. MPEG standards are ISO standards (ISO = International Organization for Standardization).

MPEG-1: CD-ROM storage compression standard (Video CD). Stereo audio standards: MPEG-1 Layer 1 (high quality), Layer 2 (audio transmission over networks; DVD and digital TV), Layer 3 (= MP3; music delivery for many applications).

MPEG-2: DVB (Digital Video Broadcasting) and DVD (Digital Versatile Disc) compression standard. Also a profile for HDTV, ~20 Mb/s with MPEG-2.

MPEG-3: Originally meant for HDTV, but HDTV was made part of MPEG-2.

MPEG-4: Efficient object-based video compression technology for natural and synthetic audio and video. Audio and video streaming and complex media manipulation. Great promise for mobile multimedia. Scales from 30 kb/s up to HDTV quality (HDTV ~8 Mb/s with MPEG-4).

MHEG-5: Multimedia/hypermedia standard (used e.g. in digital TV set-top boxes).

MPEG-7: Standard for content identification.

MPEG-21: Network quality, content quality, conditional access rights (a multimedia umbrella standard).

Page 72: DVF 2 Digital Image Video Definitions

72

MPEG-4 Part 10 and H.264

• MPEG-4 Part 10 is the newest addition to MPEG-4, standardized in 2003.

• Achieves very high compression efficiency for Natural Video

• Potential applications in Mobile / Wireless / 3G

• Also strongest proposal for Mobile-TV using DVB-H as a carrier

• H.264 is the ITU-T name for the same standard (jointly published as MPEG-4 Part 10 / AVC); the first version of the standard defines 3 profiles.

• The H.264 video format is 4:2:0. It supports progressive and interlaced video.

Page 73: DVF 2 Digital Image Video Definitions

73

Some compressed video bitrates and applications

MPEG-1
  Media: hard disk, CD-ROM, tape etc.
  Bitrate: 1.5 Mb/s
  Typical storage capacity: a 1-2 hr movie on a 600 MB CD-ROM
  Applications: backup of VHS-quality video, training videos, video clips on the web

MPEG-2
  Media: hard disk, DVD disc, tape, flash memory
  Bitrate: typically 3-7 Mb/s
  Typical storage capacity: a 1-2 hr movie on a 4.7 GB DVD disc
  Applications: high-quality movie distribution via DVD and digital TV broadcasting, HDTV (20 Mb/s)

DV / HDV
  Media: DV cassette, 60 min (small) or 180 min (large); flash memory
  Bitrate: 25 Mb/s
  Typical storage capacity: 1 / 2 hrs of video on a small / large DV cassette
  Applications: video camera shooting and editing format; broadcast news gathering, corporate/business use, home videos, education etc. HDV is the consumer high-definition format.

H.264 / MPEG-4 Part 10
  Media: hard disk, DVD disc, tape, flash memory
  Bitrate: 2 Mb/s for SD and 8 Mb/s for HD video; 0.3 Mb/s for mobile video
  Typical storage capacity: one HDTV movie on an HD DVD (15/30 GB) or Blu-ray (25/50 GB) single/dual-layer disc
  Applications: future HDTV broadcasting format, news gathering, mobile video format; format for high-definition video discs