
UNIVERSITY OF JOENSUU
Department of Computer Science

Lecture notes:
IMAGE COMPRESSION

Pasi Fränti

Abstract: The course introduces image compression methods for binary, gray-scale, color-palette, true-color and video images. Topics include Huffman coding, arithmetic coding, Golomb coding, run-length modeling, predictive modeling, statistical modeling, context modeling, progressive image decompression, transform-based modeling, wavelets, vector quantization, and fractal-based compression. Existing and forthcoming standards such as JBIG1, JBIG2, JPEG, JPEG-LS and JPEG-2000 will be covered. The main emphasis is on the compression algorithms in these standards.

Joensuu, 9.9.2002


TABLE OF CONTENTS:

IMAGE COMPRESSION
"A picture takes more than a thousand bytes"

1. INTRODUCTION
   1.1 Image types
   1.2 Lossy versus lossless compression
   1.3 Performance criteria in image compression

2 FUNDAMENTALS IN DATA COMPRESSION
   2.1 Modelling
   2.2 Coding

3 BINARY IMAGES
   3.1 Run-length coding
   3.2 READ code
   3.3 CCITT group 3 and group 4 standards
   3.4 Block coding
   3.5 JBIG
   3.6 JBIG2
   3.7 Summary of binary image compression algorithms

4 CONTINUOUS TONE IMAGES
   4.1 Lossless and near-lossless compression
   4.2 Block truncation coding
   4.3 Vector quantization
   4.4 JPEG
   4.5 Wavelet
   4.6 Fractal coding

5 VIDEO IMAGES

LITERATURE

APPENDIX A: CCITT TEST IMAGES

APPENDIX B: GRAY-SCALE TEST IMAGES


1. Introduction

The purpose of compression is to code the image data into a compact form, minimizing both the number of bits in the representation and the distortion caused by the compression. The importance of image compression is emphasized by the huge amount of data in raster images: a typical gray-scale image of 512×512 pixels, each represented by 8 bits, contains 256 kilobytes of data. With color information, the number of bytes is tripled. For video at 25 frames per second, even one second of color film requires approximately 19 megabytes of memory, so a typical PC hard disk (540 MB) can store only about 30 seconds of film. Thus, the necessity for compression is obvious.

There exist a number of universal data compression algorithms that can compress almost any kind of data, of which the best known is the family of Ziv-Lempel algorithms. These methods are lossless in the sense that they retain all the information of the compressed data. However, they do not take advantage of the 2-dimensional nature of image data. Moreover, only a small portion of the data can be saved by a lossless compression method, and thus lossy methods are more widely used in image compression. The use of lossy compression is always a trade-off between the bit rate and the image quality.

1.1 Image types

From the compression point of view, the images can be classified as follows:

· binary images
· gray-scale images
· color images
· video images

An illustration of this classification is given in Figure 1.1. Note that the groups of gray-scale, color, and video images are closely related to each other, but there is a gap between the binary images and the gray-scale images. This reflects the separation of the compression algorithms. The methods designed for gray-scale images can also be applied to color and video images. However, they usually do not apply to binary images, which are a distinct class of images from this point of view.

For comparison, Figure 1.1 also shows the class of textual data. The fundamental difference between images and e.g. English text is the 2-dimensionality of the image data. Another important property is that the gray scales are ordered, which is not true for English text. It is not evident that any two subsequent symbols, e.g. 'a' and 'b', are close to each other, whereas the gray scales 41 and 42 are. These properties distinguish image data from other data like English text.

Note also that the class of color-palette images appears on the borderline between image data and non-image data. This reflects the lack of an ordered alphabet in color-palette images, which makes them closer to other data. In fact, color-palette images are often compressed by universal compression algorithms; see Section 4.7.


Figure 1.1: Classification of the images from the compression point of view.

1.2 Lossy versus lossless compression

A compression algorithm is lossless (or information preserving, or reversible) if the decompressed image is identical to the original. Correspondingly, a compression method is lossy (or irreversible) if the reconstructed image is only an approximation of the original one. The information preserving property is usually desirable, but it is not obligatory for all applications.

The motivation for lossy compression originates from the inability of lossless algorithms to produce bit rates as low as desired. Figure 1.2 illustrates typical compression performance for different types of images and compression. As one can see from the example, the situation is significantly different for binary and gray-scale images. In binary image compression, very good compression results can be achieved without any loss in image quality. On the other hand, the results for gray-scale images are much less satisfactory. This deficiency is emphasized by the large amount of original image data compared to a binary image of equal resolution.


Figure 1.2: Example of typical compression performance.

The fundamental question of lossy compression techniques is where to lose information. The simplest answer is that information should be lost wherever the distortion is least. This depends on how we define distortion. We will return to this matter in more detail in Section 1.3.

The primary use of images is for human observation. Therefore it is possible to take advantage of the limitations of the human visual system and lose some information that is less visible to the human eye. On the other hand, the desired information in an image may not always be visible to the human eye. To discover the essential information, an expert in the field and/or in image processing and analysis may be needed, cf. medical applications.

In the definition of lossless compression, it is assumed that the original image is in digital form. However, one must always keep in mind that the actual source may be an analog view of the real world. Loss of image quality may therefore already take place in the digitization, where the picture is converted from an analog signal to digital form. This can be performed by an image scanner, a digital camera, or any other suitable technique.

The principal parameters of digitization are the sampling rate (or scanning resolution) and the accuracy of the representation (bits per sample). The required resolution (relative to the viewing distance) depends on the purpose of the image. One may want to view the image as a whole, but the observer may also want to enlarge (zoom) the image to see the details. The characteristics of the human eye therefore cannot be utilized unless the application in question is definitely known.

Here we will ignore the digitization phase and assume that the images are already stored in digital form. These matters, however, should not be ignored when designing an entire image processing application. It is still worth mentioning that while lossy methods seem to be the mainstream of research, there is still a need for lossless methods, especially in medical imaging and remote sensing (i.e. satellite imaging).


1.3 Performance criteria in image compression

The aim of image compression is to transform an image into compressed form so that the information content is preserved as much as possible. Compression efficiency is the principal parameter of a compression technique, but it is not sufficient by itself. It is simple to design a compression algorithm that achieves a low bit rate; the challenge is to preserve the quality of the reconstructed image at the same time. The two main criteria for measuring the performance of an image compression algorithm are thus compression efficiency and the distortion caused by the compression algorithm. The standard way to measure them is to fix a certain bit rate and then compare the distortion caused by different methods.

The third feature of importance is the speed of the compression and decompression process. In on-line applications the waiting times of the user are often critical factors. In the extreme case, a compression algorithm is useless if its processing time causes an intolerable delay in the image processing application. In an image archiving system one can tolerate longer compression times if the compression can be done as a background task. However, fast decompression is usually desired.

Among other interesting features of compression techniques we may mention robustness against transmission errors and the memory requirements of the algorithm. The compressed image file is normally the object of a data transmission operation. In the simplest case the transmission is between internal memory and secondary storage, but it can as well be between two remote sites via transmission lines. Since data transmission systems commonly contain fault-tolerant internal data formats, this property is not always obligatory. The memory requirements are often of secondary importance; however, they may be a crucial factor in hardware implementations.

From the practical point of view, the last but often not least feature is the complexity of the algorithm itself, i.e. the ease of implementation. The reliability of the software often depends highly on the complexity of the algorithm. Let us next examine how these criteria can be measured.

Compression efficiency

The most obvious measure of the compression efficiency is the bit rate, which gives the average number of bits per stored pixel of the image:

\text{bit rate} = \frac{C}{N} \quad \text{(bits per pixel)} \qquad (1.1)

where C is the number of bits in the compressed file, and N (=X×Y) is the number of pixels in the image. If the bit rate is very low, compression ratio might be a more practical measure:

\text{compression ratio} = \frac{N \cdot k}{C} \qquad (1.2)

where k is the number of bits per pixel in the original image. The overhead information (header) of the files is ignored here.
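To make Equations (1.1) and (1.2) concrete, the following Python sketch computes both measures for a compressed file; the function names and the example figures are ours, chosen for illustration only:

def bit_rate(compressed_bytes, width, height):
    """Bits per stored pixel: C / N, with C in bits and N = X*Y pixels (Eq. 1.1)."""
    return compressed_bytes * 8 / (width * height)

def compression_ratio(compressed_bytes, width, height, k=8):
    """Original size (N*k bits) divided by compressed size C (Eq. 1.2)."""
    return (width * height * k) / (compressed_bytes * 8)

# A 512x512, 8-bit image compressed to 32768 bytes:
# bit rate 1.0 bits per pixel, compression ratio 8.0.
print(bit_rate(32768, 512, 512), compression_ratio(32768, 512, 512))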


Distortion

Distortion measures can be divided into two categories: subjective and objective measures. A distortion measure is said to be subjective if the quality is evaluated by humans. The use of human analysts, however, is quite impractical and therefore rare. The weakest point of this method is the subjectivity in the first place: it is impossible to establish a single group of humans (preferably experts in the field) whom everyone could consult to get a quality evaluation of their pictures. Moreover, the definition of distortion depends highly on the application, i.e. the best quality evaluation is not always made by people at all.

In the objective measures the distortion is calculated as the difference between the original and the reconstructed image by a predefined function. It is assumed that the original image is perfect. All changes are considered occurrences of distortion, no matter how they appear to a human observer. The quantitative distortion of the reconstructed image is commonly measured by the mean absolute error (MAE), the mean square error (MSE), and the peak signal-to-noise ratio (PSNR):

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - x_i \right| \qquad (1.3)

\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - x_i \right)^2 \qquad (1.4)

\mathrm{PSNR} = 10 \cdot \log_{10} \frac{255^2}{\mathrm{MSE}}, \quad \text{assuming } k=8. \qquad (1.5)

where x_i is the value of pixel i in the original image and y_i the corresponding value in the reconstructed image.

These measures are widely used in the literature. Unfortunately these measures do not always coincide with the evaluations of a human expert. The human eye, for example, does not observe small changes of intensity between individual pixels, but is sensitive to the changes in the average value and contrast in larger regions. Thus, one approach would be to calculate the mean values and variances of some small regions in the image, and then compare them between the original and the reconstructed image. Another deficiency of these distortion functions is that they measure only local, pixel-by-pixel differences, and do not consider global artifacts, like blockiness, blurring, or the jaggedness of the edges.
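As a concrete reference, the three measures can be computed with a few lines of Python; this is our own NumPy sketch, assuming 8-bit images, and it naturally inherits all the deficiencies just discussed:

import numpy as np

def distortion_measures(original, reconstructed, k=8):
    """MAE, MSE and PSNR between two equal-sized images (Eqs. 1.3-1.5)."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(reconstructed, dtype=np.float64)
    mae = np.mean(np.abs(y - x))
    mse = np.mean((y - x) ** 2)
    peak = 2 ** k - 1                       # 255 for k = 8
    psnr = 10 * np.log10(peak ** 2 / mse)   # undefined when mse == 0
    return mae, mse, psnr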


2 Fundamentals in data compression

Data compression can be seen as consisting of two separate components: modelling and coding. Modelling in image compression addresses the following issues:

· How, and in what order, is the image processed?
· What are the symbols (pixels, blocks) to be coded?
· What is the statistical model of these symbols?

The coding consists merely of the selection of the code table: which codes will be assigned to the symbols to be coded. The code table should match the statistical model as well as possible to obtain the best possible compression. The key idea of coding is to apply variable-length codes so that more frequent symbols are coded with fewer bits than the less frequent symbols. The only requirement of a code is that it is uniquely decodable, i.e. any two different input files must result in different code sequences.

A desirable (but not necessary) property of a code is the so-called prefix property, i.e. no codeword of any symbol may be a prefix of the codeword of another symbol. The consequence is that such codes are instantaneously decodable: a symbol can be recognized from the code stream right after its last bit has been received. Well-known prefix codes are the Shannon-Fano and Huffman codes. They can be constructed empirically on the basis of the source. Another coding scheme, known as the Golomb-Rice codes, is also a prefix code, but it presumes a certain distribution of the source.

Coding is usually considered the easy part of compression. This is because the coding can be done optimally (with respect to the model) by arithmetic coding! It is optimal not only in theory but also in practice, no matter what the source is. Thus the performance of a compression algorithm depends on the modelling, which is the key issue in data compression. Arithmetic coding, on the other hand, is sometimes replaced by sub-optimal codes like Huffman coding (or another coding scheme) because of practical aspects, see Section 2.2.2.

2.1 Modelling

2.1.1 Segmentation

The models of most lossless compression methods (for both binary and gray-scale images) are local in the way they process the image. The image is traversed pixel by pixel (usually in row-major order), and each pixel is coded separately. This makes the model relatively simple and practical (small memory requirements). On the other hand, such compression schemes are limited to the local characteristics of the image.

At the other extreme there are global modelling methods. Fractal compression techniques are an example of modelling methods of this kind. They decompose the image into smaller parts which are described as linear combinations of the other parts of the image, see Figure 2.1. Global modelling is somewhat impractical because of the computational complexity of the methods.


Block coding is a compromise between the local and global models. Here the image is decomposed into smaller blocks which are coded separately. The larger the block, the better the global dependencies can be utilized. The dependencies between different blocks, however, are often ignored. The shape of the block can be uniform, and is often fixed throughout the image. The most common shapes are square and rectangular blocks because of their practical usefulness. In quadtree decomposition the shape is fixed but the size of the block varies. A quadtree thus offers a suitable technique to adapt to the shape of the image at the cost of a few extra bits describing the structure of the tree.

In principle, any segmentation technique can be applied in block coding. The use of more complex segmentation techniques is limited because they are often computationally demanding, but also because of the overhead required to code the block structure; as the shape of the segmentation is determined adaptively, it must also be transmitted to the decoder. The more complex the segmentation, the more bits are required. The decomposition is a trade-off between the bit rate and good segmentation:

Simple segmentation:
+ Only small cost in bit rate, if any.
+ Simple decomposition algorithm.
- Poor segmentation.
- Coding of the blocks is a key issue.

Complex segmentation:
- High cost in the bit rate.
- Computationally demanding decomposition.
+ Good segmentation according to image shape.
+ Blocks are easier to code.

Figure 2.1: Intuitive idea of global modelling.

2.1.2 Order of processing

The next questions after the block decomposition are:


· In what order are the blocks (or pixels) of the image processed?
· In what order are the pixels inside a block processed?

In block coding methods the first question is interesting only if the inter-block correlations (dependencies between the blocks) are considered. In pixel-wise processing, on the other hand, it is essential, since local modelling is inefficient without taking advantage of the information in the neighboring pixels. The latter question is relevant, for example, in coding the transform coefficients of the DCT.

The most common order of processing is row-major order (top to bottom, and left to right). If a particular compression algorithm considers only 1-dimensional dependencies (e.g. the Ziv-Lempel algorithms), an alternative processing method is the so-called zigzag scanning, which utilizes the two-dimensionality of the image data, see Figure 2.2.

A drawback of the top-to-bottom processing order is that the image is only partially "seen" during the decompression. Thus, after decompressing 10 % of the image pixels, little is known about the rest of the image. A quick overview of the image, however, would be convenient, for example, in image archiving systems where the image database is browsed frequently to retrieve a desired image. Progressive modelling is an alternative approach to ordering the image data that avoids this deficiency.

The idea in progressive modelling is to arrange for the quality of the image to increase gradually as data is received. The most "useful" information in the image is sent first, so that the viewer can begin to use the image before it is completely displayed, and much sooner than if the image were transmitted in normal raster order. There are three basically different ways to achieve progressive transmission:

· In transform coding, the low-frequency components of the blocks are transmitted first.
· In vector quantization, transmission begins with a limited palette of colors, and more information is provided gradually so that color details increase with time.
· In pyramid coding, a low resolution version of the image is transmitted first, followed by gradually increasing resolutions until full precision is reached; see Figure 2.3 for an example.

These progressive modes of operation will be discussed in more detail in Section 4.

Figure 2.2: Zigzag scanning; (a) in pixel-wise processing; (b) in DCT-transformed block.


Figure 2.3: Early stage of transmission of the image Camera (256×256×8); in sequential order (above); in progressive order (below).

2.1.3 Statistical modelling

Data compression in general is based on the following abstraction:

Data = information content + redundancy (2.1)

The aim of compression is to remove the redundancy and describe the data by its information content. (An observant reader may notice some redundancy between the String Algorithms course and this course.) In statistical modelling the idea is to "predict" the symbols to be coded by using a probability distribution for the source alphabet. The information content of a symbol in the alphabet is determined by its entropy:

H(x) = \log_2 \frac{1}{p(x)} \qquad (2.2)

where x is the symbol and p(x) is its probability. The higher the probability, the lower the entropy, and thus the shorter the codeword that should be assigned to the symbol. The entropy H(x) gives the number of bits required, on average, to code the symbol x in order to achieve the optimal result. The overall entropy of the probability distribution is given by:

H = \sum_{x=1}^{k} p(x) \cdot \log_2 \frac{1}{p(x)} \qquad (2.3)

where k is the number of symbols in the alphabet. The entropy gives the lower bound of compression that can be achieved (measured in bits per symbol), corresponding to the model.
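To make the entropy formulas concrete, here is a small Python sketch (our own illustration, not part of any standard) that evaluates Equations (2.2) and (2.3):

from math import log2

def self_information(p):
    """Entropy of a single symbol, H(x) = log2(1/p(x)) bits (Eq. 2.2)."""
    return log2(1 / p)

def entropy(probabilities):
    """Overall entropy of a distribution, in bits per symbol (Eq. 2.3)."""
    return sum(p * log2(1 / p) for p in probabilities if p > 0)

# The semi-adaptive model of Table 2.1: p = (0.7, 0.2, 0.1)
print(entropy([0.7, 0.2, 0.1]))   # about 1.16 bits per symbol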


(The optimal compression can be realized by arithmetic coding.) The key issue is how to determine these probabilities. The modelling schemes can be classified into the following three categories:

· Static modelling
· Semi-adaptive modelling
· Adaptive (or dynamic) modelling

In static modelling the same model (code table) is applied to all input data ever to be coded. Consider text compression: if the ASCII data to be compressed is known to consist of English text, a model based on the frequency distribution of English text could be applied. For example, the probabilities of the most likely symbols in an ASCII file of English text are, on average, p(' ')=18 %, p('e')=10 %, p('t')=8 %. Unfortunately, static modelling fails if the input data is not English text but, for example, the binary data of an executable file. The advantages of static modelling are that no side information needs to be transmitted to the decoder, and that the compression can be done in one pass over the input data.

Semi-adaptive modelling is a two-pass method in the sense that the input data is processed twice. In the first pass the input data is analyzed, and some statistical information about it (or the code table) is sent to the decoder. In the second pass the actual compression is done on the basis of this information (e.g. the frequency distribution of the data), which is now known by the decoder, too.

Dynamic modelling takes one step further and adapts to the input data "on-line" during the compression. It is thus a one-pass method. As the decoder does not have any prior knowledge of the input data, an initial model (code table) must be used for compressing the first symbols of the data. However, as the coding/decoding proceeds, the information from the symbols already coded/decoded can be exploited. The model for a particular symbol to be coded can be constructed on the basis of the frequency distribution of all the symbols that have already been coded. Thus both encoder and decoder have the same information, and no side information needs to be sent to the decoder.

Consider the symbol sequence of (a, a, b, a, a, c, a, a, b, a). If no prior knowledge is allowed, one could apply the static model given in Table 2.1. In semi-adaptive modelling the frequency distribution of the input data is calculated. The probability model is then constructed on the basis of the relative frequencies of these symbols, see Table 2.1. The entropy of the input data is:

Static model:

H = \frac{1}{10} (1.58 + 1.58 + \dots + 1.58) = 1.58 \qquad (2.4)

Semi-adaptive model:

H = \frac{1}{10} (7 \times 0.51 + 2 \times 2.32 + 1 \times 3.32) = 1.16 \qquad (2.5)


Table 2.1: Example of the static and semi-adaptive models.

        STATIC MODEL                 SEMI-ADAPTIVE MODEL
symbol   p(x)    H           symbol   count   p(x)    H
a        0.33    1.58        a        7       0.70    0.51
b        0.33    1.58        b        2       0.20    2.32
c        0.33    1.58        c        1       0.10    3.32

In dynamic modelling the input data is processed as shown in Table 2.2. In the beginning, some initial model is needed, since no prior knowledge of the input data is allowed. Here we assume equal probabilities. The probability of the first symbol ('a') is thus 0.33 and the corresponding entropy 1.58. After that, the model is updated by increasing the count of symbol 'a' by one. Note that each symbol is assumed to have occurred once before the processing, i.e. the initial counts equal 1. (This avoids the so-called zero-frequency problem: if we had no occurrence of a symbol, its probability would be 0.00, yielding an entropy of ∞.)

At the second step, the symbol 'a' is processed again. Now the modified frequency distribution gives the probability 2/4 = 0.50 for the symbol 'a', resulting in an entropy of 1.00. As the coding proceeds, more accurate approximations of the probabilities are obtained, and at the final step the symbol 'a' has the probability 7/12 = 0.58, resulting in an entropy of 0.78. The sum of the entropies of the coded symbols is 14.5 bits, yielding an overall entropy of 1.45 bits per symbol.

The corresponding entropies of the different modelling strategies are summarized here:

Static modelling: 1.58 (bits per symbol)
Semi-adaptive modelling: 1.16
Dynamic modelling: 1.45

Table 2.2: Example of the dynamic modelling. The numbers are the frequencies of the symbols.

        step:   1.    2.    3.    4.    5.    6.    7.    8.    9.    10.
        symbol: a     a     b     a     a     c     a     a     b     a
a               1     2     3     3     4     5     5     6     7     7
b               1     1     1     2     2     2     2     2     2     3
c               1     1     1     1     1     1     2     2     2     2
p(x)            0.33  0.50  0.20  0.50  0.57  0.13  0.56  0.60  0.18  0.58
H               1.58  1.00  2.32  1.00  0.81  3.00  0.85  0.74  2.46  0.78
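The computation of Table 2.2 can be reproduced with a short Python sketch of the count-based adaptive model (our own illustration):

from math import log2

def dynamic_model_cost(sequence, alphabet):
    """Sum of log2(1/p(x)) when the probabilities come from adaptive counts,
    initialized to 1 to avoid the zero-frequency problem (cf. Table 2.2)."""
    counts = {s: 1 for s in alphabet}
    total_bits = 0.0
    for symbol in sequence:
        p = counts[symbol] / sum(counts.values())
        total_bits += log2(1 / p)
        counts[symbol] += 1     # update after coding; the decoder can do the same
    return total_bits

bits = dynamic_model_cost("aabaacaaba", "abc")
print(bits, bits / 10)          # about 14.5 bits, i.e. 1.45 bits per symbol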

It should be noted that in semi-adaptive modelling the information used in the model must also be sent to the decoder, which increases the overall bit rate. Moreover, dynamic modelling is inefficient in the early stage of the processing, but it quickly improves its performance as more symbols are processed and thus more information can be used in the modelling. The result of the dynamic model is therefore much closer to that of the semi-adaptive model when the source is longer. Here the example was too short for the model to have had enough time to adapt to the input data.

The properties of the different modelling strategies are summarized as follows:

Static modelling:
+ One-pass method
+ No side information
- Non-adaptive
+ No updating of the model during compression

Semi-adaptive modelling:
- Two-pass method
- Side information needed
+ Adaptive
+ No updating of the model during compression

Dynamic modelling:
+ One-pass method
+ No side information
+ Adaptive
- Updating of the model during compression

Context modelling:

So far we have considered only the overall frequency distribution of the source and paid no attention to the spatial dependencies between individual pixels. For example, the intensities of neighboring pixels are very likely to be strongly correlated.

Once again, consider ASCII data of English text, where the first five symbols have already been coded: "The_q...". The frequency distribution of the letters in English would suggest that the following letter is a blank with the probability of 18 %, or the letter 'e' (10 %), 't' (8 %), or any other with decreasing probabilities. However, by consulting a dictionary under the letter Q, it can be found that more than 99 % of the words have the letter 'u' following the letter 'q' (e.g. quadtree, quantize, quality). Thus, the probability distribution depends highly on the context in which the symbol occurs. The solution is to use not only one but several different models, one for each context. The entropy of an N-level context model is the weighted sum of the entropies of the individual contexts:

H_N = \sum_{j=1}^{N} p(c_j) \sum_{i=1}^{k} p(x_i \mid c_j) \cdot \log_2 \frac{1}{p(x_i \mid c_j)} \qquad (2.6)

where p(x|c) is the probability of symbol x in context c, and N is the number of different contexts. In image compression, the context is usually the value of the previous pixel, or the values of two or more neighboring pixels to the west, north, northwest, and northeast of the current pixel. The only limitation is that the pixels within the context template must already have been compressed and thus seen by the decoder, so that both the encoder and the decoder have the same information.

The number of contexts equals the number of possible combinations of the neighboring pixels present in the context template. A 4-pixel template is given in Figure 2.4. The number of contexts increases exponentially with the number of pixels in the template. With one 8-bit pixel in the template, the number of contexts is 2^8 = 256, but with two pixels the number is already 2^(2×8) = 65536, which is rather impractical because of the high memory requirement. One must also keep in mind that in semi-adaptive modelling all models must be sent to the decoder, so the question is crucial for the compression performance as well.


Figure 2.4: Example of a four pixel context template.

A solution to this problem is to quantize the pixel values within the template in order to reduce the number of possible combinations. For example, by quantizing the values to 4 bits each, the total number of contexts in a one-pixel template is 2^4 = 16, and in a two-pixel template 2^(2×4) = 256. (Note that the quantization is performed only in the computer memory in order to determine which context model should be used; the original pixel values in the compressed file are untouched.)

Table 2.3: The number of contexts as a function of the number of pixels in the template.

Pixels within    No. of       No. of contexts
the template     contexts     if quantized to 4 bits
1                256          16
2                65536        256
3                16×10^6      4096
4                4×10^9       65536
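As an illustration of this quantization, the following Python sketch (our own; the function and parameter names are not from any standard) computes the context number from the quantized neighbour values:

def context_index(neighbours, bits=4, depth=8):
    """Number of the context model for the current pixel, formed from its
    already decoded neighbours quantized from `depth` to `bits` bits each."""
    index = 0
    for value in neighbours:            # e.g. the template of Figure 2.4
        q = value >> (depth - bits)     # keep only the top `bits` bits
        index = (index << bits) | q
    return index

# Two 8-bit neighbours quantized to 4 bits each: 2^(2*4) = 256 contexts.
print(context_index((200, 197)))        # 204, one of the 256 possible indices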

Predictive modelling:

Predictive modelling consists of the following three components:

· Prediction of the current pixel value.
· Calculating the prediction error.
· Modelling the error distribution.

The value of the current pixel x is predicted on the basis of the pixels that have already been coded (and thus seen by the decoder, too). Referring to the neighboring pixels as in Figure 2.4, a possible predictor could be:

\hat{x} = \frac{x_W + x_N}{2} \qquad (2.7)

The prediction error is the difference between the original and the predicted pixel values:

e = x - \hat{x} \qquad (2.8)

The prediction error is then coded instead of the original pixel value. The probability distribution of the prediction errors is concentrated around zero, while errors of large magnitude (positive or negative) rarely appear; thus the distribution resembles a Gaussian (normal) distribution whose only parameter is the variance, see Figure 2.5. Now even a static model can be applied to estimate the probabilities of the errors. Moreover, the use of context modelling is no longer necessary. Methods of this kind are sometimes referred to as differential pulse code modulation (DPCM).

Figure 2.5: Probability distribution function of the prediction errors.
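As an illustration of the above, the following Python sketch computes the prediction errors of a gray-scale image using the predictor of Equation (2.7); the integer division and the border handling are our own simplifying assumptions:

import numpy as np

def prediction_errors(image):
    """Prediction errors e = x - (x_W + x_N) / 2 (Eqs. 2.7 and 2.8).
    Border pixels, which lack a west or north neighbour, are kept verbatim."""
    img = np.asarray(image, dtype=np.int32)
    errors = img.copy()
    west = img[1:, :-1]     # x_W for every pixel with row >= 1, col >= 1
    north = img[:-1, 1:]    # x_N for the same pixels
    errors[1:, 1:] = img[1:, 1:] - (west + north) // 2
    return errors           # values concentrate around zero (Figure 2.5)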

2.2 Coding

As stated earlier, coding is considered the easy part of compression; good coding methods have been known for decades, e.g. Huffman coding (1952). Moreover, arithmetic coding (1979) is well known to be optimal with respect to the model. Let us next study these methods.

2.2.1 Huffman coding

The Huffman algorithm creates a code tree (called a Huffman tree) on the basis of the probability distribution. The algorithm starts by creating for each symbol a leaf node containing the symbol and its probability, see Figure 2.6a. The two nodes with the smallest probabilities become siblings under a parent node, which is given a probability equal to the sum of its two children's probabilities, see Figure 2.6b. The combining operation is repeated, choosing the two nodes with the smallest probabilities and ignoring nodes that are already children. For example, at the next step the new node formed by combining a and b is joined with the node for c to make a new node with probability p=0.2. The process continues until there is only one node without a parent, which becomes the root of the tree, see Figure 2.6c.

The two branches from every non-leaf node are then labelled 0 and 1 (the order is not important). The actual code of a symbol is obtained by traversing the tree from the root to the leaf node representing the symbol; the codeword is the concatenation of the branch labels along the path from the root to the leaf, see Table 2.4.



Figure 2.6: Constructing the Huffman tree: (a) leaf nodes; (b) combining nodes; (c) the complete Huffman tree.

Table 2.4. Example of symbols, their probabilities, and the corresponding Huffman codes.

Symbol   Probability   Codeword
a        0.05          0000
b        0.05          0001
c        0.1           001
d        0.2           01
e        0.3           10
f        0.2           110
g        0.1           111
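The construction of Figure 2.6 is easy to express with a priority queue. The following Python sketch is our own illustration of the algorithm; because of ties in the probabilities it may assign different, but equally optimal, codewords than Table 2.4:

import heapq
from itertools import count

def huffman_codes(probabilities):
    """Build a Huffman tree by repeatedly merging the two least probable
    nodes, then read the codewords off the tree (cf. Figure 2.6)."""
    tiebreak = count()   # keeps heap entries comparable on equal probabilities
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: branches labelled 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: the path from the root is the codeword
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

# Distribution of Table 2.4; the average code length (2.6 bits) matches the table.
print(huffman_codes({"a": .05, "b": .05, "c": .1, "d": .2, "e": .3, "f": .2, "g": .1}))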


2.2.2 Arithmetic coding

Arithmetic coding is known to be the optimal coding method with respect to the model. Moreover, it is extremely suitable for dynamic modelling, since there is no actual code table to be updated as in Huffman coding. The deficiency of Huffman coding is emphasized in the case of binary images. Consider binary images with the probability distribution of 0.99 for a white pixel and 0.01 for a black pixel. The entropy of the white pixel is -log2 0.99 = 0.015 bits. In Huffman coding, however, the code length is bounded from below by 1 bit per symbol. In fact, the only possible code for a binary alphabet is one bit for each symbol, so no compression would be possible.

Let us next consider the fundamental properties of binary arithmetic. With n bits, at most 2^n different combinations can be represented; in other words, with n bits the code interval between zero and one can be divided into 2^n parts, each having the length 2^-n, see Figure 2.7. Let us assume that A is a power of ½, that is, A = 2^-n. From the opposite point of view, an interval of length A can then be coded by using -log2 A = n bits.

Figure 2.7: Interval [0,1] is divided into 8 parts, each having the length 2^-3 = 0.125. Each interval can now be coded by using -log2 0.125 = 3 bits.

The basic idea of arithmetic coding is to represent the entire input file as a small interval within the range [0,1]. The actual code is the binary representation of the interval, taking -log2 A bits, where A is the length of the interval. In other words, arithmetic coding represents the input file with a single codeword.

Arithmetic coding starts by dividing the interval into subintervals according to the probability distribution of the source. The length of each subinterval equals the probability of the corresponding symbol. Thus, the sum of the lengths of all subintervals equals 1, filling the range [0, 1] completely. Consider the probability model of Table 2.1. Now the interval is divided into the subintervals [0.0, 0.7], [0.7, 0.9], and [0.9, 1.0], corresponding to the symbols a, b, and c, see Figure 2.8.

The first symbol in the sequence (a, a, b, ...) to be coded is 'a'. The coding proceeds by taking the subinterval of the symbol 'a', that is [0, 0.7]. This interval is then again split into three subintervals so that the length of each subinterval is proportional to its probability. For example, the subinterval for symbol 'a' is now 0.7 × [0, 0.7] = [0, 0.49]; for symbol 'b' it has the length 0.2 × 0.7 and starts where the first part ends, that is [0.49, 0.63]. The last subinterval, for symbol 'c', is [0.63, 0.7]. The next symbol to be coded is 'a', so the next interval will be [0, 0.49].

Figure 2.8: Example of arithmetic coding of sequence 'aab' using the model of Table 2.1.
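The subdivision process of Figure 2.8 can be reproduced with the following Python sketch (floating-point arithmetic, for illustration only):

def arithmetic_interval(sequence, model):
    """Narrow the [0, 1) interval symbol by symbol (cf. Figure 2.8).
    `model` maps each symbol to (cumulative probability, probability)."""
    low, width = 0.0, 1.0
    for symbol in sequence:
        cum, p = model[symbol]
        low += width * cum      # move to the symbol's subinterval
        width *= p              # shrink to its length (cf. Eq. 2.9)
    return low, low + width

# Model of Table 2.1: a -> [0, 0.7], b -> [0.7, 0.9], c -> [0.9, 1.0]
model = {"a": (0.0, 0.7), "b": (0.7, 0.2), "c": (0.9, 0.1)}
print(arithmetic_interval("aab", model))   # approx. (0.343, 0.441), length 0.7*0.7*0.2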

The process is repeated for each symbol to be coded, resulting in a smaller and smaller interval. The final interval describes the source uniquely. The length of this interval is the product of the probabilities of the coded symbols:

A = \prod_{i=1}^{n} p_i \qquad (2.9)

Due to the previous discussion, this interval can be coded by

C(A) = \log_2 \frac{1}{A} = \log_2 \frac{1}{\prod_{i=1}^{n} p_i} = \sum_{i=1}^{n} \log_2 \frac{1}{p_i} \qquad (2.10)

bits (assuming A is a power of ½). If the same model is applied to each symbol to be coded, the code length can be expressed in terms of the source alphabet:

C(A) = n \cdot \sum_{i=1}^{m} p_i \cdot \log_2 \frac{1}{p_i} \qquad (2.11)

where m is the number of symbols in the alphabet, and p_i is the probability of that particular symbol in the alphabet. The important observation is that C(A)/n equals the entropy! This means that the source can be coded optimally if A is a power of ½. This is the case when the length of the source approaches infinity. In practice, arithmetic coding is optimal even for rather short sources.

Optimality of arithmetic coding:

The length of the final interval is not exactly a power of ½, as was assumed above. The final interval, however, can be approximated by any of its subintervals that meets the requirement A' = 2^-n. The approximation can thus be bounded by


\frac{A}{2} < A' \le A \qquad (2.12)

yielding the upper bound of the code length:

C(A') \le \log_2 \frac{2}{A} = \log_2 \frac{1}{A} + 1 = H + 1 \qquad (2.13)

The upper bound for the coding deficiency is thus 1 bit for the entire file. Note that Huffman coding can be simulated by arithmetic coding in which each subinterval division is restricted to a power of ½. There are no such restrictions in arithmetic coding (except in the final interval), which is the reason why arithmetic coding is optimal, in contrast to Huffman coding. The code length of Huffman coding has been shown to be bounded by

C \le H(p) + p_{s_1} + \log_2 \frac{2 \log_2 e}{e} \approx H(p) + p_{s_1} + 0.086 \quad \text{(bits per symbol)} \qquad (2.14)

where p_{s_1} is the probability of the most probable symbol in the alphabet. In other words, the relative performance of Huffman coding (with respect to the entropy H) is the better, the smaller the probability of the most probable symbol s_1 is. This is often the case with a multi-symbol alphabet; in binary images, however, the probability distribution is often very skewed: it is not rare that the probability of the white pixel is as high as 0.99.

In principle, the problem with the skewed distribution of a binary alphabet can be avoided by blocking the pixels, for example by constructing a new symbol from 8 subsequent pixels of the image. The redefined alphabet thus consists of all 256 possible pixel combinations. However, the probability of the most probable symbol (8 white pixels) is still too high, namely 0.99^8 = 0.92. Moreover, the number of combinations increases exponentially with the number of pixels in the block.

Implementation aspects of arithmetic coding:

Two variables are needed to describe the interval: A is the length of the interval and C is its lower bound. The interval, however, very soon becomes so small that it cannot be expressed by a 16- or 32-bit integer in computer memory. The following procedure is therefore applied. When the interval falls completely below the half point (0.5), it is known that the codeword describing the final interval starts with the bit 0. If the interval were above the half point, the codeword would start with the bit 1. In both cases, the starting bit can be output, and the processing can then be limited to the corresponding half of the full scale, which is either [0, 0.5] or [0.5, 1]. This is realized by zooming into the corresponding half as shown in Figure 2.9.


Figure 2.9: Example of half point zooming.

Underflow can also occur if the interval decreases so that its lower bound is just below the half point but its upper bound is still above it. In this case the half-point zooming cannot be applied. The solution is the so-called quarter-point zooming, see Figure 2.10. The condition for quarter-point zooming is that the lower bound of the interval exceeds 0.25 and the upper bound does not exceed 0.75. It is then known that the following bit stream is either "01xxx" if the final interval is below the half point, or "10xxx" if the final interval is above the half point (here xxx refers to the rest of the code stream). In general, it can be shown that if the next bit due to a half-point zooming is b, it is followed by as many opposite bits of b as there were quarter-point zoomings before that half-point zooming.

Since the final interval completely covers either the range [0.25, 0.5], or the range [0.5, 0.75], the encoding can be finished by sending the bit pair "01" if the upper bound is below 0.75, or "10" if the lower bound exceeds 0.25.

Figure 2.10: Example of two subsequent quarter-point zoomings.
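The two zooming rules can be summarized in code. This floating-point Python sketch is for illustration only; a practical implementation keeps C and A as integers, as discussed above:

def renormalize(C, A, bits, pending):
    """Half-point and quarter-point zoomings for an interval [C, C+A)
    inside [0, 1); `pending` counts quarter-point zoomings whose opposite
    bits are still waiting for the next half-point zooming."""
    while True:
        if C + A <= 0.5:                      # interval below the half point
            bits += "0" + "1" * pending       # output 0, then the deferred bits
            pending = 0
            C, A = 2 * C, 2 * A
        elif C >= 0.5:                        # interval above the half point
            bits += "1" + "0" * pending
            pending = 0
            C, A = 2 * (C - 0.5), 2 * A
        elif C > 0.25 and C + A <= 0.75:      # straddles the half point narrowly
            pending += 1                      # quarter-point zooming: defer one bit
            C, A = 2 * (C - 0.25), 2 * A
        else:
            return C, A, bits, pending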

QM-coder:

The QM-coder is an implementation of arithmetic coding that has been specially tailored for binary data. Speed has also been one of the primary aspects in its design. The main differences between the QM-coder and the arithmetic coding described in the previous section are summarized as follows:

· The input alphabet of the QM-coder must be in binary form.
· To gain speed, all multiplications in the QM-coder have been eliminated.
· The QM-coder includes its own modelling procedures.


The fact that the QM-coder is a binary arithmetic coder does not exclude the possibility of a multi-symbol source. The symbols simply have to be coded one bit at a time, using a binary decision tree. The probability of each symbol is then the product of the probabilities of the node decisions.

In the QM-coder the multiplication operations have been replaced by fast approximations or by shift-left operations in the following way. Denote the more probable symbol of the model by MPS and the less probable symbol by LPS; in other words, the MPS is always the symbol with the higher probability. The interval in the QM-coder is always divided so that the LPS subinterval lies above the MPS subinterval. If the interval is A and the LPS probability estimate is Qe, the MPS probability estimate should ideally be (1-Qe). The lengths of the respective subintervals are then A×Qe and A×(1-Qe). This ideal subdivision and symbol ordering is shown in Figure 2.11.

Figure 2.11: Illustration of symbol ordering and ideal interval subdivision.

Instead of operating on the scale [0, 1], the QM-coder operates on the scale [0, 1.5]. Zooming (or renormalization, as it is called in the QM-coder) is performed every time the length of the interval falls below half of the scale, 0.75 (the details of the renormalization are by-passed here). Thus the interval length is always in the range [0.75, 1.5). Now, the following rough approximation is made:

A \times Q_e \approx Q_e \qquad (2.15)

If we follow this scheme, coding a symbol changes the interval as follows:

After MPS:

C \text{ is unchanged}
A \leftarrow A - A \times Q_e \approx A - Q_e \qquad (2.16)

After LPS:

C \leftarrow C + (A - A \times Q_e) \approx C + A - Q_e
A \leftarrow A \times Q_e \approx Q_e \qquad (2.17)


Now all multiplications are eliminated, except those needed in the renormalization. However, the renormalization involves only multiplications by two, which can be performed by bit-shifting operations.
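Written out in code, the approximate update rules (2.16) and (2.17) are indeed multiplication-free; the following Python fragment is our own transcription, omitting the renormalization and the probability estimation:

def code_mps(C, A, Qe):
    """MPS coded: C is unchanged, A shrinks by the LPS estimate (Eq. 2.16)."""
    return C, A - Qe

def code_lps(C, A, Qe):
    """LPS coded: skip over the MPS subinterval, keep the LPS part (Eq. 2.17)."""
    return C + (A - Qe), Qe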

The QM-coder also includes its own modelling procedures, which makes the separation between modelling and coding somewhat unconventional, see Figure 2.12. The modelling phase determines the context to be used and the binary decision to be coded. The QM-coder then picks the corresponding probability, performs the actual coding, and updates the probability distribution if necessary. The way the QM-coder handles the probabilities is based on a stochastic algorithm (the details are omitted here). The method also adapts quickly to local variations in the image. For details, see the "JPEG book" by Pennebaker and Mitchell [1993].

Figure 2.12: Differences between the optimal arithmetic coding (left), and the integrated QM-coder (right).

2.2.3 Golomb-Rice codes

Golomb codes are a class of prefix codes which are suboptimal but very easy to implement. They are used to encode symbols from a countable alphabet. The symbols are arranged in descending order of probability, and non-negative integers are assigned to them, beginning with 0 for the most probable symbol, see Figure 2.13. To encode an integer x, it is divided into two components, the most significant part x_M and the least significant part x_L:

x_M = \lfloor x / m \rfloor
x_L = x \bmod m \qquad (2.18)

where m is the parameter of the Golomb code. The values x_M and x_L form a complete representation of x, since:

x = x_M \cdot m + x_L \qquad (2.19)


The x_M is output using a unary code, and the x_L using a binary code (an adjusted binary code is needed if m is not a power of 2); see Table 2.5 for an example.

Rice coding is the same as Golomb coding except that only a subset of the parameter values may be used, namely the powers of 2. The Rice code with parameter k is exactly the same as the Golomb code with parameter m = 2^k. The Rice codes are even simpler to implement, since x_M can be computed by shifting x bit-wise right k times, and x_L by masking out all but the k low-order bits. Sample Golomb and Rice code tables are shown in Table 2.6.

Figure 2.13: Probability distribution function assumed by Golomb and Rice codes.

Table 2.5. An example of the Golomb coding with the parameter m=4.

x    x_M   x_L   Code of x_M   Code of x_L
0    0     0     0             00
1    0     1     0             01
2    0     2     0             10
3    0     3     0             11
4    1     0     10            00
5    1     1     10            01
6    1     2     10            10
7    1     3     10            11
8    2     0     110           00
:    :     :     :             :

Table 2.6. Golomb and Rice codes for the parameters m=1 to 5.

Golomb:   m=1      m=2      m=3      m=4      m=5
Rice:     k=0      k=1               k=2
x=0       0        00       00       000      000
1         10       01       010      001      001
2         110      100      011      010      010
3         1110     101      100      011      0110
4         11110    1100     1010     1000     0111
5         111110   1101     1011     1001     1000
6         :        11100    1100     1010     1001
7         :        11101    11010    1011     1010
8         :        111100   11011    11000    10110
9         :        111101   11100    11001    10111
:         :        :        :        :        :
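The coding rules above fit in a few lines of Python. The following sketch (our own illustration) reproduces the codewords of Tables 2.5 and 2.6, including the adjusted binary code for parameters that are not powers of 2:

def golomb_code(x, m):
    """Golomb code of a non-negative integer x with parameter m:
    unary code of x_M = x // m, then x_L = x mod m in adjusted binary
    (plain k-bit binary when m = 2^k, i.e. a Rice code)."""
    xM, xL = divmod(x, m)
    unary = "1" * xM + "0"
    b = m.bit_length() - 1            # floor(log2 m)
    if xL < 2 ** (b + 1) - m:         # the first remainders get b bits
        binary = format(xL, "0{}b".format(b)) if b else ""
    else:                             # the rest get b+1 bits, offset upwards
        binary = format(xL + 2 ** (b + 1) - m, "0{}b".format(b + 1))
    return unary + binary

# Row x = 5 of Table 2.5 (m = 4): "10" + "01"
print(golomb_code(5, 4))                       # -> 1001
# The m = 3 column of Table 2.6: 00, 010, 011, 100, 1010
print([golomb_code(x, 3) for x in range(5)])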


3 Binary images

Binary images represent the simplest and most space-economic form of images and are of great interest when colors or gray scales are not needed. They consist of only two colors, black and white. The probability distribution of this input alphabet is often very skewed, e.g. p(white)=98 % and p(black)=2 %. Moreover, the images usually have large homogeneous areas of the same color. These properties can be taken advantage of in the compression of binary images.

3.1 Run-length coding

Run-length coding (RLC), also referred to as run-length encoding (RLE), is probably the best known compression method for binary images. The image is processed row by row, from left to right. The idea is to group the subsequent pixels of the same color on each scan line. Each group, referred to as a run, is then coded by its color information and its length, resulting in a code stream like

C_1, n_1, C_2, n_2, C_3, n_3, ... \qquad (3.1)

where C_i is the code for the color of the i'th run, and n_i is the code for the length of the run. In binary images there are only two colors, so a black run is always followed by a white run, and vice versa. Therefore it is sufficient to code only the lengths of the runs; no color information is needed. The first run on each line is assumed to be white. If the first pixel happens to be black, a white run of zero length is coded.

The run-length "coding" method is purely a modelling scheme resulting to a new alphabet consisting of the lengths of the runs. These can be coded for example by using the Huffman code given in Table 3.1. Separate code tables are used to represent the black and white runs. The code table contains two types of codewords: terminating codewords (TC) and make-up codewords (MUC). Runs between 0 and 63 are coded using single terminating codeword. Runs between 64 and 1728 are coded by a MUC followed by a TC. The MUC represents a run-length value of 64×M (where M is an integer between 1 and 27) which is equal to, or shorter than, the value of the run to be coded. The following TC specifies the difference between the MUC and the actual value of the run to be coded. See Figure 3.1 for an example of run-length coding using the code table of Table 3.1.


Figure 3.1: Example of one-dimensional run-length coding.
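The modelling step itself, extracting the run lengths of a scan line, can be sketched as follows (our Python illustration, with 0 denoting white and 1 black):

def run_lengths(row):
    """Run lengths of one scan line; the first run is assumed white (0).
    If the line starts with a black pixel, a zero-length white run
    is emitted first, as described above."""
    runs, current, length = [], 0, 0
    for pixel in row:              # pixels are 0 (white) or 1 (black)
        if pixel == current:
            length += 1
        else:                      # color changes: close the current run
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs

print(run_lengths([0, 0, 0, 0, 1, 1, 0, 0, 0, 1]))   # [4, 2, 3, 1]
print(run_lengths([1, 1, 0, 0]))                     # [0, 2, 2]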

Vector run-length coding:

The run-length coding efficiently codes large uniform areas in the images, even though the two-dimensional correlations are ignored. The idea of the run-length coding can also be applied two-dimensionally, so that the runs consist of m×n-sized blocks of pixels instead of single pixels. In this vector run-length coding the pixel combination in each run has to be coded in addition to the length of the run. Wang and Wu [1992] reported up to 60 % improvement in the compression ratio when using 8×4 and 4×4 block sizes. They also used blocks of 4×1 and 8×1 with slightly smaller improvement in the compression ratio.

Predictive run-length coding:

The performance of run-length coding can be improved by using a prediction technique as a preprocessing stage (see also Section 2.1). The idea is to form a so-called error image from the original one by comparing the value of each original pixel to the value given by a prediction function. If the two are equal, the pixel of the error image is white; otherwise it is black. Run-length coding is then applied to the error image instead of the original one. The benefit comes from the increased number of white pixels, which yields longer white runs.

The prediction is based on the values of certain (fixed) neighboring pixels. These pixels have already been encoded and are therefore known to the decoder. The prediction is thus identical in the encoding and decoding phases. The image is scanned in row-major order and the value of each pixel is predicted from the observed combination of the neighboring pixels, see Figure 3.2. The frequency of a correct prediction varies from 61.4 % to 99.8 % depending on the context; the completely white context predicts a white pixel with a very high probability, whereas a context with two white and two black pixels gives only an uncertain prediction.

The prediction technique increases the proportion of white pixels from 94 % to 98 % for a set of test images; the number of black pixels is thus only one third of that in the original image. An improvement of 30 % in the compression ratio was reported by Netravali and Mounts [1980], and up to 80 % with the inclusion of a so-called re-ordering technique.
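The construction of the error image can be sketched as follows (a minimal sketch; the stand-in predictor below is hypothetical, whereas the actual method uses the four-pixel context table of Figure 3.2):

def error_image(img, predict):
    # img is a 2-D list of 0/1 pixels; predict(img, y, x) returns the
    # predicted value of pixel (y, x) from already coded neighbors only,
    # so the decoder can repeat the prediction and invert the mapping.
    h, w = len(img), len(img[0])
    return [[0 if img[y][x] == predict(img, y, x) else 1
             for x in range(w)]
            for y in range(h)]

def predict_west(img, y, x):
    # Hypothetical stand-in predictor: repeat the previous pixel on the
    # row (assume white outside the image borders).
    return img[y][x - 1] if x > 0 else 0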


Table 3.1: Huffman code table for the run-lengths.

--- terminating codewords ---
 n  white runs  black runs        n  white runs  black runs
 0  00110101    0000110111       32  00011011    000001101010
 1  000111      010              33  00010010    000001101011
 2  0111        11               34  00010011    000011010010
 3  1000        10               35  00010100    000011010011
 4  1011        011              36  00010101    000011010100
 5  1100        0011             37  00010110    000011010101
 6  1110        0010             38  00010111    000011010110
 7  1111        00011            39  00101000    000011010111
 8  10011       000101           40  00101001    000001101100
 9  10100       000100           41  00101010    000001101101
10  00111       0000100          42  00101011    000011011010
11  01000       0000101          43  00101100    000011011011
12  001000      0000111          44  00101101    000001010100
13  000011      00000100         45  00000100    000001010101
14  110100      00000111         46  00000101    000001010110
15  110101      000011000        47  00001010    000001010111
16  101010      0000010111       48  00001011    000001100100
17  101011      0000011000       49  01010010    000001100101
18  0100111     0000001000       50  01010011    000001010010
19  0001100     00001100111      51  01010100    000001010011
20  0001000     00001101000      52  01010101    000000100100
21  0010111     00001101100      53  00100100    000000110111
22  0000011     00000110111      54  00100101    000000111000
23  0000100     00000101000      55  01011000    000000100111
24  0101000     00000010111      56  01011001    000000101000
25  0101011     00000011000      57  01011010    000001011000
26  0010011     000011001010     58  01011011    000001011001
27  0100100     000011001011     59  01001010    000000101011
28  0011000     000011001100     60  01001011    000000101100
29  00000010    000011001101     61  00110010    000001011010
30  00000011    000001101000     62  00110011    000001100110
31  00011010    000001101001     63  00110100    000001100111

--- make-up codewords ---
   n  white runs  black runs         n  white runs   black runs
  64  11011       0000001111       960  011010100    0000001110011
 128  10010       000011001000    1024  011010101    0000001110100
 192  010111      000011001001    1088  011010110    0000001110101
 256  0110111     000001011011    1152  011010111    0000001110110
 320  00110110    000000110011    1216  011011000    0000001110111
 384  00110111    000000110100    1280  011011001    0000001010010
 448  01100100    000000110101    1344  011011010    0000001010011
 512  01100101    0000001101100   1408  011011011    0000001010100
 576  01101000    0000001101101   1472  010011000    0000001010101
 640  01100111    0000001001010   1536  010011001    0000001011010
 704  011001100   0000001001011   1600  010011010    0000001011011
 768  011001101   0000001001100   1664  011000       0000001100100
 832  011010010   0000001001101   1728  010011011    0000001100101
 896  011010011   0000001110010    EOL  000000000001 000000000001


Figure 3.2: A four-pixel prediction function. The prediction contexts (pixel combinations) are given in the left column, the corresponding prediction value in the middle, and the probability of a correct prediction in the rightmost column.

3.2 READ code

Instead of the lengths of the runs, one can code the location of the boundaries of the runs (the black/white transitions) relative to the boundaries of the previous row. This is the basic idea in the method called relative element address designate (READ). The READ code includes three coding modes:

· Vertical mode
· Horizontal mode
· Pass mode

In vertical mode the position of each color change (white to black or black to white) in the current line is coded with respect to a nearby change position (of the same color) on the reference line, if one exists. "Nearby" is taken to mean within three pixels, so the vertical mode code can take one of seven values: -3, -2, -1, 0, +1, +2, +3. If there is no nearby change position on the reference line, one-dimensional run-length coding - called horizontal mode - is used. A third condition arises when the reference line contains a run that has no counterpart in the current line; then a special pass code is sent to signal to the receiver that the next complete run of the opposite color in the reference line should be skipped. The corresponding codewords for each coding mode are given in Table 3.2.

Figure 3.3 shows an example of coding in which the second line of pixels - the current line - is transformed into the bit-stream at the bottom. Black spots mark the changing pixels that are to be coded. Both end-points of the first run of black pixels are coded in vertical mode, because that run corresponds closely with one in the reference line above. In vertical mode, each offset is coded independently according to a predetermined scheme for the possible values. The beginning point of the second run of black pixels has no counterpart in the reference line, so it is coded in horizontal mode. Whereas vertical mode is used for coding individual change-points, horizontal mode works with pairs of change-points. Horizontal-mode codes have three parts: a flag indicating the mode, a value representing the length of the preceding white run, and another representing the length of the black run. The second run of black pixels in the reference line must be "passed", for it has no counterpart in the current line, so the pass code is emitted. Both end-points of the next run are coded in vertical mode, and the final run is coded in horizontal mode. Note that because horizontal mode codes pairs of points, the final change-point shown is coded in horizontal mode even though it is within the 3-pixel range of vertical mode.

Table 3.2. Code table for the READ code. Symbol wl refers to the length of the white run and bl to the length of the black run; Hw() and Hb() refer to the Huffman codes of Table 3.1.

Mode:        Codeword:
Pass         0001
Horizontal   001 + Hw(wl) + Hb(bl)
Vertical
  +3         0000011
  +2         000011
  +1         011
   0         1
  -1         010
  -2         000010
  -3         0000010

Figure 3.3: Example of the two-dimensional READ code.

3.3 CCITT group 3 and group 4 standards

The RLE and READ algorithms are included in two image compression standards, known as CCITT1 Group 3 (G3) and Group 4 (G4). They are nowadays widely used in fax machines. The CCITT standard also specifies details like the paper size (A4) and the scanning resolution. The two optional resolutions of the image are specified as 1728×1188 pixels per A4 page (200×100 dpi) in the low resolution, and 1728×2376 (200×200 dpi) in the high resolution. The G3 specification covers binary documents only, although G4 does include provision for optional grayscale and color images.

1 Consultative Committee for International Telegraphy and Telephone

In the G3 standard every kth line of the image is coded by the 1-dimensional RLE method (also referred to as Modified Huffman) and the 2-dimensional READ code (more accurately, Modified READ) is applied to the rest of the lines. In G3, the parameter k is set to 2 for low resolution and 4 for high resolution images. In G4, k is set to infinity so that every line of the image is coded by the READ code. An all-white reference line is assumed at the beginning.

3.4 Block coding

The idea in the block coding, as presented by Kunt and Johnsen [1980], is to divide the image into blocks of pixels. A totally white block (all-white block) is coded by a single 0-bit. All other blocks (non-white blocks) thus contain at least one black pixel. They are coded with a 1-bit as a prefix followed by the contents of the block, bit by bit in row-major order, see Figure 3.4.

Block coding can be extended so that all-black blocks are considered also. See Table 3.3 for the codewords of the extended block coding. The number of uniform blocks (all-white and all-black blocks) depends on the size of the block, see Figure 3.5. The larger the block size, the more efficiently the uniform blocks can be coded (in bits per pixel), but the fewer uniform blocks there are to take advantage of.

In the hierarchical variant of block coding the bit map is first divided into b×b blocks (typically 16×16). These blocks are then divided into a quadtree structure of blocks in the following manner. If a particular b×b block is all-white, it is coded by a single 0-bit. Otherwise the block is coded by a 1-bit and then divided into four equal-sized subblocks, which are recursively coded in the same manner. Block coding is carried on until the block reduces to a single pixel, which is stored as such, see Figure 3.6.
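The recursion can be sketched as follows (a minimal sketch, assuming the block size is a power of two and 0 = white; the emitted bits correspond to the scheme of Figure 3.6):

def encode_block(img, y, x, size, bits):
    # Hierarchical (quadtree) block coding: an all-white block is coded
    # by a single '0'; otherwise a '1' is emitted and the four subblocks
    # are coded recursively, down to single pixels.
    if all(img[r][c] == 0
           for r in range(y, y + size) for c in range(x, x + size)):
        bits.append('0')
        return
    bits.append('1')
    if size == 1:
        return                # the 1-bit itself encodes the black pixel
    h = size // 2
    for dy, dx in ((0, 0), (0, h), (h, 0), (h, h)):
        encode_block(img, y + dy, x + dx, h, bits)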

The performance of block coding can be improved by coding the bit patterns of the 2×2 blocks by Huffman coding. Because the frequency distributions of these patterns are quite similar for typical facsimile images, a static Huffman code can be applied. The Huffman coding gives an improvement of ca. 10 % in comparison to the basic hierarchical block coding. Another improvement is the use of a prediction technique in the same manner as with run-length coding. The application of the prediction function of Figure 3.2 gives an improvement of ca. 30 % in the compression ratio.

Figure 3.4: Basic block coding technique. Coding of four sample blocks.


Figure 3.5: Number of block types as a function of block size.

Table 3.3. Codewords in the block coding method. Here xxxx refers to the content of the block, pixel by pixel.

Content of the block:   Codeword in block coding:   Codeword in extended block coding:
All-white               0                           0
All-black               1 1111                      11
Mixed                   1 xxxx                      10 xxxx

Figure 3.6: Example of hierarchical block coding (the image to be compressed and the resulting code bits).

3.5 JBIG


JBIG (Joint Bi-level Image Experts Group) is the newest binary image compression standard by CCITT and ISO2. It is based on context-based compression where the image is compressed pixel by pixel. The combination of the neighboring pixel values (given by the template) defines the context, and in each context the probability distribution of the black and white pixels is adaptively determined on the basis of the already coded pixel samples. The pixels are then coded by arithmetic coding according to their probabilities. The arithmetic coding component in JBIG is the QM-coder.

Binary images are a favorable source for context-based compression, since even a relatively large number of pixels in the template results in a reasonably small number of contexts. The templates included in JBIG are shown in Figure 3.7. The number of contexts in a 7-pixel template is 2^7 = 128, and in the 10-pixel model it is 2^10 = 1024. (Note that a typical binary image of 1728×1188 pixels consists of over 2 million pixels.) The larger the template, the more accurate a probability model can be obtained. However, with a large template the adaptation to the image takes longer; thus the size of the template cannot be arbitrarily large. The effect of the context size for the CCITT test image set is illustrated in Figure 3.8.
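The mechanism can be sketched as follows (a minimal sketch using a 3-pixel template for brevity, whereas JBIG's real templates have 7 or 10 pixels; the probability would be fed to the QM-coder, which is omitted here):

from collections import defaultdict

counts = defaultdict(lambda: [1, 1])   # per-context white/black counts

def context(img, y, x):
    # Context number from the west, north-west and north neighbors
    # (pixels outside the image are treated as white).
    w  = img[y][x - 1] if x > 0 else 0
    nw = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    n  = img[y - 1][x] if y > 0 else 0
    return (w << 2) | (nw << 1) | n

def model_pixel(img, y, x):
    # Return the adaptive probability of the current pixel being black
    # in its context, then update the context statistics.
    ctx = context(img, y, x)
    c_white, c_black = counts[ctx]
    p_black = c_black / (c_white + c_black)
    counts[ctx][img[y][x]] += 1
    return p_black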

Figure 3.7: JBIG sequential model templates.

Figure 3.8: Sample compression ratios for context-based compression as a function of the number of pixels in the context template (standard versus optimal template).

2 International Standards Organization


JBIG includes two modes of processing: sequential and progressive. The sequential mode is the traditional row-major order processing. In the progressive mode, a reduced resolution version of the image (referred to as the starting layer) is compressed first, followed by the second layer, and so on, see Figure 3.9. The lowest resolution is either 12.5 dpi (dots per inch) or 25 dpi. In each case the resolution is doubled for the next layer. In the progressive mode, pixels of the previous layer can be included in the context template when coding the next layer. In JBIG, four such pixels are included, see Figure 3.10. Note that there are four variations (phases 0, 1, 2, and 3) of the same basic template model depending on the position of the current pixel.

Figure 3.9: Different layers of JBIG.

Figure 3.10: JBIG progressive model templates.

Resolution reduction in JBIG:


In the compression phase, the lower resolution versions are calculated on the basis of the next higher resolution layer. The obvious way to halve the resolution is to group the pixels into 2×2 blocks and take the color of the majority of these four pixels. Unfortunately, with binary images it is not clear what to do when two of the pixels are black (1) and the other two are white (0). Consistently rounding up or down tends to wash out the image very quickly. Another possibility is to round the value in a random direction each time, but this adds considerable noise to the image, particularly at the lower resolutions.

The resolution reduction algorithm incorporated in JBIG is illustrated in Figure 3.11. The value of the target pixel is calculated as a linear function of the marked pixels. The already-committed pixels at the lower resolution - the round ones - participate in the sum with negative weights that exactly offset the corresponding positive weights. This means that if the already-committed areas are each either uniformly white or uniformly black, they do not affect the assignment of the new pixel. If black and white are equally likely and the pixels are statistically independent, the expected value of the weighted sum is 4.5. The pixel is chosen to be black if the value is 5 or more, and white if it is 4 or less.
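The thresholding step can be sketched as follows (a minimal sketch with placeholder weights: each of the nine covered high-resolution pixels counts +1, so the weights sum to 9 and the expected value is 4.5 as stated above; the actual JBIG filter of Figure 3.11 uses a specific non-uniform weighting, and the exception handling of Figure 3.12 is omitted):

def reduce_pixel(hi, lo, y, x):
    # One target pixel lo[y][x] from the 3x3 high-resolution neighborhood
    # around (2y, 2x), with negative weights on the already-committed
    # low-resolution neighbors (y, x >= 1; borders left to the caller).
    s = sum(hi[2 * y + dy][2 * x + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    s += -3 * lo[y][x - 1] - 3 * lo[y - 1][x]   # placeholder weights
    return 1 if s >= 5 else 0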

This method preserves the overall grayness of the image. However, problems occur with lines and edges, because these deteriorate very rapidly. To address this problem, a number of exception patterns are defined which, when encountered, reverse the polarity of the pixel obtained by thresholding the weighted sum as described above. An example of such an exception pattern is shown in Figure 3.12.

Figure 3.11: Resolution reduction in JBIG: participating pixels marked (left); pixel weights (right).

Figure 3.12: Example of an exception pattern for resolution reduction.

Figure 3.13: Example of JBIG resolution reduction for CCITT image 5 (432×594, 216×297, 108×148, and 54×74 pixels).


3.6 JBIG2

The emerging standard JBIG2 enhances the compression of text images using a pattern matching technique. The standard will have two encoding methods: pattern matching & substitution (PM&S), and soft pattern matching (SPM). The image is segmented into pixel blocks containing connected black pixels using any segmentation technique. The contents of the blocks are matched to the library symbols. If an acceptable match (within a given error margin) is found, the index of the matching symbol is encoded. If no acceptable match is found, the original bitmap is coded by a JBIG-style compressor. The compressed file consists of the bitmaps of the library symbols, the locations of the extracted blocks as offsets, and the contents of the pixel blocks.

The PM&S encoding mode performs lossy substitution of the input block by the bitmap of the matched character. This requires very safe and intelligent matching procedure to avoid substitution error. The SPM encoding mode, on the other hand, is lossless and is outlined in Fig. 3.14. Instead of performing substitution, the content of the original block is also coded in order to allow lossless compression. The content of the pixel block is coded using a JBIG-style compressor with the difference that the bitmap of the matching symbol is used as an additional information in the context model. The method applies the two-layer context template shown in Fig. 3.15. Four context pixels are taken from the input block and seven from the bitmap of the matching dictionary symbol. The dictionary is builded adaptively during the compression by conditionally adding the new pixel blocks into dictionary. The standard defines mainly the general file structure and the decoding procedure but leaves some freedom in the design of the encoder.


Figure 3.14: Block diagram of JBIG2 (segment the image into pixel blocks; search for an acceptable match; encode the index of the matching symbol, or the bitmap by a JBIG-style compressor; encode the position of the block as an offset; conditionally add new symbols to the dictionary).

Figure 3.15: Two-level context template for coding the pixel blocks; context pixels are taken both from the original image and from the matching pixel block.

3.7 Summary of binary image compression algorithms

Figure 3.16 profiles the performance of several binary image compression algorithms, including three well-known universal compression programs (Compress, Gzip, Pkzip). Note that all of these methods are lossless. Figure 3.17 gives a comparison of JBIG and JBIG2 for the set of CCITT test images.


Figure 3.16: Compression efficiency of several binary image compression algorithms for CCITT test image 3.

Figure 3.17: Comparison of JBIG and JBIG2 for the set of CCITT test images (compressed file sizes in bytes for images 1-8).


4 Continuous tone images

4.1 Lossless and near-lossless compression

4.1.1 Bit-plane coding

The idea of bit plane coding is to apply binary image compression to the bit planes of a gray-scale image. The image is first divided into k separate bit planes, each representing a binary image. These bit planes are then coded by any compression method designed for binary images, e.g. context-based compression with arithmetic coding, as presented in Section 3.5. The bit planes of the most significant bits are the most compressible, while the bit planes of the least significant bits are nearly random and thus mostly incompressible.

Better results can be achieved if the binary codes are transformed into Gray codes before partitioning the image into the bit planes. Consider an 8-bit image consisting of only two pixel values, 127 and 128. Their binary representations are 0111 1111 and 1000 0000. Now, even if the image could be compressed to 1 bit per pixel by a trivial algorithm, the bit planes are completely random (thus incompressible), since the values 127 and 128 differ in every bit position. Gray coding is a method of mapping a set of numbers into a binary alphabet such that successive numbers differ in only one bit of the binary representation. Thus, when two neighboring pixels differ by one, only a single bit plane is affected. Figure 4.1 shows the binary codes (BC) and their corresponding Gray code representations (GC) for the 4-bit numbers 0 to 15. One method of generating the Gray code representation from the binary code is the following logical operation:

GC = BC \oplus (BC >> 1)    (4.1)

where \oplus denotes the bit-wise exclusive OR operation, and >> denotes the bit-wise logical right-shift operation. Note that the ith GC bit plane is constructed by performing an exclusive OR on the ith and (i+1)th BC bit planes. The most significant bit planes of the binary and the Gray codes are identical. Figures 4.2 and 4.3 show the bit planes of the binary and Gray codes for the test image Camera.
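Equation (4.1) and its inverse can be written directly (a small sketch; note that the example values 127 and 128 map to Gray codes 0100 0000 and 1100 0000, which differ in a single bit):

def binary_to_gray(bc):
    # Equation (4.1): GC = BC xor (BC >> 1).
    return bc ^ (bc >> 1)

def gray_to_binary(gc):
    # Inverse mapping: xor together all right shifts of the Gray code.
    bc = 0
    while gc:
        bc ^= gc
        gc >>= 1
    return bc

# binary_to_gray(127) -> 64  (0100 0000)
# binary_to_gray(128) -> 192 (1100 0000)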


Figure 4.1: Illustration of four-bit binary and Gray codes.

Suppose that the bit planes are compressed starting from the MSP (most significant plane) and ending with the LSP (least significant plane). A small improvement in context-based compression is achieved if the context template includes a few pixels from the previously coded bit plane. Typically the bits included in the template are the bit of the same pixel that is to be coded, and possibly the one above it. This kind of template is referred to as a 3D template.


Figure 4.2: Binary and Gray code bit planes for test image Camera, bit planes 7 through 4.


Figure 4.3: Binary and Gray code bit planes for test image Camera, bit planes 3 through 0.


4.1.2 Lossless JPEG

Lossless JPEG (Joint Photographic Experts Group) processes the image pixel by pixel in row-major order. The value of the current pixel is predicted on the basis of the neighboring pixels that have already been coded (see Figure 2.4). The prediction functions available in JPEG are given in Table 4.1. The prediction errors are coded either by Huffman coding or by arithmetic coding. The Huffman code table of lossless JPEG is given in Table 4.2. One first encodes the category of the prediction error, followed by the binary representation of the value within the corresponding category; see Table 4.3 for an example.

The arithmetic coding component in JPEG is the QM-coder, which is a binary arithmetic coder. The prediction errors are coded in the same manner as in the Huffman coding scheme: the category value followed by the binary representation of the value within the category. Here the category values are coded by a sequence of binary decisions as shown in Figure 4.4. If the prediction error is not zero, the sign of the difference is coded after the "zero/non-zero" decision. Finally, the value within the category is encoded bit by bit from the most significant bit to the least significant bit. The probability modelling of the QM-coder ensures that the binary decisions are encoded according to their corresponding probabilities. The details of the context information involved in the scheme are omitted here.

Figure 4.4: Binary decision tree for coding the categories.

Table 4.1: Predictors used in lossless JPEG.

Mode:  Predictor:       Mode:  Predictor:
0      Null             4      N + W - NW
1      W                5      W + (N - NW)/2
2      N                6      N + (W - NW)/2
3      NW               7      (N + W)/2


Table 4.2: Huffman coding of the prediction errors.

Category:  Codeword:  Difference:            Codeword:
0          00         0                      -
1          010        -1, 1                  0, 1
2          011        -3, -2, 2, 3           00, 01, 10, 11
3          100        -7..-4, 4..7           000...011, 100...111
4          101        -15..-8, 8..15         0000...0111, 1000...1111
5          110        -31..-16, 16..31       :
6          1110       -63..-32, 32..63       :
7          11110      -127..-64, 64..127     :
8          111110     -255..-128, 128..255   :

Table 4.3: Example of lossless JPEG for the pixel sequence (10, 12, 10, 7, 8, 8, 12) when using the prediction mode 1 (i.e. the predictor is the previous pixel value). The predictor for the first pixel is zero.

Pixel:             10       12     10     7      8     8   12
Prediction error:  +10      +2     -2     -3     +1    0   +4
Category:          4        2      2      2      1     0   3
Bit sequence:      1011010  01110  01101  01100  0101  00  100100
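The category/offset split of Tables 4.2 and 4.3 can be sketched as follows (a small sketch; the dictionary of category codewords is abbreviated here):

CATEGORY_CODE = {0: '00', 1: '010', 2: '011', 3: '100', 4: '101'}  # Table 4.2 (truncated)

def category(e):
    # Category k holds the differences with 2^(k-1) <= |e| < 2^k.
    k = 0
    while abs(e) >= (1 << k):
        k += 1
    return k

def encode_error(e):
    # Huffman code of the category followed by k offset bits; negative
    # values are coded as e + 2^k - 1 (one's-complement style), which
    # reproduces the codewords of Table 4.2.
    k = category(e)
    if k == 0:
        return CATEGORY_CODE[0]
    v = e if e > 0 else e + (1 << k) - 1
    return CATEGORY_CODE[k] + format(v, '0%db' % k)

# encode_error(+10) -> '1011010', as in Table 4.3.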

4.1.3 FELICS

FELICS (Fast and Efficient Lossless Image Compression System) is a simple yet efficient compression algorithm proposed by Howard and Vitter [1993]. The main idea is to avoid the computationally demanding arithmetic coding and instead use a simpler coding scheme together with a clever modelling method.

FELICS uses the information of two adjacent pixels when coding the current one: the pixel to the left of the current pixel, and the one above it. Denote the values of these neighboring pixels by L and H so that L is the smaller of the two. The probabilities of the pixel values obey the distribution given in Figure 4.5.

Figure 4.5: Probability distribution of the intensity values.

The coding scheme is as follows: A code bit indicates whether the actual pixel value P falls into the range [L, H]. If so, an adjusted binary coding is applied; here the hypothesis is that the in-range values are approximately uniformly distributed. Otherwise the above/below-range decision requires another code bit, and the value is then coded by Rice coding with adaptive k-parameter selection.

Adjusted binary codes:

To encode an in-range pixel value P, the difference P-L is encoded. Denote D=H-L, thus the number of possible values in the range is D+1. If D+1 is a power of two, a binary code with log2(D+1) bits is used. Otherwise the code is adjusted so that ⌊log2(D+1)⌋ bits are assigned to the values near the middle of the range, which are slightly more probable, and ⌈log2(D+1)⌉ bits to the others. For example, if D=4 there are five values (0, 1, 2, 3, 4) and their corresponding adjusted binary codewords are (111, 10, 00, 01, 110).
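One way to realize such a code is the following sketch (the middle-first ranking of the values is one consistent choice that reproduces the example above for D=4; the actual codeword assignment in FELICS may order the long codewords differently):

def adjusted_binary(v, d):
    # Encode v in {0, ..., d} so that values near the middle of the
    # range get floor(log2(d+1)) bits and the rest one bit more.
    n = d + 1                       # number of possible values
    k = n.bit_length() - 1          # floor(log2(n))
    n_short = (1 << (k + 1)) - n    # how many values get only k bits
    m = d // 2                      # middle of the range
    rank = 2 * (v - m) - 1 if v > m else 2 * (m - v)
    if rank < n_short:
        return format(rank, '0%db' % k)
    return format(rank + n_short, '0%db' % (k + 1))

# For d = 4, the values 0..4 encode as 111, 10, 00, 01, 110.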

Rice codes:

If the pixel value P exceeds the range [L, H], the difference P-(H+1) is coded using Rice coding. According to the hypothesized distribution of Figure 4.5, the values have exponentially decreasing probabilities as P increases. On the other hand, if P falls below the range [L, H], the difference (L-1)-P is coded. The shapes of the distributions in the above and below ranges are identical, thus the same Rice coding is applied. See Table 4.4 for a summary of the FELICS codewords, and Section 2.2.3 for the details of Rice coding.

For determining the Rice coding parameter k, the value D is used as the context. For each context D, a cumulative total is maintained for each reasonable Rice parameter value k: the code length that would have resulted if the parameter k had been used to encode all values encountered so far in the context. The parameter with the smallest cumulative code length is used to encode the next value encountered in the context. The allowed parameter values are k = 0, 1, 2, and 3.

Table 4.4: Code table of FELICS; B = adjusted binary coding, and R = Rice coding.

Pixel position:  Codeword:
Below range      10 + R(L-P-1)
In range         0 + B(P-L)
Above range      11 + R(P-H-1)

4.1.4 JPEG-LS

The JPEG-LS is based on the LOCO-I algorithm (Weinberger et al., 1998). The method uses the same ideas as the lossless JPEG with the improvement of utilizing context modeling and adaptive correction of the predictor. The coding component is changed to Golomb codes with an adaptive choice of the skewness parameter. The main structure of the JPEG-LS is shown in Fig. 4.6. The modeling part can be broken into the following three components:

a. Prediction
b. Determination of context.
c. Probability model for the prediction errors.


In the prediction, the next value x is predicted as x̂ based on the values of the already coded neighboring pixels. The three nearest pixels (denoted a, b, c), shown in Fig. 4.?, are used in the prediction as follows:

\hat{x} = \begin{cases}
  \min(a, b) & \text{if } c \ge \max(a, b) \\
  \max(a, b) & \text{if } c \le \min(a, b) \\
  a + b - c  & \text{otherwise}
\end{cases}    (4.2)

The predictor tends to pick a in cases where a horizontal edge lies above the current location, and b in cases where a vertical edge exists to the left of the current location. The third choice (a+b-c) is based on the presumption that there is a smooth plane around the pixel, and uses this estimate as the prediction. The prediction residual ε = x - x̂ is then input to the context modeler, which decides the appropriate statistical model to be used in the coding.
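Equation (4.2) is the so-called median (MED) predictor and translates directly into code (here a is the west, b the north, and c the north-west neighbor):

def med_predict(a, b, c):
    # Median edge detecting predictor of JPEG-LS, Equation (4.2).
    if c >= max(a, b):
        return min(a, b)     # horizontal edge above
    if c <= min(a, b):
        return max(a, b)     # vertical edge on the left
    return a + b - c         # smooth plane assumption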

Figure 4.6: Block diagram of JPEG-LS. The gradients of the context pixels (c, b, d, a) select between the regular mode (fixed predictor, adaptive correction, context modeler, Golomb coder) and the run mode (run counter, run coder) for flat regions.

The context is determined by calculating three gradients between the four context pixels: g1 = d - b, g2 = b - c, and g3 = c - a. Each difference is quantized into a small number of approximately equiprobable connected regions in order to reduce the number of models. Denote the quantized values of g1, g2, g3 by q1, q2, q3. The quantization regions are {0}, {1,2}, {3,4,5,6}, {7,8,...,20}, and {21,22,...} by default for 8 bits per pixel images, but the regions can be adjusted for particular images and applications. The number of models is reduced further by assuming that the symmetric contexts Ci = (q1, q2, q3) and Cj = (-q1, -q2, -q3) have the same statistical properties (with the difference of the sign). The total number of models is thus:

C = \frac{(2T + 1)^3 + 1}{2}    (4.3)

where T is the number of non-zero regions. Using the default regions (T=4) this gives 365 models in total. Each context will have its own counters and statistical model as described next.
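The gradient quantization and the symmetric-context merging can be sketched as follows (a sketch assuming the default thresholds 1, 3, 7, 21; the index computation yields exactly 365 distinct values for T = 4):

def quantize_gradient(g, thresholds=(1, 3, 7, 21)):
    # Map a gradient to one of the regions -4..4: {0}, +-{1,2},
    # +-{3..6}, +-{7..20}, +-{21,...}.
    sign = -1 if g < 0 else 1
    return sign * sum(1 for t in thresholds if abs(g) >= t)

def context_index(a, b, c, d):
    q = [quantize_gradient(d - b),
         quantize_gradient(b - c),
         quantize_gradient(c - a)]
    # Merge the symmetric contexts (q1,q2,q3) and (-q1,-q2,-q3) by
    # flipping the sign of the first non-zero component to positive.
    for qi in q:
        if qi != 0:
            if qi < 0:
                q = [-x for x in q]
            break
    return (q[0] * 9 + q[1]) * 9 + q[2]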

The fixed predictor of (4.2) is fine-tuned by an adaptive adjustment of the prediction value. An optimal predictor would result in a prediction error ε = 0 on average, but this is not necessarily the case. A technique called bias cancellation is used in JPEG-LS to detect and correct a systematic bias in the predictor. A correction value C' is estimated based on the prediction errors seen so far in the context, and then subtracted from the prediction value. The correction value is estimated as the average prediction error:

C' = \frac{D}{N}    (4.4)

where D is the cumulative sum of the previous uncorrected prediction errors, and N is the number of pixels coded so far in the context. These values are maintained for each context. The practical implementation is slightly different, and the reader is referred to the paper by Weinberger et al. (1998) for the details.

The distribution of the prediction errors is approximated by a Laplacian distribution, i.e. a two-sided exponential decay centered at zero. The prediction errors are first mapped as follows:

M(\epsilon) = \begin{cases}
  2\epsilon & \epsilon \ge 0 \\
  2|\epsilon| - 1 & \epsilon < 0
\end{cases}    (4.5)

This mapping will re-order the values in the interleaved sequence 0, -1, +1, -2, +2, and so on. Golomb codes (or their special case Rice codes) are then used to code the mapped values, see Section 2.2.3. The only parameter of the code is k, which defines the skewness of the distribution. In JPEG-LS, the value of k is adaptively determined as follows:

k = \min \{ k' : 2^{k'} \cdot N \ge A \}    (4.6)

where A is the accumulated sum of magnitudes of the prediction errors (absolute values) seen in the context so far. The appropriate Golomb code is then used as described by Weinberger et al. (1998).
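In code, this parameter selection reduces to a simple loop:

def golomb_k(N, A):
    # Equation (4.6): the smallest k with 2^k * N >= A, where N is the
    # number of samples and A the accumulated error magnitude in the context.
    k = 0
    while (N << k) < A:
        k += 1
    return k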

JPEG-LS also has a "run mode" for coding flat regions. The run mode is activated when a flat region is detected, i.e. a = b = c = d (or equivalently q1 = q2 = q3 = 0). The number of repeated successful predictions (ε = 0) is then encoded by a similar Golomb coding of the run length. The coding returns to the regular mode at the next unsuccessful prediction (ε ≠ 0). This technique is also referred to as alphabet extension.

The standard also includes an optional near-lossless mode, in which every sample value in the reconstructed image is guaranteed to differ from the original value by at most a (small) preset amount δ. This mode is implemented simply by quantizing the prediction error as follows:

Q(\epsilon) = \mathrm{sign}(\epsilon) \cdot \left\lfloor \frac{|\epsilon| + \delta}{2\delta + 1} \right\rfloor    (4.7)

The quantized values must then be used also in the context modeling, and taken into account in the coding and decoding steps.

4.1.5 Residual coding of lossy algorithms

An interesting approach to lossless coding is the combination of lossy and lossless methods. First, a lossy coding method is applied to the image. Then the residual between the original and the reconstructed image is calculated and compressed by any lossless compression algorithm. This scheme can also be considered as a kind of progressive coding. The lossy image serves as a rough version of the image which can be quickly retrieved. The complete image can then be retrieved by adding the residual and the lossy parts together.
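As a sketch (the encoder and decoder components are placeholders for any lossy/lossless pair):

def residual_encode(image, lossy_encode, lossy_decode, lossless_encode):
    # Code the image lossily, then losslessly code the difference between
    # the original and the reconstruction; the decoder adds them back.
    code = lossy_encode(image)
    approx = lossy_decode(code)
    residual = [[o - r for o, r in zip(orow, arow)]
                for orow, arow in zip(image, approx)]
    return code, lossless_encode(residual)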

4.1.6 Summary of the lossless gray-scale compression

Figure 4.7 profiles the performance of several lossless gray-scale image compression algorithms, including three well-known compression programs (Compress, Gzip, Pkzip) based on the Ziv-Lempel algorithms [1977, 1978].

The compression efficiency of bit plane coding is rather close to that of lossless JPEG. In fact, coding the bit planes by JBIG (with a 3D template) outperforms lossless JPEG for images with a low number of bits per pixel in the original image, see Figure 4.8. For example, the corresponding bit rates of lossless JPEG and JBIG for the bit planes were 3.83 and 3.92 for a set of 8 bpp images (256 gray scales). On the other hand, for 2 bpp images (4 gray scales) the corresponding bit rates were 0.34 (lossless JPEG) and 0.24 (JBIG bit plane coding). The results of the bit plane based JBIG coding are better when the precision of the image is 6 bpp or lower; otherwise lossless JPEG gives slightly better compression results.

Figure 4.7: Compression efficiency of lossless image compression algorithms for test image Lena (512×512×8).


Figure 4.8: Compression versus precision. JBIG refers to the bit plane coding, and JPEG to the lossless JPEG. The results are for the JPEG set of test images.

4.2 Block truncation coding

The basic idea of block truncation coding (BTC) is to divide the image into 4×4 pixel blocks and quantize the pixels of each block to two values, a and b. For each block, the mean value (x̄) and the standard deviation (σ) are calculated and encoded. Then a two-level quantization is performed for the pixels of the block: a 0-bit is stored for the pixels with values smaller than the mean, and the rest of the pixels are represented by a 1-bit. The image is reconstructed at the decoding phase from x̄ and σ, and from the bit plane, by assigning the value a to the 0-value pixels and b to the 1-value pixels:

a = \bar{x} - \sigma \sqrt{\frac{q}{m - q}}    (4.8)

b = \bar{x} + \sigma \sqrt{\frac{m - q}{q}}    (4.9)

where m (=16) is the total number of pixels in the block and q is the number of 1-bits in the bit plane. The quantization levels are chosen so that the mean and the variance of the pixels in the block are preserved in the decompressed image; thus the method is also referred to as moment preserving BTC. Another variant of BTC, called absolute moment BTC (AMBTC), selects a and b as the mean values of the pixels within the two partitions:

a = \frac{1}{m - q} \sum_{x_i < \bar{x}} x_i    (4.10)

b = \frac{1}{q} \sum_{x_i \ge \bar{x}} x_i    (4.11)

This choice of a and b does not necessarily preserve the second moment (variance) of the block. However, it has been shown that the MSE-optimal representative for a set of pixels is their mean value. See Figure 4.9 for an example of the moment preserving BTC.
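A sketch of AMBTC for a single block follows (the block is given as a flattened pixel list; uniform blocks are handled as a special case):

def ambtc_encode(block):
    # AMBTC: a = mean of the pixels below the block mean,
    # b = mean of the remaining pixels (Equations 4.10 and 4.11).
    m = len(block)
    mean = sum(block) / m
    bitplane = [1 if p >= mean else 0 for p in block]
    q = sum(bitplane)
    if q == m:                       # uniform block
        return round(mean), round(mean), bitplane
    b = sum(p for p in block if p >= mean) / q
    a = sum(p for p in block if p < mean) / (m - q)
    return round(a), round(b), bitplane

def ambtc_decode(a, b, bitplane):
    return [b if bit else a for bit in bitplane]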


In the moment preserving BTC the quantization data is represented by the pair (x̄, σ). A drawback of this approach is that the quantization levels are calculated at the decoding phase from the quantized values of (x̄, σ), which contain rounding errors; thus extra degradation is caused by the coding phase. The other approach is to calculate the quantization levels (a, b) already at the encoding phase and transmit them. In this way one can minimize both the quantization error and the computation needed at the decoding phase.

The basic BTC algorithm does not consider how the quantization data - (x̄, σ) or (a, b) - and the bit plane should be coded, but simply represents the quantization data by 8+8 bits and the bit plane by 1 bit per pixel. Thus, the bit rate of BTC is (8 + 8 + m) / m bits per pixel (= 2.0 in the case of 4×4 blocks).

Figure 4.9: Example of the moment preserving BTC.

The major drawback of BTC is that it performs poorly in high contrast blocks because two quantization levels are not sufficient to describe these blocks. The problem can be attacked by using variable block sizes. With large blocks one can decrease their total number and therefore reduce the bit rate. On the other hand, small blocks improve the image quality.

One such approach is to apply quadtree decomposition. Here the image is segmented into blocks of size m1×m1. If the standard deviation σ of a block is less than a predefined threshold σth (implying a low contrast block), the block is coded by a BTC algorithm. Otherwise it is divided into four subblocks, and the same process is repeated until the threshold criterion is met or the minimal block size (m2×m2) is reached. The hierarchy of the blocks is represented by a quadtree structure.

The method can be further improved by compressing the bit plane, eg. by vector quantization. This combined algorithm is referred to as BTC-VQ. The pair (a,b), on the other hand, can be compressed by forming two subsample images, one from the a-values and another from the b-values. These can then be compressed by any image compression algorithm in the same manner as the mean value in the mean/residual VQ.

All the previous ideas can be collected together to form a combined BTC algorithm. Let us next examine one such possible combination; see Table 4.5 for the elements of the combined method, referred to as HBTC-VQ. Variable block sizes are applied with minimum and maximum block sizes of 2×2 and 32×32. For a high quality of the compressed image the use of 2×2 blocks is essential in the high contrast regions. Standard deviation (σ) is used as the threshold criterion and is set to 6 for all levels. (For the 4×4 level the threshold value could be left as an adjusting parameter.) The bit plane is coded by VQ using a codebook with 256 entries, thus the compression effect will be 0.5 bpp for every block coded by VQ. Two subsample images are formed, one from the a-values and another from the b-values of the blocks. They are then coded by FELICS, see Section 4.1.3. The result of the combined BTC algorithm is illustrated in Figure 4.10.

Table 4.5: Elements of the combined BTC algorithm.

Part:                 Method:
Quantization          AMBTC
Coding of (a,b)       FELICS
Coding of bit plane   VQ / 256 entries
Block size            32×32 → 2×2

       BTC      AMBTC    HBTC-VQ
bpp    2.00     2.00     1.62
mse    43.76    40.51    15.62

Figure 4.10: Magnifications of Lena when compressed by various BTC variants.

4.3 Vector quantization

Vector quantization (VQ) is a generalization of the scalar quantization technique, where the number of possible (pixel) values is reduced. Here the input data consists of M-dimensional vectors (eg. M-pixel blocks) instead of scalars (single pixel values). Thus, with 8-bpp gray-scale images the number of different vectors (blocks) is 256^M. The input space, however, is not evenly occupied by these vectors. In fact, because of the high correlation between the neighboring pixel values, some input vectors are very common while others hardly ever appear in real images. For example, completely random patterns of pixels are rarely seen, but certain structures (like edges, flat areas, and slopes) are found in almost every image.


Vector quantization partitions the input space into K non-overlapping regions so that the input space is completely covered. A representative (codevector) is then assigned to each region. Vector quantization maps each input vector to the codevector of its partition. Typically the space is partitioned so that each vector is mapped to its nearest codevector, minimizing a certain distortion function. The distortion is commonly the (squared) Euclidean distance between the two vectors:

d(X, Y) = \sum_{i=1}^{M} (X_i - Y_i)^2    (4.12)

where X and Y are two M-dimensional vectors. The codevector is commonly chosen as the centroid of the vectors in the partition, that is

C = (c_1, c_2, \ldots, c_M) = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_M)    (4.13)

where X̄i is the average value of the ith component over the vectors belonging to the partition. This selection minimizes the Euclidean distortion within the partition. The codebook of the vector quantizer consists of all the codevectors. The design of a VQ codebook is studied in the next section.

Vector quantization is applied to image compression by dividing the image into fixed-sized blocks (vectors), typically 4×4, which are then replaced by the best match found from the codebook. The index of the chosen codevector is sent to the decoder using log2(K) bits, see Figure 4.11. For example, in the case of 4×4 pixel blocks and a codebook of size 256, the bit rate is log2(256)/16 = 0.5 bpp, and the corresponding compression ratio is (16×8)/8 = 16.

Figure 4.11: Basic structure of vector quantization.
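The encoding step is a nearest-neighbor search over the codebook (a minimal full-search sketch; the decoder simply looks up the codevector by its index):

def vq_encode(vector, codebook):
    # Full search: return the index of the codevector minimizing the
    # squared Euclidean distance of Equation (4.12).
    def dist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))

def vq_decode(index, codebook):
    return codebook[index]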

4.3.1 Codebook organization

Even if the time complexity of the codebook generation algorithms is usually rather high, it is not critical, since the codebook is typically generated only once as a preprocessing stage. However, the search for the best match compares every block of the image against all K codevectors. For example, in the case of 4×4 blocks and K=256, 16×256 multiplications are needed for each block. For an image of 512×512 pixels there are 16384 blocks in total, thus the number of multiplications required is over 67 million.

Tree-structured VQ:

In a tree-structured vector quantization the codebook is organized as an m-ary balanced tree where the codevectors are located in the leaf nodes, see Figure 4.12. The input vector is compared with m predesigned test vectors at each stage or node of the tree. The nearest test vector determines which of the m paths through the tree is selected in order to reach the next stage of testing. At each stage the number of candidate codevectors is thus reduced to 1/m of the previous set. In many applications m=2 and we have a binary tree. If the codebook size is K = m^d, then d = log_m(K) search stages are needed to locate the chosen codevector. The m-ary tree has breadth m and depth d. The drawback of tree-structured VQ is that the best match is not necessarily found, because the search is made heuristically on the basis of the search tree.

Figure 4.12: Tree-structured vector quantizer codebook.
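The search can be sketched as follows (a minimal sketch with a small node class; only m·d distance computations are needed instead of K):

class Node:
    # Internal nodes hold m test vectors with their child nodes;
    # leaves hold the index of a codevector.
    def __init__(self, tests=None, children=None, index=None):
        self.tests, self.children, self.index = tests, children, index

def tree_search(vector, node):
    # Descend by choosing the nearest test vector at each stage.
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    while node.index is None:
        d = [dist(vector, t) for t in node.tests]
        node = node.children[d.index(min(d))]
    return node.index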

The number of bits required to encode a vector is still log2(K), regardless of the tree structure. In the case of a binary tree, the indexes can be obtained by assigning binary labels to the branches of the tree so that the index of a codevector is the concatenation of the labels along the path from the root to the leaf node. These bit sequences can be transmitted (encoded) in a progressive manner from the most significant bits of the blocks to the least significant bits, so that in the first phase the bits of the first branches in the tree structure are sent. Thus the decoder can display a first estimate of the image as soon as the first bits of each block are received. This kind of progressive coding consists of as many stages as there are bits in the code indexes. Note that the progressive coding is an option that requires no extra bits in the coding.

Classified VQ:

In classified vector quantization, instead of having one codebook, several (possibly smaller) codebooks are used. For each block, a classifier selects the codebook where the search is performed, see Figure 4.13. Typically the codebooks are classified according to the shape of the block, so that codevectors having horizontal edges might be located in one codebook, blocks having diagonal edges in another, and so on. The encoder has to send two indexes to the decoder: the index of the chosen codevector within the codebook, and the index of the class from which the codevector was taken. The classified VQ can be seen as a special case of tree-structured VQ where the depth of the tree is 1.

There are two motivations for classified VQ. First, it allows a faster search since the codebooks can be smaller than the codebook of a full search VQ. Second, classified VQ can also be seen as a type of codebook construction method where the codevectors are organized by their type (shape). However, classified VQ codebooks in general are no better than full search codebooks. Consider a classified VQ where we have 4 classes, each having 256 codevectors. The total number of bits required to encode each block is 2 + 8 = 10. With the same number of bits we can express 2^10 = 1024 different codevectors in a full-search VQ. By choosing the union of the four subcodebooks of the classified VQ, the result cannot be any worse than that of the classified VQ. Thus, the primary benefit of classified VQ is that it allows a faster search (at the cost of extra distortion).

Figure 4.13: Classified vector quantizer.

4.3.2 Mean/residual VQ

In mean/residual vector quantization the blocks are divided into two components: the mean value of the pixels, and the residual, where the mean is subtracted from the individual pixel values:

r_i = x_i - \bar{x}, \quad i = 1, \ldots, M    (4.14)

It is easier to design a codebook for the mean-removed blocks than for the original blocks. The range of the residual pixel values is [-255, 255], but the values are concentrated around zero. Thus, mean/residual VQ is a kind of prediction technique. However, since the predictor is the mean value of the same block, it must be encoded also. A slightly different variant is interpolative/residual VQ, where the predictor is not the mean value of the block but the result of a bilinear interpolation, see Figure 4.14. The predictor for each pixel is interpolated not only on the basis of the mean value of the block, but also on the basis of the mean values of the neighboring blocks. The details of the bilinear interpolation are omitted here.


Figure 4.14: Predictors of two residual vector quantizers.

In addition to removing the mean value, one can normalize the pixel values by eliminating the standard deviation also:

r_i = \frac{x_i - \bar{x}}{\sigma}    (4.15)

where σ is the standard deviation of the block. The histogram of the resulting residual values has zero mean and unit variance. What is left is the shape of the block, ie. the correlations between the neighboring pixels. This method can be called mean/gain/shape vector quantization, since it separates these three components from each other. The shape is then coded by vector quantization. Other coding methods, however, might be more suitable for the mean (x̄) and the gain (σ). For example, one could form a subsample image from the mean values of the blocks and compress it by any image compression algorithm, eg. lossless JPEG. Figure 4.16 shows sample blocks taken from the test image Eye (Figure 4.15) when normalized by Equation (4.15).

Figure 4.15: Test image Eye (50×50×8).


Figure 4.16: 100 samples of normalized 4×4 blocks from the test image Eye of Figure 4.15.

4.3.3 Codebook design

The codebook is usually constructed on the basis of a training set of vectors. The training set consists of sample vectors from a set of images. Denote the number of vectors in the training set by N. The objective of the codebook construction is to design a codebook of K vectors so that the average distortion with respect to the training set is minimized. See Figure 4.17 for an example of 100 two-dimensional training vectors and one possible partitioning of them into 10 sets.


Figure 4.17: Example of 100 sample training set consisting of 2-D vectors plotted into the vector space (left); example of partitioning the vectors into 10 sample regions (right).

Random heuristic:

The codebook can be selected by randomly choosing K vectors from the training set. This method, even if it looks quite irrational, is useful as the initial codebook for an iterative method like the generalized Lloyd algorithm. It is also applicable as such, since the actual coding is not a random process: for each image block to be coded, the best possible codevector is chosen from the codebook.

Pairwise nearest neighbor:

The pairwise nearest neighbor (PNN) algorithm starts by including all the N training vectors into the codebook. In each step, two codevectors are merged into one so that the increase in the overall distortion is minimized. The algorithm is iterated until the number of codevectors has been reduced to K. The details of the algorithm are explained in the following.

The PNN algorithm starts by calculating the distortion between all pairs of training vectors. The two nearest neighboring vectors are combined into a single cluster and represented by their centroid. The input space now consists of N-1 clusters, one containing two vectors and the rest containing a single vector. At each phase of the algorithm, the increase in the overall distortion that would result from replacing two clusters and their centroids by the merged cluster and its centroid is computed for every pair of clusters. The two clusters with the minimum increase in the distortion are then merged, and the codevector of the merged cluster is its centroid. See Figure 4.18 for an example of the PNN algorithm.
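One merge step can be sketched as follows (a sketch; the closed-form merge cost below - the size-weighted squared centroid distance - is the standard way to compute the increase in distortion without revisiting the training vectors):

def merge_cost(na, ca, nb, cb):
    # Increase in total squared distortion when clusters of sizes na, nb
    # with centroids ca, cb are merged: na*nb/(na+nb) * ||ca - cb||^2.
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2

def pnn_step(clusters):
    # clusters: list of (size, centroid) pairs; merge the cheapest pair.
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda p: merge_cost(*clusters[p[0]], *clusters[p[1]]))
    (na, ca), (nb, cb) = clusters[i], clusters[j]
    clusters[i] = (na + nb,
                   [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
    del clusters[j]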


Figure 4.18: Pairwise nearest neighbor algorithm for a training set with N=10, and the final codebook with K=5.


Splitting algorithm:

The splitting algorithm takes the opposite approach to codebook construction compared to the PNN method. It starts with a codebook of a single codevector, which is the centroid of the complete training set. The algorithm then produces increasingly larger codebooks by splitting a codevector Y into two codevectors Y - ε and Y + ε, where ε is a vector of small Euclidean norm. One choice of ε is to make it proportional to the vector whose ith component is the standard deviation of the ith components of the training vectors. The new codebook after the splitting can never be worse than the previous one, since it is the same as the previous codebook plus one new codevector. The algorithm is iterated until the size of the codebook reaches K.

Generalized Lloyd algorithm:

The Generalized Lloyd Algorithm (GLA), also referred to as the LBG algorithm, is an iterative method that takes a codebook as input (referred to as the initial codebook) and produces a new, improved version of it (resulting in a lower overall distortion). The hypothesis is that the iterative algorithm finds the locally optimal codebook nearest to the initial one. The initial codebook for GLA can be constructed by any existing codebook design method, eg. by the random heuristic. The Lloyd's necessary conditions of optimality are defined as follows:

· Nearest neighbor condition: For a given set of codevectors, each training vector is mapped to its nearest codevector with respect to the distortion function.

· Centroid condition: For a given partition, the optimal codevector is the centroid of the vectors within the partition.

On the basis of these two optimality conditions, Lloyd's algorithm is formulated as a two-phase iterative process:

1. Divide the training set into partitions by mapping each vector X to its nearest codevector Y using the Euclidean distance.

2. Calculate the centroid of each region and replace the codevectors Y by the centroids of their corresponding partitions.

Both stages of the algorithm satisfy the optimality conditions, so the resulting codebook after one iteration can never be worse than the original one. The iterations are continued until no change (i.e. no decrease in the overall distortion) is achieved. The algorithm does not necessarily reach the global optimum, but converges to a local minimum.

Note that there is a third condition of optimality, stating that no vector should be equally distant from two different codevectors. This condition can, however, be satisfied by defining the distortion function (or a tie-breaking rule) so that such a situation never occurs, and it is therefore omitted from the discussion here.
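
As an illustration, a minimal sketch of the two-phase iteration, assuming squared Euclidean distortion and vectors stored as rows of NumPy arrays; a partition that becomes empty is simply left untouched here:

import numpy as np

def gla(training_set, codebook, max_iter=100):
    # Generalized Lloyd Algorithm: alternate the partitioning and centroid steps.
    data = np.asarray(training_set, dtype=float)
    codebook = np.asarray(codebook, dtype=float).copy()
    prev_distortion = np.inf
    for _ in range(max_iter):
        # 1. Nearest neighbour condition: map each vector to its closest codevector.
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        distortion = dists[np.arange(len(data)), labels].mean()
        if distortion >= prev_distortion:      # no improvement -> stop
            break
        prev_distortion = distortion
        # 2. Centroid condition: replace each codevector by the centroid of its partition.
        for k in range(len(codebook)):
            members = data[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook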

4.3.4 Adaptive VQ


VQ can also be applied adaptively by designing the codebook on the basis of the image to be coded. The codebook, however, must then be included in the compressed file. For example, consider a codebook of 256 entries, each taking 16 bytes. The complete codebook requires 256 × 16 = 4096 bytes of memory, increasing the overall bit rate by 0.125 bits per pixel (in the case of a 512×512 image). In dynamic modelling, on the other hand, the compression would start with an initial codebook, which would then be updated during the compression on the basis of the already coded blocks. This would not increase the bit rate, but the computational load required by e.g. GLA might be too high if it were applied after each block.

4.4 JPEG

JPEG (Joint Photographic Experts Group) was started in 1986 as a cooperative effort of the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT), later joined by the International Electrotechnical Commission (IEC). The purpose of JPEG was to create an image compression standard for gray-scale and color images. Although the name JPEG refers to the standardization group, it has also been adopted as the name of the compression method. The JPEG standard includes the following modes of operation:

· Lossless coding
· Sequential coding
· Progressive coding
· Hierarchical coding

The lossless coding mode is completely different from the lossy coding and was presented in Section 4.1.1. The lossy baseline JPEG (sequential coding mode) is based on the discrete cosine transform (DCT), and is introduced next.

4.4.1 Discrete cosine transform

The 1-D discrete cosine transform (DCT) is defined as

C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)u\pi}{2N}\right]  (4.16)

Similarly, the inverse DCT is defined as

f(x) = \sum_{u=0}^{N-1} \alpha(u) C(u) \cos\left[\frac{(2x+1)u\pi}{2N}\right]  (4.17)

where

\alpha(u) = \sqrt{1/N} \text{ for } u = 0, \qquad \alpha(u) = \sqrt{2/N} \text{ for } u = 1, 2, ..., N-1  (4.18)

The corresponding 2-D DCT and inverse 2-D DCT are defined as

C(u,v) = \alpha(u)\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]  (4.19)


and

f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\alpha(v) C(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]  (4.20)

The advantage of the DCT is that it can be expressed without complex numbers. The 2-D DCT is also separable (like the 2-D Fourier transform), i.e. it can be obtained by two successive 1-D DCTs in the same way as the Fourier transform. See Figure 4.19 for the basis functions of the 1-D DCT, Figure 4.20 for the basis functions of the 2-D DCT, and Figure 4.21 for an illustration of the 2-D DCT for 4×4 sample blocks.
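
As an illustration of the separability, a direct (non-fast) sketch that applies the 1-D transform of Eq. (4.16) first to the rows and then to the columns; applied to the FLAT block of Figure 4.21 it reproduces the DC value 40.0:

import numpy as np

def dct_1d(f):
    # 1-D DCT of a length-N vector, following Eq. (4.16).
    N = len(f)
    x = np.arange(N)
    C = np.empty(N)
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        C[u] = alpha * np.sum(f * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return C

def dct_2d(block):
    # 2-D DCT via separability: 1-D DCT of every row, then of every column.
    rows = np.apply_along_axis(dct_1d, 1, np.asarray(block, dtype=float))
    return np.apply_along_axis(dct_1d, 0, rows)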

Figure 4.19: 1-D DCT basis functions for N=8.


Figure 4.20: 2-D DCT basis functions for N=4. Each block consists of 4×4 elements, corresponding to x and y varying from 0 to 3.


ORIGINAL IMAGE (left) and TRANSFORMED IMAGE (right):

FLAT
10 10 10 10    40.0   0.0   0.0   0.0
10 10 10 10     0.0   0.0   0.0   0.0
10 10 10 10     0.0   0.0   0.0   0.0
10 10 10 10     0.0   0.0   0.0   0.0

RANDOM TEXTURE
11 15 18 14    58.8   0.3  -1.8   1.3
14 11 13 12    -3.9  -2.8  -3.5   2.6
15 16 19 12     2.7  -1.7   1.2  -3.4
18 17 12 18     3.0  -0.9  -5.3   1.8

IMPULSE
10 10 10 10    42.5   1.4  -2.5  -3.2
10 20 10 10     1.4   0.7  -1.4  -1.8
10 10 10 10    -2.5  -1.4   2.5   3.3
10 10 10 10    -3.2  -1.8   3.3   4.3

LINE (horizontal)
10 10 10 10    50.0   0.0   0.0   0.0
10 10 10 10    -5.4   0.0   0.0   0.0
20 20 20 20   -10.1   0.0   0.0   0.0
10 10 10 10    13.1   0.0   0.0   0.0

EDGE (vertical)
10 10 20 20    60.0 -18.4   0.0   7.7
10 10 20 20     0.0   0.0   0.0   0.0
10 10 20 20     0.0   0.0   0.0   0.0
10 10 20 20     0.0   0.0   0.0   0.0

EDGE (horizontal)
10 10 10 10    60.0   0.0   0.0   0.0
10 10 10 10   -18.4   0.0   0.0   0.0
20 20 20 20     0.0   0.0   0.0   0.0
20 20 20 20     7.7   0.0   0.0   0.0

EDGE (diagonal)
10 10 10 10    55.0 -11.1   0.0  -0.7
10 10 10 20   -11.1   5.0   4.6   0.0
10 10 20 20     0.0   4.6  -5.0  -1.9
10 20 20 20    -0.7   0.0  -1.9   5.0

SLOPE (horizontal)
10 12 14 16    52.0  -8.9   0.0  -0.6
10 12 14 16     0.0   0.0   0.0   0.0
10 12 14 16     0.0   0.0   0.0   0.0
10 12 14 16     0.0   0.0   0.0   0.0

Figure 4.21: Example of DCT for sample 4´4 blocks.

4.4.2 Baseline JPEG

The image is first segmented into 8×8 blocks of pixels, which are then coded separately. Each block is transformed to the frequency domain by the fast discrete cosine transform (FDCT). The transformed coefficients are quantized and then entropy coded either by an arithmetic coder (QM-coder with a binary decision tree) or by Huffman coding. See Figure 4.22 for the main structure of the baseline JPEG encoder. The corresponding decoding structure is given in Figure 4.23.

Neither the DCT nor the entropy coding loses any information from the image. The DCT only transforms the image into frequency space so that it is easier to compress. The only phase resulting in distortion is the quantization phase. The pixels in the original block are represented by 8-bit integers, but the resulting transform coefficients are 16-bit real numbers, so the DCT by itself would expand the file size if no quantization were performed.

The quantization in JPEG is done by dividing each transform coefficient ci (a real number) by a so-called quantization factor qi (an integer between 1 and 255):

c'i = round( ci / qi )  (4.21)

The result is rounded to the nearest integer; see Figure 4.24 for an example. The higher the quantization factor, the less accurate the representation of the value. Even the lowest quantization factor (q=1) results in a small amount of distortion, since the original coefficients are real numbers but the quantized values are integers. The dequantization is defined by

ri = c'i × qi  (4.22)
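
A small sketch of the quantization and dequantization steps of Eqs. (4.21) and (4.22), operating on a whole coefficient block at once:

import numpy as np

def quantize(coeffs, q_table):
    # Divide each coefficient by its quantization factor and round (Eq. 4.21).
    return np.rint(np.asarray(coeffs, dtype=float) / q_table).astype(int)

def dequantize(quantized, q_table):
    # Multiply the integers back by the quantization factors (Eq. 4.22).
    return quantized * np.asarray(q_table, dtype=float)

For example, the DC coefficient 235.6 of Table 4.8 divided by the factor 16 gives 14.7, which rounds to 15; dequantization restores 15 × 16 = 240.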

Figure 4.22: Main structure of JPEG encoder.

Figure 4.23: Main structure of JPEG decoder.


Figure 4.24: Example of quantization by factor of 2.

In JPEG, the quantization factor is not uniform within the block. Instead, the quantization is performed so that more bits are allocated to the low-frequency components (containing the most important information) than to the high-frequency components. See Table 4.6 for examples of possible quantization matrices. The basic quantization tables of JPEG are shown in Table 4.7; the first is applied both in gray-scale image compression and for the Y component in color image compression (assuming YUV or YIQ color space). The second quantization table is for the chrominance components (U and V in the YUV color space). The bit rate of JPEG can be adjusted by scaling the basic quantization tables up (to achieve lower bit rates) or down (to achieve higher bit rates); the relative differences between the qi factors are retained.

Table 4.6: Possible quantization tables.

Uniform quantization:
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16
16 16 16 16 16 16 16 16

More accurate quantization:
 1  2  2  4  4  8 16 16
 2  4  8  8  8 16 16 32
 4  4  8 16 16 16 32 32
 4  8 16 16 16 32 32 32
 8 16 16 32 32 32 32 32
 8 16 16 32 32 32 64 64
16 16 32 32 32 32 64 64
16 16 32 32 32 64 64 64

Less accurate quantization:
  8  64  64 128 256 256 256 256
 64 128 128 128 256 256 256 256
128 256 256 256 256 256 256 256
256 256 256 256 256 256 256 256
256 256 256 256 256 256 256 256
256 256 256 256 256 256 256 256
256 256 256 256 256 256 256 256
256 256 256 256 256 256 256 256

Table 4.7: JPEG quantization tables.

Luminance:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Chrominance:
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99


The entropy coding in JPEG is either Huffman or arithmetic coding; here the former is briefly discussed. The first coefficient (the DC coefficient) is coded separately from the rest of the coefficients (the AC coefficients). The DC coefficient is coded by predicting its value on the basis of the DC coefficient of the previously coded block. The difference between the original and the predicted value is then coded using a code table similar to the one applied in lossless JPEG (see Section 4.1.1). The AC coefficients are then coded one by one in the order given by zig-zag scanning (see Section 2.1.2). No prediction is made, but a simple run-length coding is applied: each run of consecutive zero-value coefficients is coded by its length, and Huffman coding is then applied to the non-zero coefficients. The details of the entropy coding can be found in [Pennebaker & Mitchell 1993].
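
A sketch of the zig-zag ordering and of the grouping of AC coefficients into (zero-run length, value) pairs; the JPEG symbol format and the Huffman code tables themselves are omitted:

def zigzag_order(n=8):
    # Block coordinates (row, col) in zig-zag order for an n-by-n block.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_ac(block):
    # Group the AC coefficients into (zero-run length, value) pairs.
    ac = [block[r][c] for (r, c) in zigzag_order(len(block))][1:]  # skip the DC term
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end-of-block: all remaining coefficients are zero
    return pairs

Applied to the quantized coefficients of Table 4.8, this yields the pairs (1,-2), (0,-1), (0,-1), (0,-1), (3,-1) followed by the end-of-block marker.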

Table 4.8 gives an example of compressing a sample block by JPEG using the basic quantization table. The result of compressing test image Lena by JPEG is shown in Figures 4.25 and 4.26.

4.4.3 Other options in JPEG

JPEG for color images:

RGB color images are compressed in JPEG by first transforming the image into YUV (or YIQ in North America) and then compressing the three color components separately. The chrominance components are often subsampled so that a 2×2 block of the original pixels forms one pixel of the subsampled image. The color component images are then upsampled to their original resolution in the decompression phase.
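
A small sketch of the 2×2 subsampling by block averaging and of upsampling by pixel replication; these are only one simple choice of filters, not the only possibility:

import numpy as np

def subsample_2x2(channel):
    # Replace every 2x2 block of a chrominance channel by its average
    # (assumes even image dimensions).
    c = np.asarray(channel, dtype=float)
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_2x2(channel):
    # Restore the original resolution by pixel replication.
    return np.repeat(np.repeat(channel, 2, axis=0), 2, axis=1)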

Progressive mode:

Progressive JPEG is rather straightforward. Instead of coding each block to full precision in one pass, the coding is divided into several stages. At the first stage the DC coefficients of all blocks are coded. The decoder can already form a rather good approximation of the image on the basis of the DC coefficients alone, since they carry the average values of the blocks. At the second stage, the first significant AC coefficients (in zig-zag order) are coded; at the third stage the next significant AC coefficients, and so on. In total there are 64 coefficients in each block, so progressive coding can have at most 64 stages. In practice, the coding can be switched back to the sequential order, for example, already after the first stage, because the DC coefficients are usually enough for the decoder to decide whether the image is worth retrieving.

Hierarchical mode:

The hierarchical coding mode of JPEG is also a variant of progressive modelling. A reduced-resolution version of the image is compressed first, followed by the higher-resolution versions in increasing order. In each step the resolution is doubled for the next image, similarly to what was done in JBIG.


Table 4.8: Example of a sample block compressed by JPEG.

Original block:
139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158

Transformed block:
235.6  -1.0 -12.1  -5.2   2.1  -1.7  -2.7   1.3
-22.6 -17.5  -6.2  -3.2  -2.9  -0.1   0.4  -1.2
-10.9  -9.3  -1.6   1.5   0.2  -0.9  -0.6  -0.1
 -7.1  -1.9   0.2   1.5   0.9  -0.1   0.0   0.3
 -0.6  -0.8   1.5   1.6  -0.1  -0.7   0.6   1.3
  1.8  -0.2   1.6  -0.3  -0.8   1.5   1.0  -1.0
 -1.3  -0.4  -0.3  -1.5  -0.5   1.7   1.1  -0.8
 -2.6   1.6  -3.8  -1.8   1.9   1.2  -0.6  -0.4

Quantization matrix:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Quantized coefficients:
15  0 -1  0  0  0  0  0
-2 -1  0  0  0  0  0  0
-1 -1  0  0  0  0  0  0
-1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Dequantized coefficients:
240   0 -10   0   0   0   0   0
-24 -12   0   0   0   0   0   0
-14 -13   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Decompressed block:
144 146 149 152 154 156 156 156
148 150 152 154 156 156 156 156
155 156 157 158 158 157 156 156
160 161 161 162 161 159 157 155
163 163 164 163 162 160 158 156
163 164 164 164 162 160 158 157
160 161 162 162 162 161 159 158
158 159 161 161 162 161 159 158


Original: bpp = 8.00, mse = 0.00          JPEG: bpp = 1.00, mse = 17.26

JPEG: bpp = 0.50, mse = 33.08             JPEG: bpp = 0.25, mse = 79.11

Figure 4.25: Test image Lena compressed by JPEG. Mse refers to mean square error.


Original: bpp = 8.00, mse = 0.00          JPEG: bpp = 1.00, mse = 17.26

JPEG: bpp = 0.50, mse = 33.08             JPEG: bpp = 0.25, mse = 79.11

Figure 4.26: Magnifications of Lena compressed by JPEG.


4.5 JPEG2000

--- to be written later ---

4.5.1 Wavelet transform

The basic idea of the (discrete) wavelet transform is to decompose the image into smooth and detail components. The decomposition is performed separately in the horizontal and vertical directions. The smooth component represents the average color information and the detail component the differences between neighboring pixels. The smooth component is obtained using a low-pass filter and the detail component using a high-pass filter.

· Within each family of wavelets (such as the Daubechies family) there are wavelet subclasses distinguished by the number of coefficients and by the level of iteration. Wavelets are most often classified within a family by the number of vanishing moments. This is an extra set of mathematical relationships that the coefficients must satisfy, and it is directly related to the number of coefficients. For example, within the Coiflet wavelet family there are Coiflets with two vanishing moments and Coiflets with three vanishing moments. Several different wavelet families are illustrated in Figure 4.?? below.

· The filter matrix is applied in a hierarchical algorithm, sometimes called a pyramidal algorithm. The wavelet coefficients are arranged so that the odd rows contain an ordering of wavelet coefficients that acts as the smoothing filter, and the even rows contain an ordering of wavelet coefficients with different signs that acts to bring out the data's detail. The matrix is first applied to the original, full-length vector. The vector is then smoothed and decimated by half and the matrix is applied again; the smoothed, halved vector is smoothed and halved once more, and the matrix applied yet again. This process continues until only a trivial number of "smooth-smooth-smooth..." data remain. That is, each matrix application brings out a higher resolution of the detail while at the same time smoothing the remaining data. The output of the DWT consists of the remaining smooth components and all of the accumulated detail components (a small sketch is given below).
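
As a concrete illustration, a sketch of one decomposition level and of the pyramidal recursion using the Haar filter pair, the simplest possible choice; the averaging/differencing normalization is only one convention, and practical coders use longer filters:

import numpy as np

def haar_2d_level(block):
    # One level of 2-D Haar decomposition: filter rows first, then columns.
    # Returns the smooth band LL and the detail bands LH, HL, HH,
    # each at half resolution in both directions (assumes even dimensions).
    b = np.asarray(block, dtype=float)
    lo = (b[:, 0::2] + b[:, 1::2]) / 2.0      # horizontal low-pass + decimation
    hi = (b[:, 0::2] - b[:, 1::2]) / 2.0      # horizontal high-pass + decimation
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0    # vertical low-pass of the smooth half
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_pyramid(image, levels):
    # Pyramidal algorithm: keep the detail bands, recurse on the smooth band.
    bands = []
    smooth = np.asarray(image, dtype=float)
    for _ in range(levels):
        smooth, lh, hl, hh = haar_2d_level(smooth)
        bands.append((lh, hl, hh))
    return smooth, bands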


Figure 4.??: Different families of wavelet functions.


Figure 4.??: Example of vertical and horizontal sub band decomposition.

Figure 4.??: Illustration of the first and second iterations of wavelet decomposition.

4.5.2 Wavelet-based compression

· Filtering
· Quantizer
· Entropy coding
· Arithmetic coding
· Bit allocation


4.6 Fractal coding

Fractals can be considered as a set of mathematical equations (or rules) that generate fractal images: images that have similar structures repeating themselves inside the image. The image is the inference of the rules and has no fixed resolution like raster images. The idea of fractal compression is to find a set of rules that represents the image to be compressed, see Figure 4.25. The decompression is the inference of the rules. In practice, fractal compression tries to decompose the image into smaller regions which are described as linear combinations of other parts of the image. These linear equations are the set of rules.

The algorithm presented here is the Weighted Finite Automata (WFA) method proposed by Karel Culik II and Jarkko Kari [1993]. It is not the only existing fractal compression method, but it has been shown to work well in practice rather than being only a theoretical model. In WFA the rules are represented by a finite automaton consisting of states (Qi) and transitions from one state to another (fj). Each state represents an image. The transitions leaving a state define how its image is constructed on the basis of the other images (states). The aim of WFA is to find an automaton A that represents the original image as well as possible.

Figure 4.25: Fractal compression.

The algorithm is based on a quadtree decomposition of the image, so the states of the automaton are square blocks. The subblocks (quadrants) of the quadtree are addressed as shown in Figure 4.26. In WFA, each block (state) is described by the content of its four subblocks. This means that the complete image is also one state of the automaton. Let us next consider an example of an automaton (Figure 4.27) and the image it creates (Figure 4.28). Here we adopt a color space where 0 represents white (void) and 1 represents black (element); decimal values between 0 and 1 are different shades of gray.

The labels of the transitions indicate which subquadrant the transition is applied to. For example, the transition from Q0 to Q1 is used for quadrant 3 (the top rightmost quadrant) with the weight ½, and for quadrants 1 (top leftmost) and 2 (bottom rightmost) with the weight ¼. Denote the expression of quadrant d in Qi by fi(d). Thus, the expression of quadrants 1 and 2 in Q0 is given by ½×Q0 + ¼×Q1. Note that the definition is recursive: these quadrants of Q0 contain the same image as the one described by the state itself, but at half of its size, plus one fourth of the image defined by the state Q1.

A 2^k×2^k resolution representation of the image in Q0 is constructed by assigning to each pixel the value f0(s), where s is the k-length string giving the pixel's address at the kth level of the quadtree. For example, the pixel values at the addresses 00, 03, 30, and 33 are given below:

f0(00) = ½ · f0(0) = ½ · (½ · f0()) = ½ · ½ · ½ = 1/8  (4.14)

f0(03) = ½ · f0(3) = ½ · (½ · f0() + ½ · f1()) = 1/8 + 1/4 = 3/8  (4.15)

f0(30) = ½ · f0(0) + ½ · f1(0) = ½ · (½ · f0()) + ½ · (1 · f1()) = 1/8 + 1/2 = 5/8  (4.16)

f0(33) = ½ · f0(3) + ½ · f1(3) = ½ · (½ · f0() + ½ · f1()) + ½ · (1 · f1()) = 1/8 + 1/4 + 1/2 = 7/8  (4.17)

Note that fi() with the empty string evaluates to the final weight of the state, which is ½ in Q0 and 1 in Q1. The image at three different resolutions is shown in Figure 4.28.
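
The recursion can be evaluated mechanically. A small sketch, with the transition weights of the two-state automaton read off from Eqs. (4.14)-(4.17):

from fractions import Fraction as F

# Transition weight matrices W[a][i][j] of the two-state WFA of Figure 4.27,
# one matrix per quadrant label a = 0, 1, 2, 3; final weights f0() = 1/2, f1() = 1.
W = {
    0: [[F(1, 2), F(0)], [F(0), F(1)]],
    1: [[F(1, 2), F(1, 4)], [F(0), F(1)]],
    2: [[F(1, 2), F(1, 4)], [F(0), F(1)]],
    3: [[F(1, 2), F(1, 2)], [F(0), F(1)]],
}
FINAL = [F(1, 2), F(1)]

def value(state, address):
    # Evaluate f_state(address) by the recursive definition of the WFA.
    if not address:
        return FINAL[state]
    a = int(address[0])
    return sum(W[a][state][j] * value(j, address[1:]) for j in range(2))

# Reproduces Eqs. (4.14)-(4.17): 1/8, 3/8, 5/8, 7/8
print([value(0, s) for s in ("00", "03", "30", "33")])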

Figure 4.26: (a) The principle of addressing the quadrants; (b) Example of addressing at the resolution of 4×4; (c) The subsquare specified by the string 320.

Figure 4.27: A diagram for a WFA defining the linear grayness function of Figure 4.28.


Figure 4.28: Image generated by the automaton in Figure 4.27 at different resolutions.

The second example is the automaton given in Figure 4.29, where the states are illustrated by the images they define. The state Q1 is expressed in the following way: quadrant 0 is the same as Q0, whereas quadrant 3 is empty, so no transition with the label 3 exists. Quadrants 1 and 2, on the other hand, are recursively described by the same state Q1. The state Q2 is expressed as follows: quadrant 0 is the same as Q1, quadrant 3 is empty, and quadrants 1 and 2 are again recursively described by the same state Q2. Apart from the different color (shade of gray), the left part of the diagram (states Q3, Q4, Q5) is the same as the right part (states Q0, Q1, Q2). For example, Q4 has the shape of Q1 but the color of Q3. The state Q5 is described so that quadrant 0 is Q4, quadrant 1 is Q2, and quadrant 2 is the same as the state itself.

Figure 4.29: A WFA generating the diminishing triangles.


WFA algorithm:

The aim of the WFA algorithm is to find an automaton that describes the original image as accurately as possible while being as small as possible. The distortion is measured by the TSE (total square error). The size of the automaton can be approximated by the number of states and transitions. The optimization criterion to be minimized is thus:

d(f, fA) + G × size(A)  (4.18)

The parameter G defines how much emphasis is put on the distortion versus the bit rate, and it is left as an adjustable parameter for the user. The higher G is, the smaller the bit rate that will be achieved, at the cost of image quality, and vice versa. Typically G has values from 0.003 to 0.2.

The WFA-algorithm compresses the blocks of the quadtree in two different ways:

· by a linear combination of the functions of existing states
· by adding a new state to the automaton and recursively compressing its four subquadrants

Whichever alternative yields the better result in minimizing (4.18) is chosen. A small set of states (the initial basis) is predefined. The functions in the basis need not even be defined by a WFA, and the choice of the functions can, of course, depend on the type of images one wants to compress. The initial basis in [Culik & Kari, 1993] resembles the codebook in vector quantization, which can be viewed as a very restricted version of the WFA fractal compression algorithm.

The algorithm starts at the top level of the quadtree, which is the complete image, and compresses it by the WFA algorithm, see Figure 4.30. The linear combination for a given block is chosen by the following greedy heuristic: the subquadrant k of a block i is matched against each existing state in the automaton. The best match j is chosen, and a transition from i to j is created with the label k. The match is made between normalized blocks, so that their size and average value are scaled to be equal; the weight of the transition is the relative difference between the mean values. The process is then repeated for the residual, and is carried on until the reduction in the square error between the original block and the block described by the linear combination is small enough. It is a trade-off between the increase in the bit rate and the decrease in the distortion.


Figure 4.30: Two ways of describing the image: (a) by the linear combination of existing states; (b) by creating a new state which is recursively processed.

WFA algorithm for wavelet transformed image:

A modification of the algorithm that yields good results is to combine the WFA with a wavelet transformation [DeVore et al. 1992]. Instead of applying the algorithm directly to the original image, one first makes a wavelet transformation on the image and writes the wavelet coefficients in the Mallat form. Because the wavelet transform has not been considered earlier, the details of this modification are omitted here.

Compressing the automata:

The final bitstream of the compressed automata consists of three parts:

· Quadtree structure of the image decomposition
· Transitions of the automaton
· Weights of the transitions

A bit in the quadtree structure indicates whether a certain block is described as a linear combination of the other blocks (a set of transitions) or by a new state in the automaton. Thus the states of the automaton are implicitly included in the quadtree. The initial states need not be stored.

The transitions are stored in an n×n matrix, where each non-zero cell M(i, j) = wij indicates that there is a transition from Qi to Qj with the weight wij. If there is no transition between i and j, wij is set to zero. The label (0, 1, 2 or 3) is not stored; instead there are four different matrices, one for each subquadrant label. The matrices Mk(i, j) (k=0,1,2,3) are then represented as binary matrices Bk(i, j) so that Bk(i, j)=1 if and only if wij ≠ 0; otherwise Bk(i, j)=0. As a consequence, only the non-zero weights need to be stored.

The binary matrices are very sparse because each state has only a few transitions, so they can be efficiently coded by run-length coding. Some states are used in linear combinations more frequently than others, so arithmetic coding using the column j as the context was considered in [Kari & Fränti, 1994]. The non-zero weights are then quantized and a variable-length coding (similar to the FELICS coding) is applied.


The results of WFA outperform those of JPEG, especially at very low bit rates. Figures 4.31 and 4.32 illustrate the test image Lena compressed by WFA at the bit rates 0.30, 0.20 and 0.10 bpp. The automaton (at 0.20 bpp) consists of 477 states and 4843 transitions; the quadtree requires 1088 bits, the bit matrices 25850 bits, and the weights 25072 bits.

Original: bpp = 8.00, mse = 0.00          WFA: bpp = 0.30, mse = 49.32

WFA: bpp = 0.20, mse = 70.96              WFA: bpp = 0.10, mse = 130.03

Figure 4.31: Test image Lena compressed by WFA.


Original: bpp = 8.00, mse = 0.00          WFA: bpp = 0.30, mse = 49.32

WFA: bpp = 0.20, mse = 70.96              WFA: bpp = 0.10, mse = 130.03

Figure 4.32: Magnifications of Lena compressed by WFA.


5 Video images

Video images can be regarded as a three-dimensional generalization of still images, where the third dimension is time. Each frame of a video sequence can be compressed by any still image compression algorithm; a method where the frames are separately coded by JPEG is sometimes referred to as Motion JPEG (M-JPEG). A more sophisticated approach is to take advantage of the temporal correlation, i.e. the fact that subsequent frames resemble each other very closely. This is the case in the MPEG (Moving Pictures Expert Group) video compression standard.

MPEG:

The MPEG standard covers both video and audio compression. It also includes many technical specifications such as image resolution, video and audio synchronization, multiplexing of the data packets, network protocols, and so on. Here we consider only the video compression at the algorithmic level. The MPEG algorithm relies on two basic techniques:

· Block based motion compensation
· DCT based compression

MPEG itself does not specify the encoder at all, but only the structure of the decoder and the kind of bit stream the encoder should produce. Temporal prediction with motion compensation is used to exploit the strong temporal correlation of video signals: the current frame is predicted on the basis of a certain previous and/or following frame. The information sent to the decoder consists of the compressed DCT coefficients of the residual block together with the motion vector. There are three types of pictures in MPEG:

· Intra-pictures (I)
· Predicted pictures (P)
· Bidirectionally predicted pictures (B)

Figure 5.1 demonstrates the position of the different types of pictures. Every Nth frame in the video sequence is an I-picture, and every Mth frame a P-picture. Here N=12 and M=4. The rest of the frames are B-pictures.

Compression of the picture types:

Intra pictures are coded as still images by the DCT algorithm, similarly to JPEG. They provide access points for random access, but only moderate compression. Predicted pictures are coded with reference to a past picture: the current frame is predicted on the basis of the previous I- or P-picture, and the residual (the difference between the prediction and the original picture) is then compressed by DCT. Bidirectional pictures are coded like the P-pictures, but the prediction can be made from both a past and a future frame, which can be I- or P-pictures. Bidirectional pictures are never used as a reference.


The pictures are divided into 16×16 macroblocks, each consisting of four 8×8 elementary blocks. The B-pictures are not always coded by bidirectional prediction; instead, four different prediction techniques can be used:

· Bidirectional prediction
· Forward prediction
· Backward prediction
· Intra coding

The prediction method is chosen for each macroblock separately. Bidirectional prediction is used whenever possible. However, in the case of sudden camera movements, or a breaking point in the video sequence, the best predictor can sometimes be the forward predictor (if the current frame is before the breaking point) or the backward predictor (if the current frame is after the breaking point); the one that gives the best match is chosen. If none of the predictors is good enough, the macroblock is coded by intra-coding. Thus, B-pictures can contain macroblocks coded like those of I- and P-pictures.

The intra-coded blocks are quantized differently from the predicted blocks. This is because intra-coded blocks contain information at all frequencies and are very likely to produce a 'blocking effect' if quantized too coarsely. The predicted (residual) blocks, on the other hand, contain mostly high frequencies and can be quantized with coarser quantization tables.

Figure 5.1: Interframe coding in MPEG.

Motion estimation:

The prediction block in the reference frame is not necessarily at the same coordinates as the block in the current frame. Because of motion in the image sequence, the most suitable predictor for the current block may be anywhere in the reference frame. Motion estimation specifies where the best prediction (best match) is found, whereas motion compensation merely consists of calculating the difference between the reference block and the current block.

The motion information consists of one vector for forward-predicted and backward-predicted macroblocks, and of two vectors for bidirectionally predicted macroblocks. The MPEG standard does not specify how the motion vectors are to be computed; however, block matching techniques are widely used. The idea is to find in the reference frame a macroblock similar to the macroblock in the current frame (within a predefined search range). The candidate blocks in the reference frame are compared to the current one, and the one minimizing a cost function measuring the mismatch between the blocks is chosen as the reference block.
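
A sketch of exhaustive block matching with the sum of absolute differences (SAD) as the cost function; the standard does not mandate any particular cost function, and the block and search-range sizes below are only illustrative:

import numpy as np

def full_search(current, reference, bx, by, block=16, search=8):
    # (bx, by) is the top-left corner of the macroblock in the current frame;
    # every displacement within +/- search pixels is tried in the reference frame.
    cur = np.asarray(current, dtype=float)
    ref = np.asarray(reference, dtype=float)
    target = cur[by:by + block, bx:bx + block]
    best = (None, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue                              # candidate falls outside the frame
            candidate = ref[y:y + block, x:x + block]
            sad = np.abs(target - candidate).sum()    # sum of absolute differences
            if sad < best[1]:
                best = ((dx, dy), sad)
    return best  # ((dx, dy), cost); the residual block is then DCT coded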

Exhaustive search, where all possible motion vectors are considered, is known to give good results. Because a full search with a large search range has a high computational cost, alternatives such as telescopic and hierarchical searches have been investigated. In the former, the result of motion estimation at a previous time is used as a starting point for refinement at the current time, thus allowing relatively narrow searches even for large motion vectors. In hierarchical searches, a lower-resolution representation of the image sequence is formed by filtering and subsampling; at the reduced resolution the computational complexity is greatly reduced, and the result of the lower-resolution search is used as a starting point for a reduced-range search at full resolution.


Literature

All M. Rabbani, P.W. Jones, Digital Image Compression Techniques. Bellingham, USA, SPIE Optical Engineering Press, 1991.

All B. Furht (editor), Handbook of Multimedia Computing. CRC Press, Boca Raton, 1999.

All C.W. Brown and B.J. Shepherd, Graphics File Formats: Reference and Guide. Manning Publications, Greenwich, 1995.

All M. Nelson, Data Compression: The Complete Reference (2nd edition). Springer-Verlag, New York, 2000.

All J.A. Storer, M. Cohn (editors), IEEE Proc. of Data Compression Conference, Snowbird, Utah, 2002.

1-3 I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York, 1994.

1 R.C. Gonzalez, R.E. Woods, Digital Image Processing. Addison-Wesley, 1992.

1 A. Low, Introductory Computer Vision and Image Processing. McGraw-Hill, 1991.

1 P. Fränti, Digital Image Processing, University of Joensuu, Dept. of Computer Science. Lecture notes, 1998.

1.3 M. Miyahara, K. Kotani and V.R. Algazi, Objective Picture Quality Scale (PQS) for Image Coding. IEEE Transactions on Communications, Vol. 46 (9), 1215-1226, September 1998.

1.3 P. Fränti, "Blockwise distortion measure for statistical and structural errors in digital images", Signal Processing: Image Communication, 13 (2), 89-98, August 1998.

2 J. Teuhola, Source Encoding and Compression, Lecture Notes, University of Turku, 1998.

2.1.3 T. Bell, J. Cleary, I. Witten, Text Compression. Prentice-Hall, Englewood Cliffs, New Jersey, 1990.

2.2 P.G. Howard, The Design and Analysis of Efficient Lossless Data Compression Systems. Brown University, Ph.D. Thesis (CS-93-28), June 1993.

2.2.1 D. Huffman, A Method for the Construction of Minimum-Redundancy Codes. Proc. of the IRE, Vol. 40, 1098-1101, 1952.

2.2.2 J. Rissanen, G.G. Langdon, Arithmetic Coding. IBM Journal of Research and Development, Vol. 23 (2), 149-162, March 1979.

2.2.2 W.B. Pennebaker, J.L. Mitchell, G.G. Langdon, R.B. Arps, An Overview of the Basic Principles of the Q-coder Adaptive Binary Arithmetic Coder. IBM Journal of Research and Development, Vol. 32 (6), 717-726, November 1988.

3 R. Hunter, A.H. Robinson, International digital facsimile coding standards. Proceedings of the IEEE, Vol. 68 (7), 854-867, July 1980.

3 Y. Yasuda, Overview of Digital Facsimile Coding Techniques in Japan. Proceedings of the IEEE, Vol. 68 (7), 830-845, July 1980.

3 & 4.1 R.B. Arps and T.K. Truong , “Comparison of international standards for lossless still image compression”. Proc. of the IEEE, 82, 889-899, June 1994.

3 & 4 P. Fränti, Block Coding in Image Compression, Ph.D. Thesis, University of Turku, Dept. of Computer Science, 1994. (Research report R-94-12)

3.1 A.N. Netravali, F.W. Mounts, Ordering Techniques for Facsimile Coding: A Review. Proceedings of the IEEE, Vol. 68 (7), 796-807, July 1980.

3.1 Y. Wao, J.-M. Wu, Vector Run-Length Coding of Bi-level Images. Proceedings Data Compression Conference, Snowbird, Utah, 289-298, 1992.

3.3 CCITT, Standardization of Group 3 Facsimile Apparatus for Document Transmission, ITU Recommendation T.4, 1980.


3.3 CCITT, Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus, ITU Recommendation T.6, 1984.

3.3 Y. Yasuda, Y. Yamazaki, T. Kamae, B. Kobayashi, Advances in FAX. Proceedings of the IEEE, Vol. 73 (4), 706-730, April 1985.

3.4 M. Kunt, O. Johnsen, Block Coding: A Tutorial Review. Proceedings of the IEEE, Vol. 68 (7), 770-786, July 1980.

3.4 P. Fränti and O. Nevalainen, "Compression of binary images by composite methods based on the block coding", Journal of Visual Communication and Image Representation, 6 (4), 366-377, December 1995.

3.5 ISO/IEC Committee Draft 11544, Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression, April 1992.

3.5 G.G. Langdon, J. Rissanen, Compression of Black-White Images with Arithmetic Coding. IEEE Transactions on Communications, Vol. 29 (6), 358-367, June 1981.

3.5 B. Martins, and S. Forchhammer, Bi-level image compression with tree coding. IEEE Transactions on Image Processing, April 1998, 7 (4), 517-528.

3.5 E.I. Ageenko and P. Fränti, “Enhanced JBIG-based compression for satisfying objectives of engineering document management system”, Optical Engineering, 37 (5), 1530-1538, May 1998.

3.5 E.I. Ageenko and P. Fränti, "Forward-adaptive method for compressing large binary images", Software Practice & Experience, 29 (11), 1999.

3.6 P.G. Howard, “Text image compression using soft pattern matching”, The Computer Journal, 40 (2/3), 146-156, 1997.

3.6 Howard, P. G., Kossentini, F., Martins, B., Forchammer, S, and Rucklidge, W. J., The emerging JBIG2 standard. IEEE Trans. Circuits and Systems for Video Technology, November 1998, 8 (7), 838-848.

4.1 Arps R.B., Truong T.K., Comparison of International Standards for Lossless Still Image Compression. Proceedings of the IEEE, Vol. 82 (6), 889-899, June 1994.

4.1.1 M. Rabbani and P.W. Melnychuck, Conditioning Context for the Arithmetic Coding of Bit Planes. IEEE Transactions on Signal Processing, Vol. 40 (1), 232-236, January 1992.

4.1.2 P.E. Tischer, R.T. Worley, A.J. Maeder and M. Goodwin, Context-based Lossless Image Compression. The Computer Journal, Vol. 36 (1), 68-77, January 1993.

4.1.2 N. Memon and X. Wu, Recent developments in context-based predictive techniques for lossless image compression. The Computer Journal, Vol. 40 (2/3), 127-136, 1997.

4.1.3 P.G. Howard, J.S. Vitter, Fast and Efficient Lossless Image Compression. Proceedings Data Compression Conference, Snowbird, Utah, 351-360, 1993.

4.1.4 M. Weinberger, G. Seroussi and G. Sapiro, “The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS”, Research report HPL-98-193, Hewlett Packard Laboratories. (submitted to IEEE Transactions on Image Processing)

4.1.4 X. Wu and N.D. Memon, “Context-based, adaptive, lossless image coding”, IEEE Transactions on Communications, Vol. 45 (4), 437-444, April 1997.

4.1.5 S. Takamura, M. Takagi, Lossless Image Compression with Lossy Image Using Adaptive Prediction and Arithmetic Coding. Proceedings Data Compression Conference, Snowbird, Utah, 166-174, 1994.

4.1.6 J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, Vol. 23 (3), 337-343, May 1977.

4.1.6 J. Ziv, A. Lempel, Compression of Individual Sequences Via Variable-Rate Coding. IEEE Transactions on Information Theory, Vol. 24 (5), 530-536, September 1978.

4.2 E.J. Delp, O.R. Mitchell, Image Coding Using Block Truncation Coding. IEEE Transactions on Communications, Vol. 27 (9), 1335-1342, September 1979.


4.2 P. Fränti, O. Nevalainen and T. Kaukoranta, "Compression of Digital Images by Block Truncation Coding: A Survey", The Computer Journal, vol. 37 (4), 308-332, 1994.

4.3 A. Gersho, R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, Dordrecht, 1992.

4.3 N.M. Nasrabadi, R.A. King, Image Coding Using Vector quantization: A Review. IEEE Transactions on Communications, Vol. 36 (8), 957-971, August 1988.

4.3 Y. Linde, A. Buzo, R.M. Gray, An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications, Vol.28 (1), 84-95, January 1980.

4.4 W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1993.

4.5 M. Vetterli and C. Herley, Wavelets and Filter Banks: Theory and Design, IEEE Transactions on Signal Processing, Vol. 40, 2207-2232, 1992.

4.5 R.A. DeVore, B. Jawerth, B.J. Lucier, Image Compression Through Wavelet Transform Coding. IEEE Transactions on Information Theory, Vol. 38 (2), 719-746, March 1992.

4.5 T. Ebrahimi, C. Christopoulos and D.T. Lee: Special Issue on JPEG-2000, Signal Processing: Image Communication, Vol. 17 (1), Pages 1-144, January 2002.

4.5 D.S. Taubman, M.W. Marcellin, JPEG-2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Dordrecht, 2002.

4.6 Y. Fisher, Fractal Image Compression: Theory and Application. Springer-Verlag, New York, 1995.

4.6 K. Culik, J. Kari, Image Compression Using Weighted Finite Automata. Computers & Graphics, Vol. 17 (3), 305-313, 1993.

4.6 J. Kari, P. Fränti, Arithmetic Coding of Weighted Finite Automata. Theoretical Informatics and Applications, Vol. 28 (3-4), 343-360, 1994.

6 D.J. LeGall, The MPEG Video Compression Algorithm. Signal Processing: Image Communication, Vol. 4 (2), 129-139, April 1992.


Appendix A: CCITT test images

IMAGE 1 IMAGE 2 IMAGE 3 IMAGE 4

IMAGE 5 IMAGE 6 IMAGE 7 IMAGE 8


Appendix B: Gray-scale test images

BRIDGE (256×256) CAMERA (256×256)

BABOON (512×512) LENA (512×512)
