

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 38, NO. 12, DECEMBER 1990 2137

Visual Pattern Image Coding

Abstract-This paper presents a new framework for digital image compression called visual pattern image coding, or VPIC. In VPIC, a set of visual patterns is defined independent of the images to be coded. Each visual pattern is a subimage of limited spatial support that is visually meaningful to a normal human observer. The patterns are used as a basis for efficient image representation; since it is assumed that the images to be coded are natural optical images to be viewed by human observers, visual pattern design is developed using relevant psychophysical and physiological data.

Although VPIC bears certain resemblances to block truncation (BTC) and vector quantization (VQ) image coding, there are important differences. First, there is no training phase required: the visual patterns derive from models of perceptual mechanisms; second, the assignment of patterns to image regions is not based on a standard (norm) error criterion; expensive search operations are eliminated. VPIC offers: i) excellent image quality agreeing with perception; ii) compression ratios comparable to or exceeding those offered by VQ; and iii) vastly reduced coding complexity. In the simplest (static, nonhierarchical) implementation, only 2.25 additions, 0.53 comparisons, and 0.1875 multiplications are required to code each image pixel; the coding complexity is linear with respect to the image size, and is nearly two orders of magnitude faster than standard VQ schemes with similar decoding complexity. Thus, VPIC is feasible for real-time applications without requiring special-purpose hardware.

I. INTRODUCTION

THE generic goal of digital image coding is to remove as much redundant and/or insignificant information from images as possible (compression), while maintaining the observed integrity of the images obtained following transmission and decoding. The goal of compression can easily be stated in engineering terms: minimize the number of bits per pixel (BPP) needed to represent the transmitted images [1]-[3]. However, the goal of preserving image integrity is not easily stated in quantitative terms since, ultimately, the only meaningful test of an image's fidelity is how it "looks"-presumably to a human observer. The concept of the visual appeal of an image has not yet been adequately translated into mathematical terms. Previous approaches have utilized error norms such as the mean-squared error (MSE) that do not correlate well with perceptual notions of visual fidelity: standard measures of signal "information" do not coincide with visually significant image attributes. Thus, measurement of the adequacy of coded images from a visual perspective has remained difficult, particularly at high compression ratios where much information is discarded.

This limitation of standard "information-theoretic" measures of image fidelity has resulted in a coding saturation level: the current high compression ratios on the order of 10:1 within which good image quality is retained are only exceeded at a high computational cost. This suggests that the development of image coding techniques that achieve significantly higher compression ratios must better take into account the properties of the intended receiver. Simply, image coding must become a discipline not only of exploiting redundancy and mathematical information measures, but of visual science: understanding exactly what image information is made use of in perception.

Paper approved by the Editor for Image Communications Systems of the IEEE Communications Society. Manuscript received March 29, 1989; revised September 1, 1989.

The authors are with the Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712-1084.

IEEE Log Number 9039091.

Image coding methods that attempt to capitalize on recent computational models of visual information processing, rather than using classical information-theoretic approaches, have been termed second generation [3]-[5]. Second-generation image coding algorithms have included hierarchical algorithms mimicking the multiple channel characteristic of visual receptive fields [6]; methods utilizing presegmentation of, or contour detection in, the images to be coded [7]; combined multiple channel/edge coding algorithms [3], [8]; and algorithms that utilize other relevant data such as the intensity and contrast sensitivity to light of the eye [4], [5]. Second-generation image coding techniques have begun to approach or exceed the compression performance of the widely used vector quantization (VQ) [1], [2], [9], [10] and block truncation coding (BTC) [11]-[13] techniques; however, high compression is often obtained at increased computational expense. The shortcomings of second-generation image coding algorithms derive both from the immaturity of the field and from a shortage of accurate models of biological visual processing. These shortcomings may be ameliorated both by better models of early visual information processing [14]-[17], and by making better use of existing models, which is a main point of the current paper. The development of measurements quantifying notions of visual fidelity previously thought to be purely subjective would allow visually unimportant information to be discarded in a predictable manner, as well as information that is redundant in the information-theoretic sense.

In this paper, a new framework for image coding, termed visual pattern image coding (VPIC), is introduced. VPIC operates by coding the images to be transmitted using a small set of visual patterns, which are localized subimages containing visually important information. While the patterns used may vary with the application, the ones defined here correlate well with psychophysically derived visual response characteristics, and are image independent aside from the assumption that the images to be coded are ordinary optical images. The visual patterns used produce excellent image quality in accordance with perception, high image compression ratios comparable to or exceeding those offered by, e.g., VQ and BTC coding, and vastly reduced coding/decoding complexity: the simplest implementation requires only an average of 2.25 additions, 0.53 comparisons, and 0.1875 multiplications to code each image pixel; decoding is even simpler. The coding complexity is completely linear with respect to the image size (nearly two orders of magnitude faster than standard VQ schemes for 512 × 512 images), implying that VPIC can be easily adapted for real-time applications.

The remainder of the paper is organized as follows. Section II develops the VPIC image coding technique by first making some observations about the relatively successful VQ and BTC image coding strategies. The properties and assumptions underlying these techniques are reexamined in light of various psychophysical and physiological models of biological image sensing and processing. These observations lead to the design of the visual patterns used in the current implementations of VPIC, in Section III. Section III also clarifies the overall coding/decoding processes in VPIC and provides a complexity analysis. The complexity of VPIC is found to be comparable to such simple techniques as DPCM, despite the fact that VPIC performs at or above VQ in terms of compression ratio and image quality. Section IV illustrates the efficacy of VPIC using

0090-6778/90/1100-2137$01.00 © 1990 IEEE



a variety of natural images. The paper concludes in Section V with a discussion of further avenues of research offered by VPIC.

II. MOTIVATING VPIC

Block truncation coding (BTC) and vector quantization (VQ) are relatively recent techniques for image coding that have aroused considerable interest in applications where high compression is a requirement [11]-[13]. We begin by considering certain aspects of these techniques that we believe have contributed significantly to their success. By examining these properties in light of various aspects of biological perception, the framework of VPIC is developed. The VPIC coding technique represents much more than a refinement of BTC or VQ, however; rather, an entire conceptual framework is developed that directly addresses the drawbacks of these techniques while simultaneously exploiting their power.

A. Concepts from BTC and VQ Image Coding

BTC and VQ are similar in that the image to be coded is first partitioned into a large number of smaller subimages or blocks. Each block is then coded either by extracting certain intrinsically high-information quantities which are transmitted in reduced form (as in BTC), or by making a comparison between the block and a set of preordained vectors or “typical” subimages, the index of the “closest” vector being transmitted instead of the original block (as in VQ).

The type of information extracted from the block for transmission in BTC greatly affects the performance of the technique, not only in terms of compression, but in the coding complexity and in the quality of the transmitted and recovered images. In BTC, if μ_{i,j} and σ_{i,j} are the sample mean and standard deviation of a block centered near image coordinates (i, j), then the quantities

a_{i,j} = μ_{i,j} − σ_{i,j} √(q_{i,j}/p_{i,j}),   b_{i,j} = μ_{i,j} + σ_{i,j} √(p_{i,j}/q_{i,j})

preserve the first two sample moments, where p_{i,j} and q_{i,j} are the number of pixels in the block falling below and above μ_{i,j}, respectively. By transmitting μ_{i,j}, σ_{i,j}, and a bit map containing the distributions of p_{i,j} and q_{i,j}, images can be coded with a high visual quality and relatively efficient compression rates in the range 5:1 to 6:1 or better [11]-[13]. The basic idea underlying BTC image coding is very similar to older established techniques for signal coding such as the "synthetic high" technique [3], [18] and various transform coding techniques [19]: the human eye assigns different emphases to the low- and high-frequency components of images. In transform coding techniques, coding is typically accomplished by variably weighting low- and high-frequency transform coefficients. In BTC, the low- and high-frequency information is essentially transmitted as block means and block variances. The good results attained by BTC suggest that these quantities contain information that is to some degree visually significant. However, higher compression rates are achieved only at considerable increase in complexity; the technique is effective, for example, only if very small image blocks are coded (typically 4 × 4 blocks are required). The view expressed here is that: while the division of the image blocks into low- and high-frequency components is fundamental, the features representing these components should be carefully configured to agree with perception. Low-frequency information is coded by VPIC in essentially the same manner as in BTC: as average block intensities. However, the transmission of information describing image detail is significantly different.
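The moment-preserving property of these BTC quantities is easy to verify numerically. The following is a minimal sketch (function names are my own, not the paper's) of BTC encoding and decoding for a single block: the two reconstruction levels a and b are chosen so that the decoded block exactly preserves the sample mean and (population) standard deviation of the original.

```python
import numpy as np

def btc_encode(block):
    """Encode one image block: transmit its sample mean, sample standard
    deviation, and a bit map marking which pixels lie above the mean."""
    mu = block.mean()
    sigma = block.std()          # population std (ddof=0), as BTC assumes
    bitmap = block > mu          # q pixels above the mean, p = n - q below
    return mu, sigma, bitmap

def btc_decode(mu, sigma, bitmap):
    """Reconstruct the block at the two moment-preserving levels."""
    q = int(bitmap.sum())
    p = bitmap.size - q
    if q == 0 or p == 0:         # flat block: nothing above/below the mean
        return np.full(bitmap.shape, mu)
    a = mu - sigma * np.sqrt(q / p)   # level for pixels below the mean
    b = mu + sigma * np.sqrt(p / q)   # level for pixels above the mean
    return np.where(bitmap, b, a)
```

Since (p·a + q·b)/n = μ and (p·(a−μ)² + q·(b−μ)²)/n = σ², the reconstruction matches the first two moments exactly, whatever the block contents.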

Vector quantization coding schemes [9], [10], [20], [21] require the construction of a codebook containing a set of image blocks (vectors) deemed in some manner to be "typical" of the images occurring in the coding application. The codebook is computed by finding a specified number of vectors that best represent the blocks contained in a typical image or set of images. In the simplest scheme, each block in the image is coded by transmitting the index of the closest (squared error) vector to it in the codebook. Thus, the

transmitted vector replaces the original block in the decoded image. The performance of VQ is quite good in applications requiring high image compression. Ratios as high as 10:1 to 15:1 are often obtained while still retaining good image fidelity. Also, decoding is very efficient since it only involves a simple lookup in the codebook (assumed available at the receiver). As with BTC, the success of VQ image coding also clarifies an important concept: the human eye can reconstruct a large number of images from only a small number of selected image blocks. Clearly, the vectors in the codebook greatly affect the visual quality of the coded images; moreover, the number and method of generation of the codebook vectors determines the time involved in constructing the codebook, the actual coding complexity, and perhaps most significantly, the generality of the overall coder in terms of its effectiveness across a wide range of images. Currently, these issues define the drawbacks of VQ: codebook construction is very time consuming, despite the development of faster algorithms [22], [23], and multivalued vector comparisons are always costly. The largest drawback, which also affects the computation, is that the codebook is either image dependent or at least domain dependent (computed from a typical set of images). This has been demonstrated by the appearance of algorithms that, e.g., "classify" the quantization process by identifying smooth and rough (high detail) image regions [24]. The view expressed here is that: while the representation of an image by a small number of high information vectors is effective, the vectors representing the images to be coded must be perceptually important, rather than dependent on any given set of images.
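The "simplest scheme" of full-search VQ described above can be sketched as follows. This is a toy illustration, not the paper's code: the codebook is assumed precomputed (e.g., by a training algorithm such as LBG), and each m × m block is matched by exhaustive squared-error search, which is exactly the per-block cost that VPIC later avoids.

```python
import numpy as np

def vq_encode(image, codebook, m=4):
    """Code each m x m block by the index of its nearest (squared error)
    codevector; the codebook holds flattened m*m blocks, one per row."""
    h, w = image.shape
    indices = []
    for i in range(0, h, m):
        for j in range(0, w, m):
            block = image[i:i + m, j:j + m].ravel()
            errs = ((codebook - block) ** 2).sum(axis=1)  # full search
            indices.append(int(errs.argmin()))
    return indices

def vq_decode(indices, codebook, shape, m=4):
    """Decoding is only a table lookup: each index is replaced by its
    codevector, which stands in for the original block."""
    image = np.empty(shape)
    it = iter(indices)
    for i in range(0, shape[0], m):
        for j in range(0, shape[1], m):
            image[i:i + m, j:j + m] = codebook[next(it)].reshape(m, m)
    return image
```

Note the asymmetry the text points out: encoding costs a multivalued comparison against every codevector per block, while decoding is a single lookup.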

It should also be mentioned that the vectors used should be visually sufficient in that any visually meaningful image can be so represented. In view of the discussion of BTC coding, the vectors should either separately represent high- and low-frequency (rough and smooth) blocks, or should separately code these aspects of blocks. In the following, we outline the underlying framework of VPIC, which in a sense extends the concepts of both BTC and VQ. However, much higher compression than BTC is obtained by better representing image detail in accordance with perception. The compression obtained is comparable to or exceeds that of VQ, while the coding complexity is vastly reduced.¹ While there is a codebook, it is defined universally in the sense that it develops from measured properties of biological vision, and does not vary over images that are to be viewed by normal human observers. Thus, there is no need for a codebook construction or training phase. Comparison of image blocks with the codebook does not involve the computation of the usual mathematical norms such as the squared error, which have a poor correlation with visual fidelity. Instead, a simple measure of orientation is made involving only a simple counting process. The complexity of the VPIC algorithm described here is faster than VQ by an amount approaching two orders of magnitude (even setting codebook construction aside).

B. Properties of Biological Vision

We maintain that there are certain image attributes that appear to be visually important: localized image vectors or patterns, and their low- and high-frequency components. The types of patterns to use, and how to effect an adequate decomposition into "smooth" and "rough," is a focus of this paper. The criteria used in the design of VPIC develop not from the MSE or any other typical norm, but from measured properties of biological vision. Although our understanding of the functional architecture of biological visual processing is still uncertain, there are certain "low-level" aspects of vision that can be regarded as accurately modeled. Thus, we next consider certain known psychophysical and physiological properties of the human visual system that prove useful in the design of a coding algorithm. In particular, the concepts of image smoothness and detail are recast as localized visual continuity and visual discontinuity constraints, which are developed from properties of the receptive field profiles of cortical neurons in higher mammalian visual systems. We also use the known spatial frequency tuning curves of the receptive fields to develop the spatial frequency constraint, which is integrated with a model of a typical image viewing geometry to enable the design of a set of meaningful visual patterns that constitute the basis of VPIC. Specifically, we are interested in finding a set of image blocks that adequately represent visually meaningful images projected from real-world surfaces.

¹Only the simplest possible version of VPIC is described here; no modifications directed toward increasing the compression ratio are made. For example, variable block sizes [25] are a natural option under investigation.

CHEN AND BOVIK: VISUAL PATTERN IMAGE CODING

1) Receptive Fields: The photoreceptive cells in the retina (rods and cones) transduce light into the electrical firing of neurons. Of central interest is the manner in which the distribution of luminances sensed by the photoreceptors is processed on a localized basis by subsequent neural elements. Although in most cases only rough models exist for this processing, in the design of VPIC we will only require gross fundamental descriptions of neuronal functioning rather than detailed quantitative models. The information from the retinal photoreceptors is relayed to the ganglion cells, which transform the visual data and in turn transmit it (eventually) to certain cells in the visual cortex. Unlike the photoreceptive cells, which only effect pointwise transformations of the data, the postretinal ganglion cells and the cortical cells have receptive fields characterized by their responses (firing rates) to the spatial structure of the visual stimuli. Within a reasonable range of luminances, the receptive fields can be accurately modeled as spatially linear, with characteristic tuning curves (spatial frequency responses) and associated spatial domain responses. In this work, we do not require a precise model of the "shape" of the receptive fields; instead, we make use only of the type of spatial stimuli that the cells are highly sensitive to, and the range of frequencies (bandwidths) to which the cells respond. While the exact shape of the receptive field profiles may eventually prove useful for refining the coding approach described here, precise quantitative models of the cortical (simple) cell receptive fields are amply available elsewhere [26], e.g., the Gabor receptive field model for the cortical cells [14]-[16], and the difference-of-Gaussian model of the ganglion cells [27]. However, there is still no definite agreement on any particular model [28]; moreover, the specific purposes of the receptive fields have not been definitely established.

It is suspected that the postretinal ganglion cells are primarily involved in the separation of low-frequency components from the stimuli on a spatially localized basis (obviously, both components are subsequently transmitted), while the cortical cells are involved with the coding, processing, and representation, at varying degrees of abstraction, of certain simple visual patterns such as edges. The cortical cells are classified as simple cells, complex cells, and hypercomplex cells. The cortical simple cells have been found to be particularly sensitive to localized edge- and bar-like structures in the spatial distribution of luminances and, especially, to the orientations and positions of these features with respect to the visual field. The complex cells are also sensitive to edge- and bar-like structures, but are less sensitive to their positions; thus, they may play a role in establishing a viewer-independent representation of shape, which represents a slightly higher level of abstraction. The hypercomplex cells appear to be responsive to more complex structures and are probably involved in yet higher level shape representation processes.

2) Visual Continuity and Discontinuity Constraints: The decomposition of images into low-frequency blocks and blocks containing visual patterns (edges and bars) suggests the visual continuity and visual discontinuity constraints, which state that an image can be adequately represented by localized (block) descriptions of smooth (continuous) regions and of image discontinuities. This is consistent with well-established properties of the retinal photoreceptors, the responses (firing rates) of which effect a logarithmic compression of the dynamic range of luminances over a fairly wide luminance range. Weber's law then states that, over a visual field of average luminance L, the just-noticeable luminance change ΔL is proportional to L:

ΔL = cL

where the constant c lies in the range 0.01-0.1. In practical terms, if a localized image region is visually continuous, then it must project from a surface region of smooth geometric and illumination characteristics. If an image block is contained within a visually continuous image region, then it should be coded as a separate type of block, much as the ganglion cells transmit low-frequency information separately. If a local image block is visually discontinuous, i.e., if there is sufficient variation in the block, and if there is a strong orientation associated with it, then it should be coded by associating it with one of a set of oriented visual patterns. The patterns used should correspond with patterns that are visually important, i.e., that stimulate the edge- or bar-type simple cells. It would be possible to attempt to optimize the set of patterns used, e.g., by "matching" them to more detailed models of the receptive fields, e.g., Gabor-shaped patterns [14]-[16]. However, for simplicity, we choose instead to model the visually obvious image features that stimulate the receptive fields: simple oriented patterns modeled as edges or pairs of edges.
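One way to turn Weber's law into a concrete continuity test is to call a block visually continuous when no pixel deviates from the block's mean luminance by more than the just-noticeable change cL. This is only an illustrative sketch of the constraint as stated here, not the paper's exact classification rule; the constant c = 0.05 is an assumed value from the quoted 0.01-0.1 range.

```python
import numpy as np

WEBER_C = 0.05   # assumed; the text gives only the range 0.01-0.1

def is_visually_continuous(block, c=WEBER_C):
    """Treat a block as visually continuous when every pixel lies within
    the just-noticeable luminance change c * L of the block's mean
    luminance L (one possible reading of the continuity constraint)."""
    L = block.mean()
    return bool(np.abs(block - L).max() <= c * L)
```

Under this reading, a uniform patch is continuous, while a block straddling a strong edge fails the test and would instead be matched against the oriented visual patterns.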

3) Spatial Frequency Constraint: Visual images are strongly characterized by representations consisting of separate descriptions of visually continuous and visually discontinuous regions [29], which can be loosely described as "smooth" image patches, edges, and possibly pairs of edges (bars). The question arises, then, as to why the various receptive fields are not "matched" to edges of various orientations, i.e., have receptive fields in the shape of edges. The answer lies in the need for efficient processing that can be interpreted as a need for localization. The receptive field profiles must be spatially localized in order that coding can occur on a localized basis. Perhaps more significantly, the spatial frequency responses (tuning curves) of the receptive fields must also be localized, or bandpass; this allows for the division of tasks over a range of scales, and serves to stabilize certain visual computations. The need for simultaneous spatio-spectral localization in the cortical receptive field responses can be formulated as an uncertainty principle [14]-[16] constraining the degree of simultaneous localization that can be obtained and which determines the optimally localized solution. Discussions of the uncertainty framework in early visual processing are available in [14]-[16]. Similar arguments could also be applied in the design of the coding primitives or patterns; however, code vectors of considerable spatial extent (as required by spectral localization) entail vastly increased computation. Thus, we only consider spatially finite patterns of set size.

Visual information is processed over specific frequency ranges, normally measured in cycles per degree of viewing angle (c/deg). The overall passband of the receptive fields of the normal human eye is in the range 1-10 c/deg, with maximum response in the range 3-6 c/deg; outside that range, the response falls off quickly [26], [30]. Spatial frequency is also related to visual response time, or latency; higher spatial frequencies require a longer response time. Thus, image frequencies exceeding 10 c/deg contribute very little to perception, due to the lower frequency response and greatly increased latency. Within the global passband, the receptive field profiles are divided into distinct channels with bandwidths on the order of one octave, with approximately one octave separating the peak responses. Since we do not (yet) consider the extension of VPIC to multiscale or variable-block size coding, the design of dynamic visual patterns reflecting the various distinct visual passbands is not treated. Instead, we develop a constraint on the pattern size and on the number of allowable patterns by using the overall passband characteristic of the receptive fields in conjunction with a reasonable model of a viewing geometry.

4) Viewing Geometry: A key element in the design of an appropriate set of visual patterns is the integration of relevant measurable attributes of visual perception. Thus far, we have argued that image coding may proceed via the separate coding of small image blocks that are separately processed as either low frequency (visually continuous) or as visually discontinuous. Visually continuous blocks will be represented as uniform regions, whereas visually discontinuous blocks are coded as localized patterns interpreted as image edges. The coding of visually discontinuous blocks should




Fig. 1. Viewing geometry used in the design of visual patterns.

also be consistent with the spatial frequency constraints discussed above. For the purpose of coding images to be viewed by a human observer, these constraints combined with a simple viewing geometry can be used to constrain the number and sizes of the visual patterns required.

Since each visual pattern represents an image block, the number of possible patterns increases exponentially with the pattern size(s). Therefore, it is easier to confine the selection by using a relatively small pattern size. Unfortunately, the use of smaller patterns generally reduces the attainable compression ratio. The strategy adopted in the current implementation of VPIC is that the pattern size is chosen such that an image block of that size can contain at most one visible edge. This constraint can be supported as follows. Assume that the images to be coded are N × N (pixels) and that the block or pattern dimensions are M × M (pixels). If the images are displayed on an L × L (cm²) monitor and if D is the viewing distance in cm (Fig. 1), then if D ≫ L, the viewing angle α

subtending an image block is approximately

α = 180 · tan⁻¹(LM / ND)

in degrees. For a 12-in monitor, L = 21.5 cm. Thus, for typical values N = 512, M = 4, and D = 200 cm, the viewing angle is α = 0.15°. Thus, the assumed visual passband 1-10 c/deg corresponds to a band of 0.15-1.5 cycles within a 0.15° viewing angle; the maximizing range between 3-6 c/deg approximately corresponds to a single cycle within the 0.15° view angle. One cycle implies that, at most, a single sustained intensity change (light to dark or vice versa) can occur along any direction within the image block. Similarly, it can be shown that occurrences of two or more edges correspond to a localized spatial frequency of 13.3 c/deg, which is not perceivable within the viewing geometry used. Thus, if 4 × 4 image blocks are coded as visual patterns containing single edges, then the block contains the maximum perceivable frequency content that can be coded as edges. Although this example assumes 4 × 4 blocks, other block sizes are feasible provided that they satisfy the necessary conditions imposed by the viewing geometry. Presently, 4 × 4 blocks are used, which serves to demonstrate the power of the coding algorithm, and which also simplifies the pattern design. It should be observed, however, that the range of frequencies processed biologically is divided into several different channels; this suggests that multiscale modifications of VPIC can be easily designed that define a set of spatial frequency constraints derived from the various passbands, and thus operate over a hierarchy of resolutions. At each scale of processing, the size of the image blocks or visual patterns used can be varied (in effect) without changing the number of patterns, by varying the resolution of the coded (sub)image. Since the visual channels are separated by octave

increments, it is easy to envision a hierarchical (pyramid) VPIC coding algorithm that efficiently utilizes 4 × 4 blocks and patterns, 8 × 8 blocks and patterns, etc.
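The geometry argument above is easy to check numerically. The following sketch (the function and variable names are ours, not the paper's) evaluates the viewing-angle formula and the resulting cycles-per-block range for the example values quoted in the text:

```python
import math

def block_viewing_angle(L_cm, D_cm, N, M):
    """Angle (degrees) subtended by an M x M block of an N x N image shown
    on an L x L cm monitor viewed from D cm, using the paper's formula
    alpha = 180 * arctan(LM / ND)."""
    return 180.0 * math.atan((L_cm * M) / (N * D_cm))

alpha = block_viewing_angle(21.5, 200.0, 512, 4)   # about 0.15 degrees
# With the 1-10 c/deg visual passband, the cycles falling inside one block:
cycles_lo, cycles_hi = 1.0 * alpha, 10.0 * alpha   # about 0.15 to 1.5 cycles
```

A cycle count of at most ~1.5 within the block is what licenses the "at most one visible edge per 4 × 4 block" design rule.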

III. DESIGN OF VISUAL PATTERNS

The goal in designing a VPIC image coding algorithm is to find a

set of meaningful visual patterns satisfying the visual continuity and discontinuity requirements and the spatial frequency constraint described in the previous section. Since the visual patterns are used to represent natural images as perceived by a normal human observer, the selected visual patterns must also be visually significant and give a quality representation of the original image. We have previously argued that this can be accomplished using small patterns that code the image edges to which the cortical neurons are sensitive. Of course, the patterns should not be repetitive; each pattern should code a distinct visual attribute. This helps to limit the number of patterns used and increases the computability of the coding process. Moreover, the visual patterns should also be easily detectable; while complex visual (edge) patterns can be defined, they may prove unnecessarily difficult to detect using a computer program. Finally, an efficient mapping scheme must be provided that maps image blocks into the selected visual patterns. In other words, coding should be a simple process.

A. Pattern Design

We denote the image to be coded as the array I = {I_{n,m}} that is the union of disjoint 4 × 4 image blocks of the form

b_{i,j} = {I_{n,m} : 4i ≤ n ≤ 4i + 3, 4j ≤ m ≤ 4j + 3}.

Prior to assigning patterns to image blocks, the mean intensities {μ_{i,j}} of the image blocks are coded separately; thus, the visual patterns have zero mean intensities. This allows for the design of visual patterns independent of the average intensity differences that occur between image blocks, hence reducing the number of required patterns. Based on the principles of visual continuity and visual discontinuity, two classes of visual patterns are used: uniform patterns and edge patterns. Uniform patterns are defined simply as constant valued, i.e., any variation within the block is presumed to fall below the Weber threshold. Blocks having sufficient intensity variation are coded as edge patterns, i.e., resolvable image details corresponding to localized intensity changes. This division is equivalent to identifying image blocks as either low frequency or as containing a pattern whose variation is visually significant.

1) Edge Pattern Design: There are an infinite number of possible physical situations which can give rise to an image block that is visually discontinuous. Although the image is quantized, the pattern size is finite, and the frequency content of the (edge) patterns is constrained to allow for occurrences of single edges only, there are many possible edge patterns that can be defined encompassing the range of orientations, contrast levels, etc. Thus, additional assumptions must be made in the design of the edge patterns.

A natural measure of the variation, and specifically the edge content in the image I, is the discrete gradient ∇I = (Δ_x I, Δ_y I), where the directional derivative approximations Δ_x I = {Δ_x b_{i,j}} and Δ_y I = {Δ_y b_{i,j}} are computed as oriented differences between weighted averages of contiguous image neighborhoods lying within each image block [31]. Reasonable and computationally convenient definitions of the directional variations within each block are

Δ_x b_{i,j} = AVE{I_{n,m} : 4i + 2 ≤ n ≤ 4i + 3, 4j ≤ m ≤ 4j + 3}
            − AVE{I_{n,m} : 4i ≤ n ≤ 4i + 1, 4j ≤ m ≤ 4j + 3}

Δ_y b_{i,j} = AVE{I_{n,m} : 4i ≤ n ≤ 4i + 3, 4j + 2 ≤ m ≤ 4j + 3}
            − AVE{I_{n,m} : 4i ≤ n ≤ 4i + 3, 4j ≤ m ≤ 4j + 1}.
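These half-block averages are straightforward to compute. A minimal sketch (we assume n indexes rows and m indexes columns, which this excerpt does not state explicitly):

```python
import numpy as np

def block_gradient(block):
    """Return (dx, dy) for one 4x4 block: each component is the difference
    of the mean intensities of two contiguous 2x4 (or 4x2) half-blocks,
    per the AVE-difference definitions above."""
    b = np.asarray(block, dtype=float)
    dx = b[2:4, :].mean() - b[0:2, :].mean()   # Delta_x b_ij (along n)
    dy = b[:, 2:4].mean() - b[:, 0:2].mean()   # Delta_y b_ij (along m)
    return dx, dy

# Example: a vertical ramp with rows 0, 10, 20, 30 has dx = 20, dy = 0.
dx, dy = block_gradient([[r * 10] * 4 for r in range(4)])
```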

The gradient magnitude and gradient orientation within each image


CHEN AND BOVIK: VISUAL PATTERN IMAGE CODING 2141


Fig. 2. Fourteen visual patterns p_1-p_14. (a) 90° patterns; (b) 45° patterns; (c) 0° patterns; (d) −45° patterns.

block are given, respectively, by

|∇b|_{i,j} = [(Δ_x b_{i,j})² + (Δ_y b_{i,j})²]^{1/2}

and

∠∇b_{i,j} = tan⁻¹(Δ_y b_{i,j} / Δ_x b_{i,j}).

Within each block, the computed block gradient values |∇b|_{i,j} and ∠∇b_{i,j} correspond to the contrast and orientation of the intensity change occurring within the image block b_{i,j}.² These quantities have continuous ranges that must be quantized. Since small edge blocks are assumed, ∠∇b_{i,j} is quantized in 45° increments, yielding four basic edge patterns with respective orientations 90°, 45°, 0°, and −45°. The remaining orientations 135°, 180°, 225°, and 270° of opposite contrast are represented by separately transmitting the polarity of the edge. Fig. 2 shows one possible set of 14 basic edge patterns p_j defined over the above orientations within 4 × 4 image blocks; the set of patterns shown allows for the image block to be centered at a variety of distances from the edge at any orientation. In the figure, the positive (+) elements of each pattern take identical values, as do the negative (−) elements. These values are uniquely chosen such that each pattern has zero mean and unit gradient magnitude |∇p_j| = 1. The relative distribution of (+) and (−) elements in a pattern is called the polarity distribution of the pattern.
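The zero-mean, unit-gradient values of a pattern can be recovered from its polarity distribution alone. The construction below is our own sketch (the paper states only the two constraints, not a procedure): start from a zero-mean assignment of (+) and (−) values, then rescale by the pattern's own gradient magnitude, which preserves the zero mean.

```python
import math
import numpy as np

def make_pattern(polarity):
    """Turn a 4x4 +/-1 polarity mask into a pattern with zero mean and
    unit gradient magnitude (one construction; not from the paper)."""
    s = np.asarray(polarity, dtype=float)
    # zero-mean start: +1/n_plus on (+) cells, -1/n_minus on (-) cells
    p = np.where(s > 0, 1.0 / (s > 0).sum(), -1.0 / (s < 0).sum())
    dx = p[2:4, :].mean() - p[0:2, :].mean()
    dy = p[:, 2:4].mean() - p[:, 0:2].mean()
    return p / math.hypot(dx, dy)   # rescaling keeps the zero mean

# Example: an axis-aligned pattern whose edge splits the block in half.
p = make_pattern([[-1] * 4] * 2 + [[+1] * 4] * 2)
```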

It should be observed that the basic visual patterns shown in Fig. 2 are not necessarily all used to code images; usually, some subset of them will suffice. It is also possible that different patterns with slightly different orientations could be defined. However, a surprisingly small number of the patterns in Fig. 2 can code images with remarkable quality, as demonstrated later. Each pattern p_j defines a subspace W(p_j) representing occurrences of edges of the same orientation and polarity distribution (generated by the pattern p_j as discussed below); to eliminate redundancy, the subspaces generated by the edge patterns are disjoint:

W(p_j) ∩ W(p_k) = ∅ whenever j ≠ k

since they either have different orientations or different polarity distributions. Each subspace has a coordinate representing a visual

²There is no need to compute the square root or tan⁻¹, since only an index representing these values is transmitted.


Fig. 3. Distribution of gradient magnitude of image "Lena" [Fig. 4(a)].

pattern used in the VPIC coding scheme. For instance, |∇b|_{i,j} = 30 in the 45° edge subspace represents a visual edge pattern characterizing local occurrences of edges oriented close to 45° and with a gradient magnitude of 30. It should be observed that we make no claim that any image can be mathematically (exactly) reconstructed from any subset of the edge patterns in Fig. 2, i.e., that the patterns form a basis spanning the image space as do the edge detection templates used in [32]. This does not matter, however; the objective is to define a pattern set that allows a perceptual reconstruction of the image.

2) Information in the Edge Gradients: The use of the patterns in Fig. 2 (or some subset of them, or some other similar set of patterns) allows for the effective coding of edge locations and orientations, as detailed in Section III-B. It remains for us to consider the amplitudes of the edges, i.e., coding of the computed gradient magnitudes. Fig. 3 depicts the histogram of the gradient magnitude of the image "Lena" [Fig. 4(a)]. Histograms having this shape (approximately a Rayleigh distribution) are typical of natural images: image blocks having a zero or very low gradient magnitude are unusual, and the frequency of gradient magnitudes decreases quickly beyond some high value [33]. In VPIC, two thresholds are set that constrain the range of image gradients to be coded. The low threshold |∇I|_min represents the minimum perceivable gradient magnitude that can occur in an image, which must exist in accordance with Weber's law. Thus, |∇I|_min effectively determines the number of edge blocks present in the image; the remaining blocks are coded as uniform patterns. There is considerable flexibility in the choice of |∇I|_min; although the value it takes may vary depending on the number of image gradient measurements transmitted (if any) as part of the coding process, it does not vary with the images coded. Moreover, the speed of the algorithm is independent of |∇I|_min.

The second threshold, |∇I|_max, denotes the maximum gradient magnitude that is coded distinctly; all gradient values above this are truncated to |∇I|_max. Selection of a reasonably high value of |∇I|_max is easily supported. First of all, large gradient values occur infrequently; second, the eye is not very sensitive to the magnitude of edge gradients; for example, the Mach band illusion and the simultaneous contrast effect provide evidence of this [26]. Of much greater visual significance is the simple presence of significant edge gradients, and of the average intensities near them. Thus, in the event that the gradient of an image block exceeds |∇I|_max, the visual distortion incurred by coding the maximum gradient value |∇I|_max instead will not be significant. In the experiments in Section IV, an upper threshold |∇I|_max = 90 is used, which is very conservative.

Thus, the range of the gradient magnitudes that are actually coded is R = |∇I|_max − |∇I|_min. If R is quantized to M levels, then M possible gradient magnitudes |∇I|_i, i = 0, ..., M − 1, can be coded. The index i of the edge gradient is transmitted separately after it has been determined by quantizing the squared gradient magnitude. Since only an index is transmitted, the possible edge



gradients are agreed upon at receiver and transmitter, i.e., the codebook contains indexes describing both the edge patterns and edge gradient magnitudes. The gradient values |∇I|_i and the edge patterns p_j allow the representation of local edge blocks that are reconstructed in the form |∇I|_i · p_j, once the indexes i and j have been received. If P patterns are used (either a subset of those in Fig. 2 or some other set), then M · P different 4 × 4 edge block representations are possible (mean block intensity aside).
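Reconstruction from the received indexes is then just a table lookup followed by one multiply and one add per block. A hedged sketch (the function name and the pattern values used in the example are illustrative, not the paper's):

```python
import numpy as np

def decode_edge_block(mean, grad, pattern):
    """Reconstruct a 4x4 edge block as mean + grad * pattern, where
    `grad` and `pattern` are looked up from the agreed codebook."""
    return mean + grad * np.asarray(pattern, dtype=float)

# Illustrative zero-mean pattern with values -0.25 / +0.25:
block = decode_edge_block(100.0, 30.0, [[-0.25] * 4] * 2 + [[0.25] * 4] * 2)
```

Since the pattern has zero mean, the reconstructed block retains the transmitted mean intensity exactly.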

The number of gradient levels M, and the actual gradient values used, are far less critical than the number of edge orientations used. In fact, very reasonable results are obtained without transmitting any gradient magnitude information at all! Quantization of the edge orientations (into 45° increments in Fig. 2) is a more serious problem; if larger blocks are used, more orientations become desirable. For 4 × 4 image blocks, the use of only four orientations produces very reasonable results.

B. Image Coding

The criteria used in the design of the visual (edge) patterns

immediately suggest a very simple coding process. Obviously, the first step consists of determining into which subspace an image block b_{i,j} should be mapped. Once the mean block intensity μ_{i,j} is computed, the local gradient measurements (|∇b|_{i,j})² and tan(∠∇b_{i,j}) are calculated. If (|∇b|_{i,j})² ≤ (|∇I|_min)², then the block is determined to be a uniform block; otherwise, it is an edge block. In either case, the mean intensity μ_{i,j} is quantized and transmitted through the communication channel using n_μ + 1 bits, where the additional (flag) bit informs the decoder whether the pattern is uniform or an edge block. The value of n_μ varies depending on whether the block is a uniform block (n_μ = n_μu) or an edge block (n_μ = n_μe); uniform blocks generally require a greater precision (n_μu > n_μe).

If b_{i,j} is an edge block, i.e., if (|∇b|_{i,j})² > (|∇I|_min)², then the measured orientation ∠∇b_{i,j} maps the image block to some number of edge subspaces (less than four, if a subset of the patterns shown in Fig. 2 is used). A unique mapping among those at a given orientation is achieved by selecting the pattern whose polarity distribution is closest to that of the image block. This is easily accomplished by simply counting the number of (+) and (−) gradient values contained in the image block, and choosing the pattern having the best match of (+) and (−) elements. Since the (+) and (−) elements can be represented in binary form, pattern selection is equivalent to binary correlation, and hence is optimal. The index j of the selected pattern p_j is transmitted as n_p + 1 bits, where the additional bit indicates the edge polarity. If the block was previously determined to be a uniform block, these n_p + 1 bits are used to augment coding of the mean block intensity. If the block is an edge block, then n_μ is reduced. Finally, the (squared) gradient magnitude (|∇b|_{i,j})² is quantized into one of M levels with index i and transmitted as n_g bits. Clearly, the greatest efficiency is attained if n_g is a power of two (or zero!).
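The classification and pattern-selection steps above can be sketched as follows. The data layout (`patterns` as a dict from a 45° orientation bucket to `(index, polarity_mask)` pairs) and the bucket arithmetic are our assumptions; the squared-threshold test, 45° quantization, polarity bit, and binary-correlation match follow the text.

```python
import math
import numpy as np

def code_block(block, patterns, grad_min=30.0):
    """Classify one 4x4 block as uniform or edge, and for an edge block
    pick the best-matching pattern by polarity correlation (a sketch)."""
    b = np.asarray(block, dtype=float)
    mean = b.mean()
    dx = b[2:4, :].mean() - b[0:2, :].mean()
    dy = b[:, 2:4].mean() - b[:, 0:2].mean()
    if dx * dx + dy * dy <= grad_min ** 2:      # squared-threshold test
        return ("uniform", mean)
    ang = math.degrees(math.atan2(dy, dx)) % 360.0
    bucket = int(round(ang / 45.0)) % 8
    polarity = bucket >= 4       # opposite-contrast orientations: flip bit
    bucket %= 4
    signs = np.sign(b - mean)
    # binary correlation: the pattern matching the most (+)/(-) cells wins
    j, _ = max(patterns[bucket],
               key=lambda ip: (np.sign(ip[1]) == signs).sum())
    return ("edge", mean, bucket, j, polarity, math.hypot(dx, dy))

# Usage: a step block (left half 10, right half 60) with one candidate pattern.
pats = {2: [(9, np.array([[-1, -1, 1, 1]] * 4))]}
result = code_block([[10, 10, 60, 60]] * 4, pats)
```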

Thus, n_u = n_μu + 1 bits are required to code a 4 × 4 uniform block, while n_e = n_μe + n_p + n_g + 2 bits are required to code a 4 × 4 edge block, where n_e ≥ n_u. If uniform and edge blocks are coded with different bit lengths, then the overall range of compression ratios C_VPIC obtainable³ is

(128 / n_e) : 1 ≤ C_VPIC ≤ (128 / n_u) : 1.

In an image coded with N_u uniform and N_e edge blocks, the exact compression ratio obtained is 128(N_u + N_e)/(N_u n_u + N_e n_e). The most efficient current implementation yielding coded images of good quality uses the values n_u = 6 and n_e = 7, or a compression range 18.3:1 ≤ C_VPIC ≤ 21.3:1. The less stringent parameter values n_u = 7 and n_e = 12 yield compression ratios in the range 10.7:1 ≤ C_VPIC

³Assuming 8-bit (original) images coded as 4 × 4 blocks.

≤ 18.3:1. In both cases, the images obtained compare favorably to images coded using standard VQ algorithms having similar compression ratios. The computation required by VPIC, however, is very significantly less.
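As a sanity check, the exact-compression formula can be evaluated against the "Lena" block counts reported for Algorithm 1 in the Fig. 4 caption (N_u = 14890, N_e = 1494, n_u = 6, n_e = 7); the function name is ours.

```python
def vpic_compression(N_u, N_e, n_u, n_e):
    """128(N_u + N_e) / (N_u n_u + N_e n_e): each 8-bit 4x4 block
    costs 128 bits uncoded."""
    return 128.0 * (N_u + N_e) / (N_u * n_u + N_e * n_e)

ratio = vpic_compression(14890, 1494, 6, 7)   # roughly 21:1
```

The result lands inside the stated bounds 128/n_e ≤ C_VPIC ≤ 128/n_u, as it must.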

C. Coding Complexity Analysis

Perhaps the greatest advantage offered by VPIC is the efficiency of coding. Most existing algorithms suffer from high computational complexity, particularly those offering high compression ratios (> 10:1). Any real-time coding application requiring such high compression ratios typically requires expensive dedicated hardware. Thus, VPIC provides an important alternative. Table I summarizes the number of operations required to code a 4 × 4 image block; it should be noted that these average values do not vary with the content of the image block: the computation required is a constant. Since the number M of gradient levels allowed is usually quite small, computation of the gradient index (integer comparisons) is ignored in computing the average (per pixel) complexity. Most of the computations involve only integer arithmetic, with the exception of tan(∠∇b_{i,j}), which requires one floating-point multiplication (division), and the pattern index, requiring 3 floating-point comparisons. The overall average complexity is just 2.25 addition operations, 0.53 comparison operations, and 0.1875 multiplication operations per image pixel (the overhead of loop counters required to compute the pattern index is not included). In the implementation yielding the compression range 18.3:1-21.3:1 demonstrated in Section IV, a single gradient magnitude is used (none is transmitted), and only one pattern per orientation is used; thus, there is no need to compute the polarity distribution. The complexity of this algorithm is only 1.25 additions, 0.1875 comparisons, and 0.1875 multiplications per image pixel. Thus, VPIC can be nearly as efficient as simple DPCM coding algorithms requiring a single addition per image pixel but only delivering a 1.5:1 compression ratio. It is feasible to consider implementation of VPIC for real-time image coding using standard hardware.
For example, a Motorola 68020 microprocessor running at 20 MHz executes about 6 million addition/comparison instructions and 0.46 million integer multiplication instructions per second. Thus, the coding speed can approach 10 images/s for 512 × 512 monochrome images using the 18.3:1-21.3:1 scheme. Since image blocks can be coded independently, only a few microprocessors operating concurrently are required to achieve real-time coding. In all simulations run in comparison to VQ algorithms yielding similar compression ratios, the coding time of VPIC has been found to be on the order of two orders of magnitude faster than standard VQ.⁴

D. Decoding and Postprocessing

Image decoding is trivial using VPIC, since the coded image

blocks are represented either as uniform blocks or as edge blocks coded as a visual pattern index and a gradient index. Edge block decoding only requires table lookups, a single multiplication, and a single addition. Thus, image decoding is no more complex than in VQ. The low overhead incurred in decoding implies that a certain amount of postprocessing (or image correction) may be feasible. At very high compression ratios, the predominant distortions that begin to occur in VPIC are block effects arising from coarse quantization of the average intensities in uniform (nonedge) blocks. Block effects are manifested as visual false contours occurring between blocks of similar average gray level, and are also a problem in high compression VQ or BTC schemes. Fortunately, the natural division of the coded image into a mean intensity image and edge patterns in VPIC suggests that very simple block smoothing algorithms may be applied to the mean intensity subimage, without degrading the detail (edges) in the images. In Section IV it is found that a simple 3 × 1 moving average filter applied along each dimension of the mean intensity subimage (prior to adding to edge blocks) greatly decreases the visual distortion. The smoothing process does not degrade image

⁴Implemented using a fast K-dimensional search algorithm.



TABLE I
CODING COMPLEXITY FOR A 4 × 4 IMAGE BLOCK

  Operation                                  Cost
  μ_{i,j}, (|∇b|_{i,j})², tan(∠∇b_{i,j})     20 additions, 3 multiplications
  Pattern index                              3 comparisons
  Gradient index                             ≤ log₂ M comparisons
  Polarity distribution                      16 additions, 16 comparisons (worst case)

  Total                                      36 additions, 19 comparisons, 3 multiplications
  Operations/pixel                           2.25 additions, 0.53 comparisons, 0.19 multiplications

TABLE II
PARAMETERS USED IN VPIC SIMULATIONS

  Number of bits/block
  Block data                          Algorithm 1    Algorithm 2
  Mean intensity (uniform block)      n_μu = 5       6
  Mean intensity (edge block)         n_μe = 3       4
  Block type                          1              1
  Pattern index                       n_p = 2        3
  Gradient index                      n_g = 0        3
  Edge polarity                       1              1

detail, since the perceptually significant image detail is assumed encoded in the edge blocks.
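The separable 3 × 1 mean-subimage smoothing described above can be sketched as follows; the border rule (edge replication) is our assumption, since the text does not specify one.

```python
import numpy as np

def smooth_mean_subimage(mu):
    """Apply a 3x1 moving average along each dimension of the
    mean-intensity subimage {mu_ij}, prior to adding the edge blocks."""
    m = np.asarray(mu, dtype=float)
    p = np.pad(m, 1, mode="edge")
    m = (p[:-2, 1:-1] + p[1:-1, 1:-1] + p[2:, 1:-1]) / 3.0    # along rows
    p = np.pad(m, 1, mode="edge")
    return (p[1:-1, :-2] + p[1:-1, 1:-1] + p[1:-1, 2:]) / 3.0  # along columns
```

Because the filter touches only the block means, the edge-pattern detail added afterward is untouched, which is the point of smoothing this subimage rather than the decoded image.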

IV. SIMULATIONS

Simulation results are presented here. The experiments were

conducted on an IBM RT Model 6150 workstation using the C language. Each 512 × 512 image is divided into 4 × 4 blocks. In each case, the images were coded using two variations of VPIC. In the first variation (Algorithm 1), a compression ratio C₁ in the range 18.3:1 ≤ C₁ ≤ 21.3:1 is achieved for any image. In the second (Algorithm 2), a compression ratio C₂ in the range 10.7:1 ≤ C₂ ≤ 18.3:1 is attained. Table II gives the coding parameters used in each algorithm.

In Algorithm 1, the low gradient threshold is set at |∇I|_min = 30; this value is also the single edge gradient used in this algorithm when reconstructing edge blocks, i.e., they are all of the form 30 · p_j (hence, no upper threshold |∇I|_max is required in Algorithm 1). The value |∇I|_min = 30 was not optimized; indeed, the quality of the coded images has been observed to be quite robust with respect to this value. The patterns used in Algorithm 1 are p_1, p_5, p_9, and p_12 (from Fig. 2).

In Algorithm 2, the gradient magnitude thresholds are |∇I|_min = 10 and |∇I|_max = 90, both of which are conservative. Finally, the patterns used are p_1, p_3, p_5, p_6, p_8, p_10, p_12, and p_13.

Three simulations were performed for both Algorithms 1 and 2, using the standard images shown in Figs. 4(a)-6(a). The actual compressions obtained, which vary with the relative densities of edge and uniform blocks, are listed with the coded images in the figures. Figs. 4(b)-6(b) show the result of coding each image using Algorithm 1; the actual compression ratios fell in the narrow range 20.6:1-21.1:1. Each mean intensity subimage (composed of the mean block intensities {μ_{i,j}}) was postfiltered with a 3 × 1 average filter along each dimension (prior to adding the edge blocks), as discussed in Section III-D. The visual quality of the decoded images is remarkable, given the small number of pattern orientations allowed and the use of a single edge gradient magnitude. In particular, observe the retention of small details such as the feathers in Fig. 4(b) and the whiskers in Fig. 5(b), which are adequately represented by localized, oriented edge patterns. Within the degree of resolution implied by the viewing diagram in Fig. 1, the coded representations are excellent.

Finally, Figs. 4(c)-6(c) show the result of coding the images using Algorithm 2 (with postfiltering); the compression ratios in this

Fig. 4. Coding of 512 × 512 test image "Lena." (a) Original image; (b) VPIC coded image using Algorithm 1 (N_u = 14890, N_e = 1494; compression = 21.06:1); (c) VPIC coded image using Algorithm 2 (N_u = 12413, N_e = 3971; compression = 16.44:1).




Fig. 5. Coding of 512 × 512 test image "Mandrill." (a) Original image; (b) VPIC coded image using Algorithm 1 (N_u = 12496, N_e = 3888; compression = 20.61:1); (c) VPIC coded image using Algorithm 2 (N_u = 6931, N_e = 9453; compression = 13.89:1).

case lie in the range 13.8:1-16.8:1. Again, the amount of retained image detail is remarkable, given that details are represented only by simple, localized edge patterns of just a few orientations. Although some improvement is seen as compared to Algorithm 1, the differences are slight when the images are viewed from a short distance.

V. CONCLUDING REMARKS

This paper has presented the framework of a new image coding scheme called visual pattern image coding, or VPIC. VPIC is motivated by the need for high-compression image coding algorithms that can operate at high coding rates. Very efficient processing is obtained by selecting coding primitives that are consistent with the properties of biological vision.


Fig. 6. Coding of 512 × 512 test image "Peppers." (a) Original image; (b) VPIC coded image using Algorithm 1 (N_u = 15063, N_e = 1321; compression = 21.09:1); (c) VPIC coded image using Algorithm 2 (N_u = 13061, N_e = 3323; compression = 16.74:1).

Rather than further summarizing VPIC, we instead suggest some possible simple improvements in the approach, and further avenues of research suggested by VPIC.

First of all, it should be observed that the visual (edge) patterns used in the current VPIC implementation are by no means exclusive or unique. The idea of VPIC can be extended to incorporate other image patterns, e.g., edges of different orientations, curves, line patterns, etc. The only important constraints on the patterns used are that they are both readily detected and contextually important. For coding optical (visual) images, the patterns used should be consistent with the relevant psychological and physiological properties of biological vision. The use of "visual patterns" or image primitives that closely model the cortical receptive field profiles, e.g., the Gabor functions, is one such approach [14]-[17]; however,



since the Gabor elementary functions can have very large nonnegligible support (extent), computation of the subimages becomes expensive. Neural computing may provide an efficient method of computation in the future [16].

For simplicity, we have not discussed any of the many possible simple modifications (i.e., within the fixed-block-size, nondynamic framework) that could be made to improve the compression ratios obtained by the current implementation. For example, considerable gains could be obtained by coding the mean intensity subimages with a greater efficiency: a simple DPCM scheme could effect a significant improvement in the compression ratio. Similarly, since edge blocks can be expected to occur as connected contours with a high frequency, it is possible that contour coding techniques could be melded with VPIC as well. Such a hybrid scheme could effect a considerable increase in the obtainable compressions, perhaps exceeding 30:1, with little loss of information. Of course, the computational advantage offered by the simpler implementations of VPIC will be lost if care is not taken.

Yet another possibility is the use of variable block sizes, as in [25]. In the current fixed-block-size implementation, the maximum attainable compression ratio is fixed, as is the coding time. These are convenient properties when the communication channel is circuit switched. Recently, however, packet switching has become a popular alternative for providing a dynamic transmission bandwidth. Consequently, coding algorithms that use dynamic compression rates can be effectively implemented with a packet-switched network. Since natural images usually contain numerous large (> 4 × 4) regions of approximately uniform intensity, larger block sizes can be used. Of course, to code image blocks of variable sizes, a dynamic data structure must be provided that allows the concurrent coding of smaller edge blocks and larger uniform blocks without sacrificing image quality. One such framework that allows for the predictable assignment of block sizes is hierarchical processing, or a pyramid data structure. In fact, our current research focus is directed toward extending the concept of VPIC using a pyramid framework, i.e., the basic patterns (uniform and edge blocks) are defined over a range of scales or block sizes. The results of this work will be detailed in later papers.

REFERENCES

[1] A. K. Jain, "Image data compression: A review," Proc. IEEE, vol. 69, pp. 349-388, Mar. 1981.
[2] N. M. Nasrabadi and R. A. King, "Image coding using vector quantization: A review," IEEE Trans. Commun., vol. 36, pp. 957-971, Aug. 1988.
[3] M. Kunt, A. Ikonomopoulos, and M. Kocher, "Second-generation image-coding techniques," Proc. IEEE, vol. 73, pp. 549-574, Apr. 1985.
[4] S. A. Rajala, M. R. Civanlar, and W. M. Lee, "A second generation image coding technique using human visual system based segmentation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Dallas, TX, Apr. 6-9, 1987.
[5] R. A. Nobakht and S. A. Rajala, "An image coding technique using a human visual system model and image analysis criteria," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Dallas, TX, Apr. 6-9, 1987.
[6] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, pp. 532-540, Apr. 1983.
[7] M. Kocher and M. Kunt, "A contour-texture approach to picture coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Paris, France, May 3-5, 1982.
[8] A. Ikonomopoulos and M. Kunt, "Image coding based on a directional decomposition," in Proc. Int. Picture Coding Symp., Davis, CA, Mar. 28-30, 1983.
[9] R. M. Gray, "Vector quantization," IEEE ASSP Mag., vol. 1, pp. 4-29, Apr. 1984.
[10] A. Gersho, "On the structure of vector quantizers," IEEE Trans. Inform. Theory, vol. IT-28, pp. 157-165, Mar. 1982.
[11] E. J. Delp and O. R. Mitchell, "Image coding using block truncation coding," IEEE Trans. Commun., vol. COM-27, pp. 329-336, Sept. 1979.
[12] D. J. Healy and O. R. Mitchell, "Digital video bandwidth compression using block truncation coding," IEEE Trans. Commun., vol. COM-29, pp. 1809-1817, Dec. 1981.
[13] G. R. Arce and N. C. Gallagher, "BTC image coding using median filter roots," IEEE Trans. Commun., vol. COM-31, pp. 784-793, June 1983.
[14] J. G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," J. Opt. Soc. Amer. A, vol. 2, pp. 1160-1169, 1985.
[15] A. C. Bovik, M. Clark, and W. S. Geisler, "Multichannel texture analysis using localized spatial filters," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 55-73, Jan. 1990.
[16] J. G. Daugman, "Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1169-1179, July 1988.
[17] M. Porat and Y. Y. Zeevi, "The generalized Gabor scheme of image representation in biological and machine vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, pp. 452-468, Sept. 1988.
[18] W. F. Schreiber, "Picture coding," Proc. IEEE, vol. 55, pp. 320-330, Mar. 1967.
[19] W. K. Pratt, Digital Image Processing.
[20] A. Gersho and B. Ramamurthi, "Image coding using vector quantization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Paris, France, May 3-5, 1982.
[21] J. P. Marscq and C. Labit, "Vector quantization in transformed image coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Tokyo, Japan, Apr. 9-11, 1986.
[22] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.
[23] W. Equitz, "Fast algorithms for vector quantization picture coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Dallas, TX, Apr. 6-9, 1987.
[24] B. Ramamurthi and A. Gersho, "Classified vector quantization of images," IEEE Trans. Commun., vol. COM-34, pp. 1105-1115, Nov. 1986.
[25] D. J. Vaisey and A. Gersho, "Variable block-size image coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Dallas, TX, Apr. 6-9, 1987.
[26] M. D. Levine, Vision in Man and Machine. New York: McGraw-Hill, 1985.
[27] H. R. Wilson and J. R. Bergen, "A four mechanism model for threshold spatial vision," Vis. Res., vol. 19, pp. 19-32, 1979.
[28] D. G. Stork and H. R. Wilson, "Analysis of Gabor function descriptions of visual receptive fields," presented at the Assoc. Res. Vis. Ophthal. Annu. Spring Meet., Sarasota, FL, May 1988.
[29] D. Terzopoulos, "Regularization of inverse visual problems involving discontinuities," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 413-424, Apr. 1986.
[30] L.
Kaufman, Perception. New York: Oxford University Press, 1979. B. K. P. Horn, Robot Vision. Cambridge, MA: M.I.T. Press, 1986. W. Frei and C. C. Chen, “Fast boundary detection: a generalization and a new algorithm,” IEEE Trans. Comput., vol. C-26, pp. 988-998, Oct. 1977. N. H. Kim and A. C. Bovik, “A contour-based stereo matching algorithm using disparity continuity,” Pattern Recogn., vol. 21, pp.

New York: Wiley, 1978.

505-514, 1988.

Dapang Chen was born in Shanghai, People's Republic of China, in June 1958. He received the B.S. degree in computer science from the University of Science and Technology of China in 1982 and the M.S. degree in biomedical engineering from the University of Texas at Austin in 1985.

He is currently a part-time Design Engineer with National Instruments in Austin, TX. He is also working toward the Ph.D. degree in electrical engineering at the University of Texas at Austin. His primary research interests are image processing and computer vision.



Alan C. Bovik (S'80-M'84-SM'89) was born in Kirkwood, MO, on June 25, 1958. He received the B.S. degree in computer engineering in 1980, and the M.S. and Ph.D. degrees in electrical and computer engineering in 1982 and 1984, respectively, all from the University of Illinois, Urbana-Champaign.

He is currently Associate Professor in the Department of Electrical and Computer Engineering, the Department of Computer Sciences, and the Biomedical Engineering Program at the University of Texas at Austin, where he is the Director of the Laboratory for Vision Systems. His current research interests include computer vision, nonlinear statistical methods in digital signal and image processing, biomedical image processing, three-dimensional microscopy, and computational aspects of biological visual perception.

Dr. Bovik is the Stark Centennial Endowed Fellow in Engineering at the University of Texas and a Registered Professional Engineer in the State of Texas. He is an Honorable Mention winner of the international Pattern Recognition Society Award, and is Associate Editor of the international journal Pattern Recognition and Associate Editor of the IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. He was Local Arrangements Chairman for the IEEE Computer Society Workshop on the Interpretation of 3-D Scenes in Austin, TX, November 1989, a member of the Program Committee of the Tenth International Conference on Pattern Recognition in Atlantic City, NJ, June 1990, Program Co-chairman of the SPIE Conference on Image Processing, SPIE/SPSE Symposium on Electronic Imaging, Santa Clara, CA, February 1990, and Conference Chairman of the SPIE Conference on Biomedical Image Processing, Santa Clara, CA, February 1990.