1678 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 54, NO. 8, AUGUST 2007
An FPGA-Based Implementation of Multi-Alphabet Arithmetic Coding
Sudipta Mahapatra , Member, IEEE , and Kuldeep Singh
Abstract—A fully parallel implementation of the multi-alphabet arithmetic-coding algorithm, an integral part of many lossless data compression systems, had so far eluded the research community. Although schemes were in existence for performing the encoding operation in parallel, the data dependencies involved in the decoding phase prevented its parallel execution. This paper presents a scheme for the parallel-pipelined implementation of both the phases of the arithmetic-coding algorithm for multi-symbol alphabets in high-speed programmable hardware. The compression performance of the proposed scheme has been evaluated and compared with an existing sequential implementation in terms of average compression ratio as well as the estimated execution time for the Canterbury Corpus test set of files. The proposed scheme facilitates hardware realization of both coder and decoder modules by reducing the storage capacity necessary for maintaining the modeling information. The design has been synthesized for Xilinx field-programmable gate arrays and the synthesis results obtained are encouraging, paving the way for further research in this direction.
Index Terms—Arithmetic coding, decoding, higher order con-text models, lossless data compression, parallel architectures.
I. INTRODUCTION
THE FIELD OF data compression is gaining importance day by day due to the fast development of data-intensive applications that place a heavy demand on information storage, and due to the high data rate requirements of bandwidth-hungry transmission systems such as multimedia systems. Of the two categories of data compression, i.e., lossy and lossless, lossy compression techniques achieve a high degree of compression, albeit at the expense of slight inaccuracy in the decompressed data. In lossless compression, the original data is reproduced exactly on decompressing the compressed bit stream. It is applied in various fields, including the storage and retrieval of database records, medical images, test data sets, and code compression, where any loss of information in the compression-decompression cycle is unacceptable. Such applications have generated renewed interest in the design and development of lossless data compression techniques, especially those suitable for implementation in high-speed hardware.
Manuscript received June 17, 2005; revised March 21, 2006, and August 29, 2006. This work was supported in part by the SERC Fast Track Scheme of the Department of Science and Technology, Government of India. This paper was recommended by Associate Editor Z. Wang.
S. Mahapatra is with the Department of Electronics & Electrical Communication Engineering, Indian Institute of Technology, Kharagpur-721302, India (e-mail: [email protected]).
K. Singh was with the Department of Electronics & Electrical Communication Engineering, Indian Institute of Technology, Kharagpur-721302, India. He is now with Philips Medical Systems, Bangalore-560045, India (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCSI.2007.902527
The basic approaches to lossless data compression can be broadly classified into the following three categories.
1) Dictionary-based methods, where each symbol or a group of symbols is represented by a pointer to its position in a dictionary that may be predefined or built up dynamically. Lempel–Ziv algorithms [1], [2], and their variants belong to this class.
2) Statistical techniques, which use a modeler to estimate the symbol probabilities and use these probabilities to encode the individual symbols. Huffman coding [1], [3], and arithmetic coding [4]–[6] belong to this class.
3) Grammar-based techniques, proposed more recently [7]–[9], which transform the original data sequence into an irreducible context-free or context-sensitive grammar and use this grammar to represent the data sequence.
Of the above three techniques, dictionary- and grammar-based techniques are expected to compress more than their statistical counterparts. However, these techniques invariably use statistical schemes to further compress the generated output. Thus, statistical techniques have an important role to play in the field of lossless data compression.
Due to the simplicity of its implementation, Huffman coding has been used in many hardware data compressors reported in the literature [10]–[12]. However, this technique suffers from two major problems: it needs an integral number of bits to represent any symbol, and it performs poorly in adaptive data compression. On the other hand, arithmetic coding has the ability to represent a symbol using even a fraction of a bit and is able to compress close to the symbol entropy. Also, it allows adaptive compression and incremental transmission, thereby permitting one to compress data on the fly.
For its working, arithmetic coding relies on the recursive updating of a code interval based on the probability of occurrence of the input symbols and represents a stream of input symbols with a single floating-point number. The amount of compression one can get with arithmetic coding depends heavily on how accurately one can estimate the symbol probabilities. Greater accuracy is possible if higher order context models are adopted [13], [14], [17]. However, using higher order models increases the complexity of the implementation. So, hardware implementation of arithmetic coding has been limited to simple binary alphabets [15], which limits their execution speed. In [21], Boo et al. have reported a VLSI architecture for arithmetic coding of multilevel images. Osorio and Bruguera [22] have extended the above work to present VLSI architectures for both arithmetic coding and decoding of multisymbol alphabets. However, the above implementations use a 0-order context model with limited precision, which would limit the amount of compression.
Parallel implementation has been envisaged as a means for
speeding up the arithmetic coding algorithm. In Jiang and Jones
1549-8328/$25.00 © 2007 IEEE
[23], a parallel scheme is proposed for implementing arithmetic
coding where groups of eight symbols are processed at a time
while encoding any data sequence. However, this implementa-
tion relies on serial decoding. Both parallel encoding and de-
coding are illustrated by Jiang in [24]. But, this is meant for
binary alphabets only. More recently, Lee et al. [25] have re-
ported an efficient implementation of context-based arithmetic
coding for MPEG-4 shape coding. However, this implementa-
tion is targeted at a particular application and exploits the in-
herent features of the application in order to simplify the imple-
mentation and enhance its performance. It may not be applied
to any generalized application domain. A scheme for realizing
arithmetic coding and decoding both in parallel hardware was
outlined in [26]. Based on this scheme, a field-programmable
gate array (FPGA)-based modeling unit was realized and its per-
formance reported in [27]. This design used Order-0 modeling,
which limits its performance. Also, design of the coding unit
was not considered here.
An earlier version of the work presented in this paper, which
outlined the schemes for parallel implementation of arithmetic coding for both Order-0 and Order-1 models, was presented in
Mahapatra and Singh [28]. In this paper, the authors have aug-
mented the work with the detailed hardware design of both the
modeling and coding units. The performance of the proposed
implementation is compared to that of an existing sequential im-
plementation of arithmetic coding in terms of both compression
ratio (ratio of size of the compressed file to size of the original
file) and execution time. The design has been synthesized taking
the Xilinx Spartan 3 FPGAs as the target technology and the re-
sults have been reported.
The rest of the paper is organized as follows. Section II,
following the introduction, briefly explains the arithmetic
coding algorithm and its implementation. It brings out the shortcomings in the existing strategies that limit these to
software implementations only. The proposed schemes are
outlined in Section III, which presents algorithms and parallel
architectures for executing both the phases of the arithmetic
coding algorithm. The detailed hardware designs are presented
in Section IV. Section V contains the simulation and synthesis
results along with the related discussions. Finally, the paper
concludes in Section VI.
II. ARITHMETIC CODING
Arithmetic coding completely bypasses the conventional no-
tion of replacing an input symbol with a specific codeword of
reduced size for achieving compression. Instead, it substitutes a stream of input symbols with an interval of real numbers between 0 and 1 [1], [4]–[6]. Frequently occurring symbols, or
those with a higher probability, contribute fewer bits to the code-
word used to represent the interval, resulting in compression of
the input data. In the decoding phase, the original symbols are
retrieved by recursively working on this output codeword.
A. Operation
In arithmetic coding, the half-open interval [0, 1) is first di-
vided into a number of subintervals, each subinterval corre-
sponding to a symbol in the input alphabet. The width of a
subinterval is proportional to the probability of occurrence of
the symbol it represents. As the coding progresses, the following equation is used to update the code interval when x_i, the i-th symbol, is encountered as the next symbol in the input data stream:

L(s x_i) = L(s) + R(s) · P(x_i),   R(s x_i) = R(s) · p(x_i)   (1)

In (1), x_i represents the i-th symbol, p(x_i) and P(x_i) represent the probability of occurrence of x_i and the cumulative probability of the symbols preceding x_i, respectively, and L(s) and R(s), respectively, represent the lower end and range of the code interval, given the present string s; L(s x_i) and R(s x_i) represent the corresponding updated values. L(s) is set to 0 and R(s) to 1 when s is the null string.
Then, (1) is used to recursively update the code interval while
processing any of the symbols. At the end, a pointer to the final
code interval is used to represent the original data stream.
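The recursion in (1) can be sketched in C++ (the language the authors later use for their software simulation). This is an illustrative floating-point version only; the names `prob` and `cum` for p(·) and P(·) are hypothetical, and a practical coder uses integer arithmetic with renormalization, as described later in the paper.

```cpp
#include <vector>

// Sketch of the interval update in (1).  prob[i] is the probability of
// symbol i and cum[i] the total probability of symbols 0..i-1.
struct Interval { double low, range; };

Interval encode(const std::vector<int>& symbols,
                const std::vector<double>& prob,
                const std::vector<double>& cum) {
    Interval iv{0.0, 1.0};               // L(s) = 0, R(s) = 1 for the null string
    for (int x : symbols) {
        iv.low   += iv.range * cum[x];   // L(sx) = L(s) + R(s)·P(x)
        iv.range *= prob[x];             // R(sx) = R(s)·p(x)
    }
    return iv;   // any number in [low, low + range) represents the stream
}
```

After all symbols are processed, any value inside the final interval identifies the input stream, which is the "pointer" mentioned above.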
The modeling information maintained in the decoder is iden-
tical to that in the encoder so that the original data stream can
be retrieved from the code bits. Similar to the encoder, the decoder also starts with a code interval set to [0, 1) and divides the interval into multiple subintervals, each corresponding to a
distinct symbol. The decoder shifts in the code bits, checks to
see which subinterval they point to and then outputs the corre-
sponding symbol as the decoded one. Then, it updates the cur-
rent interval to the new subinterval, normalizes the code interval
and then shifts in the next group of code bits. The decoder pro-
ceeds likewise till no code bits are left.
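Under the same simplified floating-point model, the decoding loop described above can be sketched as follows; the function and variable names are illustrative, not taken from the paper, and a hardware decoder works on code bits with integer arithmetic rather than on a single real value.

```cpp
#include <vector>

// Sketch of arithmetic decoding: given a code value v in [0, 1) and the
// number of symbols n that were encoded, repeatedly find which symbol's
// subinterval contains v, output that symbol, and rescale v into the
// chosen subinterval.  cum[i] is the total probability of symbols 0..i-1.
std::vector<int> decode(double v, int n,
                        const std::vector<double>& prob,
                        const std::vector<double>& cum) {
    std::vector<int> out;
    for (int k = 0; k < n; ++k) {
        int x = 0;
        while (x + 1 < (int)prob.size() && cum[x + 1] <= v) ++x;  // locate subinterval
        out.push_back(x);
        v = (v - cum[x]) / prob[x];       // normalize back into [0, 1)
    }
    return out;
}
```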
B. Implementation of Arithmetic Coding
Arithmetic coding has become a better understood and widely
researched topic since the availability of its source code in [4].
Since then, a number of attempts have been made to improve
the performance of this algorithm and simplify its implementation [13]–[22]. However, these two goals conflict with each other, and at best one can try to achieve a tradeoff.
Arithmetic coding is basically implemented as two distinct
modules: modeler and coder . The modeler estimates the prob-
ability of each of the input symbols and sends the probabilities
to the coder, which maintains and recursively updates a code in-
terval, generating the code bits in the process.
1) Modeler: The compression achieved through arithmetic
coding can be significantly improved by accurate prediction
of the symbol probabilities, possible through the use of higher
order context models [17], [20]. A 0th-order model just keeps track of the symbol occurrence counts and uses them to estimate the required probabilities. The scheme proposed by Witten et al. [4], of maintaining different arrays to store, update and reorder the occurrence counts, works well for files that have a skewed probability distribution of symbols, but performs poorly for others such as object files, which have a character usage that is less skewed. An alternative implementation proposed by
Moffat [18], where the symbols are visualized as nodes of a binary tree, achieves an execution time of order O(log n) per symbol for both uniform and skewed files. But, this also needs O(n) words for an alphabet of size n and may prove expensive when realized in hardware due to the excessive storage space it requires. The scheme proposed by Fenwick [19] necessitates less storage and has been used by Moffat et al. for proposing an improved software implementation of arithmetic coding [20].
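For reference, Fenwick's cumulative-count structure [19] can be sketched as below; the class and member names are illustrative. It keeps the counts for an n-symbol alphabet in a single (n+1)-word array with O(log n) update and cumulative-count query.

```cpp
#include <vector>

// Sketch of a Fenwick (binary indexed) tree over symbol occurrence counts.
struct Fenwick {
    std::vector<int> tree;                  // 1-based implicit tree
    explicit Fenwick(int n) : tree(n + 1, 0) {}
    void add(int sym, int delta) {          // count[sym] += delta
        for (int i = sym + 1; i < (int)tree.size(); i += i & -i)
            tree[i] += delta;
    }
    int cum(int sym) const {                // total count of symbols 0..sym
        int s = 0;
        for (int i = sym + 1; i > 0; i -= i & -i)
            s += tree[i];
        return s;
    }
};
```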
However, this scheme, due to its complexity, may be avoided
in a hardware implementation.
In Order-1 modeling, occurrence of the last symbol is taken
into account while estimating the symbol probabilities. It helps
in predicting the symbol probabilities more accurately and thus
gives better compression, albeit at the cost of higher hardware
complexity. For a 256-symbol alphabet, instead of a single 256-
element array as in Order-0 modeling, 256 such arrays have
to be used for storing the symbol occurrence counts. Also, up-
dating the modeling information becomes more complex [17],
[20].
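A minimal sketch of the Order-1 bookkeeping just described (names hypothetical): instead of the single count table of an Order-0 model, one 256-entry table is kept per previous symbol, i.e., 256 tables in all, and every update is conditioned on the last symbol seen.

```cpp
#include <vector>

// Sketch of Order-1 statistics for a 256-symbol alphabet:
// count[prev][sym] is the number of times `sym` followed `prev`.
struct Order1Model {
    std::vector<std::vector<unsigned>> count;
    unsigned prev = 0;                       // last symbol seen
    Order1Model() : count(256, std::vector<unsigned>(256, 0)) {}
    void update(unsigned sym) {
        ++count[prev][sym];                  // conditioned on previous symbol
        prev = sym;
    }
};
```

The 256-fold increase in table storage is exactly the added hardware cost referred to above.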
2) Coder: In the digital implementation of the coder, the
code interval is first initialized to [0, 2^N), where N depends on the amount of precision desired. It is then divided into subinter-
vals corresponding to the occurrence probabilities of the sym-
bols in the alphabet. When a symbol is input, the code interval is
set to the subinterval corresponding to that symbol. Afterwards,
the code interval is expanded following a renormalization pro-
cedure [4]–[6], and as a result, either “0”s or “1”s are output as
the code bits.
Many compression applications deal with a binary alphabet. Coders designed for multisymbol alphabets can certainly handle such an alphabet, but specially tailored routines are more efficient [13], [14]. Binary coding allows a number of components
of a multi-alphabet coder to be eliminated, which results in an
improved performance if the implementation is targeted at a bi-
nary alphabet only. This concept has been used here to realize
arithmetic coding for multisymbol alphabets through multiple
binary coders operating in a pipelined manner at different levels
of a tree representing the source alphabet.
III. PROPOSED SCHEME
A. Modeling Unit
1) Order-0 Modeling: In the proposed implementation, the
data structure used to store the modeling information is a com-
plete binary tree, in which leaf nodes represent the symbols. Let C_k represent the cumulative occurrence count of symbol k. The model-related information is implicitly stored in a single variable at each of the intermediate nodes of the tree. Considering an alphabet of N symbols, the information stored in the j-th node at the i-th level of the tree, i.e., node N_{i,j}, is the occurrence count of the symbols in the left half of the range of symbols covered by that node:

n_{i,j} = C_u − C_{l−1}   (2)

where l and u denote, respectively, the lowest symbol index and the midpoint symbol index of the range covered by node N_{i,j}, with C_{−1} = 0.
One such binary tree for 16 symbols is depicted in Fig. 1. It
may be observed that storing the modeling information in this
manner helps in dividing the code interval at the root and at
each of the intermediate tree nodes into two parts, the lower and
upper part. In addition to the information stored in each of the
nodes, the model requires one more piece of information, i.e.,
the total symbol count T, which is stored in a separate register. T is sent to the root node as the top value at the initiation of the coding operation. In the following, it is assumed that the top value received by node N_{i,j} is represented as t_{i,j}.
Adaptive modification of model-related information is done as follows. When any symbol is encountered, the information n_{i,j} stored in a node is incremented only if the corresponding
Fig. 1. Modeling tree for 16 symbols.
Fig. 2. Algorithm followed by Order-0 modeling unit.
bit of the symbol is zero. This achieves the necessary adaptation in the code space at node N_{i,j}, which has 0 as the lower end point, n_{i,j} as the intermediate point, and t_{i,j} as the upper end point. The algorithm followed by the Order-0 modeling unit is
shown in Fig. 2.
2) Order-1 Modeling: Similar to Order-0 modeling, in
Order-1 modeling also, the modeling information is stored in
each of the intermediate nodes of a binary tree. However, for
an alphabet of N symbols, the information is now stored at each tree node as two N-element linear arrays, n and t, indexed by the symbols in the alphabet. For any tree node, n[s] stores the count of “0”s and t[s] stores the total count of “0”s and “1”s encountered by the node, after seeing the symbol s. The code space at node N_{i,j} now has 0 as the lower end point, n[s] as the intermediate point and t[s] as the upper end point, where s represents the previous symbol. The algorithm followed by Order-1 modeling is shown in Fig. 3. In
Fig. 3. Algorithm followed by Order-1 modeling unit.
Fig. 4. Parallel arithmetic encoder.
this algorithm, t_{i,j} has the same interpretation as in the algorithm of Fig. 2.
3) Implementation: The modeler is realized as a linear array of pipelined units, M_1 to M_8 (Fig. 4). For a 256-symbol alphabet, there are eight levels in the modeling tree. Each unit M_i in the modeler stores the modeling information for all the nodes at the corresponding tree level, i.e., M_i stores the modeling information for the nodes N_{i,1} to N_{i,2^{i−1}}.
B. Coding Unit
The coding unit consists of multiple binary coders, i.e., C_1 to C_8, as shown in Fig. 4. The coder C_i receives the intermediate point and top of the corresponding code interval as modeling information, as well as the bit b_i, from M_i. It then uses 0 as the lower end of the interval and encodes either the lower or upper part of the interval depending on b_i.
In the proposed scheme, the input symbols are routed through each of the modeling units. As a symbol reaches M_i, it reads the bit b_i and works according to Algorithm 1 or 2. The coder C_i, upon receiving the modeling information and b_i, generates the code bits and shifts them into an internal buffer. The input data stream is compressed in blocks of a particular size, and the resulting output files corresponding to the different levels are either stored as the compressed data or sent out as a sequence of code bits properly delimited from each other.
Fig. 5. Parallel arithmetic decoder.
TABLE I
EXAMPLE MODELING INFORMATION
The decoder also has a linear array structure for both the de-
coding and modeling units (Fig. 5). The code bits, after being
disassembled, are given to the corresponding binary decoders.
Each of the decoders decodes a sequence of bits, and the combination of corresponding bits at all the units gives the decoded
symbols. It may be mentioned here that as the structure and op-
eration of the modeling units in both the decoder and encoder
are identical, the decoder can share the resources of the encoder.
C. Example
An example of the Order-0 implementation is given below in order to further clarify the proposed scheme. Let it be assumed that the alphabet consists of eight symbols only (0 to 7). After 20 symbols are processed, the occurrence counts and the cumulative occurrence counts of the symbols are as shown in Table I. A snapshot of the modeling tree at this point, which shows the values stored in the nodes, is given in Fig. 6. Suppose the 21st symbol is the symbol 3. Now, 3 (“011”) is input to the root node, i.e., node N_{1,1}, along with the total T, which is equal to 20. N_{1,1} sends the values 8 and 20 to the coder and 8 to its left child, i.e., node N_{2,1}, as top information. It also increments n_{1,1}. N_{2,1} sends 5 and 8 to the coder and 3 to N_{3,2}. Finally, N_{3,2} just sends 2 and 3 to the coder. The updated modeling tree after encoding the symbol 3 is as shown in Fig. 7. The coders in the three different levels encode the subinterval
Fig. 6. Example modeling tree.
Fig. 7. Updated modeling tree.
Fig. 8. Functionality of M_1 for Order-0 modeling.
[0, 8) in the interval [0, 20), the subinterval [5, 8) in the interval [0, 8), and the subinterval [2, 3) in the interval [0, 3), respectively.
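The walk through the modeling tree in this example can be sketched as below. The code is reconstructed from the example and Fig. 2 with illustrative names: `node` holds the n_{i,j} values in heap order (node 1 is the root), and for each bit of the symbol the unit emits the pair (intermediate, top) to the coder at its level, passes the proper sub-count down as the next top value, and increments the node count when the bit is 0 (the symbol falls in the lower half).

```cpp
#include <vector>
#include <utility>

using Pair = std::pair<unsigned, unsigned>;   // (INTER_VALUE, TOP_VALUE)

// One symbol's pass through the Order-0 modeling tree.
std::vector<Pair> model_symbol(std::vector<unsigned>& node,
                               unsigned levels, unsigned sym, unsigned total) {
    std::vector<Pair> out;
    unsigned j = 1, top = total;              // start at the root with top = T
    for (int i = levels - 1; i >= 0; --i) {
        unsigned bit = (sym >> i) & 1u;       // current bit of the symbol
        unsigned n = node[j];
        out.push_back({n, top});              // sent to the coder at this level
        if (bit == 0) { top = n;       ++node[j]; j = 2 * j;     }  // lower half
        else          { top = top - n;            j = 2 * j + 1; }  // upper half
    }
    return out;
}
```

With node values 8 (root), 5, and 2 as in Fig. 6, feeding symbol 3 with T = 20 reproduces the pairs (8, 20), (5, 8), (2, 3) of the example, and the root count is incremented to 9 as in Fig. 7.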
IV. DETAILED HARDWARE DESIGN
A. Order-0 Modeling Unit
The functionality of M_1 for Order-0 modeling is shown in Fig. 8. As mentioned in Subsection III-A, M_1 needs to store only the root node of the modeling tree. It uses a 16-bit register NODE_DATA to store the modeling information. The other
TABLE II
MEMORY REQUIREMENT FOR ORDER-0 MODELER
Fig. 9. Functionality of M_1 for Order-1 modeling.
units in the modeler differ only in terms of the amount of storage space required, as they have to store multiple tree nodes.
The memory requirements of the modeling units are given in
Table II. However, the functionalities of these units are identical
to each other. Each unit M_i takes two values as input from M_{i−1}, i.e., SYMBOL and VALUE_FROM_TOP. It extracts the corresponding bit of the symbol and sends it as CURRENT_BIT to the coder, together with VALUE_FROM_TOP and NODE_DATA as TOP_VALUE and INTER_VALUE, respectively. M_i then computes the data to be sent to the next unit and updates NODE_DATA.
B. Order-1 Modeling Unit
Fig. 9 depicts the functionality of M_1 for Order-1 modeling.
The unit now stores the occurrence counts of 0’s and the total
count with respect to the different symbols for the root node. Ob-
viously, there is an increase in the memory requirement as com-
pared to the corresponding Order-0 modeling unit. Now, M_1 has two memory units of 256*16 bits each and a register named LAST_SYMBOL. Each of the other units, i.e., M_2 to M_8, stores the above modeling information for all the tree nodes at the corresponding level. Hence, they differ only in the amount of storage space needed. The memory requirements of the Order-1 modeling units are given in Table III. Each unit M_i gets the current symbol SYMBOL as input from M_{i−1} and then, using LAST_SYMBOL as an index, retrieves the intermediate and top values of the code interval from the memory units. These are
TABLE III
MEMORY REQUIREMENT FOR ORDER-1 MODELER
Fig. 10. Coder functionality.
sent to the coder as INTER_VALUE and TOP_VALUE, respectively. After sending the values to the coder, M_i updates the memory units with respect to SYMBOL and sends it to M_{i+1}. Also, SYMBOL is moved to LAST_SYMBOL.
C. Coder Unit
Functionality of the coder has been depicted in Fig. 10.
Each coder consists of a 16-bit divider, a 16-bit multiplier and
a RENORMALIZATION UNIT. The current implementation
uses a 16-bit precision and both the registers LOW and RANGE
are initialized to . INITIALIZE is a 1-bit signalthat equals one when the operation starts and is zero afterwards.
The coder gets the intermediate value and top of the code in-
terval at that level from the corresponding modeler. It uses 0 as
the lower end of the code interval and encodes either the lower
part or the upper part of the interval depending on whether
CURRENT_BIT is 0 or 1. Algorithm used by coder can be seen
in Fig. 11. The updated values of LOW and RANGE are sent
to the renormalization unit, which uses the algorithm shown
in Fig. 12 to generate the output bits. The functionality of this
unit is shown in Fig. 13. In this figure, CODE_QUARTER and
CODE_HALF are registers with fixed values 16384 and 32768,
respectively. The renormalization unit checks to see if RANGEis less than CODE_QUARTER, and if it is, depending on the
Fig. 11. Algorithm used by coder.
Fig. 12. Algorithm used by renormalization unit.
current values of LOW, RANGE, and BITS_TO_FOLLOW, “0”s and “1”s are output and the interval is doubled.
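A sketch of this renormalization loop, written in the style of the coder of Witten et al. [4] restated for the LOW/RANGE registers described above (the exact register transfers in Fig. 12 may differ): while the range has shrunk below a quarter of the code space, a bit is emitted, or deferred via BITS_TO_FOLLOW when the interval straddles the midpoint, and the interval is doubled.

```cpp
#include <vector>

const unsigned CODE_HALF = 32768, CODE_QUARTER = 16384;  // 16-bit precision

// Renormalize the interval [low, low + range) and append output bits.
void renormalize(unsigned& low, unsigned& range,
                 unsigned& bits_to_follow, std::vector<int>& out) {
    auto emit = [&](int b) {                    // output b, then deferred bits
        out.push_back(b);
        for (; bits_to_follow > 0; --bits_to_follow) out.push_back(1 - b);
    };
    while (range < CODE_QUARTER) {
        if (low + range <= CODE_HALF) {         // entirely in lower half: emit 0
            emit(0);
        } else if (low >= CODE_HALF) {          // entirely in upper half: emit 1
            emit(1);
            low -= CODE_HALF;
        } else {                                // straddles midpoint: defer bit
            ++bits_to_follow;
            low -= CODE_QUARTER;
        }
        low <<= 1;                              // double the interval
        range <<= 1;
    }
}
```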
D. Pipelined Operation
The pipelined processing of multiple symbols at different tree
levels is depicted in Fig. 14. In this figure, MODLER, MULT,
DIV, and RENUNIT represent the modeler, the multiplier, the
divider and the renormalization unit, respectively. The different
units operate as explained below.
When the symbol enters the modeler, three clock cycles are
consumed to read the data from memory, provide the current
bit, intermediate value and top value to the coder at that level,
and update the modeling information. The coder consists of the
divider, multiplier and the renormalization unit. The divider isimplemented as a 16-bit non-restoring divider, which consists of
256 controlled add/subtract (CAS) cells, arranged in 16 levels
[29]. Buffers have been inserted in between the levels to realize
a six-stage pipeline to optimally implement the required divi-
sion operation. The multiplier takes two clock cycles to give the
output. The renormalization unit takes a minimum of three clock
cycles to renormalize the interval and give the output as a “0” or “1” bit. Therefore, the pipeline latency comes out to be 14 clock cycles (3 in the modeler, 6 in the divider, 2 in the multiplier, and 3 in the renormalization unit). Depending on the input symbol and the value of
BITS_TO_FOLLOW, the renormalization unit may take more
than three cycles. This is known after the first renormalization
cycle and in such a case signals are generated by RENUNIT
to introduce a pipeline stall by disabling the clock to the other units. One such situation is seen in the 16th clock cycle. When
Fig. 13. Functionality of the renormalization unit.
Fig. 14. Pipelined operation.
a pipeline stall is introduced, data is either being transferred be-
tween the units or the division operation is in progress. As the di-
vider has buffers in between the stages, no data would be lost due
to a pipeline stall. Similar pipelined execution takes place simul-
taneously at different levels of the binary tree. For a 256-symbol
tree, there are 8 such pipelined operations being executed in par-
allel, which improves the throughput.
V. RESULTS AND DISCUSSION
A. Simulation Results
A software simulation of the proposed implementation
strategy has been carried out in C++ and the amount of com-
pression as well as execution speed has been estimated while
compressing files in the Canterbury Corpus test set [30]. The
amount of compression is measured in terms of the compres-
sion ratio (defined as the ratio of the compressed file size to
size of the original file) obtained while compressing the files in the corpus. Fig. 15 shows the variation in average compression
Fig. 15. Variation in average compression ratio with change in block size.
ratio obtained for all the files in the Canterbury Corpus using
different block sizes for compressing the files with a 32-bit precision. As per expectation, use of Order-1 modeling gives better compression as compared to use of an Order-0 model.
unit, which reduces the required hardware resources in addition
to reducing the encoding and decoding times.
Compression performance of the proposed implementation
has been compared with one of the existing serial implementa-
tions in terms of both compression ratio and execution time. It is
seen that the proposed implementation would achieve a higher execution speed while giving comparable compression ratios. The proposed scheme
also greatly simplifies the hardware realization of the different
modules.
REFERENCES
[1] T. Bell, J. Cleary, and I. Witten, Text Compression. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[2] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans. Inf. Theory, vol. IT-24, no. 5, pp. 530–536, Sep. 1978.
[3] D. A. Huffman, “A method for the construction of minimum redun-dancy codes,” in Proc. Inst. Radio Eng., 1952, vol. 40, pp. 1098–1101.
[4] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for datacompression,” Comm. ACM , vol. 30, no. 6, pp. 520–540, Jun. 1987.
[5] G. G. Langdon, “An introduction to arithmetic coding,” IBM J. Res. Dev., vol. 28, no. 2, pp. 135–149, Mar. 1984.
[6] P. G. Howard and J. S. Vitter, “Arithmetic coding for data compres-
sion,” Proc. IEEE , vol. 82, no. 6, pp. 857–865, Jun. 1994.[7] E. H. Yang and J. C. Keiffer, “Grammar-based codes: A new class of
universal source codes,” IEEE Trans. Inf. Theory, vol. 46, no. 5, pp.737–754, May 2000.
[8] J. C. Kieffer, E. H. Yang, G. Nelson, and P. Cosman, “Lossless compression via multilevel pattern matching,” IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 1227–1245, Jul. 2000.
[9] E. H. Yang, A. Kaltchenko, and J. C. Kieffer, “Universal lossless data compression with side information by using a conditional MPM grammar transform,” IEEE Trans. Inf. Theory, vol. 47, no. 9, pp. 2130–2150, Sep. 2001.
[10] S. Jones, “100 Mbits/s adaptive data compressor design using selectively shiftable content-addressable memory,” Proc. IEE, vol. 139, no. 4, pp. 498–502, Apr. 1992.
[11] J. L. Nunez and S. Jones, “Gbits/s lossless data compression hardware,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 499–510, Jun. 2003.
[12] M. Milward, J. L. Nunez, and D. Mulvaney, “Design and implementation of a parallel high-speed data compression system,” IEEE Trans. Parallel Distrib. Syst., vol. 15, no. 6, pp. 481–490, Jun. 2004.
[13] G. G. Langdon and J. Rissanen, “Compression of black-and-white im-
ages with arithmetic coding,” IEEE Trans. Commun., vol. COM-29, no.6, pp. 858–867, Jun. 1981.
[14] S. Kuang, J. Jou, and Y. Chen, “The design of an adaptive on-line bi-nary arithmetic coding chip,” IEEE Trans. Circuits Syst. I , Fundam.
Theory Appl., vol. 45, no. 7, pp. 693–706, Jul. 1998.
[15] G. V. Cormack and R. N. S. Horspool, “Data compression using dynamic Markov modeling,” Comput. J., vol. 30, no. 6, pp. 541–550, Dec. 1987.
[16] M. Nelson, “Arithmetic coding + statistical modeling = data compression, part 1 - arithmetic coding,” Dr. Dobb’s J., Feb. 1991 [Online]. Available: http://dogma.net/markn/articles/arith/part1.htm
[17] M. Nelson, “Arithmetic coding + statistical modeling = data compression, part 2 - statistical modeling,” Dr. Dobb’s J., Feb. 1991 [Online]. Available: http://dogma.net/markn/articles/arith/part2.htm
[18] A. Moffat, “Linear time adaptive arithmetic coding,” IEEE Trans. Inf.Theory, vol. 36, no. 2, pp. 401–406, Mar. 1990.
[19] P. M. Fenwick, “A new data structure for cumulative frequency tables,” Softw. Pract. Exper., vol. 24, no. 3, pp. 327–336, Mar. 1994.
[20] A. Moffat and I. H. Witten, “Arithmetic coding revisited,” ACM Trans.
Inf. Syst., vol. 16, no. 3, pp. 256–294, Jul. 1998.
[21] M. Boo, J. D. Bruguera, and T. Yang, “A VLSI architecture for arithmetic coding of multilevel images,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 45, no. 1, pp. 163–168, Jan. 1998.
[22] R. Osorio and J. Bruguera, “New arithmetic coder/decoder architec-
tures based on pipelining,” in Proc. IEEE Int. Conf. Application-Spe-
cific Syst., Arch. Process., Jul. 1997, pp. 106–115.
[23] J. Jiang and S. Jones, “Parallel design of arithmetic coding,” Proc. IEE Comput. Digit. Tech., vol. 144, no. 6, pp. 327–333, Nov. 1994.
[24] J. Jiang, “A novel parallel architecture for black and white image com-
pression,” Signal Process. Image Commun., vol. 8, pp. 465–474, 1996.
[25] K.-B. Lee, J.-Y. Lin, and C.-W. Jen, “A multisymbol context-based arithmetic coding architecture for MPEG-4 shape coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 2, pp. 283–295, Feb. 2005.
[26] S. Mahapatra, J. Nunez, C. Feregrino-Uribe, and S. Jones, “Parallel implementation of a multi-alphabet arithmetic coding algorithm,” in IEE Colloquium on Data Compression: Methods and Implementations, 23rd ed. London, U.K.: IEE, Nov. 1999.
[27] R. Stefo, J. L. Nunez, C. Feregrino, S. Mahapatra, and S. Jones,
“FPGA-based modeling unit for high speed lossless arithmeticcoding,” in Proc. 11th Int. Conf. FPL2001, Belfast, Northern Ireland,Aug. 2001, pp. 643–647.
[28] S. Mahapatra and K. Singh, “A parallel scheme for implementing mul-tialphabet arithmetic coding in high-speed programmable hardware,”
in Proc. IEEE Int. Conf. ITCC-05, Las Vegas, NV, 2005, pp. 79–84.[29] A. Guyot, “Divider,” TIMA-CMP, Grenoble, France, Feb. 2004 [On-
line]. Available: http://tima-cmp.imag.fr/~guyot/Cours/Oparithm/eng-lish/Divise.htm
[30] R. Arnold and T. Bell, “A corpus for the evaluation of lossless compression algorithms,” in Proc. IEEE Data Compression Conf., Snowbird, UT, 1997, pp. 201–210.
Sudipta Mahapatra (S’93–M’95) received theB.Sc. (Engg) degree in electronics and telecommu-nications engineering from Sambalpur University,Sambalpur, India, in 1990, and the M.Tech. andPh.D. degrees in computer engineering from IndianInstitute of Technology (IIT), Kharagpur, India, in1992 and 1997, respectively.
From April 1993 to September 2002, he worked in various capacities in the Computer Science and Engineering Department of the National Institute of Technology (NIT), Rourkela, India. He
visited the Electronic Systems Design Group of Loughborough University,U.K., as a BOYSCAST Fellow of the Department of Science and Technology,Government of India, from March 1999 to March 2000. Currently, he isworking as an Assistant Professor in the Department of Electronics andElectrical Communication Engineering, IIT, Kharagpur, India. His areas of research interest include parallel and distributed computing and lossless datacompression hardware.
Dr. Mahapatra is a life member of CSI.
Kuldeep Singh received the B.E. degree in elec-tronics engineering from Finolex Academy of Management and Technology, Ratnagiri, India,in 2002 and the M.Tech. degree in computer en-gineering from Indian Institute of Technology,
Kharagpur, India, in 2005.
Since 2005, he has been with Philips Medical Systems, Bangalore, India, working on software systems used in various Philips medical equipment.