
the channel capacity. This is due to the ability to take advantage of the information embedded in the (up to) T different values that the channel output has during a bit interval. This is clearly not the case in a physical model for the channel.

III. EFFICIENT ASYNCHRONOUS CHANNEL ACCESSING ALGORITHM

The codes proposed in the last section assure that if exactly T users are simultaneously active, the aggregate information rate is 1 bit/symbol. If fewer than T users are active, the aggregate rate decreases proportionally.

In a realistic situation the value of T is neither constant nor can it be predetermined. We propose an asynchronous channel accessing algorithm that achieves a throughput arbitrarily close to unity for any number of active users. This algorithm is suited for a slotted system in which the duration of each slot equals the time needed for the transmission of m + Δ bits (= one message). It also assumes that there exists an independent, errorless feedback channel through which all users are notified by the receiver about one of two outcomes, "success" or "collision." A "success" is fed back when the receiver can uniquely determine the active users and their transmitted messages. Otherwise, a "collision" is fed back. (The time delays Δ_i, i = 1, ..., M, in this case also include the various propagation times of the feedback signal to the users.)

We briefly summarize the main steps of the algorithm (a detailed description and analysis is given in [2]).

Each active user transmits the first m + Δ bits of its codeword. If there is only a single active user, the feedback is "success" and the operation of the algorithm is successfully completed. If more users are active, the decoder cannot decode their codewords and a "collision" is fed back. When a "collision" is fed back, each active user transmits the next m + Δ bits of its codeword. Two active users will succeed at this stage. This process continues until "success" is fed back.

If K users are active, this happens after K slots; thus the aggregate information transfer rate is 1 bit/symbol.

There are situations where at the l-th stage of the algorithm a collision of T > l active users might be decoded as a "success" by the receiver. This is due to the fact that, at the first stage of the decoding procedure, a new vector Z' is obtained from Y' via a symbol-by-symbol modulo-2 operation, i.e., Z' = Y' mod 2. The vector Z' is a modulo-2 sum of up to l columns of H'. If T > l users yield the same Z', an undetected collision occurs (similar to an undetected error in the BCH decoding procedure). To resolve this problem, each user might add as a prefix to its transmitted codeword a sequence of Δ "1"s. In this way the receiver learns the number of active users after the first slot of a session. Note, however, that the users need not know this number, and therefore a success/collision feedback suffices.

ACKNOWLEDGMENT

This work is part of a Ph.D. thesis carried out under the supervision of Prof. I. Bar-David and Prof. R. Rom. The author would like to thank them for their instructive guidance and helpful suggestions. The author would also like to thank Dr. R. Roth for many fruitful discussions.

REFERENCES

[1] P. Mathys, "Coding for T active users out of N, for a multiple-access channel," IEEE Trans. Inform. Theory, vol. 36, pp. 1206-1219, Nov. 1990.
[2] I. Bar-David, E. Plotnik, and R. Rom, "An efficient multiple-access method to the binary adder channel," in Proc. IEEE Infocom 89, Ottawa, Canada, Apr. 1989, pp. 1115-1120.
[3] E. C. Van der Meulen, "A survey of multi-way channels in information theory: 1961-1976," IEEE Trans. Inform. Theory, vol. IT-23, pp. 1-37, Jan. 1977.
[4] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977.
[5] C. L. Chen, W. W. Peterson, and E. J. Weldon, Jr., "Some results on quasi-cyclic codes," Inform. Contr., vol. 15, pp. 407-423, Nov. 1969.

On Maximum Likelihood Soft-Decision Decoding of Binary Linear Codes

Niels J. C. Lous, Patrick A. W. Bours, Member, IEEE, and Henk C. A. van Tilborg, Senior Member, IEEE

Abstract—Two new implementations of a maximum likelihood soft-decision decoding algorithm for binary linear codes are derived. Instead of checking only error patterns yielding a codeword when added to the rounded-off version of the received vector, these implementations principally consider all possible column patterns. Compared to known implementations, this allows a more efficient use of branching and bounding techniques.

Index Terms—Binary (linear) codes, maximum likelihood decoding, soft-decision decoding, branching and bounding algorithms.

I. INTRODUCTION

We consider a situation where codewords of a binary linear code C are transmitted over a Gaussian channel. Thus, when a codeword c is sent, a word r with real components r_i = c_i + n_i is received; the n_i's are independent, normally distributed random variables with mean zero and variance σ². The problem is to retrieve c from r.

When performing maximum likelihood soft-decision decoding to do so, a codeword c' is determined that maximizes the conditional probability that c' was sent, given the received vector r. It is well known [1, ch. 15] that c' is a codeword with minimal Euclidean distance to r, that is, c' minimizes

d_{Eucl}^2(c', r) = \sum_{i=1}^{n} (c'_i - r_i)^2.          (1)

In general, it is difficult to find a codeword minimizing the Euclidean distance to r, but it is easy to find the binary string b that minimizes

Manuscript received December 28, 1990; revised July 19, 1991. This work was supported in part by the Dutch Organization for Scientific Research (NWO), The Netherlands. This work was presented in part at the IEEE International Symposium on Information Theory, Budapest, Hungary, June 24-28, 1991.

The authors are with the Department of Mathematics and Computing Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands.

IEEE Log Number 9203034.



d_{Eucl}^2(b, r): its components are

    b_i = { 0, if r_i ≤ 0.5,
            1, if r_i > 0.5.          (2)

If b is a codeword, c' = b; if b ∉ C, we will invert one or more of its bits such that the resulting string is a codeword. However, inverting b_i to b_i ⊕ 1 increases d_{Eucl}^2(b, r) by |1 − 2r_i|. Thus, in order to find c', we have to determine a set of positions I minimizing

    \sum_{i ∈ I} |1 − 2r_i|          (3)

under the condition that inverting b on the positions in I yields a codeword.

The positions in I correspond to a set of columns of the code's parity check matrix H that add up to the syndrome s of b. To minimize (3), the columns in this set should be linearly independent, i.e., the column set should not contain any nontrivial subset summing to zero. Thus, if a certain column occurs more than once in H, only one of its copies may be chosen as an element of the column set corresponding to I. In the sequel, we therefore assume H not to have repeated columns. Equivalently, we assume that C has minimum distance at least 3.

These observations lead to the following maximum likelihood soft-decision decoding algorithm for a linear code C of length n (cf. [2]).

Determine the binary vector b with minimal Euclidean distance to r.
Determine the syndrome s of b using the code's parity check matrix H.
IF s = 0
THEN
    Output c' = b.
ELSE BEGIN
    Assign to the i-th column of H the cost p_i = |1 − 2r_i|, for i = 1, ..., n.
    Find a set of (linearly independent) columns of H which add up to s, such that the sum of the costs of these columns is minimal.
    Invert b on the positions corresponding to these columns. Let c' be the result.
    Output c'.
END.
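As an illustration of these steps, the following Python sketch performs the search in the ELSE branch by plain enumeration of column sets (a minimal sketch of our own, assuming a numpy environment; the function name is not from the correspondence, and the branch-and-bound implementations of Sections II-IV replace this exhaustive loop):

    import itertools
    import numpy as np

    def ml_soft_decode(H, r):
        """Maximum likelihood soft-decision decoding by brute-force search."""
        H = np.asarray(H, dtype=int) % 2
        m, n = H.shape                        # m = n - k parity checks
        r = np.asarray(r, dtype=float)
        b = (r > 0.5).astype(int)             # binary vector closest to r, cf. (2)
        s = H @ b % 2                         # syndrome of b
        if not s.any():
            return b                          # b is already a codeword
        cost = np.abs(1 - 2 * r)              # column costs p_i = |1 - 2 r_i|
        best_cols, best_cost = None, np.inf
        for w in range(1, m + 1):             # more than m columns are dependent
            for I in itertools.combinations(range(n), w):
                cols = list(I)
                if ((H[:, cols].sum(axis=1) % 2) == s).all():
                    c_I = cost[cols].sum()    # cost of this error pattern, cf. (3)
                    if c_I < best_cost:
                        best_cols, best_cost = cols, c_I
        c = b.copy()
        c[best_cols] ^= 1                     # invert b on the chosen positions
        return c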

We want to emphasize that this algorithm outputs the codeword closest to r among all codewords (as in [3]). The usual approach ([4], [5]) is to look for a codeword that is closest to r with sufficiently high probability.

The outline of the rest of the paper is as follows. In Section II, we discuss two known implementations of the above algorithm. We will compare them to the new implementation presented in Section III, and to its "improved" version as described in Section IV. In Appendix A, we show how to apply our method to nonlinear codes. Appendix B contains simulation results of the discussed implementations.

II. TWO KNOWN IMPLEMENTATIONS

Before discussing the implementation of the above algorithm by Snyders et al. [2], and the implementation based on Bours' [6], we introduce some terminology. A set of columns of H will be called a column pattern. Its weight is the number of columns in the set. The cost of a column pattern is defined as the sum of the costs of the columns in the pattern. A column pattern is cheaper than another one if its cost is lower; it is said to be more expensive if its cost is higher. An error pattern is a column pattern of which the columns sum to the syndrome s of b. The codeword obtained by applying hard-decision decoding (HD) to b will be denoted by c_HD; define e_HD = b ⊕ c_HD.

Fig. 1. Binary tree in the case a = 4.

One may transform b into a codeword by simply performing hard-decision decoding. The cost of this approach is

    C_{HD} = \sum_{i : (e_{HD})_i = 1} |1 − 2r_i|.

Column patterns containing columns of H with cost C_HD or more do not have to be considered when minimizing (1), since the cost of such column patterns exceeds the cost of hard-decision decoding. Let A be the set consisting of the columns of H with cost not exceeding C_HD. We assume the columns of A to be ordered such that their costs are ascending. Let a denote the number of columns of A. Each column pattern can be represented by a binary vector of length a, the i-th bit of which is a one iff the i-th column of A belongs to the pattern.
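In code, the ordered set A and the cost vector might be built as follows (again an illustrative sketch of our own; build_A is an assumed name, and e_hd is the hard-decision error pattern e_HD defined above):

    import numpy as np

    def build_A(H, r, e_hd):
        """Indices of columns of H with cost at most C_HD, sorted by ascending cost."""
        cost = np.abs(1 - 2 * np.asarray(r, dtype=float))
        C_hd = cost[np.asarray(e_hd).astype(bool)].sum()  # cost of HD decoding
        A = [i for i in range(np.asarray(H).shape[1]) if cost[i] <= C_hd]
        A.sort(key=lambda i: cost[i])                     # ascending costs
        return A, cost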

The implementation proposed in [2] is to check (a subset of) the error patterns consisting of columns of A. Given the syndrome s of b, the error patterns consisting of two columns only can be generated; from these patterns of weight two it is possible to construct the error patterns of weight three, etc. In [2], a branching and bounding rule is proposed to decrease the number of error patterns to be considered.

The implementation presented in [6], to which we will refer as the binary tree implementation, also checks patterns consisting of columns of A. However, instead of restricting itself to error patterns, the implementation principally considers all possible column patterns. This increases the number of candidate patterns, but it allows a more efficient use of branching and bounding techniques.

The order in which the column patterns are considered is determined by a particularly labeled directed binary tree; the construction of this tree and the labeling of its nodes are given by the tree construction rule.

Tree Construction Rule: The root of the binary tree is labeled 10⋯0, where the label has length a. Nodes with label e_1 e_2 ⋯ e_i 1 0 0 ⋯ 0, 0 ≤ i ≤ a − 2, have two outgoing branches; the left one goes to the node with label e_1 e_2 ⋯ e_i 1 1 0 ⋯ 0, and the right one to the node labeled e_1 e_2 ⋯ e_i 0 1 0 ⋯ 0. Nodes with label e_1 e_2 ⋯ e_{a−1} 1 are endpoints.

A labeled tree in the case a = 4 is shown in Fig. 1.

We say that node η succeeds node ε (or, equivalently, that node ε precedes node η) if η is on a higher level in the tree than ε, and if η can be reached from ε by "following" only ascending branches of the tree.

We identify the nodes in the tree with the column patterns represented by their labels. The following theorem, and the branching and bounding rule it implies, can easily be proved from the tree construction rule.

Theorem 1: All the successors of a column pattern ε in the binary tree are more expensive than ε.


Rule 1: If a column pattern is more expensive than the cheapest error pattern yet found, or if it is an error pattern and cheaper than the cheapest error pattern yet found, its successors need not be checked.

In the sequel, we say that a column pattern is cheap enough if it is cheaper than the cheapest error pattern yet found; it is said to be too expensive otherwise. To consider the column patterns in a binary tree, we use depth-first search. We start by checking the pattern corresponding to the root of the tree. Having checked a pattern, we first consider the column patterns in its left subtree (i.e., the subtree reached by following the left outgoing branch), and then the patterns in the right subtree. When checking the column patterns in a subtree, we follow the same procedure. Endpoints of the tree are considered to have empty subtrees.
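The tree construction rule and this depth-first traversal fit in a small generator (an illustrative sketch of our own; in the actual decoder, the subtrees of a node to which Rule 1 applies would simply be skipped):

    def tree_dfs_order(a):
        """Yield the labels of the binary tree of Fig. 1 in depth-first order."""
        def visit(label, pos):                # pos: 0-indexed position of the last 1
            yield label
            if pos < a - 1:
                # left branch: append a 1 directly after the last 1
                yield from visit(label[:pos + 1] + (1,) + (0,) * (a - pos - 2), pos + 1)
                # right branch: move the last 1 one position to the right
                yield from visit(label[:pos] + (0, 1) + (0,) * (a - pos - 2), pos + 1)
        yield from visit((1,) + (0,) * (a - 1), 0)

For a = 4 this reproduces exactly the order listed in the example below.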

Example: The order in which the column patterns in the binary tree of Fig. 1 are considered is the following: 1000, 1100, 1110, 1111, 1101, 1010, 1011, 1001, 0100, 0110, 0111, 0101, 0010, 0011, 0001.

As the weight of e_HD is minimal among all error patterns (by the definition of hard-decision decoding), we do not need to check column patterns of weight less than the weight of e_HD. Furthermore, we can exclude patterns consisting of more than n − k columns, where k is the dimension of the considered code. This is so because the parity check matrix H has height n − k, so any (n − k + 1)-tuple of its columns must contain linear dependencies.

The number of checked column patterns is upper bounded by \sum_{i=1}^{n-k} \binom{n}{i}. For Hamming codes, we have n − k = log n; hence, the worst-case number of considered patterns is asymptotically upper bounded by \binom{n}{\log n} ∼ 2^{\log^2 n}.

Simulation results for both Snyders' and the previous implementation, applied to Hamming codes of various lengths, can be found in Appendix B. Typically, the number of nodes to be checked depends on the received vector r, so the average number of nodes to be checked depends on the reliability of the channel, i.e., the variance of the Gaussian distribution. However, once a column pattern is found to be too expensive, not only the patterns succeeding it in the tree, but far more patterns can be seen to be too expensive. For instance, if the pattern represented by 10100 is too expensive, so are 01100, 01011, 01111, and fifteen other patterns that are not successors of 10100!

This observation makes it worthwhile to look for a better implementation of the complete maximum likelihood soft-decision decoding algorithm.

III. A NEW IMPLEMENTATION

The new implementation is based on the same idea as the binary tree implementation: it determines whether a column pattern consisting of columns of A is cheap enough, and, if so, it checks whether this pattern is an error pattern. However, instead of using a binary tree to determine the order in which the column patterns are checked, another graph is used, allowing a more efficient use of branching and bounding. The construction of the graph is described by the graph construction rule.

Graph Construction Rule: The root of the graph, i.e., the node on level 1, is labeled 10⋯0, where the label has length a. A node ε with label e = e_1 e_2 ⋯ e_a on level j of the graph may have two types of immediate successors.

Type 1: If e = 0 e_2 e_3 ⋯ e_a, ε has a successor on level j + 1 with label 1 e_2 e_3 ⋯ e_a.

Type 2: If e = e_1 e_2 ⋯ e_{i'−1} 1 0 e_{i'+2} ⋯ e_a for some i', 1 ≤ i' ≤ a − 1, ε has a successor on level j + 1 labeled e_1 e_2 ⋯ e_{i'−1} 0 1 e_{i'+2} ⋯ e_a.
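The two successor types translate directly into code (an illustrative sketch of our own; labels are represented as bit tuples):

    def successors(e):
        """Immediate successors of label e in the graph of this section."""
        a = len(e)
        succ = []
        if e[0] == 0:                          # type 1: set the first bit to 1
            succ.append((1,) + tuple(e[1:]))
        for i in range(a - 1):                 # type 2: turn a "10" into "01"
            if e[i] == 1 and e[i + 1] == 0:
                succ.append(tuple(e[:i]) + (0, 1) + tuple(e[i + 2:]))
        return succ

Every successor produced this way lies one level higher in the sense of Theorem 2 below: its index sum \sum i e_i exceeds that of e by exactly one.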

Fig. 2. Graph in the case a = 4.

Example: Let a = 8. The node with label 01101110 has successors 11101110 (type 1), 01011110 (type 2), and 01101101 (type 2). The node labeled 11111000 has only one successor, with label 11110100 (type 2).

Fig. 2 shows the resulting graph in the case a = 4.

From the graph construction rule, it is not immediately clear that each node lies on a unique and well-defined level in the graph. However, as shown in Theorem 2, the level of a node is uniquely determined by its label.

Theorem 2: The level of a node with label e (of length a) equals

    \sum_{i=1}^{a} i e_i.          (4)

Proof: The statement is trivially true for the root 10⋯0 of the graph. Let (4) be proved for all nodes up to level j. Consider a node ε on level j + 1 with label e = e_1 e_2 ⋯ e_a. Let d = d_1 d_2 ⋯ d_a be the label of a predecessor of ε on level j. If e is a successor of d of type 1, we have

    \sum_{i=1}^{a} i e_i = 1 + \sum_{i=1}^{a} i d_i = j + 1.

If e is a successor of type 2, there is an i', 1 ≤ i' ≤ a − 1, such that

    \sum_{i=1}^{a} i e_i = \sum_{i=1}^{i'−1} i d_i + (i' + 1) + \sum_{i=i'+2}^{a} i d_i = 1 + \sum_{i=1}^{a} i d_i = j + 1.          □

As before, we identify the nodes of the graph with the column patterns represented by their labels.


TABLE I
SIMULATION RESULTS FOR THE [31, 26, 3] HAMMING CODE (100 000 SIMULATIONS PER σ²).
THE COLUMNS IMPL. 1-4 GIVE THE AVERAGE NUMBER OF CHECKED PATTERNS.

σ²        SD   Impl. 1   Impl. 2   Impl. 3   Impl. 4   p_err^bit   p_err^word
0.01       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.02       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.03      45      0.00      2.00      1.00      1.00      0.0000       0.0000
0.04     207      1.78      7.92      2.22      2.00      0.0000       0.0000
0.05     738      2.68     28.07      8.17      3.67      0.0007       0.0068
0.06    2539      2.98     24.96      5.78      3.59      0.0031       0.0285
0.07    3452      4.08     45.45     10.90      7.01      0.0099       0.0895
0.08    4762      5.86     60.98     16.56     10.28      0.0199       0.1711
0.09    5891      8.41     77.02     23.23     13.92      0.0328       0.2688
0.10    6675      9.19     91.37     26.80     16.50      0.0461       0.3698
0.11    7218     10.22    104.36     30.44     18.28      0.0633       0.4831
0.12    7660     10.38    108.45     33.36     19.25      0.0764       0.5654
0.13    7713     12.17    116.61     36.02     21.96      0.0873       0.6311
0.14    7991     11.92    122.42     37.67     22.36      0.0963       0.6781
0.15    8325     11.51    123.55     37.08     22.54      0.1068       0.7252

TABLE II
SIMULATION RESULTS FOR THE [63, 57, 3] HAMMING CODE (25 000 SIMULATIONS PER σ²).
THE COLUMNS IMPL. 1-4 GIVE THE AVERAGE NUMBER OF CHECKED PATTERNS.

σ²        SD   Impl. 1   Impl. 2   Impl. 3   Impl. 4   p_err^bit   p_err^word
0.01       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.02       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.03     160      0.14      2.42      1.00      1.00      0.0000       0.0000
0.04     770      6.88     21.12      6.86      3.57      0.0003       0.0058
0.05    2725     18.81    124.37     22.36      7.67      0.0022       0.0405
0.06    5671     22.41    175.62     30.97     17.20      0.0070       0.1229
0.07    7004     49.05    273.94     60.74     31.09      0.0211       0.3345
0.08    8259     83.48    324.09     83.08     43.81      0.0361       0.5236
0.09    8494     97.52    382.04    104.09     54.19      0.0502       0.6768
0.10    8914    115.58    425.15    116.51     60.86      0.0626       0.7735

TABLE III
SIMULATION RESULTS FOR THE [127, 120, 3] HAMMING CODE (10 000 SIMULATIONS PER σ²).
THE COLUMNS IMPL. 1-4 GIVE THE AVERAGE NUMBER OF CHECKED PATTERNS.

σ²        SD   Impl. 1   Impl. 2   Impl. 3   Impl. 4   p_err^bit   p_err^word
0.01       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.02       0      0.00      0.00      0.00      0.00      0.0000       0.0000
0.03     363      0.29      9.00      5.44      3.71      0.0000       0.0011
0.04    2090    255.43    210.58     84.91     17.55      0.0007       0.0239
0.05    6640    881.20    630.44    188.23     39.13      0.0042       0.1341

From the graph construction rule, we have that every successor of a given pattern is more expensive than that pattern. Furthermore, all immediate predecessors of a column pattern on level j are on level j − 1, and all its immediate successors are on level j + 1. Each node (except for the graph's root) has at least one immediate predecessor, and (except the top node) at least one immediate successor. This yields the following rule for branching and bounding:

Rule 2: If all column patterns on a certain level are too expensive, so are all patterns on higher levels: they need not be checked.

We apply Rule 2 by going through the successive levels of the graph, until a level is found on which all column patterns are too expensive. This, however, requires a way to construct these levels.

It turns out that Theorem 2 makes it easy to generate all labels on level j of the graph in lexicographical order. The first one is found by starting with a binary string e = 11⋯10⋯0 consisting of u ones followed by a − u zeros, where u is minimal under the condition that S = \sum_{i=1}^{u} i ≥ j. If S = j, e is the first label on level j; if S > j, the first label on level j is found by inverting coordinate S − j of e.

Example: Let a = 7. To find the first label on level j = 13 of the graph, notice that \sum_{i=1}^{4} i = 10 < 13 and \sum_{i=1}^{5} i = 15 ≥ 13, so u = 5 and S = 15 > 13 = j. Thus, the label results from the binary string 1111100 by inverting coordinate S − j = 2, yielding 1011100.

Having found a label e on level j, the lexicographically next one is determined as follows. Reading from left to right, find the first 0 following the second 1 in the label; if such a 0 cannot be found, e is the lexicographically last label on level j. Otherwise, replace that 0 by a 1, and replace the bits of e preceding it by zeroes. Add the indices of the ones in the resulting string; let their sum be Λ. Now set the first u bits of the string to 1, where u is minimal under the condition that S = \sum_{i=1}^{u} i ≥ j − Λ. Finally, if S > j − Λ, invert coordinate S − (j − Λ).
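The two procedures translate into the following sketch (our own illustration; first_label and next_label are assumed names, labels are lists of bits, and coordinates are 1-indexed as in the text):

    def first_label(a, j):
        """Lexicographically first label on level j (assumes 1 <= j <= a(a+1)/2)."""
        u, S = 0, 0
        while S < j:                  # minimal u with S = 1 + 2 + ... + u >= j
            u += 1
            S += u
        e = [1] * u + [0] * (a - u)
        if S > j:
            e[S - j - 1] = 0          # invert coordinate S - j
        return e

    def next_label(e, j):
        """Lexicographic successor of e on level j, or None if e is the last label."""
        a = len(e)
        ones = [i for i, bit in enumerate(e) if bit]
        t = None                      # first 0 following the second 1
        if len(ones) >= 2:
            t = next((i for i in range(ones[1] + 1, a) if e[i] == 0), None)
        if t is None:
            return None
        e = [0] * t + [1] + list(e[t + 1:])    # set that 0; zero the bits before it
        lam = sum(i + 1 for i, bit in enumerate(e) if bit)   # the index sum Λ
        u, S = 0, 0
        while S < j - lam:
            u += 1
            S += u
        e[:u] = [1] * u
        if S > j - lam:
            e[S - (j - lam) - 1] = 0           # invert coordinate S - (j - Λ)
        return e

With a = 7 and j = 13, first_label yields 1011100 and next_label takes it to 1101010, in agreement with the examples.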


TABLE IV
SIMULATION RESULTS FOR THE (16, 2^8, 6) EXTENDED PREPARATA CODE (25 000 SIMULATIONS PER σ²).
THE COLUMNS IMPL. 2-4 GIVE THE AVERAGE NUMBER OF CHECKED PATTERNS.

σ²        SD   Impl. 2   Impl. 3   Impl. 4   p_err^word
0.01       0      0.00      0.00      0.00       0.0000
0.02       0      0.00      0.00      0.00       0.0000
0.03       0      0.00      0.00      0.00       0.0000
0.04      56      2.50      1.00      1.00       0.0000
0.05     509      3.34      1.72      1.45       0.0000
0.06    1627      3.58      2.07      1.44       0.0046
0.07    3031      4.42      2.16      1.67       0.0146
0.08    4682      5.00      2.10      1.84       0.0306
0.09    6916      5.63      2.33      1.89       0.0625
0.10    9088      6.37      2.67      2.01       0.9768
0.11   10889      7.02      2.92      2.16       0.1272
0.12   12457      7.84      3.17      2.27       0.1624
0.13   14020      8.36      3.11      2.18       0.1986
0.14   15010      8.86      3.17      2.24       0.2236
0.15   15748      9.69      3.20      2.23       0.2692
0.16   16605     10.21      3.26      2.37       0.3040
0.17   17359     10.73      3.67      2.51       0.3382
0.18   18051     11.41      4.02      2.63       0.3737
0.19   18616     11.69      4.04      2.68       0.4084
0.20   19183     12.06      4.11      2.69       0.4368
0.21   19604     12.26      4.11      2.64       0.4554
0.22   20155     12.50      4.04      2.65       0.4767
0.23   20485     12.92      4.21      2.74       0.4938
0.24   20888     13.06      4.13      2.72       0.5054
0.25   21255     13.01      4.07      2.66       0.5152


Example: Let a = 7. The node with label e = 1011100 is on level 13 of the graph. To determine the lexicographically next label on that level, we proceed as follows. The first 0 succeeding the second 1 in e is on position 6. We replace that 0 by a 1, and we replace the first five bits of e by zeroes, yielding 0000010, and Λ = 6. As \sum_{i=1}^{3} i = 6 < 7 and \sum_{i=1}^{4} i = 10 ≥ 7, we have u = 4 and S = 10 > 7 = j − Λ. Setting the first u = 4 bits of 0000010 to 1 yields 1111010. The lexicographic successor of e is now found by inverting coordinate S − (j − Λ) = 3 of 1111010, which results in 1101010.

It should be noted that this implementation, to which we will refer as the graph implementation, will not necessarily decode each possible received word r at least as fast as the binary tree implementation. We refer to Appendix B for simulation results.

IV. MAKING THE NEW IMPLEMENTATION MORE EFFICIENT

It is straightforward that Rule 1 is not only applicable to the binary tree presented in Section II, but also to the graph used in the previous section. We will, therefore, proceed as follows. Each time we find a column pattern on level j − 1 that is cheap enough, but that is not an error pattern, we determine and store all its successors on level j. If two such nodes on level j − 1 have the same successor on level j, it is stored only once. For each stored node e we also store the number p_e of predecessors it has among the patterns on level j − 1 that are cheap enough.

Theorem 3: Let e be the binary string corresponding to a node on level j in the graph (j > 1). Let e(01) denote the number of occurrences of 01 in e, and let e_1 be the first bit of e. The total number of predecessors of e on level j − 1 in the graph equals t_e = e(01) + e_1.

We will not prove this theorem; it can easily be derived from the graph construction rule.
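Theorem 3 amounts to a one-line count (an illustrative sketch, same label representation as above):

    def num_predecessors(e):
        """t_e = (number of occurrences of "01" in e) + (first bit of e)."""
        # Each "01" can be undone into a "10" (a type-2 step in reverse), and a
        # leading 1 can be undone into a 0 (a type-1 step in reverse).
        s = ''.join(str(bit) for bit in e)
        return s.count('01') + e[0]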

Notice that t_e ≥ p_e. If t_e > p_e, e must have at least one predecessor that is too expensive. From this we can conclude that e is also too expensive, without actually determining the cost of e. If t_e = p_e, all the predecessors of e are cheap enough. This, however, gives no information about the cost of e, and we will have to determine this cost explicitly.

Rule 3: If all column patterns e on a certain level are too expensive or satisfy t_e > p_e, no patterns on higher levels need to be checked.

It should be noted that this "improved" version of the graph implementation requires a number of column patterns to be stored, and that this number may (worst case) grow exponentially in the code length. However, when testing the implementation, the maximum number of column patterns stored appeared to be quite small, even for long codes, and so did the average number of stored patterns. See Appendix B for simulation results of this "improved" implementation.

V. CONCLUSION

In this correspondence, we presented and compared four implementations of the maximum likelihood soft-decision decoding algorithm for binary codes. Two of these implementations are new; their efficiency is compared to the efficiency of two known implementations, a short description of which is given in this paper. Simulation results of the four implementations are presented.

The essential difference between the implementations is the following. The first one ([2]) checks (a subset of) the error patterns. The other implementations check column patterns of which it is not a priori known whether they are too expensive. Only if a pattern is cheap enough does the implementation check whether it is an error pattern. The binary tree implementation uses a binary tree to run through the column patterns; the graph implementation and its "improved" version use a graph that is designed to make a more efficient use of branching and bounding techniques.


TABLE V
THE AVERAGE RESP. MAXIMUM NUMBER OF STORED ERROR PATTERNS FOR IMPL. 4 (FOR THE HAMMING CODES OF LENGTH 31, 63, AND 127, AND FOR THE EXTENDED PREPARATA CODE OF LENGTH 16)

          n = 31         n = 63         n = 127        Prep.
σ²      avg.  max.     avg.  max.     avg.  max.     avg.  max.
0.01    0.00     0     0.00     0     0.00     0     0.00     0
0.02    0.00     0     0.00     0     0.00     0     0.00     0
0.03    1.00     1     1.00     1     1.79     5     0.00     0
0.04    1.28     2     1.55     4     3.76    63     1.00     1
0.05    1.51     5     2.33    17     7.22    93     1.11     2
0.06    1.52    11     4.18    37                    1.07     3
0.07    2.30    11     6.59    42                    1.10     3
0.08    3.01    17     8.79    53                    1.17     3
0.09    3.71    26    10.44    60                    1.19     3
0.10    4.27    30    11.44    63                    1.22     4
0.11    4.57    44                                   1.24     5
0.12    4.75    35                                   1.27     5
0.13    5.26    37                                   1.28     5
0.14    5.28    45                                   1.28     5
0.15    5.36    37                                   1.27     5
0.16                                                 1.30     5
0.17                                                 1.33     5
0.18                                                 1.37     5
0.19                                                 1.39     5
0.20                                                 1.40     5
0.21                                                 1.38     5
0.22                                                 1.39     4
0.23                                                 1.41     5
0.24                                                 1.41     5
0.25                                                 1.40     5


For Hamming codes of length at least 63, the "improved" graph implementation appears to check fewer column patterns than the one presented in [2]; for Hamming codes of length up to 31, Snyders' implementation appears to be more efficient. This is a consequence of the fact that the branching and bounding rule used in the "improved" graph implementation is based on information resulting from checking (some) column patterns in the graph that are not considered in Snyders' implementation. For shorter codes, the graph used in the graph implementation is relatively small, and it is not efficient to check "superfluous" column patterns in order to increase the strength of the branching and bounding rule. For longer codes, however, the size of the graph increases, and an efficient branching and bounding rule decreases the number of checked column patterns so much that it is worthwhile checking some "superfluous" column patterns initially.

APPENDIX A
APPLICATION TO NONLINEAR CODES

The algorithm presented in Section I can be applied to perform maximum likelihood soft-decision decoding of any nonlinear binary code, provided that there is an efficient way of checking whether inverting a number of positions of b yields a codeword. As an example, we consider the class of Preparata codes.

For even m, Preparata codes consist of a linear [2^m, 2^m − 3m + 1] code Π which is contained in the extended Hamming code of length 2^m, together with 2^{m−1} − 1 cosets of Π in this extended Hamming code ([7, p. 466]). It is relatively easy to show that codewords of the Preparata code can be characterized by their syndrome with respect to the linear code Π, together with some additional requirements. The algorithm will output c' = b if the syndrome s satisfies these requirements. If s does not satisfy the requirements, the algorithm finds a set of columns of the parity check matrix of Π such that inverting b on the positions corresponding to these columns yields a codeword, and such that the sum of the costs of these columns is minimal. To find c', the binary tree implementation, the graph implementation presented in Section III, or its "improved" version may be used. We refer to Appendix B for simulation results obtained by applying these implementations to the (16, 2^8, 6) Preparata code.

As we do not need to check column patterns of weight more than n − k, which for the linear code Π is equal to 3m − 1 ∼ 3 log n, the number of considered patterns is at most (worst case) \sum_{i=1}^{3m-1} \binom{n}{i}. Thus, the worst-case number of checked patterns is asymptotically upper bounded by \binom{n}{3 \log n} ∼ 2^{3 \log^2 n}.

APPENDIX B
SIMULATION RESULTS

We tested each of the implementations by using it to decode a number of words of Hamming codes of different lengths, the bits of which were disturbed by independent white Gaussian noise with variance σ². Tables I, II, and III display the results for Hamming codes of length 31, 63, and 127. The results of applying the binary tree implementation, the graph implementation presented in Section III, and its "improved" version to the (16, 2^8, 6) (nonlinear) Preparata code are shown in Table IV.

The first column of each of the tables shows the considered values of σ²; the second column shows how often the syndrome of b was not equal to 0 while, at the same time, there were at least two columns whose cost did not exceed C_HD. In other words, the number in the second column shows how many times it was necessary to consider column patterns other than e_HD.

The average number of column patterns that were checked each time nontrivial patterns had to be considered is displayed in the next columns. "Impl. 1" is the implementation presented in [2]; "Impl. 2" is the binary tree implementation; "Impl. 3" is the graph implementation presented in this correspondence, and "Impl. 4" denotes its "improved" version as discussed in Section IV.

The last two columns of Tables I, II, and III show the bit-error probability p_err^bit and the word-error probability p_err^word for each of the checked variances. For the Preparata code we only determined the word-error probability, which is displayed in the last column of Table IV. The error probabilities are the same for each of the tested implementations, as they all perform complete soft-decision decoding.

Impl. 4 requires a number of column patterns to be stored. The maximum (max.) and the average (avg.) number of patterns that were stored during the simulation process depend on the code, its length, and on σ². The number of stored column patterns is shown in Table V.

ACKNOWLEDGMENT

The authors wish to thank the anonymous referees for their useful suggestions.

REFERENCES

[1] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1983.
[2] J. Snyders and Y. Be'ery, "Maximum likelihood soft decoding of binary block codes and decoders for the Golay codes," IEEE Trans. Inform. Theory, vol. 35, pp. 963-975, Sept. 1989.
[3] J. H. Conway and N. J. A. Sloane, "Soft decoding techniques for codes and lattices, including the Golay code and the Leech lattice," IEEE Trans. Inform. Theory, vol. IT-32, pp. 41-50, Jan. 1986.
[4] D. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. IT-18, pp. 170-182, Jan. 1972.
[5] S. N. Litsyn, "Fast decoding algorithm of Reed-Muller codes," in Proc. Fourth Joint Swedish-Soviet Int. Workshop Inform. Theory, 1989, pp. 288-291.
[6] P. A. W. Bours, "Soft decision decoding," Master's thesis, Eindhoven Univ. of Technol., Eindhoven, The Netherlands, 1989.
[7] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1977.

Bounds on the Trellis Size of Linear Block Codes

Yuval Berger and Yair Be’ery

Abstract—The size of minimal trellis representation of linear block codes is addressed. Two general upper bounds on the trellis size, based on the zero-concurring codewords and the contraction index of the subcodes, are presented. The related permutations for attaining the bounds are exhibited. These bounds evidently improve the previously published general bound. Additional bounds based on certain code constructions are derived. We focus on the squaring construction and obtain specific constructive bounds for Reed-Muller and repeated-root cyclic codes. In particular, the recursive squaring construction of Reed-Muller codes is explored and the exact minimal trellis size of this design is obtained. Efficient permutations, in the sense of the trellis size, are also demonstrated by using shortening and puncturing methods. The corresponding bounds are specified.

Index Terms—Trellis, maximum-likelihood decoding, soft-decision decoding, zero-concurring codewords, contraction index.

Manuscript received June 12, 1991; revised January 6, 1992. This work was presented in part at the 21st Annual IEEE Communication Theory Workshop, Rhodes, Greece, June 30-July 6, 1991.

The authors are with the Department of Electrical Engineering-Systems, Tel-Aviv University, Tel-Aviv 69978, Israel.

IEEE Log Number 9203030.

I. INTRODUCTION

Trellis diagrams have been traditionally exploited for decoding convolutional codes. Wolf introduced in [1] a trellis design for representing linear block codes. The Viterbi algorithm was applied then for soft decision decoding of these codes. Forney [2] further analyzed the trellis diagram and utilized it for efficient decoding of some block codes. Muder [3] provided a formalization of the trellis design for block codes based on a graph-theoretic approach.

Let c = (c_1, c_2, ⋯, c_n) be an n-tuple over GF(q). Denote c_p^{(i)} ≜ (c_1, c_2, ⋯, c_i), c_f^{(i)} ≜ (c_{i+1}, c_{i+2}, ⋯, c_n), i = 1, 2, ⋯, n. For an (n, k) code C over GF(q), let C_p^{(i)} be the linear code which consists of c_p^{(i)} for all c such that c ∈ C, c_f^{(i)} = (0, 0, ⋯, 0). Similarly, let C_f^{(i)} be the linear code that consists of c_f^{(i)} for all c such that c ∈ C, c_p^{(i)} = (0, 0, ⋯, 0). Denote the dimensions of these codes by p_i and f_i, respectively. Define p_0 ≜ 0, f_n ≜ 0. The trellis diagram for C consists of n + 1 levels V_i, i = 0, 1, ⋯, n. Each level includes |V_i| states. V_0 and V_n consist of a single state each, referred to as the initial and final states, respectively. Each branch in the trellis connects states from successive levels and is labeled by 0, 1, ⋯, q − 1. A path from level V_0 to level V_i represents c_p^{(i)} for some c ∈ C. A trellis is called minimal if |V_i| is minimal among all trellis representations of C for i = 0, 1, ⋯, n. The dimension of the state space at level i for a minimal trellis is denoted by s_i ≜ log_q |V_i|. It is well known [2], [3] that the general trellis design for linear codes described by Forney [2] is minimal and unique, and s_i is identical in both C and its dual code. Hence, for a minimal trellis

    s_i = k − p_i − f_i.

The minimal trellis size index, defined as s ≜ max_i(s_i), depends on the order of the code coordinates [2], [3], i.e., the order of the columns in the generator matrix G or the parity check matrix H. The operation of reordering the columns of G or H is referred to in the sequel as a permutation. A slightly different parameter s(C) is defined as the minimal attainable s index for any permutation of C, namely, for any equivalent code of C; s(C) is called the absolute minimal trellis size. A trellis for which s = s(C) is called an absolute minimal trellis.
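For readers who wish to experiment, the state-space dimensions of the minimal trellis follow directly from a generator matrix: by the definitions of p_i and f_i, p_i = k − rank(G restricted to columns i+1, ..., n) and f_i = k − rank(G restricted to columns 1, ..., i), so that s_i = rank(G_{≤i}) + rank(G_{>i}) − k. The sketch below is our own illustration for q = 2 only (the function names are assumptions):

    import numpy as np

    def gf2_rank(M):
        """Rank of a binary matrix over GF(2) by Gaussian elimination."""
        M = np.array(M, dtype=np.uint8) % 2
        if M.size == 0:
            return 0
        rows, cols = M.shape
        r = 0
        for c in range(cols):
            piv = next((i for i in range(r, rows) if M[i, c]), None)
            if piv is None:
                continue
            M[[r, piv]] = M[[piv, r]]          # swap the pivot row into place
            for i in range(rows):
                if i != r and M[i, c]:
                    M[i] ^= M[r]               # eliminate column c elsewhere
            r += 1
            if r == rows:
                break
        return r

    def state_dims(G):
        """s_i = rank(G[:, :i]) + rank(G[:, i:]) - k for i = 0, 1, ..., n."""
        G = np.array(G, dtype=np.uint8) % 2
        k, n = G.shape
        return [gf2_rank(G[:, :i]) + gf2_rank(G[:, i:]) - k for i in range(n + 1)]

max(state_dims(G)) is then the trellis size index s for the given coordinate order; minimizing it over column permutations of G gives s(C).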

Clearly s(C) reflects the decoding complexity of the code with respect to general decoding algorithms such as in [1]. Forney [2] recently introduced an alternative approach where the trellis is decoded in a section-by-section strategy. The sections themselves are efficiently decoded by a precomputation stage based on some type of "fast algorithms" schemes. The decoding complexity of such an algorithm depends on the dimension of the state space s_i at the section boundaries and the branch complexity in the sections. However, the choice of the section boundaries for efficient decoding is involved with the code structure, such as in the four-section design of Reed-Muller (RM) codes [2], and is generally unknown, especially for long codes. Nevertheless, it appears from [2] that "good" permutations, in the sense of s(C) (which reflects the branch complexity at every bit level), are also suitable for section-by-section decoding, e.g., RM codes and the Golay (24, 12) code. Such permutations may as well be a key for designing an efficient trellis-based decoder. The corresponding trellis size parameter is "an intuitive measure of the decoding complexity of the code and appears to be a fundamental descriptive characteristic" [3]. It can be used for either comparing different permutations of the same code or

