

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 5, NO. 5, MAY 2006 1065

Joint Source-Channel Turbo Coding for Binary Markov Sources

Guang-Chong Zhu, Member, IEEE and Fady Alajaji, Senior Member, IEEE

Abstract— We investigate the construction of joint source-channel (JSC) Turbo codes for the reliable communication of binary Markov sources over additive white Gaussian noise and Rayleigh fading channels. To exploit the source Markovian redundancy, the first constituent Turbo decoder is designed according to a modified version of Berrou's original decoding algorithm that employs the Gaussian assumption for the extrinsic information. Due to interleaving, the second constituent decoder is unable to adopt the same decoding method; so its extrinsic information is appropriately adjusted via a weighted correction term. The Turbo encoder is also optimized according to the Markovian source statistics and by allowing different or asymmetric constituent encoders. Simulation results demonstrate substantial gains over the original (unoptimized) Turbo codes, hence significantly reducing the performance gap to the Shannon limit. Finally, we show that our JSC coding system considerably outperforms tandem coding schemes for bit error rates smaller than 10^{-4}, while enjoying a lower system complexity.

Index Terms— Joint source-channel coding, turbo codes, AWGN and Rayleigh fading channels, Shannon limit, Markov sources, iterative decoding, bit error rate.

I. INTRODUCTION

THE fundamental goal of a communication system is to efficiently and reliably transmit information data from a source to a destination over a physical channel where noise distortion may occur. For the sake of efficiency, the source output, which often contains a substantial amount of redundancy, is usually compressed prior to transmission. This data compression procedure (also known as source coding) renders the sources more vulnerable to noise corruption. In order to make the transmission reliable, channel coding is usually employed to combat the noise by adding controlled redundancy to the data.

Traditionally, source and channel coding are implemented independently, resulting in the so-called tandem coding scheme.¹ Furthermore, in almost all the theory and practice

Manuscript received December 9, 2003; revised September 29, 2004 and March 8, 2005; accepted April 9, 2005. The associate editor coordinating the review of this paper and approving it for publication was R. Negi. This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Premier's Research Excellence Award (PREA) of Ontario. Part of this work was presented at the Canadian Workshop on Information Theory, Waterloo, Ontario, Canada, May 2003.

Guang-Chong Zhu is with the Dept. of Mathematics and Computer Science, Lawrence Technological University, 21000 West Ten Mile Road, Southfield, MI, USA (e-mail: [email protected]).

Fady Alajaji is with the Dept. of Mathematics and Statistics and Dept. of Electrical and Computer Engineering, Queen's University, Kingston, Ontario, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TWC.2006.05014

¹This is justified by Shannon's well-known separation principle (or information-transmission theorem) [23] for single-user communication systems, which states that there is no loss of performance optimality if separately designed optimal source and channel codes are implemented in tandem as opposed to designing the codes jointly or using a single optimal joint source-channel code. This result inherently assumes that unlimited encoding/decoding delay and complexity are available at the transmitter and the receiver.

of error-control coding, the data at the input of the channel encoder is assumed to be uniform and memoryless. Obviously, this is true only when the source coding part is optimal; that is, the source-coded bit stream contains zero redundancy. However, in reality, most existing source coding schemes are only sub-optimal (particularly fixed-length source codes), resulting in a certain amount of residual redundancy in the compressed bit stream. For example, the 2.4 kbits/s US Federal Standard MELP speech vocoder produces pitch and gain parameters that contain 43 to 52.7% of residual redundancy due to non-uniformity and memory [14]. For uncompressed sources such as image and speech signals, the natural redundancy can be much higher. Therefore, transmission of sources with a considerable amount of natural or residual redundancy is an important issue. Several studies (e.g., [2], [3], [14], [22], [25], [28]–[31], [34]) have shown that when the source redundancy is exploited in the channel coding design, the system performance can be significantly improved.

The introduction of Turbo codes in 1993 [7] is regarded as one of the most significant achievements in channel coding. Since then, there has been a large number of publications regarding various aspects of Turbo codes. The original work by Berrou et al. demonstrated excellent performance of Turbo codes for the transmission of uniform memoryless sources over additive white Gaussian noise (AWGN) channels [7], [8]. Turbo codes were later extended to Rayleigh fading channels, showing comparable performance [21]. However, most Turbo coding studies assume that the input stream to the encoder is a uniform memoryless source. To the best of our knowledge, only limited attention has been paid to the problem of exploiting the source redundancy in the Turbo coding context. This is in essence a joint source-channel (JSC) coding issue. The design of (systematic and non-systematic) Turbo codes for the transmission of non-uniform memoryless sources has been recently studied in [35], [36], where close to Shannon limit performance was achieved. For the scenario of sources with memory, Garcia-Frias et al. [15], [16] investigated the design of Turbo codes for hidden Markov sources over AWGN channels. Also in [14], Fazel et al. considered employing Turbo codes for the MELP speech coder output, whose parameters are modeled by Markov chains. In [15], [16] and [14], the modifications to the BCJR algorithm [4] are only proposed for the first constituent decoder. Due to interleaving, the second constituent decoder remains the same as for memoryless sources. Another related work is by Cai et al. [11], who proposed a simple modification to the extrinsic information terms of both constituent decoders; but their decoding algorithm remains the same as for uniform memoryless sources. Other related source-channel coding works based on the Turbo coding principle include [1], [17], [20].

1536-1276/06$20.00 © 2006 IEEE


In this work, we consider stationary ergodic binary first-order Markov sources. We investigate the design of (systematic) Turbo codes for transmitting such Markov sources over binary phase-shift keying (BPSK) modulated AWGN channels and Rayleigh fading channels with known channel state information. One main goal is to obtain the best performance that is as close to the Shannon limit as possible. The proposed framework can be extended to high-order Markov sources by using non-binary Turbo codes. Our contributions regarding decoding and encoding design are as follows.

• According to the general principle of iterative decoding [7], [24], after decomposing the log-likelihood ratio (LLR) generated by the decoder of each constituent code, the channel transition term and the a priori term should not be passed on to the other constituent decoder; otherwise, performance degradation would occur. When the BCJR algorithm is modified for the first decoder to incorporate the Markovian property of the source, the LLR can only be decomposed into two separate terms (instead of three as in the case of memoryless sources), where the new extrinsic information contains the a priori term. Furthermore, the extrinsic information generated from the second constituent decoder cannot be used in the same fashion as the a priori term for the first constituent decoder to "update" the new extrinsic information.
In the first Turbo coding paper by Berrou et al. [7], the original BCJR decoding algorithm was relatively complex to implement and, as a result, the extrinsic information was approximated by a Gaussian distribution and modeled as an additional input sequence to the other constituent decoder. Later, Robertson [24] improved and simplified Berrou's BCJR algorithm, and his technique has been widely adopted ever since. We however herein observe that, for the Turbo coding of Markov sources, Berrou's original method with the Gaussian assumption is indeed a valid solution (to the above issues) for the first constituent decoder.

• Due to interleaving, the second constituent decoder is unable to adopt the above method; it thus employs Robertson's decoding technique. Nevertheless, we attempt to exploit the source memory by modifying its extrinsic information term using a weighted correction factor involving the Markov source statistics. Our proposed technique is an improvement of an earlier method of Cai et al. [11].

• Since the two constituent decoders employ different decoding algorithms and thus have different error-correcting capabilities, a symmetric Turbo code (i.e., a code with identical constituent encoders) may not be the best structure. We herein allow the use of asymmetric encoders. Furthermore, as revealed in [35], [36], optimization with respect to the choice of the tap coefficients in each constituent encoder plays an important role vis-a-vis system performance. Therefore, for a given source transition probability matrix, we optimize the encoder structure to further improve the performance.

Our resulting JSC Turbo coding system demonstrates substantial gains (from 0.45 up to 3.57 dB) over the original Turbo codes that do not exploit the source statistics at both encoder and decoder. Furthermore, it performs within 0.73 to 1.45 dB from the Shannon limit for symmetric Markov sources, and approximately within 0.64 to 1.67 dB for asymmetric Markov sources.

In relation to previous works in [15], [16], our approach regarding encoding/decoding design is different in the following aspects: (a) we solve the design problem of the first constituent decoder in order to exploit the source memory using Berrou's [7] original Gaussian approximation method for the extrinsic information, while [15], [16] employ a trellis-based generalized version of Robertson's method [24]; (b) we attempt to exploit (although to a limited extent) the source redundancy in the second constituent decoder in spite of the use of interleaving; (c) [15], [16] do not optimize the encoding structure since a fixed symmetric encoder is employed. However, it is important to point out that [15], [16] consider a wider class of sources with memory, hidden Markov sources. Furthermore, [16] introduces a new "universal" Turbo decoder that does not require advance knowledge of either the hidden Markov source statistics or the channel noise variance. The decoder instead iteratively estimates the source parameters and the channel noise power, and it is shown to produce no noticeable performance degradation.

Finally, we compare, for identical transmission rates, the performance of our JSC coding system with that of two tandem coding schemes, which employ an 8th-order Huffman code followed by a standard Turbo code. The 8th-order Huffman code performs near optimal data compression, while the standard Turbo code is chosen such that it gives either an excellent water-fall bit error rate (BER) performance, or a lower error-floor performance with a slightly worse water-fall performance. It is shown that the tandem schemes inevitably suffer from a high error-floor effect for medium to high signal-to-noise ratios (SNRs), while our less complex JSC coding system offers a robust and substantially superior performance (for BERs ≤ 10^{-4}).

The rest of the paper is organized as follows. In Section II, we give a brief description of the system. In Section III, Turbo decoder design for Markov sources is discussed. Turbo encoder optimization is then described in Section IV. In Section V, we address the determination of the Shannon limit for the transmission of binary Markov sources over AWGN and Rayleigh fading channels. Simulation results are presented in Section VI. In Section VII, we compare the performance of our system with two tandem coding schemes. Finally, concluding remarks are given in Section VIII.

II. SYSTEM DESCRIPTION

We first describe the JSC Turbo coding system for the transmission of binary Markov sources over noisy channels. We consider a stationary ergodic binary first-order Markov source {U_k}, whose transition probabilities are described by the transition matrix

\[
\Pi = [\pi_{ij}] = \begin{bmatrix} q_0 & 1 - q_0 \\ 1 - q_1 & q_1 \end{bmatrix},
\]

where the transition probability is defined by

\[
\pi_{ij} \triangleq \Pr\{U_k = j \mid U_{k-1} = i\}, \qquad i, j \in \{0, 1\}.
\]


[Figure 1 omitted: block diagram showing the Markov source U_k entering the Turbo encoder (Enc 1, Enc 2 linked by interleaver I), BPSK modulation of the systematic and parity streams X_k^s, X_k^{1p}, X_k^{2p} into W_k^s, W_k^{1p}, W_k^{2p}, the channel outputs Y_k^s, Y_k^{1p}, Y_k^{2p}, and the iterative Turbo decoder (Dec 1, Dec 2) producing Û_k.]

Fig. 1. Block diagram of the system (I and I^{-1} stand for the interleaving and de-interleaving operations, respectively).

We denote the marginal distribution of the source as (p_0, p_1), where p_0 ≜ Pr{U_k = 0} ≜ 1 − p_1. By stationarity, it can be easily shown that p_0 and p_1 are given by the source stationary distribution:

\[
p_0 = 1 - p_1 = \frac{1 - q_1}{2 - q_0 - q_1}. \qquad (1)
\]

In general, the source described above is an asymmetric Markov source (i.e., with q_0 ≠ q_1); its redundancy is in the form of both memory and non-uniformity. When q_0 = 1 − q_1, the source reduces to a memoryless source with marginal distribution given by p_0 = q_0 and p_1 = q_1; in this case, the source redundancy is purely in the form of non-uniformity. When q_0 = q_1 ≠ 1/2, the source becomes a symmetric Markov chain with a uniform marginal distribution, i.e., p_0 = p_1 = 1/2; in this case, the source redundancy is strictly in the form of memory. When the source is symmetric, we write q_0 = q_1 ≜ q.
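As a quick numerical illustration of (1) (this snippet is ours, not the authors'; the parameter values are arbitrary), the stationary distribution can be checked by simulating the chain:

```python
import random

def stationary_p0(q0, q1):
    # Eq. (1): p0 = 1 - p1 = (1 - q1) / (2 - q0 - q1).
    return (1.0 - q1) / (2.0 - q0 - q1)

def markov_source(q0, q1, length, seed=0):
    # Binary first-order Markov chain with pi_00 = q0 and pi_11 = q1,
    # started from its stationary distribution.
    rng = random.Random(seed)
    u = 0 if rng.random() < stationary_p0(q0, q1) else 1
    out = []
    for _ in range(length):
        out.append(u)
        stay = q0 if u == 0 else q1   # Pr{U_k = U_{k-1}}
        u = u if rng.random() < stay else 1 - u
    return out

print(stationary_p0(0.9, 0.9))   # 0.5: a symmetric source has a uniform marginal
```

For a symmetric source (q_0 = q_1), all of the redundancy resides in the memory, which is precisely the case targeted by the decoder modifications of Section III.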

A Turbo code encoder consists of two systematic recursive convolutional encoders in parallel concatenation, linked by an interleaver which re-arranges the information sequence into a different order. In general, there can be more than two constituent encoders and they do not have to be identical. Each constituent encoder has memory size m; therefore, the number of possible encoder states is 2^m. If at time k the first constituent encoder is in state S_k, then an input bit U_k brings the encoder state into S_{k+1}, generating a parity bit X_k^{1p} (for the second constituent encoder, the parity bit is X_k^{2p}). To keep the notation consistent, we denote the systematic bit as X_k^s, which is identical to U_k. The pair (X_k^s, X_k^{1p}) is denoted by X_k, and after transmission over the channel, it becomes Y_k = (Y_k^s, Y_k^{1p}), the noise-corrupted version. We denote L-tuples by Y_1^L ≜ (Y_1, Y_2, …, Y_L), and we use capital letters for random variables and lower case letters for their realizations.

Corresponding to the two parallel concatenated constituent encoders, a Turbo code decoder has two constituent decoders in serial concatenation, separated by an interleaver which is identical to the one used in the encoder. In the feedback loop from the output of the second constituent decoder to the input of the first constituent decoder, a de-interleaver is used to permute the sequence back into the original order.

Our JSC coding system is very simple. As depicted in Fig. 1, a binary Markov source {U_k} is directly fed into the Turbo encoder without source encoding. The Turbo coded sequence {X_k} is then BPSK-modulated as {W_k} (under an average symbol energy of E_s = 1) before transmission over the channel. The channel model considered here can be either an AWGN channel or a Rayleigh fading channel described by

\[
Y_k = A_k W_k + N_k, \qquad k = 1, 2, \dots, L,
\]

where {N_k} is a memoryless or independent and identically distributed (i.i.d.) Gaussian noise source with zero mean and variance N_0/2. For the AWGN channel, A_k = 1 for all k, while for the Rayleigh fading channel, the amplitude fading process {A_k} (also known as the channel state information) is assumed to be i.i.d. and Rayleigh distributed. We assume that {A_k} is known at the decoder, and that A_k, W_k, and N_k are independent of each other. At the receiver end, Turbo decoding is applied on the received sequence {Y_k} to yield the decoded sequence {Û_k}.
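The channel model above can be sketched as follows (our illustrative code, not the authors' simulator; the Es/N0 parameterization and the unit-power Rayleigh generator are our assumptions):

```python
import math
import random

def transmit(bits, es_n0_db, fading=False, seed=0):
    """BPSK over AWGN or i.i.d. Rayleigh fading: Y_k = A_k * W_k + N_k.

    bits     : Turbo-coded symbols X_k in {0, 1}
    es_n0_db : Es/N0 in dB (Es = 1 by the normalization in the text)
    fading   : False -> A_k = 1 (AWGN); True -> A_k i.i.d. Rayleigh, E[A_k^2] = 1
    Returns (y, a): received samples and fading amplitudes (the CSI known at the decoder).
    """
    rng = random.Random(seed)
    n0 = 10.0 ** (-es_n0_db / 10.0)   # N0, since Es = 1
    sigma = math.sqrt(n0 / 2.0)       # N_k ~ N(0, N0/2)
    y, a = [], []
    for b in bits:
        w = 2 * b - 1                 # BPSK mapping: 0 -> -1, 1 -> +1
        ak = math.sqrt(-math.log(1.0 - rng.random())) if fading else 1.0
        y.append(ak * w + rng.gauss(0.0, sigma))
        a.append(ak)
    return y, a
```

The Rayleigh amplitude is drawn as the square root of a unit-mean exponential variate, so that E[A_k^2] = 1 and the average received symbol energy matches the AWGN case.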

III. TURBO DECODER DESIGN FOR MARKOV SOURCES

A. BCJR Algorithm for I.I.D. Sources

For the sake of clarity and completeness, we first briefly describe the BCJR algorithm for i.i.d. sources. There are several slightly different versions of the BCJR algorithm in the literature; the one we present here is due to Robertson [24].

To decode {U_k}, k = 1, …, L, based on the observation of the whole received sequence Y_1^L, each constituent decoder evaluates the conditional log-likelihood ratio (LLR) for every bit U_k, defined by

\[
\Lambda^{iid}(U_k) = \log \frac{\Pr\{U_k = 1 \mid Y_1^L = y_1^L\}}{\Pr\{U_k = 0 \mid Y_1^L = y_1^L\}}.
\]

For memoryless sources and memoryless channels, the LLR can be decomposed into three separate terms:

\[
\Lambda^{iid}(U_k) = L_{ch}(U_k) + L_{ex}^{iid}(U_k) + L_{ap}^{iid}(U_k), \qquad (2)
\]

which are called the channel transition term, the extrinsic information term, and the a priori term, respectively. Their expressions are given as follows:

\[
L_{ch}(U_k) = \log \frac{p\{Y_k^s = y_k^s \mid U_k = 1\}}{p\{Y_k^s = y_k^s \mid U_k = 0\}},
\]

\[
L_{ex}^{iid}(U_k) = \log \frac{\sum_{s}\sum_{s'} \gamma(y_k^p, 1, s \mid s')\,\alpha_{k-1}(s')\,\beta_k(s)}{\sum_{s}\sum_{s'} \gamma(y_k^p, 0, s \mid s')\,\alpha_{k-1}(s')\,\beta_k(s)},
\]

\[
L_{ap}^{iid}(U_k) = \log \frac{\Pr\{U_k = 1\}}{\Pr\{U_k = 0\}},
\]

where p(·) denotes a probability density function, and α_k(s), β_k(s) and γ(y_k, i, s|s') are defined by

\[
\alpha_k(s) \triangleq \Pr\{S_k = s \mid Y_1^k = y_1^k\},
\]

\[
\beta_k(s) \triangleq \frac{p\{Y_{k+1}^L = y_{k+1}^L \mid S_k = s\}}{p\{Y_{k+1}^L = y_{k+1}^L \mid Y_1^k = y_1^k\}},
\]

\[
\gamma(y_k, i, s \mid s') \triangleq p\{Y_k = y_k,\, U_k = i,\, S_k = s \mid S_{k-1} = s'\}.
\]

It is easy to show that α_k(s) and β_k(s) can be computed via the following recursive relations:

\[
\alpha_k(s) = \frac{\sum_{s'}\sum_{i=0}^{1} \gamma(y_k, i, s \mid s')\,\alpha_{k-1}(s')}{\sum_{s}\sum_{s'}\sum_{i=0}^{1} \gamma(y_k, i, s \mid s')\,\alpha_{k-1}(s')},
\]

\[
\beta_k(s) = \frac{\sum_{s'}\sum_{i=0}^{1} \gamma(y_{k+1}, i, s' \mid s)\,\beta_{k+1}(s')}{\sum_{s}\sum_{s'}\sum_{i=0}^{1} \gamma(y_{k+1}, i, s' \mid s)\,\alpha_k(s)}.
\]

To solve the above forward and backward recursions, properly defined boundary conditions are needed; they are determined by the initial and terminal states of the encoder. Conventionally, the encoder starts and terminates at the all-zero state. However, as we are using very long sequences in this work, we leave the encoder unterminated since there is no noticeable performance loss in comparison with the terminated case. Suppose the encoder has memory size m; then the final state can be any one of the 2^m possible states. Therefore, the boundary conditions are

\[
\alpha_0(0) = 1, \qquad \alpha_0(s) = 0 \quad \forall\, s \neq 0,
\]

\[
\beta_L(s) = \frac{1}{2^m}, \qquad \forall\, s = 0, 1, \dots, 2^m - 1.
\]
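For concreteness, the forward-backward recursions and boundary conditions above can be sketched generically as follows. This is our own minimal illustration, not the paper's decoder: γ is taken as a precomputed table, and both recursions are normalized by their sums (a standard numerically stable variant of the normalization in the definitions above).

```python
def forward_backward(gamma, n_states, L):
    """Normalized BCJR recursions.

    gamma[k][i][sp][s] plays the role of gamma(y_{k+1}, i, s | s') for k = 0..L-1.
    Boundary conditions: start in state 0; unterminated trellis, so
    beta_L(s) = 1 / n_states for every state s.
    """
    alpha = [[0.0] * n_states for _ in range(L + 1)]
    beta = [[0.0] * n_states for _ in range(L + 1)]
    alpha[0][0] = 1.0                        # alpha_0(0) = 1, alpha_0(s) = 0 otherwise
    for k in range(1, L + 1):
        for s in range(n_states):
            alpha[k][s] = sum(gamma[k - 1][i][sp][s] * alpha[k - 1][sp]
                              for i in (0, 1) for sp in range(n_states))
        z = sum(alpha[k])                    # normalize so the alphas sum to 1
        alpha[k] = [x / z for x in alpha[k]]
    beta[L] = [1.0 / n_states] * n_states    # unterminated encoder
    for k in range(L - 1, -1, -1):
        for s in range(n_states):
            beta[k][s] = sum(gamma[k][i][s][sn] * beta[k + 1][sn]
                             for i in (0, 1) for sn in range(n_states))
        z = sum(beta[k])
        beta[k] = [x / z for x in beta[k]]
    return alpha, beta
```

Because each α_k and β_k is rescaled to sum to one, the LLR ratios computed from them are unchanged while underflow over long blocks is avoided.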

In iterative decoding, only L_{ex}^{iid}(U_k) is passed on to the next constituent decoder. After being re-arranged into the appropriate order by the interleaver (or the de-interleaver), the extrinsic information generated by one constituent decoder becomes the a priori information for the other constituent decoder. The reliability of the LLR soft decision improves over the iterations, and the iterative decoding procedure may terminate when the improvement becomes negligible, or the number of iterations exceeds a pre-determined threshold.

B. First Constituent Decoder

To exploit the source memory, the BCJR algorithm employed in the Turbo code decoder has to be modified. The modified BCJR algorithm in this section is derived for systematic Turbo codes; however, the modifications can be extended to non-systematic Turbo codes [36].

Given that the L-tuple Y_1^L = y_1^L is received at the channel output, we have the derivation of Pr{U_k = i | Y_1^L = y_1^L} displayed below, where the summation for s is over all 2^m possible states. Noting that, conditioned on U_k = i and S_k = s, the observation y_{k+1}^L does not depend on y_1^k, we have that

\[
p(y_{k+1}^L \mid i, s, y_1^k) = p(y_{k+1}^L \mid i, s).
\]

Now if we define

\[
\alpha_k(i, s) \triangleq \Pr\{U_k = i,\, S_k = s \mid Y_1^k = y_1^k\}, \qquad (3)
\]

\[
\beta_k(i, s) \triangleq \frac{p(y_{k+1}^L \mid i, s)}{p(y_{k+1}^L \mid y_1^k)}, \qquad (4)
\]

then the conditional probability becomes

\[
\Pr\{U_k = i \mid Y_1^L = y_1^L\} = \sum_{s} \alpha_k(i, s)\,\beta_k(i, s). \qquad (5)
\]

Note that now α_k(·, ·) and β_k(·, ·) are functions of both the encoder state s and the input bit i. The exploitation of bitwise dependency is evident in the following recursions:

\[
\alpha_k(i, s) = \frac{\sum_{i',s'} \gamma(i, s, y_k \mid i', s')\,\alpha_{k-1}(i', s')}{\sum_{i,s}\sum_{i',s'} \gamma(i, s, y_k \mid i', s')\,\alpha_{k-1}(i', s')}, \qquad (6)
\]

\[
\beta_k(i, s) = \frac{\sum_{i',s'} \gamma(i', s', y_{k+1} \mid i, s)\,\beta_{k+1}(i', s')}{\sum_{i,s}\sum_{i',s'} \gamma(i', s', y_{k+1} \mid i, s)\,\alpha_k(i, s)}, \qquad (7)
\]

where γ is defined in (8), displayed below.

Combining the marginal distribution of the Markov source as given in (1), the boundary conditions for computing α_k(·, ·) and β_k(·, ·) are now

\[
\alpha_0(0, 0) = p_0, \qquad \alpha_0(1, 0) = p_1, \qquad (9)
\]

\[
\alpha_0(i, s) = 0, \quad i = 0, 1, \quad \forall\, s \neq 0, \qquad (10)
\]

\[
\beta_L(0, s) = \frac{p_0}{2^m}, \qquad \beta_L(1, s) = \frac{p_1}{2^m}, \qquad \forall\, s. \qquad (11)
\]

Note that in the above boundary conditions, the source redundancy in the form of non-uniformity can be exploited when p_0 ≠ 1/2.

For iterative decoding, the LLR needs to be decomposed. Using (5)–(8), we can decompose the decoder's LLR of U_k into two separate terms:

\[
\Lambda_k^{(1)}(U_k) \triangleq \log \frac{\Pr\{U_k = 1 \mid Y_1^L = y_1^L\}}{\Pr\{U_k = 0 \mid Y_1^L = y_1^L\}} = L_{ch}(U_k) + L_{ex}^{(1)}(U_k), \qquad (12)
\]

where

\[
L_{ch}(U_k) = \log \frac{p(y_k^s \mid 1)}{p(y_k^s \mid 0)} \qquad (13)
\]

and L_{ex}^{(1)}(U_k) is given in (14), displayed below. If we compare with the decomposition of the LLR in the case of Turbo coding of uniform i.i.d. sources in (2), where

\[
\Lambda_k^{iid}(U_k) = L_{ch}(U_k) + L_{ex}^{iid}(U_k) + L_{ap}^{iid}(U_k),
\]

we observe that in (12) the extrinsic information term L_{ex}^{(1)}(U_k) is in essence the combination of L_{ex}^{iid}(U_k) and L_{ap}^{iid}(U_k), and L_{ap}^{iid}(U_k) is actually the extrinsic information generated from the second constituent decoder. However, an important principle of iterative decoding is that the estimate generated by a constituent decoder should not be fed back to itself; otherwise, the noise corruption will be highly correlated [7]. Thus, the decomposition in (12) renders the first constituent decoder unable to "update" its a priori term by using the extrinsic information generated from the second constituent decoder in the same way as in the memoryless source case. Therefore, the extrinsic information term for the first constituent decoder must be further modified.

Soon after the debut of Turbo codes in 1993, Robertson [24] improved Berrou's version of the BCJR algorithm in several aspects. His algorithm (which is described in Section III-A) is less complex and uses a different method to exchange the extrinsic information between the two constituent decoders. However, we observe that our decomposition in (5) is more similar to Berrou's version than Robertson's, and that Berrou's method with the Gaussian assumption actually provides a suitable solution for the above decomposition problem.

By using Berrou's method, the input to the first decoder now has three components:

\[
y_k = \big(y_k^s,\; y_k^{1p},\; y_k^{ex(2)}\big),
\]

where y_k^{ex(2)} is the extrinsic information term from the second constituent decoder, L_{ex}^{(2)}(U_k), after de-interleaving. Using the Gaussian assumption on y_k^{ex(2)} as in [7], [12], [13], we obtain that in (8), γ(i, s, y_k | i', s') has an additional factor in its expression; the factor is described by the following density:

\[
p\big(y_k^{ex(2)} \mid i\big) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\left[y_k^{ex(2)} - (2i-1)M\right]^2}{2\sigma^2}}, \qquad i = 0, 1,
\]

\[
\begin{aligned}
\Pr\{U_k = i \mid Y_1^L = y_1^L\}
&= \sum_{s} \Pr\{U_k = i,\, S_k = s \mid Y_1^L = y_1^L\} \\
&= \sum_{s} \frac{p\{U_k = i,\, S_k = s,\, Y_1^k = y_1^k,\, Y_{k+1}^L = y_{k+1}^L\}}{p(y_1^k,\, y_{k+1}^L)} \\
&= \sum_{s} \frac{p\{U_k = i,\, S_k = s,\, Y_1^k = y_1^k\} \cdot p(y_{k+1}^L \mid i, s, y_1^k)}{p(y_1^k) \cdot p(y_{k+1}^L \mid y_1^k)}
\end{aligned}
\]

\[
\begin{aligned}
\gamma(i, s, y_k \mid i', s')
&\triangleq p\{U_k = i,\, S_k = s,\, Y_k = y_k \mid U_{k-1} = i',\, S_{k-1} = s'\} \\
&= p(y_k^s \mid i)\, p(y_k^p \mid i, s)\, \Pr\{S_k = s \mid U_k = i,\, S_{k-1} = s'\}\, \Pr\{U_k = i \mid U_{k-1} = i'\} \\
&\triangleq p(y_k^s \mid i) \cdot p(y_k^p \mid i, s) \cdot q(s \mid i, s') \cdot \pi_{i'i} \qquad (8)
\end{aligned}
\]

\[
L_{ex}^{(1)}(U_k) = \log \frac{\sum_{s}\sum_{i',s'} p(y_k^p \mid 1, s)\, q(s \mid 1, s')\, \alpha_{k-1}(i', s')\, \beta_k(1, s)\, \pi_{i'1}}{\sum_{s}\sum_{i',s'} p(y_k^p \mid 0, s)\, q(s \mid 0, s')\, \alpha_{k-1}(i', s')\, \beta_k(0, s)\, \pi_{i'0}} \qquad (14)
\]

where M and σ² are the estimated mean and variance obtained via an on-line estimation as follows:

\[
M = \frac{1}{L} \sum_{i=1}^{L} |y_i|, \qquad
\sigma^2 = \frac{1}{L-1} \sum_{i=1}^{L} \big(|y_i| - M\big)^2.
\]

Thus, in the forward-backward recursion, both α_k(i, s) and β_k(i, s) are computed according to this modification.
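The on-line estimation, and the a priori term it feeds in the next paragraph, can be sketched as follows (our illustrative helper functions, not the authors' code):

```python
def online_stats(ex):
    # Block estimates of the mean M and variance sigma^2 of |y_k^ex(2)|.
    L = len(ex)
    m = sum(abs(y) for y in ex) / L
    var = sum((abs(y) - m) ** 2 for y in ex) / (L - 1)
    return m, var

def a_priori_llr(y_ex, m, var):
    # log p(y|1)/p(y|0) = (2M / sigma^2) * y under the +/-M Gaussian model.
    return 2.0 * m / var * y_ex
```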

Finally, in the decomposition of Λ^{(1)}(U_k), due to interleaving, y_k^{ex(2)} can be regarded as weakly correlated with y_k^s and y_k^{1p}; therefore, the LLR soft-output generated by the first decoder is:

\[
\Lambda_k^{(1)} = L_{ch}(U_k) + L_{ex}^{(1)}(U_k) + L_{ap}^{(1)}(U_k),
\]

where the a priori term becomes

\[
L_{ap}^{(1)}(U_k) \triangleq \log \frac{p(y_k^{ex(2)} \mid 1)}{p(y_k^{ex(2)} \mid 0)} = \frac{2M}{\sigma^2}\, y_k^{ex(2)}.
\]

At this point, we can use the extrinsic information L_{ex}^{(1)}(U_k) as the a priori term for the second constituent decoder.

In [14], Fazel et al. consider designing a Turbo code for the 2400 bps MELP speech coder's output modeled by non-binary Markov chains. However, in [14], both α_k(·) and β_k(·) have the same forms as for memoryless sources (see Section III-A), and only γ is changed to take account of the source Markovian property. Therefore, the source memory is not fully exploited by such modifications. Our approach is in the same spirit as in [15], [16], where Garcia-Frias et al. consider the issue of using Turbo codes for hidden Markov sources (although in [15], [16], the modification of the extrinsic information in (14) is not explicitly addressed, to the best of our knowledge).

C. Second Constituent Decoder

As mentioned in [14], [15], due to interleaving, the Markovian property in the input sequence of the second constituent decoder is destroyed. This renders the second constituent decoder unable to adopt the same modifications as in Section III-B. Therefore, for this decoder, we employ Robertson's BCJR algorithm for i.i.d. sources presented in Section III-A, with the exception that we alter the extrinsic information, as described below.

In [11], Cai et al. proposed a different modification in an attempt to exploit the source memory. In their scheme, the original BCJR algorithm is employed in both decoders; however, the extrinsic information probability term Pex is updated by adopting a modified probability according to:

$$P'_{ex}(u_k) = \frac{1}{2}\left[P_{ex}(u_k) + \sum_{u_{k-1}} \pi_{u_{k-1} u_k}\, P_{ex}(u_{k-1})\right], \qquad u_k = 0, 1.$$

Although this method is simple to implement, it suffers from the problem that the above modified probabilities may not sum to 1 due to round-off errors, which may cause error propagation in the decoder. Furthermore, the reliability of the correction term is unknown; placing equal weights on the original probability and the correction term is therefore questionable. Thus, we propose the following modification to the extrinsic information generated by the second constituent decoder:

$$L_{ex}^{(2)}(U_k) = c_1 L_{ex}^{iid}(U_k) + c_2 \log\left[\frac{\sum_i \pi_{i1} \Pr\{U_{k-1} = i\}}{\sum_i \pi_{i0} \Pr\{U_{k-1} = i\}}\right], \qquad (15)$$

where {U_k} refers to the original uninterleaved source sequence, L_ex^{iid}(·) is the extrinsic information defined in Section III-A, and c_1, c_2 ∈ [0, 1] with c_1 + c_2 = 1. The values of c_1 and c_2 are empirically chosen to yield the best possible improvement. Pr{U_{k-1} = i} (with i = 0, 1) is directly obtained from the extrinsic information via

$$\Pr\{U_{k-1} = 0\} = 1 - \Pr\{U_{k-1} = 1\} = \frac{1}{1 + e^{L_{ex}(U_{k-1})}},$$

for k ≥ 2, with

$$\Pr\{U_1 = 0\} = p_0 = 1 - \Pr\{U_1 = 1\}.$$
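The correction in (15) for a single bit can be sketched compactly (hypothetical helper name; pi[i][j] denotes π_{ij} = Pr{U_k = j | U_{k-1} = i}, and the previous-bit probabilities are recovered from the previous extrinsic LLR as above):

```python
from math import exp, log

def corrected_extrinsic(L_iid, L_ex_prev, pi, c1=0.8, c2=0.2):
    """Second-decoder extrinsic LLR of (15): weight c1 on the standard
    i.i.d. BCJR extrinsic, weight c2 on a Markov-prior correction term."""
    # Pr{U_{k-1} = 0} = 1 / (1 + e^{L_ex(U_{k-1})}), as in the text
    p0 = 1.0 / (1.0 + exp(L_ex_prev))
    p = (p0, 1.0 - p0)
    corr = log(sum(pi[i][1] * p[i] for i in (0, 1)) /
               sum(pi[i][0] * p[i] for i in (0, 1)))
    return c1 * L_iid + c2 * corr

pi = [[0.9, 0.1], [0.1, 0.9]]   # symmetric Markov source with q = 0.9
# With an uninformative previous bit (LLR 0), the correction term vanishes
# and only c1 * L_iid survives:
print(corrected_extrinsic(2.0, 0.0, pi))
```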


IV. ENCODER STRUCTURE OPTIMIZATION

When the Markov source distribution is biased, the input sequence to the encoder contains long segments of 1s or 0s. To see how the encoder structure may affect the performance, we first consider sources whose transition probability distribution is extremely biased; for example, a "1" followed by a very long segment of 0s. Suppose we are using Berrou's (37, 21) encoder [8], which starts at the all-zero state 0000, where each digit represents the content of one shift register. The encoder remains in this state until the first "1" arrives, which causes a transition to state 1000. The subsequent state transitions are 1100 → 0110 → 0011 → 0001 → 1000, and from then on the state transitions are confined to a short cycle consisting of the above 5 states, while all the other states remain dormant. Similarly, a long segment of 1s will also drive the encoder to circulate among five states. This clearly indicates that the encoder memory is not fully exploited; in effect, the encoder behaves like one with a smaller memory size. For such extremely biased sources, it is easy to see that an encoder with feedback 31 can excite all 16 states, and is thus intuitively a better choice than one with feedback 37. As the source becomes less biased, the advantage of adopting an encoder with longer cycles becomes less obvious. For non-uniform i.i.d. sources, we showed in [36] that certain feed-forward polynomials are more desirable than others for a chosen feedback polynomial; a corresponding proof for the Markov source case, however, is not obvious. Nevertheless, it is evident that encoder optimization plays an important role in determining the system performance. Since we employ different decoding methods in the constituent decoders, they exploit the source redundancy to different degrees. It is thus judicious to allow different encoding structures in our search for the best constituent encoders.
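The state-cycle behaviour described above is easy to verify by simulating just the feedback part of the shift register. In the sketch below (our function name), feedback 37 (octal) = 11111 and feedback 31 (octal) = 11001 are each driven by a lone 1 followed by a long run of 0s:

```python
def states_excited(feedback_taps, n_bits=40):
    """Feed a single 1 followed by 0s into a memory-4 RSC feedback
    register and return the set of states visited.

    feedback_taps: (f1, f2, f3, f4); f0 = 1 is implicit."""
    state = (0, 0, 0, 0)            # all-zero start state
    visited = {state}
    for u in [1] + [0] * (n_bits - 1):
        fb = u ^ (sum(f * s for f, s in zip(feedback_taps, state)) % 2)
        state = (fb,) + state[:3]   # shift the feedback bit in
        visited.add(state)
    return visited

# Feedback 37 (binary 11111): the lone 1 traps the register in the
# 5-state cycle 1000 -> 1100 -> 0110 -> 0011 -> 0001 -> 1000.
print(len(states_excited((1, 1, 1, 1))))   # 6 (the cycle plus state 0000)
# Feedback 31 (binary 11001, a primitive polynomial): all 16 states are excited.
print(len(states_excited((1, 0, 0, 1))))   # 16
```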

Since the original Turbo codes employ 16-state recursive systematic convolutional (RSC) constituent encoders, we focus on the design of 16-state encoders (m = 4). We denote the coefficients of the feedback and feed-forward polynomials of a 16-state RSC encoder in binary form as {f0, f1, f2, f3, f4} and {g0, g1, g2, g3, g4}, respectively, where fi, gj ∈ {0, 1}, i, j = 0, 1, ..., 4. An exhaustive search for the best encoding coefficients is, however, computationally expensive. The search is indeed more taxing than in the case of symmetric (i.e., identical) encoders [35], since we allow different (or asymmetric) constituent encoders. We therefore search for sub-optimal encoder structures. Our search, with the restriction that both constituent encoders have f0 = f4 = g0 = 1, is performed as follows.

1) Fix the second constituent encoder as, for example, (31, 23), and find (by simulation) the best feed-forward and feedback polynomials of the first constituent encoder via the iterative steps described in [35].

2) Fix the best structure for the first constituent encoder as found in step 1), and find the best feed-forward and feedback polynomials of the second constituent encoder.

The initial structure of the second constituent encoder in step 1) is selected according to our results obtained in [35]. For example, for a general asymmetric Markov source, if the marginal distribution satisfies p0 ≥ 0.9, then the second constituent encoder is initially chosen as (31, 23); if 0.6 ≤ p0 ≤ 0.8, we may choose it as (35, 23). If the source is symmetric, since there is no redundancy in the form of non-uniformity that can be exploited by the second constituent decoder, we may initially choose the second constituent encoder in step 1) as Berrou's (37, 21) code, which offers very good performance for uniform i.i.d. sources. In our search, we used a sequence length of L = 65536 bits.²

V. SHANNON LIMIT

Shannon's Lossy Information Transmission Theorem [23], [27] states that for a given memoryless source and a given memoryless channel with capacity C, the source can be transmitted, for sufficiently large source blocklengths, via a source-channel code over the channel at a transmission rate of r source symbols/channel symbol and reproduced at the receiver end within an end-to-end distortion D if the following holds:

r ·R(D) < C, (16)

where R(D) is the source rate-distortion function. As shown in [27], the above result can be achieved by concatenating (in tandem) separately designed source and channel codes at an overall rate of r = Rc/Rs, where Rc and Rs are the channel and source coding rates, respectively. The above theorem is also applicable when the source and the channel additive noise are stationary ergodic [6]. Under the Hamming distortion measure, the distortion D equals Pe, the BER. Note that the capacity C of the considered BPSK-modulated channels with average symbol energy Es is a function of Eb/N0, where Eb = Es/r is the average energy per source bit. Thus, solving (16) with equality yields the optimum value of Eb/N0 that guarantees a BER of Pe; this value is called the Shannon limit.

In general, for asymmetric Markov sources, there are no closed-form expressions for R(D). In this case, Blahut's algorithm [9] can be employed to compute the normalized nth-order source rate-distortion function Rn(D), which is an upper bound to R(D). It can also be used to compute the Wyner–Ziv lower bound [33] to R(D), which was later tightened by Berger [6]. In other words, the following two bounds on R(D) can be calculated:

$$R_n(D) \geq R(D) \geq R_n(D) - \left[\frac{1}{n} H(U^n) - H(U^\infty)\right], \qquad (17)$$

where H(U^n) = H(U_1, U_2, ..., U_n) is the source block entropy and H(U^∞) is the source entropy rate. Remark that the above bounds hold for general stationary sources with memory. For a stationary Markov source, (17) reduces to

$$R_n(D) \geq R(D) \geq R_n(D) - \frac{1}{n}\left[H(U_1) - H(U_2 \mid U_1)\right], \qquad (18)$$

where H(U_1) is the source entropy and H(U_2|U_1) is the conditional entropy of U_2 given U_1. Note that the above two bounds are asymptotically tight as n → ∞; however, the complexity of computing Rn(D) grows exponentially in n. Therefore, for a desired BER level and a given overall rate r in source symbols/channel symbol, we may substitute R(D) with its upper or lower bound in (16), and hence obtain an upper or lower bound on the corresponding Shannon limit.

²We observed that when one encoder performs better than another for a given sequence length, this behaviour also holds for longer sequence lengths. We used a shorter sequence length than in the simulations section to speed up the optimization procedure.

For the special case of binary symmetric Markov sources with transition probability

$$\Pr\{U_{k+1} = j \mid U_k = i\} = q, \quad \text{for } i \neq j, \qquad (19)$$

where it is assumed that q > 1/2 without loss of generality, Gray proved that [18]

$$R(D) = h_b(q) - h_b(D), \quad \text{if } 0 \leq D \leq D_c, \qquad (20)$$

where h_b(·) is the binary entropy function and D_c is the critical distortion defined by

$$D_c = \frac{1}{2}\left(1 - \sqrt{1 - \frac{(1-q)^2}{q^2}}\right). \qquad (21)$$

Therefore, when the desired distortion D satisfies D ≤ D_c, R(D) admits a closed-form expression identical to that of a non-uniform i.i.d. source with Pr{U_k = 0} = q. Under the Hamming distortion measure, we obtain that, for the values of q that we considered, Gray's critical distortion in (21) is indeed greater than our target BER (which is 10⁻⁵); hence, the Shannon limit can be determined exactly using (16) and (20).
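Combining (16), (20) and (21), the exact Shannon limit for a symmetric Markov source reduces to a one-dimensional root-finding problem: evaluate the binary-input AWGN capacity by numerical integration over the noise, then bisect on Eb/N0 until C(r · Eb/N0) = r · R(D). A sketch under these assumptions (our function names; deterministic trapezoidal integration):

```python
import math

def hb(p):
    """Binary entropy function (bits)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def critical_distortion(q):
    """Gray's critical distortion Dc of (21), with q > 1/2."""
    return 0.5 * (1 - math.sqrt(1 - (1 - q) ** 2 / q ** 2))

def bpsk_awgn_capacity(es_n0):
    """Capacity (bits/channel use) of the binary-input AWGN channel,
    C = 1 - E_Z[log2(1 + e^{-LLR})], via the trapezoid rule over Z."""
    n, lo, hi = 4000, -8.0, 8.0
    h = (hi - lo) / n
    acc = 0.0
    for k in range(n + 1):
        z = lo + k * h
        w = h * (0.5 if k in (0, n) else 1.0)
        phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        llr = 4 * es_n0 + 2 * math.sqrt(2 * es_n0) * z   # LLR when +1 is sent
        acc += w * phi * math.log1p(math.exp(-llr)) / math.log(2)
    return 1.0 - acc

def shannon_limit_db(r, rd, lo=0.05, hi=2.0):
    """Bisect C(r * Eb/N0) = r * R(D) for Eb/N0 (since Es = r * Eb), in dB."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if bpsk_awgn_capacity(r * mid) < r * rd:
            lo = mid
        else:
            hi = mid
    return 10 * math.log10(0.5 * (lo + hi))

q, target_ber = 0.7, 1e-5
assert critical_distortion(q) > target_ber        # (20) is valid at the target BER
rd = hb(q) - hb(target_ber)                       # R(D) from (20)
print(round(shannon_limit_db(1.0 / 3.0, rd), 2))  # close to the -1.19 dB of Table I
```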

VI. NUMERICAL RESULTS AND DISCUSSION

We present simulation results of our JSC systematic Turbo codes for the transmission of binary Markov sources over BPSK-modulated AWGN and Rayleigh fading channels. All simulated Turbo codes have 16-state constituent encoders and use the same pseudo-random interleaver introduced in [7]. To keep a tolerable computational complexity, the sequence length is limited to L = 512 × 512 = 262144 and at least 200 blocks are simulated; this guarantees a reliable BER estimate at the 10⁻⁵ level with 524 errors.

In our modification of the extrinsic information generated by the second constituent decoder, the weights c1 and c2 in (15) between the original extrinsic information term and the correction term are selected by simulation. Given that c1 + c2 = 1, we tried the following values for c1: 0, 0.1, 0.2, ..., 0.9, 1.0. The resulting best choice is c1 = 0.8 and c2 = 0.2, and this choice is robust over the (q0, q1) pairs that we have tested.

A. Symmetric Markov Sources

We first present the results for symmetric Markov sources. The number of iterations in the Turbo decoder is 20. Simulations are performed for rate Rc = 1/3 with q = 0.7, 0.8 and 0.9. The best encoder we found for q = 0.9 has (31, 23) as the first constituent encoder and (35, 23) as the second. For q = 0.8 and 0.7, assigning (35, 23) to both constituent encoders turns out to be the best choice.

Fig. 2 shows the performance of our systematic Turbo codes designed for transmitting binary symmetric Markov sources over AWGN channels. The Shannon limit curves, computed via (20) and (16), are also plotted for different target BERs. To illustrate the gains achieved by exploiting the source redundancy (in both encoder and decoder), we include the performance curve of the standard Berrou (37, 21) Turbo code, which ignores the source statistics. The decoder of the Berrou code hence assumes that the source is uniform i.i.d. and employs Robertson's decoding algorithm of Section III-A. In Fig. 2, we refer to such a system using the label "w/o JSC Dec.", which stands for "without joint source-channel decoding." We clearly observe from the plots that at a BER level of 10⁻⁵, when q = 0.7, our system offers a gain of 0.45 dB over the "w/o JSC Dec." scheme, yielding a gap of 0.73 dB from the Shannon limit; when q = 0.8 and 0.9, the gains become 1.29 dB and 3.03 dB, respectively, which are 0.94 dB and 1.36 dB away from the Shannon limit.

Fig. 2. Turbo code performance and Shannon limit for binary symmetric Markov sources, Rc = 1/3, L = 262144, AWGN channel.

Fig. 3. Turbo code performance and Shannon limit for binary symmetric Markov sources, Rc = 1/3, L = 262144, Rayleigh fading channel with known channel state information.

Fig. 3 illustrates similar results for BPSK-modulated Rayleigh fading channels with known channel state information. When q = 0.7, the gain over the "w/o JSC Dec." scheme due to exploiting the source memory in our design is 0.52 dB, which is 0.87 dB away from the Shannon limit. When q = 0.8, the gain increases to 1.55 dB, bringing the performance to within 1.08 dB of the Shannon limit. For q = 0.9, the gain reaches 3.57 dB, resulting in a gap of 1.45 dB to the Shannon limit. The exact Shannon limit values and the performance gaps vis-à-vis the Shannon limit for a BER target


TABLE I
SHANNON LIMIT (SL) VALUES AND GAPS IN Eb/N0 AT THE BER = 10⁻⁵ LEVEL (IN DB), TURBO CODES FOR BINARY SYMMETRIC MARKOV SOURCES, Rc = 1/3, L = 262144, AWGN AND RAYLEIGH FADING CHANNELS. THE EXACT SHANNON LIMIT VALUE IS COMPUTED USING (20) AND (16).

Source        Turbo                 AWGN                 Rayleigh
Distribution  Codes                 SL value   SL gap    SL value   SL gap
q = 0.7       (35,23), (35,23)      -1.19      0.73      -0.34      0.87
q = 0.8       (35,23), (35,23)      -2.24      0.94      -1.56      1.08
q = 0.9       (31,23), (35,23)      -4.40      1.36      -3.96      1.45

TABLE II
SHANNON LIMIT LOWER BOUND (SL-LB), UPPER BOUND (SL-UB), AVERAGE VALUE (SL-AVG) AND GAP IN Eb/N0 AT THE BER = 10⁻⁵ LEVEL (IN DB) OF TURBO CODES FOR BINARY ASYMMETRIC MARKOV SOURCES OVER AWGN CHANNELS WITH Rc = 1/3, L = 262144 AND 30 DECODING ITERATIONS. THE SHANNON LIMIT LB AND UB VALUES ARE OBTAINED FROM (18) AND (16) BY CALCULATING THE SOURCE nTH-ORDER RATE-DISTORTION FUNCTION Rn(D) VIA BLAHUT'S ALGORITHM USING BLOCKLENGTH n = 12 FOR ALL SOURCES (EXCEPT THE FIRST AND FOURTH SOURCES, FOR WHICH n = 10 IS USED).

Source Distribution                    Turbo Codes         SL-LB    SL-UB    SL-avg    SL gap
q0 = 0.7,  q1 = 0.65, p0 = 0.538       (37,21), (35,23)    -1.04    -0.99    -1.015    0.64
                                       (35,23), (35,23)    -1.04    -0.99    -1.015    0.69
q0 = 0.8,  q1 = 0.7,  p0 = 0.6         (35,23), (35,23)    -1.80    -1.70    -1.75     0.83
q0 = 0.9,  q1 = 0.8,  p0 = 0.667       (35,23), (35,23)    -3.59    -3.32    -3.455    1.14
q0 = 0.7,  q1 = 0.25, p0 = 0.714       (37,21), (35,23)    -1.31    -1.31    -1.31     0.89
                                       (35,23), (35,23)    -1.31    -1.31    -1.31     0.92
q0 = 0.9,  q1 = 0.7,  p0 = 0.75        (35,23), (35,23)    -3.42    -3.25    -3.335    1.31
                                       (31,23), (35,23)    -3.42    -3.25    -3.335    1.36
q0 = 0.95, q1 = 0.8,  p0 = 0.8         (31,23), (35,23)    -5.49    -5.13    -5.31     1.67

Fig. 4. Comparison with Berrou's (37, 21) code for binary symmetric Markov sources, Rc = 1/3, L = 262144, AWGN channel.

of 10⁻⁵ are summarized in Table I.

To demonstrate the gains achieved by encoder optimization, we compare our system with Berrou's (37, 21) code, which also exploits the source statistics at the decoder in exactly the same way. The results are shown in Fig. 4 for AWGN channels (a similar behavior is also observed for Rayleigh fading channels). We remark that as the source transition probability increases, the gain due to encoder optimization becomes more significant, yielding substantially lower error floors. For example, at the 10⁻⁵ BER level, the gain is around 0.65 dB when q = 0.9. We have also separately obtained the performance of the systems studied thus far for shorter sequence lengths (L = 32 × 32 = 1024 and L = 128 × 128 = 16384), since in some practical situations, delay may be

Fig. 5. Turbo code performance for binary asymmetric Markov sources, Rc = 1/3, L = 262144, AWGN channel.

limited. Our results indicate a similar performance advantage of our JSC coding system even for relatively small values of L (e.g., L = 1024).

B. Asymmetric Markov Sources

We next present simulation results for asymmetric Markov sources. The sequence length is again L = 262144 and the rate is Rc = 1/3. The number of decoding iterations is 30. By varying the values of (q0, q1), we generate a series of Markov sources with different amounts of redundancy in the form of memory and non-uniformity; in total, six asymmetric sources are considered. The results are summarized in Table II.
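The marginal probabilities p0 listed in Table II are simply the stationary probabilities of the chain, and the bracketed term of (18) is computable in closed form. A quick check (our function names; here q0 = Pr{U_k = 0 | U_{k-1} = 0} and q1 = Pr{U_k = 1 | U_{k-1} = 1}):

```python
import math

def hb(p):
    """Binary entropy function (bits)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def markov_stats(q0, q1):
    """Stationary Pr{U = 0}, the entropy H(U1), and the conditional
    entropy (= entropy rate) H(U2 | U1) of a binary Markov chain."""
    p0 = (1 - q1) / ((1 - q0) + (1 - q1))    # balance: p0 (1 - q0) = p1 (1 - q1)
    h1 = hb(p0)                              # H(U1)
    h_rate = p0 * hb(q0) + (1 - p0) * hb(q1) # H(U2 | U1)
    return p0, h1, h_rate

# Reproduces the p0 column of Table II, e.g. (q0, q1) = (0.9, 0.8) -> p0 = 2/3:
for q0, q1 in [(0.7, 0.65), (0.8, 0.7), (0.9, 0.8), (0.7, 0.25), (0.95, 0.8)]:
    p0, h1, h_rate = markov_stats(q0, q1)
    print(round(p0, 3), round(h1 - h_rate, 3))   # p0 and the bracket term of (18)
```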


Fig. 6. Turbo code performance for binary asymmetric Markov sources, Rc = 1/3, L = 262144, AWGN channel.

Fig. 7. Turbo code performance due to optimizing the encoder and exploiting the source memory/non-uniformity to various degrees, for the binary asymmetric Markov source with transition distribution given by q0 = 0.95 and q1 = 0.8 (p0 = 0.8), Rc = 1/3, L = 16384, AWGN channel.

In Figs. 5 and 6, performance curves of our optimized Turbo codes are provided for four sources. We also illustrate the benefits of our encoder optimization by plotting the performance of the Berrou code under identical decoding. We clearly note that our codes offer considerably lower error floors than the Berrou code.

The performance of our codes vis-à-vis the Shannon limit is also assessed in Table II for all six asymmetric sources. Since the Shannon limit cannot be exactly computed for such sources, we provide lower and upper bounds on the Shannon limit using (18) and (16). We then use the average of the lower and upper bounds as a benchmark to compute the performance gap to the Shannon limit. As shown in Table II, the gap varies from 0.64 dB to 1.67 dB. More conservatively, if we instead use the Shannon limit lower bound as a benchmark, then the performance gap varies from 0.665 dB to 1.85 dB.

In Fig. 7, we present the effects of optimizing the encoder and exploiting the source memory and non-uniformity to various degrees at the decoder on the Turbo code performance, for the transmission of the binary asymmetric Markov source with q0 = 0.95 and q1 = 0.8 over AWGN channels. A sequence length of L = 16384 bits is used with at least 3200 simulated blocks; 30 iterations are used at the decoder. Note that the Markov source has substantial asymmetry, as its marginal distribution is p0 = 0.8. As shown in Table II, the best encoding structure for this source is (31,23) for the first constituent encoder and (35,23) for the second constituent encoder. The performance of five systems is plotted in Fig. 7:

(i) Our JSC system using the best (31,23), (35,23) encoder with full JSC decoding as described in Sections III-B and III-C.

(ii) Our JSC system using the (31,23), (35,23) encoder and JSC decoding as in Sections III-B and III-C, with the exception that it assumes that the source marginal distribution is uniform in the boundary conditions (9) and (11) for α(·, ·) and β(·, ·) of the first constituent decoder. This decoder hence fully exploits the source memory, and it exploits the source asymmetry partially as it depends on the received systematic stream {Y_k^s}.

(iii) An unoptimized encoder system using the Berrou (37,21) code with full JSC decoding.

(iv) A system using the (31,23), (35,23) encoder and a decoder that only exploits the source asymmetry. The decoder assumes that the source is non-uniform i.i.d. and is designed as in [35].

(v) A system using the (31,23), (35,23) encoder without JSC decoding. In other words, the decoder assumes that the source is uniform i.i.d. and employs Robertson's method [24].

We note from the figure that our JSC system (i) offers the best performance. Its gain over system (ii), though, is minor, as the only difference between the two is that system (ii) does not exploit the source non-uniformity in (9) and (11). The gain of our system over system (iii) due to encoder optimization is significant: for a BER of 10⁻⁵, the performance gain is more than 1 dB. Furthermore, our system outperforms system (iv) by more than 3.5 dB at the same BER level. This large gain indicates that, for this source, exploiting the source memory is considerably more important than exploiting the source asymmetry. Finally, an additional 0.7 dB is gained (for a total gain of more than 4.2 dB) over system (v), which exploits no source redundancy at the decoder.

In spite of its good performance, our JSC system is sub-optimal for both symmetric and asymmetric Markov sources, particularly for q0, q1 ≥ 0.9. As remarked earlier, the interleaving prevents the second decoder from systematically exploiting the source memory. By modifying the extrinsic information as shown in (15), an improvement is achieved; but clearly this method does not fully exploit the source dependencies. Furthermore, for asymmetric Markov sources, exploiting the source non-uniformity via non-systematic Turbo codes [36] may yield further improvements, as it would reduce the mismatch (due to the systematic structure of the Turbo encoder) between the biased distribution of the systematic bitstream at the encoder output and the uniform input distribution needed to achieve the capacity of the binary modulated AWGN/Rayleigh channel (cf. [26], [36]). Another promising method for exploiting the source non-uniformity, particularly if the source is strongly asymmetric, is to use non-binary modulation with unequal energy allocation, as recently investigated in [10]. Hence, the design of a more sophisticated Turbo coding system, which can perform even closer to the Shannon limit for Markov sources with very biased transition


Fig. 8. Performance comparison of the JSC Turbo coding system (with Rc = 1/2) with that of tandem coding schemes (using Huffman coding with Rs = 2/3 and standard Turbo coding with Rc = 1/3) for binary symmetric Markov sources, q = 0.848315, L = 12000, AWGN channel.

distribution (e.g., q0, q1 ≥ 0.9), remains an interesting and challenging problem.

VII. COMPARISON WITH TANDEM CODING SCHEMES

In this section, we compare our JSC Turbo codes with two tandem coding schemes for the transmission of a symmetric Markov source with transition probability q. As in [36], the tandem schemes consist of a Huffman code followed by a standard symmetric Turbo code with constituent encoders (37, 21) or (35, 23), where the (35, 23) Turbo code offers a lower error floor than its (37, 21) peer at the price of a slight performance loss in the water-fall region.

The comparison is made at the same overall rate r = Rc/Rs = 1/2 source symbol/channel symbol. Our JSC coding system has Rc = 1/2 and Rs = 1 (no source coding); the tandem scheme, however, has Rc = 1/3 and therefore needs Rs = 2/3. This can be achieved by numerically choosing the value of q such that the Huffman code produces an output sequence with average codeword length close to 2/3. Due to the source memory, a Markov source is harder to compress than a non-uniform i.i.d. source. In fact, using a 4th-order Huffman code, we would need q = 0.881685 to guarantee Rs = 2/3; however, at this value of q, the entropy rate of the symmetric Markov source is only 0.524. To achieve a better compression efficiency, we adopt an 8th-order Huffman code, which achieves Rs = 0.666667 at q = 0.848315, where the source entropy rate is 0.614.
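The rate Rs achieved by the 8th-order Huffman code can be checked directly: enumerate the 2⁸ block probabilities of the symmetric Markov source and build the Huffman code over them. A sketch (our helper names; the expected codeword length is accumulated over the merges of the Huffman construction):

```python
import heapq
import math

def huffman_avg_len(probs):
    """Expected Huffman codeword length: each merge of two subtrees adds
    one bit to every leaf below it, so summing the merged probabilities
    over all merges gives E[length]."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def block_probs(q, n=8):
    """Probabilities of all n-bit blocks of the symmetric binary Markov
    source with flip probability q as in (19) (first bit uniform)."""
    probs = []
    for b in range(2 ** n):
        bits = [(b >> i) & 1 for i in range(n)]
        p = 0.5
        for x, y in zip(bits, bits[1:]):
            p *= q if x != y else 1 - q
        probs.append(p)
    return probs

q = 0.848315
entropy_rate = -(1 - q) * math.log2(1 - q) - q * math.log2(q)  # h_b(q), ~0.614
rs = huffman_avg_len(block_probs(q)) / 8      # bits per source symbol
print(round(entropy_rate, 3))
print(round(rs, 4))                           # near the 2/3 target of the text
```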

The sequence length is L = 12000, and at least 60000 blocks are simulated to produce a reliable average performance in the error-floor region. An S-random interleaver with S = 10 is adopted in place of the pseudo-random interleaver. The number of iterations in the Turbo decoder is 20. The detailed implementation procedure is similar to that described in [36].

Fig. 8 shows the performance comparison of our JSC coding system with the two tandem schemes over AWGN channels. We observe that although the tandem schemes initially offer better water-fall performance, they are quickly outmatched by our system because of their high BERs at medium to high SNRs. By using the (35, 23) code in the constituent encoders, the error-floor performance of the second tandem scheme is improved over the Berrou-code-based tandem scheme (the floor is lowered from the 10⁻³ to the 10⁻⁴ BER level), at the cost of a slight water-fall degradation. However, both tandem schemes have a high error floor (with respect to our system) due to the inevitable error propagation incurred by the Huffman decoder. The use of a fixed-length source code instead of the Huffman code (or other entropy codes) would still not yield a satisfactory overall performance for the tandem schemes, since standard fixed-length source codes may not on their own achieve good compression (although they would remedy the error propagation problem). Unlike the tandem systems, our JSC coding system enjoys lower complexity (since no source encoding and decoding are performed), while offering superior performance for BERs below 10⁻⁴. Note that the tandem system performance can be improved by replacing its source decoder with a soft-input source decoder (e.g., [32]), although at the price of increased complexity. Furthermore, alternative JSC coding systems that employ variable-length source encoding followed by Turbo encoding, together with jointly designed soft-input variable-length source decoding and Turbo decoding with possible iterations between both decoders (e.g., [5], [19]), may be compared to our JSC coding system in a future study; our system, however, is significantly simpler in terms of complexity. On the other hand, the tandem scheme may have less delay since, in comparison with the JSC system, the interleaver size in its Turbo coding part is shorter.

VIII. SUMMARY

In this work, we investigated the design of JSC Turbo codes for binary stationary ergodic Markov sources. In the first constituent decoder, the decoding algorithm is modified to exploit the source redundancy in the form of memory and non-uniformity, where Berrou's original decoding method, based on the Gaussian assumption for the extrinsic information, provides a suitable solution to the LLR decomposition problem. Due to interleaving, the second constituent decoder is unable to adopt the same modifications; instead, its extrinsic information is modified via a weighted correction term that depends on the Markovian source statistics. The Turbo encoder structure is also optimized in accordance with the source statistics. Substantial gains (from 0.45 up to 3.57 dB) are demonstrated by our system over the original Turbo codes, and its performance gap vis-à-vis the Shannon limit ranges from 0.73 to 1.45 dB for symmetric Markov sources, and from approximately 0.64 to 1.67 dB for asymmetric Markov sources. Finally, our system is compared with two tandem coding schemes, which employ 8th-order Huffman coding (to achieve near-optimal data compression) and regular rate-1/3 Turbo coding. Our system is shown to provide a robust and substantially better performance (for BERs below 10⁻⁴) while benefiting from a lower overall complexity.

IX. ACKNOWLEDGMENT

The authors are grateful to the editor and the anonymous reviewers for their comments, which were instrumental in significantly improving the paper.


REFERENCES

[1] M. Adrat, J.-M. Picard, and P. Vary, “Softbit-source decoding basedon the turbo-principle,” in Proc. Veh. Tech. Conf., Oct. 2001, vol. 4,pp. 2252–2256.

[2] F. Alajaji, N. Phamdo, N. Farvardin, and T. Fuja, “Detection of binaryMarkov sources over channels with additive Markov noise,” IEEE Trans.Inform. Theory, vol. 42, pp. 230–239, Jan. 1996.

[3] F. Alajaji, N. Phamdo, and T. Fuja, “Channel codes that exploits theresidual redundancy in CELP-encoded speech,” IEEE Trans. Speech andAudio Processing, vol. 4, pp. 325–336, Sep. 1996.

[4] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding oflinear codes for minimizing symbol error rate,” IEEE Trans. Inform.Theory, vol. 20, pp. 248-287, Mar. 1974.

[5] R. Bauer and J. Hagenauer, “On variable length codes for iterativesource/channel decoding,” in Proc. Data Compression Conf., Mar. 2001,pp. 273–282.

[6] T. Berger, Rate Distortion Theory: A Mathematical Basis for DataCompression. Englewood Cliffs, NJ: Prentice Hall, 1971.

[7] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limiterror-correcting coding and decoding: Turbo-codes(1),” in Proc. IEEEInt. Conf. Commun., May 1993, pp. 1064–1070.

[8] C. Berrou and A. Glavieux, “Near optimum error correcting coding anddecoding: Turbo-codes,” IEEE Trans. Commun., vol. 44, pp. 1261–1271,Oct. 1996.

[9] R. E. Blahut, Principles and Practice of Information Theory. Reading,MA: Addison Wesley, 1988.

[10] F. Cabarcas, R. D. Souza, and J. Garcia-Frias, “Source-controlled turbocoding of nonuniform memoryless sources based on unequal energyallocation,” in Proc. Int. Symp. Inform. Theory, June-July 2004, p.166.

[11] Z. Cai, K. R. Subramanian, and L. Zhang, “Source-controlled channeldecoding using nonbinary turbo codes,” IEEE Electron. Lett., vol. 37,no. 1, pp. 39–40, Jan. 2001.

[12] G. Colavolpe, G. Ferrari, and R. Raheli, “Extrinsic information in Turbodecoding: a unified view,” in Proc. Globecom, Dec. 1999, pp. 505–509.

[13] H. El Gamal and A. R. Hammons, Jr., “Analyzing the turbo decoderusing the Gaussian approximation,” IEEE Trans. Inform. Theory, vol.47, pp. 671–686, Feb. 2001.

[14] T. Fazel and T. Fuja, “Robust transmission of MELP-compressed speech:An illustrative example of joint source-channel decoding,” IEEE Trans.Commun., vol. 51, pp. 973–982, June 2003.

[15] J. Garcia-Frias and J. D. Villasenor, “Combining hidden Markov sourcemodels and parallel concatenated codes,” IEEE Commun. Lett., vol. 1,pp. 111–113, July 1997.

[16] J. Garcia-Frias and J. D. Villasenor, “Joint turbo decoding and estimationof hidden Markov sources,” IEEE J. Select. Areas Commun., vol. 19,pp. 1671–1679, Sep. 2001.

[17] N. Görtz, “On the iterative approximation of optimal joint source-channel decoding,” IEEE J. Select. Areas Commun., vol. 19, no. 9,pp. 1662–1670, Sep. 2001.

[18] R. M. Gray, “Information rates for autoregressive processes,” IEEETrans. Inform. Theory, vol. 16, pp. 412-421, July 1970.

[19] L. Guivarch, J.-C. Carlach, and P. Siohan, “Joint source-channel soft de-coding of Huffman codes with Turbo-codes,” in Proc. Data CompressionConf., Mar. 2000, pp. 83–92.

[20] J. Hagenauer and N. Görtz, “The turbo principle in joint source-channel coding,” in Proc. IEEE Inform. Theory Workshop, Mar.–Apr. 2003, pp. 275–278.

[21] E. K. Hall and S. G. Wilson, “Design and analysis of Turbo codes on Rayleigh fading channels,” IEEE J. Select. Areas Commun., vol. 16, pp. 160–174, Feb. 1998.

[22] J. Kroll and N. Phamdo, “Source-channel optimized trellis codes for bitonal image transmission over AWGN channels,” IEEE Trans. Image Processing, vol. 8, pp. 899–912, July 1999.

[23] R. J. McEliece, The Theory of Information and Coding, 2nd ed. Cambridge, UK: Cambridge University Press, 2002.

[24] P. Robertson, “Illuminating the structure of parallel concatenated recursive systematic (turbo) codes,” in Proc. Globecom, Nov. 1994, vol. 3, pp. 1298–1303.

[25] K. Sayood and J. C. Borkenhagen, “Use of residual redundancy in the design of joint source-channel codes,” IEEE Trans. Commun., vol. 39, pp. 838–846, June 1991.

[26] S. Shamai and S. Verdú, “The empirical distribution of good codes,” IEEE Trans. Inform. Theory, vol. 43, pp. 836–846, May 1997.

[27] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec., pt. 4, pp. 142–163, Mar. 1959.

[28] M. Skoglund, “Soft decoding for vector quantization over noisy channels with memory,” IEEE Trans. Inform. Theory, vol. 45, no. 4, pp. 1293–1307, May 1999.

[29] K. P. Subbalakshmi and J. Vaisey, “On the joint source-channel decoding of variable-length encoded sources: The BSC case,” IEEE Trans. Commun., vol. 49, pp. 2052–2055, Dec. 2001.

[30] G. Takahara, F. Alajaji, N. C. Beaulieu, and H. Kuai, “Constellation mappings for two-dimensional signaling of non-uniform sources,” IEEE Trans. Commun., vol. 51, pp. 400–408, Mar. 2003.

[31] R. E. Van Dyck and D. Miller, “Transport of wireless video using separate, concatenated, and joint source-channel coding,” Proc. IEEE, vol. 87, pp. 1734–1750, Oct. 1999.

[32] J. Wen and J. Villasenor, “Soft-input soft-output decoding of variable length codes,” IEEE Trans. Commun., vol. 50, pp. 689–692, May 2002.

[33] A. Wyner and J. Ziv, “Bounds on the rate-distortion function for stationary sources with memory,” IEEE Trans. Inform. Theory, vol. 17, pp. 508–513, Sep. 1971.

[34] W. Xu, J. Hagenauer, and J. Hollmann, “Joint source-channel decoding using the residual redundancy in compressed images,” in Proc. ICC, June 1996, pp. 142–148.

[35] G.-C. Zhu and F. Alajaji, “Turbo codes for non-uniform memoryless sources over noisy channels,” IEEE Commun. Lett., vol. 6, pp. 64–66, Feb. 2002.

[36] G.-C. Zhu, F. Alajaji, J. Bajcsy, and P. Mitran, “Transmission of non-uniform memoryless sources via non-systematic Turbo codes,” IEEE Trans. Commun., vol. 52, pp. 1344–1354, Aug. 2004.

Guang-Chong Zhu (M’04) received the B.Sc. and M.Sc. degrees in computational mathematics from Shandong University, Shandong, China, in 1994 and 1997, respectively. He then received the M.Sc. degree in applied mathematics and the Ph.D. degree in mathematics and engineering from Queen’s University, Kingston, ON, Canada, in 1998 and 2003, respectively. From 2003 to 2005, he was a Postdoctoral Fellow in the Department of Mathematics and Statistics at Queen’s University, and then in the Department of Electrical and Computer Engineering at the University of Toronto. In August 2005, he joined Lawrence Technological University as an Assistant Professor in the Department of Mathematics and Computer Science.

His current research interests include channel coding, joint source-channel coding, and information theory.

Fady Alajaji (S’90–M’94–SM’00) was born in Beirut, Lebanon, on May 1, 1966. He received the B.E. degree (with Distinction) from the American University of Beirut, Lebanon, and the M.Sc. and Ph.D. degrees from the University of Maryland, College Park, all in electrical engineering, in 1988, 1990, and 1994, respectively. He held a postdoctoral appointment in 1994 at the Institute for Systems Research, University of Maryland.

In 1995, he joined the Department of Mathematics and Statistics at Queen’s University, Kingston, Ontario, where he is currently an Associate Professor of Mathematics and Engineering. Since 1997, he has also been cross-appointed in the Department of Electrical and Computer Engineering at the same university. He held research visits to the Department of Electrical and Computer Engineering, McGill University, Montreal, in the Fall of 2001, and to the Department of Mobile Communications, Technical University of Berlin, Germany, in the Winter of 2004. His research interests include information theory, joint source-channel coding, error control coding, data compression, and digital communications.

Dr. Alajaji currently serves as Editor for Source and Source-Channel Coding for the IEEE TRANSACTIONS ON COMMUNICATIONS. He served as co-chair of the Technical Program Committee of the 2004 Biennial Symposium on Communications. In 1999, he co-chaired and organized the Sixth Canadian Workshop on Information Theory in Kingston, Ontario. He also served as chair of the Selection Committee for the 2000 Canadian Award in Telecommunications. In 2001, he received the Premier’s Research Excellence Award (PREA) from the Province of Ontario.