ieee transactions on signal processing, vol. 52 ...after zero padding, each -long...

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 3, MARCH 2004 721

Near-Optimum Soft Decision Equalization forFrequency Selective MIMO Channels

Shoumin Liu, Student Member, IEEE, and Zhi Tian, Member, IEEE

Abstract—In this paper, we develop soft decision equalization(SDE) techniques for frequency selective multiple-input mul-tiple-output (MIMO) channels in the quest for low-complexityequalizers with error performance competitive to that of max-imum likelihood (ML) sequence detection. We demonstrate thatdecision feedback equalization (DFE) based on soft-decisions,expressed via the posterior probabilities associated with feedbacksymbols, is able to outperform hard-decision DFE, with a lowcomputational cost that is polynomial in the number of symbolsto be recovered and linear in the signal constellation size. Buildingon the probabilistic data association (PDA) multiuser detector,we present two new MIMO equalization solutions to handle thedistinctive channel memory. The first SDE algorithm adopts azero-padded transmission structure to convert the challengingsequence detection problem into a block-by-block least-squareformulation. It introduces key enhancement to the original PDAto enable applications in rank-deficient channels and for higherlevel modulations. The second SDE algorithm takes advantageof the Toeplitz channel matrix structure embodied in an equal-ization problem. It processes the data samples through a seriesof overlapping sliding windows to reduce complexity and, atthe same time, performs implicit noise tracking to maintainnear-optimum performance. With their low complexity, simpleimplementations, and impressive near-optimum performanceoffered by iterative soft-decision processing, the proposed SDEmethods are attractive candidates to deliver efficient receptionsolutions to practical high-capacity MIMO systems. Simulationcomparisons of our SDE methods with minimum-mean-squareerror (MMSE)-based MIMO DFE and sphere decoded quasi-MLdetection are presented.

Index Terms—Efficient reception algorithms, equalizationfor frequency selective MIMO channels, overlapping slidingwindowing, soft decision.

I. INTRODUCTION

ENORMOUS increase in bandwidth efficiency is promisedby the use of multiple-input-multiple-output (MIMO) sys-

tems in wireless radio frequency links [1]. In high-data-rate ap-plications, channel-induced intersymbol interference (ISI) canbe mitigated using serial equalization techniques, which unfor-tunately encounter major challenges in MIMO channels, be-cause of the need for signal detection in the presence of bothmultiple access interference (MAI) as well as ISI. The BCJR

Manuscript received December 18, 2002; revised May 16, 2003. A portion ofthe manuscript was presented in the Proceedings of the 2nd IEEE Sensor Arrayand Multichannel Signal Processing Workshop (SAM’2002), Rosslyn, VA, Au-gust 2002. The associate editor coordinating the review of this paper and ap-proving it for publication was Prof. Dhananjay A. Gore.

The authors are with the Department of Electrical and Computer Engineer-ing, Michigan Technological University, Houghton, MI 49931 USA (e-mail:[email protected]).

Digital Object Identifier 10.1109/TSP.2003.822376

maximum a posteriori (MAP) [2] and Viterbi equalizers per-form optimum sequence detection to account for the channelmemory but incur prohibitive complexity that is exponential inthe number of inputs and the channel memory. Their equaliza-tion complexity can be reduced using standard reduced-com-plexity methods for single-input channels (e.g., [3], [4]), butsuch methods do not apply to MIMO channels. For complexityconsiderations, typical equalizers consist of linear processingof the received signal, i.e., linear equalization (LE), and pos-sibly past symbol estimation, e.g., decision feedback equaliza-tion (DFE). The optimum MIMO DFE settings in the minimummean-square error (MMSE) sense have been derived in [5]–[8].In these schemes, tentative decisions on both past symbols andsymbols from MAI sources are made by quantizing properly de-rived decision statistics. Such a hard-decision-based approachmay suffer from catastrophic error propagation and, in mostcases, incurs nontrivial performance degradation relative to anoptimal maximum likelihood (ML) detector, in terms of the biterror rate (BER) performance. Recent advances include turbodetection and equalization [9]–[11], in which low-complexitysoft-input-soft-output (SISO) LE and DFE equalizers are de-vised based on the MMSE criterion [10].

This work accentuates the low-complexity, near-optimumequalization for frequency selective MIMO channels by takingon a soft-decision equalization (SDE) approach. We recognizethat block transmission by zero padding permits the conversionof sequence detection to block detection; thus, MIMO channelequalization can be viewed as a general high-dimensionalinteger least-square (LS) problem in the form of ,where is an input vector that takes on finite-al-phabet values, and are the output vector, the channelmatrix, and the noise vector, respectively. Unique to channelequalization, the channel matrix entails a special Toeplitzstructure, to be discussed in Section II. The exact ML solutionto the integer-LS problem unfortunately incurs prohibitivelyhigh computational complexity that is exponential in . Inthe quest for low-complexity implementations of ML blockdetection, quasi-ML solutions have been developed, suchas sphere decoding [12], semi-definite relaxation [13], andprobabilistic data association (PDA) filtering [14], [15]. ThePDA detector [14] provides near-optimal performance at alow overall complexity of . It employs a multistagedetection structure and replaces the intermediate finite alphabetsymbol decisions by soft decisions, which are expressed viatheir associated posterior probabilities. Such a soft-decisionstructure leads to significant computation reduction when MAIis approximated to obey a Gaussian probability distribution:an idea originated from the PDA filter for target tracking

1053-587X/04$20.00 © 2004 IEEE

722 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 52, NO. 3, MARCH 2004

[16]. Despite its impressively low BERs, the PDA method inits current form has limited applications. The original PDAdetector was derived for multiuser detection (MUD) withinthe framework of code-division multiple access (CDMA) infrequency flat-fading channels, in which case, the channelmatrices are confined to be square matrices made of users’cross-correlation coefficients. Zero-forcing preprocessingvia square matrix inverse is performed in PDA, imposing aninvertibility constraint on .

Symbol detection in practical MIMO systems encounterspronouncedly different system parameters and comparedwith those in MUD for single-channel CDMA. The channelmatrix is typically nonsquare and possibly rank-deficient. Forexample, downlink transmissions often face channels that arefat, that is, there are more columns than rows in . Such achannel matrix is no longer invertible, even though the systemmay still be identifiable for digital inputs. A typical case isa BLAST system [17] with a larger number of transmit thanreceive antennas. Moreover, in high-capacity wireless systems,the input signal constellation size increases to improve spectralefficiency, such as quadrature phase shift keying used inthird-generation cellular and 16/64-quadrature amplitude mod-ulation (QAM) in IEEE 802.11a. Higher level modulation isone of the primary reasons that obviate ML detection, and it hasnot been discussed in the current PDA detection algorithm. Inconjunction with Kalman filtering, the PDA method can handlea short channel memory length of 2 induced by asynchronismonly [15], but the result cannot be extended to channels with alonger memory length. The PDA MUD is hampered by theseobstacles; hence, its remarkable features cannot be fully appre-ciated by wireless MIMO systems. Constructing a successfulPDA-type MIMO channel equalizer calls for a research effortthat is beyond the scope of simple generalization of the existingPDA block detectors.

In this paper, we propose MIMO SDE techniques that incor-porate the PDA MUD principle. Different from the hard-de-cision multistage parallel interference cancellation (PIC), theSDE approach generates tentative decisions on ISI and MAIsymbols in the form of posterior probabilities instead of quan-tized bits, and the decision updating is simplified by forcingthe composite effect of noise and interference to be Gaussian:a strategy used in the PDA detector. The SDE detection perfor-mance improves with additional iterations and stabilizes quicklyin three to five iterations for high SNR, and seven to 14 iterationsfor low SNR. Compared with the suboptimum hard-decisionMMSE-DFE [6], the SDE method demonstrates close-to-op-timal equalization performance at a comparable low complexitythat is polynomial in the number of inputs and only linear in theconstellation size of . Compared with other quasi-ML methodssuch as sphere decoding [12], the SDE is not only competitivein both performance and complexity but also applies to situa-tions where sphere decoding does not work well, such as thefat channel case. The SDE is distinct from PDA MUD in sev-eral key accounts. First, the SDE approach is tailored to handlenear-optimum symbol sequence detection in the presence ofchannel memory, which is difficult to accommodate in PDAMUD. Our main contribution is to adopt the block transmis-sion structure via zero padding to enable block detection and to

apply sliding windows for ISI cancellation and noise trackingto attain near-MAP detection performance at a low complexity.Second, we propose an alternative implementation of the PDAprinciple that eliminates the zero-forcing preprocessing. As aresult, the restriction on full-rank square channel matrices islifted. Third, we extend the PDA algorithm to bandwidth-effi-cient higher level modulation schemes. Through this work, thepotential of the iterative soft-decision PDA philosophy can befully enjoyed by practical wireless systems to achieve near-op-timal, low-complexity detection and equalization.

The ensuing paper is organized as follows. Section II de-scribes the input–output model for block transmissions throughfrequency selective MIMO channels. Section III developstwo SDEs: One applies to a general integer-LS problem byenhancing the original PDA algorithm with an alternativeimplementation that obviates zero-forcing preprocessing; theother focuses on frequency selective channels and capitalizeson the distinctive Toeplitz channel structure to reduce the equal-ization complexity. Analytical comparisons between our SDEalgorithms and the existing iterative methods, including PDAand turbo SISO-MMSE, are discussed. Section IV performsevaluation of the proposed MIMO SDE algorithms in termsof their BER performance and the computational load, withelaboration on the tradeoffs in choosing the sliding windowsize and on their application in the rank-deficient fat channelcase. Comparisons with ML detection, sphere decoding, andMIMO MMSE-DFE are presented via computer simulations,followed by a summary in Section V.

II. SIGNAL MODEL

We consider the discrete-time block transmission equivalentmodel of a baseband communication system with inputsand outputs. There are a total of links in this MIMOsystem, wherein the link between each input–output pair is mod-eled as a linear finite impulse response (FIR) dispersive channelwith no greater than symbol-spaced taps in the channelresponse. The sampled channel response from the th input tothe th output, including transmit and receive filters, is denotedby . We adopt a block trans-mission structure with zero padding to eliminate interblock in-terference, hence alleviating the performance degradation dueto noise enhancement or error propagation [19]. The informa-tion-bearing symbols are parsed into -long frames, with theinsertion of zeros at the tail of each frame. The th frameof the input vector is denoted as

, where containsthe information symbols from all inputs at the th samplinginstant. After zero padding, each -long information-bearingsymbol frame creates a transmit symbol frame of framesize , where the first entries convey messages

, followed by trailingzeros , for any frame index andinput . Correspondingly, the received data vectorat the th frame is a concatenation of noise-contaminatedsample vectors , where

consists of the th re-ceived signals at all outputs. The redundancy per transmittedframe is measured by the ratio , whereas at the receiver,

LIU AND TIAN: NEAR-OPTIMUM SOFT DECISION EQUALIZATION 723

the data rate is reduced by the same amount. We set andtypically choose a large frame size (and ) to maintain thetransmission rate.

In linear FIR channels, the received is expressedby

(1)

where denotes the additive zero-mean Gaussianstationary noise received at the th output. The noise terms atthe th transmission frame are grouped into a vector

, where, and the covariance of is denoted

by . With these definitions, the single-linkinput–output model (1), when assembled into the vector-matrixformat, results in a MIMO channel model in the form of

, where the MIMO channel matrix takes on a bandedToeplitz structure [19]:

.... . .

. . ....

(2)

where

......

(3)

Sequence detection incurred by the channel memory can nowbe alternatively solved by frame-by-frame symbol detection, forwhich we will drop the subscript without raising confusion.This system model subsumes a broad range of MIMO transmis-sion scenarios. The multiple input could result from the combi-nation of three situations:

i) multiple transmit antennas with a single user, e.g., thesingle-user space-time coding case;

ii) single transmit antenna with multiple users, e.g., asencountered in the single-channel multiuser detectionproblem;

iii) transmit-induced diversities including orthogonal fre-quency division multipexing (OFDM) and CDMA.

Thus, the number of inputs is determined by both the numberof transmit antennas and the number of multiple-access users.Meanwhile, multi-output is invoked when there are multiple re-ceive antennas and/or when fractional sampling is used. Thus,

is determined by the total number of distinct samplers oper-ating within a symbol period.

With this MIMO model, the integer-LS problem describedin the Introduction deals with a channel matrix of size

, where and . The optimal

ML solution to such a MIMO system faces a major implemen-tation challenge, as its complexity increases exponentially in

, and . Our objective in this paper is to developsoft-decision-based symbol detection and channel equalizationschemes that achieve near-optimal BER performance at lowpolynomial complexity. Throughout, the MIMO channel is as-sumed to be frequency selective and slowly varying. It is time in-variant within each frame of symbol periods but may changeindependently from frame to frame. We suppose the receiver hasperfect knowledge of the channel state information and thenoise variance .

III. NEAR-OPTIMUM SDE

We now develop SDE methods based on the PDA-type soft-decision multistage detection principle. We begin by enhancingthe PDA detector to enable its applicability to a generic blocktransmission system with higher level modulation. A key mod-ification is to associate each symbol with its channel responsevector when deriving its posterior probability density function(pdf) and updating its soft decisions. This strategy eliminatesthe preprocessing of channel matrix inversion, thus relaxing theconstraint on channel invertibility. Next, we develop a new SDEalgorithm that is specifically tailored to near-optimum symboldetection in channels with memory. Taking advantage of theToeplitz channel matrix structure, we apply a series of overlap-ping sliding windows to obtain reduced-complexity local MAPestimation of current symbols and, at the same time, track theISI and noise components to ensure near-optimum performance.The overall computational load is reduced to a fraction (propor-tional to the transmission redundancy) of that of the enhancedPDA solution. For each SDE algorithm, we present efficient im-plementations of the iterative soft-decision updating rule to fur-ther reduce the complexity.

A. SDE by PDA Enhancement

In the general MIMO model , we emphasize theth element of by rewriting the received signal as

(4)

where and are the th and th columns of , re-spectively, denoting the channel responses of and

, as shown in Fig. 1(a). The transmitted bits takevalues from a finite alphabet set on -ary mod-ulation, where the modulation format is typically chosen fromphase shift keying (PSK) and QAM. Given , there are poste-rior probability values associated with each digital input, whichwe denote as , for and

.Finite-alphabet symbol detection on can be alternatively

carried out by estimating , giving rise to soft decisions.Unfortunately, direct evaluation of via the correspondinglikelihood function still incurs exponential complexityin , considering that is a Gaussian mixture with

modes [22]. To avoid the combinatorial complexity,we adopt the PDA filtering idea and treat the transmitted


Fig. 1. Channel models. (a) SDE-1. (b) SDE-2. Each vertical line representsthe signature vector of the corresponding input denoted by a horizontal line.The length of the signature vector determines the detection complexity for thecorresponding input.

symbols as Gaussian random variables. Since theFIR MIMO channel is a linear system, the posterior pdf of

remains Gaussian; thus, it can be fully characterized byits mean and variance, conditioned on the received signal .Define and cov as theconditional mean and covariance of , respectively. Thesedefinitions are different from [14], noting that each symbolis now associated with its channel response vector , insteadof a simple unit vector. Such definitions eliminate the needto perform the channel decorrelating preprocessing in [14],which is not applicable for nonsquare channel matrices andrank-deficient channels. When the transmitted symbols areindependent and identically distributed (i.i.d.), it follows from(4) that

(5)

var (6)

where ( denotes the magnitude of a complex quantity)

(7)

var

(8)

In antipodal signaling, , which leads to, and

var , which corroborates [14].When is approximated as a Gaussian vector with the

matched mean and covariance, its pdf can be described byand as follows:

(9)

This pdf calibrates the posterior probabilities of all possibleconstellation points of , and the soft decisions can be derivedfrom (9) by setting , where is anormalization factor such that . Note thatthe conditional pdf in (9) is determined by and , regardlessof the constellation size . Therefore, the overall complexityof evaluating all soft decisions is linear in .

To obtain in an efficient manner, we introduce the ratios, for , which can be deduced

from (9) as

(10)

Using the probability normalization condition ,we obtain

(11)

Subsequently, the MAP estimate of is decided to be thevalue that yields the largest . The decision rule can be fur-ther simplified in the special binary modulation case [14], [18].

For algorithm implementation, we note that computingfor one input involves computing and

, where both depend on of all other inputs. This intertwined relationship among all unknown

inputs prompts an iterative multistage procedure [14], where thepair is computed from (5) and (6) based on tentative

soft decisions obtained at a previous stage. Eachcan be updated from (11) successively until all

converge for all , followed by decision making onthis frame of symbols via the MAP rule.

In the above iterative soft-decision updating procedure, themost computationally expensive operation is to compute the co-variance matrix inverse , which incurs complexity on thethird order of the number of outputs . Direct matrix inversioncan be avoided using the matrix inversion lemma, which willlower the overall complexity by an order. This speed-up mea-sure is discussed in [14], and it shares the same principle withthe widely used recursive least-square (RLS) adaptive filtering


TABLE ISOFT DECISION EQUALIZATION ALGORITHM: SDE-1

[21]. We now describe these results for our -ary modulationcase.

First, we form two auxiliary variables:is the conditional mean of the noise

term , and var isthe conditional covariance matrix of . Apparently

(12)

var (13)

Applying the matrix inversion lemma on (13) yields [21]

varvar

(14)

and, conversely

var

var(15)

By keeping the updated versions of and andcan be obtained from (12) and (14) at a low com-

plexity of for each input and an overall complexityof in one iteration. The overall soft-decision MIMOequalization algorithm, which is enabled by zero-padded blocktransmission, is summarized in Table I.

B. SDE by Sliding Windowing

So far, we have not utilized the unique Toeplitz structureof the channel matrix in the equalization problem. When thepdfs in (9) are computed via matrix operations, the complexityof the Gaussian-forcing MAP detection is determined by thelength of each input’s channel response vector , which is

in the SDE-1 algorithm. On the other hand,when the block size is chosen to be much largerthan the channel length to reduce the transmission redun-dancy, there are a large number of zeros in the channel matrix,

which could be avoided to save computation. Next, we utilizethe effective portion of each channel response vector and con-struct a soft decision approach that is tailored to the equalizationproblem. The objective is to maintain the near-MAP detectionaccuracy of SDE-1 and, at the same time, reduce the computa-tional complexity to be proportional to the channel memoryinstead of the frame size .

1) Algorithm Development: Revisiting the signal model(4) illustrated in Fig. 1(b), we divide the symbol vector

into sub-blocks , where each sub-blockcontains the information

symbols from all inputs at the th sampling instant.Due to the finite channel memory length, only af-fects output elements, which we group into

[see Fig. 1(b)]. Thisoutput block contains the sufficient statistics of , aswell as contributions from residual ISI elements .Compared with , the reduced-sized vector containsall the observations relevant to . It is thus possible toconstruct an optimum detector for from in lieuof , provided that the contributions from other symbols tothis output block are properly accounted for, possibly throughISI cancellation and noise tracking. Focusing on one inputsub-block at a time, we will convert the signal model in (4)into a set of sub-models, each describing one of thesufficient statistics , such that each sub-model canbe expressed by shorter channel response vectors comparedwith (4). In the th sub-model depicted by the shaded area inFig. 1(b), we denote the channel response matrix from the thinput sub-block to by , which refers tothe sub-block matrix of at the intersection of the th rowsub-block and the th column sub-block. To be exact, it is theportion of bordered by the th to the thcolumns and the th to the th rows.There are columns in ,

where is the channel response vector

of during the -dependent observation window cov-ering the th to the th symbol periods. It is furtherobserved that when represents the nonzeroportion of each column block and is independent of . In fact,

, which happensto be the -tap FIR MIMO channel impulse response.With these definitions, can be obtained from the generalsignal model (4) by sliding over a -dependent observationwindow of symbol periods, yielding

(16)

where is the zero-mean whiteGaussian noise component that falls within the sliding window.Its covariance matrix is readily availableas a sub-block of . Hence, we have estab-lished reduced-sized sub-models, resulting from overlappingsliding observation windows.


Next, we will show how to utilize the enhanced PDA MUDdetector within each reduced-sized sub-model for symbol-by-symbol detection. Focusing on in the th sliding windowdescribed by (16), we further dissect the channel matrixand represent the nonzero (effective) portion of the channel re-sponse vector for as , which is the th column of

and is independent of . Defineas the posterior probabilities of the

-ary modulated symbol . The conditional mean andvariance of are then given by

(17)

var

(18)

respectively. To make MAP detection on , the task now isto evaluate its posterior probability distribution. We suppose theposterior pdf is Gaussian after the Gaussian forcing approxima-tion, hence, can be fully characterized by its mean and varianceconditioned on .

Defining and

emphasizing each individual element in , we rewrite(16) as

(19)

Equation (19) follows the same structure as (4); hence, theSDE-1 algorithm can be applied within this local time windowfor detecting . To do so, we express the signal componentfrom by

(20)

The conditional mean and covariance of are thusgiven by

(21)

cov

cov (22)

To obtain the conditional mean and covariance of , weadopt the following approximations for all and

(23)

var var (24)

As a result, we have

(25)

cov

(26)

Now, we are ready to use the enhanced PDA frameworkto compute the soft-detection on . As a result of theGaussian forcing approximation, the posterior probabilities of

can be obtained as

(27)

where . Once the posterior probabilitiesare established

from (27), the MAP detection on can be made inthe same manner as described in (10) and (11).

2) Iterative Implementations: As discussed in SDE-1, theevaluation of posterior probabilities for eachinput involves computing and , whichare dependent on not only the unknown of MAIsymbols at the th sampling time but, in addition,on the unknown of ISI symbols .Algorithm implementation via iterative multistage processingis, thus, in order. Similar to the speed-up strategy describedin (14) and (15), we introduce two auxiliary variables thatare instrumental to computational saving in updating the

pair: One is the conditional mean of thenoise term ,given by

(28)


The other is the conditional covariance matrix of inthe form of

var

var

(29)

Both and take into account of the channel responsesto all the elements in and all the ISI components as well.For reasons that will be explained in Section III-B3, we term theupdating of due to the ISI components as soft ISI cancel-lation and that of as noise tracking.

An obvious contrast between (21) and (22) and (28) and (29)shows that

(30)

(31)

Therefore, for within each local window , wehave

(32)

(33)

During iterative processing, and are first up-dated via (30) and (32), using tentative soft informationand from the previous stage. Subsequently, the poste-rior pdf of can be computed from (27), yielding up-dated soft information and of the currentstage. The auxiliary variables and are then updatedvia (30) and (33) using the new values. Bear in mind thatin each sliding window only yields the soft information of oneinput block , but also affects previous overlap-ping windows and future overlapping windows

. Therefore, there are additionalpairs of auxiliary variables that need to beupdated from the new estimates of . The updating of theseISI pairs can be carried out instantaneously when any

and become available or after these soft-informationvalues are updated for all and ,resulting in two implementation procedures of different compu-tational loads.

In the first procedure, we update the related ISI auxiliary pairswhenever and become available for anyand . Following (28), an auxiliary ISI mean can be updated by

(34)

TABLE IISOFT DECISION EQUALIZATION ALGORITHM: SDE-2 (I)

To update , it is observed from(29) that

(35)

where is indepen-

dent of . Based on (35), we can apply the matrix inversionlemma twice to update from and

(36)

(37)

The intermediate matrix inverse plays a similar role tothe updating of as to . The computational loadof updating all the relevant auxiliary variables for each input ison the order of , which representsan approximate reduction compared with the SDE-1 algo-rithm. The overall algorithm is summarized in Table II.

In the above procedure, overlapping time windows are pro-cessed in serial, resulting in a total oftimes updating the auxiliary pairs inside each stage. To reducethe number of covariance matrix inverse to be processed, we


TABLE IIISOFT DECISION EQUALIZATION ALGORITHM: SDE-2 (II)

may take an alternative procedure to update the auxiliary ma-trices . In each stage, we first process thelocal windows in parallel to update the soft decisions of all in-puts , using tentative decisions and auxiliary vari-ables from the previous stage. When all the new soft-informa-tion values are available, we recalculate the auxiliary pairs

, possibly by using their definitions in (28)and (29). On average, there are only pairs of auxiliary vari-ables to be updated inside each stage, and the complexity orderper symbol is the smaller of (whenthe matrix inversion lemma is used) and (whendirect matrix inversion is used). This procedure is summarizedin Table III. The implementations in both Tables II and III areexpected to converge to the SDE-1 algorithm, which have beenverified in our simulations. The second procedure offers a com-plexity advantage when the channel length is much less than theframe size, i.e., .

3) Comparisons With Existing Algorithms: The SDE-2MIMO equalizer resembles a concatenation of a series ofenhanced PDA/SDE-1 detectors, each operating on a truncatedsub-model to reduce the overall complexity. It is worth empha-sizing that the pdf estimators in (9) for SDE-1 and in (27) forSDE-2 are equivalent when both converge to the steady state.The only approximations involved are (23) and (24), which, ifat the steady state, does not incur performance loss. The multi-stage iterative processing nature of the SDE algorithms promptstheir links with the PIC [22] and the soft-input-soft-output(SISO) MMSE detector that was developed for turbo detection.Next, we compare SDE with existing interference cancellationtechniques.

a) Comparison With Hard-Decision PIC: In each trun-cated output vector can be viewed as theinterference to the desired symbol block , and the sub-model (19) can be written as

. This appears to be deduced from (16)using the PIC structure [22]. A close examination shows that the

SDE-2 method treats the tentative decisions on differentlyfrom PIC. An equivalent model for SDE-2 is described as

(38)

where the observation vector is modified to in the form of

(39)

The noise term is independent of and is assumed tobe Gaussian with zero-mean and covariance being equalto cov in (26), i.e.,

(40)

The efficacy of (38) can be established by the fact that bothand in (21) and (22) that are required to fully characterizethe posterior pdf of can be equivalently obtained from(38)–(40). Compared with PIC, our SDE-2 method entails threemajor differences.

i) The finite-alphabet ISI symbols are cancelled outby their soft-decision alternative insteadof tentative hard decisions.

ii) In addition to the soft-decision interference cancellation,the conditional variances of the soft estimates are trackedand lumped into the variance of noise , as seen in(40). In contrast, hard-decision PIC does not change thestatistics of the noise term during iterations. Due tothis key noise tracking step, the formulation in (38) re-tains optimality subject to the Gaussian forcing approxi-mation, whereas the conventional PIC is suboptimum.

iii) The SDE-2 method performs interference cancellationalong overlapping blocks, using a sliding window of size

outputs, such thatall theobservations related toeach input are retained within the corresponding window.The conventional PIC receiver, on the other hand,operateson a nonoverlapping sliding window of size 1.

b) Comparisons Between SDE, PDA, and SISO-MMSE-Based Turbo Detection: The soft-decision iterative processingnature of our SDE algorithms prompts their links to turbo signalprocessing. In turbo detection [9], soft information in the formof log-likelihood ratios (LLR) is exchanged; interchangeably,the PDA, SDE-1, and SDE-2 methods iteratively feed back themeans and variances of Gaussian distributed random symbols.We now compare the multistage Gaussian forcing principle usedin PDA, SDE-1, and SDE-2 with the SISO-MMSE-based turboprinciple for uncoded systems. For different algorithms, we willexplain the posterior probabilities derived for in the generalmodel . The comparison will bebased on binary modulation.

In SDE-1 for binary signaling, the posterior probability dis-tribution of in (9) is reduced to

(41)

where and are given by (5) and (6), respectively.


The PDA algorithm focuses on MUD in synchronous CDMAmultiple access, where is a real-valued, square cross-correla-tion matrix, and the noise variance is . Letrepresent a Gaussian random variable with mean and variance

. Under the special system setup, the PDA MUD establishesa Gaussian model for in the form of [14]

(42)

wherevar , and is a column vector whose th ele-ment is 1, whereas all other components are 0. Premultiplying

on both sides of (42) yields

(43)

The soft decision made by PDA is thus given by

(44)

Using the equality , it can be established that. Therefore, the PDA result in (44) is

the same as that of SDE-1 in (41) in this special case. As anenhancement to PDA, our SDE-1 algorithm does not restrict

to be proportional to , and it applies even whendoes not exist.

The SISO-MMSE method generates an MMSE filtered deci-sion statistic for estimating , where boils down to of [9,(41) and (50)]

(45)

The LLR of is thus given by

(46)

The comparison between SISO-MMSE and PDA (a special caseof SDE-1) is clearly illustrated by the similarities and differ-ences between (43) and (44), and (45) and (46).

As to the SDE-2 method, it is specially tailored to the equal-ization problem with a Toeplitz channel structure; therefore, itis not directly comparable with the existing turbo detectors. In-terestingly, the iterative processing in SDE-2 suggests the flow-chart in Fig. 2, which interprets the algorithm by a turbo struc-ture in which two major function blocks (MUD and ISI can-cellation) exchange information in an iterative manner. SDE-2builds on the Gaussian forcing idea and incorporates slidingwindowing to reduce the overall complexity without sacrificingthe detection performance. The key to retaining optimality afterdata truncation is to carry out the noise tracking step (40) prior toeach reduced-dimension local SDE-1 detection on (38). Noisetracking via Kalman filtering has appeared in the context ofPDA detection for asynchronous CDMA under frequency flatfading [15]. Such a PDA-Kalman tracker cannot be generalizedto track the noise in a channel with a memory lengthsince the underlying dynamic model is no longer first-order,thus obviating Kalman filtering. In our development, we inter-pret noise tracking as a means to update the variance of theISI estimates. This viewpoint allows us to generalize the noise

Fig. 2. Turbo-like flow chart of the SDE-2 algorithm.

tracking method easily to channels with long memory length,and there is no need to perform channel Cholesky decomposi-tion and Kalman filtering, as in [15].

IV. ALGORITHM EVALUATION AND SIMULATIONS

In this section, we investigate the characteristics of ourtwo soft-decision equalization methods through computersimulations. In both the SDE-1 and SDE-2 algorithms, the softdecisions are derived to converge to the sequence MAP esti-mates through multistage iterations; therefore, close-to-optimalsymbol detection performance is anticipated. On the otherhand, Gaussian forcing explains the low-complexity feature ofthese SDE methods. These claims will be verified here by com-parisons with other competing methods, including the optimumML detection by brute-force enumeration, quasi-ML by spheredecoding (SD) [12], [20], and the suboptimal hard-decisionMIMO FIR MMSE-DFE method [6]. Performance metrics ofinterest are the BER performance in both full column-rankand rank-deficient channels, as well as the computationalcomplexity in terms of the number of operations versus theframe data size .

A. BER Performance in Full-Rank MIMO Channels

In the simulated MIMO system, each input–output radio linkis generated independently from the broadband wireless highperformance European radio LAN (HIPERLAN) model [23],[24]. The channels are complex-valued, and the noise is as-sumed to be complex white Gaussian. The time-varying FIRchannels are generated according to the channel model A spec-ified by ETSI for HiperLAN/2 [23], resulting in a maximumchannel memory length of symbols. Each channel tapvaries according to Jakes’ model with a maximum Doppler fre-quency of 52 Hz, corresponding to a typical terminal speed anda carrier frequency of 5.2 GHz.

We study the BER performance versus the signal-to-noiseratio (SNR) of various detectors under different modulationschemes and numbers of antennas. The total transmit power isheld constant, irrespective of the number of transmit antennas.For each given SNR, the simulation keeps running until thenumber of errors for the (near-optimum) sphere decodingalgorithm reaches 100 or greater. With this number of errors,the simulated BER is within 20% of the true BER.

We start with the case of more number of receive antennasthan transmit antennas, i.e., . Due to the block trans-mission structure with a transmit redundancy of padded zeros,the Toeplitz channel matrix in (2) has a full column rank


Fig. 3. Performance comparison under 16/64QAM, N = 1; N = 4.

Fig. 4. Performance comparison under 16/64QAM, N = 2; N = 4.

of and is guaranteed invertibility, irrespective ofchannel nulls [19].

Fig. 3 illustrates the performance comparison of SDE-1,SDE-2, SD, FIR MMSE-DFE, and ML (by brute-force search)for and . The symbol block size is chosenas 8. The results for both 16- and 64-QAM are presented.The same MIMO setup is considered in Fig. 4, except that thenumber of transmit antennas is increased to . In both fig-ures, it can be seen that the BER curves of SDE-1, SDE-2, andSD are nearly identical for different high-bandwidth-efficiencymodulation schemes. They all approach that of the optimumML detection. This corroborates the near-optimum property ofthe reduced-complexity SDE-2 technique. FIR MMSE-DFE,however, experiences nontrivial performance degradation in allthe above scenarios. The performance gap in the case ismore pronounced than that in the case. As increases,the information-theoretic capacity is expected to grow linearlyin , given [1]. MMSE-DFE cannot deliver thedesired performance as capacity-driven MIMO systems exploit

Fig. 5. Performance comparison in fat channel case, N = 3;N = 1.

more transmit antennas. On the other hand, our SDE methods,with their near-optimum performance, are very promisingcandidates to bring the potency of MIMO systems to practice.

B. BER Performance in Rank-Deficient MIMO Channels

Rank-deficient MIMO channels exist in many wireless sce-narios, such as in mobile downlink transmission, where thereis typically more transmit than receive antennas. A so-calledfat channel matrix arises when , which meansthat has more columns than rows, and its pseudo-inverseno longer exists. This poses a significant challenge for symboldetection since a input data vector is projected ontoan output/observation space of a smaller dimension[20]. Such a channel is identifiable only when each distinctfinite-alphabet input can be mapped into a distinct andresolvable output when free of noise. Even when the systemidentifiability condition is satisfied, a rank-deficient channelis difficult to process. First, many detection techniques thatrequire channel invertibility do not apply. This includes thelinear zero-forcing and MMSE detectors [22] and the originalPDA MUD filter [14]. Second, even when a detector does notface implementation difficulty, its detection performance mayexhibit an unacceptably large noise floor. Examples includeFIR MMSE-DFE and SD. In MMSE-DFE, a fat channel matrixseverely reduces the power efficiency of the feedforward filter,which in turn compromises ISI cancellation in the feedbackfilter design. The original SD judiciously uses the latticestructure of the finite-alphabet input data to perform quasi-MLsearch at a low complexity. Unfortunately, such a lattice searchis infeasible for [12], and the generalized SD(GSD) does not preserve optimality due to a reduced-dimensionlattice projection [20].

To investigate the behavior of our SDE methods in thefat-channel case, we choose a MIMO setup withreceive antenna, transmit antennas, and a block size of

. The corresponding channel matrix is thus 13 15 indimension. In Fig. 5, the performance of channel equalizationby brute-force ML is plotted as a baseline, along that ofSDE-1, SDE-2, GSD, and MMSE-DFE. The BER values for


both MMSE-DFE and GSD stay above , even for highSNR. The SDE methods also incur considerable performancedegradation compared with the optimum ML but do not seem toexhibit an error floor. Intuitively, soft-decision-based methodswith Gaussian forcing track the composite covariance of MAIcomponents as noise. Even when there is a rank reduction, orsome MAI components are too close in the signal space, thecomposite noise effect could still retain full rank under theill-conditioned channel, thus leading to convergence in symboldetection. The performance of SDE in this case could be poten-tially improved by input ordering inside the iterative process.Since soft decisions are updated sequentially based on tentativedecisions made previously, a good input-ordering mechanism,which starts the soft-decision estimation from stronger andwell-identifiable inputs to weaker and less-separable inputs,can be expected to enhance the detection performance, possiblyto near-optimum. Finding effective input-ordering methods forSDE is one of our future research topics. Other prominent so-lutions to tackle rank-deficient channels include complex-fieldtransmitter precoding, which will not be discussed here, as wefocus on channel equalization at the receiver end.

C. Complexity Evaluation

As we explained in Section II, the input size in aMIMO system may come from multiple access and/or multipleantennas; therefore, it is potentially very large for a high-ca-pacity MIMO system. The computational load of the optimumML detection is , where is the alphabet size of theinput data. Such complexity is infeasible for a high-capacity(large ), high-throughput (large ) system, which motivatesthe search for near-optimal, low-complexity symbol detectionand channel equalization solutions.

Quasi-ML by sphere decoding entails polynomial complexityon the order of , [12],where is a lower bound for the eigenvalues of the Grammatrix , and is the square of the initial searchingradius. Choosing a large value for improves the BER perfor-mance but also incurs higher complexity. Typically, close-to-op-timal performance can be achieved at a polynomial complexityindex between 3–6.

The SDE algorithms are iterative routines that compute theposterior probabilities of each input symbol in a sequentialfashion. The number of iterations required for convergencevaries from one input to another. Based on our MIMO setups,we have observed from simulations that the posterior proba-bilities typically converge in three to five iterations for higherSNR ( dB), and in seven to 14 iterations for lower SNR( dB). In each iteration, the computational load is mainlycomposed of two parts: One results from computingand the other from evaluating the posterior probabilities using(9) or (27). If we define as the sliding window length interms of the number of sub-blocks, then the number of symbolswithin the window is defined by . The value isalso the size of the covariance matrix . In the followingdiscussions, we evaluate the performance-complexity tradeoffin choosing the window size .

1) Full-Size Windowing : This case correspondsto the SDE-1 algorithm. There is only one full-size window;

Fig. 6. Performance versus m.

therefore, all the output sub-blocks are processed simultane-ously. The dimension of the covariance matrix and theauxiliary matrix are both determined by . Because thematrix inversion lemma is used, the complexity of computing

is . The complexity involved in evaluatingposterior probabilities is per symbol. Hence, theoverall complexity per symbol is on the order of .The complexity for detecting symbols in one iteration is thengiven by . Noting , the complexity ofSDE-1 per symbol is on the second-order polynomial in theinput size and is only linear to the constellation size .

2) Optimum Window Size : This casecorresponds to the SDE-2 algorithm. SDE-2 takes advantageof the sparse Toeplitz structure of the channel matrix to reducethe equalization complexity. The sliding window only containsthe nonzero part of the channel response vector. A symbol canat most affect output blocks in a -memory channel.These output blocks form the sufficient statistics ofeach symbol. As a result, SDE-2 can retain the near-optimumperformance and, at the same time, save the computationalcost. In SDE-2, computing and updating the auxiliarymatrix costs . Finding the pdfs of each symboltakes operations. The overall complexity per symbolis then , and the complexity per iteration is

. Compared with SDE-1, the dimensionof each conditional covariance matrix is reduced from

(corresponding to in SDE-1) to , thuslowering the complexity order of from 2 to 1 per input. Moreimpressively, such a complexity reduction does not inducenoticeable BER performance loss.

3) Suboptimum Window Size : Theimplementation procedures for SDE-2 can be used when thewindow size . As shrinks, the complexity

decreases. However, since the slidingwindow only covers a portion of the channel response for theintended input, the algorithm does not make full use of the suf-ficient statistics, leading to performance degradation. In Fig. 6,we plot the BER curve at a sliding window size of , along


Fig. 7. Complexity versus N .

Fig. 8. Complexity versus N .

with SDE-1 and SDE-2 . It is shown thatthe small window size yields inferior performance to the othertwo near-optimum schemes. In fact, is the optimalwindow length because a longer length does not render betterperformance, but the complexity increases, whereas a shorterlength sacrifices the performance. When , the algorithmbecomes a symbol-by-symbol equalization technique, andtherefore, the performance cannot match up to that of sequencedetection in ISI channels.

The complexity evaluation results are further verified by sim-ulations in Figs. 7 and 8. The system parameters are set to

, and and . The computationalload in terms of the number of operations versus the data blocksize is depicted for each detection method in Fig. 7. OurSDE methods, along with SD and MMSE-DFE, avoid asymp-totic computational explosion suffered by the brute-force MLalgorithm at a large data size. More detailed comparison of theselow-complexity algorithms are illustrated in Fig. 8. In this simu-lation setting, the sphere decoder has the same third-order com-plexity in as the SDE-1 algorithm. This is not always the case,

as the complexity of SD could be higher if the Gram matrixhas very small eigenvalues, and the search radius is chosen tobe large. The overall complexity of the SDE-2 algorithm is be-tween SD and SDE-1, but asymptotically, its complexity orderin is only 2 instead of 3, as witnessed by its close match withthe function at large . Such a reduction in the complexityorder will pay off for high-capacity MIMO systems.

V. SUMMARY

Soft-decision based equalization techniques have been de-veloped in this paper for frequency selective MIMO multipathchannels. Relying on iterative posterior probability updatingand PDA-type Gaussian forcing, the proposed SDE algorithmsattain remarkable near-ML performance at low complexitythat is polynomial (on the third-order) in the input and outputsizes and linear in the modulation constellation size. Unlikeexisting fast MUD algorithms, our development for MIMOchannel equalization relies on zero-padded block transmissionto enable block detection for a sequence detection problemand capitalizes on the distinct Toeplitz channel structure tosimplify the equalization complexity. Near-optimum symboldetection in the presence of channel memory is attained byvirtue of soft-decision MAP multiuser detection, multistage ISIcancellation, and implicit noise tracking. Our algorithms alsoapply to rank-deficient channels, provided that the channelsare identifiable for the signal constellation. This work not onlyestablishes the soft-decision approach as an attractive candidatefor near-optimum channel equalization and symbol detectionfor practical MIMO systems but also enhances the applicabilityof the original PDA filter to a generic integer-LS problemthat is omni-present in many signal processing and wirelesscommunications applications.

REFERENCES

[1] G. J. Foschini and M. J. Gans, “On the limits of wireless communicationsin a fading environment when using multiple antennas,” Wireless Pers.Commun., no. 6, pp. 315–335, 1998.

[2] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linearcodes for minimizing symbol error rate,” IEEE Trans. Inform. Theory,vol. 44, pp. 744–765, Mar. 1998.

[3] M. V. Eyuboglu and S. U. Qureshi, “Reduced-state sequence estimationwith set partitioning and decision feedback,” IEEE Trans. Commun., vol.36, pp. 12–20, Jan. 1988.

[4] A. Duel-Hallen and C. Heegard, “Delayed decision-feedback sequenceestimation,” IEEE Trans. Commun., vol. 37, pp. 428–436, May 1989.

[5] A. Duel-Hallen, “Equalizers for multiple-input multiple-output channelsand PAM systems with cyclostationary input sequences,” IEEE J. Select.Areas Commun., vol. 10, pp. 630–639, Sept. 1992.

[6] N. Al-Dhahir and A. H. Sayed, “The finite-length multi-inputmulti-output MMSE-DFE,” IEEE Trans. Signal Processing, vol. 48,pp. 2921–2936, Oct. 2000.

[7] A. Lozano and C. Papadias, “Layered space-time receivers for frequencyselective wireless channels,” IEEE Trans. Commun., vol. 50, pp. 65–73,Jan. 2002.

[8] C. Komninakis, C. Fragouli, A. H. Sayed, and R. D. Wesel, “Multi-inputmulti-output fading channel tracking and equalization using Kalman es-timation,” IEEE Trans. Signal Processing, vol. 50, pp. 1065–1076, May2002.

[9] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellationand decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, pp.1046–1061, July 1999.

[10] M. Tuchler, R. Koetter, and A. C. Singer, “Turbo equalization: Principlesand new results,” IEEE Trans. Commun., vol. 50, pp. 754–767, May2002.


[11] G. Bauch and N. Al-Dhahir, “Reduced-complexity space-time turbo-equalization for frequency selective MIMO channels,” IEEE Trans.Wireless Commun., vol. 1, pp. 819–828, Oct. 2002.

[12] U. Finkce and M. Pohst, “Improved methods for calculating vectors ofshort length in lattice, including a complexity analysis,” Math. Comput.,vol. 44, pp. 463–471, Apr. 1985.

[13] W. K. Ma, T. N. Davidson, K. M. Wong, Z. Q. Luo, and P. C. Ching,“Quasimaximum-likelihood multiuser detection using semi-definite re-laxation with application to synchronous CDMA,” IEEE Trans. SignalProcessing, vol. 50, pp. 912–922, Apr. 2002.

[14] J. Luo, K. R. Pattipati, P. K. Willett, and F. Hasegawa, “Near-optimalmultiuser detection in synchronous CDMA using probabilistic data as-sociation,” IEEE Commun. Lett., vol. 5, no. 9, pp. 361–363, 2001.

[15] D. Pham, J. Luo, K. R. Pattipati, and P. K. Willett, “A PDA-Kalman ap-proach to multiuser detection in asynchronous CDMA,” IEEE Commun.Lett., vol. 6, no. 11, pp. 475–477, 2002.

[16] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Tech-niques, and Software. Norwell, MA: Artech House, 1993.

[17] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky,“Detection algorithm and initial laboratory results using the V-BLASTspace-time communication architecture,” Electron. Lett., vol. 35, no. 1,pp. 14–15, 1999.

[18] Z. Tian, “Soft decision feedback equalization for pre-coded FIR-MIMOsystems,” in Proc. 2nd IEEE Sensor Array Multichannel Signal Process.Workshop, Rosslyn, VA, Aug. 2002.

[19] Z. Wang and G. B. Giannakis, “Wireless multicarrier communica-tions—Where Fourier meets Shannon,” IEEE Signal Processing Mag.,vol. 17, pp. 29–48, May 2000.

[20] M. O. Damen, K. Abed-Meraim, and J.-C. Belfiore, “A generalized lat-tice decoder for asymmetrical space-time communication architecture,”in Proc. ICASSP, 2000, pp. 2581–2584.

[21] S. Haykin, Adaptive Filter Theory, 3rd ed. Upper Saddle River, NJ:Prentice-Hall, 1996.

[22] S. Verdú, Multiuser Detection. Cambridge, U.K.: Cambridge Univ.Press, 1998.

[23] Channel Models for HIPERLAN/2 in Different Indoor Scenarios, 1998.[24] R. D. J. van Nee, G. A. Awater, M. Morikura, H. Takanashi, M. A. Web-

ster, and K. W. Halford, “New high-rate wireless LAN standards,” IEEECommun. Mag., vol. 37, pp. 82–88, Dec. 1999.

Shoumin Liu (S’02) received the B.S. degreein electrical engineering and the M.S. degree incomputer engineering from Harbin Institute ofTechnology, Harbin, China, in 1993 and 1996,respectively. He is currently pursuing the Ph.D.degree in the Department of Electrical and ComputerEngineering at Michigan Technological University,Houghton.

His research interests include multiple-inputmultiple-output communication systems, soft-inputsoft-output decoding, and near-optimum soft-deci-

sion multiuser detection and equalization.

Zhi Tian (M’98) received the B.E. degree in elec-trical engineering (automation) from the Universityof Science and Technology of China, Hefei, in 1994and the M.S. and Ph.D. degrees from George MasonUniversity, Fairfax, VA, in 1998 and 2000, respec-tively.

From 1995 to 2000, she was a graduate researchassistant with the Center of Excellence in Command,Control, Communications, and Intelligence (C3I),George Mason University. Since August 2000, shehas been an Assistant Professor with the Department

of Electrical and Computer Engineering, Michigan Technological University,Houghton. Her current research focuses on signal processing for wirelesscommunications. She has worked in the areas of digital and wireless communi-cations, statistical array and signal processing, target tracking and data fusion,decision networks, and information fusion.

Dr. Tian serves as an Associate Editor for IEEE TRANSACTIONS ON WIRELESS

COMMUNICATIONS. She is the recipient of a National Science Foundation CA-REER award.

ieee transactions on signal processing, vol. 52 ...after zero padding, each -long...

Documents