parameter estimation of superimposed signals using the em

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, A N D SIGNAL PROCESSING, VOL. 36, NO. 4, APRIL 1988 471

Parameter Estimation of Superimposed Signals Using the EM Algorithm

MEIR FEDER AND EHUD WEINSTEIN

Abstract-We develop a computationally efficient algorithm for parameter estimation of superimposed signals based on the EM algorithm. The idea is to decompose the observed data into their signal components and then to estimate the parameters of each signal component separately. The algorithm iterates back and forth, using the current parameter estimates to decompose the observed data better and thus increase the likelihood of the next parameter estimates. The application of the algorithm to the multipath time delay and to the multiple source location estimation problems is considered.

I. INTRODUCTION HE general problem of interest here may be characterized using the following model:

K T

y ( t ) = S k ( t ; 0,) -k n ( t ) k = 1

where 0, are the vector unknown parameters associated with the kth signal component and n ( t ) stands for the additive noise. This model covers a wide range of problems involving superimposed signals. The specific problem we have in mind is multiple source location estimation; in that case, ( t ; 0,) are the array signals observed from the kth source, and 0 k are the unknown source location parameters.

In this paper, we develop a computationally efficient scheme for the joint estimation of e , , - , OK based on the Estimate Maximize (EM) algorithm. The idea is to decompose the observed data y ( t ) into its signal components and then estimate the parameters of each signal component separately. The algorithm iterates back and forth, using the current parameter estimates to decompose the observed data better and thus improve the next parameter estimates. Under the stated regularity conditions, the algorithm converges to a stationary point of the likelihood function where each iteration cycle increases the likelihood of the estimated parameters.

Manuscript received February 20, 1986; revised September 28, 1987. This work was supported in part by the MIT-WHO1 Joint Program, in part by the Advanced Research Projects Agency monitored by ONR under Con- tract NO00144 1-K-0742 and the National Science Foundation under Grant ECS-8407285 in MIT, and in part by the ONR under Contract N00014-85- K-0272 in WHOI. It is WHO1 Contribution 6584.

M. Feder is with the Department of Electrical Engineering and Com- puter Science, Massachusetts Institute of Technology, Cambridge, MA 02139, and the Department of Ocean Engineering, Woods Hole Oceano- graphic Institution, Woods Hole, MA 02543.

E. Weinstein is with the Department of Electronic Systems, Faculty of Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel, and the De- partment of Ocean Engineering, Woods Hole Oceanographic Institution, Woods Hole, MA 02543.

IEEE Log Number 8719013.

The results developed in this paper can be viewed as a generalization of the results presented by the authors in [l] and [2]. We note that the idea of iteratively decomposing the observed signal and estimating the parameters of each signal component separately has been previously proposed in several applications (e.g., [3]-[5]). How- ever, in most cases, the approach is ad hoc, and there is no proof of convergence of the algorithm. As will be shown, the EM method suggests a very specific way of decomposing the observed signals, leading to a numeri- cally as well as statistically stable algorithm.

The paper is organized as follows. In Section I1 the EM method is represented following the derivation in [6]. In Section I11 the EM method is applied to the parameter estimation of superimposed signals, and the basic algorithm is developed for deterministic (known) signals and for stationary Gaussian signals. In Section IV the algorithm is applied to the multipath time-delay estimation, in Section V the algorithm is applied to the multiple source angle of arrival estimation, and in Section VI we sum- marize the results.

11. MAXIMUM LIKELIHOOD ESTIMATION V I A THE EM ALGORITHM

The EM algorithm, developed in [6], is a general method for solving maximum likelihood (ML) estimation problems given incomplete data. The considerations leading to the EM algorithm are given below.

Let Y denote the observed (incomplete) data, possess- ing the probability densityfy( y; 0 ) indexed by the parameter vector 0 E 8 C R K , and let X denote the “complete” data, related to Y by

H(X) = Y (1 )

where H ( ) is a noninvertible (many-to-one) transformation. Express densities

f a x ; 0 ) = f x / Y = y ( x ; 0 ) * f r (y ; 0) v H ( x ) = Y

( 2 ) where f x ( x ; 0 ) is the probability density associated with X and f X l y = y (x; 0 ) is the conditional probability density of X given Y = y. Taking the logarithm on both sides of (2) 9

lOgfy(Y; 0 ) = logfx(x; 0) - logfx/Y=y(x; 0 ) . ( 3 ) Taking the conditional expectation given Y = y at a

0096-3518/88/0400-0477$01 .OO O 1988 IEEE

478 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 4. APRIL 1988

parameter value e’, logfy(y; 0) = E{logfx(x; e ) / Y = Y; e’}

- E{logfx/y=y(x; e ) / Y = y; O r } .

( 4 )

L ( e ) = logfy(Y; e > ( 5 )

u(e, e l ) = E { iogfx(x; O ) / Y = Y; o r } ( 6 )

v(e, e!) = E{ iogfx/y=y(x; e ) /y = Y; e r } . ( 7 )

~ ( e ) = u(e, er ) - q e , o r ) . ( 8 )

Define, for convenience,

and

With these definitions, (4) reads

We identify L ( 8 ), the log-likelihood of the observed

cannot be found, and thus the computation of U ( e , O r ) at each iteration cycle generally requires multiple integra- tion. Appendix A has importance of its own since it covers a wide range of problems.

We note that the EM algorithm is not uniquely specified. The transformation H ( ) relating X to Y can be any noninvertible transformation. Obviously, there are many possible “complete” data specifications X that will gen- erate the observed data Y . However, the choice of “complete” data may critically affect the complexity and the rate of convergence of the algorithm, and the unfortunate choice of “complete” data may yield a completely use- less algorithm.

In the next section, it will be shown that for the class of problems involving superimposed signals, there is a natural choice of “complete” data, leading to a surpris- ingly simple algorithm to extract the ML parameter estimates.

data, as the function we want to maximize. Invoking the Jensen’s inequality,

(9)

111. PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS IN NOISE

The general problem of interest here may be character- v(e, e l ) 5 v(er, e r ) . Hence, if ized using the following model:

u(e, e r ) > u(w, e r ) then

~ ( e ) > q w ) . (10)

The relation in (10) forms the basis for the EM algorithm. The algorithm starts with an arbitrary initial guess 6“’, and denote by the current estimate of 8 after n iterations of the algorithm. Then, the next iteration cycle can be described in two steps, as follows:

E step: Compute

u(e, 6 ( n ) ) . (11)

(12)

M step:

Max U ( 0 , 6‘”)) + 6(”+’) . 0

If U ( 0 , e‘) is continuous in both 8 and O r , the algorithm converges to a stationary point of the log-likelihood function [7] where the maximization in (12) ensures that each iteration cycle increases the likelihood. Of course, as in the case of all “hill climbing” algorithms, the convergence point may not be the global maximum of the likelihood function, and thus several starting points may be needed. The rate of convergence of the algorithm is exponential, depending on the fraction of the covariance of the “complete” data that can be predicted using the observed (incomplete) data [6], [8]. If that fraction is small, the rate of convergence tends to be slow, in which case one could use standard numerical methods to accel- erate the algorithm.

In the Appendix we derive a closed-form analytical expression for U ( 8, O r ) for the case where X and Y are jointly Gaussian, related by a linear transformation. We note that in general a closed-form analytical expression

K

where Ok are the vector unknown parameters associated with the kth signal component and n ( t ) denotes the additive noise. This model covers a wide range of problems arising in array and signal processing, including multitone frequency estimation, multipath time-delay estimation, and multiple source location estimation.

Given observations of y ( t ) , we want to find the ML estimate of e l , e*, * - * , 0 K jointly. We shall now consider the cases of deterministic (known) signals and stochastic Gaussian signals separately.

A. Deterministic Signals Consider the model of (13) under the following as-

sumptions. 1) Thes,( t ; Q k ) , k = 1, 2, * - , K, are conditionally

known up to the vector parameters O k . 2) The n ( t ) are vector zero-mean white Gaussian pro-

cesses whose covariance matrix is E { n ( t ) n* ( a) } = Q

Under these assumptions, the log-likelihood function is - 6 ( t - a).

given by (e.g., see [9])

K

Q-’[y( t ) - k = 1 ~ d t ; e,)] dt (14)

where = 1 if n ( t ) is real-valued, X = 2 if n ( t ) is complex-valued, and c is a normalizing constant. In the case of discrete-time observations, L ( 8 ) is still given by (14) where the integral is replaced by the sum over all discrete

FEDER A N D WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS 419

points t E T. Thus, to obtain the ML estimate of the various Ok’s, one therefore must solve

In that case, the log-likelihood of the complete data x( t) is

K * A - ’ [ x ( t ) - s ( t ; e ) ] dt Q-’[Y(t) - C S k ( t ; e k ) ] d t ] . (15)

= d + s * ( t ; e ) A - ’ x ( t ) dt k- I

2 T This is a complicated multiparameter optimization problem. Of course, brute force can always be used to

coarse grid to locate roughly the global minimum and then

+ l x*(t) A - ’ s ( t ; e ) dt solve the problem, evaluating the objective function on a 2 T

applying the Gauss method, the Newton-Raphson, or - 1 J s* ( t ; e, A - ~ s ( t ; e ) dt ( 2 2 ) some other gradient-search iterative algorithm. However, 2 T when applied to the Problem at hand, these methods tend where d contains all the terms that are independent of 0.

s (t; 0 ) and A are the mean and the covariance matrix of x(t) given, respectively, by

to be computationally complex and time consuming. Having the EM algorithm in mind, we want to simplify

the computations involved. To apply the algorithm to the problem- at hand, the first step is io specify the “complete” data. A natural choice for the “complete” data x ( t ) is obtained by decomposing the observed data y ( 1 ) into its signal components; that is,

1 . 1 \ I

where

Xk(t) = sk(t; e k ) + nk(t) (17)

where the s(t) are obtained by arbitrarily decomposing the total noise n ( t ) into the K components, so that

where the notation in (24) indicates that A is a block-diagonal matrix. Taking the conditional expectation of (22) given y ( t ) at a parameter value O f ,

K

From (13), (17), and (18), the relation between the “complete” data x ( t ) and the incomplete data y ( t ) is given by

K

y ( t ) = C xk(t) = H ’ X(t) (19) k = 1

where

L ( 2 0 ) H = [ Z Z . * * Z ] . v

Kterms

We shall find it most convenient to let the nk( t) De statistically independent, zero-mean, and Gaussian with the covariance matrix E{ nk(t) nk*(u)} = QkS(t - a) where Qk = &Q and the &’S are arbitrary real-valued scalars satisfying

s*( t ; e) A - ’ i ( t ) dt

* A - ’ [ i ( t ) - s ( t ; e)] dt ( 2 5 )

where

f ( t ) = E{x(t)/y(t); O f } ( 2 6 )

and e is a constant independent of 8. Since x ( t ) and y ( t ) are jointly Gaussian, related by the linear transformation y ( t ) = Hx(t), then using (A7) of the Appendix,

480 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36, NO. 4, APRIL 1988

Substituting (23) and (24) into (25) and following straightforward matrix manipulations, we obtain

X K w e , e') = e - ,zl S, [w - sk(t ; e,)] *

Q i ' [ % ( t ) - s , ( t ; e , ) ] dt (28) where the a,( t ) are the components of P ( t ) given by

i , ( t ) = s , ( t ; ett) + O , [ y ( t ) - k = i 1 S k ( t ; e;)]. (29)

Observing that the maximization of V ( 0 , e') with respect to 8 is equivalent to the minimization of each of the terms in the k sum of (28) separately, the EM algorithm assumes the following form.

Estep: Fork = 1, 2, - , K , compute

i :" ' ( t ) = s, ( t ; ep) + 0, [ y ( t ) - ,Il c s , ( t ; e r n ) ) ] .

( 30 )

Mstep: Fork = 1, 2, * - * 7 K ,

min j [ i f ' ( t ) - sk(t; e , ) ]* e k T

Q - ' [ i r ' ( t ) - sk( t ; e,)] dt -+ @ F + l ' . (31) We observe that @p+" is, in fact, the ML estimate of

8, based on a:"' ( t ) . The algorithm is illustrated in Fig. 1 . We note that in the case of discrete observations, the integral in (31) is substituted by the sum over all points t E T.

The most striking feature of the algorithm is that it decouples the complicated multiparameter optimization into K separate ML optimizations. Hence, the complexity of the algorithm is essentially unaffected by the assumed number of signal components. As K increases, we have to increase the number of ML processors in parallel; however, each ML processor is maximized separately. Since the algorithm is based on the EM method, each iteration cycle increases the likelihood until convergence is accomplished.

We note that the 0,'s must satisfy the constraint stated in (21), but otherwise they are arbitrary free variables in the algorithm. The 0,'s can be used to control the rate of convergence of the algorithm and possibly to avoid the convergence to an unwanted stationary point of the algorithm. These considerations are currently under investi- gation.

B. Gaussian Signals Consider the model of (13) under the following as-

sumptions. 1) n ( t ) and s , ( t ; e,), k = 1, 2, * - , K , are mutually

uncorrelated, wide-sense stationary (WSS), zero-mean Gaussian vector stochastic processes whose spectral density matrices areN(w) andSk(w; e,), k = 1, 2, * - * , K , respectively.

- , , " + / I

4:"' fil") 81

Fig. 1 . The EM algorithm for deterministic (known) signals.

2) The observation interval T is long compared to the correlation time (inverse bandwidth) of the signals and the noise, Le., WT/2n >> 1 .

Fourier analyzing y ( t ) , we obtain

Under the above assumptions, y ( t ) is WSS, zero-mean, and Gaussian. Thus, for WT/27r >> 1 , the Y ( w l ) ' s are asymptotically uncorrelated, zero-mean, and Gaussian with covariance matrix P ( w l ; e ) where P ( w ; 0 ) is the spectral density matrix of y ( t ) given by

K

P ( W ; e) = C e,) + N(u). (33) k = 1

The log-likelihood function given the Y ( q ) ' s is therefore given by

q e ) = - C [logdet rp (wl ; e) 1

+ Y*(Wl) P - ' ( q ; 0) y ( 4 ] (34)

where the summation in (34) is carried over all wI in the signal frequency band. In the case of discrete observations y , = y (iAt), the log-likelihood is still given by (34) where the Y ( w l ) ' s are the discrete Fourier transform (DFT) of the y l ' s . To obtain the ML estimate of the various Ok's, we therefore must solve

min [T [logdet P(w,; e) 81,82,' ' ' ,8k

+ Y * ( ~ , ) P - ' ( ~ ~ ; e) Y ( ~ ~ ) ] . (35) I We want to use the EM method to bypass this compli-

cated multiparameter optimization. Following the same considerations as in the deterministic signal case, let the "complete" data x ( t ) be given by

where

T l

FEDER A N D WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS 481

and the n , ( t ) are chosen to be mutually uncorrelated, zero-mean, and Gaussian with respective spectral density matrices Nk( w ) = 0, - N ( w ) where the &'S are arbitrary real-valued constants subject to (2 1).

Following the same considerations leading to (34), the log-likelihood of the "complete" data is given by

logfx(x; e ) = -C [logdet T A ( ~ , ; e) 1

+ X * ( 4 0) X(4] = - 7 [logdet T A ( u [ ; 0)

+ tr (A-'(ai; 0) X ( w i > X * ( w 1 ) ) ] (38)

where

(39)

Exploiting the block-diagonal form of A ( w ; e), (43) becomes

where P ( w l ; e ) is defined in (33). Observing that the maximization of U ( 8, e') with respect to 8 is equivalent to the minimization of each of the terms in the k sum of (46) separately, the EM algorithm assumes the following form.

(40)

Mstep: Fork = 1, 2,

min C [ logdet A,( w l ; e,) , K ,

A,(@; 0,) = &(w; 0,) + P k ' N ( ' " ) . (42) Taking the conditional expectation of (38) given Y ( w l )

at a parameter value e', 8' I

We observe that 6?+" is the ML estimate of 0, where the sufficient statistic X,( w,) X,* (or) is substituted by its

current estimate X k ~ l ) ' n ' . The algorithm is illustrated in Fig. 2. The most attractive feature of the algorithm is that it decouples the full multidimensional optimization (35) into optimizations in the smaller-dimensional parameter subspaces. Thus, as in the deterministic signal case, the complexity of the algorithm is essentially unaffected by the assumed number of signal components. As K increases, we have to increase the number of ML processors in parallel; however, an ML processor is maximized separately. Since the algorithm is based on the EM method, each iteration cycle increases the likelihood until convergence is accomplished.

482 IEEE TRANSACTIONS ON ACOUSTICS. SPEECH, AND SIGNAL PROCESSING. VOL. 36, NO. 4. APRIL 1988

$"I A i " ) -, Fig. 2. The EM algorithm for stochastic Gaussian signals.

IV. APPLICATION TO MULTIPATH TIME-DELAY ESTIMATION

Let the observed signal y ( t ) be modeled by

K

y ( t ) = kzl a k s ( l - rk) + n ( t ) . (50 )

This model characterizes multipath effects where the transmitted signal s ( t ) is observed at the receiver through more than one path. We shall concentrate here on the case where s ( t ) is a deterministic known signal. The extension to the case where s ( t ) is deterministic but unknown, and to the case where s ( t ) is a sample function from a Gauss- ian random process, can be found in [ l l ] and [l] , respectively. We suppose that the additive noise n ( t ) is spectrally flat over the receiver frequency band. The problem is to estimate the pairs (q, rk), k =

The direct ML approach requires (1511

1 , 2 , e - . , K . the solution to [see

This optimization problem is addressed in [lo], where it is shown that, at the optimum, the (Yk's can be expressed in terms of the Tk's. Thus, the 2 K-dimensional search can be reduced to a K-dimensional search. Howeyer, as pointed out in [lo], for K 2 3 the required computations become too involved. Consequently, ad hoc approaches and suboptimal solutions have been proposed. The most common solution consists of correlating y ( t ) with a rep- lica of s ( t ) and searching for the K highest peaks of the correlation function. If the various paths are resolvable, i.e., the difference between Tk and T~ is long compared to the temporal correlation of the signal for all combinations of k and 1, this approach yields near-optimal estimates. However, in situations where the signal paths are unre- solvable, this approach is only distinctly suboptimal.

We identify the model in (50) as a special case of (13). Therefore, in correspondence with (30) and (31), we obtain the following algorithm.

Estep: Fork = 1 , 2, * , K, compute

$ ' ( t ) = &u:"'s(t - $')

1 + bk[ y ( t ) - 5 &l"'s(t - ?I"') . (52) I = 1

M s t e p : Fork = 1 , 2, * , K,

(53)

Assuming that the observation interval T is long compared to the duration of the signal and the maximum ex- pected delay, the two-parameter maximization required in (53) can be carried out in two steps as follows:

(54)

where E = j T I s ( t ) I dt is the signal energy and P

(56) &r) = JT i l " ' ( t ) s * ( t - r ) dt.

We note that gp'( r ) can be generated by passing if'( t ) through a filter matched to s ( t ) . The algorithm is illustrated in Fig. 3. This computationally attractive algorithm decreases iteratively the objective function in (5 1 ) without ever going through the indicated multiparameter optimization. The complexity of the algorithm is essentially unaffected by the assumed number of signal paths. As K increases, we have to increase the number of matched fil- ters in parallel; however, each matched filter output is maximized separately.

If the number K of signal paths is unknown, several criteria for its determination are developed in [ 121. These criteria are based on the ML parameter estimates and can therefore be easily incorporated into the algorithm.

To demonstrate the performance of the algorithm, we have considered the following example. The observed signal y ( t ) consists of three signal paths in additive noise

3

y ( t ) = c (.kS(t - 7 k ) + n ( t ) k = I

where s ( t ) is the trapezoidal pulse

O ~ t < 5

s(t) = 1/4 5 j t 5 1 5 [i:: 10)/20 15 < t 20.

The actual delays are

71 = 0 7 2 = 5 7 3 = 10,

and the amplitude scales are

(Yk = 1 K = 1 , 2, 3.

The additive noise is spectrally flat with a spectral level of N = 0.025 (so that the postintegration signal-to-noise ratio (SNR) per pulse is approximately 16 dB). The observed data consist of 100 time samples, as illustrated in

l ' - 1 1 1

FEDER AND WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS

PEAK n ("+I! n ( " + I /

I S E L E C T O R k a ' "I -- Y ill I SIGNAL I

n ( " t l ) ("*I! 'K , T K

MATCHED PEAK FILTER SELECTOR

Fig. 3. The EM algorithm for multipath time-delay estimation.

Fig. 4. In Fig. 5 we have plotted the matched filter output as a function of delay. As we can see, the conventional method cannot resolve the various signal paths and estimate their parameters.

First, we have computed the ML estimates by direct minimization of the objective function in (51) using ex- haustive search. We obtain

?I = 0.0117 ?2 = 5.0031 ?3 = 9.9884

& I = 1.1511 &2 = 0.7799 &3 = 0.9471,

and the value of the objective function at the minimum (corresponds to the value of the log-likelihood function at the maximum) is

J = 0.45879.

We have also computed lower bounds on the root mean square (rms) error of each parameter estimate using the Cramer-Rao inequality. We obtain

a(?l) = 0.028 a(?2) = 0.030 a(?3) = 0.028

~ ( & 1 ) = 0.076 ~ ( & 2 ) = 0.079 ~ ( & 3 ) = 0.076.

a ( ? k ) denotes the minimum attainable rms error in the Tk estimate, and ( T ( & k ) denotes the minimum attainable rms error in the (Yk estimate.

We now apply our algorithm. In Fig. 6 we have plotted the matched filter response to the various signal paths, as they are evolved with the iterations. In Fig. 7 we have tabulated the results using several arbitrarily selected starting points as indicated by the first line of each table. We see that after 10-15 iterations, the algorithm converges within the Cramer-Rao lower bound to ML estimates of all the unknown parameters, regardless of the initial guess. We note that the small differences in the fi- nal estimates result from the finite grid used for the optimization.

Using the asymptotic efficiency and lack of bias of the ML estimates, we can claim with some confidence that the rms error performance of the algorithm is the mini-

483

I -40 -30 -20 -10 0 10 20 30 40 50 60

Fig. 4 . The observed data.

Fig. 5 . The conventional matched filter response.

-80 -60-40-20 0 20 40 60 80

-+: i I : I -80-60-40-20 0 20 40 60 80

mum attainablk characterized by the- Cramer-Rao lower Fig. 6. The matched filter response to each signal path

bound.

V. PASSIVE MULTIPLE SOURCE LOCATION ESTIMATION The basic system of interest here consists of K spatially

distributed sources radiating noise-like signals toward an

propagation conditions in the medium, and ignoring amplitude attenuations of the signal wavefronts across the

array, the actual waveform observed at the mth sensor Output is

K

array of M spatially distributed sensors. Assuming perfect y m ( t ) = S k ( f - T k m ) + n r n ( f ) k = I

m = 1,2, , M (57)

484 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. VOL. 36, NO. 4, APRIL 1988

E . . - ,o 1 , - ., . * : I., ( ' I *, I I ", . S I :- , . : , . - . , , L 11, ., - , I i , l G 23 - 4

: , .: .] , I , .l I 3 2 .. c .... s, I- -0 il 3 . . . . . . , I ...... > j . A, ,,, " 1 e .- ....... . . . . . . . . . . . . . . . . . I . . ......,,......,.. , , , . . I >.,

-

C - N

<U ~1 0 0 0 0 0 0 0 0 0 0 0 0 I> 0 i) 0

. . . . . . . . . . . . . . . . E , 0 3 u f, " c1 i) i> i J .> I, 0 0 3 0

? , , - 7 4,: 3 a , <a 5 u "1 m, u. 7 -.a c- c) c, L. P- 3 " - r- ,..I dJ .1 >I : I - 0 0 0 b 1 ld C J O1 4 h l L? - 0 0 0 0 0 0 0

" > 11, c 4 - 0 3 0 0 0 0 CI 0 0 0 0 0 . . . . . . . . . . . . . . . . -. .... - .. -. -. .- - .- .. I -. ... 4

o D o 4 c vi -0 I- 01 r i 0 r- m LI - L: o m m n o N ( I .: - n m - r i 3 r r- o t Q - N m o n w e r m v. o u- o o o m r m o o r m o o o ~ ~ u o . . . . . . . . . . . . . . . .

n"'.*, 0

C - n <I- - . . . .

I I I I ~~

o m r- P D m m ~ i l r o m b i - Q m o o m "7 o m - r a e m o P N - 0 o o y7 D m - D li: m m - o o o o o o . . . . . . . . . . . . . . . . k, , I 0 c t 1 8 - 0 o 0 0 c .a 0 0 0 . . - - I I ! I I , I I I I I I I O

o - ( 8 m o m o r a e o - : I m u %I 1 .. _. - ..

o r - - o m r m o m r- o c - - r- 2 u m o r 3 r i o Q u o P m m - * e U) - m ~ u ~ o m ~ m o m . - 3 a o n - 0- 4 n o m o o Q m .> c -.- -- o c =. - . . . . . . . . . . . . . . . . E Q P m N - 0 0 0 0 0 0 0 0 0 0 0 2 I - Q 51 m G- m I I ~ o N n a or- - o N r-

P F *a or m o m m Q - N o m m - o m P 1.1 - I- n m u? N o m r- o Q Q n -. e ,o L- u o n -3 u i n u e v o e o - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

o o N P m Q P r o o r* N m I- - N

. . . . . . . . . . . . . . . . O O O O O i J O O O O O O O O O O

o ~ N ~ ~ o - ~ - N ~ N o - P 9 i o - m m I- m N o J o -. L I Y -. -7 o m o o I o r - m -3 J m c .- o 3

n n o a I - r - r - m m m m m m o a - . . . . . . . . . . . . . . . . . 0 0 0 0 0 0 0 0 0 0 0 0 0 3 ' > o e

r- I 1

. . . . . . . . . . . . . . . . 0 0 0 0 0 0 0 0 7 0 0 0 c 0 o 3

o m - r i ~ r - c - o r - . > - - o Q ca o P m N I- o r a 3 v - ., -1 .. o o c - e m - m b? Q P- Q R D c- o b i Q a r P- r- m m m m p1 w Y w n > . . . . . . . . . . . . . . . . . 0 0 0 0 0 0 0 0 0 0 0 0 3 . 3 C. 0

o n - r d c - r Q pi w r i . 3. F .i o t-4 m m r* ra o o lil F - v 4 :, .. J o Q r r, r o m u ri 9 (1 - o o o o . . . . . . . . . . . . . . . . 0 P m k l P b> m - 3 0 0 0 0 0 u b7 0 m lr - -4 - - -I 4 .. 4 I 4

1 %-

o N r* - Q N o r 4 Q m c e m r.1 - .- O N ~ N I N - ~ - C ~ O ~ ~ N O o m 91 m r- o m n r n m - o o n ~ m r - ~ m n m ~ - o o o o o o o . . . . . . . . . . . . . . . .

o - N m o n 4 I- m P o - r q 1 1 c *> - 4 - .. - -

-Ir 11

FEDER AND WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS 485

where sk ( t ) is the kth source signal, n, ( t ) is the additive noise at the rnth sensor output, and Tkm is the travel time of the signal wavefront from the kth source to the rnth sensor.

Information concerning the various source location parameters can be extracted by measuring the various Tkm.

In the passive case, one can only measure the travel time differences, obtainable by selecting one sensor as a reference and comparing its output to that of every other sensor. If we let sensor M be the reference and set 7km = 0, then Tkm measures the travel time difference of the kth signal wavefront to the ( m , M ) sensor pair.

To simplify the exposition, suppose that the various signal sources are relatively far-field so that the observed signal wavefronts are essentially planar across the array. If we further suppose that the array sensors are colinear, then

where d, is the spacing between sensors m and M, c is the velocity of propagation in the medium, and d k is the angle of arrival of the kth signal wavefront relative to the boresight.

Substituting (58) into (57) and concatenating the various equations, we obtain

where

where yrn = d , / c . Suppose that the various sk ( t ) and the various n, ( t ) are

mutually independent, WSS , zero-mean Gaussian random processes with spectral densities & ( w ) and N, (w) , respectively (the cases where the sk( t ) are deterministic known/unknown signals are presented in [2] and [ 1 11, respectively). We want to find the ML estimates of e l , &, . . . ,

Assuming that the observation interval T is long compared to the correlation time (inverse bandwidth) of the signals and the noises, the direct ML approach requires the solution to [see (35)]

given observations of y ( t ).

T '

min c [logdet P ( q ; 0) + Y * ( q ) Bt,el,. . ,eK 1

- P-'(wl; e ) Y ( w , ) ] . (61 1 where Y ( w I ) are the Fourier transform coefficients [or the DFT in the discrete case of y ( t ) ] and

K

P ( w ; 8 ) = c s k ( w ) v(w; 0, ) v*(w; 6,) + N ( w ) k = 1

(62) where

and

1

This is a complicated multiparameter optimization problem in several unknowns. Consequently, numerous ad hoc solutions and suboptimal approaches have been proposed in the recent literature (e.g., [13]-[18]). Still, the most common approach consists of beamforming and searching for the K highest peaks. If the various signal sources are widely separated, this approach is nearly optimal. However, if the source signals are closely spaced, we are likely to obtain poor estimates of the various 0 k ' s .

Identifying the model in (59) as a special case of (13), the algorithm specified by (48) and (49) is directly appli- cable, where

Ak(w; e,) = & ( w ) V ( w ; 8,) v*(w; 6, ) + P k N ( W ) .

( 6 5 ) Now,

1 det & ( W ; 0,) = 1 + - s k ( W ) v*(Cd; 6,) [ P k

486 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. VOL. 36. NO. 4, APRIL 1988

Substituting (66) and (67) into (49) and ignoring the terms that are independent of e,, the M step can be further simplified. The resulting algorithm is as follows.

X m ) ' " ) = &(al; 8:"') - Ak(wl; 8:"') * P-'( 4; 6'"') A,( q ; 8:"') + &(a1; 8:"') P - ' ( q ; e'"')

E step: For k = 1, 2 , , K , compute

Y ( 4 Y " ( 4 (68)

P-'( w l ; e'"') A,( w l ; e ? ) ) . Mstep: Fork = 1, 2 , * 9 K ,

. N - ' ( ~ / ) e ) -+ a:"+? We note that the objective function in (69) is the array

beamformer, where the product X, ( w , ) Xz (q) is substituted by its current estimate X k m l ) ( " ) . The algorithm is illustrated in Fig. 8. This computationally attractive algorithm decreases iteratively the objective function in (61) without ever going through the indicated multiparameter optimization. The complexity of the algorithm is essentially unaffected by the assumed number of signal sources. As K increases, we have to increase the number of beamformers in parallel; however, each beamformer output is maximized separately.

To demonstrate the performance of the algorithm, we have considered the following example. The array consists of five colinear and evenly spaced sensors. There are two far-field signal sources at bearings

el = 00 e2 = 100

relative to the boresight. The array-source geometry is shown in Fig. 9. The signals and the noises are spectrally flat with S, ( w ) = S and N,,, (a) = N over the frequency band [ - W / 2 , W/2]. We assume that S I N = 1 and that WT/2n = 20 (so that the postintegration SNR per chan- nel is approximately 23 dB). The array length is taken to be L = 6X where X is the wavelength associated with the highest frequency in the signal band.

In Fig. 10 we have plotted the array beamformer response as a function of the bearing angle. As we can see, the conventional beamformer cannot resolve between the signal sources and estimate their bearings.

The ML estimates, obtained by direct minimization of the objective function in (61), are

8, = -0.0563 8 2 = 10.4556,

and the value of the objective function at the minimum (corresponds to the value of the log-likelihood function at the maximum) is

J = 159.0137.

B O

Fig. 9. Array-source geometry.

t

I 1 * -90 -72 -54 -36 -18 o 18 36 54 72 90 8

Fig. 10. The conventional beamformer.

We have also computed the Cramer-Rao lower bound on the rms error of each parameter estimate. We obtain

. (el) = 0.2680 ~ ( 8 , ) = 0.2722.

We now apply our algorithm. In Fig. 11 we have plotted the beamformer response to the various signal sources as they are evolved with iterations. In Fig. 12 we have tabulated the results using several arbitrarily selected initial guesses. We see that in all cases, after 5-10 iterations, the algorithm essentially converges, within the Cramer- Rao lower bound, to the ML estimates of all the unknown bearing parameters simultaneously, and the various signal sources are correctly resolved.

VI. DISCUSSION We have presented a computationally efficient algo-

rithm for ML parameter estimation of superimposed signals based on the EM method. The algorithm is developed

r i I I I I I

FEDER AND WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS

n - 0 1 2 3 4 5 6 7 8 9

-- - P 2.0000 1.5776 1.1220 .4932 .3716 .1572 .os00

- -0036 -. 0304 -. 0304

I l l I l I I I 4 1 :

-0.0-7.5 -5.0 -2.5 0 2.5 5.0 7.5 10.0 12.5 15.0;

t n= I

L I I I I I I I I t I * -10.0 7.5 -51) -25 0 2.5 91) 7.5 10.0 12.5 15.08

Fig. 11. The beamfonner response to each signal source.

- 8.0000 8.4652 8.9744 9.4568 9.8588 10.1268 10.2076 10.3660 10.4216 10.4404

J - 269.6614 241.1230 210.3016 184.1323 168.3571 161.7404 159.7 169 159. I956 159.0440 159.0218

n - 0 1 2 3 4 5 6 7 8 9 10 1 1 12 13

6 (n) I 4.0000 3.6412 3.2928 2.9176 2.4888 2.0332 1.5240 1.0148

.5860

.2912

.1304

.OS00 - ,0636 -. 0304

- 7.0006 7.2592 7.5272 7.8220 8.1436 8.5188 8.9476 9.4032 9. 80S2

16. 1000 10.2608 10..3680 10.4216 10.4484

- J . 378.5662 35 1.2384. 328.5127 306.1186 281.8299 254.9325 224.2842 194.4633 173.5086 163.5225 160.3573 159.3599 159.6797 159.0218

n - 0 1 2 3 4 5 6 7 8

-5.0000 -2.4692 -1.2632 - -6736 -. 3788 -. 2440 -. 1644 -. I108 -. 0840

- 13.0000 1 11.1452 1. 9492

10.7164 10.5280 10.4484 10.4216 10.4216 10.4216

J - 520.2958 294.7128 196.5459 167.7777 161.0492 159.6113 159.2048 1 59.0730 159.0405

n - 0 1 . 2 3 4

6 7 8 9 10 1 1 12 13 14 15 16

c

p ) I 7.0000 6.0532 4.9544 3.8288 2.7836 1.9528 1.3632 .9344 .6396 .4520 .3180 .2108 ,1304 .0768 .os00 .0232 - * 0036

13. ElWO 12.6996 12.3780 12.08.32 11.8420 1 1.6276 11.4132 1 1.2256 1 1.0380 10.9040 10.7960 10.7164 10.6360 10.5024 10.5555 10.5288 10.5020

J - 497.0527 448.0688 385.8327 318.2541 256.9945 214.1421 109.1836 175.1 le9 167.4013 i 63.6597 161.5851

159.6668

159.2103 159.1171 159.0536

160. 3e80

159. 33351

Fig. 12. Tables of results for the multiple source angle-of-arrival estimation.

488 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 36. NO. 4. APRIL 1988

for the case of deterministic (known) signals and for the case of stationary Gaussian signals. The most striking feature of the algorithm is that it decouples the full multidimensional search associated with the direct ML approach into searches in smaller-dimensional parameter subspaces, leading to a considerable simplification in the computations involved. The algorithm is applied to the multipath time-delay and multiple source location estimation problems; in both cases, we demonstrate the performance of the algorithm in a given example and show that the algorithm converges iteratively to the exact ML estimate of all the unknown parameters simultaneously where each iteration increases the likelihood.

We finally note that the derivation of the EM algorithm for the linear-Gaussian case (the Appendix) has importance of its own since it covers a wide range of applications.

APPENDIX DEVELOPMENT OF THE EM ALGORITHM FOR THE

LINEAR-GAUSSIAN CASE

Suppose the "complete" data X and the observed (incomplete) data Yare related by the linear transformation

Y = HX ('41)

where H is a noninvertible matrix and X possesses the following multivariate Gaussian probability density:

where h = 1 if X is real-valued and valued. Taking the logarithm of (A2),

= 2 if X is complex-

x 2

= c - - [logdet A ( 8 ) + m * ( 8 ) A-' (8 )

. m ( e ) - x*A-'(e) m ( e )

- m * ( e ) A-'(e)x + tr ( ~ ( ~ ) x x * ) ]

(A3 1 where c is a constant independent of 8. Taking the conditional expectation of (A3) given Y = y at a parameter value e',

x 2

u(8, e') = c - - [logdet a(e)

+ m * ( e ) P ( e - m * ( e ) P ( 8

where

f = E{x/Y

and

= y; 8 ' )

XX* = E{XX*/Y = y ; 8 ' ) . (A61

Since X and Y are related by a linear transformation, they are jointly Gaussian, and the conditional expectations required in (A5) and (A6) can be computed by a straightforward modification of existing results. We obtain

R = m ( w ) + n(w) H*[HA(W)H*]-' [Y - Hm((3 ' ) ] (-47)

* H A ( € ) ' ) + if*. ('48)

xx* = A ( 8 ' ) - A(8 ' ) H * [ H A ( e ' ) H * ] - '

Note that if we set 8' = 8, (A7) and (A8) are the well- known formulas for the conditional expectations in the Gaussian case (e.g., [19]).

Substituting (A7) and (A8) into (A4), the function U( 8, 8') required for the EM algorithm is given in a closed form. We observe that U( 8, 8 ' ) and log f x ( x ; 8 ) have the same dependence on 8. Maximizing U( 8, 8 ' ) with respect to 8 is therefore the same as maximizing logfx(x; 8 ) with respect to 8. Hence, the EM algorithm essentially requires the ML solution in the X model, which might be significantly simpler than the direct ML solution in the Y model.

In correspondence with (1 1) and (12), the EM algorithm for the linear-Gaussian case is given by the following.

E step: $"' = m(@"') + A(@"')

* H * [ H A ( 0 ' " ' ) H * ] - ' [ y - H m ( 8 ' " ' ) ]

(A91

( A10 )

&*(") = A ( e (") ) - A( e(")) H* [ H A ( ( j ( " ) ) H * ] -' . HA( e ( " ) ) + f(n)f(n)*.

M step:

e min {logdet A ( 8 ) + m * ( 8 )

- n-'(e) m ( e ) - d " ) * A - ' ( e ) m ( e )

- m * ( e ) +(e) 2'")

+ tr (A-'(e)&*('))} + W+l). ( A l l )

FEDER AND WEINSTEIN: PARAMETER ESTIMATION OF SUPERIMPOSED SIGNALS 489

ACKNOWLEDGMENT The authors wish to thank D. Kelter and Y. Eyal for

their help in the simulations, and the referees for many helpful comments.

REFERENCES [ l ] M. Feder and E. Weinstein, “Optimal multiple source location esti-

mation via the EM algorithm,” in Proc. I985 Int. Conf. Acoust., Speech, Signal Processing (ICASSP ’85), Tampa Bay, FL, Mar. 1985,

[2] -, “Multipath and multiple source array processing via the EM algorithm,” in Proc. I986 Int. Con$ Acoust., Speech, Signal Pro- cessing (ICASSP ’86), Tokyo, Japan, Apr. 1986.

[3] J. R. Grindon, “Estimation theory for a class of nonlinear random vector channels,” Ph.D. dissertation, Washington Univ., St. Louis, MO, 1970.

[4] J. S. Lim and A. V. Oppenheim, “All pole modeling of degraded speech,” IEEE Trans. Acoust., Speech, Signal Processing, vol.

[5] M. Segal, “Resolution and estimation techniques with application to vectorial miss-distance indicator,” M.Sc. thesis, Technion-Israel Inst. Technol., Haifa, Israel, Aug. 1981.

[6] N. M. Laird, A. P. Dempster, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Ann. Roy. Stat. Soc., pp. 1-38, Dec. 1977.

[7] C. F. J. Wu, “On the convergence properties of the EM algorithm,” Ann. Stat., no. 11, pp. 95-103, 1983.

[8] I. Meilijson, “A fast improvement to the EM algorithm on its own terms,’’ submitted for publication.

[9] H. L. Van Trees, Detection Estimation and Modulation Theory, Part I . New York: Wiley, 1968.

[ lo] J. E. Ehrenberg, T . E. Ewart, and R. D. Moms, “Signal processing techniques for resolving individual pulses in multipath signal,” J . Acoust. SOC. Amer., vol. 63, no. 6, pp. 1861-1865, 1978.

[ I l l M. Feder and E. Weinstein, “Optimal multiple source location estimation via the EM algorithm,” Woods Hole Oceanographic Institu- tion, Woods Hole, MA, Tech. Rep. WHOI-85-25, July 1985.

[12] M. Wax and T. Kailath, “Detection of signals by information theo- retic criteria,’’ IEEE Trans. Acoust., Speech, Signal Processing, vol.

[131 S. S. Reddy, “Multiple source location-A digital approach,” IEEE Trans. Aerosp. Electron. Sysr., vol. AES-15, pp. 95-105, 1979.

[14] G. V. Borgiotti and L. J. Kaplan, “Superresolution of uncorrelated interference sources by using adaptive array techniques,” IEEE Trans. Antennas Propagat., vol. AP-27, pp. 842-845, 1979.

pp. 1762-1765.

ASSP-26, pp. 197-210, 1978.

ASSP-32, pp. 387-392, 1985.

[15] R. 0. Schmidt, “A signal subspace approach to multiple emitter location and spectral estimation,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 1981.

1161 G. Su and M. Morf, “The signal subspace approach for multiple wide- band emitter location,” IEEE Trans. Acoust., Speech, Signal Pro- cessing, vol. ASSP-31, pp. 1503-1522, 1983.

[17] B. Porat and B. Friedlander, “Estimation of spatial and spectral parameters of multiple sources,” IEEE Trans. Inform. Theory, vol. IT-

1181 A. Nehorai, G. Su, and M. Morf, “Estimation of time difference of arrival for multiple ARMA sources by pole decomposition,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1478- 1491, 1983.

Englewood Cliffs, NJ: Prentice-Hall, 1979.

29, pp. 412-425, 1983.

[19] B. 0. Anderson and J. B. Moore, Optimal Filtering.

Meir Feder received the B.Sc. and M.Sc. degrees (summa cum laude) from Tel-Aviv University, Tel-Aviv, Israel, and the Sc.D. degree from the Massachusetts Institute of Technology, Cam- bridge, and the Woods Hole Oceanographic Insti- tution, Woods Hole, MA, all in electrical engineering, in 1980, 1984, and 1987, respectively.

He is currently a Research Associate in the De partment of Electrical Engineering and Computer Science at M.I.T. His research interests lie in the broad area of signal processing, stochastic esti-

mation, and information theory. He is particularly interested in investigat- ing new statistical estimation techniques and using them in signal processing applications, e.g., speech, image, and array processing problems.

Ehud Weinstein received the B.Sc. degree from the Technion-Israel Institute of Technology, Haifa, Israel, and the Ph.D. degree from Yale University, New Haven, CT, both in electncal engineering, in 1975 and 1978, respectively.

He is currently an Associate Professor in the Department of Electronic Systems, Faculty of En- gineering, Tel-Aviv University, Tel-Aviv, Israel, and an Adjunct Scientist with the Department of Ocean Engineenng, Woods Hole Oceanographic Institution, Woods Hole, MA. His research inter-

ests are in the general area of statistical estimation theory and its application to signal and array processing.

Dr Weinstein is the co-recipient of the IEEE Acoustics, Speech, and Signal Processing Society’s 1983 Senior Award.

‘1 1

parameter estimation of superimposed signals using the em

Documents