

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999 495

Interframe LSF Quantization for Noisy Channels

Thomas Eriksson, Jan Linden, Member, IEEE, and Jan Skoglund, Member, IEEE

Abstract: In linear predictive speech coding algorithms, transmission of linear predictive coding (LPC) parameters, often transformed to the line spectrum frequencies (LSF) representation, consumes a large part of the total bit rate of the coder. Typically, the LSF parameters are highly correlated from one frame to the next, and a considerable reduction in bit rate can be achieved by exploiting this interframe correlation. However, interframe coding leads to error propagation if the channel is noisy, which possibly cancels the achievable gain. In this paper, several algorithms for exploiting interframe correlation of LSF parameters are compared. In particular, performance for transmission over noisy channels is examined, and methods to improve noisy-channel performance are proposed. By combining an interframe quantizer and a memoryless safety-net quantizer, we demonstrate that the advantages of both quantization strategies can be utilized, and the performance for both noiseless and noisy channels improves. The results indicate that the best interframe method performs as well as a memoryless quantizing scheme with 4 bits less per frame. Subjective listening tests have been employed that verify the results from the objective measurements.

Index Terms: Interframe coding, memory-based vector quantization, robust coding, spectrum coding, speech coding, vector quantization.

    I. INTRODUCTION

MODERN digital communication applications, such as cellular telephony, have led to an increasing need for high-quality speech coding schemes operating at lower and lower bit rates. Most contemporary speech coders are based on linear predictive coding (LPC), where a fairly white excitation signal is fed into an all-pole filter representing the spectral information of speech. For many applications, the LPC spectrum is the major side information, and thus it is important to encode the LPC parameters using as few bits as possible while maintaining high speech quality. The aim of this study is to investigate the problem of efficient transmission of spectral information by exploiting interframe correlation for noiseless and noisy channels. The subject of LPC quantization has been studied intensively for many years, initially with the focus on which parameter set to use for LPC representation [1], [2]. In competition with reflection coefficients and log area ratios (LAR), the line spectral frequencies or line spectrum pairs (LSF or LSP, introduced in [3]) have proven to be a suitable representation, and are the prevailing LPC parameter set in speech coding today.

Manuscript received August 19, 1996; revised March 31, 1999. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Joseph Campbell.

T. Eriksson was with the Department of Information Theory, Chalmers University of Technology, SE-412 96 Goteborg, Sweden. He is now with the Information Theory Group, Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Goteborg, Sweden (e-mail: [email protected]).

J. Linden was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA, and SignalCom Inc., Goleta, CA 93117 USA (e-mail: [email protected]).

J. Skoglund was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with AT&T Labs-Research, Shannon Laboratory, Florham Park, NJ 07932 USA (e-mail: [email protected]).

Publisher Item Identifier S 1063-6676(99)06560-8.

Up to about 1990, almost all coding schemes relied on scalar quantization to some extent. Complexity reasons limited the use of vector quantization (VQ), and therefore methods designed to exploit intraframe correlation (correlation between parameters within one frame) using scalar quantization were proposed; see, e.g., [4]-[8]. The first work that incorporated VQ was described in [9], but far from acceptable performance was obtained with a VQ of 10 bits/frame. Instead, several hybrids of scalar quantization and VQ were investigated, e.g., [10], [11]. Direct application of a single VQ is still not suitable in practice (though it has been done in, e.g., [12]), but different schemes that reduce the VQ complexity at the expense of degraded performance have been demonstrated to outperform earlier scalar systems. In [13] it is proposed that transparent quantization can be achieved at 24 bits/frame if the LSF vector is split into two vectors, each quantized with a separate VQ (this procedure is usually referred to as split VQ). Another efficient way of reducing VQ complexity is multistage VQ [14]. In [15] it is stated that the same performance as for the 24 bits/frame split VQ can be achieved at 22 bits/frame with multistage VQ.
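As a concrete illustration of the split-VQ idea, here is a minimal sketch (Python/NumPy; the random codebooks and all sizes are assumptions for illustration, using the 3+3+4 split that this paper adopts for its own experiments in Section III-C). Each sub-vector is searched against its own small codebook, so the search cost grows with the sum, not the product, of the sub-codebook sizes.

```python
import numpy as np

# Split VQ sketch (assumed sizes): a 10-dimensional LSF vector is split into
# sub-vectors of 3, 3, and 4 components, each quantized with its own
# 8-bit codebook, for 24 bits/frame in total.
rng = np.random.default_rng(7)
splits = [(0, 3), (3, 6), (6, 10)]
codebooks = [rng.normal(size=(256, b - a)) for a, b in splits]

def split_vq(v):
    indices, parts = [], []
    for (a, b), cb in zip(splits, codebooks):
        # Nearest-neighbor search within this split only.
        i = int(np.argmin(np.sum((cb - v[a:b]) ** 2, axis=1)))
        indices.append(i)
        parts.append(cb[i])
    return indices, np.concatenate(parts)

v = rng.normal(size=10)
indices, v_hat = split_vq(v)   # three 8-bit indices, 24 bits/frame
```

The price of the split is that correlation between components in different splits is not exploited, which is why the paper compares this against memory-based schemes.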

In memoryless quantization, each LSF parameter vector is quantized independently of previous LSF vectors. This is not, however, the most efficient way to encode the LSF vectors. Parameters extracted from speech, such as the LPC coefficients, typically show a significant interframe correlation (correlation between successive frames). Consequently, large gains can be obtained by exploiting the interframe correlation. A number of memory-based quantization schemes, i.e., schemes that utilize correlation between successive frames, have been proposed during the last ten years. In Section II, an overview of some successful methods to exploit interframe correlation in LSF quantization is presented. Among the most popular memory-based VQ schemes are predictive VQ (PVQ) [16]-[19], a straightforward extension of a scalar predictive quantizer, and finite-state VQ (FSVQ) [20], [21], where a next-state function determines which of a set of quantizers to use for the next vector. Other quantizers with memory include methods based on the discrete cosine transform, two-dimensional (2-D) prediction, noiseless coding of VQ indices, etc.

1063-6676/99$10.00 © 1999 IEEE


Fig. 1. Predictive vector quantizer, encoder (left) and decoder (right). An error vector, which is quantized with a VQ, is formed by subtracting a prediction based on previously quantized vectors from the current input vector.

Several studies of LSF quantization can be found in the literature. However, the results cannot, in general, be directly compared, since there may be large deviations in the experimental setups. We have observed that different databases can lead to different objective performance for the same quantization scheme. Furthermore, there are several possible methods of performing the LPC analysis. For example, both the autocorrelation method and the stabilized covariance method are common for LPC analysis, and procedures such as high-frequency compensation and bandwidth expansion also affect the result. The frame length varies between 5 and 40 ms in different papers, and the analysis window overlap is also not consistent from one work to another (two factors that are of significant importance for the performance of memory-based methods [22]). Consequently, the greatest caution should be exercised when comparing results from different studies. Several memoryless quantization schemes are compared using a common database in [23]. In this work we have incorporated some of the most popular memory-based VQ schemes and compared their performance for the same database and analysis method. Throughout this paper, the order of the linear prediction filter is 10 and the frame length is 20 ms with a 25-ms analysis window. More details about the experimental setup are found in Section VI-A.

An interesting subject in LSF quantization is the performance of memory-based VQ methods when the transmission channel is noisy. For such channels, bit errors are unavoidable. This may cause the state of the encoder and decoder to differ. In a memory-based scheme, this leads to a sequence of errors, error propagation, which possibly cancels the advantage over a memoryless VQ. We have studied a new technique called safety-net VQ, which is shown to significantly decrease error propagation. The safety-net can be used as an extension to a memory-based VQ, thereby improving the performance for transmission over both noisy and noiseless channels. In this paper, we study spectrum coding performance for noisy channels without using explicit error protection on the transmitted bits; such protection can improve noisy-channel performance, but at the expense of fewer bits available for source coding.

The main topics of this report are 1) to study the performance gains of exploiting interframe correlation for coding of LPC parameters, and 2) to investigate the performance of memory-based VQ for noisy channels.

The paper is organized as follows. Several of the most commonly used memory-based VQ schemes are described in Section II. In Section III, we calculate some estimates of the achievable gains with interframe coding. The new safety-net technique is thoroughly described in Section IV. Section V investigates how performance can be improved for memory-based VQs when channel noise is present. Simulation results of the various systems under noisy and noiseless conditions are given in Section VI, in terms of objective measures as well as subjective listening tests. Finally, conclusions are given in Section VII.

    II. MEMORY-BASED QUANTIZATION METHODS

A memory-based quantizer is a quantizer that incorporates knowledge of previously quantized vectors when coding the current input vector. The memory in the quantizer makes it possible to exploit memory in the input process, i.e., interframe dependencies. Both scalar and vector quantizers with memory are common in the literature. Here we describe some of the most successful memory-based quantization methods for LSF parameters.

    A. Predictive VQ

A straightforward method of taking advantage of the memory of the source is to utilize (linear) predictive vector quantization (PVQ). PVQ is an extension of standard scalar predictive quantization (DPCM), obtained by replacing the scalar predictor and scalar quantizer by their vector counterparts. PVQ was introduced in [24] and [25], and further developed in, for example, [19] and [26].

A vector linear predictor forms an estimate of the incoming vectors¹ as a linear combination of earlier quantized vectors, and the prediction residual vector is quantized by a vector quantizer. A PVQ encoder and decoder are depicted in Fig. 1. The vector predictor can be written

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \tilde{\mathbf{x}}_{n-i} \qquad (1)$$

where $\hat{\mathbf{x}}_n$ is the one-step-ahead prediction vector, $\tilde{\mathbf{x}}_{n-i}$ are earlier quantized input vectors, and $\mathbf{A}_i$ are the prediction matrices. The optimum values (in a minimum mean square error sense) of the prediction matrices can be found by

¹In the following discussion, we will assume that the incoming vectors have zero mean, and that the vector process is ergodic and wide-sense stationary. The formulas can easily be generalized to vectors with a nonzero mean.


Fig. 2. Encoder (left) and decoder (right) of a finite-state VQ. Which of the K memoryless codebooks is used at a certain coding instant is determined by a next-state function. The input to the next-state function is the last chosen codevector and the previous state.

solving a system of linear matrix equations

$$\sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_{j-i} = \mathbf{R}_j \qquad \text{for } j = 1, \ldots, p \qquad (2)$$

where $\mathbf{R}_k$ are the correlation matrices

$$\mathbf{R}_k = E\left[\mathbf{x}_n \mathbf{x}_{n-k}^T\right]. \qquad (3)$$

For simplicity, the unquantized input process, $\mathbf{x}_n$, is often used to estimate the correlation matrices, instead of $\tilde{\mathbf{x}}_n$, which would be more correct. The solution for a first-order predictor ($p = 1$) is particularly simple. For this case, the optimum prediction matrix can be found by simple matrix inversion and multiplication:

$$\mathbf{A}_1 = \mathbf{R}_1 \mathbf{R}_0^{-1}. \qquad (4)$$

For higher order predictors, a generalized version of the Levinson-Durbin algorithm [27] can be applied. In this work, only first-order prediction has been simulated. As is pointed out in Section III, most of the achievable prediction gain can be realized with a first-order vector predictor.
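To make the first-order solution in (3)-(4) concrete, the following sketch (Python/NumPy) estimates the correlation matrices from a training sequence and solves for the prediction matrix. The synthetic AR(1) vector process and all sizes are illustrative assumptions, not the paper's data; the correlation matrices are estimated from the unquantized process, as the simplification in the text suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 10-dimensional LSF-like process with interframe
# correlation: x_n = 0.9 * x_{n-1} + noise (zero mean).
d, N = 10, 20000
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = 0.9 * x[n - 1] + rng.normal(scale=0.1, size=d)

# Correlation-matrix estimates in the spirit of Eq. (3): R_k = E[x_n x_{n-k}^T].
R0 = (x[1:].T @ x[1:]) / (N - 1)
R1 = (x[1:].T @ x[:-1]) / (N - 1)

# Optimum first-order prediction matrix, Eq. (4): A1 = R1 R0^{-1}.
A1 = R1 @ np.linalg.inv(R0)

# One-step-ahead predictions and residuals (open-loop: unquantized history).
pred = x[:-1] @ A1.T
resid = x[1:] - pred

# Prediction should shrink the total variance substantially.
gain_db = 10 * np.log10(np.trace(R0) / np.trace(resid.T @ resid / (N - 1)))
print(f"prediction gain: {gain_db:.1f} dB")
```

In a full PVQ, the residual `resid` would then be fed to the vector quantizer, and the prediction would use quantized history as in Fig. 1.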

The correlation matrices are usually estimated from a training database, for example the same database that is later used to train the vector quantizer in the PVQ. The simplest method, and the one used in this paper, is the autocorrelation method, where the correlation matrices are estimated as

$$\hat{\mathbf{R}}_k = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_{n-k}^T. \qquad (5)$$

Values of $\mathbf{x}_n$ outside the observable window are assigned the value zero. In [28], the autocorrelation method and the covariance method for estimating correlation matrices in a PVQ system are treated in more detail. After determining the prediction matrices, the VQ is trained, either by an open-loop or a closed-loop procedure. In the open-loop approach, the predictor is designed first, without taking the VQ into account. Then the VQ is separately trained on the resulting prediction errors.

In the closed-loop approach, the predictor and the VQ are first designed from the database, as in the open-loop approach. Then, the PVQ system with the current VQ is used to generate a new set of vectors for additional training of the VQ. This process is iterated until a stopping criterion is reached. It is also possible to update the predictor coefficients in a closed-loop design process. The closed-loop PVQ design was proposed in [19].

Another version of predictive VQ is the MA-PVQ, where the decoder includes a moving average (MA) filter instead of an autoregressive (AR) filter as in the standard PVQ solution. In most cases, the MA predictor system requires a predictor of higher order to reach the same performance as an AR predictor system. The main advantage of the MA configuration is the finite impulse response of the decoder filter, which leads to limited bit error propagation. In this report, we study other methods to limit the bit error propagation (see Sections IV and V), and we will not discuss the MA predictor further. In [29], a comparison between MA and AR prediction is presented, and it is found that, using the methods described in Sections IV and V, the two prediction paradigms obtain comparable performance. Other reports that study MA prediction include [30] and [31], and the ITU-T 8 kb/s speech coding standard includes a fourth-order MA predictor for LSF quantization [32].

Applications of PVQ to spectrum quantization can be found in [16]-[18]. In [33] and [34], 2-D predictive quantization is proposed, with the predictor utilizing both intraframe and interframe correlation simultaneously. Some studies of nonlinear prediction can also be found, e.g., [35], [36]. A general treatment of the concept of predictive VQ can be found in [37].

    B. Finite-State VQ

Finite-state VQ (FSVQ), first reported on in [38], can be viewed as a collection of memoryless vector quantizers, together with a selection rule that determines which is the current state, cf. Fig. 2. Each state is associated with one of the memoryless VQs. The codebooks of the memoryless VQs are called state codebooks, and the union of them is usually referred to as the super codebook. A next-state function is employed to determine the new encoder state.

An input vector is encoded by searching the codebook corresponding to the current state for the closest codevector. The new encoder state is determined by the previous state and the selected codeword in the state codebook, by use of the next-state function. Only the codeword index has to be transmitted, since the current state is known by the decoder, which uses the same next-state function as the encoder. Note that predictive VQ can be viewed as a special case of FSVQ where the number of states is infinite.
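The encode/decode loop just described can be sketched as follows (Python/NumPy; the random codebooks, the nearest-centroid state classifier, and all sizes are illustrative assumptions). The key property is that the next-state function sees only quantized data, so the decoder can mirror the encoder's state trajectory from the transmitted indices alone.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, cb_size = 2, 4, 8

state_classifier = rng.normal(size=(K, d))          # one centroid per state
state_codebooks = rng.normal(size=(K, cb_size, d))  # K state codebooks

def next_state(codevector):
    # State = nearest classifier centroid to the last quantized vector.
    return int(np.argmin(np.sum((state_classifier - codevector) ** 2, axis=1)))

def encode(vectors, initial_state=0):
    state, indices = initial_state, []
    for v in vectors:
        cb = state_codebooks[state]
        idx = int(np.argmin(np.sum((cb - v) ** 2, axis=1)))
        indices.append(idx)
        state = next_state(cb[idx])   # depends only on the quantized output
    return indices

def decode(indices, initial_state=0):
    state, out = initial_state, []
    for idx in indices:
        v = state_codebooks[state][idx]
        out.append(v)
        state = next_state(v)
    return np.array(out)

x = rng.normal(size=(100, d))
indices = encode(x)
x_hat = decode(indices)
# On a clean channel, encoder and decoder follow identical state sequences.
```

Note that each transmitted index costs only log2(cb_size) bits even though the super codebook holds K * cb_size codevectors; a single bit error, however, can desynchronize the state sequence, which is the error-propagation issue studied later in the paper.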


Several different FSVQ methods have been proposed in the literature. The main differences are the next-state function and the way the codebooks are represented [20], [21]. In this section we describe two methods. The first is the very simple nearest neighbor FSVQ (NN-FSVQ). In Section IV-C it will be shown how performance can be significantly improved for NN-FSVQ by extending the design. The second method is called omniscient labeled-transitions FSVQ (OT-FSVQ) [20], which has been found to yield the best codes of the proposed FSVQ schemes with reasonable complexity in most applications [21].

In the NN-FSVQ approach, it is assumed that successive input vectors are highly correlated and, consequently, successive coded vectors are close to each other. The basic idea of NN-FSVQ is to design a very large memoryless (super-) codebook, but only use a subset of the codevectors at every coding instant. The smaller set of codevectors, chosen as the nearest neighbor codevectors to the last chosen codevector in the super codebook, constitutes the state codebook.
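The state-codebook construction amounts to a nearest-neighbor search in the super codebook. A minimal sketch (Python/NumPy; the 256-vector super codebook and 16-vector state codebook are assumed sizes for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
super_codebook = rng.normal(size=(256, 10))   # large memoryless codebook
M = 16                                        # state codebook size (4 bits)

def state_codebook(last_codevector):
    # The M nearest neighbors of the previously chosen codevector.
    d2 = np.sum((super_codebook - last_codevector) ** 2, axis=1)
    return super_codebook[np.argsort(d2)[:M]]

last = super_codebook[42]
cb = state_codebook(last)
# The previous codevector is its own nearest neighbor, so it is always
# contained in the new state codebook.
```

Only 4 bits per frame are transmitted, yet the encoder can reach any of the 256 super-codebook vectors over time; derailment occurs precisely when the best super-codebook vector falls outside these M neighbors.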

The gain of such a scheme over memoryless quantization is in general quite small. This is explained by the fact that if the best codevector in the super codebook is not contained in the current state codebook, the best state is lost and may never be recovered. This problem, usually referred to as the derailment problem, is very similar to the slope overload phenomenon well known from scalar delta modulation quantizers. The NN-FSVQ method is, in its most straightforward implementation, not practicable in the presence of channel noise, because the derailment problem then becomes unmanageable.

The omniscient FSVQ technique has shown good performance in many applications, especially for image coding, e.g., [39], but also to some extent for speech coding [21], [40]. There are two possible representations for omniscient FSVQs: labeled-states and labeled-transitions. We will here only discuss the labeled-transitions case, as it has been shown to perform better in applications [21], [40]. The first step in the omniscient FSVQ design is to find a state classifier, for example a memoryless VQ with the same number of codevectors as the desired number of states. The training data is then divided into subsets using the classifier. The training subset for state $k$ consists of all training vectors whose immediate predecessors have been classified to state $k$. The state codebook is then designed by applying a standard VQ training algorithm using the subset of the training data corresponding to state $k$.

The decoder cannot track the omniscient next-state rule defined above, since it depends on the input rather than on the encoded input. However, if the actual input is replaced with the encoded input, we get an approximation of the next-state rule used in the design. Hence, the next state is determined from the encoder output as depicted in Fig. 2, which makes it possible for the encoder and the decoder to be synchronized. The state codebooks can then be fine-tuned by encoding the whole training sequence using the new FSVQ encoder and replacing each codevector with the centroid of the training vectors assigned to it. A closed-loop optimization similar to that described for PVQ can be applied to improve performance.

The omniscient FSVQ technique requires very large databases for training purposes, especially if the number of states is large. Even for the relatively small number of states we have experimented with, the training is very complex and requires a large training database. Another problem is robustness, both against changes in the input signal and against channel errors. In Section VI, results for OT-FSVQ with eight states are reported. It is worth noting that if the number of states is increased, the performance is expected to increase as well. However, the performance improvement is in general small, and is achieved at the expense of increased complexity and storage requirements [40].

    C. Other Memory-Based Quantization Schemes

Although finite-state VQ and predictive VQ are the most commonly treated memory-based VQ methods in the literature, there are also other methods to exploit interframe correlation. Most of these other methods imply an increased coding delay, high complexity, variable bit rate, etc. Variable bit rate and high coding delay are acceptable in certain applications, such as speech storage. However, in other applications, such as speech coding for mobile telephony, it is of great importance to keep the coding delay as low as possible. Variable bit rate requires complex protocols in most channel access schemes, and is hence not possible to use in many applications. In order to keep the cost and power consumption of the hardware (on which the coder is implemented) as low as possible, it is important that the computational complexity is reasonably low. Also, most speech coders operate in real time, limiting the computational delay to one frame. Brief explanations of several methods to exploit interframe redundancy are given below, but no measurements of performance are included in this article.

In matrix quantization, two or more vectors are compiled

into a matrix and are quantized simultaneously. This approach is straightforward and clear, but it has two major disadvantages: 1) the coding delay increases, since two or more vectors are buffered before quantization, and 2) the complexity is often very high. Hence, the usage of matrix quantization is in general limited to very low rate applications. Complexity reductions for matrix quantization have been proposed in, e.g., [41] and [42].

Phamdo and Farvardin [43] propose a scheme called tree-searched VQ with interblock noiseless coding (TSVQ-IBNC) for coding of LSF parameters. This scheme relies on a technique developed by Neuhoff and Moayeri [44]. In TSVQ of a correlated source, it is likely that the codewords of two consecutive vectors share a common part. Therefore, it is possible to transmit only the altered bits of each codeword, together with the length of the common part. This procedure obviously results in a variable-rate scheme.

Another scheme that works with the codewords instead of directly on the vectors is relative index coding (RIC), proposed by Bruhn in [45]. The codewords are sorted according to the distance from the previously selected codevector, with relative index zero denoting the same codevector as the previous one, index one the closest codevector, and so on. The sorted index can then be Huffman coded, resulting in a variable-rate scheme.
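The index reordering at the heart of RIC can be sketched as follows (Python/NumPy; the codebook size and dimension are assumptions, and the entropy-coding stage is omitted). Because encoder and decoder both sort the codebook by distance to the previously selected codevector, the mapping is invertible at the decoder.

```python
import numpy as np

rng = np.random.default_rng(5)
codebook = rng.normal(size=(32, 4))

def relative_index(current_idx, previous_idx):
    # Rank of the current codevector when the codebook is sorted by
    # distance to the previously selected codevector (rank 0 = previous).
    d2 = np.sum((codebook - codebook[previous_idx]) ** 2, axis=1)
    order = np.argsort(d2)
    return int(np.where(order == current_idx)[0][0])

def absolute_index(rel_idx, previous_idx):
    # Decoder side: rebuild the same ordering and invert the mapping.
    d2 = np.sum((codebook - codebook[previous_idx]) ** 2, axis=1)
    return int(np.argsort(d2)[rel_idx])

prev, cur = 7, 19
r = relative_index(cur, prev)
# absolute_index(r, prev) recovers cur, so the scheme is lossless on the
# index stream; only the (skewed) relative indices need entropy coding.
```

With strong interframe correlation the relative index is small most of the time, which is exactly the skewed distribution a Huffman code exploits.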


    Fig. 3. Left: Histogram of LSF parameters 1 to 10. Right: Scatter plot showing the distribution of LSF 1 and 2.

In [6], Farvardin and Laroia propose the use of the discrete cosine transform to decorrelate consecutive LSF vectors. This scheme requires an increased coding delay to obtain acceptable performance.

Interpolation of LSF parameters also relies on interframe correlation. In [46], four out of eight frames are selected for transmission, and the spectra of the remaining frames are derived by interpolation. The coding delay is eight frames, which is far from acceptable in low-delay applications.

Codebook adaptation is another popular procedure. Xydeas and So [47] first search a fixed VQ for the best index, then try to encode the index by use of a long-history quantization codebook, which is updated to contain the most common indices. In [48], the first codebook in a two-stage VQ is adapted by a deletion and partition operation.

    III. ESTIMATES OF INTERFRAME CODING PERFORMANCE

In this section, we try to estimate the theoretically achievable gains if interframe dependencies of the LSF vector process are exploited. Rate distortion theory [49], [50] can be of great help when the performance of a coding scheme is to be estimated. The rate distortion function (RDF), $R(D)$, gives a lower bound for the required rate $R$ (number of bits per parameter) in coding a stochastic process at a desired distortion $D$ (commonly a quadratic distortion measure). By computing the RDF for a memoryless coding scheme and for a scheme where interframe correlation is exploited, both at the same distortion, we can obtain an estimate of the achievable gains with interframe coding. The RDF is fully determined by the probability density function (pdf) of the actual process. However, the pdf of the LSF vector process is not trivial to estimate, and even if a good estimate of the pdf exists, the corresponding rate distortion function is difficult to compute. Fortunately, there are some cases where the RDF is simple to compute. In this section, we compute two estimates of the RDF, based on different assumptions. In Section III-A, we calculate the entropy of the index source generated by an LSF VQ, and in Section III-B we make the assumption that the distribution of the LSF parameter vector is jointly Gaussian.

    A. Approximation 1: Entropy Measurements

In this section, we estimate the RDF of the LSF process by entropy measurements.

First we design a vector quantizer for the LSF source using the algorithm in [51]. This VQ encodes the LSF source with a certain distortion $D$, producing a stream of indices $I_n$. Assuming that this index source is memoryless (or, alternatively, refraining from exploiting the memory in the process), we can find a lower bound for the required number of bits to transmit the VQ indices by computing the entropy of this source,

$$H(I_n) = -\sum_{i=1}^{M} P(I_n = i) \log_2 P(I_n = i) \qquad (6)$$

where $M$ is the number of vectors in the VQ. The index source can be transmitted at a rate arbitrarily close to the entropy by use of a noiseless coding scheme such as Huffman coding, applied to long sequences of indices. This procedure is impractical, due to the extra delay introduced. Therefore, these results shall be considered as performance bounds, and not as recipes on how to encode the VQ indices.

To estimate the required rate when knowledge of the previous indices is exploited, we compare the entropy above with the conditional entropy, computed as

$$H(I_n \mid \mathcal{H}_n) = -E\left[\log_2 P(I_n \mid \mathcal{H}_n)\right] \qquad (7)$$

where $\mathcal{H}_n$ is the history of the source, $\mathcal{H}_n = \{I_{n-1}, I_{n-2}, \ldots\}$. For reasons of simplicity, we approximate the LSF process as a first-order Markov process, with

$$H(I_n \mid \mathcal{H}_n) = H(I_n \mid I_{n-1}). \qquad (8)$$

We write

$$H(I_n \mid I_{n-1}) = -\sum_{i=1}^{M} \sum_{j=1}^{M} P(I_n = j, I_{n-1} = i) \log_2 P(I_n = j \mid I_{n-1} = i). \qquad (9)$$


The mutual information, defined as the difference $I(I_n; I_{n-1}) = H(I_n) - H(I_n \mid I_{n-1})$, constitutes an estimate of the performance gain if knowledge of previously encoded vectors is fully exploited. We note that the entropies above are straightforward to determine once the probabilities have been estimated, as shown in Section III-C.
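In practice, the estimates in (6)-(9) amount to counting index frequencies and consecutive-pair frequencies. A sketch (Python/NumPy; the synthetic "sticky" index stream is an assumption standing in for a real VQ index sequence):

```python
import numpy as np

def entropies(indices, num_codevectors):
    idx = np.asarray(indices)
    # Marginal probabilities P(i) and entropy H(I_n), Eq. (6).
    p = np.bincount(idx, minlength=num_codevectors) / len(idx)
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # Joint probabilities P(I_n = j, I_{n-1} = i) from consecutive pairs.
    joint = np.zeros((num_codevectors, num_codevectors))
    for a, b in zip(idx[:-1], idx[1:]):
        joint[a, b] += 1
    joint /= len(idx) - 1
    # H(I_n | I_{n-1}) = H(I_n, I_{n-1}) - H(I_{n-1}), Eqs. (8)-(9).
    Hjoint = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))
    Hcond = Hjoint - H
    return H, Hcond, H - Hcond   # marginal, conditional, mutual information

# A strongly repetitive 3-bit index stream: large gain from conditioning.
rng = np.random.default_rng(3)
stream, state = [], 0
for _ in range(50000):
    if rng.random() >= 0.9:          # jump to a random index 10% of the time
        state = int(rng.integers(8))
    stream.append(state)
H, Hcond, MI = entropies(stream, 8)
```

For such a stream, H is close to the full 3 bits while the conditional entropy is far smaller; the difference MI is the bit saving one could hope for from first-order conditioning, mirroring the role of Table I in the text.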

    B. Approximation 2: Gaussian pdf

If the probability density function of the LSF vectors is assumed to be jointly Gaussian, there is a simple way to estimate the gains if interframe correlation is exploited. For Gaussian continuous-valued processes, rate distortion theory is well developed, and simple formulas exist. However, a real LSF process is not Gaussian, partly due to the ordering property of LSF parameters. In Fig. 3, histograms and scatter plots of the LSF parameters are plotted. We conclude that the one-dimensional marginal distributions of the LSFs are well approximated with one-dimensional (1-D) Gaussian pdfs, but we also note that a 2-D scatter plot of LSF 1 and 2 does not seem Gaussian at all. Still, we think that valuable insights into the LSF vector process can be gained from the discussion below.

The rate distortion function, $R(D)$, for jointly Gaussian pdfs is given parametrically in the form [49]

$$D(\theta) = \frac{1}{d} \sum_{k=1}^{d} \min(\theta, \lambda_k), \qquad R(\theta) = \frac{1}{d} \sum_{k=1}^{d} \max\left(0, \frac{1}{2} \log_2 \frac{\lambda_k}{\theta}\right) \qquad (10)$$

where $d$ is the dimension of the vectors, and $\lambda_k$ are the eigenvalues of the covariance matrix of the vector process. For high rates, the distortion rate function can be simplified to

$$D(R) = \left(\prod_{k=1}^{d} \lambda_k\right)^{1/d} 2^{-2R}. \qquad (11)$$

Encoding of a Gaussian pdf requires a higher rate than other pdfs to achieve a given distortion. This means that the Gaussian RDF can serve as an upper bound for any non-Gaussian pdf. The rate distortion function is fully determined by the eigenvalues of the covariance matrix $\mathbf{R}_0$ [defined in (3)].

Now we want to compute the RDF for a system with memory. For jointly Gaussian vectors, the optimum minimum mean square error one-step-ahead prediction is a linear combination of the previous vectors (see, e.g., [37])

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \mathbf{x}_{n-i}. \qquad (12)$$

The prediction error vector process, $\mathbf{e}_n = \mathbf{x}_n - \hat{\mathbf{x}}_n$, is Gaussian as well. The covariance matrix for the error process of a $p$th-order minimum error variance predictor is given by

$$\mathbf{R}_e = \mathbf{R}_0 - \sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_i^T \qquad (13)$$

TABLE I. Estimated bit saving, entropy measurements.

TABLE II. Estimated bit saving, Gaussian approximation.

where $\mathbf{A}_i$ are the optimum prediction matrices and $\mathbf{R}_i$ the correlation matrices, defined in Section II-A. From the covariance matrix we can compute the eigenvalues and the RDF for the prediction error. If we compute the RDF both for the original LSF process and for the prediction error process at the same distortion $D$, we get an estimate of the achievable interframe coding gain, measured in bits/vector. The results of such RDF measurements are presented in the next section.
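Combining (4), (11), and (13), the high-rate bit saving per vector comes out as $\frac{1}{2}\log_2(\det \mathbf{R}_0 / \det \mathbf{R}_e)$, since at equal distortion the required rate is governed by the geometric mean of the eigenvalues. A sketch of this computation (Python/NumPy; the synthetic correlated vector process is an assumption standing in for the LSF data):

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 10, 50000

# Synthetic correlated vector process (stand-in for the LSF process).
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = 0.85 * x[n - 1] + rng.normal(scale=0.1, size=d)

R0 = (x.T @ x) / N
R1 = (x[1:].T @ x[:-1]) / (N - 1)
A1 = R1 @ np.linalg.inv(R0)          # Eq. (4)
Re = R0 - A1 @ R1.T                  # Eq. (13), first-order case

# High-rate Gaussian estimate: bits saved per vector at equal distortion.
sign0, logdet0 = np.linalg.slogdet(R0)
signe, logdete = np.linalg.slogdet(Re)
bits_saved = 0.5 * (logdet0 - logdete) / np.log(2)
print(f"estimated interframe gain: {bits_saved:.1f} bits/vector")
```

Using `slogdet` instead of `det` avoids under/overflow when the eigenvalues are small, which is exactly the regime of a well-predicted error process.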

    C. Interframe Gain Computation

    The database for computing the entropies and rate distortion

    functions is the same as the one described in Section VI-A,

    consisting of almost two hours of speech recorded from FM

    radio. The frame size is 20 ms, with a 2.5 ms overlap on both

    sides. The ten-dimensional LSF vectors are split into three

vectors, with 3, 3, and 4 LSF parameters, respectively.

First we trained a three-split VQ for the LSF process and computed the entropy H(I_n) and the conditional entropy H(I_n | I_{n−1}) for the stream of indices. It can be shown (the proof is simple but beyond the scope of this study) that the larger the size of the VQ, the higher the gains that can be expected. However, for a b-bit VQ, 2^{2b} probabilities must be estimated, and our database limits b to seven or less in order to get accurate estimates of H(I_n) and H(I_n | I_{n−1}). Therefore we have computed the entropies for a three-split VQ with 7 bits in each split, even though 8 bits would have been more appropriate to estimate correlation gains relative to a realistic 24-bit memoryless VQ. The entropies and conditional entropies of the three VQs are given in Table I. For this experiment, the results indicate that a total gain of 5.6 bits for three-split interframe encoding of the LSF vector process can be expected.
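The entropy comparison above can be sketched as follows, estimating H(I_n) and H(I_n | I_{n−1}) from relative frequencies of single indices and of consecutive index pairs (a simplified illustration; the paper's measurement applies this per split):

```python
import numpy as np

def entropy_and_conditional(indices, num_codewords):
    """Estimate H(I_n) and H(I_n | I_{n-1}) in bits from a stream of
    quantizer indices; their difference is the expected bit saving."""
    idx = np.asarray(indices)
    p = np.bincount(idx, minlength=num_codewords) / len(idx)
    h = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # Joint relative frequencies of consecutive index pairs.
    joint = np.zeros((num_codewords, num_codewords))
    for a, b in zip(idx[:-1], idx[1:]):
        joint[a, b] += 1.0
    joint /= joint.sum()
    h_joint = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))
    # H(I_n | I_{n-1}) = H(I_{n-1}, I_n) - H(I_{n-1}).
    return h, h_joint - h
```

For an i.i.d. index stream the two entropies coincide; for a strongly correlated stream the conditional entropy is much smaller.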

In the second experiment, we computed the RDF (with a Gaussian assumption) for three-split LSF vectors, and for a three-split of the first-order prediction error vectors. The RDF was computed at a distortion of (25 Hz)² per LSF (a standard deviation of 25 Hz per LSF), which is close to the distortion experimentally found for a 24-bit LSF VQ. The corresponding rates for the LSF vectors are given in Table II, together with the rates for the prediction error vectors. As


    ERIKSSON et al.: INTERFRAME LSF QUANTIZATION 501

Fig. 4. Safety-net principle: Combine a memory-based VQ with a fixed memoryless VQ (the safety-net VQ).

can be seen in this table, we can expect to gain a total of 6.2 bits if first-order interframe coding is employed. If we use a second-order predictor instead, the computations show that we can expect to gain an extra 0.2-0.3 bits, making a total gain of 6.4-6.5 bits compared to a standard memoryless LSF quantizer. If the predictor order is further increased, only very little can be gained.2 We conclude that a first-order predictor achieves most of the gain, which is also confirmed by experiments and other reports [17], [22].

These two gain estimates indicate that 5-6 bits can be saved by exploiting interframe correlation in a three-split structure. The error in the above estimates of the achievable interframe coding gain comes partly from the fact that the distortion measure we seek to minimize in LSF quantization is not the quadratic distance between the original and encoded LSF vectors, but rather the spectral distance (SD). The necessary approximations also lead to errors. However, we think that the experiments in this section give a hint of the achievable gains of interframe LSF coding.

    IV. SAFETY-NET VQ

In this section, we propose an extension of existing memory-based VQ systems with a fixed memoryless VQ, hereby denoted safety-net VQ. The safety-net concept was introduced in [52], and has also been reported in [53] and [54]. Similar systems have also been studied in, for example, [10], [17], [18], [55], and [56]. In this paper we further develop these ideas, and study the performance for transmission over noisy channels.

    The main principle of the safety-net extension is illustrated

    in Fig. 4. A memory-based VQ is combined with a fixed

memoryless VQ that operates independently of the memory-based VQ. At each coding instant both codebooks are searched

    for the best codevector.

By using this arrangement, we aim to achieve three objectives.

• To encode outliers, i.e., low-correlation frames, separately from the typical high-correlation frames. Many memory-based VQ systems show good performance for highly correlated input vectors, but perform worse than memoryless systems for the occasional low-correlation frames. This results in low average distortion, but the number of high-distortion frames increases. In encoding of, for example, spectrum coefficients, this is a serious problem, since there is a significant perceptual importance of keeping the number of high-distortion frames low. This fact is emphasized in several studies [13], [57]. By adding a fixed memoryless codebook to the memory-based VQ system, the low-correlation frames are encoded in a standard memoryless VQ, and a lower number of high-distortion frames can be expected.

2 Note that there might be considerable long-time dependencies in the LSF vectors, since the speaker can be expected to repeat phonemes at irregular intervals. However, these dependencies are difficult to exploit by linear prediction.

• Since outliers are separately encoded in the safety-net VQ, the memory-based VQ can focus on the highly correlated frames. A standard memory-based VQ encodes frames with both high and low interframe correlation in the same quantizer. The VQ must be designed to handle both these cases, and the high interframe correlation in the typical frames cannot be fully exploited; some of the potential performance gain of exploiting interframe correlation in the memory-based VQ is lost due to the need to compromise. The addition of a fixed memoryless codebook that encodes outliers separately enables the memory-based VQ to exploit interframe correlation to a higher degree, and a lower average distortion should result.

• A serious objection to memory-based VQ systems is the performance when the index must be transmitted over a noisy channel, which is often the case in realistic systems. An error in a memory-based VQ transmission leads to error propagation, i.e., to a sequence of frames where the internal state of the encoder and the decoder differ, and thus a sequence of data with large errors is produced. Most systems with memory "forget" the bit error reasonably fast, but error propagation is nevertheless a serious problem in memory-based VQs. By including a fixed memoryless codebook, error propagation is canceled every time an entry from the fixed codebook is selected and correctly transmitted to the decoder. The improvement in performance over noisy channels is perhaps the strongest reason for extending the design with a memoryless codebook. In Section V this subject is studied in more detail.

The combination of the two VQs can be described as

    C = C_f ∪ C_a   (14)

where a fixed memoryless codebook C_f is combined with an adaptive memory-based codebook C_a, resulting in the extended codebook C. The search process is performed by first searching the adaptive codebook for the best vector, then searching the fixed codebook for the best fixed vector. The winning candidates from the two codebooks are compared, and the best of these two vectors,3 denoted x̂, is encoded and transmitted to the decoder:

    x̂ = arg min_{c ∈ C} d(x, c).   (15)

3 The distortion criterion we have used to find the two candidate vectors is the weighted minimum squared error criterion (see Section VI-A), mainly due to the comparably low complexity. When the best of the two candidates shall be chosen, more complex criteria can be considered since only two vectors shall be compared; here we have used the spectral distance measure (Section VI-A).
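One coding instant of the scheme in Fig. 4 might look as follows. This sketch uses plain squared error for both the candidate searches and the final comparison, whereas the paper uses a weighted error for the search and the spectral distance for the final choice; all names are illustrative:

```python
import numpy as np

def safety_net_encode(x, fixed_cb, adaptive_cb, prediction):
    """One coding instant of a safety-net VQ: the memory-based candidate
    is prediction + best error codevector, the safety-net candidate comes
    from the fixed memoryless codebook; the better of the two is selected
    and flagged with one extra bit."""
    # Best entry of the adaptive (memory-based) codebook.
    cand_a = prediction + adaptive_cb
    i_a = int(np.argmin(np.sum((cand_a - x) ** 2, axis=1)))
    # Best entry of the fixed memoryless (safety-net) codebook.
    i_f = int(np.argmin(np.sum((fixed_cb - x) ** 2, axis=1)))
    d_a = np.sum((cand_a[i_a] - x) ** 2)
    d_f = np.sum((fixed_cb[i_f] - x) ** 2)
    if d_f <= d_a:
        # Choosing the fixed codebook resets any error propagation
        # once the index is correctly received.
        return 'fixed', i_f, fixed_cb[i_f]
    return 'adaptive', i_a, cand_a[i_a]
```

The returned codevector also updates the encoder state (e.g., the predictor memory in SN-PVQ), keeping encoder and decoder synchronized on error-free transmission.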



Fig. 6. Principle of a 2-D DCVQ. The nearest neighbors to the previous codevector together with the fixed codebook form the combined codebook.

    number of outliers. PVQ design procedures are described in

    Section II-A.

    C. Safety-Net FSVQ

The second safety-net method is a combination of the simple nearest neighbor FSVQ technique, described in Section II-B, and a safety-net VQ. It was first presented in [52] and will be referred to as dynamic combination VQ (DCVQ). Derailment occurs for NN-FSVQ when the input vector has low correlation with the previous vector, and thus no good representation of the input vector exists in the NN-FSVQ codebook. Since the problem with outlying vectors propagates to the next frame, due to the memory in the quantizer, the result is a sequence of inadequately quantized vectors. By introducing a safety-net to take care of outliers, the DCVQ solves the derailment problem. Hence, performance for transmission over noisy channels is also significantly improved. The major disadvantage of the DCVQ technique is the same as one of the major problems with the NN-FSVQ: the storage requirements are large. An illustration of the combination of an NN-FSVQ and a fixed memoryless quantizer is given in Fig. 6.

    The design of the DCVQ is simple, as described earlier:

    The NN-FSVQ is trained using the full training database, and

    a nearest neighbor table is stored, as described in Section II-B.

    The safety-net VQ is also trained using the full training

    database. Closed-loop training procedures can be applied for

    this case as well, but in general the improvement is negligible.

    V. MEMORY-BASED VQ ON NOISY CHANNEL

For the case of a memoryless VQ, the effect of channel noise is straightforward. An error in the transmission of a codeword index only affects the distortion of the current vector, since no memory is incorporated. Systems with memory are affected

    differently by channel errors than memoryless systems because

    the memory in the decoder causes error propagation. The

    effects of error propagation can be very serious in some

systems, if precautions are not taken. In, for example, a nearest neighbor FSVQ, a bit error could cause the system to derail and

    never recover. In other systems, error propagation causes long

sequences of highly distorted vectors. One way to decrease the effect of channel errors for memory-based VQ schemes is to periodically perform a full search that forces the coder into the best possible state, which is then transmitted to the decoder. This should be done quite infrequently, as the cost of sending the extra information is high. This method is not suitable for PVQ systems, because of the infinite number of states.

    In this section, we study performance of memory-based VQ

    systems operating on noisy channels, and try to decrease the

effects of error propagation. Other work that treats memory-based LSF quantization in the presence of channel noise includes [58] and [40], while, for example, [59] and [13] investigate noisy channel performance of memoryless LSF quantization.

    A. Optimization of Index Assignment for Memory-Based VQ

Index assignment is the procedure of numbering the vectors in a vector quantizer (assigning indices to the vectors). Noisy channel performance of vector quantizers having random index assignments is in general poor. In order to minimize the effect of channel errors on the output signal, the codebook should be reordered such that the Hamming distance (assuming a binary channel) between any two codevector indices corresponds closely to the Euclidean distance between the corresponding codevectors. For this ordering problem, it is hard to find optimal solutions. A number of suboptimal algorithms have been proposed [60]-[63]. We have applied a fast and reliable method denoted the linearity increasing swap algorithm (LISA), described in [63]. The choice of LISA for index assignment is justified by its superior speed compared to other methods (10-bit VQs are processed in seconds).

In this study, procedures to improve the index assignment are applied to all vector quantizers. The VQ schemes in this comparison benefit from improved index assignment to various degrees. The gains are larger for methods with memory, because the effect of error propagation is reduced. It is not obvious how to apply such an algorithm to all of the coding schemes; therefore we will briefly describe how it has been done.
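As an illustration of the reordering goal (not the published LISA algorithm), a brute-force greedy swap search that minimizes the expected squared codevector jump under single bit errors:

```python
import numpy as np
from itertools import combinations

def ia_cost(codebook, perm, bits):
    """Total squared jump caused by single bit errors when codevector k
    is sent with binary index perm[k]."""
    inv = np.empty(2 ** bits, dtype=int)
    inv[perm] = np.arange(2 ** bits)           # index -> codevector
    cost = 0.0
    for i in range(2 ** bits):
        for b in range(bits):
            j = i ^ (1 << b)                   # single-bit neighbor index
            cost += np.sum((codebook[inv[i]] - codebook[inv[j]]) ** 2)
    return float(cost)

def improve_index_assignment(codebook, bits, sweeps=20):
    """Greedy pairwise-swap search, in the spirit of swap algorithms such
    as LISA (the cost is recomputed from scratch, so this is only a toy)."""
    perm = np.arange(2 ** bits)
    best = ia_cost(codebook, perm, bits)
    for _ in range(sweeps):
        improved = False
        for a, b in combinations(range(2 ** bits), 2):
            perm[a], perm[b] = perm[b], perm[a]
            c = ia_cost(codebook, perm, bits)
            if c < best:
                best, improved = c, True
            else:
                perm[a], perm[b] = perm[b], perm[a]
        if not improved:
            break
    return perm, best
```

For a scalar 2-bit codebook with scrambled values, the search recovers an assignment in which single bit errors map to nearby codevectors.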

1) Index Assignment for PVQ: The possible reconstruction vectors at time n are p_n + c_i, where p_n is the prediction from previously coded vectors, c_i is the codeword in the prediction error codebook, and i is the codeword index. Evidently, the distance between reconstruction vectors p_n + c_i and p_n + c_j is the same as the distance between the corresponding vectors c_i and c_j in the codebook. Thus an index assignment algorithm operating on the final reconstruction vectors is identical to one operating on the codebook (i.e., not changing with time). Consequently, for PVQ we can simply apply an algorithm that improves the index assignment of the prediction error codebook directly.

2) Index Assignment for FSVQ: For each state we have assigned one codebook, and hence we can optimize the index assignment for each of the state codebooks independently. Still

    there is a problem when, due to channel errors, the encoder

    and the decoder do not agree on the current state. In this

  • 8/3/2019 Inter Frame Lsf

    10/15

    504 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999

    case, another index assignment that takes into account that

    the state is not correct would be optimal, but not practically

    implementable. Thus, applying the index assignment algorithm

    independently to each state codebook for FSVQ will result in

    less frequent erroneous state decisions, but cannot improve

    performance if they do occur.

3) Index Assignment for Safety-Net Methods: This problem is more complicated because of the existence of two codebooks. If, for each coding instant, the adaptive and the memoryless codebooks are combined into one, and the optimization is carried out for the combined codebook, the best possible index assignment is achieved. This is only feasible if all possible codebook combinations are known in the design procedure. Otherwise, a new index assignment must be found at every coding instant. For DCVQ all combinations of the codebooks are known beforehand, which means that, at least theoretically, it is possible to find an optimal index assignment. Because of the increased complexity and storage requirements that result, we have not implemented this strategy. Instead, we have applied the index assignment algorithm independently to the memoryless VQ and to the adaptive VQ. For SN-PVQ we have also applied the index assignment algorithm to the two codebooks independently, but we have additionally improved the index assignment with the help of a simple algorithm presented in [22]. In short, a few different index assignments for the adaptive VQ are precomputed, and which of these to use is determined by classification of the current prediction vector. The safety-net VQ still uses an independently optimized index assignment.

    B. Channel Optimization for Memory-Based VQ

Index assignment does not take into account any explicit knowledge about the channel error probability. If knowledge about the channel can be incorporated in the design, performance can be significantly improved. This is usually referred to as channel optimized VQ (COVQ) [60]. A disadvantage is that the performance degrades if the actual channel differs from the design channel, or is changing with time. As is the case in the index assignment design, a simultaneously optimized COVQ requires that all combinations of the codebooks are

    known already in the design procedure for the safety-net

    extended systems. Hence, only independent COVQ designs for

    the adaptive VQ and the memoryless VQ are feasible. In this

    paper, we have not experimentally evaluated this method of

    improving noisy channel performance, but in [40] it is shown

    how COVQ can improve performance for omniscient FSVQ,and in [64], COVQ is employed to improve PVQ and SN-

    PVQ. In [65], a design method that simultaneously trains the

    codebook and the predictor for noisy channel PVQ is proposed.

    C. Reducing Error Propagation

    The fact that codevectors from the memoryless VQ are

    frequently chosen implies that the error propagation is much

    less prominent in a memory-based VQ scheme that includes

    a safety-net than in one without a safety-net. The reason is

    that if a codevector is chosen from the memoryless codebook

    and the corresponding index is correctly conveyed over the

    channel, the decoder will be forced into the same state as

    the encoder. Consequently, it is desirable to increase the

    number of times the encoder chooses a codevector from the

    memoryless codebook as much as possible if the channel is

    noisy, without increasing the total distortion noticeably. One

way to accomplish this is to study the relative number of codevectors in the memoryless and the adaptive codebook. If the codebook sizes N_f and N_a in (16) are equal, one bit is used to distinguish which codebook the current vector originates from. If we want to increase the usage of the memoryless VQ we can simply increase the size of the memoryless codebook while at the same time decreasing the size of the adaptive codebook. However, due to the indexing problems that arise when the codebook size is not a power of two, we have chosen not to investigate other choices than equal sizes N_f and N_a.

Another way to increase usage of the memoryless VQ is to bias the selection process to favor the safety-net vectors. The bias can be a constant, or it can be a function of the number of transmitted vectors since the last time a safety-net vector was selected. With this method, the attractive limited error propagation feature of moving-average prediction can be mimicked, by forcing the encoder to select a memoryless vector after a predetermined number of vectors from the memory-based VQ. This bias should be chosen depending on the actual channel statistics. However, in this work we use a constant bias of 0.15 dB (in SD), which is a heuristic compromise for the range of error probabilities that was used in the noisy channel experiments. Additional experiments on biased decisions can be found in [22].
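The biased decision can be sketched as a simple rule. The 0.15 dB constant comes from the text, while the optional forced reset that mimics moving-average prediction is a hypothetical parameter:

```python
def choose_with_bias(d_adaptive_db, d_fixed_db, frames_since_reset,
                     base_bias_db=0.15, forced_after=None):
    """Selection rule favoring the safety-net codebook: the memoryless
    candidate wins whenever it is within base_bias_db dB (in SD) of the
    adaptive candidate, and can be forced after a fixed number of
    memory-based frames to bound error propagation."""
    if forced_after is not None and frames_since_reset >= forced_after:
        return 'fixed'
    if d_fixed_db <= d_adaptive_db + base_bias_db:
        return 'fixed'
    return 'adaptive'
```

With a noisier channel, a larger bias (or a smaller forced-reset interval) trades a little noiseless distortion for faster recovery after bit errors.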

    D. Optimization of the Prediction Matrices for PVQ Systems

The performance of LSF quantization in a PVQ system deteriorates much faster with increasing channel noise than the performance of memoryless LSF VQ. This fact motivated us to improve the PVQ system for use over noisy channels. In Fig. 1, a PVQ system is depicted. The VQ and the channel in the system are modeled as white Gaussian noise sources; see Fig. 7.

If we try to optimize the prediction matrices of the system in order to minimize the effects of quantization and channel noise, we find that this problem is hard to treat mathematically. However, for noiseless channels we have experimentally found that the optimum predictor matrices are close to diagonal. Thus the vector predictor can be approximated as a set of scalar predictors. Some of the approximations below can be avoided by orthogonalizing the input vectors before the analysis, by use of the Karhunen-Loève transform (see, e.g., [66]). However, we find the approximations reasonable. For example, by excluding all components of the prediction matrices outside the diagonals, we have found that only about 0.15 bits is lost for the full LSF vector. Therefore, we proceed by analyzing the problem as a set of independent scalar problems.

Finding the equations for the optimum prediction coefficients for a set of independent problems is comparably simple. For noiseless channels, the result that the reconstruction error equals the quantization noise,

    x̃_n − x_n = q_n   (17)



    Fig. 7. Model of VQ and channel in a PVQ system as Gaussian noise sources.

is obvious from Fig. 1, but worthwhile to emphasize. For noisy channels, a term h_n * c_n is added to (17), where h_n is the vector impulse response of the decoder filter, c_n is the additive channel noise of Fig. 7, and * denotes convolution. Since we have a set of approximately independent problems, we can consider the components of the vectors one at a time. From (17) with the added channel noise term, we write the error in a component as

    E = σ_q² + g σ_c²   (18)

where g is the power transfer factor for a component of the decoder filter, i.e., the factor by which the power of a white noise input signal is amplified by the filter. The quantizer and channel error variances are assumed to be proportional to the prediction error variance σ_e²,

    σ_q² = ε_q² σ_e²  and  σ_c² = ε_c² σ_e²   (19)

and we rewrite (18) as

    E = (ε_q² + g ε_c²) σ_e².   (20)

This result is also derived in [66]. Now we want to express the power transfer factor, g, and the prediction error variance, σ_e², as functions of the coefficients of the input process and the prediction filter. For the sake of simplicity, we restrict the calculations to first- and second-order AR processes, generated by

    x_n = a_1 x_{n−1} + a_2 x_{n−2} + w_n.   (21)

The linear predictor is written as

    x̂_n = h_1 x_{n−1} + h_2 x_{n−2}.   (22)

After some work, we find expressions for g and σ_e² as functions of a_1, a_2, h_1, and h_2; these are given in (23) and (24).

By inserting (23) and (24) in (20) we obtain an expression for the error variance of the PVQ system in the presence of channel noise. Also, for given values of a_1, a_2, ε_q², and ε_c² (derived from the VQ, the channel, and the input process), we can find the optimum values of h_1 and h_2. Even for this simplified system, an analytic solution is hard to find, but a

Fig. 8. Performance of a 20-bit SN-PVQ as a function of the mix between vectors in the memory-based VQ and the safety-net VQ in terms of average spectral distortion.

    numerical solution is easily obtained. Note that (20) must be

    independently solved for each component in the LSF vector.
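For a unit-variance first-order AR component, this optimization is easy to carry out numerically. The closed-form pieces below (prediction error variance and decoder power transfer factor) follow from standard first-order DPCM analysis and are a sketch, not the paper's exact (23)-(24):

```python
import numpy as np

def total_error(h, a, eps_q, eps_c):
    """Reconstruction error variance of a first-order predictive quantizer
    over a noisy channel, for a unit-variance AR(1) input with coefficient
    a and predictor coefficient h: the prediction error variance is
    sigma_e^2 = 1 + h^2 - 2*a*h, quantization noise eps_q^2 * sigma_e^2
    passes straight through, and channel noise eps_c^2 * sigma_e^2 is
    amplified by the decoder filter 1/(1 - h z^-1), whose power transfer
    factor is g = 1/(1 - h^2); cf. (20)."""
    sigma_e2 = 1.0 + h * h - 2.0 * a * h
    g = 1.0 / (1.0 - h * h)
    return (eps_q ** 2 + g * eps_c ** 2) * sigma_e2

def best_predictor(a, eps_q, eps_c):
    """Grid search for the predictor coefficient minimizing the error."""
    hs = np.linspace(0.0, 0.999, 10000)
    errs = [total_error(h, a, eps_q, eps_c) for h in hs]
    return float(hs[int(np.argmin(errs))])
```

Without channel noise the optimum is h = a; with channel noise the optimum "leaks" toward smaller h, which is the qualitative behavior exploited when the diagonal predictor elements are tuned for high noise levels.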

The result from the above analysis is used to improve the PVQ and SN-PVQ performance for noisy channels in Section VI-B. The diagonal elements of the prediction matrices are optimized for high noise levels, and no optimization for the actual channel noise level is performed. That is, the error probability of the channel is not a design parameter. Even the results for noiseless channels are obtained with the prediction matrices optimized for high noise levels. However, the increase in noiseless spectral distortion when the matrices are optimized for high noise is small, 0.02-0.04 dB, while the gains for severely degraded channels can be several dB.

    In [67], Chang and Donaldson derive formulas for optimum

    predictor coefficients for scalar DPCM systems. Jayant and

Noll [66] give a general overview of the problem of transmitting DPCM over noisy channels, and Noll [68] analyzes the

    noisy channel performance of PCM and DPCM quantizing

    schemes.

    VI. EXPERIMENTS

    In this section, the experiments used to determine optimal

parameters of the memory-based and safety-net methods are

    described. Comparisons of all tested methods are given, both

    for noiseless and noisy channels. To verify the objective

    results, a listening test is presented.

    A. Experimental Setup

    The speech training database used to design all the VQs

    in this work consists of 250 000 vectors. Another set of

  • 8/3/2019 Inter Frame Lsf

    12/15

    506 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999

20 000 vectors is used for evaluation.4 The speech is recorded from FM radio and includes a large number of speakers of both genders. The language is mostly Swedish. The speech is

    digitized at 16 kHz, lowpass-filtered at 3.4 kHz and decimated

    to 8 kHz sampling frequency. A tenth-order LPC analysis

    using the stabilized covariance method with high-frequency

    compensation and error weighting [13] is performed every

    20 ms using a 25 ms analysis window. A fixed 10 Hz

    bandwidth expansion is applied to each pole of the LPC

    coefficient vector.

One of the key issues in vector quantization is the selection of an appropriate distortion measure for the codebook search. The Euclidean distance measure is often used for its simplicity. Here, we employ the weighted Euclidean distance measure presented in [13], which has been shown to improve both the objective quality (measured in spectral distortion, SD) and the subjective quality of the coded speech. This distance measure has been used in the design and evaluation of all VQ techniques presented in this work, except for the design of the next-state functions in OT-FSVQ and DCVQ, where the unweighted Euclidean measure was employed. For measuring the quantization performance, we calculate the spectral distortion in the 0-3 kHz range.
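The band-limited spectral distortion measure can be sketched as follows (FFT grid size and function name are illustrative):

```python
import numpy as np

def spectral_distortion_db(a1, a2, fs=8000.0, f_lo=0.0, f_hi=3000.0,
                           nfft=512):
    """Root-mean-square log-spectral difference (dB) between two LPC
    synthesis filters 1/A(z), evaluated over a frequency band.
    a1, a2: direct-form coefficients [1, a_1, ..., a_p] of A(z)."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    spec1 = np.abs(np.fft.rfft(a1, nfft))      # |A1(e^jw)| on the grid
    spec2 = np.abs(np.fft.rfft(a2, nfft))
    band = (freqs >= f_lo) & (freqs <= f_hi)
    # 10*log10 |1/A|^2 = -20*log10 |A|, so the difference of log spectra is:
    diff = 20.0 * np.log10(spec2[band]) - 20.0 * np.log10(spec1[band])
    return float(np.sqrt(np.mean(diff ** 2)))
```

Identical filters give 0 dB; a pure gain difference gives a constant offset, e.g., doubling the coefficients yields 20 log10(2) ≈ 6.02 dB.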

Large savings in complexity and storage requirements can be achieved if a product-code technique is employed. In this work we have utilized a three-split VQ scheme for all quantizers, where the dimensions of the splits are 3, 3, and 4, respectively. An important design issue for a split VQ system is the number of bits to allocate to the individual VQs. It is common to distribute the bits evenly over the splits in order to keep the largest codebook as small as possible [13], [40]. However, since the difference in complexity is relatively small, we have here used the bit configurations that result in the best performance in terms of average spectral distortion. A typical example is the 24-bit VQs, where 8 bits were used for the first split, 9 bits for the second, and 7 bits for the last split. All the investigated quantization methods used the same bit allocations.

For the PVQ systems, first-order prediction is used, with the prediction matrix optimized for a high noise level according to (20), (23), and (24). For the OT-FSVQ schemes the number of states is chosen to be eight. We will not report on any results for NN-FSVQ; instead the FSVQ class will be represented by OT-FSVQ, due to its superior performance. In the following, MLVQ denotes memoryless VQ.

    B. Performance for Noiseless Channels

An important aspect of the design of a safety-net extended memory-based VQ system is which codebook sizes should be assigned to the safety-net VQ and the memory-based VQ, respectively. We have investigated the performance of a 20-bit SN-PVQ for five different constellations, and the results of the simulations are depicted in Fig. 8. These results indicate that the best performance is obtained by choosing the safety-net size to be somewhere between 25-50% of the total size. This experiment, together with the discussions in Sections IV-A and V-C, motivates us to use a mix coefficient of 50% in all experiments.

4 In this study, we are mostly interested in the relative performance between different quantization schemes, and the size of the evaluation set is considered sufficient for this purpose. Note, as mentioned in Section I, that the performance in absolute figures may differ if another speech material is used.

Fig. 9. Performance of the VQs in terms of average spectral distortion. The curves correspond to: A) memoryless VQ; B) OT-FSVQ; C) PVQ; D) DCVQ; and E) SN-PVQ.

In Fig. 9, the average SD for the investigated coding schemes is plotted as a function of the number of bits used. For the safety-net configurations, one bit was used to indicate the chosen codebook. From this figure it is clear that all memory-based VQ methods can utilize the interframe correlation and achieve performance significantly better than the memoryless VQ. Among the memory-based methods, the SN-PVQ is clearly the best in these simulations, followed by DCVQ and PVQ, and last OT-FSVQ. When employing the SN-PVQ, the required rate can be reduced by 4-5 bits/vector compared to the memoryless VQ without reduction in performance. It can also be seen that if a memory-based scheme is extended with a safety-net, approximately 1 bit is gained. If we compare the results with what was theoretically predicted in Section III-C, we can conclude that SN-PVQ achieves performance close to the predicted.

Differences in the analysis conditions and databases make it difficult to compare our results to other similar work. However, if the analysis conditions are similar, we can compare the relative improvement of using a memory-based VQ scheme over a memoryless VQ. For OT-FSVQ, we compare our results to the results by Hussain and Farvardin in [40]. They report a performance gain of slightly less than 3 bits for the OT-FSVQ, which is very close to what is obtained in this work. For the case of PVQ it is more difficult to find a comparable investigation. For example, Loo and Chan [35], [36] report a gain of 5-6 bits for PVQ, but for a completely different coding situation than the one in this work.

In Table III, the performance in terms of both average SD and outlier percentage is given for all five investigated coding methods at 24 bits. As expected, the introduction of a safety-net VQ decreases not only the average distortion but also the number of outliers.

    C. Performance for Noisy Channels

In the preceding section, we have verified that a number of memory-based VQ schemes outperform conventional memoryless VQs under noiseless conditions. However, in order to be useful for practical applications, it is essential that the coding scheme can cope with channel noise. Therefore we have performed a study of the behavior under noisy conditions. Here we assume a memoryless binary symmetric channel with bit error probability q. For all vector quantizers, procedures to improve the index assignment are applied, as described in Section V-A.

TABLE III. QUANTIZER PERFORMANCE AT 24 BITS AND BIT ERROR RATE q OF 0% AND 0.5%

Fig. 10. Performance of the VQs at 24 bits in terms of average spectral distortion as a function of bit error rate. The curves correspond to: (a) memoryless VQ, (b) OT-FSVQ, (c) PVQ, (d) DCVQ, and (e) SN-PVQ.

TABLE IV. SD COMPARISON OF SN-PVQ, PVQ, AND MEMORYLESS VQ FOR DIFFERENT BIT ERROR RATES
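The assumed channel model is easy to simulate; a sketch (function name is illustrative):

```python
import numpy as np

def bsc_transmit(index, bits, q, rng):
    """Send a bits-wide codeword index over a memoryless binary symmetric
    channel with bit error probability q; each bit flips independently."""
    flips = rng.random(bits) < q
    error_pattern = int(sum(1 << int(b) for b in np.nonzero(flips)[0]))
    return index ^ error_pattern
```

Running each transmitted index through such a channel, and decoding with the possibly corrupted indices, reproduces the kind of noisy-channel evaluation reported in Fig. 10.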

The performance for all methods under equal noisy conditions at 24 bits in terms of average SD is depicted in Fig. 10. From the curves in Fig. 10 we conclude that SN-PVQ is better than all other methods for all tested error rates. The other memory-based methods only perform better than the memoryless VQ for small error probabilities. OT-FSVQ is the scheme in this investigation that is most sensitive to channel errors. Again we see that the introduction of a safety-net VQ clearly improves performance. In Table III, average SD and outlier percentage are presented for q = 0.5%. Even though low values of average SD are achieved for some of the methods, the number of outliers caused by bit errors is high and hence the distortion is clearly audible.

    Fig. 11. Synthetic speech production for the listening tests.

For high error probabilities, the average SD is higher for all methods than what can be accepted in most applications. However, the results for high error rates can be significantly improved if channel coding is applied; see for instance [13]. We have also found that the gain of using larger codebooks is almost negligible for high error rates. Thus, if more bits can be used, it is more efficient to spend them on channel coding than on increasing the codebook sizes.

Another interesting comparison is given in Table IV. Here we compare a 20-bit SN-PVQ with a 21-bit PVQ and a 24-bit memoryless VQ, all of which perform approximately equally well without noise. The results in Table IV lead to the conclusion that SN-PVQ saves 4 bits compared to a memoryless VQ at all tested error rates. Compared to a PVQ scheme, an improvement of at least 1 bit is achievable. Note that the performance degrades more for the PVQ system than for the other two methods as the bit error rate is increased. Hence, for large error probabilities the PVQ loses more than 1 bit compared to SN-PVQ.

    D. Subjective Evaluation

We have performed listening tests to verify the objective results of the previous subsection. In the test, a 20-bit SN-PVQ was compared to a 24-bit memoryless VQ. The coders were compared both for noiseless conditions and for a bit error rate of 1%.

A diagram of the model for studying the effects of quantization of the LSF parameters is shown in Fig. 11. A prediction residual is formed by filtering the speech signal using an unquantized prediction filter, and synthetic speech is generated by exciting a quantized inverse prediction filter with the undistorted residual. In this way, the effects of LSF quantization can be studied separately from any encoding of the residual.
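The analysis-synthesis arrangement of Fig. 11 can be sketched as follows (a pure-Python illustration with our own function names; with identical analysis and synthesis filters the speech is reconstructed exactly, so any audible difference is due to the quantized filter alone):

```python
def analysis(speech, a):
    """Residual e[n] = sum_k a[k]*s[n-k]: FIR filtering of the
    speech with the unquantized predictor a = [1, a1, ..., ap]."""
    p = len(a) - 1
    e = []
    for n in range(len(speech)):
        acc = 0.0
        for k in range(p + 1):
            acc += a[k] * (speech[n - k] if n - k >= 0 else 0.0)
        e.append(acc)
    return e

def synthesis(residual, a_q):
    """Synthetic speech from the undistorted residual through the
    quantized inverse prediction filter 1/A_q(z):
    s_hat[n] = e[n] - sum_{k>=1} a_q[k]*s_hat[n-k]."""
    p = len(a_q) - 1
    s_hat = []
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, p + 1):
            acc -= a_q[k] * (s_hat[n - k] if n - k >= 0 else 0.0)
        s_hat.append(acc)
    return s_hat
```

With `a_q` equal to `a`, `synthesis(analysis(speech, a), a)` returns the original speech; substituting the LSF-quantized predictor for `a_q` isolates the quantization effect.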

Twelve short Swedish sentences uttered by male and female speakers were encoded by the memoryless VQ and the SN-PVQ, with and without channel noise. The sentences were compared pairwise, including some comparisons with the uncoded original sentences. Twelve test persons listened with headphones to each pair (a total of 60 pairs) and were asked to indicate a preference for either the first or the second sentence.

The listening tests revealed that for a noiseless channel, the 20-bit SN-PVQ was preferred to the 24-bit memoryless VQ in 58% of the comparisons. For a channel with 1% bit errors, the result is very clear: the 20-bit SN-PVQ was preferred to the 24-bit memoryless VQ in 78% of the comparisons. Statistical tests verified that at a confidence level of 95%, the SN-PVQ is preferred to the memoryless VQ, in both the noisy and the noiseless case.
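Such a significance statement can be checked with a one-sided binomial (sign) test against the null hypothesis of no preference. The exact per-condition comparison counts are not broken out above, so the counts in the example are illustrative only:

```python
from math import comb

def binomial_p_value(successes, trials, p0=0.5):
    """One-sided binomial test: probability of observing at least
    `successes` preferences out of `trials` judgments if listeners
    had no preference (p0 = 0.5)."""
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(successes, trials + 1))

# Illustrative: a 78% preference rate over 60 judgments is far
# below the 5% significance level (47/60 ~ 78%).
assert binomial_p_value(47, 60) < 0.05
```

The larger the number of judgments behind a given preference rate, the smaller the p-value, which is why pooling the listeners' votes strengthens the conclusion.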

The outcome of the listening tests shows that the objective performance advantage of the SN-PVQ over a standard memoryless VQ also holds in subjective tests.


    VII. CONCLUSIONS

The most important results of the experiments can be summarized in the following points.

• A memory-based LSF quantizer has an advantage of 3–5 bits over memoryless VQ for error-free transmission. The SN-PVQ method is the best in this work, with an advantage of 4–5 bits.

• A safety-net extension of an existing memory-based VQ can improve the performance by 1–2 bits for error-free transmission. For transmission over noisy channels, the performance gain is even larger.

• For noisy channels, conventional memory-based VQ methods rapidly lose their advantage over memoryless VQ. However, the proposed safety-net extension of the memory-based VQ algorithms improves the performance, and the SN-PVQ method performs comparably to the memoryless VQ at all tested error probabilities, with 4 bits less.
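The safety-net principle amounts to encoding each frame with both quantizers and signaling the winner with one extra bit; a minimal sketch of the selection logic (codebook contents, shapes, and names are illustrative, not the trained codebooks of the experiments):

```python
import numpy as np

def safety_net_encode(x, predicted, mem_codebook, safety_codebook):
    """Choose between a predictive codebook (quantizing the
    prediction residual x - predicted) and a memoryless
    'safety-net' codebook (quantizing x directly); one extra
    bit signals the choice to the decoder."""
    # best predictive candidate
    res = x - predicted
    i_mem = int(np.argmin(np.sum((mem_codebook - res) ** 2, axis=1)))
    x_mem = predicted + mem_codebook[i_mem]
    # best memoryless candidate
    i_sn = int(np.argmin(np.sum((safety_codebook - x) ** 2, axis=1)))
    x_sn = safety_codebook[i_sn]
    # keep whichever reconstruction is closer to the input
    if np.sum((x_mem - x) ** 2) <= np.sum((x_sn - x) ** 2):
        return ('mem', i_mem, x_mem)
    return ('safety', i_sn, x_sn)
```

When a channel error has corrupted the decoder's state, the predictive branch produces poor candidates and the memoryless branch tends to win, which is what limits error propagation.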

The above objective results are further strengthened by subjective tests of speech quality. In the listening tests, a 20-bit SN-PVQ was preferred to a 24-bit memoryless VQ in 58% of the evaluated sentences for a noiseless channel, and in 78% of the sentences for a channel with 1% bit error rate.

All results in this work are derived for 20-ms frames, windowed with an overlap of 2.5 ms on both sides. The difference between memory-based and memoryless methods will increase if the frame length is decreased, or if the overlap between frames is increased. The performance of all methods in general, and the memory-based methods in particular, will also improve if the channel noise distribution is assumed to be known, so that channel optimization procedures can be used.

    REFERENCES

[1] R. Viswanathan and J. Makhoul, "Quantization properties of transmission parameters in linear predictive systems," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 309–321, 1975.

[2] A. H. Gray, Jr. and J. D. Markel, "Quantization and bit allocation in speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 459–473, 1976.

[3] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer., vol. 57, suppl. 1, p. S35(A), 1975.

[4] F. K. Soong and B.-H. Juang, "Line spectrum pair (LSP) and speech data compression," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, San Diego, CA, 1984, pp. 1.10.1–1.10.4.

[5] N. Sugamura and N. Farvardin, "Quantizer design in LSP speech analysis-synthesis," IEEE J. Select. Areas Commun., vol. 6, pp. 432–440, 1988.

[6] N. Farvardin and R. Laroia, "Efficient encoding of speech LSP parameters using the discrete cosine transformation," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, U.K., 1989, vol. 1, pp. 168–171.

[7] R. Hagen and P. Hedelin, "Low bit-rate spectral coding in CELP, a new LSP-method," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Albuquerque, NM, 1990, pp. 189–192.

[8] F. K. Soong and B.-H. Juang, "Optimal quantization of LSP parameters," IEEE Trans. Speech Audio Processing, vol. 1, pp. 15–24, 1993.

[9] A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel, "Speech coding based upon vector quantization," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 562–574, 1980.

[10] J. Grass and P. Kabal, "Methods of improving vector-scalar quantization of LPC coefficients," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Toronto, Ont., Canada, 1991, pp. 657–660.

[11] R. Laroia, N. Phamdo, and N. Farvardin, "Robust and efficient quantization of speech LSP parameters using structured vector quantizers," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Toronto, Ont., Canada, 1991, pp. 641–644.

[12] P. Hedelin, "Single-stage spectral quantization at 20 bits," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, vol. 1, pp. 525–528.

[13] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Trans. Speech Audio Processing, vol. 1, pp. 3–14, 1993.

[14] B.-H. Juang and A. H. Gray, Jr., "Multiple stage vector quantization for speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Paris, France, 1982, pp. 597–600.

[15] W. P. LeBlanc, B. Bhattacharya, S. A. Mahmoud, and V. Cuperman, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Trans. Speech Audio Processing, vol. 1, pp. 373–385, 1993.

[16] Y. Shoham, "Vector predictive quantization of the spectral parameters for low rate speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Dallas, TX, 1987, vol. 4, pp. 2181–2184.

[17] M. Yong, G. Davidsson, and A. Gersho, "Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, New York, NY, 1988, vol. 1, pp. 402–405.

[18] S. Wang, E. Paksoy, and A. Gersho, "Product code vector quantization of LPC parameters," in Speech and Audio Coding for Wireless and Network Applications, B. Atal, V. Cuperman, and A. Gersho, Eds. Boston, MA: Kluwer, 1993, pp. 251–258.

[19] V. Cuperman and A. Gersho, "Vector predictive coding of speech at 16 kbits/s," IEEE Trans. Commun., vol. COMM-33, pp. 685–696, 1985.

[20] J. Foster, R. M. Gray, and M. O. Dunham, "Finite-state vector quantization for waveform coding," IEEE Trans. Inform. Theory, vol. 31, pp. 348–359, 1985.

[21] M. O. Dunham and R. M. Gray, "An algorithm for the design of labeled-transition finite-state vector quantizers," IEEE Trans. Commun., vol. COMM-33, pp. 83–89, 1985.

[22] J. Linden, "Interframe quantization of spectrum parameters in speech coding," Licent. thesis, Tech. Rep. 235L, Chalmers Univ. Technol., 1996.

[23] K. K. Paliwal and W. B. Kleijn, "Quantization of LPC parameters," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 433–466.

[24] V. Cuperman and A. Gersho, "Adaptive differential vector coding of speech," in Conf. Rec. GlobeCom, Miami, FL, 1982, vol. 3, pp. 1092–1096.

[25] T. R. Fischer and D. J. Tinnin, "Quantized control with differential pulse code modulation," in Proc. Conf. Decision and Control, Orlando, FL, 1982, vol. 3, pp. 1222–1227.

[26] P.-C. Chang and R. M. Gray, "Gradient algorithms for designing predictive vector quantizers," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 679–690, 1986.

[27] R. A. Wiggins and E. A. Robinson, "Recursive solution to the multichannel filtering problem," J. Geophys. Res., vol. 70, pp. 1885–1891, 1965.

[28] J.-H. Chen and A. Gersho, "Covariance and autocorrelation methods for vector linear prediction," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Dallas, TX, 1987, pp. 1545–1548.

[29] J. Skoglund and J. Linden, "Predictive VQ for noisy channel spectrum coding: AR or MA?," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Munich, Germany, 1997, vol. 2, pp. 1351–1354.

[30] H. Ohmuro, T. Moriya, K. Mano, and S. Miki, "Coding of LSP parameters using interframe moving average prediction and multi-stage vector quantization," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Quebec, P.Q., Canada, 1993, vol. 1, pp. 63–64.

[31] W. P. LeBlanc, C. Liu, and V. Viswanathan, "An enhanced full rate speech coder for digital cellular applications," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 1, pp. 569–572.

[32] A. Kataoka, J. Ikedo, and S. Hayashi, "LSP and gain quantization for the proposed ITU-T 8-kb/s speech coding standard," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 7–8.

[33] C.-C. Kuo, F.-R. Jean, and H.-C. Wang, "Low bit-rate quantization of LSP parameters using two-dimensional differential coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, San Francisco, CA, 1992, vol. 1, pp. 97–100.

[34] E. Erzin and A. E. Cetin, "Interframe differential vector coding of line spectrum frequencies," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 25–28.

[35] J. H. Y. Loo, W.-Y. Chan, and P. Kabal, "Classified nonlinear predictive vector quantization of speech spectral parameters," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Atlanta, GA, 1996, vol.


2, pp. 761–764.

[36] J. H. Y. Loo and W. Y. Chan, "Nonlinear predictive vector quantization of speech spectral parameters," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 51–52.

[37] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1991.

[38] J. Foster and R. M. Gray, "Finite-state vector quantizers for waveform coding," in Proc. IEEE Int. Symp. Information Theory, New York, NY, 1982, vol. 1, pp. 134–135.

[39] R. Aravind and A. Gersho, "Image compression based on vector quantization with finite memory," Opt. Eng., vol. 26, pp. 570–580, 1987.

[40] Y. Hussain and N. Farvardin, "Finite-state vector quantization over noisy channels and its application to LSP parameters," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, San Francisco, CA, 1992, vol. 2, pp. 133–136.

[41] S. Bruhn, "Matrix product quantization for very-low-rate speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Detroit, MI, 1995, vol. 1, pp. 724–727.

[42] C. S. Xydeas and C. Papanastasiou, "Efficient coding of LSP parameters using split matrix quantization," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Detroit, MI, 1995, vol. 1, pp. 740–743.

[43] N. Phamdo and N. Farvardin, "Coding of speech LSP parameters using TSVQ with interblock noiseless coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Albuquerque, NM, 1990, pp. 193–196.

[44] D. L. Neuhoff and N. Moayeri, "Tree searched vector quantization with interblock noiseless coding," in Proc. Conf. Information Science Systems, 1988, pp. 781–783.

[45] S. Bruhn, "Efficient interblock noiseless coding of speech LPC parameters," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, vol. 1, pp. 501–504.

[46] D. P. Kemp, J. S. Collura, and T. E. Tremain, "Multi-frame coding of LPC parameters at 600–800 bps," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Toronto, Ont., Canada, 1991, vol. 1, pp. 609–612.

[47] C. S. Xydeas and K. K. M. So, "A long history quantization approach to scalar and vector quantization of LSP coefficients," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 1–4.

[48] M. A. Ferrer-Ballester and A. R. Figueiras-Vidal, "Efficient adaptive vector quantization of LPC parameters," IEEE Trans. Speech Audio Processing, vol. 3, pp. 314–317, 1995.

[49] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.

[50] R. M. Gray, Source Coding Theory. Boston, MA: Kluwer, 1990.

[51] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COMM-28, pp. 84–95, 1980.

[52] T. Eriksson, J. Linden, and J. Skoglund, "A safety-net approach for improved exploitation of speech correlations," in Proc. Int. Conf. Digital Signal Processing, Cyprus, 1995, vol. 1, pp. 96–101.

[53] ——, "Vector quantization of glottal pulses," in Proc. 4th Europ. Conf. Speech Communication and Technology, Madrid, Spain, 1995, vol. 1, pp. 225–228.

[54] ——, "Exploiting interframe correlation in spectral quantization: A study of different memory VQ schemes," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 2, pp. 765–768.

[55] H. Zarrinkoub and P. Mermelstein, "Switched prediction and quantization of LSP frequencies," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 2, pp. 757–760.

[56] E. Shlomot, "Delayed decision switched prediction multi-stage LSF quantization," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 45–46.

[57] B. S. Atal, R. V. Cox, and P. Kroon, "Spectral quantization and interpolation for CELP coders," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Glasgow, U.K., 1989, pp. 69–72.

[58] S. L. Dall'Agnol, A. Alcaim, and J. R. B. de Marca, "Performance of LSF vector quantizers for VSELP coders in noisy channels," Eur. Trans. Telecommun., vol. 5, pp. 553–563, 1994.

[59] R. Hagen and P. Hedelin, "Robust vector quantization in spectral coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 13–16.

[60] N. Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, pp. 799–809, 1990.

[61] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, pp. 2147–2158, 1990.

[62] P. Hedelin, P. Knagenhjelm, and M. Skoglund, "Vector quantization for speech transmission," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 311–345.

[63] P. Knagenhjelm and E. Agrell, "The Hadamard transform: A tool for index assignment," IEEE Trans. Inform. Theory, vol. 42, pp. 1139–1151, 1996.

[64] T. Eriksson, J. Linden, and J. Skoglund, "Improvements of memory vector quantization for noisy channel transmission of LSF parameters," in Proc. Radio Vetenskap och Kommunikation, Luleå, Sweden, 1996, vol. 1, pp. 370–374.

[65] J. Linden and J. Skoglund, "Channel optimization of predictive VQ for spectrum coding," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Pocono Manor, PA, 1997, pp. 93–94.

[66] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.

[67] K.-Y. Chang and R. W. Donaldson, "Analysis, optimization, and sensitivity study of differential PCM systems operating on noisy communication channels," IEEE Trans. Commun., vol. 20, pp. 338–350, 1972.

[68] P. Noll, "On predictive quantization schemes," Bell Syst. Tech. J., pp. 1499–1532, 1978.

Thomas Eriksson was born in Skövde, Sweden, in 1964. He received the M.S. degree in electrical engineering in 1990, and the Ph.D. degree in information theory in 1996, both from Chalmers University of Technology, Göteborg, Sweden.

From 1990 to 1996, he was with the Department of Information Theory, Chalmers University of Technology. From 1997 to 1998, he was at AT&T Labs–Research, Florham Park, NJ, and in 1998 and 1999 he was working on a joint research project with the Royal Institute of Technology and Ericsson Radio Systems AB, both in Stockholm, Sweden. He is currently an Associate Professor at the Department of Signals and Systems, Chalmers University of Technology, where his main research interests are vector quantization and speech coding.

Jan Linden (S'92–M'98) was born in Göteborg, Sweden, in 1966. He received the M.S. degree in electrical engineering, the Licentiate of Engineering, and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1991, 1996, and 1998, respectively.

From 1992 to 1998, he was a Research and Teaching Assistant at the Department of Information Theory, Chalmers University of Technology. His research at Chalmers includes low bit rate speech coding based on glottal pulse modeling and memory-based vector quantization for noisy channels. He is currently a Post-Doctoral Researcher at the University of California, Santa Barbara (UCSB), and a Research Engineer at SignalCom, Inc., Goleta, CA. His research at UCSB is focused on audio coding and wideband speech coding, and at SignalCom he is working on algorithm development for speech coding applications.

Jan Skoglund (S'93–M'98) was born in Göteborg, Sweden, in 1967. He received the M.S. degree in electrical engineering, the Lic. Eng., and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1992, 1996, and 1998, respectively. His Ph.D. dissertation addressed different aspects of speech coding such as spectrum quantization, pulse excitation modeling, and perceptual coding.

From 1992 to 1998, he was with the Department of Information Theory, Chalmers University of Technology. Since 1999, he has been a Consultant at the Speech and Image Processing Services Research Laboratory, AT&T Labs–Research, Shannon Laboratory, Florham Park, NJ, where he is working on low bit rate speech coding.