

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999 495

Interframe LSF Quantization for Noisy Channels

Thomas Eriksson, Jan Linden, Member, IEEE, and Jan Skoglund, Member, IEEE

Abstract: In linear predictive speech coding algorithms, transmission of linear predictive coding (LPC) parameters, often transformed to the line spectrum frequencies (LSF) representation, consumes a large part of the total bit rate of the coder. Typically, the LSF parameters are highly correlated from one frame to the next, and a considerable reduction in bit rate can be achieved by exploiting this interframe correlation. However, interframe coding leads to error propagation if the channel is noisy, which possibly cancels the achievable gain. In this paper, several algorithms for exploiting interframe correlation of LSF parameters are compared. In particular, performance for transmission over noisy channels is examined, and methods to improve noisy-channel performance are proposed. By combining an interframe quantizer and a memoryless safety-net quantizer, we demonstrate that the advantages of both quantization strategies can be utilized, and the performance for both noiseless and noisy channels improves. The results indicate that the best interframe method performs as well as a memoryless quantizing scheme with 4 bits less per frame. Subjective listening tests have been employed that verify the results from the objective measurements.

Index Terms: Interframe coding, memory-based vector quantization, robust coding, spectrum coding, speech coding, vector quantization.

    I. INTRODUCTION

MODERN digital communication applications, such as cellular telephony, have led to an increasing need for high-quality speech coding schemes operating at lower and lower bit rates. Most contemporary speech coders are based on linear predictive coding (LPC), where a fairly white excitation signal is fed into an all-pole filter representing the spectral information of speech. For many applications, the LPC spectrum is the major side information, and thus it is important to encode the LPC parameters using as few bits as possible while maintaining high speech quality. The aim of this study is to investigate the problem of efficient transmission of spectral information by exploiting interframe correlation for noiseless and noisy channels. The subject of LPC quantization has been studied intensively for many years, initially with the focus on which parameter set to use for LPC representation [1], [2]. In competition with reflection coefficients and log area ratios (LAR), the line spectral frequencies or line spectrum pairs (LSF or LSP, introduced in [3]) have proven to be a suitable representation, and are the prevailing LPC parameter set in speech coding today.

Manuscript received August 19, 1996; revised March 31, 1999. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Joseph Campbell.

T. Eriksson was with the Department of Information Theory, Chalmers University of Technology, SE-412 96 Goteborg, Sweden. He is now with the Information Theory Group, Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Goteborg, Sweden (e-mail: [email protected]).

J. Linden was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA, and SignalCom Inc., Goleta, CA 93117 USA (e-mail: [email protected]).

J. Skoglund was with the Department of Information Theory, Chalmers University of Technology, Goteborg, Sweden. He is now with AT&T Labs-Research, Shannon Laboratory, Florham Park, NJ 07932 USA (e-mail: [email protected]).

Publisher Item Identifier S 1063-6676(99)06560-8.

Up to about 1990, almost all coding schemes relied on scalar quantization to some extent. Complexity reasons limited the use of vector quantization (VQ), and therefore methods designed to exploit intraframe correlation (correlation between parameters within one frame) using scalar quantization were proposed; see, e.g., [4]-[8]. The first work that incorporated VQ was described in [9], but far from acceptable performance was obtained with a VQ of 10 bits/frame. Instead, several hybrids of scalar quantization and VQ were investigated, e.g., [10], [11]. Direct application of a single VQ is still not suitable in practice (though it has been done in, e.g., [12]), but different schemes that reduce the VQ complexity at the expense of degraded performance have been demonstrated to outperform earlier scalar systems. In [13] it is proposed that transparent quantization can be achieved at 24 bits/frame if the LSF vector is split into two vectors, each quantized with a separate VQ (this procedure is usually referred to as split VQ). Another efficient way of reducing VQ complexity is multistage VQ [14]. In [15] it is stated that the same performance as for the 24 bits/frame split VQ can be achieved at 22 bits/frame with multistage VQ.
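As a concrete illustration of the split-VQ idea, here is a minimal sketch (Python/NumPy; the random codebooks and all sizes are assumptions for illustration, using the 3+3+4 split that this paper adopts for its own experiments in Section III-C). Each sub-vector is searched against its own small codebook, so the search cost grows with the sum, not the product, of the sub-codebook sizes.

```python
import numpy as np

# Split VQ sketch (assumed sizes): a 10-dimensional LSF vector is split into
# sub-vectors of 3, 3, and 4 components, each quantized with its own
# 8-bit codebook, for 24 bits/frame in total.
rng = np.random.default_rng(7)
splits = [(0, 3), (3, 6), (6, 10)]
codebooks = [rng.normal(size=(256, b - a)) for a, b in splits]

def split_vq(v):
    indices, parts = [], []
    for (a, b), cb in zip(splits, codebooks):
        # Nearest-neighbor search within this split only.
        i = int(np.argmin(np.sum((cb - v[a:b]) ** 2, axis=1)))
        indices.append(i)
        parts.append(cb[i])
    return indices, np.concatenate(parts)

v = rng.normal(size=10)
indices, v_hat = split_vq(v)   # three 8-bit indices, 24 bits/frame
```

The price of the split is that correlation between components in different splits is not exploited, which is why the paper compares this against memory-based schemes.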

In memoryless quantization, each LSF parameter vector is quantized independently of previous LSF vectors. This is not, however, the most efficient way to encode the LSF vectors. Parameters extracted from speech, such as the LPC coefficients, typically show a significant interframe correlation (correlation between successive frames). Consequently, large gains can be obtained by exploiting the interframe correlation. A number of memory-based quantization schemes, i.e., schemes that utilize correlation between successive frames, have been proposed during the last ten years. In Section II, an overview of some successful methods to exploit interframe correlation in LSF quantization is presented. Among the most popular memory-based VQ schemes are predictive VQ (PVQ) [16]-[19], a straightforward extension of a scalar predictive quantizer, and finite-state VQ (FSVQ) [20], [21], where a next-state function determines which of a set of quantizers to use for the next vector. Other quantizers with memory include methods based on the discrete cosine transform, two-dimensional (2-D) prediction, noiseless coding of VQ indices, etc.

1063-6676/99$10.00 © 1999 IEEE


Fig. 1. Predictive vector quantizer, encoder (left) and decoder (right). An error vector, which is quantized with a VQ, is formed by subtracting a prediction based on previously quantized vectors from the current input vector.

Several studies of LSF quantization can be found in the literature. However, the results cannot, in general, be directly compared, since there may be large deviations in the experimental setups. We have observed that different databases can lead to different objective performance for the same quantization scheme. Furthermore, there are several possible methods of performing the LPC analysis. For example, both the autocorrelation method and the stabilized covariance method are common for LPC analysis, and procedures such as high-frequency compensation and bandwidth expansion also affect the result. The frame length varies between 5 and 40 ms in different papers, and the analysis window overlap is also not consistent from one work to another (two factors that are of significant importance for the performance of memory-based methods [22]). Consequently, the greatest caution should be exercised when comparing results from different studies. Several memoryless quantization schemes are compared using a common database in [23]. In this work we have incorporated some of the most popular memory-based VQ schemes and compared their performance for the same database and analysis method. Throughout this paper, the order of the linear prediction filter is 10 and the frame length is 20 ms with a 25-ms analysis window. More details about the experimental setup are found in Section VI-A.

An interesting subject in LSF quantization is the performance of memory-based VQ methods when the transmission channel is noisy. For such channels, bit errors are unavoidable. This may cause the state of the encoder and decoder to differ. In a memory-based scheme, this leads to a sequence of errors, error propagation, which possibly cancels the advantage over a memoryless VQ. We have studied a new technique called safety-net VQ, which is shown to significantly decrease error propagation. The safety-net can be used as an extension to a memory-based VQ, thereby improving the performance for transmission over both noisy and noiseless channels. In this paper, we study spectrum coding performance for noisy channels without using explicit error protection on the transmitted bits; such protection can improve noisy-channel performance, but at the expense of fewer bits available for source coding.

The main topics of this report are 1) to study the performance gains of exploiting interframe correlation for coding of LPC parameters, and 2) to investigate the performance of memory-based VQ for noisy channels.

The paper is organized as follows. Several of the most commonly used memory-based VQ schemes are described in Section II. In Section III, we calculate some estimates of the achievable gains with interframe coding. The new safety-net technique is thoroughly described in Section IV. Section V investigates how performance can be improved for memory-based VQs when channel noise is present. Simulation results of the various systems under noisy and noiseless conditions are given in Section VI, in terms of objective measures as well as subjective listening tests. Finally, conclusions are given in Section VII.

    II. MEMORY-BASED QUANTIZATION METHODS

A memory-based quantizer is a quantizer that incorporates knowledge of previously quantized vectors when coding the current input vector. The memory in the quantizer makes it possible to exploit memory in the input process, i.e., interframe dependencies. Both scalar and vector quantizers with memory are common in the literature. Here we describe some of the most successful memory-based quantization methods for LSF parameters.

    A. Predictive VQ

A straightforward method of taking advantage of the memory of the source is to utilize (linear) predictive vector quantization (PVQ). PVQ is an extension of standard scalar predictive quantization (DPCM), obtained by replacing the scalar predictor and scalar quantizer by their vector counterparts. PVQ was introduced in [24] and [25], and further developed in, for example, [19] and [26].

A vector linear predictor forms an estimate of the incoming vectors¹ as a linear combination of earlier quantized vectors, and the prediction residual vector is quantized by a vector quantizer. A PVQ encoder and decoder are depicted in Fig. 1. The vector predictor can be written

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \tilde{\mathbf{x}}_{n-i} \qquad (1)$$

where $\hat{\mathbf{x}}_n$ is the one-step-ahead prediction vector, $\tilde{\mathbf{x}}_{n-i}$ are earlier quantized input vectors, and $\mathbf{A}_i$ are the prediction matrices. The optimum values (in a minimum mean square error sense) of the prediction matrices can be found by

¹In the following discussion, we will assume that the incoming vectors have zero mean, and that the vector process is ergodic and wide-sense stationary. The formulas can easily be generalized to vectors with a nonzero mean.


Fig. 2. Encoder (left) and decoder (right) of a finite-state VQ. Which of the K memoryless codebooks is used at a certain coding instant is determined by a next-state function. The input to the next-state function is the last chosen codevector and the previous state.

solving a system of linear matrix equations

$$\sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_{j-i} = \mathbf{R}_j \qquad \text{for } j = 1, \ldots, p \qquad (2)$$

where $\mathbf{R}_k$ are the correlation matrices

$$\mathbf{R}_k = E\left[\mathbf{x}_n \mathbf{x}_{n-k}^T\right]. \qquad (3)$$

For simplicity, the unquantized input process, $\mathbf{x}_n$, is often used to estimate the correlation matrices, instead of $\tilde{\mathbf{x}}_n$, which would be more correct. The solution for a first-order predictor ($p = 1$) is particularly simple. For this case, the optimum prediction matrix can be found by simple matrix inversion and multiplication:

$$\mathbf{A}_1 = \mathbf{R}_1 \mathbf{R}_0^{-1}. \qquad (4)$$

For higher order predictors, a generalized version of the Levinson-Durbin algorithm [27] can be applied. In this work, only first-order prediction has been simulated. As is pointed out in Section III, most of the achievable prediction gain can be realized with a first-order vector predictor.
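To make the first-order solution in (3)-(4) concrete, the following sketch (Python/NumPy) estimates the correlation matrices from a training sequence and solves for the prediction matrix. The synthetic AR(1) vector process and all sizes are illustrative assumptions, not the paper's data; the correlation matrices are estimated from the unquantized process, as the simplification in the text suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 10-dimensional LSF-like process with interframe
# correlation: x_n = 0.9 * x_{n-1} + noise (zero mean).
d, N = 10, 20000
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = 0.9 * x[n - 1] + rng.normal(scale=0.1, size=d)

# Correlation-matrix estimates in the spirit of Eq. (3): R_k = E[x_n x_{n-k}^T].
R0 = (x[1:].T @ x[1:]) / (N - 1)
R1 = (x[1:].T @ x[:-1]) / (N - 1)

# Optimum first-order prediction matrix, Eq. (4): A1 = R1 R0^{-1}.
A1 = R1 @ np.linalg.inv(R0)

# One-step-ahead predictions and residuals (open-loop: unquantized history).
pred = x[:-1] @ A1.T
resid = x[1:] - pred

# Prediction should shrink the total variance substantially.
gain_db = 10 * np.log10(np.trace(R0) / np.trace(resid.T @ resid / (N - 1)))
print(f"prediction gain: {gain_db:.1f} dB")
```

In a full PVQ, the residual `resid` would then be fed to the vector quantizer, and the prediction would use quantized history as in Fig. 1.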

The correlation matrices are usually estimated from a training database, for example the same database that is later used to train the vector quantizer in the PVQ. The simplest method, and the one used in this paper, is the autocorrelation method, where the correlation matrices are estimated as

$$\hat{\mathbf{R}}_k = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_{n-k}^T. \qquad (5)$$

Values of $\mathbf{x}_n$ outside the observable window are assigned the value zero. In [28], the autocorrelation method and the covariance method for estimating correlation matrices in a PVQ system are treated in more detail. After determining the prediction matrices, the VQ is trained, either by an open-loop or a closed-loop procedure. In the open-loop approach, the predictor is designed first, without taking the VQ into account. Then the VQ is separately trained on the resulting prediction errors.

In the closed-loop approach, the predictor and the VQ are first designed from the database, as in the open-loop approach. Then, the PVQ system with the current VQ is used to generate a new set of vectors for additional training of the VQ. This process is iterated until a stopping criterion is reached. It is also possible to update the predictor coefficients in a closed-loop design process. The closed-loop PVQ design was proposed in [19].

Another version of predictive VQ is the MA-PVQ, where the decoder includes a moving average (MA) filter instead of an autoregressive (AR) filter as in the standard PVQ solution. In most cases, the MA predictor system requires a predictor of higher order to reach the same performance as an AR predictor system. The main advantage of the MA configuration is the finite impulse response of the decoder filter, which leads to limited bit error propagation. In this report, we study other methods to limit the bit error propagation (see Sections IV and V), and we will not discuss the MA predictor further. In [29], a comparison between MA and AR prediction is presented, and it is found that, using the methods described in Sections IV and V, the two prediction paradigms obtain comparable performance. Other reports that study MA prediction include [30] and [31], and the ITU-T 8 kb/s speech coding standard includes a fourth-order MA predictor for LSF quantization [32].

Applications of PVQ to spectrum quantization can be found in [16]-[18]. In [33] and [34], 2-D predictive quantization is proposed, with the predictor utilizing both intraframe and interframe correlation simultaneously. Some studies of nonlinear prediction can also be found, e.g., [35], [36]. A general treatment of the concept of predictive VQ can be found in [37].

    B. Finite-State VQ

Finite-state VQ (FSVQ), first reported on in [38], can be viewed as a collection of memoryless vector quantizers, together with a selection rule that determines which is the current state, cf. Fig. 2. Each state is associated with one of the memoryless VQs. The codebooks of the memoryless VQs are called state codebooks, and the union of them is usually referred to as the super codebook. A next-state function is employed to determine the new encoder state.

An input vector is encoded by searching the codebook corresponding to the current state for the closest codevector. The new encoder state is determined by the previous state and the selected codeword in the state codebook, by use of the next-state function. Only the codeword index has to be transmitted, since the current state is known by the decoder, which uses the same next-state function as the encoder. Note that predictive VQ can be viewed as a special case of FSVQ where the number of states is infinite.
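The encode/decode loop just described can be sketched as follows (Python/NumPy; the random codebooks, the nearest-centroid state classifier, and all sizes are illustrative assumptions). The key property is that the next-state function sees only quantized data, so the decoder can mirror the encoder's state trajectory from the transmitted indices alone.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, cb_size = 2, 4, 8

state_classifier = rng.normal(size=(K, d))          # one centroid per state
state_codebooks = rng.normal(size=(K, cb_size, d))  # K state codebooks

def next_state(codevector):
    # State = nearest classifier centroid to the last quantized vector.
    return int(np.argmin(np.sum((state_classifier - codevector) ** 2, axis=1)))

def encode(vectors, initial_state=0):
    state, indices = initial_state, []
    for v in vectors:
        cb = state_codebooks[state]
        idx = int(np.argmin(np.sum((cb - v) ** 2, axis=1)))
        indices.append(idx)
        state = next_state(cb[idx])   # depends only on the quantized output
    return indices

def decode(indices, initial_state=0):
    state, out = initial_state, []
    for idx in indices:
        v = state_codebooks[state][idx]
        out.append(v)
        state = next_state(v)
    return np.array(out)

x = rng.normal(size=(100, d))
indices = encode(x)
x_hat = decode(indices)
# On a clean channel, encoder and decoder follow identical state sequences.
```

Note that each transmitted index costs only log2(cb_size) bits even though the super codebook holds K * cb_size codevectors; a single bit error, however, can desynchronize the state sequence, which is the error-propagation issue studied later in the paper.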


Several different FSVQ methods have been proposed in the literature. The main differences are the next-state function and the way the codebooks are represented [20], [21]. In this section we describe two methods. The first is the very simple nearest neighbor FSVQ (NN-FSVQ). In Section IV-C it will be shown how performance can be significantly improved for NN-FSVQ by extending the design. The second method is called omniscient labeled-transitions FSVQ (OT-FSVQ) [20], which has been found to yield the best codes of the proposed FSVQ schemes with reasonable complexity in most applications [21].

In the NN-FSVQ approach, it is assumed that successive input vectors are highly correlated and, consequently, successive coded vectors are close to each other. The basic idea of NN-FSVQ is to design a very large memoryless (super-) codebook, but only use a subset of the codevectors at every coding instant. The smaller set of codevectors, chosen as the nearest neighbor codevectors to the last chosen codevector in the super codebook, constitutes the state codebook.
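The state-codebook construction amounts to a nearest-neighbor search in the super codebook. A minimal sketch (Python/NumPy; the 256-vector super codebook and 16-vector state codebook are assumed sizes for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
super_codebook = rng.normal(size=(256, 10))   # large memoryless codebook
M = 16                                        # state codebook size (4 bits)

def state_codebook(last_codevector):
    # The M nearest neighbors of the previously chosen codevector.
    d2 = np.sum((super_codebook - last_codevector) ** 2, axis=1)
    return super_codebook[np.argsort(d2)[:M]]

last = super_codebook[42]
cb = state_codebook(last)
# The previous codevector is its own nearest neighbor, so it is always
# contained in the new state codebook.
```

Only 4 bits per frame are transmitted, yet the encoder can reach any of the 256 super-codebook vectors over time; derailment occurs precisely when the best super-codebook vector falls outside these M neighbors.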

The gain of such a scheme over memoryless quantization is in general quite small. This is explained by the fact that if the best codevector in the super codebook is not contained in the current state codebook, the best state is lost and may never be recovered. This problem, usually referred to as the derailment problem, is very similar to the slope overload phenomenon well known from scalar delta modulation quantizers. The NN-FSVQ method is, in its most straightforward implementation, not practicable in the presence of channel noise, because the derailment problem then becomes unmanageable.

The omniscient FSVQ technique has shown good performance in many applications, especially for image coding, e.g., [39], but also to some extent for speech coding [21], [40]. There are two possible representations for omniscient FSVQs: labeled-states and labeled-transitions. We will here only discuss the labeled-transitions case, as it has been shown to perform better in applications [21], [40]. The first step in the omniscient FSVQ design is to find a state classifier, for example a memoryless VQ with the same number of codevectors as the desired number of states. The training data is then divided into subsets using the classifier. The training subset for state $k$ consists of all training vectors whose immediate predecessors have been classified to state $k$. The state codebook is then designed by applying a standard VQ training algorithm using the subset of the training data corresponding to state $k$.

The decoder cannot track the omniscient next-state rule defined above, since it depends on the input rather than on the encoded input. However, if the actual input is replaced with the encoded input, we get an approximation of the next-state rule used in the design. Hence, the next state is determined from the encoder output as depicted in Fig. 2, which makes it possible for the encoder and the decoder to be synchronized. The state codebooks can then be fine-tuned by encoding the whole training sequence using the new FSVQ encoder and replacing each codevector with the centroid of the training vectors assigned to it. A closed-loop optimization similar to that described for PVQ can be applied to improve performance.

The omniscient FSVQ technique requires very large databases for training purposes, especially if the number of states is large. Even for the relatively small number of states we have experimented with, the training is very complex and requires a large training database. Another problem is robustness, both against changes in the input signal and against channel errors. In Section VI, results for OT-FSVQ with eight states are reported. It is worth noting that if the number of states is increased, the performance is expected to increase as well. However, the performance improvement is in general small, and is achieved at the expense of increased complexity and storage requirements [40].

    C. Other Memory-Based Quantization Schemes

Although finite-state VQ and predictive VQ are the most commonly treated memory-based VQ methods in the literature, there are also other methods to exploit interframe correlation. Most of these other methods imply an increased coding delay, high complexity, variable bit rate, etc. Variable bit rate and high coding delay are acceptable in certain applications, such as speech storage. However, in other applications, such as speech coding for mobile telephony, it is of great importance to keep the coding delay as low as possible. Variable bit rate requires complex protocols in most channel access schemes, and is hence not possible to use in many applications. In order to keep the cost and power consumption of the hardware (on which the coder is implemented) as low as possible, it is important that the computational complexity is reasonably low. Also, most speech coders operate in real time, limiting the computational delay to one frame. Brief explanations of several methods to exploit interframe redundancy are given below, but no measurements of performance are included in this article.

In matrix quantization, two or more vectors are compiled

into a matrix and are quantized simultaneously. This approach is straightforward and clear, but it has two major disadvantages: 1) the coding delay increases, since two or more vectors are buffered before quantization, and 2) the complexity is often very high. Hence, the usage of matrix quantization is in general limited to very low rate applications. Complexity reductions for matrix quantization have been proposed in, e.g., [41] and [42].

Phamdo and Farvardin [43] propose a scheme called tree-searched VQ with interblock noiseless coding (TSVQ-IBNC) for coding of LSF parameters. This scheme relies on a technique developed by Neuhoff and Moayeri [44]. In TSVQ of a correlated source, it is likely that the codewords of two consecutive vectors share a common part. Therefore, it is possible to transmit only the altered bits of each codeword, together with the length of the common part. This procedure obviously results in a variable-rate scheme.

Another scheme that works with the codewords instead of directly on the vectors is relative index coding (RIC), proposed by Bruhn in [45]. The codewords are sorted according to the distance from the previously selected codevector, with relative index zero denoting the same codevector as the previous one, index one the closest codevector, and so on. The sorted index can then be Huffman coded, resulting in a variable-rate scheme.
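The index reordering at the heart of RIC can be sketched as follows (Python/NumPy; the codebook size and dimension are assumptions, and the entropy-coding stage is omitted). Because encoder and decoder both sort the codebook by distance to the previously selected codevector, the mapping is invertible at the decoder.

```python
import numpy as np

rng = np.random.default_rng(5)
codebook = rng.normal(size=(32, 4))

def relative_index(current_idx, previous_idx):
    # Rank of the current codevector when the codebook is sorted by
    # distance to the previously selected codevector (rank 0 = previous).
    d2 = np.sum((codebook - codebook[previous_idx]) ** 2, axis=1)
    order = np.argsort(d2)
    return int(np.where(order == current_idx)[0][0])

def absolute_index(rel_idx, previous_idx):
    # Decoder side: rebuild the same ordering and invert the mapping.
    d2 = np.sum((codebook - codebook[previous_idx]) ** 2, axis=1)
    return int(np.argsort(d2)[rel_idx])

prev, cur = 7, 19
r = relative_index(cur, prev)
# absolute_index(r, prev) recovers cur, so the scheme is lossless on the
# index stream; only the (skewed) relative indices need entropy coding.
```

With strong interframe correlation the relative index is small most of the time, which is exactly the skewed distribution a Huffman code exploits.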


    Fig. 3. Left: Histogram of LSF parameters 1 to 10. Right: Scatter plot showing the distribution of LSF 1 and 2.

In [6], Farvardin and Laroia propose the use of the discrete cosine transform to decorrelate consecutive LSF vectors. This scheme requires an increased coding delay to obtain acceptable performance.

Interpolation of LSF parameters also relies on interframe correlation. In [46], four out of eight frames are selected for transmission, and the spectra of the remaining frames are derived by interpolation. The coding delay is eight frames, which is far from acceptable in low-delay applications.

Codebook adaptation is another popular procedure. Xydeas and So [47] first search a fixed VQ for the best index, then try to encode the index by use of a long-history quantization codebook, which is updated to contain the most common indices. In [48], the first codebook in a two-stage VQ is adapted by a deletion and partition operation.

    III. ESTIMATES OF INTERFRAME CODING PERFORMANCE

In this section, we try to estimate the theoretically achievable gains if interframe dependencies of the LSF vector process are exploited. Rate distortion theory [49], [50] can be of great help when the performance of a coding scheme is to be estimated. The rate distortion function (RDF), $R(D)$, gives a lower bound for the required rate $R$ (number of bits per parameter) in coding a stochastic process at a desired distortion $D$ (commonly a quadratic distortion measure). By computing the RDF for a memoryless coding scheme and for a scheme where interframe correlation is exploited, both at the same distortion, we can obtain an estimate of the achievable gains with interframe coding. The RDF is fully determined by the probability density function (pdf) of the actual process. However, the pdf of the LSF vector process is not trivial to estimate, and even if a good estimate of the pdf exists, the corresponding rate distortion function is difficult to compute. Fortunately, there are some cases where the RDF is simple to compute. In this section, we compute two estimates of the RDF, based on different assumptions. In Section III-A, we calculate the entropy of the index source generated by an LSF VQ, and in Section III-B we make the assumption that the distribution of the LSF parameter vector is jointly Gaussian.

    A. Approximation 1: Entropy Measurements

In this section, we estimate the RDF of the LSF process by entropy measurements.

First we design a vector quantizer for the LSF source using the algorithm in [51]. This VQ encodes the LSF source with a certain distortion $D$, producing a stream of indices $I_n$. Assuming that this index source is memoryless (or, alternatively, refraining from exploiting the memory in the process), we can find a lower bound for the required number of bits to transmit the VQ indices by computing the entropy of this source,

$$H(I_n) = -\sum_{i=1}^{M} P(I_n = i) \log_2 P(I_n = i) \qquad (6)$$

where $M$ is the number of vectors in the VQ. The index source can be transmitted at a rate arbitrarily close to the entropy by use of a noiseless coding scheme such as Huffman coding, applied to long sequences of indices. This procedure is impractical, due to the extra delay introduced. Therefore, these results shall be considered as performance bounds, and not as recipes on how to encode the VQ indices.

To estimate the required rate when knowledge of the previous indices is exploited, we compare the entropy above with the conditional entropy, computed as

$$H(I_n \mid \mathcal{H}_n) = -E\left[\log_2 P(I_n \mid \mathcal{H}_n)\right] \qquad (7)$$

where $\mathcal{H}_n$ is the history of the source, $\mathcal{H}_n = \{I_{n-1}, I_{n-2}, \ldots\}$. For reasons of simplicity, we approximate the LSF process as a first-order Markov process, with

$$H(I_n \mid \mathcal{H}_n) = H(I_n \mid I_{n-1}). \qquad (8)$$

We write

$$H(I_n \mid I_{n-1}) = -\sum_{i=1}^{M} \sum_{j=1}^{M} P(I_n = j, I_{n-1} = i) \log_2 P(I_n = j \mid I_{n-1} = i). \qquad (9)$$


The mutual information, defined as the difference $I(I_n; I_{n-1}) = H(I_n) - H(I_n \mid I_{n-1})$, constitutes an estimate of the performance gain if knowledge of previously encoded vectors is fully exploited. We note that the entropies above are straightforward to determine once the probabilities have been estimated, as shown in Section III-C.
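In practice, the estimates in (6)-(9) amount to counting index frequencies and consecutive-pair frequencies. A sketch (Python/NumPy; the synthetic "sticky" index stream is an assumption standing in for a real VQ index sequence):

```python
import numpy as np

def entropies(indices, num_codevectors):
    idx = np.asarray(indices)
    # Marginal probabilities P(i) and entropy H(I_n), Eq. (6).
    p = np.bincount(idx, minlength=num_codevectors) / len(idx)
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # Joint probabilities P(I_n = j, I_{n-1} = i) from consecutive pairs.
    joint = np.zeros((num_codevectors, num_codevectors))
    for a, b in zip(idx[:-1], idx[1:]):
        joint[a, b] += 1
    joint /= len(idx) - 1
    # H(I_n | I_{n-1}) = H(I_n, I_{n-1}) - H(I_{n-1}), Eqs. (8)-(9).
    Hjoint = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))
    Hcond = Hjoint - H
    return H, Hcond, H - Hcond   # marginal, conditional, mutual information

# A strongly repetitive 3-bit index stream: large gain from conditioning.
rng = np.random.default_rng(3)
stream, state = [], 0
for _ in range(50000):
    if rng.random() >= 0.9:          # jump to a random index 10% of the time
        state = int(rng.integers(8))
    stream.append(state)
H, Hcond, MI = entropies(stream, 8)
```

For such a stream, H is close to the full 3 bits while the conditional entropy is far smaller; the difference MI is the bit saving one could hope for from first-order conditioning, mirroring the role of Table I in the text.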

    B. Approximation 2: Gaussian pdf

If the probability density function of the LSF vectors is assumed to be jointly Gaussian, there is a simple way to estimate the gains if interframe correlation is exploited. For Gaussian continuous-valued processes, rate distortion theory is well developed, and simple formulas exist. However, a real LSF process is not Gaussian, partly due to the ordering property of LSF parameters. In Fig. 3, histograms and scatter plots of the LSF parameters are plotted. We conclude that the one-dimensional marginal distributions of the LSFs are well approximated with one-dimensional (1-D) Gaussian pdfs, but we also note that a 2-D scatter plot of LSF 1 and 2 does not seem Gaussian at all. Still, we think that valuable insights into the LSF vector process can be gained from the discussion below.

The rate distortion function, $R(D)$, for jointly Gaussian pdfs is given parametrically in the form [49]

$$D(\theta) = \frac{1}{d} \sum_{k=1}^{d} \min(\theta, \lambda_k), \qquad R(\theta) = \frac{1}{d} \sum_{k=1}^{d} \max\left(0, \frac{1}{2} \log_2 \frac{\lambda_k}{\theta}\right) \qquad (10)$$

where $d$ is the dimension of the vectors, and $\lambda_k$ are the eigenvalues of the covariance matrix of the vector process. For high rates, the distortion rate function can be simplified to

$$D(R) = \left(\prod_{k=1}^{d} \lambda_k\right)^{1/d} 2^{-2R}. \qquad (11)$$

Encoding of a Gaussian pdf requires a higher rate than other pdfs to achieve a given distortion. This means that the Gaussian RDF can serve as an upper bound for any non-Gaussian pdf. The rate distortion function is fully determined by the eigenvalues of the covariance matrix $\mathbf{R}_0$ [defined in (3)].

Now we want to compute the RDF for a system with memory. For jointly Gaussian vectors, the optimum minimum mean square error one-step-ahead prediction is a linear combination of the previous vectors (see, e.g., [37])

$$\hat{\mathbf{x}}_n = \sum_{i=1}^{p} \mathbf{A}_i \mathbf{x}_{n-i}. \qquad (12)$$

The prediction error vector process, $\mathbf{e}_n = \mathbf{x}_n - \hat{\mathbf{x}}_n$, is Gaussian as well. The covariance matrix for the error process of a $p$th-order minimum error variance predictor is given by

$$\mathbf{R}_e = \mathbf{R}_0 - \sum_{i=1}^{p} \mathbf{A}_i \mathbf{R}_i^T \qquad (13)$$

TABLE I. Estimated bit saving, entropy measurements.

TABLE II. Estimated bit saving, Gaussian approximation.

where $\mathbf{A}_i$ are the optimum prediction matrices and $\mathbf{R}_i$ the correlation matrices, defined in Section II-A. From the covariance matrix we can compute the eigenvalues and the RDF for the prediction error. If we compute the RDF both for the original LSF process and for the prediction error process at the same distortion $D$, we get an estimate of the achievable interframe coding gain, measured in bits/vector. The results of such RDF measurements are presented in the next section.
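Combining (4), (11), and (13), the high-rate bit saving per vector comes out as $\frac{1}{2}\log_2(\det \mathbf{R}_0 / \det \mathbf{R}_e)$, since at equal distortion the required rate is governed by the geometric mean of the eigenvalues. A sketch of this computation (Python/NumPy; the synthetic correlated vector process is an assumption standing in for the LSF data):

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 10, 50000

# Synthetic correlated vector process (stand-in for the LSF process).
x = np.zeros((N, d))
for n in range(1, N):
    x[n] = 0.85 * x[n - 1] + rng.normal(scale=0.1, size=d)

R0 = (x.T @ x) / N
R1 = (x[1:].T @ x[:-1]) / (N - 1)
A1 = R1 @ np.linalg.inv(R0)          # Eq. (4)
Re = R0 - A1 @ R1.T                  # Eq. (13), first-order case

# High-rate Gaussian estimate: bits saved per vector at equal distortion.
sign0, logdet0 = np.linalg.slogdet(R0)
signe, logdete = np.linalg.slogdet(Re)
bits_saved = 0.5 * (logdet0 - logdete) / np.log(2)
print(f"estimated interframe gain: {bits_saved:.1f} bits/vector")
```

Using `slogdet` instead of `det` avoids under/overflow when the eigenvalues are small, which is exactly the regime of a well-predicted error process.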

    C. Interframe Gain Computation

    The database for computing the entropies and rate distortion

    functions is the same as the one described in Section VI-A,

    consisting of almost two hours of speech recorded from FM

    radio. The frame size is 20 ms, with a 2.5 ms overlap on both

    sides. The ten-dimensional LSF vectors are split into three

vectors, with 3, 3, and 4 LSF parameters, respectively.

First we trained a three-split VQ for the LSF process and computed the entropy H(I_n) and the conditional entropy H(I_n | I_{n−1}) for the stream of indices. It can be shown (the proof is simple but beyond the scope of this study) that the larger the size of the VQ, the higher the gains that can be expected. However, for a b-bit VQ, 2^{2b} probabilities must be estimated, and our database limits b to seven or less in order to get accurate estimates of H(I_n) and H(I_n | I_{n−1}). Therefore we have computed the entropies for a three-split VQ with 7 bits in each split, even though 8 bits would have been more appropriate to estimate correlation gains relative to a realistic 24-bit memoryless VQ. The entropies and conditional entropies of the three VQs are given in Table I. For this experiment, the results indicate that a total gain of 5.6 bits for three-split interframe encoding of the LSF vector process can be expected.
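The entropy comparison above can be sketched as follows, estimating H(I_n) and H(I_n | I_{n−1}) from relative frequencies of single indices and of consecutive index pairs (a simplified illustration; the paper's measurement applies this per split):

```python
import numpy as np

def entropy_and_conditional(indices, num_codewords):
    """Estimate H(I_n) and H(I_n | I_{n-1}) in bits from a stream of
    quantizer indices; their difference is the expected bit saving."""
    idx = np.asarray(indices)
    p = np.bincount(idx, minlength=num_codewords) / len(idx)
    h = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # Joint relative frequencies of consecutive index pairs.
    joint = np.zeros((num_codewords, num_codewords))
    for a, b in zip(idx[:-1], idx[1:]):
        joint[a, b] += 1.0
    joint /= joint.sum()
    h_joint = -np.sum(joint[joint > 0] * np.log2(joint[joint > 0]))
    # H(I_n | I_{n-1}) = H(I_{n-1}, I_n) - H(I_{n-1}).
    return h, h_joint - h
```

For an i.i.d. index stream the two entropies coincide; for a strongly correlated stream the conditional entropy is much smaller.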

In the second experiment, we computed the RDF (with a Gaussian assumption) for three-split LSF vectors, and for a three-split of the first-order prediction error vectors. The RDF was computed at a distortion of (25 Hz)² per LSF (a standard deviation of 25 Hz per LSF), which is close to the distortion experimentally found for a 24-bit LSF VQ. The corresponding rates for the LSF vectors are given in Table II, together with the rates for the prediction error vectors. As


    ERIKSSON et al.: INTERFRAME LSF QUANTIZATION 501

Fig. 4. Safety-net principle: Combine a memory-based VQ with a fixed memoryless VQ (the safety-net VQ).

can be seen in this table, we can expect to gain a total of 6.2 bits if first-order interframe coding is employed. If we use a second-order predictor instead, the computations show that we can expect to gain an extra 0.2-0.3 bits, making a total gain of 6.4-6.5 bits compared to a standard memoryless LSF quantizer. If the predictor order is further increased, only very little can be gained.2 We conclude that a first-order predictor achieves most of the gain, which is also confirmed by experiments and other reports [17], [22].

These two gain estimates indicate that 5-6 bits can be saved by exploiting interframe correlation in a three-split structure. The error in the above estimates of the achievable interframe coding gain comes partly from the fact that the distortion measure we seek to minimize in LSF quantization is not the quadratic distance between the original and encoded LSF vectors, but rather the spectral distance (SD). The necessary approximations also lead to errors. However, we think that the experiments in this section give a hint of the achievable gains of interframe LSF coding.

    IV. SAFETY-NET VQ

In this section, we propose an extension of existing memory-based VQ systems with a fixed memoryless VQ, hereby denoted safety-net VQ. The safety-net concept was introduced in [52], and has also been reported in [53] and [54]. Similar systems have also been studied in, for example, [10], [17], [18], [55], and [56]. In this paper we further develop these ideas, and study the performance for transmission over noisy channels.

    The main principle of the safety-net extension is illustrated

    in Fig. 4. A memory-based VQ is combined with a fixed

memoryless VQ that operates independently of the memory-based VQ. At each coding instant both codebooks are searched

    for the best codevector.

By using this arrangement, we aim to achieve three objectives.

• To encode outliers, i.e., low-correlation frames, separately from the typical high-correlation frames. Many memory-based VQ systems show good performance for highly correlated input vectors, but perform worse than memoryless systems for the occasional low-correlation frames. This results in low average distortion, but the number of high-distortion frames increases. In encoding of, for example, spectrum coefficients, this is a serious problem, since there is a significant perceptual importance of keeping the number of high-distortion frames low. This fact is emphasized in several studies [13], [57]. By adding a fixed memoryless codebook to the memory-based VQ system, the low-correlation frames are encoded in a standard memoryless VQ, and a lower number of high-distortion frames can be expected.

2 Note that there might be considerable long-time dependencies in the LSF vectors, since the speaker can be expected to repeat phonemes at irregular intervals. However, these dependencies are difficult to exploit by linear prediction.

• Since outliers are separately encoded in the safety-net VQ, the memory-based VQ can focus on the highly correlated frames. A standard memory-based VQ encodes frames with both high and low interframe correlation in the same quantizer. The VQ must be designed to handle both these cases, and the high interframe correlation in the typical frames cannot be fully exploited; some of the potential performance gain of exploiting interframe correlation in the memory-based VQ is lost due to the need to compromise. The addition of a fixed memoryless codebook that encodes outliers separately enables the memory-based VQ to exploit interframe correlation to a higher degree, and a lower average distortion should result.

• A serious objection to memory-based VQ systems is the performance when the index must be transmitted over a noisy channel, which is often the case in realistic systems. An error in a memory-based VQ transmission leads to error propagation, i.e., to a sequence of frames where the internal state of the encoder and the decoder differ, and thus a sequence of data with large errors is produced. Most systems with memory "forget" the bit error reasonably fast, but error propagation is nevertheless a serious problem in memory-based VQs. By including a fixed memoryless codebook, error propagation is canceled every time an entry from the fixed codebook is selected and correctly transmitted to the decoder. The improvement in performance over noisy channels is perhaps the strongest reason for extending the design with a memoryless codebook. In Section V this subject is studied in more detail.

The combination of the two VQs can be described as

    C = C_f ∪ C_a   (14)

where a fixed memoryless codebook C_f is combined with an adaptive memory-based codebook C_a, resulting in the extended codebook C. The search process is performed by first searching the adaptive codebook for the best vector, then searching the fixed codebook for the best fixed vector. The winning candidates from the two codebooks are compared, and the best of these two vectors,3 denoted x̂, is encoded and transmitted to the decoder:

    x̂ = arg min_{c ∈ C} d(x, c).   (15)

3 The distortion criterion we have used to find the two candidate vectors is the weighted minimum squared error criterion (see Section VI-A), mainly due to the comparably low complexity. When the best of the two candidates shall be chosen, more complex criteria can be considered since only two vectors shall be compared; here we have used the spectral distance measure (Section VI-A).
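One coding instant of the scheme in Fig. 4 might look as follows. This sketch uses plain squared error for both the candidate searches and the final comparison, whereas the paper uses a weighted error for the search and the spectral distance for the final choice; all names are illustrative:

```python
import numpy as np

def safety_net_encode(x, fixed_cb, adaptive_cb, prediction):
    """One coding instant of a safety-net VQ: the memory-based candidate
    is prediction + best error codevector, the safety-net candidate comes
    from the fixed memoryless codebook; the better of the two is selected
    and flagged with one extra bit."""
    # Best entry of the adaptive (memory-based) codebook.
    cand_a = prediction + adaptive_cb
    i_a = int(np.argmin(np.sum((cand_a - x) ** 2, axis=1)))
    # Best entry of the fixed memoryless (safety-net) codebook.
    i_f = int(np.argmin(np.sum((fixed_cb - x) ** 2, axis=1)))
    d_a = np.sum((cand_a[i_a] - x) ** 2)
    d_f = np.sum((fixed_cb[i_f] - x) ** 2)
    if d_f <= d_a:
        # Choosing the fixed codebook resets any error propagation
        # once the index is correctly received.
        return 'fixed', i_f, fixed_cb[i_f]
    return 'adaptive', i_a, cand_a[i_a]
```

The returned codevector also updates the encoder state (e.g., the predictor memory in SN-PVQ), keeping encoder and decoder synchronized on error-free transmission.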



Fig. 6. Principle of a 2-D DCVQ. The nearest neighbors to the previous codevector together with the fixed codebook form the combined codebook.

    number of outliers. PVQ design procedures are described in

    Section II-A.

    C. Safety-Net FSVQ

The second safety-net method is a combination of the simple nearest neighbor FSVQ technique, described in Section II-B, and a safety-net VQ. It was first presented in [52] and will be referred to as dynamic combination VQ (DCVQ). Derailment occurs for NN-FSVQ when the input vector has low correlation with the previous vector, and thus no good representation of the input vector exists in the NN-FSVQ codebook. Since the problem with outlying vectors propagates to the next frame, due to the memory in the quantizer, the result is a sequence of inadequately quantized vectors. By introducing a safety-net to take care of outliers, the DCVQ solves the derailment problem. Hence, performance for transmission over noisy channels is also significantly improved. The major disadvantage of the DCVQ technique is the same as one of the major problems with the NN-FSVQ: the storage requirements are large. An illustration of the combination of an NN-FSVQ and a fixed memoryless quantizer is given in Fig. 6.

    The design of the DCVQ is simple, as described earlier:

    The NN-FSVQ is trained using the full training database, and

    a nearest neighbor table is stored, as described in Section II-B.

    The safety-net VQ is also trained using the full training

    database. Closed-loop training procedures can be applied for

    this case as well, but in general the improvement is negligible.

    V. MEMORY-BASED VQ ON NOISY CHANNEL

For the case of a memoryless VQ, the effect of channel noise is straightforward. An error in the transmission of a codeword index only affects the distortion of the current vector, since no memory is incorporated. Systems with memory are affected

    differently by channel errors than memoryless systems because

    the memory in the decoder causes error propagation. The

    effects of error propagation can be very serious in some

systems, if precautions are not taken. In, for example, a nearest neighbor FSVQ, a bit error could cause the system to derail and

    never recover. In other systems, error propagation causes long

sequences of highly distorted vectors. One way to decrease the effect of channel errors for memory-based VQ schemes is to periodically perform a full search that forces the coder into the best possible state, which is then transmitted to the decoder. This should be done quite infrequently, as the cost of sending the extra information is high. This method is not suitable for PVQ systems, because of the infinite number of states.

    In this section, we study performance of memory-based VQ

    systems operating on noisy channels, and try to decrease the

effects of error propagation. Other work that treats memory-based LSF quantization in the presence of channel noise includes [58] and [40], while, for example, [59] and [13] investigate noisy channel performance of memoryless LSF quantization.

    A. Optimization of Index Assignment for Memory-Based VQ

Index assignment is the procedure of numbering the vectors in a vector quantizer (assigning indices to the vectors). Noisy channel performance of vector quantizers having random index assignments is in general poor. In order to minimize the effect of channel errors on the output signal, the codebook should be reordered such that the Hamming distance (assuming a binary channel) between any two codevector indices corresponds closely to the Euclidean distance between the corresponding codevectors. For this ordering problem, it is hard to find optimal solutions. A number of suboptimal algorithms have been proposed [60]-[63]. We have applied a fast and reliable method denoted the linearity increasing swap algorithm (LISA), described in [63]. The choice of LISA for index assignment is justified by its superior speed compared to other methods (10-bit VQs are processed in seconds).

In this study, procedures to improve the index assignment are applied to all vector quantizers. The VQ schemes in this comparison benefit from improved index assignment to various degrees. The gains are larger for methods with memory, because the effect of error propagation is reduced. It is not obvious how to apply such an algorithm to all of the coding schemes; therefore we will briefly describe how it has been done.
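As an illustration of the reordering goal (not the published LISA algorithm), a brute-force greedy swap search that minimizes the expected squared codevector jump under single bit errors:

```python
import numpy as np
from itertools import combinations

def ia_cost(codebook, perm, bits):
    """Total squared jump caused by single bit errors when codevector k
    is sent with binary index perm[k]."""
    inv = np.empty(2 ** bits, dtype=int)
    inv[perm] = np.arange(2 ** bits)           # index -> codevector
    cost = 0.0
    for i in range(2 ** bits):
        for b in range(bits):
            j = i ^ (1 << b)                   # single-bit neighbor index
            cost += np.sum((codebook[inv[i]] - codebook[inv[j]]) ** 2)
    return float(cost)

def improve_index_assignment(codebook, bits, sweeps=20):
    """Greedy pairwise-swap search, in the spirit of swap algorithms such
    as LISA (the cost is recomputed from scratch, so this is only a toy)."""
    perm = np.arange(2 ** bits)
    best = ia_cost(codebook, perm, bits)
    for _ in range(sweeps):
        improved = False
        for a, b in combinations(range(2 ** bits), 2):
            perm[a], perm[b] = perm[b], perm[a]
            c = ia_cost(codebook, perm, bits)
            if c < best:
                best, improved = c, True
            else:
                perm[a], perm[b] = perm[b], perm[a]
        if not improved:
            break
    return perm, best
```

For a scalar 2-bit codebook with scrambled values, the search recovers an assignment in which single bit errors map to nearby codevectors.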

1) Index Assignment for PVQ: The possible reconstruction vectors at time n are p_n + c_i, where p_n is the prediction from previously coded vectors, c_i is the codeword in the prediction error codebook, and i is the codeword index. Evidently, the distance between reconstruction vectors p_n + c_i and p_n + c_j is the same as the distance between the corresponding vectors c_i and c_j in the codebook. Thus an index assignment algorithm operating on the final reconstruction vectors is identical to one operating on the codebook (i.e., not changing with time). Consequently, for PVQ we can simply apply an algorithm that improves the index assignment of the prediction error codebook directly.

2) Index Assignment for FSVQ: For each state we have assigned one codebook, and hence we can optimize the index assignment for each of the state codebooks independently. Still

    there is a problem when, due to channel errors, the encoder

    and the decoder do not agree on the current state. In this

  • 8/3/2019 Inter Frame Lsf

    10/15

    504 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999

    case, another index assignment that takes into account that

    the state is not correct would be optimal, but not practically

    implementable. Thus, applying the index assignment algorithm

    independently to each state codebook for FSVQ will result in

    less frequent erroneous state decisions, but cannot improve

    performance if they do occur.

3) Index Assignment for Safety-Net Methods: This problem is more complicated because of the existence of two codebooks. If, for each coding instant, the adaptive and the memoryless codebooks are combined into one, and the optimization is carried out for the combined codebook, the best possible index assignment is achieved. This is only feasible if all possible codebook combinations are known in the design procedure. Otherwise, a new index assignment must be found at every coding instant. For DCVQ all combinations of the codebooks are known beforehand, which means that, at least theoretically, it is possible to find an optimal index assignment. Because of the increased complexity and storage requirements that result, we have not implemented this strategy. Instead, we have applied the index assignment algorithm independently to the memoryless VQ and to the adaptive VQ. For SN-PVQ we have also applied the index assignment algorithm to the two codebooks independently, but we have additionally improved the index assignment with the help of a simple algorithm presented in [22]. In short, a few different index assignments for the adaptive VQ are precomputed, and which of these to use is determined by classification of the current prediction vector. The safety-net VQ still uses an independently optimized index assignment.

    B. Channel Optimization for Memory-Based VQ

Index assignment does not take into account any explicit knowledge about the channel error probability. If knowledge about the channel can be incorporated in the design, performance can be significantly improved. This is usually referred to as channel optimized VQ (COVQ) [60]. A disadvantage is that the performance degrades if the actual channel differs from the design channel, or is changing with time. As is the case in the index assignment design, a simultaneously optimized COVQ requires that all combinations of the codebooks are

    known already in the design procedure for the safety-net

    extended systems. Hence, only independent COVQ designs for

    the adaptive VQ and the memoryless VQ are feasible. In this

    paper, we have not experimentally evaluated this method of

    improving noisy channel performance, but in [40] it is shown

    how COVQ can improve performance for omniscient FSVQ,and in [64], COVQ is employed to improve PVQ and SN-

    PVQ. In [65], a design method that simultaneously trains the

    codebook and the predictor for noisy channel PVQ is proposed.

    C. Reducing Error Propagation

    The fact that codevectors from the memoryless VQ are

    frequently chosen implies that the error propagation is much

    less prominent in a memory-based VQ scheme that includes

    a safety-net than in one without a safety-net. The reason is

    that if a codevector is chosen from the memoryless codebook

    and the corresponding index is correctly conveyed over the

    channel, the decoder will be forced into the same state as

    the encoder. Consequently, it is desirable to increase the

    number of times the encoder chooses a codevector from the

    memoryless codebook as much as possible if the channel is

    noisy, without increasing the total distortion noticeably. One

way to accomplish this is to study the relative number of codevectors in the memoryless and the adaptive codebook. If the codebook sizes N_f and N_a in (16) are equal, one bit is used to distinguish which codebook the current vector originates from. If we want to increase the usage of the memoryless VQ we can simply increase the size of the memoryless codebook while at the same time decreasing the size of the adaptive codebook. However, due to the indexing problems that arise when the codebook size is not a power of two, we have chosen not to investigate other choices than equal sizes N_f and N_a.

Another way to increase usage of the memoryless VQ is to bias the selection process to favor the safety-net vectors. The bias can be a constant, or it can be a function of the number of transmitted vectors since the last time a safety-net vector was selected. With this method, the attractive limited error propagation feature of moving-average prediction can be mimicked, by forcing the encoder to select a memoryless vector after a predetermined number of vectors from the memory-based VQ. This bias should be chosen depending on the actual channel statistics. However, in this work we use a constant bias of 0.15 dB (in SD), which is a heuristic compromise for the range of error probabilities that was used in the noisy channel experiments. Additional experiments on biased decisions can be found in [22].
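The biased decision can be sketched as a simple rule. The 0.15 dB constant comes from the text, while the optional forced reset that mimics moving-average prediction is a hypothetical parameter:

```python
def choose_with_bias(d_adaptive_db, d_fixed_db, frames_since_reset,
                     base_bias_db=0.15, forced_after=None):
    """Selection rule favoring the safety-net codebook: the memoryless
    candidate wins whenever it is within base_bias_db dB (in SD) of the
    adaptive candidate, and can be forced after a fixed number of
    memory-based frames to bound error propagation."""
    if forced_after is not None and frames_since_reset >= forced_after:
        return 'fixed'
    if d_fixed_db <= d_adaptive_db + base_bias_db:
        return 'fixed'
    return 'adaptive'
```

With a noisier channel, a larger bias (or a smaller forced-reset interval) trades a little noiseless distortion for faster recovery after bit errors.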

    D. Optimization of the Prediction Matrices for PVQ Systems

The performance of LSF quantization in a PVQ system deteriorates much faster with increasing channel noise than the performance of memoryless LSF VQ. This fact motivated us to improve the PVQ system for use over noisy channels. In Fig. 1, a PVQ system is depicted. The VQ and the channel in the system are modeled as white Gaussian noise sources; see Fig. 7.

If we try to optimize the prediction matrices of the system in order to minimize the effects of quantization and channel noise, we find that this problem is hard to treat mathematically. However, for noiseless channels we have experimentally found that the optimum predictor matrices are close to diagonal. Thus the vector predictor can be approximated as a set of scalar predictors. Some of the approximations below can be avoided by orthogonalizing the input vectors before the analysis, by use of the Karhunen-Loève transform (see, e.g., [66]). However, we find the approximations reasonable. For example, by excluding all components of the prediction matrices outside the diagonals, we have found that only about 0.15 bits is lost for the full LSF vector. Therefore, we proceed by analyzing the problem as a set of independent scalar problems.

Finding the equations for the optimum prediction coefficients for a set of independent problems is comparably simple. For noiseless channels, the result that the reconstruction error equals the quantization noise,

    x̃_n − x_n = q_n   (17)



    Fig. 7. Model of VQ and channel in a PVQ system as Gaussian noise sources.

is obvious from Fig. 1, but worthwhile to emphasize. For noisy channels, a term h_n * c_n is added to (17), where h_n is the vector impulse response of the decoder filter, c_n is the additive channel noise of Fig. 7, and * denotes convolution. Since we have a set of approximately independent problems, we can consider the components of the vectors one at a time. From (17) with the added channel noise term, we write the error in a component as

    E = σ_q² + g σ_c²   (18)

where g is the power transfer factor for a component of the decoder filter, i.e., the factor by which the power of a white noise input signal is amplified by the filter. The quantizer and channel error variances are assumed to be proportional to the prediction error variance σ_e²,

    σ_q² = ε_q² σ_e²  and  σ_c² = ε_c² σ_e²   (19)

and we rewrite (18) as

    E = (ε_q² + g ε_c²) σ_e².   (20)

This result is also derived in [66]. Now we want to express the power transfer factor, g, and the prediction error variance, σ_e², as functions of the coefficients of the input process and the prediction filter. For the sake of simplicity, we restrict the calculations to first- and second-order AR processes, generated by

    x_n = a_1 x_{n−1} + a_2 x_{n−2} + w_n.   (21)

The linear predictor is written as

    x̂_n = h_1 x_{n−1} + h_2 x_{n−2}.   (22)

After some work, we find expressions for g and σ_e² as functions of a_1, a_2, h_1, and h_2; these are given in (23) and (24).

By inserting (23) and (24) in (20) we obtain an expression for the error variance of the PVQ system in the presence of channel noise. Also, for given values of a_1, a_2, ε_q², and ε_c² (derived from the VQ, the channel, and the input process), we can find the optimum values of h_1 and h_2. Even for this simplified system, an analytic solution is hard to find, but a

Fig. 8. Performance of a 20-bit SN-PVQ as a function of the mix between vectors in the memory-based VQ and the safety-net VQ in terms of average spectral distortion.

    numerical solution is easily obtained. Note that (20) must be

    independently solved for each component in the LSF vector.
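For a unit-variance first-order AR component, this optimization is easy to carry out numerically. The closed-form pieces below (prediction error variance and decoder power transfer factor) follow from standard first-order DPCM analysis and are a sketch, not the paper's exact (23)-(24):

```python
import numpy as np

def total_error(h, a, eps_q, eps_c):
    """Reconstruction error variance of a first-order predictive quantizer
    over a noisy channel, for a unit-variance AR(1) input with coefficient
    a and predictor coefficient h: the prediction error variance is
    sigma_e^2 = 1 + h^2 - 2*a*h, quantization noise eps_q^2 * sigma_e^2
    passes straight through, and channel noise eps_c^2 * sigma_e^2 is
    amplified by the decoder filter 1/(1 - h z^-1), whose power transfer
    factor is g = 1/(1 - h^2); cf. (20)."""
    sigma_e2 = 1.0 + h * h - 2.0 * a * h
    g = 1.0 / (1.0 - h * h)
    return (eps_q ** 2 + g * eps_c ** 2) * sigma_e2

def best_predictor(a, eps_q, eps_c):
    """Grid search for the predictor coefficient minimizing the error."""
    hs = np.linspace(0.0, 0.999, 10000)
    errs = [total_error(h, a, eps_q, eps_c) for h in hs]
    return float(hs[int(np.argmin(errs))])
```

Without channel noise the optimum is h = a; with channel noise the optimum "leaks" toward smaller h, which is the qualitative behavior exploited when the diagonal predictor elements are tuned for high noise levels.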

The result from the above analysis is used to improve the PVQ and SN-PVQ performance for noisy channels in Section VI-B. The diagonal elements of the prediction matrices are optimized for high noise levels, and no optimization for the actual channel noise level is performed. That is, the error probability of the channel is not a design parameter. Even the results for noiseless channels are obtained with the prediction matrices optimized for high noise levels. However, the increase in noiseless spectral distortion when the matrices are optimized for high noise is small, 0.02-0.04 dB, while the gains for severely degraded channels can be several dB.

    In [67], Chang and Donaldson derive formulas for optimum

    predictor coefficients for scalar DPCM systems. Jayant and

Noll [66] give a general overview of the problem of transmitting DPCM over noisy channels, and Noll [68] analyzes the

    noisy channel performance of PCM and DPCM quantizing

    schemes.

    VI. EXPERIMENTS

    In this section, the experiments used to determine optimal

parameters of the memory-based and safety-net methods are

    described. Comparisons of all tested methods are given, both

    for noiseless and noisy channels. To verify the objective

    results, a listening test is presented.

    A. Experimental Setup

    The speech training database used to design all the VQs

    in this work consists of 250 000 vectors. Another set of

  • 8/3/2019 Inter Frame Lsf

    12/15

    506 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 7, NO. 5, SEPTEMBER 1999

20 000 vectors is used for evaluation.4 The speech is recorded from FM radio and includes a large number of speakers of both genders. The language is mostly Swedish. The speech is

    digitized at 16 kHz, lowpass-filtered at 3.4 kHz and decimated

    to 8 kHz sampling frequency. A tenth-order LPC analysis

    using the stabilized covariance method with high-frequency

    compensation and error weighting [13] is performed every

    20 ms using a 25 ms analysis window. A fixed 10 Hz

    bandwidth expansion is applied to each pole of the LPC

    coefficient vector.

One of the key issues in vector quantization is the selection of an appropriate distortion measure for the codebook search. The Euclidean distance measure is often used for its simplicity. Here, we employ the weighted Euclidean distance measure presented in [13], which has been shown to improve both the objective quality (measured in spectral distortion, SD) and the subjective quality of the coded speech. This distance measure has been used in the design and evaluation of all VQ techniques presented in this work, except for the design of the next-state functions in OT-FSVQ and DCVQ, where the unweighted Euclidean measure was employed. For measuring the quantization performance, we calculate the spectral distortion in the 0-3 kHz range.
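The band-limited spectral distortion measure can be sketched as follows (FFT grid size and function name are illustrative):

```python
import numpy as np

def spectral_distortion_db(a1, a2, fs=8000.0, f_lo=0.0, f_hi=3000.0,
                           nfft=512):
    """Root-mean-square log-spectral difference (dB) between two LPC
    synthesis filters 1/A(z), evaluated over a frequency band.
    a1, a2: direct-form coefficients [1, a_1, ..., a_p] of A(z)."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    spec1 = np.abs(np.fft.rfft(a1, nfft))      # |A1(e^jw)| on the grid
    spec2 = np.abs(np.fft.rfft(a2, nfft))
    band = (freqs >= f_lo) & (freqs <= f_hi)
    # 10*log10 |1/A|^2 = -20*log10 |A|, so the difference of log spectra is:
    diff = 20.0 * np.log10(spec2[band]) - 20.0 * np.log10(spec1[band])
    return float(np.sqrt(np.mean(diff ** 2)))
```

Identical filters give 0 dB; a pure gain difference gives a constant offset, e.g., doubling the coefficients yields 20 log10(2) ≈ 6.02 dB.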

Large savings in complexity and storage requirements can be achieved if a product-code technique is employed. In this work we have utilized a three-split VQ scheme for all quantizers, where the dimensions of the splits are 3, 3, and 4, respectively. An important design issue for a split VQ system is the number of bits to allocate to the individual VQs. It is common to distribute the bits evenly over the splits in order to keep the largest codebook as small as possible [13], [40]. However, since the difference in complexity is relatively small, we have here used the bit configurations that result in the best performance in terms of average spectral distortion. A typical example is the 24-bit VQs, where 8 bits were used for the first split, 9 bits for the second, and 7 bits for the last split. All the investigated quantization methods used the same bit allocations.

For the PVQ systems, first-order prediction is used, with the prediction matrix optimized for a high noise level according to (20), (23), and (24). For the OT-FSVQ schemes the number of states is chosen to be eight. We will not report on any results for NN-FSVQ; instead the FSVQ class will be represented by OT-FSVQ, due to its superior performance. In the following, MLVQ denotes memoryless VQ.

    B. Performance for Noiseless Channels

An important aspect of the design of a safety-net extended memory-based VQ system is which codebook sizes should be assigned to the safety-net VQ and the memory-based VQ, respectively. We have investigated the performance of a 20-bit SN-PVQ for five different constellations, and the results of the simulations are depicted in Fig. 8. These results indicate that the best performance is obtained by choosing the safety-net size to be somewhere between 25-50% of the total size. This experiment, together with the discussions in Sections IV-A and V-C, motivates us to use a mix coefficient of 50% in all experiments.

4 In this study, we are mostly interested in the relative performance between different quantization schemes, and the size of the evaluation set is considered sufficient for this purpose. Note, as mentioned in Section I, that the performance in absolute figures may differ if another speech material is used.

Fig. 9. Performance of the VQs in terms of average spectral distortion. The curves correspond to: A) memoryless VQ; B) OT-FSVQ; C) PVQ; D) DCVQ; and E) SN-PVQ.

In Fig. 9, the average SD for the investigated coding schemes is plotted as a function of the number of bits used. For the safety-net configurations, one bit was used to indicate the chosen codebook. From this figure it is clear that all memory-based VQ methods can utilize the interframe correlation and achieve performance significantly better than the memoryless VQ. Among the memory-based methods, the SN-PVQ is clearly the best in these simulations, followed by DCVQ and PVQ, and last OT-FSVQ. When employing the SN-PVQ, the required rate can be reduced by 4-5 bits/vector compared to the memoryless VQ without reduction in performance. It can also be seen that if a memory-based scheme is extended with a safety-net, approximately 1 bit is gained. If we compare the results with what was theoretically predicted in Section III-C, we can conclude that SN-PVQ achieves performance close to the predicted.

Differences in the analysis conditions and databases make it difficult to compare our results to other similar work. However, if the analysis conditions are similar, we can compare the relative improvement of using a memory-based VQ scheme over a memoryless VQ. For OT-FSVQ, we compare our results to the results by Hussain and Farvardin in [40]. They report a performance gain of slightly less than 3 bits for the OT-FSVQ, which is very close to what is obtained in this work. For the case of PVQ it is more difficult to find a comparable investigation. For example, Loo and Chan [35], [36] report a gain of 5-6 bits for PVQ, but for a completely different coding situation than the one in this work.

In Table III, the performance in terms of both average SD and outlier percentage is given for all five investigated coding methods at 24 bits. As expected, the introduction of a safety-net VQ decreases not only the average distortion but also the number of outliers.

    C. Performance for Noisy Channels

In the preceding section, we have verified that a number of memory-based VQ schemes outperform conventional memoryless VQs under noiseless conditions. However, in order to be useful for practical applications, it is essential that the coding scheme can cope with channel noise. Therefore we have performed a study of the behavior under noisy conditions. Here we assume a memoryless binary symmetric channel with bit error probability q. For all vector quantizers, procedures to improve the index assignment are applied, as described in Section V-A.

TABLE III. QUANTIZER PERFORMANCE AT 24 BITS AND BIT ERROR RATE q OF 0% AND 0.5%

Fig. 10. Performance of the VQs at 24 bits in terms of average spectral distortion as a function of bit error rate. The curves correspond to: (a) memoryless VQ, (b) OT-FSVQ, (c) PVQ, (d) DCVQ, and (e) SN-PVQ.

TABLE IV. SD COMPARISON OF SN-PVQ, PVQ, AND MEMORYLESS VQ FOR DIFFERENT BIT ERROR RATES
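The assumed channel model is easy to simulate; a sketch (function name is illustrative):

```python
import numpy as np

def bsc_transmit(index, bits, q, rng):
    """Send a bits-wide codeword index over a memoryless binary symmetric
    channel with bit error probability q; each bit flips independently."""
    flips = rng.random(bits) < q
    error_pattern = int(sum(1 << int(b) for b in np.nonzero(flips)[0]))
    return index ^ error_pattern
```

Running each transmitted index through such a channel, and decoding with the possibly corrupted indices, reproduces the kind of noisy-channel evaluation reported in Fig. 10.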

The performance for all methods under equal noisy conditions at 24 bits in terms of average SD is depicted in Fig. 10. From the curves in Fig. 10 we conclude that SN-PVQ is better than all other methods for all tested error rates. The other memory-based methods only perform better than the memoryless VQ for small error probabilities. OT-FSVQ is the scheme in this investigation that is most sensitive to channel errors. Again we see that the introduction of a safety-net VQ clearly improves performance. In Table III, average SD and outlier percentage are presented for q = 0.5%. Even though low values of average SD are achieved for some of the methods, the number of outliers caused by bit errors is high and hence the distortion is clearly audible.

    Fig. 11. Synthetic speech production for the listening tests.

For high error probabilities, the average SD is higher for all methods than what can be accepted in most applications. However, the results for high error rates can be significantly improved if channel coding is applied; see for instance [13]. We have also found that the gain of using larger codebooks is almost negligible for high error rates. Thus, if more bits can be used, it is more efficient to spend them on channel coding than on increasing the codebook sizes.

Another interesting comparison is given in Table IV. Here we compare a 20-bit SN-PVQ with a 21-bit PVQ and a 24-bit memoryless VQ, all of which perform approximately equally well without noise. The results in Table IV lead to the conclusion that SN-PVQ saves 4 bits compared to a memoryless VQ at all tested error rates. Compared to a PVQ scheme, an improvement of at least 1 bit is achievable. Note that the performance degrades more for the PVQ system than for the other two methods as the bit error rate is increased. Hence, for large error probabilities the PVQ loses more than 1 bit compared to SN-PVQ.

    D. Subjective Evaluation

We have performed listening tests to verify the objective results of the previous subsection. In the test, a 20-bit SN-PVQ was compared to a 24-bit memoryless VQ. The coders were compared both for noiseless conditions and for a bit error rate of 1%.

A diagram of the model for studying the effects of quantization of the LSF parameters is shown in Fig. 11. A prediction residual is formed by filtering the speech signal using an unquantized prediction filter, and synthetic speech is generated by exciting a quantized inverse prediction filter with the undistorted residual. In this way, the effects of LSF quantization can be studied separately from any encoding of the residual.
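The analysis-synthesis arrangement of Fig. 11 can be sketched as follows (a pure-Python illustration with our own function names; with identical analysis and synthesis filters the speech is reconstructed exactly, so any audible difference is due to the quantized filter alone):

```python
def analysis(speech, a):
    """Residual e[n] = sum_k a[k]*s[n-k]: FIR filtering of the
    speech with the unquantized predictor a = [1, a1, ..., ap]."""
    p = len(a) - 1
    e = []
    for n in range(len(speech)):
        acc = 0.0
        for k in range(p + 1):
            acc += a[k] * (speech[n - k] if n - k >= 0 else 0.0)
        e.append(acc)
    return e

def synthesis(residual, a_q):
    """Synthetic speech from the undistorted residual through the
    quantized inverse prediction filter 1/A_q(z):
    s_hat[n] = e[n] - sum_{k>=1} a_q[k]*s_hat[n-k]."""
    p = len(a_q) - 1
    s_hat = []
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, p + 1):
            acc -= a_q[k] * (s_hat[n - k] if n - k >= 0 else 0.0)
        s_hat.append(acc)
    return s_hat
```

With `a_q` equal to `a`, `synthesis(analysis(speech, a), a)` returns the original speech; substituting the LSF-quantized predictor for `a_q` isolates the quantization effect.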

Twelve short Swedish sentences uttered by male and female speakers were encoded by the memoryless VQ and the SN-PVQ, with and without channel noise. The sentences were compared pairwise, including some comparisons with the uncoded original sentences. Twelve test persons listened with headphones to each pair (a total of 60 pairs) and were asked to indicate a preference for either the first or the second sentence.

The listening tests revealed that for a noiseless channel, the 20-bit SN-PVQ was preferred to the 24-bit memoryless VQ in 58% of the comparisons. For a channel with 1% bit errors, the result is very clear: the 20-bit SN-PVQ was preferred to the 24-bit memoryless VQ in 78% of the comparisons. Statistical tests verified that at a confidence level of 95%, the SN-PVQ is preferred to the memoryless VQ, in both the noisy and the noiseless case.
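Such a significance statement can be checked with a one-sided binomial (sign) test against the null hypothesis of no preference. The exact per-condition comparison counts are not broken out above, so the counts in the example are illustrative only:

```python
from math import comb

def binomial_p_value(successes, trials, p0=0.5):
    """One-sided binomial test: probability of observing at least
    `successes` preferences out of `trials` judgments if listeners
    had no preference (p0 = 0.5)."""
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(successes, trials + 1))

# Illustrative: a 78% preference rate over 60 judgments is far
# below the 5% significance level (47/60 ~ 78%).
assert binomial_p_value(47, 60) < 0.05
```

The larger the number of judgments behind a given preference rate, the smaller the p-value, which is why pooling the listeners' votes strengthens the conclusion.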

The outcome of the listening tests shows that the objective performance advantage of the SN-PVQ over a standard memoryless VQ also holds in subjective tests.


    VII. CONCLUSIONS

The most important results of the experiments can be summarized in the following points.

• A memory-based LSF quantizer has an advantage of 3–5 bits over memoryless VQ for error-free transmission. The SN-PVQ method is the best in this work, with an advantage of 4–5 bits.

• A safety-net extension of an existing memory-based VQ can improve the performance by 1–2 bits for error-free transmission. For transmission over noisy channels, the performance gain is even larger.

• For noisy channels, conventional memory-based VQ methods rapidly lose their advantage over memoryless VQ. However, the proposed safety-net extension of the memory-based VQ algorithms improves the performance, and the SN-PVQ method performs comparably to the memoryless VQ at all tested error probabilities, with 4 bits less.
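The safety-net principle amounts to encoding each frame with both quantizers and signaling the winner with one extra bit; a minimal sketch of the selection logic (codebook contents, shapes, and names are illustrative, not the trained codebooks of the experiments):

```python
import numpy as np

def safety_net_encode(x, predicted, mem_codebook, safety_codebook):
    """Choose between a predictive codebook (quantizing the
    prediction residual x - predicted) and a memoryless
    'safety-net' codebook (quantizing x directly); one extra
    bit signals the choice to the decoder."""
    # best predictive candidate
    res = x - predicted
    i_mem = int(np.argmin(np.sum((mem_codebook - res) ** 2, axis=1)))
    x_mem = predicted + mem_codebook[i_mem]
    # best memoryless candidate
    i_sn = int(np.argmin(np.sum((safety_codebook - x) ** 2, axis=1)))
    x_sn = safety_codebook[i_sn]
    # keep whichever reconstruction is closer to the input
    if np.sum((x_mem - x) ** 2) <= np.sum((x_sn - x) ** 2):
        return ('mem', i_mem, x_mem)
    return ('safety', i_sn, x_sn)
```

When a channel error has corrupted the decoder's state, the predictive branch produces poor candidates and the memoryless branch tends to win, which is what limits error propagation.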

The above objective results are further strengthened by subjective tests of speech quality. In the listening tests, a 20-bit SN-PVQ was preferred to a 24-bit memoryless VQ in 58% of the evaluated sentences for a noiseless channel, and in 78% of the sentences for a channel with 1% bit error rate.

All results in this work are derived for 20-ms frames, windowed with an overlap of 2.5 ms on both sides. The difference between memory-based and memoryless methods will increase if the frame length is decreased, or if the overlap between frames is increased. The performance of all methods in general, and the memory-based methods in particular, will also improve if the channel noise distribution is assumed to be known, so that channel optimization procedures can be used.

    REFERENCES

[1] R. Viswanathan and J. Makhoul, "Quantization properties of transmission parameters in linear predictive systems," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 309–321, 1975.

[2] A. H. Gray, Jr. and J. D. Markel, "Quantization and bit allocation in speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 459–473, 1976.

[3] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer., vol. 57, suppl. 1, p. S35(A), 1975.

[4] F. K. Soong and B.-H. Juang, "Line spectrum pair (LSP) and speech data compression," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, San Diego, CA, 1984, pp. 1.10.1–1.10.4.

[5] N. Sugamura and N. Farvardin, "Quantizer design in LSP speech analysis-synthesis," IEEE J. Select. Areas Commun., vol. 6, pp. 432–440, 1988.

[6] N. Farvardin and R. Laroia, "Efficient encoding of speech LSP parameters using the discrete cosine transformation," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, U.K., 1989, vol. 1, pp. 168–171.

[7] R. Hagen and P. Hedelin, "Low bit-rate spectral coding in CELP, a new LSP-method," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Albuquerque, NM, 1990, pp. 189–192.

[8] F. K. Soong and B.-H. Juang, "Optimal quantization of LSP parameters," IEEE Trans. Speech Audio Processing, vol. 1, pp. 15–24, 1993.

[9] A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel, "Speech coding based upon vector quantization," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 562–574, 1980.

[10] J. Grass and P. Kabal, "Methods of improving vector-scalar quantization of LPC coefficients," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Toronto, Ont., Canada, 1991, pp. 657–660.

[11] R. Laroia, N. Phamdo, and N. Farvardin, "Robust and efficient quantization of speech LSP parameters using structured vector quantizers," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Toronto, Ont., Canada, 1991, pp. 641–644.

[12] P. Hedelin, "Single-stage spectral quantization at 20 bits," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, vol. 1, pp. 525–528.

[13] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Trans. Speech Audio Processing, vol. 1, pp. 3–14, 1993.

[14] B.-H. Juang and A. H. Gray, Jr., "Multiple stage vector quantization for speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Paris, France, 1982, pp. 597–600.

[15] W. P. LeBlanc, B. Bhattacharya, S. A. Mahmoud, and V. Cuperman, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Trans. Speech Audio Processing, vol. 1, pp. 373–385, 1993.

[16] Y. Shoham, "Vector predictive quantization of the spectral parameters for low rate speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Dallas, TX, 1987, vol. 4, pp. 2181–2184.

[17] M. Yong, G. Davidsson, and A. Gersho, "Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, New York, NY, 1988, vol. 1, pp. 402–405.

[18] S. Wang, E. Paksoy, and A. Gersho, "Product code vector quantization of LPC parameters," in Speech and Audio Coding for Wireless and Network Applications, B. Atal, V. Cuperman, and A. Gersho, Eds. Boston, MA: Kluwer, 1993, pp. 251–258.

[19] V. Cuperman and A. Gersho, "Vector predictive coding of speech at 16 kbits/s," IEEE Trans. Commun., vol. COMM-33, pp. 685–696, 1985.

[20] J. Foster, R. M. Gray, and M. O. Dunham, "Finite-state vector quantization for waveform coding," IEEE Trans. Inform. Theory, vol. 31, pp. 348–359, 1985.

[21] M. O. Dunham and R. M. Gray, "An algorithm for the design of labeled-transition finite-state vector quantizers," IEEE Trans. Commun., vol. COMM-33, pp. 83–89, 1985.

[22] J. Linden, "Interframe quantization of spectrum parameters in speech coding," Licent. thesis, Tech. Rep. 235L, Chalmers Univ. Technol., 1996.

[23] K. K. Paliwal and W. B. Kleijn, "Quantization of LPC parameters," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 433–466.

[24] V. Cuperman and A. Gersho, "Adaptive differential vector coding of speech," in Conf. Rec. GlobeCom, Miami, FL, 1982, vol. 3, pp. 1092–1096.

[25] T. R. Fischer and D. J. Tinnin, "Quantized control with differential pulse code modulation," in Proc. Conf. Decision and Control, Orlando, FL, 1982, vol. 3, pp. 1222–1227.

[26] P.-C. Chang and R. M. Gray, "Gradient algorithms for designing predictive vector quantizers," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 679–690, 1986.

[27] R. A. Wiggins and E. A. Robinson, "Recursive solution to the multichannel filtering problem," J. Geophys. Res., vol. 70, pp. 1885–1891, 1965.

[28] J.-H. Chen and A. Gersho, "Covariance and autocorrelation methods for vector linear prediction," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Dallas, TX, 1987, pp. 1545–1548.

[29] J. Skoglund and J. Linden, "Predictive VQ for noisy channel spectrum coding: AR or MA?," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Munich, Germany, 1997, vol. 2, pp. 1351–1354.

[30] H. Ohmuro, T. Moriya, K. Mano, and S. Miki, "Coding of LSP parameters using interframe moving average prediction and multi-stage vector quantization," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Quebec, P.Q., Canada, 1993, vol. 1, pp. 63–64.

[31] W. P. LeBlanc, C. Liu, and V. Viswanathan, "An enhanced full rate speech coder for digital cellular applications," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 1, pp. 569–572.

[32] A. Kataoka, J. Ikedo, and S. Hayashi, "LSP and gain quantization for the proposed ITU-T 8-kb/s speech coding standard," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 7–8.

[33] C.-C. Kuo, F.-R. Jean, and H.-C. Wang, "Low bit-rate quantization of LSP parameters using two-dimensional differential coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, San Francisco, CA, 1992, vol. 1, pp. 97–100.

[34] E. Erzin and A. E. Cetin, "Interframe differential vector coding of line spectrum frequencies," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 25–28.

[35] J. H. Y. Loo, W.-Y. Chan, and P. Kabal, "Classified nonlinear predictive vector quantization of speech spectral parameters," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Atlanta, GA, 1996, vol.


2, pp. 761–764.

[36] J. H. Y. Loo and W. Y. Chan, "Nonlinear predictive vector quantization of speech spectral parameters," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 51–52.

[37] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1991.

[38] J. Foster and R. M. Gray, "Finite-state vector quantizers for waveform coding," in Proc. IEEE Int. Symp. Information Theory, New York, NY, 1982, vol. 1, pp. 134–135.

[39] R. Aravind and A. Gersho, "Image compression based on vector quantization with finite memory," Opt. Eng., vol. 26, pp. 570–580, 1987.

[40] Y. Hussain and N. Farvardin, "Finite-state vector quantization over noisy channels and its application to LSP parameters," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, San Francisco, CA, 1992, vol. 2, pp. 133–136.

[41] S. Bruhn, "Matrix product quantization for very-low-rate speech coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Detroit, MI, 1995, vol. 1, pp. 724–727.

[42] C. S. Xydeas and C. Papanastasiou, "Efficient coding of LSP parameters using split matrix quantization," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Detroit, MI, 1995, vol. 1, pp. 740–743.

[43] N. Phamdo and N. Farvardin, "Coding of speech LSP parameters using TSVQ with interblock noiseless coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Albuquerque, NM, 1990, pp. 193–196.

[44] D. L. Neuhoff and N. Moayeri, "Tree searched vector quantization with interblock noiseless coding," in Proc. Conf. Information Science Systems, 1988, pp. 781–783.

[45] S. Bruhn, "Efficient interblock noiseless coding of speech LPC parameters," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, vol. 1, pp. 501–504.

[46] D. P. Kemp, J. S. Collura, and T. E. Tremain, "Multi-frame coding of LPC parameters at 600–800 bps," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Toronto, Ont., Canada, 1991, vol. 1, pp. 609–612.

[47] C. S. Xydeas and K. K. M. So, "A long history quantization approach to scalar and vector quantization of LSP coefficients," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 1–4.

[48] M. A. Ferrer-Ballester and A. R. Figueiras-Vidal, "Efficient adaptive vector quantization of LPC parameters," IEEE Trans. Speech Audio Processing, vol. 3, pp. 314–317, 1995.

[49] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.

[50] R. M. Gray, Source Coding Theory. Boston, MA: Kluwer, 1990.

[51] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COMM-28, pp. 84–95, 1980.

[52] T. Eriksson, J. Linden, and J. Skoglund, "A safety-net approach for improved exploitation of speech correlations," in Proc. Int. Conf. Digital Signal Processing, Cyprus, 1995, vol. 1, pp. 96–101.

[53] ——, "Vector quantization of glottal pulses," in Proc. 4th Europ. Conf. Speech Communication and Technology, Madrid, Spain, 1995, vol. 1, pp. 225–228.

[54] ——, "Exploiting interframe correlation in spectral quantization: A study of different memory VQ schemes," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 2, pp. 765–768.

[55] H. Zarrinkoub and P. Mermelstein, "Switched prediction and quantization of LSP frequencies," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Atlanta, GA, 1996, vol. 2, pp. 757–760.

[56] E. Shlomot, "Delayed decision switched prediction multi-stage LSF quantization," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Annapolis, MD, 1995, vol. 1, pp. 45–46.

[57] B. S. Atal, R. V. Cox, and P. Kroon, "Spectral quantization and interpolation for CELP coders," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Glasgow, U.K., 1989, pp. 69–72.

[58] S. L. Dall'Agnol, A. Alcaim, and J. R. B. de Marca, "Performance of LSF vector quantizers for VSELP coders in noisy channels," Eur. Trans. Telecommun., vol. 5, pp. 553–563, 1994.

[59] R. Hagen and P. Hedelin, "Robust vector quantization in spectral coding," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Minneapolis, MN, 1993, vol. 2, pp. 13–16.

[60] N. Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, pp. 799–809, 1990.

[61] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, pp. 2147–2158, 1990.

[62] P. Hedelin, P. Knagenhjelm, and M. Skoglund, "Vector quantization for speech transmission," in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, pp. 311–345.

[63] P. Knagenhjelm and E. Agrell, "The Hadamard transform: A tool for index assignment," IEEE Trans. Inform. Theory, vol. 42, pp. 1139–1151, 1996.

[64] T. Eriksson, J. Linden, and J. Skoglund, "Improvements of memory vector quantization for noisy channel transmission of LSF parameters," in Proc. Radio Vetenskap och Kommunikation, Luleå, Sweden, 1996, vol. 1, pp. 370–374.

[65] J. Linden and J. Skoglund, "Channel optimization of predictive VQ for spectrum coding," in Proc. IEEE Workshop on Speech Coding for Telecommunications, Pocono Manor, PA, 1997, pp. 93–94.

[66] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.

[67] K.-Y. Chang and R. W. Donaldson, "Analysis, optimization, and sensitivity study of differential PCM systems operating on noisy communication channels," IEEE Trans. Commun., vol. 20, pp. 338–350, 1972.

[68] P. Noll, "On predictive quantization schemes," Bell Syst. Tech. J., pp. 1499–1532, 1978.

Thomas Eriksson was born in Skövde, Sweden, in 1964. He received the M.S. degree in electrical engineering in 1990, and the Ph.D. degree in information theory in 1996, both from Chalmers University of Technology, Göteborg, Sweden.

From 1990 to 1996, he was with the Department of Information Theory, Chalmers University of Technology. From 1997 to 1998, he was at AT&T Labs–Research, Florham Park, NJ, and in 1998 and 1999 he was working on a joint research project with the Royal Institute of Technology and Ericsson Radio Systems AB, both in Stockholm, Sweden. He is currently an Associate Professor at the Department of Signals and Systems, Chalmers University of Technology, where his main research interests are vector quantization and speech coding.

Jan Linden (S'92–M'98) was born in Göteborg, Sweden, in 1966. He received the M.S. degree in electrical engineering, the Licentiate of Engineering, and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1991, 1996, and 1998, respectively.

From 1992 to 1998, he was a Research and Teaching Assistant at the Department of Information Theory, Chalmers University of Technology. His research at Chalmers includes low bit rate speech coding based on glottal pulse modeling and memory-based vector quantization for noisy channels. He is currently a Post-Doctoral Researcher at the University of California, Santa Barbara (UCSB), and a Research Engineer at SignalCom, Inc., Goleta, CA. His research at UCSB is focused on audio coding and wideband speech coding, and at SignalCom he is working on algorithm development for speech coding applications.

Jan Skoglund (S'93–M'98) was born in Göteborg, Sweden, in 1967. He received the M.S. degree in electrical engineering, the Lic. Eng., and the Ph.D. degree in information theory from Chalmers University of Technology, Göteborg, Sweden, in 1992, 1996, and 1998, respectively. His Ph.D. dissertation addressed different aspects of speech coding such as spectrum quantization, pulse excitation modeling, and perceptual coding.

From 1992 to 1998, he was with the Department of Information Theory, Chalmers University of Technology. Since 1999, he has been a Consultant at the Speech and Image Processing Services Research Laboratory, AT&T Labs–Research, Shannon Laboratory, Florham Park, NJ, where he is working on low bit rate speech coding.