Linear Predictive Coding Documentation


Upload: chakravarthy-gopi

Post on 25-Jan-2015



CHAPTER 1

INTRODUCTION

Linear predictive coding (LPC) is a tool used mostly in audio signal processing

and speech processing for representing the spectral envelope of a digital signal of speech in

compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate; it also provides accurate estimates of speech parameters.

A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis

system, used to reproduce human speech. In the encoder, the input is passed through a

multiband filter, each band is passed through an envelope follower, and the control signals

from the envelope followers are communicated to the decoder. The decoder applies these

(amplitude) control signals to corresponding filters in the synthesizer. Since the control

signals change only slowly compared to the original speech waveform, the bandwidth

required to transmit speech can be reduced. This allows more speech channels to share a

radio circuit or submarine cable. By encoding the control signals, voice transmission can be

secured against interception.

The vocoder was originally developed as a speech coder for

telecommunications applications in the 1930s, the idea being to code speech for transmission.

Transmitting the parameters of a speech model instead of a digitized representation of the

speech waveform saves bandwidth in the communication channel; the parameters of the

model change relatively slowly, compared to the changes in the speech waveform that they

describe. Its primary use in this fashion is for secure radio communication, where voice has

to be encrypted and then transmitted. The advantage of this method of "encryption" is that no

'signal' is sent, but rather envelopes of the band pass filters. The receiving unit needs to be set

up in the same channel configuration to resynthesize a version of the original signal

spectrum. The vocoder as both hardware and software has also been used extensively as an

electronic musical instrument.

Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and recreates it, the Voder (from Voice Operation Demonstrator) generates synthesized speech by means of a console with fifteen touch-sensitive keys and a pedal, basically consisting of the "second half" of the vocoder, but with manual filter controls, needing a highly trained operator.

Since the late 1970s, most non-musical vocoders have been implemented using linear

prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole

IIR filter. In linear prediction coding, the all-pole filter replaces the band pass filter bank of

its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and

again at the decoder to re-apply the spectral shape of the target speech signal.

1.1 Organization of the project:

Chapter 1: Introduction

Chapter 2: General theory

Chapter 3: Block diagram Description

Chapter 4: Software Description

Chapter 5: Results and Conclusion


CHAPTER 2

GENERAL THEORY

2.1 Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end

of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and

plosive sounds). Although apparently crude, this model is actually a close approximation of

the reality of speech production. The glottis (the space between the vocal folds) produces the

buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract

(the throat and mouth) forms the tube, which is characterized by its resonances, which give

rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are

generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects

from the speech signal, and estimating the intensity and frequency of the remaining buzz. The

process of removing the formants is called inverse filtering, and the remaining signal after the

subtraction of the filtered modeled signal is called the residue.

The numbers which describe the intensity and frequency of the buzz, the

formants, and the residue signal, can be stored or transmitted somewhere else. LPC

synthesizes the speech signal by reversing the process: use the buzz parameters and the

residue to create a source signal, use the formants to create a filter (which represents the

tube), and run the source through the filter, resulting in speech.

Because speech signals vary with time, this process is done on short chunks of the

speech signal, which are called frames; generally 30 to 50 frames per second give intelligible

speech with good compression.
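The analysis and inverse-filtering steps above can be sketched in Python (a minimal illustration, not the report's DSP implementation; the decaying-sinusoid test frame and the order-2 model are arbitrary choices for the sketch):

```python
import numpy as np

def lpc(frame, order):
    """Estimate LPC coefficients [1, a1, ..., a_order] by the
    autocorrelation method: minimize the energy of the prediction
    error e[n] = x[n] + a1*x[n-1] + ... + a_order*x[n-order]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz normal equations R a = -[r(1) ... r(order)]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

# One "frame": a decaying resonance, i.e. a crude buzz through a tube
n = np.arange(240)
frame = 0.99 ** n * np.sin(2 * np.pi * 0.05 * n)

a = lpc(frame, order=2)
# Inverse filtering with A(z) removes the resonance; what remains is the residue
residue = np.convolve(frame, a)[:len(frame)]
```

Because the test frame is itself the impulse response of a second-order resonator, the estimated coefficients come out close to the true pole pair and the residue carries only a small fraction of the frame's energy.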

2.2 LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for the definition of the coefficients) is undesirable, since they are very sensitive to errors: a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition, and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor and spectral errors stay local for small coefficient deviations.

Log area ratios (LAR)

LAR can be used to represent reflection coefficients (another form of linear prediction coefficients) for transmission over a channel. While not as efficient as line spectral pairs (LSPs), log area ratios are much simpler to compute. Let r_k be the kth reflection coefficient of a filter; the kth LAR is then

LAR_k = log((1 + r_k) / (1 - r_k))

Use of log area ratios has now been mostly superseded by line spectral pairs, but older codecs, such as GSM-FR, use LARs.
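Under the natural-log convention above, the mapping and its inverse are a one-liner each (a sketch; real codecs further quantize and approximate this mapping):

```python
import numpy as np

def lar(k):
    """Log area ratio of a reflection coefficient k, with |k| < 1."""
    return np.log((1.0 + k) / (1.0 - k))

def inverse_lar(g):
    """Recover the reflection coefficient from its LAR: k = tanh(g/2)."""
    return np.tanh(g / 2.0)

k = np.array([0.9, -0.5, 0.1])
recovered = inverse_lar(lar(k))
```

The round trip is exact because log((1+k)/(1-k)) is 2·artanh(k); the mapping also stretches the sensitive region near |k| = 1, which is what makes LARs more robust to quantization than raw reflection coefficients.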

Line spectral pairs

Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent

linear prediction coefficients (LPC) for transmission over a channel. LSPs have several

properties (e.g. smaller sensitivity to quantization noise) that make them superior to direct

quantization of LPCs. For this reason, LSPs are very useful in speech coding.

Mathematical foundation

The LP polynomial A(z) can be decomposed into

P(z) = A(z) + z^-(p+1) A(z^-1)
Q(z) = A(z) - z^-(p+1) A(z^-1)

with A(z) = (P(z) + Q(z)) / 2,

where P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis open. While A(z) has complex roots anywhere within the unit circle (z-transform), P(z) and Q(z) have the very useful property of only having roots on the unit circle: P is a palindromic polynomial and Q an antipalindromic polynomial. So to find them we take a test point and evaluate P(z) and Q(z) at a grid of points between 0 and π. The zeros (roots) of P(z) and Q(z) also happen to be interspersed, which is why we swap coefficients as we find roots. So the process of finding the LSP frequencies is basically finding the roots of two polynomials of order p + 1. The roots of P(z) and Q(z) occur in symmetrical pairs at ±w, hence the name line spectral pairs (LSPs). Because all the roots are complex and two trivial roots occur at 0 and π, only p/2 roots need to be found for each polynomial. The output of the LSP search thus has p roots, hence the same number of coefficients as the input LPC filter (not counting the leading coefficient, which is 1).

To convert back to LPCs, we need to evaluate A(z) = (P(z) + Q(z)) / 2 by "clocking" an impulse through it N times (the order of the filter), yielding the original filter, A(z).
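The decomposition can be checked numerically. The sketch below (an illustration only; it uses numpy's general root finder rather than the grid evaluation described above) builds P(z) and Q(z) from the coefficients of A(z), discards the trivial roots at z = 1 and z = -1, and returns the line spectral frequencies:

```python
import numpy as np

def lsp_frequencies(a):
    """Line spectral frequencies (radians) of A(z) = 1 + a1*z^-1 + ... + ap*z^-p.

    P(z) = A(z) + z^-(p+1) A(1/z) is palindromic and
    Q(z) = A(z) - z^-(p+1) A(1/z) is antipalindromic; for a minimum-phase
    A(z) their roots lie on the unit circle and interleave."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate((a, [0.0]))
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    freqs = []
    for poly in (p_poly, q_poly):
        w = np.angle(np.roots(poly))
        # keep one root of each conjugate pair; drop the trivial roots at 0 and pi
        freqs.extend(w[(w > 1e-6) & (w < np.pi - 1e-6)])
    return np.sort(freqs)

# Stable order-2 predictor with a pole pair at radius 0.9, angle 0.3*pi
a = [1.0, -2 * 0.9 * np.cos(0.3 * np.pi), 0.81]
lsf = lsp_frequencies(a)
```

For this example the two LSFs land on either side of the pole angle 0.3π, illustrating the bracketing of resonances that the next section's properties rely on.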

Properties

Line spectral pairs have several interesting and useful properties. When the

roots of P(z) and Q(z) are interleaved, stability of the filter is ensured if and only if the roots

are monotonically increasing. Moreover, the closer two roots are, the more resonant the filter

is at the corresponding frequency. Because LSPs are not overly sensitive to quantization noise and stability is easily ensured, LSPs are widely used for quantizing LPC filters. Line spectral frequencies can also be interpolated.

Reflection coefficient

The reflection coefficient is used in physics and electrical engineering when wave

propagation in a medium containing discontinuities is considered. A reflection coefficient

describes either the amplitude or the intensity of a reflected wave relative to an incident

wave. The reflection coefficient is closely related to the transmission coefficient.

2.3 Pitch Period Estimation

Determining whether a segment is a voiced or unvoiced sound is not all of the information that is needed by the LPC decoder to accurately reproduce a speech signal. In order to produce an input signal for the LPC filter, the decoder also needs another attribute of the current speech segment known as the pitch period. The period of any wave, including speech signals, can be defined as the time required for one wave cycle to completely pass a fixed position. For speech signals, the pitch period can be thought of as the period of the vocal cord vibration that occurs during the production of voiced speech. Therefore, the pitch period is only needed for the decoding of voiced segments; it is not required for unvoiced segments, since they are produced by turbulent air flow, not vocal cord vibrations.

It is very computationally intensive to determine the pitch period for a given segment of speech. There are several different types of algorithms that could be used. One type of algorithm takes advantage of the fact that the autocorrelation of a periodic function, Rxx(k), will have a maximum when k is equivalent to the pitch period. These algorithms usually detect a maximum value by checking the autocorrelation value against a threshold value. One problem with algorithms that use autocorrelation is that the validity of their results is susceptible to interference from other resonances in the vocal tract. When interference occurs, the algorithm cannot guarantee accurate results. Another problem with autocorrelation algorithms arises because voiced speech is not entirely periodic, which means that the maximum will be lower than it should be for a truly periodic signal.
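One such autocorrelation-based algorithm can be sketched in Python. For simplicity, this sketch picks the largest peak in a lag range rather than applying a threshold test, and the "voiced frame" is a synthetic pulse train:

```python
import numpy as np

def estimate_pitch_period(frame, min_lag=20, max_lag=160):
    """Estimate the pitch period (in samples) as the lag at which the
    autocorrelation Rxx(k) peaks, searched over a plausible lag range
    (20-160 samples spans roughly 50-400 Hz at an 8 kHz sampling rate)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag = int(np.argmax(r[min_lag:max_lag])) + min_lag
    return lag, r[lag] / r[0]   # period and normalized peak height

# Synthetic voiced frame: a pulse train with period 80 samples (100 Hz at 8 kHz)
frame = np.zeros(400)
frame[::80] = 1.0
period, strength = estimate_pitch_period(frame)
```

The normalized peak height (here 4/5, because a 400-sample frame holds five pulses and only four overlapping pairs at lag 80) illustrates the point above: even a perfectly periodic but finite frame yields a peak below 1, and real, imperfectly periodic speech lowers it further.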

2.4 Applications

LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as the excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece not just more idle chatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

Waveform ROM in some digital sample-based music synthesizers made by Yamaha Corporation may be compressed using the LPC algorithm.

LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio codecs.


2.4.1 Voice effects in music

For musical applications, a source of musical sounds is used as the carrier,

instead of extracting the fundamental frequency. For instance, one could use the sound of a

synthesizer as the input to the filter bank, a technique that became popular in the 1970s.

One of the earliest people to recognize the potential of the vocoder/Voder for electronic music may be Werner Meyer-Eppler, a German physicist, experimental acoustician, and phonetician. In 1949, he published a thesis on electronic music and speech synthesis from the viewpoint of sound synthesis, and in 1951, he joined the successful proposal to establish the WDR Cologne Studio for Electronic Music.

One of the first attempts to divert the vocoder to create music may be the "Siemens Synthesizer" at the Siemens Studio for Electronic Music, developed between 1956 and 1959.

In 1968, Robert Moog developed one of the first solid-state musical vocoders for the electronic music studio of the University at Buffalo. In 1969, Bruce Haack built a prototype vocoder, named "Farad" after Michael Faraday; it was featured on his rock album The Electric Lucifer released in the same year.

In 1970 Wendy Carlos and Robert Moog built another musical vocoder, a 10-band

device inspired by the vocoder designs of Homer Dudley. It was originally called a

spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier

signal came from a Moog modular synthesizer, and the modulator from a microphone

input. The output of the 10-band vocoder was fairly intelligible, but relied on

specially articulated speech. Later improved vocoders use a high-pass filter to let

some sibilance through from the microphone; this ruins the device for its original

speech-coding application, but it makes the "talking synthesizer" effect much more

intelligible.

Carlos and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's "Ninth Symphony". Also featured in the soundtrack was a piece called "Timesteps", which featured the vocoder in two sections. "Timesteps" was originally intended as merely an introduction to vocoders for the "timid listener", but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.

Kraftwerk's Autobahn (1974) was one of the first successful pop/rock albums to

feature vocoder vocals. Another of the early songs to feature a vocoder was "The

Raven" on the 1976 album Tales of Mystery and Imagination by progressive rock

band The Alan Parsons Project; the vocoder also was used on later albums such as I

Robot. Following Alan Parsons' example, vocoders began to appear in pop music in

the late 1970s, for example, on disco recordings. Jeff Lynne of Electric Light

Orchestra used the vocoder in several albums such as Time (featuring the Roland VP-

330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talkin' Woman" both

from Out of the Blue (1977) use the vocoder extensively. Featured on the album are

the EMS Vocoder 2000W MkI, and the EMS Vocoder (-System) 2000 (W or B, MkI

or II).

2.4.2 Speaker-dependent word recognition device

The speaker-dependent word recognition device is implemented using the

Motorola DSP56303. First the speaker will train the device by storing 10 different vowel

sounds into memory. Then the same speaker can repeat one of the ten words associated with

the vowel sound and the device can detect which word was repeated and flag an appropriate

output.

Fig 2.1: Training the Device (block diagram: Vowel Sound → Microphone Input → A/D Converter → Calculate LPC coefficients → Store coefficients in memory)

Fig 2.2: Word Recognition (block diagram: Vowel Sound → Microphone Input → A/D Converter → Calculate LPC coefficients → Compare coefficients with those in memory → Output)

2.5 Modern vocoder implementations

Even with the need to record several frequencies, and the additional unvoiced sounds,

the compression of the vocoder system is impressive. Standard speech-recording systems

capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in

speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate).

The sampling resolution is typically at least 12 or more bits per sample resolution (16 is

standard), for a final data rate in the range of 96-128 kbit/s. However, a good vocoder can

provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.

'Toll Quality' voice coders, such as ITU G.729, are used in many telephone networks.

G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves

slightly worse quality at data rates of 5.3 kbit/s and 6.4 kbit/s. Many voice systems use even

lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.

Several vocoder systems are used in NSA encryption systems:

LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding

Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016,

used in STU-III

Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wide band

encryptors such as the KY-57.

Mixed-excitation linear prediction (MELP), MIL STD 3005, 2400 bit/s, used in the

Future Narrowband Digital Terminal FNBDT, NSA's 21st century secure telephone.

Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32

kbit/s used in STE secure telephone (ADPCM is not a proper vocoder but rather a


waveform codec. ITU has gathered G.721 along with some other ADPCM codecs

into G.726.)

Vocoders are also currently used in psychophysics, linguistics, computational neuroscience, and cochlear implant research.

Modern vocoders that are used in communication equipment and in voice storage

devices today are based on the following algorithms:

Algebraic code-excited linear prediction (ACELP 4.7 kbit/s – 24 kbit/s)

Mixed-excitation linear prediction (MELPe 2400, 1200 and 600 bit/s)

Multi-band excitation (AMBE 2000 bit/s – 9600 bit/s)

Sinusoidal-Pulsed Representation (SPR 300 bit/s – 4800 bit/s)

Tri-Wave Excited Linear Prediction (TWELP 600 bit/s – 9600 bit/s)


CHAPTER 3

BLOCK DIAGRAM

3.1 Block diagram description:

The general block diagram of LPC (as shown in Fig 3.2) consists of the following blocks:

A/D Converter

End Point Detection

Pre-emphasis filter

Frame blocking

Hamming window

Auto-Correlation

Levinson-Durbin algorithm

Fig 3.1: LPC analysis and synthesis of speech


Fig 3.2: General Block Diagram (A/D Converter → End Point Detection → Pre-emphasis filter → Frame blocking → Hamming window → Auto-Correlation → Levinson-Durbin algorithm → SSD Comparison → Output)

3.2.1 A/D Converter

For the Motorola DSP56303, analog signals are converted to digital samples by an ASM file called 'core302.asm'. The samples are input from the CODEC A/D input port as shown in Fig. 5. The assembly file initializes the necessary peripheral settings for general I/O purposes. It also contains a macro called wait data, which waits for a sample and takes it in. The sampling rate is set to 8000 samples/second.

3.2.2 End Point Detection

In the end point detection, each sample taken from the A/D converter is compared to a volume threshold. If the sample is lower than the threshold, it is considered background noise and therefore disregarded. Otherwise, the DSP board will output 4 bits high to Port B to indicate readiness to process speech samples, and the next 2000 samples will be stored into a buffer before processing.
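The thresholding logic described above can be sketched in Python (the threshold value in the example is hypothetical, and the Port B signaling is omitted):

```python
import numpy as np

def capture_speech(samples, threshold, n_capture=2000):
    """Return the n_capture samples starting at the first sample whose
    magnitude exceeds the volume threshold, or None if no sample does.
    (A sketch: the real board also raises four Port B bits to signal
    readiness, which is omitted here.)"""
    above = np.flatnonzero(np.abs(samples) > threshold)
    if above.size == 0:
        return None          # nothing but background noise
    start = int(above[0])
    return samples[start:start + n_capture]

signal = np.concatenate((0.001 * np.ones(500),            # background noise
                         np.sin(0.1 * np.arange(3000))))  # speech burst
buf = capture_speech(signal, threshold=0.01)
```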

3.2.3 Pre-emphasis filter

The pre-emphasis filter is a low-order digital filter with the transfer function shown in Equation (3.1).

H(z) = 1 - 0.9375 z^-1 (3.1)

The digitized speech signal goes through the filter to average out transmission conditions, noise backgrounds, and signal spectra. The filter boosts the high-frequency components of the voice and attenuates the low-frequency components. Because the human voice typically has higher power at low frequencies, the filter flattens the spectrum and makes the speech samples easier to use for LPC calculation.
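Equation (3.1) is the difference equation y[n] = x[n] - 0.9375·x[n-1]; a minimal Python sketch shows the low-frequency attenuation and high-frequency boost:

```python
import numpy as np

def pre_emphasis(x, alpha=0.9375):
    """Apply H(z) = 1 - alpha*z^-1, i.e. y[n] = x[n] - alpha*x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

# A constant (0 Hz) input is attenuated to 1/16 of its level,
# while a sample-to-sample alternation (the highest frequency) is boosted.
dc = pre_emphasis(np.ones(8))
alternating = pre_emphasis(np.cos(np.pi * np.arange(8)))
```

With alpha = 0.9375 the DC gain is |1 - 0.9375| = 1/16 and the gain at the Nyquist frequency is 1 + 0.9375 = 1.9375, which is exactly the spectral tilt correction described above.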

3.2.4 Frame blocking

The pre-emphasized speech samples are divided into 30-ms window frames. Each 30 ms

window frame consists of 240 samples as illustrated in Equation (3.2) and Equation (3.3).


(Sampling Rate)(Frame Length) = Number of Samples in a Frame (3.2)

(8000 samples/second)(0.030 second) = 240 samples (3.3)

In addition, adjacent window frames are separated by 80 samples (240 × 1/3), with 160 overlapping samples. The amount of separation and overlap depends on the frame length, which is chosen according to the sampling rate: the higher the sampling rate, the larger the frame length must be to remain accurate.
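The framing arithmetic of Equations (3.2) and (3.3) can be sketched as follows (a plain NumPy illustration, not the DSP board code):

```python
import numpy as np

def frame_blocking(samples, frame_len=240, hop=80):
    """Split the sample stream into overlapping frames: frame_len samples
    per frame, with adjacent frames offset by hop samples (so 240-sample
    frames with an 80-sample hop overlap by 160 samples)."""
    n_frames = 1 + (len(samples) - frame_len) // hop
    return np.stack([samples[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

frames = frame_blocking(np.arange(2000.0))
```

The 2000-sample buffer from the end point detection stage yields 1 + (2000 - 240)//80 = 23 frames, each sharing its last 160 samples with the start of the next frame.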

3.2.5 Hamming window

The windowing method involves multiplying the ideal impulse response with a

window function to generate a corresponding filter, which tapers the ideal impulse response.

Like the frequency sampling method, the windowing method produces a filter whose

frequency response approximates a desired frequency response. The windowing method,

however, tends to produce better results than the frequency sampling method.

The toolbox provides two functions for window-based filter design, fwind1 and

fwind2. fwind1 designs a two-dimensional filter by using a two-dimensional window that it

creates from one or two one-dimensional windows that you specify. fwind2 designs a two-

dimensional filter by using a specified two-dimensional window directly.

fwind1 supports two different methods for making the two-dimensional windows it uses:

Transforming a single one-dimensional window to create a two-dimensional window

that is nearly circularly symmetric, by using a process similar to rotation

Creating a rectangular, separable window from two one-dimensional windows, by

computing their outer product

The example below uses fwind1 to create an 11-by-11 filter from the desired frequency

response Hd. The example uses the Signal Processing Toolbox hamming function to create a

one-dimensional window, which fwind1 then extends to a two-dimensional window.

Hd = zeros(11,11); Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
h = fwind1(Hd,hamming(11));
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])

The images below show the desired two-dimensional frequency response (left) and the actual two-dimensional frequency response (right).

Fig 3.4: Two-Dimensional Frequency Response

Creating the Desired Frequency Response Matrix

The filter design functions fsamp2, fwind1, and fwind2 all create filters based on a

desired frequency response magnitude matrix. Frequency response is a mathematical function

describing the gain of a filter in response to different input frequencies.

3.2.6 Auto-Correlation

The Autocorrelation LPC block determines the coefficients of an N-step forward

linear predictor for the time-series in each length-M input channel, u, by minimizing the

prediction error in the least squares sense. A linear predictor is an FIR filter that predicts the

next value in a sequence from the present and past inputs. This technique has applications in

filter design, speech coding, spectral analysis, and system identification.

The Autocorrelation LPC block can output the prediction error for each channel as

polynomial coefficients, reflection coefficients, or both. It can also output the prediction error

power for each channel. The input u can be a scalar, un oriented vector, column vector,

sample-based row vector, or a matrix. Frame-based row vectors are not valid inputs. The

block treats all M-by-N matrix inputs as N channels of length M.

When you select Inherit prediction order from input dimensions, the prediction order,

N, is inherited from the input dimensions. Otherwise, you can use the Prediction order


parameter to specify the value of N. Note that N must be a scalar with a value less than the

length of the input channels or the block produces an error.

When Output(s) is set to A, port A is enabled. For each channel, port A outputs an (N+1)-by-1 column vector, a = [1 a(2) a(3) ... a(N+1)]^T, containing the coefficients of an Nth-order moving average (MA) linear process that predicts the next value, û(M+1), in the input time-series.

When Output(s) is set to K, port K is enabled. For each channel, port K outputs a

length-N column vector whose elements are the prediction error reflection coefficients. When

Output(s) is set to A and K, both port A and K are enabled, and each port outputs its respective

set of prediction coefficients for each channel.

When you select Output prediction error power (P), port P is enabled. The prediction

error power is output at port P as a vector whose length is the number of input channels.

3.2.7 Levinson-Durbin algorithm

The Levinson-Durbin recursion solves the system of linear equations Ra = b in the cases where:

R is a Hermitian, positive-definite, Toeplitz matrix.

b is identical to the first column of R shifted by one element and with the opposite

sign.

The input to the block, r = [r(1) r(2) ... r(n+1)], can be a vector or a matrix. If the input is a

matrix, the block treats each column as an independent channel and solves it separately. Each

channel of the input contains lags 0 through n of an autocorrelation sequence, which appear

in the matrix R.


The block can output the polynomial coefficients, A, the reflection coefficients, K,

and the prediction error power, P, in various combinations. The Output(s) parameter allows

you to enable the A and K outputs by selecting one of the following settings:

A — For each channel, port A outputs A=[1 a(2) a(3) ... a(n+1)], the solution to the

Levinson-Durbin equation. A has the same dimension as the input. You can also view

the elements of each output channel as the coefficients of an nth-order autoregressive

(AR) process.

K — For each channel, port K outputs K=[k(1) k(2) ... k(n)], which contains n

reflection coefficients and has the same dimension as the input, less one element. A

scalar input channel causes an error when you select K. You can use reflection

coefficients to realize a lattice representation of the AR process described later in this

page.

A and K — The block outputs both representations at their respective ports. A scalar

input channel causes an error when you select A and K.

Select the Output prediction error power (P) check box to output the prediction error

power for each channel, P. For each channel, P represents the power of the output of an FIR

filter with taps A and input autocorrelation described by r, where A represents a prediction

error filter and r is the input to the block. In this case, A is a whitening filter. P has one

element per input channel.

When you select the If the value of lag 0 is zero, A=[1 zeros], K=[zeros], P=0 check

box (default), an input channel whose r(1) element is zero generates a zero-valued output.

When you clear this check box, an input with r(1) = 0 generates NaNs in the output. In

general, an input with r(1) = 0 is invalid because it does not construct a positive-definite

matrix R. Often, however, blocks receive zero-valued inputs at the start of a simulation. The

check box allows you to avoid propagating NaNs during this period.
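The recursion the block implements can be sketched as a textbook Python implementation (real-valued input only, and without the r(1) = 0 special-case handling described above):

```python
import numpy as np

def levinson_durbin(r):
    """Solve the Toeplitz system described above by Levinson-Durbin
    recursion. r = [r(1), ..., r(n+1)] is an autocorrelation sequence
    (lag 0 first); returns (A, K, P): the prediction polynomial
    [1, a(2), ..., a(n+1)], the reflection coefficients, and the
    prediction error power."""
    r = np.asarray(r, dtype=float)
    n = len(r) - 1
    a = np.zeros(n + 1)
    a[0] = 1.0
    k = np.zeros(n)
    p = r[0]
    for i in range(1, n + 1):
        # prediction of r(i+1) by the current model gives the next
        # reflection coefficient
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / p
        a_prev = a.copy()
        a[1:i + 1] = a_prev[1:i + 1] + k[i - 1] * a_prev[i - 1::-1]
        p *= 1.0 - k[i - 1] ** 2
    return a, k, p

# Autocorrelation of an AR(1) process u(m) = 0.5*u(m-1) + e(m): r(k) ~ 0.5^k
A, K, P = levinson_durbin([1.0, 0.5, 0.25])
```

For this AR(1) autocorrelation the recursion returns A = [1, -0.5, 0] with reflection coefficients [-0.5, 0]: the second reflection coefficient is zero because a first-order model already explains the sequence, mirroring the A and K port outputs described above.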

Applications

One application of the Levinson-Durbin formulation implemented by this block is in

the Yule-Walker AR problem, which concerns modeling an unknown system as an

autoregressive process. You would model such a process as the output of an all-pole IIR filter

with white Gaussian noise input. In the Yule-Walker problem, the use of the signal's


autocorrelation sequence to obtain an optimal estimate leads to an Ra = b equation of the type

shown above, which is most efficiently solved by Levinson-Durbin recursion. In this case, the

input to the block represents the autocorrelation sequence, with r(1) being the zero-lag value.

The output at the block's A port then contains the coefficients of the autoregressive process

that optimally models the system. The coefficients are ordered in descending powers of z, and

the AR process is minimum phase. The prediction error, G, defines the gain for the unknown system, where G = √P.

The output at the block's K port contains the corresponding reflection coefficients,

[k(1) k(2) ... k(n)], for the lattice realization of this IIR filter. The Yule-Walker AR Estimator

block implements this autocorrelation-based method for AR model estimation, while the

Yule-Walker Method block extends the method to spectral estimation.

Another common application of the Levinson-Durbin algorithm is in linear predictive

coding, which is concerned with finding the coefficients of a moving average (MA) process

(or FIR filter) that predicts the next value of a signal from the current signal sample and a

finite number of past samples. In this case, the input to the block represents the signal's

autocorrelation sequence, with r(1) being the zero-lag value, and the output at the block's A

port contains the coefficients of the predictive MA process (in descending powers of z).

These coefficients minimize the prediction error in the least squares sense; that is, over a(2), ..., a(n+1) they minimize the expected squared error E[(u(m) + a(2)u(m-1) + ... + a(n+1)u(m-n))²].

Again, the output at the block's K port contains the corresponding reflection

coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this FIR filter. The


Autocorrelation LPC block in the Linear Prediction library implements this autocorrelation-

based prediction method.

3.2.8 Sum of Square of Difference comparison

The Sum of Square of Difference comparison is quantitative method to compare two

sets of LPC coefficients. Suppose one set of LPC coefficients in the template are

A’1, A’2, A’3, …., A’10, and another set of LPC coefficients obtained from a window frame

are A1, A2, A3, …., A’10.

SSD = (A’1 – A1)2 + (A’2 – A2)2 + (A’3 – A3)2 + …. + (A’10 – A10)2

Each time the window frame is shifted, the SSD is calculated between the LPC coefficients

from the window frame and every set of LPC coefficients in the template. The template set

that yields the minimum SSD value is the closest match to the input vowel.
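The matching step can be sketched in MATLAB as follows. The coefficient values below are made-up placeholders, not real vowel templates:

```matlab
% templates: one row of 10 LPC coefficients per vowel (hypothetical values).
templates = [0.9 -0.4  0.1 0   0 0 0 0 0 0;
             0.5  0.3 -0.2 0.1 0 0 0 0 0 0];
a = [0.88 -0.38 0.12 0 0 0 0 0 0 0];   % coefficients from one window frame

% SSD between the frame and every template row (implicit expansion,
% R2016b+; use bsxfun(@minus, templates, a) on older releases).
ssd = sum((templates - a).^2, 2);
[ssdMin, idx] = min(ssd);   % idx indexes the closest-matching vowel
```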


CHAPTER 4

SOFTWARE DESCRIPTION

4.1 MATLAB INTRODUCTION:

The name MATLAB stands for MATrix LABoratory. MATLAB was written

originally to provide easy access to matrix software developed by the LINPACK (linear

system package) and EISPACK (eigensystem package) projects. MATLAB is a high-

performance language for technical computing. It integrates computation, visualization, and

programming in an easy-to-use environment. Furthermore, MATLAB is a modern programming language

environment: it has sophisticated data structures, contains built-in editing and debugging

tools, and supports object-oriented programming. These factors make MATLAB an excellent

tool for teaching and research. MATLAB has many advantages compared to conventional

computer languages (e.g., C, FORTRAN) for solving technical problems. MATLAB is an

interactive system whose basic data element is an array that does not require dimensioning.

The software package has been commercially available since 1984 and is now considered a

standard tool at most universities and industries worldwide. It has powerful built-in routines

that enable a very wide variety of computations. It also has easy to use graphics commands

that make the visualization of results immediately available. Specialized applications are

collected in packages referred to as toolboxes. There are toolboxes for signal processing,

symbolic computation, control theory, simulation, optimization, and several other fields of

applied science and engineering.

4.2 Mathematical functions:

MATLAB offers a large set of predefined mathematical functions for technical

computing. Typing help elfun and help specfun

calls up full lists of elementary and special functions respectively. There is a long list of

mathematical functions that are built into MATLAB. These functions are called built-ins.

Many standard mathematical functions, such as sin(x), cos(x), tan(x), e^x, ln(x), are evaluated

by the functions sin, cos, tan, exp, and log respectively in MATLAB.
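For example:

```matlab
x = pi/4;
y = sin(x)^2 + cos(x)^2;   % equals 1 for any x
z = log(exp(2));           % log is the natural logarithm, so z is 2
w = tan(pi/4);             % approximately 1
```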

4.3 Basic plotting:

MATLAB has an excellent set of graphic tools. Plotting a given data set or the results

of computation is possible with very few commands. You are highly encouraged to plot


mathematical functions and results of analysis as often as possible. Trying to understand

mathematical equations with graphics is an enjoyable and very efficient way of learning

mathematics.
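A minimal plotting example:

```matlab
x = 0:0.1:2*pi;      % sample points
y = sin(x);
plot(x, y)           % line plot of y against x
xlabel('x')
ylabel('sin(x)')
title('Basic plot of the sine function')
grid on
```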

4.4 Matrix generation:

Matrices are the basic elements of the MATLAB environment. A matrix is a two-

dimensional array consisting of m rows and n columns. Special cases are column vectors (n =

1) and row vectors (m = 1). MATLAB supports two types of operations, known as matrix

operations and array operations.

MATLAB provides functions that generate elementary matrices. The matrix of zeros, the

matrix of ones, and the identity matrix are returned by the functions zeros, ones, and eye,

respectively.

Table 3.1: Elementary matrices

zeros(m,n)   m-by-n matrix of zeros

ones(m,n)    m-by-n matrix of ones

eye(n)       n-by-n identity matrix
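For example:

```matlab
Z = zeros(2,3);   % 2-by-3 matrix of zeros
O = ones(2,3);    % 2-by-3 matrix of ones
I = eye(3);       % 3-by-3 identity matrix
```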

4.5 Programming in Matlab:

4.5.1 M-File scripts:

A script file is an external file that contains a sequence of MATLAB statements. Script files

have the filename extension .m and are often called M-files. M-files can be scripts that simply

execute a series of MATLAB statements, or they can be functions that can accept arguments

and can produce one or more outputs.
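A minimal function M-file might look like this (framessd.m is a hypothetical file name used only for illustration):

```matlab
function s = framessd(a, b)
%FRAMESSD Sum of squared differences between two coefficient vectors.
s = sum((a - b).^2);
end
```

Calling framessd([1 2 3], [1 1 1]) returns 5. Unlike a script, the variables inside the function are local and do not touch the workspace.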

4.5.2 Script side-effects:

All variables created in a script file are added to the workspace. This may have

undesirable effects, because:

Variables already existing in the workspace may be overwritten.

The execution of the script can be affected by the state variables in the workspace.

As a result, it is better to code any complicated application using function M-files rather than scripts.


4.5.3 Input to script Files:

When a script file is executed, the variables that are used in the calculations within the

file must have assigned values. The assignment of a value to a variable can be done in three

ways.

1. The variable is defined in the script file.

2. The variable is defined in the command prompt.

3. The variable is entered when the script is executed.

4.5.4 Output Commands:

MATLAB automatically generates a display when commands are executed. In addition

to this automatic display, MATLAB has several commands that can be used to generate

displays or outputs. Two commands that are frequently used to generate output are: disp and

fprintf.

Table: the disp and fprintf commands

disp      Displays the contents of an array or string

fprintf   Performs formatted writes to the screen or to a file
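For example:

```matlab
p = 10;
e = 0.0123;
disp('LPC analysis finished')                  % displays a string
fprintf('order = %d, error = %6.4f\n', p, e)   % formatted output
```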

4.5.5 Saving output to a File:

In addition to displaying output on the screen, the command fprintf can be used for writing

output to a file. The saved data can subsequently be used by MATLAB or other software.

To save the results of some computation to a file in a text format requires the following

steps:

1. Open a file using fopen

2. Write the output using fprintf

3. Close the file using fclose
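The three steps above can be sketched as follows (the file name and data values are arbitrary examples):

```matlab
a = [1.0 -0.9 0.2];                    % example data to save
fid = fopen('lpc_coeffs.txt', 'w');    % 1. open the file for writing
fprintf(fid, '%8.4f\n', a);            % 2. write one value per line
fclose(fid);                           % 3. close the file
```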

4.6 Debugging M-Files:

4.6.1 Introduction:

This section introduces general techniques for finding errors in M-files. Debugging is

the process by which you isolate and fix errors in your program or code.

Debugging helps to correct two kinds of errors:

Syntax errors - for example, omitting a parenthesis or misspelling a function name.


Run-time errors - run-time errors produce unexpected results and, although usually

apparent, can be difficult to track down.

4.6.2 Debugging process:

We can debug the M-files using the Editor/Debugger as well as using debugging

functions from the Command Window. The debugging process consists of:

Preparing for debugging

Setting breakpoints

Running an M-file with breakpoints

Stepping through an M-file

Examining values

Correcting problems

Ending debugging
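The steps above map onto MATLAB's command-line debugging functions. A typical session might look like this (framessd.m is a hypothetical file name used only for illustration):

```matlab
dbstop in framessd at 3      % set a breakpoint at line 3 of framessd.m
framessd([1 2 3], [1 1 1])   % run; execution pauses at the K>> prompt
dbstep                       % step to the next line; examine values here
dbcont                       % continue to the next breakpoint or the end
dbclear all                  % remove all breakpoints (end debugging)
```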

4.7 Strengths:

MATLAB may behave as a calculator or as a programming language

MATLAB combines calculation and graphic plotting nicely

MATLAB is interpreted (not compiled), so errors are easy to fix

MATLAB is optimized to be relatively fast when performing matrix operations

4.8 Weaknesses:

MATLAB is not a general purpose programming language such as C, C++, or

FORTRAN.

MATLAB is designed for scientific computing and is not well suited to other

applications.

MATLAB is an interpreted language and is therefore slower than a compiled language such as C++

MATLAB commands are specific to MATLAB. Most of them do not have a

direct equivalent in other programming languages.

CHAPTER 5


RESULTS AND CONCLUSIONS

5.1 Result

The implementation of an LPC vocoder is an exciting and challenging task. Many

techniques were learned from the literature and from practice during this work.

Given the complexity of the voiced/unvoiced decision in the LPC-10e DoD

vocoder, it is clear that a good algorithm needs considerable intelligence and

adaptability in order to produce good results. The main problem is the estimation of the

pitch. Secondly, a robust voiced/unvoiced decision is very important.

Fig 5.1: Screen shot of input


Fig 5.2: Screen shot of output

It was found that taking the memory of the LPC filter into account leads to better

results. The median filter was not able to give a smooth pitch contour. Techniques

such as avoiding abrupt changes in the pitch value and avoiding double and

half pitches should be incorporated in order to get better results.


REFERENCES

Partha S. Malik, MATLAB and SIMULINK, 3rd edition.

Stephen J. Chapman, MATLAB Programming for Engineers, 2nd edition.

www.wikipedia.org

www.mathworks.com
