linear predictive coding documentation
CHAPTER 1
INTRODUCTION
Linear predictive coding (LPC) is a tool used mostly in audio signal processing
and speech processing for representing the spectral envelope of a digital speech signal in
compressed form, using the information of a linear predictive model. It is one of the most
powerful speech analysis techniques and one of the most useful methods for encoding
good-quality speech at a low bit rate, and it provides accurate estimates of speech
parameters.
A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis
system, used to reproduce human speech. In the encoder, the input is passed through a
multiband filter, each band is passed through an envelope follower, and the control signals
from the envelope followers are communicated to the decoder. The decoder applies these
(amplitude) control signals to corresponding filters in the synthesizer. Since the control
signals change only slowly compared to the original speech waveform, the bandwidth
required to transmit speech can be reduced. This allows more speech channels to share a
radio circuit or submarine cable. By encoding the control signals, voice transmission can be
secured against interception.
The vocoder was originally developed as a speech coder for
telecommunications applications in the 1930s, the idea being to code speech for transmission.
Transmitting the parameters of a speech model instead of a digitized representation of the
speech waveform saves bandwidth in the communication channel; the parameters of the
model change relatively slowly, compared to the changes in the speech waveform that they
describe. Its primary use in this fashion is for secure radio communication, where voice has
to be encrypted and then transmitted. The advantage of this method of "encryption" is that no
'signal' is sent, but rather envelopes of the band pass filters. The receiving unit needs to be set
up in the same channel configuration to resynthesize a version of the original signal
spectrum. The vocoder as both hardware and software has also been used extensively as an
electronic musical instrument.
Whereas the vocoder analyzes speech, transforms it into electronically transmitted
information, and recreates it, the Voder (from Voice Operating Demonstrator) generates
synthesized speech by means of a console with fifteen touch-sensitive keys and a pedal,
basically consisting of the "second half" of the vocoder, but with manual filter controls,
needing a highly trained operator.
Since the late 1970s, most non-musical vocoders have been implemented using linear
prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole
IIR filter. In linear prediction coding, the all-pole filter replaces the band pass filter bank of
its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and
again at the decoder to re-apply the spectral shape of the target speech signal.
1.1 Organization of the project:
Chapter 1: Introduction
Chapter 2: General theory
Chapter 3: Block diagram Description
Chapter 4: Software Description
Chapter 5: Results and Conclusion
CHAPTER 2
GENERAL THEORY
2.1 Overview
LPC starts with the assumption that a speech signal is produced by a buzzer at the end
of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and
plosive sounds). Although apparently crude, this model is actually a close approximation of
the reality of speech production. The glottis (the space between the vocal folds) produces the
buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract
(the throat and mouth) forms the tube, which is characterized by its resonances, which give
rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are
generated by the action of the tongue, lips and throat during sibilants and plosives.
LPC analyzes the speech signal by estimating the formants, removing their effects
from the speech signal, and estimating the intensity and frequency of the remaining buzz. The
process of removing the formants is called inverse filtering, and the remaining signal after the
subtraction of the filtered modeled signal is called the residue.
The numbers which describe the intensity and frequency of the buzz, the
formants, and the residue signal, can be stored or transmitted somewhere else. LPC
synthesizes the speech signal by reversing the process: use the buzz parameters and the
residue to create a source signal, use the formants to create a filter (which represents the
tube), and run the source through the filter, resulting in speech.
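The analysis and synthesis steps described above can be sketched in MATLAB as a pair of filtering operations. This is a minimal illustration, assuming a fixed second-order prediction-error filter whose coefficients (the vector a) are made up for the example rather than estimated from speech; the names x, residue and y are likewise illustrative.

```matlab
% Hypothetical prediction-error filter A(z) = 1 - 0.9 z^-1 + 0.64 z^-2
% (illustrative coefficients, not estimated from real speech).
a = [1 -0.9 0.64];

% Test signal: a short decaying oscillation standing in for a speech frame.
n = 0:199;
x = (0.95 .^ n) .* cos(0.3 * n);

% Analysis (inverse filtering): remove the modeled resonances.
residue = filter(a, 1, x);

% Synthesis: run the residue through the all-pole filter 1/A(z).
y = filter(1, a, residue);

% With an unquantized residue the reconstruction is exact up to rounding.
maxError = max(abs(y - x));
fprintf('Maximum reconstruction error: %g\n', maxError);
```

In a real coder the residue would be replaced by a compact description (pitch, gain, voiced/unvoiced flag), so the reconstruction would only approximate the input.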
Because speech signals vary with time, this process is done on short chunks of the
speech signal, which are called frames; generally 30 to 50 frames per second give intelligible
speech with good compression.
2.2 LPC coefficient representations
LPC is frequently used for transmitting spectral envelope information, and as
such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly
(see linear prediction for definition of coefficients) is undesirable, since they are very
sensitive to errors. In other words, a very small error can distort the whole spectrum, or
worse, a small error might make the prediction filter unstable.
There are more advanced representations such as log area ratios (LAR), line
spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP
decomposition in particular has gained popularity, since it ensures stability of the
predictor and spectral errors are local for small coefficient deviations.
Log area ratios (LAR)
LAR can be used to represent reflection coefficients (another form for
linear prediction coefficients) for transmission over a channel. While not as efficient as line
spectral pairs (LSPs), log area ratios are much simpler to compute. Let $\alpha_k$ be the kth
reflection coefficient of a filter; the kth LAR is:
$A_k = \log \frac{1 + \alpha_k}{1 - \alpha_k}$
The use of log area ratios has now been mostly replaced by line spectral pairs, but older
codecs, such as GSM-FR, use LARs.
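As a small worked example, the mapping between reflection coefficients and log area ratios can be sketched as follows. It assumes the usual definition LAR = log((1 + k) / (1 - k)) with the natural logarithm; the coefficient values are invented for illustration.

```matlab
% Illustrative reflection coefficients; each must satisfy |k| < 1.
k = [0.95 -0.6 0.3 -0.1];

% Forward mapping: LAR = log((1 + k) / (1 - k)).
lar = log((1 + k) ./ (1 - k));

% Inverse mapping, obtained by solving the definition for k.
kBack = (exp(lar) - 1) ./ (exp(lar) + 1);

% Rows: original coefficients, LARs, recovered coefficients.
disp([k; lar; kBack]);
```

The mapping stretches values close to +1 or -1, which is where a small quantization error in a reflection coefficient changes the spectrum the most.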
Line spectral pairs
Line spectral pairs (LSP) or line spectral frequencies (LSF) are used to represent
linear prediction coefficients (LPC) for transmission over a channel. LSPs have several
properties (e.g. smaller sensitivity to quantization noise) that make them superior to direct
quantization of LPCs. For this reason, LSPs are very useful in speech coding.
Mathematical foundation
The LP polynomial $A(z)$ can be decomposed into:
$P(z) = A(z) + z^{-(p+1)} A(z^{-1})$
$Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$
so that $A(z) = \frac{1}{2}[P(z) + Q(z)]$,
where P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis
open. While A(z) has complex roots anywhere within the unit circle (z-transform), P(z) and
Q(z) have the very useful property of only having roots on the unit circle, hence P is a
palindromic polynomial and Q an antipalindromic polynomial. So to find them we take a
test point $z = e^{j\omega}$ and evaluate $P(z)$ and $Q(z)$ using a grid of
points between 0 and $\pi$. The zeros (roots) of P(z) and Q(z) also happen to be interspersed
which is why we swap coefficients as we find roots. So the process of finding the LSP
frequencies is basically finding the roots of two polynomials of order p + 1. The roots of P(z)
and Q(z) occur in symmetrical pairs at $\pm\omega$, hence the name Line Spectrum Pairs (LSPs).
Because all the roots are complex and two roots are found at 0 and $\pi$, only p/2 roots need to
be found for each polynomial. The output of the LSP search thus has p roots, hence the same
number of coefficients as the input LPC filter (not counting $a_0 = 1$).
To convert back to LPCs, we need to evaluate $A(z) = \frac{1}{2}[P(z) + Q(z)]$ by "clocking"
an impulse through it N times (order of the filter), yielding the original filter, A(z).
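The decomposition can be tried numerically. The sketch below assumes the standard construction P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), and it uses the general-purpose roots function instead of the grid search described in the text. The pole positions that define the example filter are invented.

```matlab
% Illustrative stable 4th-order LPC polynomial built from chosen pole positions
% (pole radii and angles are made up for the example).
poles = [0.9 * exp(1j * 0.5), 0.7 * exp(1j * 1.8)];
a = real(poly([poles, conj(poles)]));    % a = [1 a1 a2 a3 a4]

% Symmetric and antisymmetric polynomials of order p + 1.
P = [a 0] + [0 fliplr(a)];
Q = [a 0] - [0 fliplr(a)];

% Root angles; keep only those strictly between 0 and pi.
wP = angle(roots(P));
wQ = angle(roots(Q));
tol = 1e-6;
keep = @(w) w(w > tol & w < pi - tol);
lsf = sort([keep(wP); keep(wQ)]);        % p line spectral frequencies

% Converting back: A(z) = 0.5 * (P(z) + Q(z)); the last coefficient is zero.
aBack = 0.5 * (P + Q);
aBack = aBack(1:end-1);

disp(lsf.');
```

For a stable filter the sorted frequencies alternate between roots of P and roots of Q, which is the interleaving property discussed in the text.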
Properties
Line spectral pairs have several interesting and useful properties. When the
roots of P(z) and Q(z) are interleaved, stability of the filter is ensured if and only if the roots
are monotonically increasing. Moreover, the closer two roots are, the more resonant the filter
is at the corresponding frequency. Because LSPs are not overly sensitive to quantization
noise and stability is easily ensured, LSPs are widely used for quantizing LPC filters. Line
spectral frequencies can also be interpolated.
Reflection coefficient
The reflection coefficient is used in physics and electrical engineering when wave
propagation in a medium containing discontinuities is considered. A reflection coefficient
describes either the amplitude or the intensity of a reflected wave relative to an incident
wave. The reflection coefficient is closely related to the transmission coefficient.
2.3 Pitch Period Estimation
Determining if a segment is a voiced or unvoiced sound is not all of the information that
is needed by the LPC decoder to accurately reproduce a speech signal. In order to produce an
input signal for the LPC filter, the decoder also needs another attribute of the current speech
segment known as the pitch period. The period for any wave, including speech signals, can be
defined as the time required for one wave cycle to completely pass a fixed position. For
speech signals, the pitch period can be thought of as the period of the vocal cord vibration
that occurs during the production of voiced speech. Therefore, the pitch period is only needed
for the decoding of voiced segments and is not required for unvoiced segments, since they are
produced by turbulent air flow, not vocal cord vibrations.
It is very computationally intensive to determine the pitch period for a given segment
of speech. There are several different types of algorithms that could be used. One type of
algorithm takes advantage of the fact that the autocorrelation of a periodic function, Rxx(k), will
have a maximum when k is equivalent to the pitch period. These algorithms usually detect a
maximum value by checking the autocorrelation value against a threshold value. One
problem with algorithms that use autocorrelation is that the validity of their results is
susceptible to interference as a result of other resonances in the vocal tract. When
interference occurs the algorithm cannot guarantee accurate results. Another problem with
autocorrelation algorithms occurs because voiced speech is not entirely periodic. This means
that the maximum will be lower than it should be for a true periodic signal.
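The autocorrelation approach can be illustrated with a synthetic signal. This is only a sketch: the test signal, the lag search range and the 0.3 voicing threshold are assumptions chosen for the example, not values from this project.

```matlab
fs = 8000;                 % sampling rate in Hz
trueLag = 80;              % synthetic pitch period in samples (100 Hz), an assumption
n = 0:479;
x = sin(2*pi*n/trueLag) + 0.5*sin(4*pi*n/trueLag);   % stand-in for voiced speech

minLag = 20;               % search range: 400 Hz down to ...
maxLag = 160;              % ... 50 Hz

% Autocorrelation Rxx(k) for lags 0..maxLag.
R = zeros(1, maxLag + 1);
for k = 0:maxLag
    R(k + 1) = sum(x(1:end-k) .* x(1+k:end));
end

% The largest value inside the search range marks the pitch period.
[peak, idx] = max(R(minLag + 1:end));
pitchLag = minLag + idx - 1;
isVoiced = peak > 0.3 * R(1);    % 0.3 is an illustrative threshold

fprintf('Pitch period: %d samples (%.1f Hz)\n', pitchLag, fs / pitchLag);
```

Restricting the search to a plausible lag range avoids picking the trivial maximum at lag 0, and comparing the peak with R(1), the zero-lag energy, gives a simple voiced/unvoiced indication.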
2.4 Applications
LPC is generally used for speech analysis and resynthesis. It is used as a form of
voice compression by phone companies, for example in the GSM standard. It is also
used for secure wireless, where voice must be digitized, encrypted and sent over a
narrow voice channel; an early example of this is the US government's Navajo I.
LPC synthesis can be used to construct vocoders where musical instruments are used
as excitation signal to the time-varying filter estimated from a singer's speech. This is
somewhat popular in electronic music. Paul Lansky made the well-known computer
music piece notjustmoreidlechatter using linear predictive coding. A 10th-order
LPC was used in the popular 1980s Speak & Spell educational toy.
Waveform ROM in some digital sample-based music synthesizers made by Yamaha
Corporation may be compressed using the LPC algorithm.
LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio
codecs.
2.4.1 Voice effects in music
For musical applications, a source of musical sounds is used as the carrier,
instead of extracting the fundamental frequency. For instance, one could use the sound of a
synthesizer as the input to the filter bank, a technique that became popular in the 1970s.
One of the earliest people to recognize the potential of the vocoder and Voder
for electronic music may have been Werner Meyer-Eppler, a German physicist, experimental
acoustician and phoneticist. In 1949, he published a thesis on electronic music and speech
synthesis from the viewpoint of sound synthesis, and in 1951 he joined the successful
proposal to establish the WDR Cologne Studio for Electronic Music.
One of the first attempts to adapt the vocoder for creating music may have been the "Siemens
Synthesizer" at the Siemens Studio for Electronic Music, developed between 1956 and 1959.
In 1968, Robert Moog developed one of the first solid-state musical vocoders for the
electronic music studio of the University at Buffalo. In 1969, Bruce Haack built a
prototype vocoder, named "Farad" after Michael Faraday, and it was featured on his
rock album The Electric Lucifer released in the same year.
In 1970 Wendy Carlos and Robert Moog built another musical vocoder, a 10-band
device inspired by the vocoder designs of Homer Dudley. It was originally called a
spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier
signal came from a Moog modular synthesizer, and the modulator from a microphone
input. The output of the 10-band vocoder was fairly intelligible, but relied on
specially articulated speech. Later improved vocoders use a high-pass filter to let
some sibilance through from the microphone; this ruins the device for its original
speech-coding application, but it makes the "talking synthesizer" effect much more
intelligible.
Carlos and Moog's vocoder was featured in several recordings, including the
soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the
vocal part of Beethoven's Ninth Symphony. Also featured in the soundtrack was a
piece called "Timesteps", which featured the vocoder in two sections. "Timesteps"
was originally intended as merely an introduction to vocoders for the "timid listener",
but Kubrick chose to include the piece on the soundtrack, much to the surprise of
Wendy Carlos.
Kraftwerk's Autobahn (1974) was one of the first successful pop/rock albums to
feature vocoder vocals. Another of the early songs to feature a vocoder was "The
Raven" on the 1976 album Tales of Mystery and Imagination by progressive rock
band The Alan Parsons Project; the vocoder also was used on later albums such as I
Robot. Following Alan Parsons' example, vocoders began to appear in pop music in
the late 1970s, for example, on disco recordings. Jeff Lynne of Electric Light
Orchestra used the vocoder in several albums such as Time (featuring the Roland VP-
330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talkin' Woman" both
from Out of the Blue (1977) use the vocoder extensively. Featured on the album are
the EMS Vocoder 2000W MkI, and the EMS Vocoder (-System) 2000 (W or B, MkI
or II).
2.4.2 Speaker-dependent word recognition device
The speaker-dependent word recognition device is implemented using the
Motorola DSP56303. First the speaker will train the device by storing 10 different vowel
sounds into memory. Then the same speaker can repeat one of the ten words associated with
the vowel sound and the device can detect which word was repeated and flag an appropriate
output.
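Fig 2.1: Training the Device (Vowel sound -> Microphone input -> A/D converter -> Calculate LPC coefficients -> Store coefficients in memory)
Fig 2.2: Word Recognition (Vowel sound -> Microphone input -> A/D converter -> Calculate LPC coefficients -> Compare coefficients with the ones in memory -> Output)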
2.5 Modern vocoder implementations
Even with the need to record several frequencies, and the additional unvoiced sounds,
the compression of the vocoder system is impressive. Standard speech-recording systems
capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in
speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate).
The sampling resolution is typically at least 12 bits per sample (16 is
standard), for a final data rate in the range of 96-128 kbit/s. However, a good vocoder can
provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.
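The data rates quoted above follow from simple arithmetic, shown here as a short sketch; the variable names are illustrative.

```matlab
fs = 8000;                       % samples per second
bitsPerSample = [12 16];         % resolutions quoted in the text
pcmRate = fs * bitsPerSample;    % bits per second: 96000 and 128000

vocoderRate = 2400;              % bits per second for a low-rate vocoder
ratio = pcmRate / vocoderRate;   % compression ratios: 40 and about 53.3

% One output line per resolution.
fprintf('PCM: %d bit/s, ratio %.1f:1\n', [pcmRate; ratio]);
```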
'Toll Quality' voice coders, such as ITU G.729, are used in many telephone networks.
G.729 in particular has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves
slightly worse quality at data rates of 5.3 kbit/s and 6.4 kbit/s. Many voice systems use even
lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.
Several vocoder systems are used in NSA encryption systems:
LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016,
used in STU-III
Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wide band
encryptors such as the KY-57.
Mixed-excitation linear prediction (MELP), MIL STD 3005, 2400 bit/s, used in the
Future Narrowband Digital Terminal FNBDT, NSA's 21st century secure telephone.
Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32
kbit/s, used in the STE secure telephone (ADPCM is not a proper vocoder but rather a
waveform codec. ITU has gathered G.721 along with some other ADPCM codecs
into G.726.)
Vocoders are also currently used in psychophysics, linguistics,
computational neuroscience and cochlear implant research.
Modern vocoders that are used in communication equipment and in voice storage
devices today are based on the following algorithms:
Algebraic code-excited linear prediction (ACELP 4.7 kbit/s – 24 kbit/s)
Mixed-excitation linear prediction (MELPe 2400, 1200 and 600 bit/s)
Multi-band excitation (AMBE 2000 bit/s – 9600 bit/s)
Sinusoidal-Pulsed Representation (SPR 300 bit/s – 4800 bit/s)
Tri-Wave Excited Linear Prediction (TWELP 600 bit/s – 9600 bit/s)
CHAPTER 3
BLOCK DIAGRAM
3.1 Block diagram description:
The general block diagram of LPC (shown in Fig 3.2) consists of the following blocks:
A/D Converter
End Point Detection
Pre-emphasis filter
Frame blocking
Hamming window
Auto-Correlation
Levinson-Durbin algorithm
Fig 3.1: LPC analysis and synthesis of speech
Fig 3.2: General Block Diagram
3.2.1 A/D Converter
For the Motorola DSP56303, analog signals are converted to digital samples using
an assembly file called ‘core302.asm’. The samples are input from the CODEC A/D input port as
shown in Fig. 5. The assembly file initializes the necessary peripheral settings for general I/O
purposes. The file also contains a macro called wait data, which waits for a
sample and reads it in. The sampling rate is set to 8000 samples/second.
3.2.2 End Point Detection
In end point detection, each sample taken from the A/D converter is compared to a
volume threshold. If the sample is lower than the threshold, it is considered background
noise and is disregarded. Otherwise, the DSP board outputs 4 bits high to Port B
to indicate readiness to process speech samples, and the next 2000 samples are stored in
a buffer before processing.
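The thresholding logic can be sketched in MATLAB on a synthetic signal. This is not the DSP56303 assembly implementation: the threshold value, the test signal and the use of the sample magnitude are assumptions made for the illustration, and the capture here starts at the triggering sample.

```matlab
threshold = 0.1;       % illustrative volume threshold
captureLen = 2000;     % samples stored once speech is detected (from the text)

% Synthetic input: 1000 samples of low-level background followed by a tone.
background = 0.01 * cos(0.3 * (0:999));
speech = 0.5 * cos(2*pi*200*(0:2999)/8000);
x = [background speech];

% First sample whose magnitude exceeds the threshold marks the start point.
startIdx = find(abs(x) > threshold, 1, 'first');

if isempty(startIdx)
    buffer = [];       % only background noise was seen
else
    stopIdx = min(startIdx + captureLen - 1, numel(x));
    buffer = x(startIdx:stopIdx);
end
```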
3.2.3 Pre-emphasis filter
The pre-emphasis filter is a low-order digital filter with the transfer function shown in
Equation (3.1).
$H(z) = 1 - 0.9375 z^{-1}$ (3.1)
The digitized speech signal passes through the filter to average out transmission conditions,
background noise, and signal spectra. The filter boosts the high-frequency components
of the human voice and attenuates the low-frequency components. Because the human
voice typically has higher power at low frequencies, this flattening makes the speech samples
easier to handle in the LPC calculation.
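The effect of the filter H(z) = 1 - 0.9375 z^-1 can be checked with a few lines of MATLAB. The two test tones (100 Hz and 3000 Hz) are arbitrary choices for the illustration.

```matlab
b = [1 -0.9375];                 % H(z) = 1 - 0.9375 z^-1

% Apply the filter to an illustrative signal with a low and a high tone.
fs = 8000;
n = 0:799;
x = cos(2*pi*100*n/fs) + cos(2*pi*3000*n/fs);
y = filter(b, 1, x);

% Magnitude response evaluated directly from H(e^{jw}).
H = @(f) abs(1 - 0.9375 * exp(-1j * 2*pi*f/fs));
gainLow = H(100);                % well below 1: low frequencies attenuated
gainHigh = H(3000);              % above 1: high frequencies boosted
fprintf('Gain at 100 Hz: %.3f, at 3000 Hz: %.3f\n', gainLow, gainHigh);
```

At 0 Hz the gain is 1 - 0.9375 = 0.0625, and at half the sampling rate it is 1 + 0.9375 = 1.9375.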
3.2.4 Frame blocking
The pre-emphasized speech samples are divided into 30-ms window frames. Each 30 ms
window frame consists of 240 samples as illustrated in Equation (3.2) and Equation (3.3).
(Sampling Rate)(Frame Length) = Number of Samples in a Frame (3.2)
(8000 samples/second)(0.030 second) = 240 samples (3.3)
In addition, adjacent window frames are separated by 80 samples (240 x 1/3), so
consecutive frames share 160 overlapping samples. The amount of separation and overlap
depends on the frame length, which is chosen according to the sampling rate: the higher the
sampling rate, the more samples a frame needs in order to remain accurate.
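The frame arithmetic can be sketched as follows, assuming the 2000-sample buffer from end point detection as input; the placeholder data and variable names are illustrative.

```matlab
fs = 8000;                         % samples per second
frameLen = round(fs * 0.030);      % 240 samples per 30 ms frame
hop = frameLen / 3;                % 80 samples between frame starts

x = (1:2000).';                    % placeholder for 2000 buffered samples
numFrames = floor((numel(x) - frameLen) / hop) + 1;

% One frame per column.
frames = zeros(frameLen, numFrames);
for m = 1:numFrames
    first = (m - 1) * hop + 1;
    frames(:, m) = x(first:first + frameLen - 1);
end

overlap = frameLen - hop;          % 160 samples shared by adjacent frames
```

With these numbers a 2000-sample buffer yields 23 complete frames.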
3.2.5 Hamming window
The windowing method involves multiplying the ideal impulse response with a
window function to generate a corresponding filter, which tapers the ideal impulse response.
Like the frequency sampling method, the windowing method produces a filter whose
frequency response approximates a desired frequency response. The windowing method,
however, tends to produce better results than the frequency sampling method.
The MATLAB Image Processing Toolbox provides two functions for window-based filter design, fwind1 and
fwind2. fwind1 designs a two-dimensional filter by using a two-dimensional window that it
creates from one or two one-dimensional windows that you specify. fwind2 designs a two-
dimensional filter by using a specified two-dimensional window directly.
fwind1 supports two different methods for making the two-dimensional windows it uses:
Transforming a single one-dimensional window to create a two-dimensional window
that is nearly circularly symmetric, by using a process similar to rotation
Creating a rectangular, separable window from two one-dimensional windows, by
computing their outer product
The example below uses fwind1 to create an 11-by-11 filter from the desired frequency
response Hd. The example uses the Signal Processing Toolbox hamming function to create a
one-dimensional window, which fwind1 then extends to a two-dimensional window.
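% Desired frequency response: 1 inside a central square, 0 elsewhere
Hd = zeros(11,11); Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))
% Design the two-dimensional filter from a one-dimensional Hamming window
h = fwind1(Hd,hamming(11));
% Plot the actual frequency response of the designed filter
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])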
Fig 3.4: Two-Dimensional Frequency Response. Desired two-dimensional frequency
response (left) and actual two-dimensional frequency response (right).
Creating the Desired Frequency Response Matrix
The filter design functions fsamp2, fwind1, and fwind2 all create filters based on a
desired frequency response magnitude matrix. Frequency response is a mathematical function
describing the gain of a filter in response to different input frequencies.
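In the LPC chain of Fig 3.2, the Hamming window is applied directly to each 240-sample frame before autocorrelation. A minimal sketch of that step follows, assuming the common symmetric definition w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)) and computing the window by hand so that no toolbox is needed; the frame contents are a placeholder.

```matlab
N = 240;                                   % frame length in samples
n = (0:N-1).';
w = 0.54 - 0.46 * cos(2*pi*n / (N - 1));   % symmetric Hamming window

frame = ones(N, 1);                        % placeholder frame of speech samples
windowed = frame .* w;                     % taper both ends of the frame
```

The window falls to 0.08 at both ends, which reduces the discontinuities introduced by cutting the signal into frames.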
3.2.6 Auto-Correlation
The Autocorrelation LPC block determines the coefficients of an N-step forward
linear predictor for the time-series in each length-M input channel, u, by minimizing the
prediction error in the least squares sense. A linear predictor is an FIR filter that predicts the
next value in a sequence from the present and past inputs. This technique has applications in
filter design, speech coding, spectral analysis, and system identification.
The Autocorrelation LPC block can output the prediction error for each channel as
polynomial coefficients, reflection coefficients, or both. It can also output the prediction error
power for each channel. The input u can be a scalar, unoriented vector, column vector,
sample-based row vector, or a matrix. Frame-based row vectors are not valid inputs. The
block treats all M-by-N matrix inputs as N channels of length M.
When you select Inherit prediction order from input dimensions, the prediction order,
N, is inherited from the input dimensions. Otherwise, you can use the Prediction order
parameter to specify the value of N. Note that N must be a scalar with a value less than the
length of the input channels or the block produces an error.
When Output(s) is set to A, port A is enabled. For each channel, port A outputs an
(N+1)-by-1 column vector, $a = [1\ a_2\ a_3\ \ldots\ a_{N+1}]^T$, containing the coefficients of an Nth-order
moving average (MA) linear process that predicts the next value, $\hat{u}_{M+1}$, in the input
time-series.
When Output(s) is set to K, port K is enabled. For each channel, port K outputs a
length-N column vector whose elements are the prediction error reflection coefficients. When
Output(s) is set to A and K, both port A and K are enabled, and each port outputs its respective
set of prediction coefficients for each channel.
When you select Output prediction error power (P), port P is enabled. The prediction
error power is output at port P as a vector whose length is the number of input channels.
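The autocorrelation method itself can be sketched in plain MATLAB. This is an illustration of the technique under stated assumptions and not the implementation of the Autocorrelation LPC block: the input is the impulse response of a known all-pole filter (coefficients invented for the example), and the normal equations are solved directly with the backslash operator.

```matlab
% Illustrative input: impulse response of a known all-pole filter 1/A(z)
% with A(z) = 1 - 0.9 z^-1 + 0.64 z^-2 (coefficients chosen for the example).
aTrue = [1 -0.9 0.64];
u = filter(1, aTrue, [1; zeros(399, 1)]);

N = 2;                                   % prediction order
r = zeros(N + 1, 1);
for k = 0:N
    r(k + 1) = sum(u(1:end-k) .* u(1+k:end));   % autocorrelation at lag k
end

% Normal equations: R * a(2:end) = -r(2:end), with R Toeplitz in lags 0..N-1.
R = toeplitz(r(1:N));
a = [1; -(R \ r(2:N + 1))];              % prediction-error filter coefficients
P = r.' * a;                             % prediction error power
```

For this input the estimated coefficients match the filter that generated the signal, which is a convenient sanity check before applying the method to windowed speech frames.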
3.2.7 Levinson-Durbin algorithm
The Levinson-Durbin block solves the nth-order system of linear equations Ra = b in the cases where:
R is a Hermitian, positive-definite, Toeplitz matrix.
b is identical to the first column of R shifted by one element and with the opposite
sign.
The input to the block, r = [r(1) r(2) ... r(n+1)], can be a vector or a matrix. If the input is a
matrix, the block treats each column as an independent channel and solves it separately. Each
channel of the input contains lags 0 through n of an autocorrelation sequence, which appear
in the matrix R.
The block can output the polynomial coefficients, A, the reflection coefficients, K,
and the prediction error power, P, in various combinations. The Output(s) parameter allows
you to enable the A and K outputs by selecting one of the following settings:
A — For each channel, port A outputs A=[1 a(2) a(3) ... a(n+1)], the solution to the
Levinson-Durbin equation. A has the same dimension as the input. You can also view
the elements of each output channel as the coefficients of an nth-order autoregressive
(AR) process.
K — For each channel, port K outputs K=[k(1) k(2) ... k(n)], which contains n
reflection coefficients and has the same dimension as the input, less one element. A
scalar input channel causes an error when you select K. You can use reflection
coefficients to realize a lattice representation of the AR process described later in this
page.
A and K — The block outputs both representations at their respective ports. A scalar
input channel causes an error when you select A and K.
Select the Output prediction error power (P) check box to output the prediction error
power for each channel, P. For each channel, P represents the power of the output of an FIR
filter with taps A and input autocorrelation described by r, where A represents a prediction
error filter and r is the input to the block. In this case, A is a whitening filter. P has one
element per input channel.
When you select the If the value of lag 0 is zero, A=[1 zeros], K=[zeros], P=0 check
box (default), an input channel whose r(1) element is zero generates a zero-valued output.
When you clear this check box, an input with r(1) = 0 generates NaNs in the output. In
general, an input with r(1) = 0 is invalid because it does not construct a positive-definite
matrix R. Often, however, blocks receive zero-valued inputs at the start of a simulation. The
check box allows you to avoid propagating NaNs during this period.
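The recursion that solves Ra = b can be written in a few lines. The sketch below is a textbook form of the Levinson-Durbin recursion for real-valued input, not the block's own code; the autocorrelation sequence comes from an invented all-pole example, and the zero-lag guard mirrors the check box behavior described above.

```matlab
% Autocorrelation lags 0..n of an illustrative signal (impulse response of
% 1/A(z) with A(z) = 1 - 0.9 z^-1 + 0.64 z^-2, chosen for the example).
h = filter(1, [1 -0.9 0.64], [1; zeros(399, 1)]);
n = 2;
r = zeros(n + 1, 1);
for lag = 0:n
    r(lag + 1) = sum(h(1:end-lag) .* h(1+lag:end));
end

if r(1) == 0
    % Zero lag-0 value: return the zero-valued outputs described in the text.
    A = [1; zeros(n, 1)];  K = zeros(n, 1);  P = 0;
else
    A = 1;  K = zeros(n, 1);  P = r(1);
    for m = 1:n
        acc = A.' * r(m + 1:-1:2);          % lag m term plus sum of a_j times lag (m - j)
        K(m) = -acc / P;                    % reflection coefficient
        A = [A; 0] + K(m) * [0; flipud(A)]; % order update of the polynomial
        P = P * (1 - K(m)^2);               % updated prediction error power
    end
end
```

Each pass raises the model order by one and needs only vector operations, so the whole solution costs on the order of n^2 operations instead of the n^3 of a general linear solver.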
Applications
One application of the Levinson-Durbin formulation implemented by this block is in
the Yule-Walker AR problem, which concerns modeling an unknown system as an
autoregressive process. You would model such a process as the output of an all-pole IIR filter
with white Gaussian noise input. In the Yule-Walker problem, the use of the signal's
autocorrelation sequence to obtain an optimal estimate leads to an Ra = b equation of the type
shown above, which is most efficiently solved by Levinson-Durbin recursion. In this case, the
input to the block represents the autocorrelation sequence, with r(1) being the zero-lag value.
The output at the block's A port then contains the coefficients of the autoregressive process
that optimally models the system. The coefficients are ordered in descending powers of z, and
the AR process is minimum phase. The prediction error, G, defines the gain for the unknown
system, where $G = \sqrt{P}$ and
$H(z) = \frac{G}{A(z)} = \frac{G}{1 + a(2) z^{-1} + \ldots + a(n+1) z^{-n}}$
The output at the block's K port contains the corresponding reflection coefficients,
[k(1) k(2) ... k(n)], for the lattice realization of this IIR filter. The Yule-Walker AR Estimator
block implements this autocorrelation-based method for AR model estimation, while the
Yule-Walker Method block extends the method to spectral estimation.
Another common application of the Levinson-Durbin algorithm is in linear predictive
coding, which is concerned with finding the coefficients of a moving average (MA) process
(or FIR filter) that predicts the next value of a signal from the current signal sample and a
finite number of past samples. In this case, the input to the block represents the signal's
autocorrelation sequence, with r(1) being the zero-lag value, and the output at the block's A
port contains the coefficients of the predictive MA process (in descending powers of z).
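These coefficients solve the following optimization problem:
$\min_{a(2), \ldots, a(n+1)} E\left[ \left| x_k + \sum_{i=1}^{n} a(i+1)\, x_{k-i} \right|^2 \right]$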
Again, the output at the block's K port contains the corresponding reflection
coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this FIR filter. The
Autocorrelation LPC block in the Linear Prediction library implements this autocorrelation-
based prediction method.
3.2.8 Sum of Square of Difference comparison
The Sum of Square of Difference (SSD) comparison is a quantitative method to compare two
sets of LPC coefficients. Suppose one set of LPC coefficients in the template is
$A'_1, A'_2, A'_3, \ldots, A'_{10}$, and another set of LPC coefficients obtained from a window frame
is $A_1, A_2, A_3, \ldots, A_{10}$. Then
$SSD = (A'_1 - A_1)^2 + (A'_2 - A_2)^2 + (A'_3 - A_3)^2 + \ldots + (A'_{10} - A_{10})^2$
Each time the window frame is shifted, SSD is calculated between the LPC coefficients
from the window frame and every set of LPC coefficients in the template. A minimum SSD
exists between the LPC coefficients from a window frame and one set of LPC coefficients in
the template. The one with the minimum SSD value is the closest match to the input vowel.
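The matching step can be sketched as follows. The template values are arbitrary placeholders rather than trained vowel data, and the frame coefficients are constructed to lie close to template 4 so that the expected match is known.

```matlab
% Each row is one stored template of 10 LPC coefficients (arbitrary placeholders).
numTemplates = 10;
templates = reshape(sin(1:numTemplates * 10), numTemplates, 10);

% Coefficients from the current window frame: template 4 plus a small offset.
frameCoeffs = templates(4, :) + 0.01;

% SSD between the frame and every template.
diffs = templates - repmat(frameCoeffs, numTemplates, 1);
ssd = sum(diffs .^ 2, 2);

% The template with the smallest SSD is the closest match.
[minSsd, bestMatch] = min(ssd);
fprintf('Closest template: %d (SSD = %.4f)\n', bestMatch, minSsd);
```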
CHAPTER 4
SOFTWARE DESCRIPTION
4.1 MATLAB INTRODUCTION:
The name MATLAB stands for MATrix LABoratory. MATLAB was written
originally to provide easy access to matrix software developed by the LINPACK (linear
system package) and EISPACK (eigensystem package) projects. MATLAB is a
high-performance language for technical computing. It integrates computation, visualization, and
programming in a single environment. Furthermore, MATLAB is a modern programming language
environment: it has sophisticated data structures, contains built-in editing and debugging
tools, and supports object-oriented programming. These factors make MATLAB an excellent
tool for teaching and research. MATLAB has many advantages compared to conventional
computer languages (e.g., C, FORTRAN) for solving technical problems. MATLAB is an
interactive system whose basic data element is an array that does not require dimensioning.
The software package has been commercially available since 1984 and is now considered a
standard tool at most universities and industries worldwide. It has powerful built-in routines
that enable a very wide variety of computations. It also has easy-to-use graphics commands
that make the visualization of results immediately available. Specific applications are
collected in packages referred to as toolboxes. There are toolboxes for signal processing,
symbolic computation, control theory, simulation, optimization, and several other fields of
applied science and engineering.
4.2 Mathematical functions:
MATLAB offers a large set of predefined mathematical functions for technical computing.
Typing help elfun and help specfun
calls up full lists of elementary and special functions respectively. These functions
are called built-ins.
Many standard mathematical functions, such as sin(x), cos(x), tan(x), e^x, and ln(x), are evaluated
by the functions sin, cos, tan, exp, and log respectively in MATLAB.
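A few of these built-ins in use, as a brief illustration with arbitrary input values:

```matlab
x = pi / 6;
s = sin(x);          % approximately 0.5
c = cos(x);          % approximately 0.866
e = exp(1);          % Euler's number, approximately 2.718
l = log(e);          % natural logarithm, so l is 1 up to rounding
```

Note that log is the natural logarithm; base-10 logarithms use log10.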
4.3 Basic plotting:
MATLAB has an excellent set of graphic tools. Plotting a given data set or the results
of computation is possible with very few commands. You are encouraged to plot
mathematical functions and results of analysis as often as possible. Trying to understand
mathematical equations with graphics is an enjoyable and very efficient way of learning
mathematics.
4.4 Matrix generation:
Matrices are the basic elements of the MATLAB environment. A matrix is a two-
dimensional array consisting of m rows and n columns. Special cases are column vectors (n =
1) and row vectors (m = 1). MATLAB supports two types of operations, known as matrix
operations and array operations.
MATLAB provides functions that generate elementary matrices. The matrix of zeros, the
matrix of ones, and the identity matrix are returned by the functions zeros, ones, and eye,
respectively.
Table 3.1: Elementary matrices
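The sketch below generates the elementary matrices mentioned above and contrasts a matrix operation with an array (element-by-element) operation. The matrix sizes and the values in A and B are assumptions chosen for illustration.

```matlab
Z = zeros(2, 3);       % 2-by-3 matrix of zeros
O = ones(2, 3);        % 2-by-3 matrix of ones
I = eye(2);            % 2-by-2 identity matrix

A = [1 2; 3 4];
B = [5 6; 7 8];
P_matrix = A * B;      % matrix operation: linear-algebra product
P_array  = A .* B;     % array operation: element-by-element product
```

The dot in front of the operator is what selects the array form; A * B and A .* B generally give different results.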
4.5 Programming in MATLAB:
4.5.1 M-File scripts:
A script file is an external file that contains a sequence of MATLAB statements. Script
files have a filename extension .m and are often called M-files. M-files can be scripts that simply
execute a series of MATLAB statements, or they can be functions that can accept arguments
and can produce one or more outputs.
4.5.2 Script side-effects:
All variables created in a script file are added to the workspace. This may have
undesirable effects, because:
Variables already existing in the workspace may be overwritten.
The execution of the script can be affected by the state of variables already in the workspace.
Because of these side-effects, it is better to code any complicated application as a
function M-file rather than as a script.
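A function M-file keeps its variables in its own workspace, which avoids the side-effects listed above. The following is a minimal sketch; the file name circle_area.m and the function itself are hypothetical and are used only for illustration.

```matlab
% File: circle_area.m  (hypothetical file and function name)
function a = circle_area(r)
%CIRCLE_AREA Area of a circle of radius r (r may be a scalar or an array).
    a = pi * r.^2;   % only the output a is returned to the caller
end
```

Calling a = circle_area(2) from the command prompt does not create or overwrite a workspace variable named r, because r exists only inside the function.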
4.5.3 Input to script Files:
When a script file is executed, the variables that are used in the calculations within the
file must have assigned values. The assignment of a value to a variable can be done in three
ways.
1. The variable is defined in the script file.
2. The variable is defined at the command prompt before the script is run.
3. The variable is entered when the script is executed, for example with the input command.
4.5.4 Output Commands:
MATLAB automatically generates a display when commands are executed. In addition
to this automatic display, MATLAB has several commands that can be used to generate
displays or outputs. Two commands that are frequently used to generate output are: disp and
fprintf.
Table for disp and fprintf commands
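The difference between the two output commands can be sketched as follows. The variable x and the format strings are assumptions chosen for illustration.

```matlab
x = pi;
disp(x)                          % displays the value without the variable name
disp('The value of x is:')       % displays a string
fprintf('x = %.4f\n', x);        % formatted output; \n ends the line
msg = sprintf('x = %.4f', x);    % sprintf returns the formatted text instead of printing it
disp(msg)
```

Unlike disp, fprintf does not add a line break on its own, so the format string usually ends with \n.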
4.5.5 Saving output to a File:
In addition to displaying output on the screen, the command fprintf can be used for
writing output to a file. The saved data can subsequently be used by MATLAB or other software.
Saving the results of some computation to a file in text format requires the following
steps:
1. Open a file using fopen
2. Write the output using fprintf
3. Close the file using fclose
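The three steps above can be sketched as follows. The file name, the data, and the format string are assumptions chosen for illustration; tempname is used so that no existing file is overwritten.

```matlab
% Hypothetical output file in the system temporary folder.
filename = [tempname, '.txt'];

% Step 1: open the file for writing and check that it opened.
fid = fopen(filename, 'w');
if fid == -1
    error('Could not open %s for writing.', filename);
end

% Step 2: write the results. fprintf walks through the matrix column by
% column, so each column [t; t^2] of results becomes one line of the file.
t = 0:0.25:1;
results = [t; t.^2];
fprintf(fid, '%6.2f %8.4f\n', results);

% Step 3: close the file.
fclose(fid);
```

Checking the value returned by fopen is a defensive habit rather than a required step; fopen returns -1 when the file cannot be opened.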
4.6 Debugging M-Files:
4.6.1 Introduction:
This section introduces general techniques for finding errors in M-files. Debugging is
the process by which you isolate and fix errors in your program or code.
Debugging helps to correct two kinds of errors:
Syntax errors - For example, omitting a parenthesis or misspelling a function name.
Run-time errors - Run-time errors are usually apparent because they produce unexpected
results, but they can be difficult to track down.
4.6.2 Debugging process:
We can debug M-files using the Editor/Debugger as well as the debugging
functions from the Command Window. The debugging process consists of:
Preparing for debugging
Setting breakpoints
Running an M-file with breakpoints
Stepping through an M-file
Examining values
Correcting problems
Ending debugging
4.7 Strengths:
MATLAB is relatively easy to learn
MATLAB may behave as a calculator or as a programming language
MATLAB combines calculation and graphic plotting nicely
MATLAB is interpreted (not compiled), so errors are easy to fix
MATLAB is optimized to be relatively fast when performing matrix operations
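The last point can be illustrated by computing the same quantity with an explicit loop and with a single matrix operation. This is only a sketch; the vector length and the sum-of-squares task are assumptions chosen for illustration, and no timing figures are claimed.

```matlab
n = 1e5;
x = linspace(0, 1, n);

% Loop version: accumulate the sum of squares one element at a time.
s_loop = 0;
for k = 1:n
    s_loop = s_loop + x(k)^2;
end

% Vectorized version: one inner product of the row vector with itself.
s_vec = x * x';

fprintf('loop: %.6f, vectorized: %.6f\n', s_loop, s_vec);
```

Both versions give the same result up to rounding; wrapping each one in tic and toc shows how their run times compare on a given machine.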
4.8 Weaknesses:
MATLAB is not a general-purpose programming language such as C, C++, or
FORTRAN.
MATLAB is designed for scientific computing and is not well suited to other
applications.
MATLAB is an interpreted language and is therefore slower than a compiled language such as C++.
MATLAB commands are specific to MATLAB. Most of them do not have a
direct equivalent in other programming languages.
CHAPTER 5
RESULTS AND CONCLUSIONS
5.1 Result
Implementing an LPC vocoder is an exciting and challenging task. Many
techniques were learned from the literature and from practice during this work.
Looking at the complexity of the voiced / unvoiced decision in the LPC-10e DoD
vocoder, it is clear that a good algorithm must have a lot of intelligence and
adaptability in order to get good results. The main problem is the estimation of the
pitch. Secondly, a robust voiced / unvoiced decision is very important.
Fig 5.1: Screen shot of input
Fig 5.2: Screen shot of output
It was found that considering the memory of the LPC filter leads to better
results. The median filter was not able to give a smooth pitch contour. Techniques
such as avoiding abrupt changes in the pitch value and avoiding pitch doubling and
halving should be incorporated in order to get better results.
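One possible way to handle doubled and halved pitch values is sketched below. It is not the method used in this work; the pitch contour, the use of the median of the voiced frames as a reference, and the 1.5 and 0.75 thresholds are all assumptions chosen for illustration.

```matlab
% Hypothetical pitch contour in Hz, one value per frame (0 marks an unvoiced frame).
pitch = [100 102 204 101 0 99 50 100 103];

% Reference pitch: median over the voiced frames only (an assumed heuristic).
ref = median(pitch(pitch > 0));

fixed = pitch;
for k = 1:numel(pitch)
    if pitch(k) > 0
        if pitch(k) > 1.5 * ref          % far above the reference: likely doubling
            fixed(k) = pitch(k) / 2;
        elseif pitch(k) < 0.75 * ref     % far below the reference: likely halving
            fixed(k) = pitch(k) * 2;
        end
    end
end
disp(fixed)
```

A global reference such as this one only suits short segments with a roughly constant pitch; a running reference over neighbouring frames would be needed for longer utterances.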