EMG Vocalizer
Capstone Design: Final Report
Matthew Banks, Jennifer Padgett, Sophie Tsalkhelishvi, Kristin Weidmann
5/2/2012
Advisor: Professor Rose
Abstract
The initial goal of this project was to design a system that converts subvocal signals into speech. Subvocal signals are captured by taking EMGs at the throat and mouth, since when people "think out loud" their vocal cords vibrate very slightly. With enough low-noise amplification these signals can be observed and processed so that a person's thoughts can be interpreted. However, this initial objective was not achievable.
The goal of this project changed to recognition of the motions made when a person
speaks. The movements at the mouth were captured with an EMG, and six words were used as
part of a recognition system. The words, when recognized, were converted back into speech by a
vocal synthesizer.
Figure 1: The Vocalizer System
Introduction
In many situations, high noise makes communication over microphones difficult, which can result in vital time spent repeating information. Under certain circumstances, this time could mean the difference between life and death. The EMG Vocalizer, however, allows communication to continue without environmental noise affecting the signal, thereby reducing the need to repeat information. A microphone system supplemented with the EMG Vocalizer would allow for seamless communication.
The beauty of this system is that it can recognize any key words it is trained with. For testing, the words alpha, omega, left, right, forward, and reverse were used. Given the ability to train on any six facial movements, the system can be taught words useful for a given situation. For example, if the system were used for robotic control, the words stop, go, up, down, left, and right might be used to train it.
Since the system relies on movement instead of verbal communication, it could also be trained to recognize facial expressions or other bodily gestures with different electrode placements. The Vocalizer could be used to help mute people communicate with those who do not know sign language. It could also be used to detect eye movements and facial twitches to aid in lie detection. The system has many potential applications; however, in this report the application of more reliable audio communication is explored.
Hardware
Early stages:
Initially, the plan for the project was to use the Emotiv EPOC EEG headset to acquire the subvocal signals at the neck/jaw. The headset was to be repurposed as an EMG, with the signal filtered using analog hardware and then sent to the computer wirelessly using a USB transmitter/receiver. After receiving the EPOC headset and testing it as an EEG, it worked well (at the head), but unfortunately it was not able to acquire significant data at the throat. This was because the software provided with the headset was designed to use signals coming from specific points on the head. To get around this problem, the drivers for the headset would need to be rewritten so that the raw signal data could be sent directly to MATLAB. This proved to be a much more difficult task than expected and would consume a large amount of time. Since it was still unclear how well the EPOC headset would work as a sensor for EMG signals, we concluded that it would not be wise to risk time writing drivers for a system that was not guaranteed to work.
Introduction of ECG/EMG:
After ruling out the Emotiv headset, a new plan was created to use an ECG machine/circuit to gather EMG signals. Since an ECG machine can be expensive and difficult to acquire, one was built. A basic ECG circuit was implemented using a schematic similar to the one in [10]; the schematic of the actual implementation can be seen below.
Figure 2: Schematic used following similar design to [10]
The circuit was constructed using the following parts:
2 x LM324AN quad op-amp chips
6 x 10 kΩ resistors
12 x 100 kΩ resistors
1 x 1 µF capacitor
1 x 0.1 µF capacitor
6 x diodes
3 x alligator clips
1 x 1 MΩ resistor
[Schematic detail: six LM324AN op-amp stages powered from 5 V, a network of 100 kΩ and 10 kΩ resistors, 1 µF and 0.1 µF capacitors, a 1 MΩ resistor, diodes, two electrode inputs, a body reference, and an output to the audio jack.]
Figure 3: Constructed circuit following the schematic shown above (diodes used to protect the circuit)
The parts listed were soldered onto a breadboard, and the alligator clips were used as
electrode leads. Unfortunately, this circuit was very sensitive to noise, and thus not able to
acquire very usable signals. Even a large EMG signal, like from a flexing bicep, was hard to
distinguish from noise. This noise was due to the lengthy wires soldered onto the circuit, along
with op-amps that have a low CMRR (Common-Mode Rejection Ratio) of about 85dB.
This circuit was then reconstructed using a solderless breadboard, so that interchanging
components would be easier without having to re-build the whole circuit. Furthermore, using a
solderless breadboard eliminated most of the lengthy wires, which reduced the noise drastically.
Op-amps with a higher CMRR of about 100dB (LF353Ns) were also used in the new circuit to
reduce noise. Additionally, the diodes were removed to simplify the circuit since there is no
voltage large enough to damage the circuit coming from the body. For this circuit, the DC
supply was recommended to be about +5 V and −5 V. In the design following [10], there is an offset circuit set at half of the 5 V supply, in this case about 2.5 V. This section of the circuit made that offset the new ground reference, which makes it easier to supply the circuit with only 5 V and not require a negative rail. Unfortunately, this generated several restrictions, as a separate DC supply would be required if a subsequent gain stage were added to the circuit. This is because, with another gain stage, the new op-amp would have 0 V as its ground reference, which caused the signal to hit the supply rail when a gain stage was added. Although the circuit was able to get a better EMG signal at the bicep than the first circuit did, the signal-to-noise ratio at the throat was about 1-to-1, which essentially just picked up noise. Since getting a separate DC supply just for the gain stages would be excessive and impractical, and since the SNR needed to be improved, several changes to the circuit needed to be made.
A right-leg driver circuit was added to the design, and the rest of the circuit was simplified so that the signal from the two electrodes is fed into a differential amplifier, with the right-leg driver creating a floating ground, and the signal is then amplified. What this essentially left was an instrumentation amplifier circuit with a right-leg driver circuit. According to [2], a right-leg driver maintains a known potential on the body with reference to the circuit ground. This reduces the common-mode DC offset of the circuit and cancels out deviations on any of the circuit's channels. The schematic of the circuit, created using Multisim software, can be seen below.
Figure 4: Schematic of EMG Circuit with Right-leg Driver using Multisim
(with an external gain resistance of 44 Ω)
This circuit was created using the following parts:
2 x LF353N op-amps
1 x INA129P instrumentation amplifier
2 x 10 kΩ resistors
1 x 1 MΩ resistor
2 x 1 nF capacitors
2 x 22 Ω resistors
Shielded wire (for the leads of Electrodes 1 & 2)
[Schematic detail: an AD620AN instrumentation amplifier with two 22 Ω gain-setting resistors, two LF353N op-amp stages forming the right-leg driver (10 kΩ, 100 kΩ, and 1 MΩ resistors with 1 nF capacitors), ±15 V supplies, shielded electrode inputs, a body connection, and an output to the audio jack/computer.]
Figure 5: Results of simulation
Looking at the simulation results, it can be seen that the difference between the two inputs (Electrode 1 and Electrode 2) was about 0.3 mV, and the output of the circuit is around 1000 times that input, with an amplitude of 336.1 mV peak-to-peak.
Figure 6: The Constructed Circuit of the Schematic in Fig. 4
(Additional 0.1µF Capacitors used to reduce supply noise)
The actual instrumentation amplifier used was an INA129P, and the right-leg driver circuit was implemented using the LF353Ns. In the schematic above, an AD620AN instrumentation amplifier is used for simulation, but its characteristics are very similar (almost identical) to the INA129P's. The INA129P was used for its very high CMRR of 120 dB and its potential gain of up to 10,000, which can be set with a single external resistor (or, in the current circuit design, the sum of two resistors in series). Furthermore, 1 nF capacitors were added to the feedback loop of the right-leg driver op-amp whose output drives the body, along with one 1 MΩ resistor at its output. The 1 MΩ resistor is in parallel with one of the 1 nF capacitors at the output; this network drives the body and forms a negative feedback path with the other 1 nF capacitor. The resistance at the output of the op-amp is there to prevent oscillation of the signal.
With the new design, the supply can be up to ±15 V, which means that hitting the supply rail will no longer be a problem. Furthermore, with a potential gain of 10,000, subsequent gain stages may not even be necessary, reducing the number of components in the circuit. Another significant change made to the circuit was the use of shielded wire for the lead wires, which removes most of the noise that would otherwise be picked up by the wires themselves. With these changes, the gain can be selected using the following equation given in the data sheet for the INA129 [8]:
G = 1 + (49.4 kΩ / RG)
where RG is the external resistance chosen. In this case, RG was first chosen to be around 50 Ω so that the gain would be around 1000. Using the resistors that were available, two 22 Ω resistors were placed in series for 44 Ω, giving a gain of about 1123, which is even better. This was not quite enough gain to get a noticeable subvocal signal, at least without filtering out the noise. It did, however, pick up the EMG signal when a word is mouthed. The waveform below is an example of the word "alpha" being mouthed.
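As a quick check of that arithmetic, the gain equation can be evaluated in a couple of lines of MATLAB (a minimal sketch; the 49.4 kΩ constant is the one given in the INA129 data sheet [8]):
%Check of the INA129 gain for the chosen external resistance
Rg = 22 + 22; %two 22 Ohm resistors in series
G = 1 + 49.4e3/Rg %gain from the data-sheet equation, about 1123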
Figure 7: Example of EMG signal at throat generated by mouthing the word alpha
The raw data waveform received from just thinking about the word "alpha" showed that the current hardware is not quite sensitive enough to pick up a signal of such small magnitude. It is possible that the signal could be picked up, but the SNR is currently not high enough to see it, and within the budget and time constraints of this project, the equipment and resources required to attain such a signal could not be obtained. This resulted in a change of objective for the project. The new goal was to take the EMG signal from saying or mouthing a word, amplify it while maintaining a high SNR, send it to the computer to be processed, and then use a vocal synthesizer to output the same word.
With this new goal, the circuit at the time was adequate for completing the objective, but it needed improvements. The existing interface from the circuit to the computer was to output the signal to an oscilloscope and then to the computer, which is very impractical, so the plan was to make it a wireless interface. Filters needed to be added to the circuit to notch out 60 Hz, removing the noise that resides on the body's surface, and to attenuate signals above 100 Hz, both to eliminate unwanted higher frequencies and to prevent aliasing. It was also planned to condense the board and have it printed as a PCB (printed circuit board). Finally, the DC supply may be changed over to batteries in order to reduce supply noise and to increase the portability of the circuit.
Analog Filters:
Two filters are used to reduce the noise and to prevent aliasing in the signal. A Twin-T notch filter notches out the 60 Hz noise that is present on the body, and a 5th-order Butterworth low-pass filter attenuates signals above 100 Hz. These filters were necessary to increase the SNR of the circuit before the signal is sent into the oscilloscope and then to the computer. If the interface changes completely to a wireless hardware/computer interface, these filters will no longer be needed, as noise will be placed back onto the signal during transmission and will have to be filtered out using digital filters. Currently, the filters have been built and are in use, since the wireless interface is not complete.
Twin-T Notch Filter:
The Twin-T notch filter was chosen because it is capable of very high attenuation and a very narrow notch width, making it much more accurate for eliminating a small range of frequencies than a normal notch filter. To construct this filter, the method described in [6] was used, where the transfer function of the twin-T notch filter is
H(s) = (s² + ω0²) / (s² + βs + ω0²)
where
ω0 = 1/(RC) and β = 4(1 − σ)/(RC)
Choosing C = 0.1 µF and a 60 Hz notch → ω0 = 2π(60) ≈ 377 rad/s, and a notch band of about 20 Hz → β ≈ 125.66 rad/s:
R = 1/(ω0C) = 26.525 kΩ, and using commercial resistor values, R = 27 kΩ
R/2 = 13 kΩ (using commercial resistor values)
With R = 27 kΩ, ω0 = 1/(RC) ≈ 370.37 rad/s (58.9 Hz), so setting β = 4(1 − σ)ω0 = 125.66 rad/s gives σ ≈ 0.92.
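These component values can be reproduced with a short MATLAB calculation (a minimal sketch; sigma denotes the feedback fraction of the active twin-T described in [6]):
%Component values for the 60Hz Twin-T notch filter
C = 0.1e-6; %chosen capacitance (F)
w0 = 2*pi*60; %desired notch frequency (rad/s)
B = 2*pi*20; %desired notch bandwidth (rad/s)
R = 1/(w0*C) %ideal resistance, about 26.5 kOhm
R = 27e3; %nearest commercial value
w0 = 1/(R*C) %actual notch frequency, about 370 rad/s (58.9 Hz)
sigma = 1 - B/(4*w0) %feedback fraction, about 0.92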
Below is a schematic of the notch filter with the appropriate component values.
Figure 8: Schematic of Twin-T Notch Filter
In addition to the passive components (resistors and capacitors), two LM741 op-amps were used to make this filter. These were chosen because they are cheap and readily available in the lab, and for the purpose of building filters the LM741 is an acceptable op-amp: it can work from a 15 V supply while consuming only about 50-85 mW.
This was then tested and simulated using the following Matlab code:
%60Hz Twin-T notch filter simulation
w0=370.37; %rad/s 58.9Hz
B=125.66; %rad/s 19.99Hz
Hs=tf([1 0 w0^2],[1 B w0^2]);
bode(Hs)
This yielded the following Bode plot:
Figure 9: Bode Plot of 60Hz Twin-T Notch Filter
Looking at the simulation, it can be seen that the signal at 370 rad/s (58.9 Hz) has a magnitude of −285 dB, meaning that frequency is essentially eliminated. Furthermore, the notching starts at about 304 rad/s and stops at about 452 rad/s, which is about a 23 Hz notch band. Therefore, this design satisfies both of the specifications for the notch filter.
5th Order Butterworth Low-Pass Filter:
The 5th-order Butterworth low-pass filter was chosen because a filter with a fairly sharp roll-off (small transition band) is ideal, attenuating noise above 100 Hz at a much faster rate. According to [6], increasing the order of the filter to n gives the transfer function n poles, and the final slope of the transition band becomes −20n dB/decade. Unfortunately, a higher order requires more components, and therefore takes up more space on the circuit board and would increase production costs (especially if this circuit were to be mass produced). A compromise had to be made, and since the filtering had first been done digitally in Matlab using a 5th-order Butterworth low-pass filter, which worked well for what we were doing, this was the type of filter that was created in hardware.
The method used to create this filter was to take the fifth-order Butterworth polynomial from [6], which is
(s + 1)(s² + 0.618s + 1)(s² + 1.618s + 1)
What this polynomial means is that the 5th-order filter is built from a first-order filter cascaded with two second-order filters. A prototype of the filter was then created from the coefficients in this polynomial by setting R = 1 Ω and a cutoff of ωc = 1 rad/s.
For the first-order filter, the section has a single pole at 1/(RCa); matching the (s + 1) factor with R = 1 Ω gives Ca = 1 F.
For the two second-order filters, each section has the form
(1/(R²C1C2)) / (s² + (2/(RC1))s + 1/(R²C1C2))
where the coefficient of s must equal 0.618 for one of the second-order filters (say B) and 1.618 for the other (say C), so with R = 1 Ω,
for filter B: C1 = 2/0.618 ≈ 3.24 F and C2 = 0.618/2 ≈ 0.31 F
for filter C: C3 = 2/1.618 ≈ 1.24 F and C4 = 1.618/2 ≈ 0.81 F
Scaling:
Now that the prototype is designed, the components can be scaled to make the filter a 100 Hz low-pass filter. Since the cutoff frequency will be 100 Hz → ωc = 2π(100) ≈ 628.3 rad/s, the frequency scale factor is kf ≈ 628.3. Since it is ideal to keep the capacitors small, R was chosen to be 10 kΩ, so the magnitude scale factor is km = 10,000. Following the scaling equation
C' = C/(kf·km)
and then rounding to values that were available in the labs, the capacitor values became Ca = 147 nF, C1 = 533 nF, C2 = 47 nF, C3 = 183 nF, and C4 = 122 nF.
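The scaling arithmetic can be checked in MATLAB before rounding to the available capacitors (a minimal sketch of the step above):
%Scale the 1 rad/s, 1 Ohm Butterworth prototype to fc=100Hz with R=10kOhm
kf = 2*pi*100; %frequency scale factor
km = 10e3; %magnitude scale factor (R: 1 Ohm -> 10 kOhm)
proto = [1 3.236 0.309 1.236 0.809]; %prototype [Ca C1 C2 C3 C4] in farads
scaled = proto/(kf*km); %scaled capacitances in farads
disp(scaled*1e9) %approximately [159 515 49 197 129] nF before rounding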
Figure 10: Schematic of 5th-Order Butterworth Low-Pass Filter
For this filter, three LM741 op-amps were used in addition to the passive components described above.
This circuit was then tested and simulated in Matlab using the following code:
%Fifth-order Butterworth Low-Pass Filter fc=100Hz
R=10000;
Ca=147*10^(-9);
C1=533*10^(-9);
C2=47*10^(-9);
C3=183*10^(-9);
C4=122*10^(-9);
Hs=tf([1/(R*Ca)],[1 1/(R*Ca)]);
Hs1=tf([1/(R*R*C1*C2)],[1 2/(R*C1) 1/(R*R*C1*C2)]);
Hs2=tf([1/(R*R*C3*C4)],[1 2/(R*C3) 1/(R*R*C3*C4)]);
bode(Hs*Hs1*Hs2)
This yielded the following Bode plot:
Figure 11: Bode Plot of 5th-Order Butterworth Low-Pass Filter
Looking at the resulting simulation, it can be seen that at about 617 rad/s (98.2 Hz) the magnitude is about −1.5 dB, and at about 6010 rad/s (956.5 Hz) the magnitude is about −96.2 dB. This corresponds well to the design specifications of a low-pass filter with a cutoff frequency of 100 Hz and the expected slope of a 5th-order filter (−100 dB/decade).
These two filters were then constructed using real hardware components, as seen below.
Figure 12: Final Implementation of the EMG circuit
Printing the Circuit Board:
Unfortunately, due to time constraints the circuit could not be sent to a board house and printed, as that process requires about 2-3 weeks. However, the schematic and the board layout were constructed using Eagle software and can be seen below.
Figure 13: The schematic of the EMG with the Right-Leg Driver, the 100Hz Butterworth LPF,
and the 60Hz Twin-T Notch Filter
Figure 14: Board Layout of the EMG Circuit, RLD, LPF, and Notch Filter
Data and Hardware-Software Interface
Electrodes:
After experimentation and research, it is evident that the type of electrode, its placement, and how the skin surface is prepared can make all the difference between getting a very good signal and a useless one. Initially, pennies were used since copper is a good conductor, but most pennies are covered with impurities and a layer of oxide, which reduces the quality of the signal and increases the impedance of the skin/electrode interface, the opposite of what is needed. After this, many other types of conductors were tested, from cans to aluminum foil, but none gave a very clean signal. Finally, ECG electrodes were acquired and were found to give much better results. Following [1], proper preparation of the skin and electrodes was taken into consideration, as this can make or break the signal quality, especially at the level of EMG vocal signals. These preparations include washing the surface area with water, using isotonic gel, and attaching the electrodes 5-10 minutes before testing. This is imperative because, without letting the isotonic gel set in and the electrode make its best contact, it is easy to pick up no signal at all. During one experiment, a group member was hooked up to the circuit and no signal appeared on the oscilloscope; about eight minutes after the electrodes were attached, a signal became visible and the circuit worked as expected.
The most commonly used electrodes in the development of the system were AgCl EKG electrodes. While these electrodes are very conductive and help acquire clear signals, they do not stay in place for more than thirty minutes to an hour. To counter electrode slip, medical tape was applied over the electrodes, but this did not extend the time the electrodes stayed stationary by much. When the electrodes slip, the acquired signals change from what they were when the electrodes were freshly applied, which means the system must be recalibrated and the recognition system retrained every hour or so to ensure reliable recognition and reliable signals. In a commercial implementation of this system, the problem could be fixed by using either subcutaneous electrodes or electrodes with better adhesive.
In the initial stages of the project the team had to experiment to find the best electrode
placement on the face. Many different placements were tried. For example, one electrode on the
side of the lips and one on the upper lip was tried. The picture below shows many of the
different pairings and placements that were tried.
Figure 15: Various Electrode Placements
The team found the best placement to gather information about the words a person is
speaking was to have one electrode at the chin, just below the lips, and another on the cheek,
about one inch forward from the ear. This produced data about lip, cheek, and jaw movement.
Ground was placed at the base of the throat; this ground placement helped eliminate artifacts from heartbeats and swallowing. The placement can be seen on an individual below:
Figure 16: Electrode placement example.
Hardware/Computer Interface:
Initially the data was imported into Matlab for processing over the audio jack of a laptop.
Even though this approach worked well initially, the team found that the data was noisy and switched to a different approach. Additionally, the Matlab audio recorder behaved differently on the several computers used to collect data. Now data is gathered by connecting the EMG circuit to a Tektronix oscilloscope, so the waveform can be viewed, and interfacing the oscilloscope with Matlab using tools from the Instrument Control Toolbox. This approach is less noisy and more reliable. It also has the advantage of allowing the team to see the data as it is being gathered, which is useful for analysis, but it requires bringing an oscilloscope wherever the device is used, not to mention that an outlet must be nearby to plug the oscilloscope in.
The code written to support data gathering lets the observer watch the data and selectively capture what is on the screen by hitting "enter." This made it easy to gather good training data, since any artifact from coughing or smiling could be seen on the oscilloscope before the waveform was entered into the training set. The ability to be selective about the data used increased the success of the recognition systems discussed later.
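The capture step itself follows the pattern of the data_single and data_gather functions in Appendix B (a minimal sketch; the start() helper opens the VISA-USB object for the oscilloscope):
%Capture one 10,000-point waveform from channel 1 when the user presses Enter
deviceObj = start(); %open the VISA-USB oscilloscope object (Appendix B helper)
input('Press Enter to capture the waveform on screen');
groupObj = get(deviceObj, 'Waveform'); %Instrument Control Toolbox waveform group
groupObj = groupObj(1);
[wave, x] = invoke(groupObj, 'readwaveform', 'channel1');
plot(x, wave) %inspect the trace before adding it to the training set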
Currently, a wireless interface is being constructed using an Arduino Uno Rev3 and an XBee transmitter, along with the shield that allows the XBee to attach to the Arduino. This method was chosen because, in order to send the signal from the hardware to the computer wirelessly, the signal must first be converted from analog to digital. The Arduino has a built-in ADC, which takes an analog signal between 0 and 5 V and discretizes it: the 0-5 V range is quantized into 1024 levels into which each sample is placed. Unfortunately, a signal that only changes by millivolts uses very little of that range, which decreases the effective resolution, and any negative portion of the signal will not be picked up by the Arduino at all. To fix this problem, a DC bias of about 2.5 V is added to the signal (and removed later during processing), and the signal is amplified a little more as well. This ensures that the signal stays primarily within the 0-5 V range and uses as many of the 1024 levels as possible, increasing the resolution.
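On the processing side, recovering the original signal from the ADC samples amounts to a few lines (a hedged sketch; the variable names and the exact bias value are placeholders for whatever the finished interface uses):
%Convert 10-bit Arduino ADC counts back to volts and remove the added DC bias
counts = double(raw_counts); %raw_counts: samples received over the wireless link (0-1023)
volts = counts*(5/1024); %10-bit ADC spanning 0 to 5 V
bias = 2.593; %DC offset added by the voltage divider described below
signal = volts - bias; %recovered bipolar EMG signal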
In order to create this DC bias, a voltage divider had to be implemented. A second EMG circuit was built as in Fig. 6, since analog filters would not be necessary for this interface, and the voltage divider was constructed on the breadboard. Assuming that a 9 V battery will eventually run the hardware, a 10 kΩ resistor is placed in series with two 6.8 kΩ resistors, and the potential across all three resistors is 9 V. The voltage between the 10 kΩ and the first 6.8 kΩ resistor is about 5.186 V, which can be used to power the Arduino, and the voltage between the two 6.8 kΩ resistors is about 2.593 V, which becomes the DC offset added to the output signal of the circuit.
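Both node voltages follow directly from the voltage-divider equation (a minimal check of the arithmetic):
%Node voltages of the 9V divider: 10k in series with two 6.8k resistors
Vbat = 9; R1 = 10e3; R2 = 6.8e3; R3 = 6.8e3;
V_arduino = Vbat*(R2+R3)/(R1+R2+R3) %about 5.186 V, powers the Arduino
V_bias = Vbat*R3/(R1+R2+R3) %about 2.593 V, DC offset for the signal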
Unfortunately, after spending a day working on programming the XBee transmitter, it appeared there was a compatibility problem between the hardware and the operating system of the laptop being used. The program used to configure the XBee antenna was X-CTU, and it did not communicate with or recognize the XBee properly under Windows Vista. Given the time constraints, it did not seem efficient to spend more time or resources obtaining a different laptop, or installing Windows 7 on the current one, just to program the antenna; even then, there would be no guarantee that the output of the circuit would be transmitted properly or with a usable resolution. So although the XBee wireless interface could be a good way to communicate from the circuit to the computer given more time, for now the existing interface with the oscilloscope will suffice.
Pre-Processing:
Even though the oscilloscope to computer interface was much better than the initial
audio-jack to computer interface some residual noise was still present. While the circuit had a
60Hz notch filter and a low pass filter, the signal acquired by the computer had 60 Hz and above
noise. The noise from the interface was therefore filtered out in software. Before being used for
word recognition a low pass filter at 25Hz was applied to each signal since it was determined
that the signals contained no useful information for recognition above this frequency.
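In software this is a standard digital Butterworth filter; the demo script in Appendix B applies it as follows (the normalized cutoff of 25/2500 is taken directly from that code):
%Software low-pass at 25Hz applied to each captured waveform before recognition
[num den] = butter(4, 25/2500, 'low'); %4th-order Butterworth low-pass
filtered = filter(num, den, wave); %wave: 10,000-sample waveform from the oscilloscope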
Figure 17: Filtered and Unfiltered “Left”
Figure 18: Filtered and Unfiltered “Right”
Recognition Algorithms
Word Recognition:
Initially, a simple distance-based decision rule was used to detect words. Training sets consisted of ten examples of each word. The ten examples were each centered based on peak energy and normalized, then averaged together to create an approximation of an ideal word. The six "ideal" words were then used to create six orthonormal basis functions using the Gram-Schmidt orthonormalization procedure. The original "ideal" signals were then projected onto the basis, and those projections were used as ideal points in signal space against which to compare new signals. This approach, however, only yielded about 50% recognition, and only when the signals were very distinct.
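The training side of this rule is what the createbasis helper in Appendix B implements; conceptually it does the following (a condensed sketch of that function, where M is the 10 x 10000 x 6 array of training waveforms):
%Build "ideal" words and an orthonormal basis from 10 examples of each of 6 words
for t = 1:6
    for k = 1:10
        M(k,:,t) = centerwave(normalize(M(k,:,t))); %normalize, then center on peak energy
    end
    Ideal(t,:) = normalize(mean(M(:,:,t))); %average the ten examples into an "ideal" word
end
Basis = gs(Ideal); %Gram-Schmidt orthonormal basis
IdP = projections(Ideal, Basis); %ideal points in signal space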
Figure 19: Example of "ideal" signals
Figure 20: Example of basis functions generated by Gram-Schmidt, corresponding to Figure 19
[Figures 19-20 panels: Ideal Alpha, Omega, Left, Right, Forward, and Reverse, and Basis 1-6; normalized amplitude vs. sample.]
Figure 21: Another example of “ideal” signals
Figure 22: Example of basis functions generated by Gram-Schmidt, corresponding to Figure 21
[Figures 21-22 panels: Ideal Alpha, Omega, Left, Right, Forward, and Reverse, and Basis 1-6; normalized amplitude vs. sample.]
Next, detection based on correlation with the "ideal" signals was tested. The parameter used was the correlation value at zero lag (where the signals best match), and this approach achieved better recognition. The correlation and distance decision rules were then combined, and the combined approach yielded up to eighty-five percent accuracy in experiments. It should be noted that the system is only this accurate for about thirty minutes after the electrodes are first applied to the skin; after that, the electrodes start to move on the face and the acquired signals differ from the signals in the original training set.
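Using the Appendix B helpers, the combined rule amounts to the following (a sketch; the 0.3/0.7 weighting is the one that appears in the decide function):
%Classify one filtered, normalized, centered waveform against the six trained words
p = projections(wave, Basis); %project onto the orthonormal basis
d = distance(IdP, p); %signal-space distances to the six ideal points
c = corrs(wave, Waveforms); %zero-lag correlations with the six "ideal" waveforms
score = 0.3*(1-d) + 0.7*c; %combined decision score
[val ind] = max(score); %ind = index of the recognized word (1 = alpha, ..., 6 = reverse)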
In addition to comparing new waveforms with the "ideal" waveforms acquired in the training set, a detection scheme was tried in which the incoming waveform was compared to every waveform in the training set. The incoming signal was correlated with each of the 60 known signals, and whichever word's set of signals had the highest aggregate correlation was considered the set to which the incoming waveform belonged. While it was initially thought that this approach might perform better than comparison with the six "ideal" signals, it was surprisingly not an improvement; in fact, the number of correct assignments decreased compared to when the ideal signals were used for comparison.
The initial use of the correlation and distance based decision rules was not as successful
as the later uses. The initial success rate was only about fifty to sixty percent. Therefore,
alternative approaches were explored, and a neural network was created to recognize the
waveforms.
A feed-forward network trained with back-propagation was created to recognize waveforms. Even though, according to the universal approximation theorem, only one hidden layer is needed to approximate any function, the initial networks had several hidden layers with several nodes in each layer; many of these networks had between two and five hidden layers with ten to forty nodes in each layer. The initial biases were set to zero and the weights to one. The transfer function used for all nodes was a hyperbolic tangent sigmoid. The network with the best accuracy turned out to be one with forty-two nodes in its only hidden layer; that network achieved up to 70% accuracy. However, the networks had the same problem with electrode slip as the distance- and correlation-based detection.
A back-propagation algorithm was used to adjust the weights of the network during training to achieve the desired network output. While experimenting with the networks, the training data included sets from one individual taken in a single sitting, sets from the same individual taken over multiple sittings, and sets from other individuals. When training with sets from multiple individuals, the network could not reach its (loosely set) mean-squared-error performance goals, so later networks were trained using data from only one individual. This had the drawback of requiring the networks to be personalized and trained for each user.
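The networks were built with MATLAB's Neural Network Toolbox; a configuration like the one described above might be set up as follows (a minimal sketch using standard toolbox calls, not the team's exact script):
%Feed-forward network with one hidden layer of 42 tansig nodes, trained by back-propagation
net = feedforwardnet(42); %single hidden layer with 42 nodes
net.layers{1}.transferFcn = 'tansig'; %hyperbolic tangent sigmoid transfer function
net.trainParam.goal = 0.1; %mean-squared-error goal for the normalized signals
net = train(net, features, targets); %features: one column per training waveform
outputs = sim(net, features); %simulate the trained network, as in the demo script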
The features fed into the network took some time to develop. First, averaged frequencies over time were used as features, but it was quickly determined that frequency does not make much sense as a feature, since the signals being detected extend to at most 25 Hz. Next, the average energy in 80 ms blocks was used as a feature; this was dramatically more effective than the frequency features, but only yielded up to fifty percent recognition. Linear predictive coding coefficients were then added to the feature vector alongside the energy features, but this did not noticeably improve recognition. Polynomial-fit coefficients were also tried as features, but the coefficients for each signal were so similar that they were not a useful feature.
Figure 23: Examples of Energy Feature Vectors
One of the main challenges when working with the neural network was determining how strict the training goal should be. If too small an error were the goal, the network would not be able to generalize when presented with different data; if the error goal was too large, the network would not perform well even for known data. The best results were obtained using a mean-square-error goal of 0.1 for the normalized signals.
An interesting exercise was done in which a person was asked to group signals by how alike they were. The subject was given thirty waveforms (five instances of each word) and was told that there were six words, but not how many belonged to each group. The same experiment was then run through the neural network. The outcome of the experiment is shown below:
[Figure 23 panels: Alpha, Omega, Left, Right, Forward, and Reverse energy feature vectors; normalized energy vs. vector element.]
Figure 24: Outcome of human and computer grouping experiment
The experiment suggests two things: the signals are not clearly distinguishable to either humans or the computer, and the feature vectors being used are still not a good match for the system. Therefore, after observing the later success of the correlation and distance approach, the features used for the neural network were changed to the correlations with the ideal waveforms. Surprisingly, adding the signal-space distances as additional features degraded the performance of the network to about 60%; the scheme with only the zero-lag correlation values produced 70% correct recognition, a great improvement over the earlier features used with the network.
Recognition Scheme                                         Best Detection Achieved
Combined Distance and Correlation                          85%
Distance                                                   70%
Correlation                                                80%
Neural Network with Frequency Features                     30%
Neural Network with Energy Features                        50%
Neural Network with Energy and LPC Coefficient Features    50%
Neural Network with Correlation Features                   70%
Neural Network with Correlation and Distance Features      60%
The main outcome of all recognition testing is that, so far, simple correlation and signal-space distances work best for this system. While the neural networks do better than randomly guessing among six words, correlation and distance still work better. However, for a large number of signals the distance- and correlation-based technique takes a long time to compute, so if the system had many more words to recognize, a neural network might be the best choice for word recognition, since computing its output is much quicker.
Text-to-Speech Voice Synthesizer
A voice synthesizer is the artificial production of human speech. We implemented a voice synthesizer in software that translates text to speech. The goal of text-to-speech voice synthesis is to convert text into an acoustic signal that is indistinguishable from human speech, transmitting information from the machine to a human listener. The synthesizer first converts the input text into its equivalent words, and once that is accomplished it converts the text into sound. The system diagram of the text-to-speech voice synthesizer is shown below:
Figure 25: Diagram of Text to Speech Voice Synthesizer System [11]
There are many different ways to implement a voice synthesizer: concatenative synthesis, formant synthesis, articulatory synthesis, HMM-based synthesis, and sinewave synthesis. Concatenative synthesis is based on stringing together segments of recorded speech, usually short sequences of phonemes. Generally, concatenative synthesis produces the most natural-sounding synthesized speech and is used in most high-quality voice synthesizers today. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output.
Articulatory synthesis refers to computational techniques for synthesizing speech based on models of the human vocal tract and the articulation processes occurring there. HMM-based synthesis, also called statistical parametric synthesis, is a method based on hidden Markov models. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by HMMs, and speech waveforms are generated from the HMMs themselves based on the maximum likelihood criterion.
Sinewave synthesis is a technique for synthesizing speech by replacing the formants (the main bands of energy) with pure tone whistles. Sinewave synthesis does not use human speech samples at runtime; instead, the synthesized speech output is created using additive synthesis and an acoustic model. Parameters such as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial speech.
There are three main subtypes of concatenative synthesis: unit selection synthesis, diphone synthesis, and domain-specific synthesis. Unit selection synthesis requires enormous databases to build models of speech sounds that can readily be concatenated into a decent-sounding utterance with few flaws at the boundaries between speech sounds. Recorded utterances are divided into individual phones, diphones, syllables, words, and/or sentences, often with the help of visual representations such as waveforms. Databases are created based on the division of the segments and parameters such as pitch, time, and position of the phonemes.
Diphone synthesis does not use enormous databases; it uses a minimal speech database containing all of the sound-to-sound transitions (diphones). Diphone synthesis suffers from the same sonic glitches as other concatenative synthesis. Domain-specific synthesis concatenates pre-recorded sounds to create a complete utterance.
For our project we are using concatenative synthesis, specifically unit selection synthesis. We stored a large amount of data in a database, and the phonetic structure of each spoken word was used. The group recorded the sound of each phoneme of each word and saved it in an audio file format. We then collected the speech waveform signals and concatenated the individual segments to construct a new utterance.
We are synthesizing six words: alpha, omega, left, right, forward, and reverse. Each word is recorded by creating an audio recorder object with a sampling rate of 44,100 Hz, 16 bits, and 2 channels. The time sample of the speech from the microphone runs from 1 to 2000, and the microphone signal data has the form sin((2π·500t)/Fs). Each phoneme is recorded separately and saved into the database. We then plot the waveforms of each segment, crop them to the desired ranges, and plot the new waveforms. The cropped sounds are then concatenated and written out as one single sound. The following paragraphs contain all six words with the waveform plots, the cropped plots, and the concatenated sound plots.
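In MATLAB, the record-and-concatenate step looks roughly like the sketch below (the crop indices are placeholders; audiorecorder, recordblocking, getaudiodata, and soundsc are the standard audio functions assumed here):
%Record the four phonemes of "alpha" one at a time, crop, and concatenate
fs = 44100;
names = {'ae','l','f','ah'};
seg = cell(1,4);
for k = 1:4
    fprintf('Say the phoneme "%s" after pressing a key\n', names{k});
    pause; %wait for a key press
    rec = audiorecorder(fs, 16, 2); %44.1kHz, 16-bit, 2-channel recorder
    recordblocking(rec, 1); %record about one second
    x = getaudiodata(rec);
    seg{k} = x(round(0.1*fs):round(0.9*fs), :); %placeholder crop to the voiced portion
end
alpha = vertcat(seg{:}); %concatenate the cropped phoneme segments
soundsc(alpha, fs); %play back the concatenated word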
Alpha is split into four phonemes: ae/l/f/ah. Each phoneme is recorded separately and the
waveforms are plotted:
Figure 26: Alpha (ae-l-f-ah) waveform.
After this we crop each phoneme waveform into desired frequency ranges:
Figure 27: Cropped Alpha (ae-l-f-ah) waveform.
The concatenated plot of Alpha looks as follows:
Figure 28: Concatenated Alpha (ae-l-f-ah) waveform.
Omega is split into five phonemes: ow/m/eh/g/ah. Each phoneme is recorded separately
and the waveforms are plotted:
Figure 29: Omega (ow-m-eh-g-ah) waveform.
Omega waveform after cropping each phoneme waveform into desired frequency ranges:
Figure 30: Cropped Omega (ow-m-eh-g-ah) waveform.
The concatenated plot of the Omega:
Figure 31: Concatenated Omega (ow-m-eh-g-ah) waveform.
Left is split into four phonemes: l/eh/f/t. Each phoneme is recorded separately and the
waveforms are plotted:
Figure 32: Left (l-eh-f-t) waveform.
Left waveform after cropping each phoneme waveform into desired frequency ranges:
Figure 33: Cropped Left (l-eh-f-t) waveform.
The concatenated plot of the Left:
Figure 34: Concatenated Left (l-eh-f-t) waveform.
Right is split into three phonemes: r/ay/t. Each phoneme is recorded separately and the
waveforms are plotted:
Figure 35: Right (r-ay-t) waveform.
Right waveform after cropping each phoneme waveform into desired frequency ranges:
Figure 36: Cropped Right (r-ay-t) waveform.
The concatenated plot of the Right:
Figure 37: Concatenated Right (r-ay-t) waveform.
Forward is split into six phonemes: f/uh/r/w/axr/d. Each phoneme is recorded separately and the
waveforms are plotted:
Figure 38: Forward (f-uh-r-w-axr-d) waveform.
Forward waveform after cropping each phoneme waveform into desired frequency
ranges:
Figure 39: Cropped Forward (f-uh-r-w-axr-d) waveform.
The concatenated plot of the Forward:
Figure 40: Concatenated Forward (f-uh-r-w-axr-d) waveform.
Reverse is split into five phonemes: r/iy/v/axr/s. Each phoneme is recorded separately and the
waveforms are plotted:
Figure 41: Reverse (r-iy-v-axr-s) waveform.
Reverse waveform after cropping each phoneme waveform into desired frequency ranges:
Figure 42: Cropped Reverse (r-iy-v-axr-s) waveform.
The concatenated plot of the Reverse:
Figure 43: Concatenated Reverse (r-iy-v-axr-s) waveform.
Although the concatenation process is very straightforward, large databases may require complex search algorithms, and the signal processing may need to be modified to achieve the desired speaker characteristics. The final speech sounds natural and is more recognizable after concatenative synthesis. Some speech synthesizers produce continuous speech by selecting waveform segments from databases containing a very large number of generic, pre-recorded segments rather than recordings made for the specific application. Concatenating segments from such large databases can give very good quality speech, but those techniques are costly in terms of data collection, organization, and memory storage.
Filtering and Smoothing the Speech Sound:
Filtering and smoothing a pre-recorded sound is a very important step in signal processing. The main idea of this kind of filtering is to take a window of points and calculate the least-squares polynomial fit. In speech synthesis, signal processing is used to smooth errors out of the existing waveform. Sometimes linear interpolation in the frequency domain does not give a good output, and other algorithms that provide natural transitions must be sought. Spectral smoothing helps modify existing audio frames, and interpolation helps add more frames as needed. When no spectral smoothing is applied to the audio files, the result sounds unnatural. Therefore, to eliminate this problem, we use a Savitzky-Golay smoothing filter. The Savitzky-Golay filter is a digital polynomial smoothing filter, and such filters are among the most frequently used digital smoothing filters in spectrometry.
The Savitzky-Golay filter SG(N, n) is linear and shift invariant and acts on a vector of input samples x(k) to produce a smoothed vector y(k). For a window of N = 2M + 1 samples x(−M), ..., x(M), the filter finds the best least-squares fit by a polynomial p(−M), ..., p(M) of even degree n; the same operation applies when the window is shifted by k to the samples x(k − M), ..., x(k + M). The filter output y(k) is the center value of the least-squares fit of degree n to the 2M + 1 samples. The Savitzky-Golay filter value formula is
y(k) = Σ c_n x(k + n), with the sum running from n = −nL to n = nR
Figure 44: Savitzky-Golay Smoothing Filter Formula [7]
where nL and nR are the number of samples to the left and to the right, respectively.
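In MATLAB this smoothing can be applied with the Signal Processing Toolbox function sgolayfilt (a sketch; the polynomial order and frame length shown are illustrative rather than the exact values used):
%Savitzky-Golay smoothing of a concatenated word waveform
order = 3; %polynomial degree (illustrative)
framelen = 41; %odd window length 2M+1 (illustrative)
smoothed = sgolayfilt(wordwave, order, framelen); %wordwave: one concatenated word waveform
soundsc(smoothed, 44100); %listen to the smoothed result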
When the Savitzky-Golay smoothing filter was applied to our already concatenated sound waveforms, they looked as follows:
Figure 45: Filtered Alpha using the Savitzky-Golay Filter.
Figure 46: Filtered Omega using The Savitzky-Golay Filter.
Figure 47: Filtered Left using The Savitzky-Golay Filter.
Figure 48: Filtered Right using The Savitzky-Golay Filter.
Figure 49: Filtered Forward using The Savitzky-Golay Filter.
Figure 50: Filtered Reverse using The Savitzky-Golay Filter.
Spectral smoothing is the most common approach used for speech and/or audio coding. Studies show that spectral smoothing performs at its best when the original spectra are similar to each other, as in concatenative synthesis with large databases.
Spectral interpolation, or waveform interpolation, is a technique that helps shape pitch-period waveforms. It operates on a frame-by-frame basis, and in each segment the pitch period and/or waveform is interpolated. Waveforms are extracted from the original sound or signal at some time interval; to produce interpolated waveforms, the pitch period and the signal have to be interpolated in either the time domain or the frequency domain. This provides smoother and more natural results for a large number of interpolated pitch periods and yields smoothed speech frames.
The following graphs show the interpolated waveforms of all six words used in the project:
Figure 51: Waveform Interpolation of Alpha.
Figure 52: Waveform Interpolation of Omega.
Figure 53: Waveform Interpolation of Left
Figure 54: Waveform Interpolation of Right.
Figure 55: Waveform Interpolation of Forward.
Figure 56: Waveform Interpolation of Reverse.
After applying spectral smoothing, the speech is smoother and sounds more natural; it definitely improved the sound of the concatenative speech synthesis. Once smoothing is done and all the filtered sounds are recorded and saved, we produce speech from the words: we simply type in the word we would like spoken, and the voice synthesizer outputs the speech sound.
Conclusion
After several demos, it can be concluded that the EMG Vocalizer is successful in taking the EMG signal of a spoken or mouthed word, classifying it as one of the six trained words, and then outputting a synthesized version of that word. This can be done with 85% accuracy, but after about half an hour to an hour the electrodes start to move out of place, and the acquired signals differ significantly from the signals used to train the system. This movement is due to the lack of a good adhesive on the electrodes and the weight of the attached alligator clips. As a result, the system had to be retrained every hour or so. The EMG Vocalizer could perform better given electrodes that do not slip over time.
Other than the unreliability of the electrodes, all other aspects of the project worked properly. The EMG circuit that was constructed successfully picked up the EMG signal when a word was mouthed or spoken. This signal was amplified and filtered with analog circuitry to attain a high signal-to-noise ratio. The signal was then sent to the computer and processed in Matlab, where a combination of distance- and correlation-based recognition schemes classified the incoming signal as one of the six keywords trained into the system. The output of this stage is a text word that is fed into the text-to-speech voice synthesizer. Using the unit selection type of concatenative synthesis, the text string was translated into an audible representation of the word. The EMG Vocalizer is therefore successful in its objective of taking the EMG signal from a spoken or mouthed word, deciding what the word is, and then outputting the word as an audio signal undisturbed by environmental noise.
References
[1] Biopac Systems, Inc. EDA (GSR) Subject Preparation.
< http://www.biopac.com/eda-gsr-subject-preparation>
[2] Driessen, Peter F. The Experimental Portable EEG/EMG Amplifier. University of Victoria.
August 1, 2003. <http://www.ece.uvic.ca/~elec499/2003a/group11/thereport.pdf>
[3] Haykin, Simon S., Communication Systems. Wiley, 2009.
[4] Haykin, Simon S., Neural Networks. Michigan: Prentice Hall, 1999.
[5] Jonsson, Fredrik. "Jonsson.eu." Web Site of Fredrik Jonsson. 4 Dec. 2011. Web. Apr. 2012.
<http://www.jonsson.eu/>.
[6] Nilsson, James W., and Susan A. Riedel. Electric Circuits. 8th ed. Upper Saddle River, New Jersey: Pearson Education, Inc., 2008. (pp. 606-640)
[7] Persson, Per-Olof, and Gilbert Strang. "Smoothing by Savitzky-Golay and Legendre Filters."
Web
[8] Precision, Low Power Instrumentation Amplifiers (INA129 datasheet). Burr-Brown Corporation, 1995.
<http://pdf1.alldatasheet.com/datasheet-pdf/view/56692/BURR-BROWN/INA129.html>
[9] Rabiner, Lawrence R., and Stephen E. Levinson. "Isolated and Connected Word Recognition-
Theory and Selected Applications." IEEE Transactions on Communications. 5th ed. Vol.
COM-29. 1981. 621-50.
[10] The Story of My ECG,< http://www.eng.utah.edu/~jnguyen/ecg/long_story_3.html>
[11] "Text-to-speech." - File Exchange. Web. 19 Mar. 2012.
<http://www.mathworks.com/matlabcentral/fileexchange/18091-text-to-speech>.
[12] Utama, Robert J., Ann K. Syrdal, and Alistair Conkie. "Six Approaches to Limited Domain
Concatenative Speech Synthesis." 17 Sept. 2006. Web. Mar. 2012.
[13] "Speech Synthesis Markup Language (SSML) Version 1.0." World Wide Web Consortium (W3C). Web. 19 Mar. 2012. <http://www.w3.org/TR/speech-synthesis/>.
Appendix A- Costs
Item                                                             Cost
Arduino Uno                                                      $23.00
SainSmart Xbee Shield Module for Arduino UNO MEGA Duemilanove    $14.95
Xbee 1mW antenna                                                 $66.90
Burr-Brown INA129P                                               $7.70
Total                                                            $112.55
Appendix B- Code for Word Recognition
function result=bigcompare(wave, M);
% should have ten instances of the trainwaves
% compares an input to each and every one of the waves in the training set
%....60 comparisons, test to see if this does better than the "ideal"
%signal
[r c d] = size(M);
W1=zeros(c,d,r);
for d=1:r
for k=1:6
W1(:,k,d)= M(d,:,k);
end
end
cvec=zeros(10, 6);
for k=1:10
cvec(k,:)=corrs(wave', W1(:,:,k)');
end
c=mean(cvec);
[val ind]=max(c);
resvec=zeros(6,1);
for r=1:length(ind);
resvec(ind(r))=resvec(ind(r))+1;
end
[val result]=max(resvec);
%this function will output "centered" waveforms for the matched filter
%this installment takes the max energy point as the center
%then circularly shifts the waveforms :)
function newform=centerwave(signal)
N=length(signal);
center=floor(N/2);
S=abs(signal).^2;
[val ind]=max(S); %take max energy as center
diff=ind-center;
if diff>0
newform=modshift(signal, 'left', diff);
elseif diff<0
newform=modshift(signal,'right',diff);
end
if diff==0
newform=signal;
end
% this creates the groupings of waves to compare against the human test
load('KWeidmann.mat');
load('KWeidmannnet.mat');
[r c d]=size(M);
confuse=zeros(6,6);
for k=1:d
for t=1:r/2
s=sim(netn,createfv(M(t,:,k),'norm'));
[val ind]=max(s);
confuse(k,ind)=confuse(k,ind)+1;
end
end
function c=condense(waveform);
% takes a length 10000 waveform (provided by osc) and condenses to length 50
% vector of mean energy values in an 80ms window
numrun=10000/50;
c=zeros(1,50);
waveform=waveform-mean(waveform);
for f=1:50
c(f)=mean(waveform((numrun*(f-1)+1):numrun*f).^2);
end
c=normalize(c);
%signal is the input signal, and M is a matrix of ideal vectors
%the rows of M are signals and columns are points
%all signals should be normalized
%For the correlation comparison
function cor=corrs(signal,M)
[r c]=size(M);
cor=zeros(1,r);
for k=1:r
cor(k)=max(xcorr(signal,M(k,:),'coeff'));
end
function [Basis ,Ideal, Ideal_projections]=createbasis(M);
%M is a row by col by depth where depth is the number of basis to create
%and row is the number of training sets for each utterance
[row col depth]=size(M);
Ideal=zeros(depth,col);
Basis=Ideal;
for t=1:depth
for k=1:row
M(k,:,t)=normalize(M(k,:,t));
M(k,:,t)=centerwave(M(k,:,t));
end
Ideal(t,:)=normalize(mean(M(:,:,t))); %create the ideal waveforms
end
Basis=gs(Ideal); %compute the basis functions using Gram-Schmidt
Ideal_projections=projections(Ideal,Basis); %get the idea projections onto the basis function
%This is a function to create feature vectors that are part energy and part
%frequency information about a signal
function fv=createfv(sig,strng)
[num den]=butter(4, 100/2500,'low'); %remove the high frequency jitters
sig=filter(num, den, sig);
a1= normalize(abs(tess(sig)));
a2= condense(sig);
fv=[a1'; a2'];
s=strcmp('norm',strng);
if s==1;
fv=fv/max(fv);
end
%This is a function to create feature vectors that are part energy and part
%frequency information about a signal This version also includes linear
%predictive coding as a feature
function fv=createfv2(sig)
[num den]=butter(4, 25/2500,'low'); %remove the high frequency jitters
sig=filter(num, den, sig);
a1= (abs(tess(sig)));
a1=a1/max(abs(a1));
a2= condense(sig);
a2=a2/max(abs(a2));
a3= lpc(sig,10);
fv=[a1'; a2'; a3'];
function c= createfv3(sig, waveforms, basis, idp);
%creates a feature vector of correlation values and (perhaps) points in
%signal space
%The signal, in this case is compared to the ideal waveforms
sig=sig';
c=corrs(sig,waveforms);
%
p=projections(sig,basis);
d=distance(idp,p);
c=[c d];
%
% Create a VISA-USB object.
function [Basis Waveforms IdP]=data_gather(name,deviceObj);
% Connect device object to hardware.
connect(deviceObj);
row=10;
col=10000;
depth=6;
M=zeros(row,col,depth);
% Execute device object function(s).
r=input('Press to Start Training Alpha');
A=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[A(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,1)=(A)';
size(M)
r=input('Press to Start Training Omega');
O=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[O(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,2)=(O)';
r=input('Press to Start Training Left');
L=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[L(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,3)=(L)';
r=input('Press to Start Training Right');
R=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[R(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,4)=(R)';
r=input('Press to Start Training Forward');
F=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[F(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,5)=(F)';
r=input('Press to Start Training Reverse');
Rev=zeros(col,row);
for k=1:row
r=input('Press Enter');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[Rev(:,k),X] = invoke(groupObj, 'readwaveform', 'channel1');
end
M(:,:,6)=(Rev)';
name=strcat(name,'.mat')
[Basis Waveforms IdP]=createbasis(M);
save(name,'Basis','Waveforms', 'IdP');
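A minimal usage sketch, assuming start() returns the connected scope's VISA-USB device object (as it does in data_single and the main program); the speaker name here is hypothetical:
deviceObj = start();
[Basis, Waveforms, IdP] = data_gather('JDoe', deviceObj);  % also saves the three outputs to 'JDoe.mat'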
function [wave,x]=data_single();
deviceObj=start();
% Execute device object function(s).
wave=zeros(1,10000); % preallocate one 10000-sample capture
r=input('Press Any Key');
groupObj = get(deviceObj, 'Waveform');
groupObj = groupObj(1);
[wave,x] = invoke(groupObj, 'readwaveform', 'channel1');
%simple distance-based decision rule :) with correlation added on
function [d ind]=decide(distance, cor)
decvect=.3*(1-distance) + .7*cor; %weighted combination of distance and correlation scores
decvect=cor; %note: this overrides the weighted rule, so only correlation drives the decision
[val ind]=max(decvect);
switch (ind)
case 1
d='alpha';
case 2
d='omega';
case 3
d='left';
case 4
d='right';
case 5
d='forward';
case 6
d='reverse';
end
%%and so on :)
%demo for Professor Rose on 4/24 @ 10 AM
%load('netinuse.mat');
k=1;
load('KWeidmannnabs.mat');
[num den]=butter(4, 25/2500,'low'); %remove the high frequency jitters
for d=1:6
for r=1:10
M(r,:,d)= filter(num, den, M(r,:,d));
end
end
Mtest=M;
[Basis Waveforms IdP]=createbasis(M);
[num den]=butter(4, 25/2500,'low'); %remove the high frequency jitters
while k~=0
k=input('To end, press 0');
[wave,x]=data_single();
%{
% optional: classify with the trained neural network instead of the
% distance/correlation rule below (requires netn, e.g. from netinuse.mat)
d=createfv(wave,'norm');
res=sim(netn,d);
[y ind]=max(res);
%}
wave=filter(num, den, wave);
wave=normalize(wave);
wave=centerwave(wave);
% begin reco based on distance and correlation
p=projections(wave,Basis);
d=distance(IdP,p);
c=corrs(wave,Waveforms);
[dec ind]=decide(d,c);
word(ind);
switch (ind)
case 1
d='alpha';
case 2
d='omega';
case 3
d='left';
case 4
d='right';
case 5
d='forward';
case 6
d='reverse';
end
d
%%%Kristin and Sophie's Code goes here!!!!
end
%computes the distance of one signal from all ideal comparisons
function d=distance(ideal_projections, projections)
[r c]=size(ideal_projections);
d=zeros(1,r); %preallocate one distance per ideal signal
for k= 1:r
d(k)= sqrt(sum((ideal_projections(k,:)-projections).^2));
end
%this function does Gram-Schmidt orthonormalization
function basis=gs(signals)
[r c]=size(signals);
basis=zeros(r,c);
basis(1,:)=normalize(signals(1,:));
for t=2:r
v=signals(t,:);
for k=1:t-1
g=sum(v.*conj(basis(k,:))); %inner product to get projection
v=v-(g.*basis(k,:));
end
basis(t,:)=normalize(v);
end
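A quick sanity check (not in the original code): the rows returned by gs should be orthonormal, so the Gram matrix should be close to the identity.
B = gs(Ideal);                        % Ideal as built by createbasis
max(max(abs(B*B' - eye(size(B,1)))))  % should be close to zero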
% the main program for the Capstone classifier
%Initializes either training or use of the system
%note: this is an old version
t=1;
r=input('To train a new sequence press 0, to test press 1 ');
if r==0
%we gather training vectors
name=input('First Initial Last Name ','s');
deviceObj=start;
[Basis, Waveforms, IdP]=data_gather(name,deviceObj);
end
if r==1
%we use a previously defined training set and do reco
name=input('First Initial and Last name ','s');
name=strcat(name,'.mat');
load(name)
while t==1
deviceObj=start;
wave=get_waveform(deviceObj);
p=projections(wave,Basis);
d=distance(IdP,p);
c=corrs(wave,Waveforms);
decide(d,c)
t=input('To Continue Press 1, to Exit press 0');
end
end
%this function finds the projections of all signals onto all basis
%signals are on the rows, basis projection values on the columns
function ps=projections(signals, basis);
[r1 c1]=size(signals);
[r2,c2]=size(basis);
ps=zeros(r1,r2); %preallocate: one row per signal, one column per basis vector
for q=1:r1
for t=1:r2
ps(q,t)=sum(signals(q,:).*conj(basis(t,:)));
end
end
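A hypothetical consistency check, assuming Ideal and Basis come from createbasis: a signal lying in the span of the basis is recovered exactly from its projection coefficients.
recon = projections(Ideal(1,:), Basis) * Basis;  % coefficients times basis rows
max(abs(recon - Ideal(1,:)))                     % should be near zero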
%Neural Network: feature vectors and functions to create them changed frequently
numset=20; % this is the number of examples of data we will use to train the nn
W=zeros(6,6,numset); %feature vectors: 6 features x 6 words x numset example sets
load('JPadgettnabs.mat'); M1=M;
load('JPadgettnabs2.mat');
[Basis Waveforms IdP]=createbasis(M);
[r c dpth]=size(M);
%arrange the loaded sets into a 10000 x 6 x numset array (samples x word x example set)
W1=zeros(10000,6,numset);
for d=1:numset
for k=1:6
if d<=10
W1(:,k,d)= M(d,:,k);
elseif d>10 && d<21
W1(:,k,d)= M1(d-10,:,k);
end
%{
% a third recording set (M2) could extend numset beyond 20
if d>=21
W1(:,k,d)= M2(d-20,:,k);
end
%}
end
end
[r1 c1 depth]=size(W1);
for d=1:depth
for k=1:dpth
W(:,k,d)=createfv3(W1(:,k,d), Waveforms); %see how the frequency works
end
end
%W=W/max(max(max(W)));
%W=abs(W);
targets=eye(6);
alphabet=W(:,:,1);
%[alphabet,targets] = prprob;
[R,Q] = size(alphabet);
[S2,Q] = size(targets);
%m1=min(min(min(W)));
%m2=max(max(max(W)));
netn = newff(minmax(alphabet),[41 S2],{'tansig' 'tansig'},'traingdx');
%tansig appears to work best in this case :)
netn.trainParam.goal = .04; % Mean-squared error goal.... 0.03 worked well here... not sure why it
% stopped working... because you used another set to train the network, silly :p
netn.trainParam.epochs = 800; % Maximum number of epochs to train.
T=targets;
for pass = 1:depth
k=mod(pass,2);
fprintf('Pass = %.0f\n',pass);
%{
% optional: train on noisy copies of the prototype set instead
P = [alphabet, alphabet, ...
(alphabet + randn(R,Q)*0.1), ...
(alphabet + randn(R,Q)*.02)];
%}
if k==0
P=W(:,:,floor(pass/2));
end
if k==1
P=W(:,:,depth-floor(pass/2)) ;
end
[netn,tr] = train(netn,P,T);
end
netn.trainParam.goal = .1; % Mean-squared error goal.
netn.trainParam.epochs = 500; % Maximum number of epochs to train.
netn.trainParam.show = 5; % Frequency of progress displays (in epochs).
P = alphabet;
T = targets;
[netn,tr] = train(netn,P,T);
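A hypothetical spot check after training, assuming W still holds the feature vectors built above and the word order from data_gather: the winning output neuron should match the example's word index.
fv = W(:, 3, 1);       % feature vector for the third word from the first example set
out = sim(netn, fv);
[val ind] = max(out);  % ind should equal 3 if the example is classified correctly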
function s=normalize(sig);
%scales the signal to unit energy (L2 norm equal to one)
s=sig/sqrt(sum(abs(sig).^2));
Appendix C - Speech Synthesis Code
NewRecord.m
%Create an audiorecorder object for CD-quality audio in stereo,
%and view its properties:
recObj = audiorecorder(44100, 16, 2);
get(recObj)
%Collect a sample of your speech with a microphone, and plot the
%signal data:
fs=2000;
t=1:2000;
s=sin(2*pi*500*t./fs);
% Record your voice for 10 seconds.
recObj = audiorecorder;
disp('Say Phoneme at the Beep')
sound(s,2000);
pause(1);
recordblocking(recObj, 10);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in double-precision array.
myRecording = getaudiodata(recObj);
% Plot the waveform.
figure(1)
plot(myRecording);
%Save as a .wav file
wavwrite(myRecording,'alpha.wav');
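A hypothetical variant of the same recording procedure, saving each recording under its phoneme name so the concat*.m scripts below can read it back:
phoneme = 'ae';  % e.g. 'ae', 'l', 'f', 'ah', ...
wavwrite(getaudiodata(recObj), strcat(phoneme, '.wav'));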
concatAlpha.m
%concatenating Alpha
%reading 'ae'
a = wavread('ae');
%graph of the 'ae'
figure(1)
plot(a)
title('ae')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'ae'
b= a(7100:8600)
figure(2)
plot(b)
title('Cropped ae')
xlabel('Freq (Hz)')
ylabel('dB')
wavwrite(b,'CropAe.wav');
cAe = wavread('CropAe');
%reading 'l'
c = wavread('l');
%graph of the 'l'
figure(3)
plot(c)
title('l')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of the 'l'
d= c(4300:6400)
figure(4)
plot(d)
title('Cropped l')
xlabel('Freq (Hz)')
ylabel('dB')
wavwrite(d,'CropL.wav');
cL = wavread('CropL');
%reading 'f'
e = wavread('f');
%graph of the 'f'
figure(5)
plot(e)
title('f')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of the 'f'
f= e(4000:6500)
figure(6)
plot(f)
title('Cropped f')
xlabel('Freq (Hz)')
ylabel('dB')
wavwrite(f,'CropF.wav');
cF = wavread('CropF');
%reading 'ah'
g = wavread('ah');
%graph of the 'ah'
figure(7)
plot(g)
title('ah')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of the 'ah'
h= g(7500:9000)
figure(8)
plot(h)
title('Cropped ah')
xlabel('Freq (Hz)')
ylabel('dB')
wavwrite(h,'CropAh.wav');
cAh = wavread('CropAh');
%concatenating the cropped sound of Alpha
alpha = [cAe;cL;cF;cAh];
sound(alpha)
%writing the Alpha.wav file
wavwrite(alpha, 'Alpha');
figure(9)
plot(alpha)
title('Alpha')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Alpha
frame = 9;
degree = 0;
y = sgolayfilt(alpha, degree, frame);
figure(10)
plot(y)
title('Filtered Alpha')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing filtered sound of Alpha
wavwrite(y, 'FiltAlpha.wav');
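The crop-and-save steps above are repeated for every phoneme in each of the concat scripts; a small hypothetical helper (e.g. saved as cropPhoneme.m, not part of the original code) could factor out the pattern:
function cropped = cropPhoneme(name, first, last)
% reads <name>.wav, keeps samples first:last, and writes Crop<Name>.wav
raw = wavread(name);
cropped = raw(first:last);
wavwrite(cropped, strcat('Crop', upper(name(1)), name(2:end), '.wav'));
% example: cAe = cropPhoneme('ae', 7100, 8600);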
concatOmega.m
%concatenating Omega
%reading 'ow'
a = wavread('ow');
%graph of the 'ow'
figure(1)
plot(a)
title('ow')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'ow'
b= a(4100:6200)
figure(2)
plot(b)
title('Cropped ow')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropOw.wav file
wavwrite(b,'CropOw.wav');
cO = wavread('CropOw');
%reading 'm'
c = wavread('m');
figure(3)
plot(c)
title('m')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'm'
d= c(3800:7000)
figure(4)
plot(d)
title('Cropped m')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped Crop.wav file
wavwrite(d,'CropM.wav');
cM = wavread('CropM');
%reading 'eh'
e = wavread('eh');
%graph of 'eh'
figure(5)
plot(e)
title('eh')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'eh'
f= e(6000:7800)
figure(6)
plot(f)
title('Cropped eh')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropEh.wav file
wavwrite(f,'CropEh.wav');
cEh = wavread('CropEh');
%reading 'g'
g = wavread('g');
%graph of 'g'
figure(7)
plot(g)
title('g')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'g'
h= g(6800:7600)
figure(8)
plot(h)
title('Cropped g')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropG.wav file
wavwrite(h,'CropG.wav');
cG = wavread('CropG');
%reading 'ah'
i = wavread('ah');
%graph of 'ah'
figure(9)
plot(i)
title('ah')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'ah'
j= i(7500:9000)
figure(10)
plot(j)
title('Cropped ah')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropAh.wav file
wavwrite(j,'CropAh.wav');
cAh = wavread('CropAh');
%concatenating the cropped sound of Omega
omega = [cO;cM;cEh;cG;cAh];
sound(omega)
%writing the Omega.wav file
wavwrite(omega, 'Omega');
figure(11)
plot(omega)
title('Omega')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Omega
frame = 9;
degree = 0;
y = sgolayfilt(omega, degree, frame);
figure(12)
plot(y)
title('Filtered Omega')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing filtered sound of Omega
wavwrite(y, 'FiltOmega.wav')
concatLeft.m
%concatenating Left
%reading 'l'
a = wavread('l');
%graph of 'l'
figure(1)
plot(a)
title('l')
xlabel('Freq (Hz)')
ylabel('dB')
%graph of cropped 'l'
b= a(4300:6400)
figure(2)
plot(b)
title('Cropped l')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropL.wav file
wavwrite(b,'CropL.wav');
cL = wavread('CropL');
%reading 'eh'
c = wavread('eh');
figure(3)
plot(c)
title('eh')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'eh'
d= c(6000:7800)
figure(4)
plot(d)
title('Cropped eh')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropEh.wav file
wavwrite(d,'CropEh.wav');
cEh = wavread('CropEh');
%reading 'f'
e = wavread('f');
figure(5)
plot(e)
title('f')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'f'
f= e(4000:6500)
figure(6)
plot(f)
title('Cropped f')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropF.wav file
wavwrite(f,'CropF.wav');
cF = wavread('CropF');
%reading 't'
g = wavread('t');
figure(7)
plot(g)
title('t')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 't'
h= g(5900:7000)
figure(8)
plot(h)
title('Cropped t')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropT.wav file
wavwrite(h,'CropT.wav');
cT = wavread('CropT');
%concatenating the cropped sound of Left
left = [cL;cEh;cF;cT];
sound(left)
%writing the Left.wav file
wavwrite(left, 'Left')
figure(9)
plot(left)
title('Left')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Left
frame = 9;
degree = 0;
y = sgolayfilt(left, degree, frame);
figure(10)
plot(y)
title('Filtered Left')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing the filtered sound of Left
wavwrite(y, 'FiltLeft.wav');
concatRight.m
%concatenating Right
%reading 'r'
a = wavread('r');
%graph of 'r'
figure(1)
plot(a)
title('r')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'r'
b= a(4500:6500)
figure(2)
plot(b)
title('Cropped r')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropR.wav file
wavwrite(b,'CropR.wav');
cR = wavread('CropR');
%reading 'ay'
c = wavread('ay');
%graph of 'ay'
figure(3)
plot(c)
title('ay')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'ay'
d= c(6000:7000)
figure(4)
plot(d)
title('Cropped ay')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropAy.wav file
wavwrite(d,'CropAy.wav');
cAy = wavread('CropAy');
%reading 't'
e = wavread('t');
%graph of 't'
figure(5)
plot(e)
title('t')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 't'
f= e(5900:7000)
figure(6)
plot(f)
title('Cropped t')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropT.wav file
wavwrite(f,'CropT.wav');
cT = wavread('CropT');
%concatenating the cropped sound of Right
right = [cR;cAy;cT];
sound(right)
%writing the Right.wav file
wavwrite(right, 'Right')
figure(7)
plot(right)
title('Right')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Right
frame = 9;
degree = 0;
y = sgolayfilt(right, degree, frame);
figure(8)
plot(y)
title('Filtered Right')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing filtered sound of Right
wavwrite(y, 'FiltRight.wav');
concatForward.m
%concatenating Forward
%reading 'f'
a = wavread('f');
%graph of the 'f'
figure(1)
plot(a)
title('f')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'f'
b= a(4000:6500)
figure(2)
plot(b)
title('Cropped f')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropF.wav file
wavwrite(b,'CropF.wav');
cF = wavread('CropF');
%reading 'uh'
c = wavread('uh');
%graph of 'uh'
figure(3)
plot(c)
title('uh')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'uh'
d= c(5900:7000)
figure(4)
plot(d)
title('Cropped uh')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropUh.wav file
wavwrite(d,'CropUh.wav');
cUh = wavread('CropUh');
%reading 'r'
e = wavread('r');
%graph of 'r'
figure(5)
plot(e)
title('r')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'r'
f= e(4500:6500)
figure(6)
plot(f)
title('Cropped r')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropR.wav file
wavwrite(f,'CropR.wav');
cR = wavread('CropR');
%reading 'w'
g = wavread('w');
%graph of 'w'
figure(7)
plot(g)
title('w')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'w'
h= g(3900:6600)
figure(8)
plot(h)
title('Cropped w')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropW.wav file
wavwrite(h,'CropW.wav');
cW = wavread('CropW');
%reading 'axr'
i = wavread('axr');
%graph of 'axr'
figure(9)
plot(i)
title('axr')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'axr'
j= i(6100:8100)
figure(10)
plot(j)
title('Cropped axr')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropAxr.wav file
wavwrite(j,'CropAxr.wav');
cAxr = wavread('CropAxr');
%reading 'd'
k = wavread('d');
figure(11)
plot(k)
title('d')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'd'
l= k(6050:7800)
figure(12)
plot(l)
title('Cropped d')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropD.wav file
wavwrite(l,'CropD.wav');
cD = wavread('CropD');
%concatenating the cropped sound of Forward
forward = [cF;cUh;cR;cW;cAxr;cD];
sound(forward)
%writing the Forward.wav file
wavwrite(forward, 'Forward.wav');
figure(13)
plot(forward)
title('Forward')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Forward
frame = 9;
degree = 0;
y = sgolayfilt(forward, degree, frame);
figure(14)
plot(y)
title('Filtered Forward')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing filtered sound of Forward
wavwrite(y, 'FiltForward.wav');
concatReverse.m
%concatenating Reverse
%reading 'r'
a = wavread('r');
%graph of 'r'
figure(1)
plot(a)
title('r')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'r'
b= a(5000:7300)
figure(2)
plot(b)
title('Cropped r')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropR.wav file
wavwrite(b,'CropR.wav');
cR = wavread('CropR');
%reading 'iy'
c = wavread('iy');
%graph of 'iy'
figure(3)
plot(c)
title('iy')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'iy'
d= c(7800:9100)
figure(4)
plot(d)
title('Cropped iy')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropIy.wav file
wavwrite(d,'CropIy.wav');
cIy = wavread('CropIy');
%reading 'v'
e = wavread('v');
%graph of 'v'
figure(5)
plot(e)
title('v')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'v'
f= e(14000:15500)
figure(6)
plot(f)
title('Cropped v')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropV.wav file
wavwrite(f,'CropV.wav');
cV = wavread('CropV');
%reading 'axr'
g = wavread('axr');
%graph of 'axr'
figure(7)
plot(g)
title('axr')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 'axr'
h= g(6100:8100)
figure(8)
plot(h)
title('Cropped axr')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropAxr.wav file
wavwrite(h,'CropAxr.wav');
cAxr = wavread('CropAxr');
%reading 's'
i = wavread('s');
%graph of 's'
figure(9)
plot(i)
title('s')
xlabel('Freq (Hz)')
ylabel('dB')
%cropped graph of 's'
j= i(5100:6000)
figure(10)
plot(j)
title('Cropped s')
xlabel('Freq (Hz)')
ylabel('dB')
%writing cropped CropS.wav file
wavwrite(j,'CropS.wav');
cS = wavread('CropS');
%concatenating the cropped sound of Reverse
reverse = [cR;cIy;cV;cAxr;cS];
sound(reverse)
%writing the Reverse.wav file
wavwrite(reverse, 'Reverse');
figure(11)
plot(reverse)
title('Reverse')
xlabel('Freq (Hz)')
ylabel('dB')
%Savitzky-Golay Filter applied to Reverse
frame = 9;
degree = 0;
y = sgolayfilt(reverse, degree, frame);
figure(12)
plot(y)
title('Filtered Reverse')
xlabel('Freq (Hz)')
ylabel('dB')
sound(y)
%writing filtered sound of Reverse
wavwrite(y, 'FiltReverse.wav');
interpAlpha.m
clear;
clc;
[x,Fs,bits]=wavread('FiltAlpha.wav');
n = length(x)
k1 = wavread('FiltAlpha');
sound(k1)
%Interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewAlpha.wav');
s1 = wavread('NewAlpha');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Alpha Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Alpha Signal')
xlabel('Freq (Hz)')
ylabel('dB')
interpOmega.m
clear;
clc;
[x,Fs,bits]=wavread('FiltOmega.wav');
n = length(x)
k1 = wavread('FiltOmega');
sound(k1)
%interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewOmega.wav');
s1 = wavread('NewOmega');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Omega Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Omega Signal')
xlabel('Freq (Hz)')
ylabel('dB')
interpLeft.m
clear;
clc;
[x,Fs,bits]=wavread('FiltLeft.wav');
n = length(x)
k1 = wavread('FiltLeft');
sound(k1)
%interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewLeft.wav');
s1 = wavread('NewLeft');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Left Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Left Signal')
xlabel('Freq (Hz)')
ylabel('dB')
interpRight.m
clear;
clc;
[x,Fs,bits]=wavread('FiltRight.wav');
n = length(x)
k1 = wavread('FiltRight');
sound(k1)
%interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewRight.wav');
s1 = wavread('NewRight');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Right Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Right Signal')
xlabel('Freq (Hz)')
ylabel('dB')
interpForward.m
clear;
clc;
[x,Fs,bits]=wavread('FiltForward.wav');
n = length(x)
k1 = wavread('FiltForward');
sound(k1)
%Interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewForward.wav');
s1 = wavread('NewForward');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Forward Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Forward Signal')
xlabel('Freq (Hz)')
ylabel('dB')
interpReverse.m
clear;
clc;
[x,Fs,bits]=wavread('FiltReverse.wav');
n = length(x)
k1 = wavread('FiltReverse');
sound(k1)
%interpolation of the signal
intX = zeros(1,2*length(x));
intX(1:2:2*length(x)) =x;
wavwrite(x,Fs,bits,'NewReverse.wav');
s1 = wavread('NewReverse');
sound(s1)
figure(1)
plot([1:n],x)
title('Original Reverse Signal');
xlabel('Freq (Hz)')
ylabel('dB')
figure(2)
plot([1:2*n],intX)
title('Interpolated Reverse Signal')
xlabel('Freq (Hz)')
ylabel('dB')
TextToSpeech.m
%produce speech from the word
word = input('What word would you like to speak? ', 's')
switch word
case 'alpha'
a = wavread('NewAlpha.wav');
sound(a)
case 'omega'
b = wavread('NewOmega.wav');
sound(b)
case 'left'
c = wavread('NewLeft.wav');
sound(c)
case 'right'
d = wavread('NewRight.wav');
sound(d)
case 'forward'
e = wavread('NewForward.wav');
sound(e)
case 'reverse'
f = wavread('NewReverse.wav');
sound(f)
end
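A hypothetical piece of glue code showing how the recognizer's output string (dec, as returned by decide) could drive this playback step directly:
fname = strcat('New', upper(dec(1)), dec(2:end), '.wav');  % e.g. 'alpha' -> 'NewAlpha.wav'
sound(wavread(fname));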