Voice Recognition: Josh Lintag, Regie Longoria, Ryan Mendez

Uploaded by ryan-mendez on 31-Aug-2014


Page 1: Voice Recognition

Voice Recognition

Josh Lintag, Regie Longoria, Ryan Mendez

Page 2: Voice Recognition

Initial Problem

• Problems with variation
  – Sample length and emphasis
  – Time domain issue: samples do not start and end at the same time
• Program design
  – Use the frequency domain to compare samples
  – Take an average of several voice samples

Page 3: Voice Recognition

Basic Recording

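• Create a for loop that records 10 different voice samples to be averaged:

% Note: newer MATLAB releases removed wavrecord and wavwrite;
% audiorecorder and audiowrite replace them.
for i = 1:10
    file = sprintf('%s%d.wav', 'g', i);   % g1.wav, g2.wav, ..., g10.wav
    input('You have 2 seconds to say your name. Press enter when ready to record--> ');
    y = wavrecord(88200, 44100);          % 88200 samples at 44100 Hz
    sound(y, 44100);                      % play the sample back
    wavwrite(y, 44100, file);             % save it to disk
end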

• Each sample is written to its own WAV file, whose name is held in file (g1.wav through g10.wav)

Page 4: Voice Recognition

Basic Recording 2

• You’re probably wondering what this line means: y = wavrecord(88200,44100); This line sets the length of the recording. The first argument is the number of samples to record (88,200), and the second is the sampling rate (44,100 Hz, that is, 44,100 samples per second). Dividing the number of samples by the sampling rate gives the duration: 88,200 / 44,100 = 2 seconds.
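The same arithmetic works in both directions: duration = samples / sampling rate, and samples = duration × sampling rate. A tiny Python illustration (the function names are hypothetical, not part of the original MATLAB code):

```python
def recording_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Duration of a recording: samples divided by samples per second."""
    return num_samples / sample_rate_hz


def samples_needed(seconds: float, sample_rate_hz: int) -> int:
    """Number of samples to request for a recording of the given length."""
    return round(seconds * sample_rate_hz)


print(recording_seconds(88200, 44100))  # 2.0
print(samples_needed(2, 44100))         # 88200
```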

Page 5: Voice Recognition

Coding of the Action

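name = input('Enter the name that must be recognized -- >', 's');
ytemp = zeros(88200, 20);   % cropped samples, stored column by column
r = zeros(10, 1);           % cropped length of each sample
for j = 1:10
    file = sprintf('%s%d.wav', 'g', j);
    [t, fs] = wavread(file);
    s = abs(t);
    start = 1;
    last = 88200;
    % Start: first sample with |amplitude| >= 0.1, moved back by
    % 7000 samples (clamped to the first sample)
    for i = 1:88200
        if s(i) >= .1 && i <= 7000
            start = 1;
            break
        end
        if s(i) >= .1 && i > 7000
            start = i - 7000;
            break
        end
    end
    % End: scan backward for the last sample with |amplitude| >= 0.1,
    % then add 7000 samples (clamped to the last sample)
    for i = 1:88200
        k = 88201 - i;
        if s(k) >= .1 && k >= 81200
            last = 88200;
            break
        end
        if s(k) >= .1 && k < 81200
            last = k + 7000;
            break
        end
    end
    r(j) = last - start;
    % Each cropped sample is stored twice, in columns 2j-1 and 2j
    ytemp(1:last - start + 1, 2*j) = t(start:last);
    ytemp(1:last - start + 1, 2*j - 1) = t(start:last);
end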

Page 6: Voice Recognition

What This Means

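• This bit of code makes it look like a lot is going on, but it is simpler than it looks. It reads each WAV file into a vector and crops away the silence around your voice. The first loop finds where your voice starts: the first sample whose absolute amplitude reaches 0.1, moved back by a margin of 7,000 samples (about 0.16 seconds). The second loop scans backward from the end to find where your voice stops and adds the same margin. The cropped length goes into r, and the cropped samples are copied into the matrix ytemp (each sample fills two columns).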
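The two loops boil down to start = max(first loud sample − margin, 1) and last = min(last loud sample + margin, 88200). A minimal Python sketch of that rule, assuming the samples are already loaded as a list of floats; crop_voice is a hypothetical name, and indices are 0-based, unlike MATLAB:

```python
def crop_voice(samples, threshold=0.1, pad=7000):
    """Return the part of `samples` that contains the voice, plus `pad`
    samples of margin on each side, clamped to the recording edges.

    The first sample whose absolute value reaches `threshold` marks the
    start and the last such sample marks the end. If no sample reaches the
    threshold, the whole recording is returned (like start = 1, last = 88200).
    """
    loud = [i for i, x in enumerate(samples) if abs(x) >= threshold]
    if not loud:
        return list(samples)
    start = max(loud[0] - pad, 0)
    last = min(loud[-1] + pad, len(samples) - 1)
    return list(samples[start:last + 1])


signal = [0.0] * 10 + [0.5, -0.3, 0.2] + [0.0] * 10
print(crop_voice(signal, pad=2))  # [0.0, 0.0, 0.5, -0.3, 0.2, 0.0, 0.0]
```

Calling crop_voice(samples, pad=5000) gives the cropping applied to the test recording on the Test Crop slide.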

Page 7: Voice Recognition

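Truncation, FFT, Normalization

% Truncate every column to the shortest cropped length
y = zeros(min(r), 20);
for i = 1:20
    y(:, i) = ytemp(1:min(r), i);
end

% Power spectrum of each column: |FFT|^2
fy = fft(y);
fy = fy .* conj(fy);

% Keep the first 600 bins and scale each column to unit length
fn = zeros(600, 20);
for i = 1:20
    fn(1:600, i) = fy(1:600, i) / sqrt(sum(abs(fy(1:600, i)).^2));
end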

Page 8: Voice Recognition

What This Means

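• The first part truncates every column to the shortest cropped length, min(r), so all 20 columns have the same length.

• The second part takes the FFT of each column, moving it into the frequency domain, and multiplies it by its complex conjugate to get the power spectrum, |FFT|².

• The third part keeps only the first 600 frequency bins, which discards the higher frequencies (the idea is to focus on the low range where the voice is strongest and drop some background noise), and then scales each spectrum to unit length so loudness differences between samples do not affect the comparison. Bin k corresponds to (k − 1) × 44100 / min(r) Hz, so the exact cutoff depends on the cropped length.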
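To make the three steps concrete, here is a small Python sketch using only the standard library. It is an illustration, not the authors' MATLAB: the function names are hypothetical, the power spectrum uses a direct DFT (same values as an FFT, only slower), and indices are 0-based, so MATLAB's fy(1:600) corresponds to bins 0 through 599 here.

```python
import cmath
import math


def power_spectrum(samples):
    """|X[k]|^2 for every DFT bin k, the same values as fft(y).*conj(fft(y)).
    Direct O(N^2) DFT for clarity; a real program would use an FFT."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def normalized_low_bins(spectrum, bins=600):
    """Keep the first `bins` values and scale them to unit Euclidean length.
    Assumes the kept bins are not all zero."""
    low = spectrum[:bins]
    norm = math.sqrt(sum(v * v for v in low))
    return [v / norm for v in low]


def bin_frequency_hz(k, n, sample_rate_hz=44100):
    """Frequency in Hz of 0-based DFT bin k for an n-sample signal."""
    return k * sample_rate_hz / n
```

For example, bin_frequency_hz(599, 44100) is 599.0, so if min(r) were exactly 44,100 samples, the 600 kept bins would cover 0 to 599 Hz; a shorter crop widens that range.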

Page 9: Voice Recognition

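Average Vector, Norm, and STD

% Average the 20 normalized spectra
pu = zeros(600, 1);
for i = 1:20
    pu = pu + fn(1:600, i);
end
pu = pu / 20;

% Scale the average to unit length: this is the template
tn = pu / sqrt(sum(abs(pu).^2));

% Spread of the training spectra around the template
% (this variable shadows MATLAB's built-in std function)
std = 0;
for i = 1:20
    std = std + sum(abs(fn(1:600, i) - tn).^2);
end
std = sqrt(std / 19);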

Page 10: Voice Recognition

What This Means

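• The first part averages the 20 normalized spectra in fn into a single vector, pu.

• The second part scales pu to unit length; the result, tn, is the template that a new recording is compared against.

• The third part measures how spread out the training spectra are around tn: it adds up the squared distance of each column of fn from tn, divides by 19 (one less than the 20 columns), and takes the square root. This plays the role of a standard deviation.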
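The same three steps in a short Python sketch (hypothetical names, standard library only). In formula form, with f_i the n normalized spectra and t the template, the spread is sqrt( Σ‖f_i − t‖² / (n − 1) ); the sketch calls it spread rather than std.

```python
import math


def unit(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def template_and_spread(spectra):
    """Given equal-length normalized spectra, return (tn, spread).

    tn is the mean spectrum scaled to unit length; spread is the square
    root of the summed squared distances from tn divided by (count - 1).
    """
    count = len(spectra)
    length = len(spectra[0])
    mean = [sum(s[i] for s in spectra) / count for i in range(length)]
    tn = unit(mean)
    total = sum(sum((s[i] - tn[i]) ** 2 for i in range(length))
                for s in spectra)
    spread = math.sqrt(total / (count - 1))
    return tn, spread
```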

Page 11: Voice Recognition

Verification

• Verification process

input('You will have 2 seconds to say your name. Press enter when ready');
usertemp = wavrecord(88200, 44100);
sound(usertemp, 44100);
rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
% Re-record until the user presses Enter without typing 1
while rec == 1
    rec = 0;
    input('You will have 2 seconds to say your name. Press enter when ready');
    usertemp = wavrecord(88200, 44100);
    sound(usertemp, 44100);
    rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
end

Page 12: Voice Recognition

What This Means

• This is the part where you record your voice for two seconds. The recording is played back, and if you’re unhappy with it, you press 1 to discard it and record a fresh one; pressing Enter keeps it.

Page 13: Voice Recognition

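Test Crop

s = abs(usertemp);
start = 1;
last = 88200;
% Start: first sample with |amplitude| >= 0.1, minus a 5000-sample margin
for i = 1:88200
    if s(i) >= .1 && i <= 5000
        start = 1;
        break
    end
    if s(i) >= .1 && i > 5000
        start = i - 5000;
        break
    end
end

% End: last sample with |amplitude| >= 0.1, plus a 5000-sample margin
for i = 1:88200
    k = 88201 - i;
    if s(k) >= .1 && k >= 83200
        last = 88200;
        break
    end
    if s(k) >= .1 && k < 83200
        last = k + 5000;
        break
    end
end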

Page 14: Voice Recognition

What This Means

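As with the training samples a few slides ago, this crops the two-second test recording down to the part that contains your voice. It uses the same 0.1 amplitude threshold, but a smaller margin of 5,000 samples (about 0.11 seconds) on each side instead of 7,000.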

Page 15: Voice Recognition

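FFT, Plot

user = usertemp(start:last);
% Power spectrum of the cropped recording, first 600 bins, unit length.
% Note: user is generally not min(r) samples long, so bin k here maps to
% a slightly different frequency than bin k in tn.
userftemp = fft(user);
userftemp = userftemp .* conj(userftemp);
userf = userftemp(1:600);
userfn = userf / sqrt(sum(abs(userf).^2));

% Recording's spectrum on top, the average (template) spectrum below
hold on;
subplot(2, 1, 1);
plot(userfn);
title('Normalized Frequency Spectra Of Recording');
subplot(2, 1, 2);
plot(tn);
title('Normalized Frequency Spectra of Average');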

Page 16: Voice Recognition

What This Means

• Computes the power spectrum of the cropped recording with the FFT, keeps the first 600 bins, and normalizes it, just like the training samples

• Graphs both the recording and the average vector: the top plot is the recording and the bottom plot is the average vector

Page 17: Voice Recognition

Testing

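% Euclidean distance between the test spectrum and the template
s = sqrt(sum(abs(userfn - tn).^2));
% Accept if the distance is less than twice the training spread
if s < 2*std
    name = strcat('HELLO----', name, ' !!!!');
    disp(name)
else
    name = strcat('YOU ARE NOT---- ', name, ' !!!!');
    disp(name)
end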
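The test is a distance threshold: accept when ‖userfn − tn‖ < 2 × std. A minimal Python sketch with hypothetical names; the message strings mimic what the MATLAB prints, where strcat drops the trailing space in 'YOU ARE NOT---- '.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(user_spectrum, template, spread, name, factor=2.0):
    """Accept the speaker when the distance to the template is below
    factor * spread (the slide uses 2 * std)."""
    if distance(user_spectrum, template) < factor * spread:
        return f"HELLO----{name} !!!!"
    # strcat removed the trailing space after 'NOT----' in the MATLAB version
    return f"YOU ARE NOT----{name} !!!!"
```

The factor of 2 is the deck's choice; raising it accepts more genuine attempts but also more impostors.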