synopsis voice recognition

7/27/2019 Synopsis voice recognition

1/9

HARDENING VOICE BASED AUTHENTICATION SYSTEMS

BY ENHANCING CONFIDENTIALITY AND INTEGRITY

FOR SECURE ENTRY POINTS:

Research Objective to Improve Integrity and Confidentiality

using

A

Synopsis

Submitted By

XXX-XXX

Supervised By

Xxx. Xxxxxx Xxxxxx

Assistant Professor (Computer Science)

Department of Computer Science,

Punjabi University Regional Center of Information & Technology,


2/9

Mohali


3/9

ABSTRACT

Speech analysis is a unique biometric property of human being, which can be used for

various purposes. A handful of research has been already done in this field which includes

various fields, speech analysis and synthesis, voice based authentication, etc. Voice print highintegrity and confidentiality is very important when applying speech analysis on voice based

authentication systems. Voice recognition based authentication systems can be used both

individually and in combination with other biometric or other authentication techniques.

Speech analysis integrity is very important when using speech analysis/voice recognition for

authentication systems. In this research we will experiment with various voice integrity

methods under speech analysis research for my master thesis research, and will select the best

and appropriate method, and will improve it to a higher accuracy level. In case, if found all of

the techniques not up to the mark, the research will lead to the creation of all new algorithm.

1. INTRODUCTION

People have always kept secrets and protected their possessions. It has evolved

from the physical to the technological, where now technology is used to restrict

access to our resources and user authentication is the key to the door.

Authentication techniques have developed with it, and now who can access our

resources is controlled using three main methods:

Knowledge -based authentication is based on information authorized individuals will

know, and unauthorized individuals will not. E.g. A PIN or a password. information.

Object-based authentication is based on possessing a token or tool th at perm its the

person access to the controlled resource. E.g. Keys, pass cards or a secure Id.

Biometric-based authentication measures individuals unique physical or behavioral

characteristics. It exists today in various form s such as fingerprint verification, retinal

scans, facial analysis, analysis of vein structures and voice authentication.

1.1 Speech

Now that we are aware of the technologies voice authentication employs, we need to

look at how the voice is produced, the characteristics that allow us to extract

meaning from it, and the method by which it can be converted into a form that can be

handled by com puter systems. In doing this particular attention will be paid to those

Key fingerprint =of the FA27 2F94 998D FDB5unique F8B5 06E4individual, therefore


4/9

characteristics AF19 voice that render it DE3D for each A169 4E46

allowing their identification. To do this, we can examine the physiological component

of human speech, which is produced by the human voice tract.

In simple terms, the voice is created by air passing over the larynx or other parts of

the vocal tract. The larynx vibrates creating an acoustic wave, essentially a hum,

which is modified by the motion of the p alate, tongue and lips. Sounds produced by

the larynx are called phonated or voiced sounds. Examples of voiced sounds would

be the m in mud or the r in ram . Simultaneously, other sounds are produced by

other parts of the vocal tract, for instance whispered sounds are created by air

rushing over the arytenoids cartilage at the back of the throat. Sounds not originating

in the larynx are called unvoiced sounds. Exam ples of these would be the f in fish

or the s in sea.

All sounds produced are, at the same time, fundamentally influenced by the actual

shape of the vocal tract. This shape is brought about both as a consequence of

hereditary and developmental factors, and of environmental factors.

In parallel to these physiological characteristics, speech contains a behavioural

component. This manifests itself as the accent o f the voice, and affects how quickly

words are spoken, how sounds are pronounced and emphasized, and what other

mannerisms are applied to speech.

Together these physiological and behavioural factors combine to produce voice

patterns that are essentially unique for every individual, and are difficult or

impossible to duplicate short of recording the voice.

When analyzed using modern technology, human spe ech appears to be rather

inefficient, in terms of time and energy expended to transfer information. Speech is

constructed out of various sounds, termed phonemes. Common English usage

utilises around 40 phonemes, analogous to the characters of the phonetic alphabet

seen in most dictionaries. For instance the word mud uses three phonemes

denoted /m / (the mmm sound), /u/ (the uh), and /d/.

1.2 Digitizing human speech.

The human voice produces a highly complex acoustic wave, which, fortunately, the

hum an ear and brain have evolved to interpret effectively. In technical terms thevoice is an analogue signal. An analogue signal is defined as one with a


5/9

continuously variable physical value. A single frequency tone, such as the dial tone

on a telephone will be a simple analogue signal. Such a simple signal will take the

form of a Sine Wave

In explaining how analogue signals can be converted to digital data, I will use the

example of the Sine Wave. This is appropriate because of a ba sic rule of signal

theory called Fouriers principle. In essence, this rule states that any signal, even

one as complex as a voiceprint, is in fact a combination of many simple sine waves,

of varying frequencies (cycles per second) and amplitudes (strength s), added

together. This means that any process that works for a Sine Wave, will work for any

other kind of signal as well.

The initial problem is to convert this analogue sound signal into a digital signal (i.e. a

sequence of numbers that a computer can input and manipulate). The first stage,

converting it from a sound to an electrical wave is simple, using a microphone.

The second stage involves converting the signal from a continuous wave to a series

of discrete voltage measurements. This is done by a process called sampling.

Sampling involves measuring the voltage of the signal at regular intervals, many

times per second.

If we use the sine wave of Figure 2 as an example: Say the wave has an amplitudeof 10 Volts: that is to say its peak positive voltage is 5 Volts and its correspondingly

lowest negative voltage is 5 Vol ts. We see that the wave goes through a complete

cycle every 10 milliseconds (ms), this means that it completes 100 cycles every

second, so it has a frequency of 100Hz. (Hz, Hertz , is the measure of cycles or

measured events per second.)

To effectively sample this signal we need to measure its voltage every 5 ms, a

sampling frequency of 200Hz. We know that we must use at least this sampling

frequency because of a fundamental rule of communications called Shannons Law,

which says that in order to full y sample any signal, without losing any part of it, you

must have a sampling frequency of at least double the highest frequency contained

shows the sampling of the signal.

2. LITERATURE SURVEY


6/9

3. STATEMENT OF THE PROBLEM

High integrity in speech recognition and authentication system is very important and critical

property of the Speech processing systems. Less integrity can lead toward a higher FRR or

FAR values, which indicates the failure of the system.

4. OBJECTIVES OF THE PROJECT WORK

In order to satisfy the goal of the research following objectives have been identified:

1. Study the Speech Processing techniques

2. Study the Voice authentication sysetms

3. Study the Speech processing research methods

4. Study various hybrid authentication systems based on voice authentication

5. Use and test various speech recognition models

6. Use and test speech integrity in existing speech recognition models

7. Selection and Creation of an improved speech recognition technique with high

integrity

8. Test and compare the results with existing systems

9. Test in the real-time on local and online voice print data


7/9

5. WORK DONE SO FAR

To achieve the goal of the research following work has been done so far:

1. Literature Survey and Problem Identification.

2. Studied about various methods of speech recognition and voice authentication techniques

3. Studied about the Short-time Fourier analysis, Cepstral analysis algorithms

4. Studied about various Pitch extraction techniques like Autocorrelation technique,

Cepstrum technique, Tracking and Smoothing, etc.

5. Studied the MATLAB simulator

6. Studied the existing speech/voice recognition and authentication sysetms.

6. REFERENCES

1. A Practical guide to Biometric Security Technology,

Silverm an, 2001, IEEE

Sim on Liu and Mark

http://www.computer.org/itpro/hom epage/jan_feb01/security3.htm

2. Hendry, M, Sm art Card Security and Applications, Artech House

3. Privacy Laws & Business 9 th Privacy Commissioners' / Data Protection Authorities

Biometrics as a Privacy -Enhancing Technology: Friend or Foe of Pri vacy?

Workshop, Dr. George Tom ko, Sept. 15 t h , 1998

http://www.dss.state.ct.us/digital/tom ko.htm

4. Speech Technology for Telecommunications, F.A. Westhall, R.D. Johnston, A.V.

Lewis


8/9

Key fingerprint = AF19 FA27 2F94 998D FDB5 DE3D F8B5 06E4 A169 4E46

http://www.eee.bham.ac.uk/woolleysi/downloads /speech/2.pdf

5. Speech Links, Andrew Hunt

http://www.speech.cs.cm u.edu/com p.speech/

rr

6. Making Speech Sounds Phonetics, Oxford uni versity press

http://www.oup.com/pdf/elt/catalogue/0 -19-437239-1-b.pdf

7. Phoneti cs and the theory of speech production

http://www.acoustics.hut.fi/~slemmett/dippa/chap3.htm l

8. Speaker Verification, Biometrics Research dept, MSU

http://biometrics.cse.msu.edu/speaker.html

9. Speaker Recognition, Identifying people by their voices, BRNO University of

Technology, I ng. Milan Sigm und, 2000

http://deer.ro.vutbr.cz/nakl/habilit/sigm und.pdf

& Speaker Recognition, Sadaoki Furui , NTT Human Interface Laboratories, Tokyo

cslu.cse.ogi.edu/HLTsurvey/ch1node9.html

10. Visa gets behind voice recognition for e -commerce, Paul Roberts, IDG News

Service 10/21/02

www.nwfusion.com/n ews/2002/1021visa.html

Key fingerprint = AF19 FA27 2F94 998D FDB5 DE3D F8B5 06E4 A169 4E46

11. Voice Recognition, Jim Baumann, Human Interface Technology Laboratory,

University of Washington.


9/9

http://www.hitl.washington.edu/scivw/EVE/I .D.2.d.VoiceRecognition.htm l

synopsis voice recognition

Documents