Papers on Speaker Recognition


  • 8/12/2019 Papers on Speaker Recognition

    1/5

    5. COMPARISON TO OTHER BIOMETRICS

It is commonly asked how speaker verification compares to

other biometrics, such as iris, fingerprint, or face recognition.

There is no complete way to compare different biometrics,

since there are so many dimensions on which to evaluate a

biometric (accuracy, suitability for the application, ease of use,

recognition time, cost, etc.). However, in this section we discuss

some of the strengths and weaknesses of speaker verification and

point to one study that attempted to compare several

biometrics based on accuracy.

    The main strength of speaker verification technology is that it

    relies on a signal that is natural and unobtrusive to produce and

    can be obtained easily from almost anywhere using the familiar

    telephone network (or internet) with no special user equipment or

    training. This technology has prime utility for applications with

    remote users and applications already employing a speech

    interface. Additionally, speaker verification is easy to use, has

    low computation requirements (can be ported to smartcards and

    handhelds) and, given appropriate constraints, has high accuracy.

Some of the flexibility of speech actually contributes to its weaknesses.

First, speech is a behavioral signal that may not be consistently

reproduced by a speaker and can be affected by a speaker's

health (e.g., a cold or laryngitis). Second, the varied microphones and

    channels that people use can cause difficulties since most speaker

    verification systems rely on low-level spectrum features

    susceptible to transducer/channel effects. Also, the mobility of

    telephones means that people are using verification systems from


    more uncontrolled and harsh acoustic environments (cars,

    crowded airports), which can stress accuracy. Robustness to

    channel variability is the biggest challenge to current systems.

Spoofing of systems is often cited as a weakness, but many

approaches have been developed to thwart such attempts

(prompted phrases, knowledge verification). Efforts are

currently underway to address these known weaknesses, and some

of them may be overcome by combination with a

complementary biometric, such as face recognition.

Finally, we show some results from a study by the United

Kingdom's Communications-Electronics Security Group (CESG)

    that attempted to compare performance of several biometrics.

    The complete report can be found in [10]. In Figure 5 we show a

    DET plot for eight systems (1 face, 3 fingerprint, 1 hand, 1 iris, 1

vein and 1 voice). While it is debatable whether a single test can

fairly compare all these biometrics, it is interesting to

note that voice verification performed quite well. Readers

should, however, consult the report for the full details of the test.

Figure 5: DET curves from the CESG study comparing several

biometrics, with the voice system's curve labeled. (Best of three

attempts; Figure 6 of [10].)

    6. FUTURE TRENDS

    In this section we briefly outline some of the trends in speaker

    recognition research and development.

    Exploitation of higher-levels of information: In addition to the

    low-level spectrum features used by current systems, there are


    many other sources of speaker information in the speech signal that can be used. These include

    idiolect (word usage), prosodic

    measures and other long-term signal measures. This work will be

    aided by the increasing use of reliable speech recognition

    systems for speaker recognition R&D. High-level features not

    only offer the potential to improve accuracy, they may also help

    improve robustness since they should be less susceptible to

    channel effects.

In recent work, Doddington has shown that a speaker's idiolect

can be used to successfully verify a person [11], and Andrews et

al. [12] have used n-grams of phonetic sequences for verifying speakers.

Automatic Speaker Recognition: Current Approaches and Future Trends

Douglas A. Reynolds

MIT Lincoln Laboratory, Lexington, MA USA

[email protected]

    3. Vector Quantization

Speaker recognition is the task of comparing an

unknown speaker with a set of known speakers in a

database and finding the best matching speaker.

Vector quantization (VQ) is a lossy data compression

method based on the principle of block coding. In

vector quantization, a large set of feature vectors is

reduced to a smaller set of representative code vectors

that correspond to the centroids of the distribution.
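As a minimal sketch of this block-coding idea (plain NumPy with toy two-dimensional data; the function name `vq_encode` is chosen here for illustration, not taken from the source), each feature vector is replaced by the index of its nearest codeword, so the data can be stored as a small codebook plus a list of indices:

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Map each vector to the index of its nearest codeword (Euclidean)."""
    # d[i, j] = ||vectors[i] - codebook[j]||
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])            # M = 2 codewords
vectors = np.array([[0.5, -0.2], [9.0, 11.0], [0.1, 0.3]])  # toy feature vectors
indices = vq_encode(vectors, codebook)
print(indices)  # -> [0 1 0]: each vector compressed to one codeword index
```

The compression is lossy: decoding `codebook[indices]` recovers only the nearest centroid of each original vector, not the vector itself.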


    3.1 Speaker Database

The first step is to build a speaker database,

Cdatabase = {C1, C2, ..., CN}, consisting of N codebooks,

one per speaker. For each speaker, the raw input signal is

first converted into a sequence of feature vectors

X = {x1, ..., xT}. These feature vectors are then clustered

into a set of M codewords, C = {c1, ..., cM}, called a

codebook. The clustering is done by a clustering algorithm;

here the K-means algorithm is used for this purpose.

    3.2 K-means

The K-means algorithm partitions the feature vectors X

into M clusters, each represented by a centroid. The

algorithm first chooses M initial centroids from among

the feature vectors. Each feature vector is then assigned

to its nearest centroid, and the centroids are recalculated

as the means of their assigned vectors. This procedure is

repeated until a stopping criterion is met: either the mean

square error between the feature vectors and their cluster

centroids falls below a certain threshold, or the cluster

assignments no longer change.
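The steps above can be sketched as follows (a minimal NumPy implementation on toy data; it uses the no-change-in-assignment stopping criterion, and the function name and random initialization details are illustrative assumptions, not from the source):

```python
import numpy as np

def train_codebook(X, M, max_iter=100, seed=0):
    """K-means: cluster the T feature vectors X (shape T x d) into M codewords."""
    rng = np.random.default_rng(seed)
    # 1. choose M initial centroids from among the feature vectors
    C = X[rng.choice(len(X), size=M, replace=False)].copy()
    assign = None
    for _ in range(max_iter):
        # 2. assign each feature vector to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        new_assign = d.argmin(axis=1)
        # 3. stop when the cluster assignments no longer change
        if assign is not None and np.array_equal(assign, new_assign):
            break
        assign = new_assign
        # 4. recompute each centroid as the mean of its assigned vectors
        for m in range(M):
            if np.any(assign == m):  # keep the old centroid if a cluster empties
                C[m] = X[assign == m].mean(axis=0)
    return C

# toy feature vectors drawn around two well-separated points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
codebook = train_codebook(X, M=2)  # two codewords, one near each cluster
```

In practice the codebook would be trained on spectral features (e.g., MFCCs) rather than toy points, with one such codebook stored per enrolled speaker.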

    3.3 Speaker Matching


In the recognition phase, an unknown speaker,

represented by a sequence of feature vectors {x1, ..., xT},

is compared with the codebooks in the database.

For each codebook a distortion measure is computed,

and the speaker with the lowest distortion is chosen.

One way to define the distortion measure is the

average of the Euclidean distances. The Euclidean

distance is the ordinary straight-line distance between two

points, as given by the Pythagorean theorem. Thus, each

feature vector in the sequence X is compared with all the

codebooks, taking the distance to its nearest codeword in each,

and the codebook with the minimum average distance is

chosen as the best match.
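The matching rule can be sketched as follows (a minimal NumPy example; the speaker names, toy codebooks, and function names are illustrative assumptions, not from the source):

```python
import numpy as np

def avg_distortion(X, codebook):
    """Average Euclidean distance from each vector in X to its nearest codeword."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(X, database):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(database, key=lambda spk: avg_distortion(X, database[spk]))

# toy database: two speakers, each with a 2-codeword codebook
database = {
    "alice": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "bob":   np.array([[5.0, 5.0], [6.0, 6.0]]),
}
X_unknown = np.array([[0.1, 0.0], [0.9, 1.1]])  # lies near alice's codewords
print(identify(X_unknown, database))  # -> alice
```

Because the distortion averages only nearest-codeword distances, the temporal order of the feature vectors plays no role in this decision.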