
  • 7/27/2019 Speech Intelligibility

    1/6

    1

    Acoustics Instruments and Measurements June 2013, Caseros, Buenos Aires Province, Argentina

    SPEECH TRANSMISSION INDEX MEASUREMENTS

    AGUSTÍN Y. ARIAS 1

    1 Universidad Nacional de Tres de Febrero, Buenos Aires, Argentina.


    1. INTRODUCTION

    The STI was developed in the early 1970s by Houtgast and Steeneken as an objective measure of speech transmission quality [1][2]. One of the advantages of the STI is its wide range of application areas. The STI is used extensively in room acoustics, for instance to assess intelligibility in auditoria, churches and conference rooms, but it is also applied to telecommunication channels, such as (mobile) telephone lines and radio transmissions.

    The STI is based on the premise that the information in speech is conveyed acoustically in the form of amplitude modulations: human speech is essentially a sequence of modulated tonal and noise-like sounds. If a portion of these modulations is lost, intelligibility decreases. The Modulation Transfer Function (MTF), which can be computed or measured, expresses the loss and preservation of modulations, and the STI is calculated directly from it.

    In this report, the procedures employed to obtain STI values from measurements of two different speech situations are presented. In addition, the Definition (D50) parameter is calculated in order to evaluate speech intelligibility with a different method.

    The measurements were performed inside the National University of Tres de Febrero, Caseros, Buenos Aires, Argentina.

    2. A BRIEF DESCRIPTION OF THE STI AND THE DEFINITION D50

    According to Steeneken and Houtgast, the determination of STI values is based on measuring the reduction of the signal modulation between the sound source and the measurement position in the octave bands with center frequencies from 125 to 8000 Hz. They proceeded on the assumption that not only reverberation and noise reduce the intelligibility of speech, but, in general, all external signals or signal changes that occur on the path from source to listener. To quantify this influence, they employ the Modulation Transfer Function (MTF), shown in Figure 1. The available useful signal S (signal) is related to the prevailing interfering signal N (noise). The resulting modulation reduction factor m(f) characterizes the degradation of speech intelligibility:

    $$ m(f) = \frac{1}{\sqrt{1 + \left(2\pi f \,\frac{RT}{13.8}\right)^{2}}} \cdot \frac{1}{1 + 10^{-\frac{S/N}{10}}} \qquad (1) $$

    f: modulation frequency in Hz
    RT: reverberation time in s
    S/N: signal-to-noise ratio in dB

    Figure 1. Modulation transfer function: input/output comparison.
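    Eq. (1), m(f) = [1 + (2π f RT/13.8)²]^(-1/2) · [1 + 10^(-(S/N)/10)]^(-1), can be evaluated directly. The minimal Python sketch below assumes, like the formula itself, an exponentially decaying (diffuse) sound field and a single S/N value; the function name `mtf_model` is hypothetical. It shows that the reverberation factor depends only on the product f·RT, while the noise factor lowers all modulation frequencies equally.

```python
import numpy as np

def mtf_model(f, rt, snr_db):
    """Modulation reduction factor m(f) according to Eq. (1).

    f      -- modulation frequency in Hz (scalar or array)
    rt     -- reverberation time RT in s
    snr_db -- signal-to-noise ratio S/N in dB
    """
    f = np.asarray(f, dtype=float)
    # Reverberation factor: an exponential decay acts as a low-pass on the envelope
    reverb = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f * rt / 13.8) ** 2)
    # Noise factor: stationary noise reduces every modulation frequency equally
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

# Hypothetical example: RT = 1.5 s and S/N = 6 dB at three modulation frequencies
print(mtf_model([0.63, 2.0, 12.5], rt=1.5, snr_db=6.0))
```

    With S/N = 0 dB the noise factor alone halves every modulation, whatever the reverberation time.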

    The MTF of a sound transmission path can be determined in various ways. The principle is always to derive the modulation reduction factor by comparing the intensity modulations at the output and at the input of the path. Thus, a) speech signals, b) the impulse response, or c) special test signals can be used. In this report, a set of Phonetically Balanced (PB) Word Lists [3] in Spanish was used as the speech signal, as described below.
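    For route b), Schroeder's relation obtains the MTF from the squared impulse response h(t): m(f) = |∫ h²(t) e^(-j2πft) dt| / ∫ h²(t) dt, optionally multiplied by the noise factor of Eq. (1) for the band S/N. The sketch below assumes an impulse response that has already been filtered into one octave band and is sampled at `fs` Hz. The function name is hypothetical, and this is not a description of how Aurora implements it.

```python
import numpy as np

def mtf_from_ir(h, fs, mod_freqs, snr_db=np.inf):
    """MTF of one octave band from its impulse response (Schroeder's method)."""
    h = np.asarray(h, dtype=float)
    energy = h ** 2                      # squared impulse response h^2(t)
    t = np.arange(len(h)) / fs           # time axis in s
    total = energy.sum()
    m = []
    for f in mod_freqs:
        # Magnitude of the Fourier transform of h^2 at modulation frequency f,
        # normalised by its 0 Hz value (the total energy)
        m.append(abs(np.sum(energy * np.exp(-2j * np.pi * f * t))) / total)
    m = np.array(m)
    # Optional correction for stationary background noise (noise factor of Eq. (1))
    return m / (1.0 + 10.0 ** (-snr_db / 10.0))
```

    For an ideal exponential decay with h²(t) ∝ exp(-13.8 t / RT), this reproduces the reverberation factor of Eq. (1). That makes a convenient sanity check before the function is applied to measured responses.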


    The MTF is determined for the range of relevant frequencies present in the envelope of natural speech signals. The relevant range for these modulation frequencies extends from 0,63 to 12,5 Hz in 14 one-third-octave steps. Each of the 7 octave bands (125 Hz to 8 kHz) therefore has 14 modulation frequencies, resulting in a total of 7 × 14 = 98 values, as can be seen in Table 1.

    The STI calculation methods using the modulation transfer index (MTI) and the specific male and female weighting factors are defined in the standard IEC 60268-16 (Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index) [4].
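    The steps from the 98 MTF values to a single STI can be outlined as follows. Each m value is converted into an apparent S/N, which is clipped to ±15 dB and mapped to a transmission index between 0 and 1. These indices are averaged per octave band into the MTI and then combined with male weighting factors. This is a simplified sketch: the α/β values are the male factors as commonly quoted from IEC 60268-16 and should be checked against the edition in use, and the standard's auditory masking and reception-threshold corrections are omitted. The names are hypothetical.

```python
import numpy as np

OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
# 14 modulation frequencies, 0.63 to 12.5 Hz in one-third-octave steps
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

# Male weighting factors (assumed values; verify against IEC 60268-16)
ALPHA_MALE = np.array([0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173])
BETA_MALE = np.array([0.085, 0.078, 0.065, 0.011, 0.047, 0.095])

def sti_from_mtf(m):
    """STI from a 7 x 14 matrix of m values (octave bands x modulation freqs)."""
    m = np.clip(np.asarray(m, dtype=float), 1e-9, 1 - 1e-9)  # avoid log(0)
    if m.shape != (len(OCTAVE_BANDS_HZ), len(MOD_FREQS_HZ)):
        raise ValueError("expected 7 x 14 = 98 MTF values")
    snr_app = 10.0 * np.log10(m / (1.0 - m))    # apparent S/N in dB
    snr_app = np.clip(snr_app, -15.0, 15.0)     # limit to +/-15 dB
    ti = (snr_app + 15.0) / 30.0                # transmission index, 0..1
    mti = ti.mean(axis=1)                       # one MTI per octave band
    # Weighted sum minus redundancy correction between adjacent octave bands
    return float(np.sum(ALPHA_MALE * mti)
                 - np.sum(BETA_MALE * np.sqrt(mti[:-1] * mti[1:])))
```

    Because the α factors sum to 1,381 and the β factors to 0,381, a channel with m = 1 everywhere gives STI = 1. A channel with m = 0,5 everywhere (apparent S/N of 0 dB) gives STI = 0,5.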

    Table 1. Frequencies for the STI method.

    The STI values may vary between 0 and 1, indicating the degree to which a transmission channel degrades speech intelligibility. This means that perfectly intelligible speech, when transferred through a channel with an associated STI of 1, will remain perfectly intelligible. The closer the STI value approaches zero, the more information is lost.

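    On the other hand, the acoustical parameter D50 (Definition) is defined as the ratio of the sound energy that arrives at the receiver position in the first 50 ms to the total sound energy received [5]:

    $$ D_{50} = \frac{\int_{0}^{50\,\mathrm{ms}} p^{2}(t)\,dt}{\int_{0}^{\infty} p^{2}(t)\,dt} \times 100 \;[\%] \qquad (2) $$

    p(t): instantaneous sound pressure in the impulse response (t = 0 corresponds to the arrival of the direct sound).

    The higher the value, the better the speech intelligibility and loudness at the point of reception.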
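    Eq. (2), D50 = 100 · ∫₀^50ms p²(t) dt / ∫₀^∞ p²(t) dt, translates directly into sums over the samples of a measured impulse response. The sketch below is a simplification: it takes the direct-sound arrival (t = 0) to be the sample with the largest magnitude, instead of a standardized onset detection. The function name is hypothetical.

```python
import numpy as np

def definition_d50(h, fs):
    """D50 in percent from an impulse response h sampled at fs Hz (Eq. (2))."""
    h = np.asarray(h, dtype=float)
    start = int(np.argmax(np.abs(h)))   # direct-sound arrival (assumed: peak)
    p2 = h[start:] ** 2                 # squared sound pressure p^2(t)
    n50 = int(round(0.050 * fs))        # number of samples in the first 50 ms
    return 100.0 * p2[:n50].sum() / p2.sum()
```

    For an ideal exponential decay with reverberation time RT, the result is 100 · (1 - exp(-13.8 · 0.05 / RT)). This is about 50 % for RT = 1 s, and it decreases as RT grows.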

    3. MEASUREMENT POSITIONS

    In order to perform the measurements under realistic speech communication conditions, two different areas within the University were chosen. The first one corresponds to the second- and first-floor halls, with the sound source placed on the second floor and two receiver positions (microphones) on the first floor (Figure 2). The second one corresponds to the side stairs area (Figure 3).

    Figure 2. First- and second-floor halls. Microphone and sound source locations.

    Figure 3. Stairs. Microphone and sound source locations.

    4. EQUIPMENT EMPLOYED

    The following equipment was employed to perform the impulse response measurements:

    - KRK Rokit8 loudspeaker, used as the sound source to reproduce the log-sine sweep and the speech signals.
    - Svantek 959 Class 1 sound level meter with its calibrator.
    - Two DPA 4060 measurement microphones.
    - HP Pavilion notebook.
    - MOTU audio interface.
    - Aurora plugins for Adobe Audition.

    The Aurora plugins for Adobe Audition were used to generate the log-sine sweep and to post-process the recordings with the STI module.

    5. MEASUREMENT PROCEDURE

    The Aurora plugins compute the STI according to the standard IEC 60268-16. To meet the requirements imposed by the standard, a "noiseless" impulse response was measured with a log-sine sweep, and the octave spectra of the speech signal and of the background noise were measured separately.
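    The octave spectra of the speech and background noise recordings can be approximated by summing the energy of the FFT bins inside each octave band (Parseval's theorem). This sketch makes several simplifying assumptions. It uses ideal brick-wall bands instead of standardized octave filters, applies no A-weighting, and takes a calibration offset in dB that converts digital levels to SPL. It also assumes a sampling rate high enough that the 8 kHz band lies below the Nyquist frequency. The function name is hypothetical.

```python
import numpy as np

OCTAVE_CENTERS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

def octave_band_levels(x, fs, offset_db=0.0):
    """Octave-band levels in dB of signal x (brick-wall FFT approximation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    levels = {}
    for fc in OCTAVE_CENTERS_HZ:
        lo, hi = fc / np.sqrt(2.0), fc * np.sqrt(2.0)   # octave band edges
        band = (freqs >= lo) & (freqs < hi)
        # Parseval: mean-square value contributed by the bins of this band;
        # the factor 2 accounts for the mirrored negative frequencies
        ms = 2.0 * spec[band].sum() / n ** 2
        levels[fc] = 10.0 * np.log10(ms) + offset_db
    return levels
```

    A 1 kHz sine of unit amplitude, for example, yields about -3 dB in the 1000 Hz band before calibration. This is its mean-square value of 0,5 expressed in dB.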

    The speech signal consists of a list of twenty-five Phonetically Balanced (PB) words in Spanish:

    Table 2. Phonetically Balanced Word Lists (Spanish)

    Lastre Sexto Suela Cine Pera

    Moldes Letra Diosa Vega Fina

    Menta Surco Piano Dina Tero

    Cinco Selva Duque Kilo Beca

    Persa Cieno Milla Duna Reno

    The level of the PB word signal listed above was calibrated to normal (60 dBA), raised (66 dBA) and loud (72 dBA) vocal effort, measuring the Leq values at 1 m from the loudspeaker, as shown in Figure 4.

    Figure 4. Leq calibration of the PB word signal.

    The three signals were reproduced through the loudspeaker in both situations described previously (second- and first-floor halls; side stairs) and recorded. These are the Speech+Noise measurements.

    Then, the IR measurements were performed at the same microphone and loudspeaker positions, employing the log-sine sweep signal (1 minute duration, 80-8000 Hz). This measurement provides the impulse response at the microphone position, from which the reverberation time is also obtained. In addition, Aurora requires a calibration signal to calibrate the SPL results. For this reason, a 1 kHz, 94 dB signal was recorded using the Svantek calibrator. Finally, a one-minute background noise measurement was performed at the microphone position. Thus, there are six recordings for each microphone (receiver) position in the two situations: three of the speech signal (three different Leqs), one log-sine sweep, one calibration signal and one background noise recording.
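    The log-sine (exponential) sweep technique can be sketched as follows, following Farina's formulation. The sweep has the instantaneous frequency f1·exp(t·ln(f2/f1)/T). The impulse response is then obtained by convolving the recording with a time-reversed copy of the sweep whose amplitude falls by 6 dB per octave swept. This compensates for the sweep's pink spectrum. Aurora generates and deconvolves its own sweep; the functions below are a hypothetical, numpy-only illustration of the same idea. The usage line uses the parameters given in the text with an assumed 48 kHz sampling rate.

```python
import numpy as np

def log_sine_sweep(f1, f2, duration, fs):
    """Exponential (log-sine) sweep from f1 to f2 Hz lasting `duration` s."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    # Phase whose derivative gives the instantaneous frequency f1*exp(t*rate/T)
    phase = 2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1)
    return np.sin(phase)

def inverse_filter(sweep, f1, f2, duration, fs):
    """Time-reversed sweep with an envelope falling 6 dB per octave swept."""
    t = np.arange(len(sweep)) / fs
    envelope = np.exp(-t * np.log(f2 / f1) / duration)
    return sweep[::-1] * envelope

def impulse_response(recording, inv):
    """Linear convolution via FFT; the IR starts len(inv) - 1 samples in."""
    n = len(recording) + len(inv) - 1
    y = np.fft.irfft(np.fft.rfft(recording, n) * np.fft.rfft(inv, n), n)
    return y[len(inv) - 1:]

# 1 minute, 80-8000 Hz sweep at an assumed sampling rate of 48 kHz
sweep = log_sine_sweep(80.0, 8000.0, 60.0, 48000)
```

    Feeding the sweep itself through `impulse_response` (an ideal, transparent path) should give a band-limited pulse at the first sample. This is a quick check before processing real recordings.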

    6. USE OF THE STI MODULE IN AURORA

    The STI module of Aurora is shown in Figure 5.

    Figure 5. STI module of Aurora plugins.

    As shown in Figure 5, all the signals explained above are required to obtain the STI [6].

    First, the calibration signal is loaded to calibrate the amplitude scale of the STI module. The Leq value of that signal (94 dB) must be specified.
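    The role of the 94 dB calibration tone can be illustrated with a small sketch. The offset between the tone's digital RMS level and its known SPL is applied to every other recording made with the same gain chain. This is an assumption-laden illustration, not Aurora's internal code. The levels are unweighted (no A-weighting), and the function names are hypothetical.

```python
import numpy as np

def calibration_offset(cal_recording, cal_level_db=94.0):
    """dB offset mapping digital RMS level to SPL, from a calibrator tone."""
    rms = np.sqrt(np.mean(np.square(cal_recording)))
    return cal_level_db - 20.0 * np.log10(rms)

def leq_db(x, offset_db):
    """Unweighted equivalent continuous level of recording x, in dB SPL."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms) + offset_db
```

    Any recording made with the same gain settings can then be expressed in dB SPL. For example, a signal at half the calibrator's amplitude reads about 6 dB below 94 dB.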

    Then, the background noise signal is loaded and stored as Noise.

    Next, the speech signal is loaded. As this measurement was performed in the presence of background noise, it must be stored as Sig+N. In this way, the module correctly calculates the speech signal by subtracting the noise that was already estimated in the previous step.
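    Removing the noise from a Sig+N measurement is normally an energetic (power) subtraction in each octave band: L_S = 10·log10(10^(L_S+N/10) - 10^(L_N/10)). The sketch below assumes that this is the kind of operation the module performs; Aurora's exact internals are not documented here. The function name is hypothetical. Bands in which Sig+N does not exceed the noise cannot be recovered and are returned as NaN.

```python
import numpy as np

def subtract_noise(level_sig_plus_noise_db, level_noise_db):
    """Energetic subtraction of background noise, band by band (dB)."""
    sn = 10.0 ** (np.asarray(level_sig_plus_noise_db, dtype=float) / 10.0)
    n = 10.0 ** (np.asarray(level_noise_db, dtype=float) / 10.0)
    diff = sn - n
    # Where Sig+N does not exceed N, the speech level is undefined (NaN)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(diff > 0, 10.0 * np.log10(diff), np.nan)
```

    The band signal-to-noise ratio used in the noise factor of the MTF is then the difference between the recovered speech level and the noise level in each band.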

    Finally, the measured impulse response is processed and the STI is computed with the Compute STI button.

    7. RESULTS AND OBSERVATIONS

    To evaluate the results, it is necessary to know the reference values for the STI and D50 parameters. These are listed in Table 3. The preferred values of D50 must be > 40%. This parameter is commonly used to characterize theaters, where the preferred value must be > 50%, so it can be applied to these cases with a more permissive limit [7].


    Table 3. Preferred values for STI and D50 parameters

    Intelligibility rating   STI
    Excellent                > 0,75
    Good                     0,60 - 0,75
    Fair                     0,45 - 0,60
    Poor                     0,30 - 0,45
    Bad                      < 0,30

    D50 (preferred value): > 40%
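    Applying Table 3 to a computed STI value is a simple threshold lookup, as sketched below. Table 3 does not state which category a boundary value belongs to, so assigning it to the higher category is an assumption here. The function name is hypothetical.

```python
def sti_rating(sti):
    """Intelligibility rating for an STI value, following Table 3."""
    # Boundary values are assigned to the higher category (assumption)
    if sti >= 0.75:
        return "Excellent"
    if sti >= 0.60:
        return "Good"
    if sti >= 0.45:
        return "Fair"
    if sti >= 0.30:
        return "Poor"
    return "Bad"
```

    For instance, the receiver #1 values of Table 4 (0,167, 0,395 and 0,519) map to Bad, Poor and Fair respectively.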

    7.1. Second- and first-floor halls

    Figure 6 shows the NC curves and the results obtained from the background noise measurements.

    Figure 6. Background noise results and the corresponding NC profile assigned.

    The assigned NC profile is NC-50. It can be seen that the background noise has considerable energy at mid frequencies. This is related to human activity (people talking, professors teaching, etc.) at the time the measurements were performed.

    The STI results for male voice are presented in Table 4.

    Table 4. STI results for the first speech situation

    Speech Leq   STI (male), receiver #1   STI (male), receiver #2
    60 dB        0,167                     0,104
    66 dB        0,395                     0,39
    72 dB        0,519                     0,495

    It can be observed that for the 60 dB speech Leq the STI values are extremely low, which indicates bad speech intelligibility. This is consistent with what happens in reality, as the distance between source and receiver is approximately 7 meters and they are located on different floors. Moreover, as detailed above, the measurement location is a space with constant transit of people, which considerably deteriorates the results. Only the STI values obtained with the 72 dB signal (Fair, according to Table 3) indicate an acceptable degree of comprehension.

    The D50 results are shown in Figures 7 and 8 for both microphone positions. The results are practically the same, and both indicate good speech intelligibility because they are above 60%, whereas the recommended minimum is about 40%. It is important to remark that these results were obtained from the impulse response measurements employing the log-sine sweep. Since D50 is a ratio of energies within the same impulse response (Eq. (2)), it does not depend on the sound pressure level generated by the sound source and does not reflect the background noise present during speech. This explains why D50 indicates good intelligibility even where the STI at the lower speech levels does not.

    Figure 7. D50 [%] results. Microphone position #1

    Figure 8. D50 [%] results. Microphone position #2

    7.2. Side stairs area

    Figure 9 shows the NC curves and the results obtained from the background noise measurements.


    Figure 9. NC curves and background noise results.

    The assigned NC profile is NC-50. The background noise results at the stairs positions are very similar to those obtained in the floor halls, but in this situation the distance between source and receiver positions is much smaller, approximately 3 meters. Another important consideration is that the stairs area is bounded by highly reflective surfaces. This results in a strong predominance of the reverberant field at the receiver position and a high reverberation time. For these reasons, better STI results are expected in terms of the sound pressure level of the speech signal, but this improvement is offset by the increase in reverberation time compared with the previous situation.

    The STI results for male voice are presented in Table 5.

    Table 5. STI results for the second speech situation

    Speech Leq   STI (male)
    60 dB        0,441
    66 dB        0,445
    72 dB        0,472

    It can be observed that the results are very similar for all speech sound levels. Only the 72 dB value reaches the Fair range of Table 3, while the other two remain just below it, in the Poor range. Raising the speech level improves only the S/N factor of Eq. (1) and leaves the reverberation factor unchanged. The reverberation time is therefore the main factor that impairs the STI here, because the receiver position was placed in the reverberant field of the enclosure. The T30 values measured with the log-sine sweep at the receiver positions of the stairs and the first floor hall are shown in Figure 10. It can be observed that the reverberation time in the stairs is higher, which is reflected in worse STI results, as follows from Eq. (1) for the MTF. Moreover, the highest T30 values occur in the mid-frequency range, which is where the human voice predominates.

    Figure 10. T30 comparison for stairs and the first floor hall.
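    T30 values such as those in Figure 10 are derived from the decay of the measured impulse response. A common approach is Schroeder backward integration of the squared impulse response, followed by a linear fit of the decay curve between -5 and -35 dB and extrapolation to a 60 dB decay. The sketch assumes an impulse response that has already been octave-band filtered and truncated before the noise floor. It is not necessarily how Aurora computes T30, and the function name is hypothetical.

```python
import numpy as np

def t30_schroeder(h, fs):
    """T30 in s from an (octave-band filtered) impulse response h."""
    h = np.asarray(h, dtype=float)
    # Schroeder backward integration of the squared impulse response
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    # Linear fit of the decay curve between -5 dB and -35 dB
    mask = (edc_db <= -5.0) & (edc_db >= -35.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)   # decay rate in dB/s
    return -60.0 / slope                              # extrapolate to 60 dB
```

    Running the same function on the stairs and hall responses, band by band, gives the kind of comparison shown in Figure 10.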

    The D50 results are shown in Figure 11. The results are always below 40%, which indicates poor intelligibility, consistent with the STI reduction compared with the previous situation. Again, the main factor that impairs speech intelligibility is the reverberant sound energy, which predominates over the direct sound energy at the receiver position.

    Figure 11. D50 [%] results.

    8. CONCLUSIONS

    The results obtained in both cases are useful for analyzing acoustic solutions that improve speech intelligibility according to needs.

    On the one hand, the results obtained for the first measurement situation (second and first floors) indicate that the vocal effort must be raised for the words to be transmitted and understood by the receiver. The main factor affecting intelligibility is the distance between the sound source and the receiver point (approx. 7 m), which is consistent with expectations. But the results are also greatly influenced by the background noise level, which was rated NC-55, exceeding the recommended value of NC-40 for this type of enclosure. The difficulty in reducing the background noise is that it is caused by human activity itself, which is hard to attenuate.


    It is also important to note that the required increase in vocal effort is not very large, but it may interfere with classes being taught. The best solution to this problem is therefore to avoid this kind of conversation between different floors, or to decrease the source-receiver distance.

    On the other hand, the second case study (side stairs) presents other difficulties in addition to the background noise.

    The high reverberation time impairs the correct comprehension of speech even at a short distance from the source. One way to solve this problem is to place absorbent material on the walls that acts mainly in the mid-frequency range, so as to reduce the reverberant sound energy and improve speech intelligibility.

    As in the previous case, the stairs are a high-traffic area, so the noise produced by human activity itself also impairs the understanding of the spoken message. If the amount of absorbent material to be placed is calculated properly, the background noise will also decrease, since it is currently raised by the high reverberant energy.
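    A first estimate of the amount of absorbent material can be obtained by inverting Sabine's formula T = 0.161·V/A (V in m³, A in m² of equivalent absorption area). This gives the extra absorption area needed to bring the measured reverberation time down to a target value. This is a textbook diffuse-field approximation, not a method applied in this report. The function name and the volume and times in the usage line are hypothetical.

```python
def extra_absorption_sabine(volume_m3, t_current_s, t_target_s):
    """Additional equivalent absorption area (m^2) from Sabine's formula."""
    if t_target_s >= t_current_s:
        return 0.0  # the target is already met; no extra absorption needed
    # A = 0.161 V / T, so the required increase is A_target - A_current
    return 0.161 * volume_m3 * (1.0 / t_target_s - 1.0 / t_current_s)

# Hypothetical stair volume and reverberation times, for illustration only
area = extra_absorption_sabine(volume_m3=300.0, t_current_s=2.5, t_target_s=1.2)
```

    Dividing the resulting area by the absorption coefficient of the chosen material in the mid-frequency bands gives a rough surface area to be treated.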

    9. REFERENCES

    [1] Houtgast, T. and Steeneken, H.J.M. (1971), "Evaluation of Speech Transmission Channels by Using Artificial Signals", Acustica 25, 355-367.

    [2] Steeneken, H.J.M. and Houtgast, T. (1980), "A physical method for measuring speech-transmission quality", J. Acoust. Soc. Am. 67, 318-326.

    [3] http://sisbib.unmsm.edu.pe/bibvirtual/libros/medicina/cirugia/tomo_v/exp_audicion.htm

    [4] IEC 60268-16. Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index.

    [5] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 407.

    [6] Farina, Angelo. STI - Speech Transmission Index. Aurora Plugins web forum.

    [7] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 184.
