speech intelligibility
7/27/2019 Speech Intelligibility
Acoustics Instruments and Measurements June 2013, Caseros, Buenos Aires Province, Argentina
SPEECH TRANSMISSION INDEX MEASUREMENTS
AGUSTÍN Y. ARIAS 1
1 Universidad Nacional de Tres de Febrero, Buenos Aires, Argentina.
1. INTRODUCTION
The STI was developed in the early 1970s by Houtgast and Steeneken as an objective measure of speech transmission quality [1][2]. One of the advantages of the STI is its wide range of application areas. The STI is extensively used in room acoustics, for instance to assess intelligibility in auditoria, churches and conference rooms. The STI is also applied to telecommunication channels, such as (mobile) telephone lines and radio transmissions.
The STI is based on the fact that the information in speech is represented acoustically in the form of amplitude modulations. Human speech is essentially a sequence of modulated tonal and noise-like sounds. If a portion of these modulations is lost, intelligibility decreases. The Modulation Transfer Function, which can be computed or measured, expresses the loss and preservation of modulations. The STI is calculated directly from the Modulation Transfer Function.
This report presents the procedures used to obtain STI values from measurements in two different speech situations. In addition, the Definition (D50) parameter is calculated in order to evaluate speech intelligibility with a different method.
The measurements were performed inside the National University of Tres de Febrero, Caseros, Buenos Aires, Argentina.
2. A BRIEF DESCRIPTION OF THE STI AND THE DEFINITION D50
According to Steeneken and Houtgast, the determination of STI values is based on measuring the reduction of the signal modulation between the sound source and the measurement position in the octave bands with center frequencies from 125 to 8000 Hz. They proceeded on the assumption that intelligibility is reduced not only by reverberation and noise, but generally by all external signals or signal changes that occur on the path from source to listener. To ascertain this influence they employ the Modulation Transfer Function (MTF), shown in Figure 1, for acoustical purposes. The available useful signal S (signal) is put into relation with the
prevailing interfering signal N (noise). The resulting modulation reduction factor m(f) characterizes the interference with speech intelligibility:

$$m(f) = \frac{1}{\sqrt{1 + \left(\dfrac{2\pi f \, RT}{13.8}\right)^{2}}} \cdot \frac{1}{1 + 10^{-(S/N)/10}} \qquad (1)$$

f: modulation frequency in Hz
RT: reverberation time in s
S/N: signal/noise ratio in dB
Figure 1. Modulation transfer function: input/output comparison.
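As an illustration of how the modulation reduction factor behaves, the following minimal Python sketch evaluates the usual closed form for reverberation plus stationary noise, using the quantities f, RT and S/N defined above. The function name and the example values are hypothetical and are not taken from the measurements in this report; the constant 13.8 is assumed to be the one that follows from an exponential decay of 60 dB over RT.

```python
import math


def modulation_reduction(f_mod_hz: float, rt_s: float, snr_db: float) -> float:
    """Modulation reduction factor m(f) for reverberation plus stationary noise.

    m(f) = 1 / sqrt(1 + (2*pi*f*RT / 13.8)^2) * 1 / (1 + 10^(-(S/N)/10))
    """
    reverberation_term = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * f_mod_hz * rt_s / 13.8) ** 2
    )
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverberation_term * noise_term


# Hypothetical example values (not measured in this report):
# modulation frequency 2 Hz, reverberation time 1 s, S/N of 10 dB.
m_example = modulation_reduction(f_mod_hz=2.0, rt_s=1.0, snr_db=10.0)
print(round(m_example, 3))  # approximately 0.672
```

Both terms are at most 1, so either a long reverberation time or a poor signal/noise ratio is enough to lower m(f).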
The MTF of a sound transmission path can be determined in various ways, the principle being the derivation of the modulation reduction factor from the comparison of the intensity modulations at the output and the input of the path. Thus, a) speech signals, b) the impulse response, or c) special test signals can be used. In this report, a set of Phonetically Balanced Word Lists (PB) [3] in Spanish was used as the speech signal, as described below.
The MTF is determined for the range of relevant frequencies present in the envelope of natural speech signals. The relevant range of these modulation frequencies extends from 0.63 to 12.5 Hz in 14 one-third-octave steps. Each of the 7 octave bands therefore has 14 modulation frequencies, resulting in a total of 98 values, as can be seen in Table 1.
The STI calculation methods using the Modulation Transfer Index (MTI) and the specific male and female weighting factors are defined in the standard IEC 60268-16 (Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index) [4].
Table 1. Frequencies for the STI method.
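The way the 98 values are condensed into a single index can be sketched as follows. This is a simplified illustration only: it uses the classic conversion of each m value to an apparent signal/noise ratio clipped to the range of -15 to +15 dB, and it assumes uniform octave-band weights instead of the male and female weighting factors of IEC 60268-16. The masking and redundancy corrections of the standard are omitted, and the function names are hypothetical.

```python
import math

# Octave-band centre frequencies (Hz) and modulation frequencies (Hz):
# 7 octave bands x 14 modulation frequencies = 98 values.
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def transmission_index(m: float) -> float:
    """Map one modulation reduction factor to a transmission index in [0, 1]."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    snr_apparent_db = 10.0 * math.log10(m / (1.0 - m))
    snr_clipped_db = min(max(snr_apparent_db, -15.0), 15.0)
    return (snr_clipped_db + 15.0) / 30.0


def simplified_sti(m_matrix, band_weights=None):
    """m_matrix[k][i] is m for octave band k and modulation frequency i."""
    if band_weights is None:
        # ASSUMPTION: uniform weights, not the IEC male/female weighting factors.
        band_weights = [1.0 / len(m_matrix)] * len(m_matrix)
    # Modulation Transfer Index per octave band: mean of its transmission indices.
    mti = [sum(transmission_index(m) for m in row) / len(row) for row in m_matrix]
    return sum(w * x for w, x in zip(band_weights, mti))


# Hypothetical input: every m value equal to 0.5 (apparent S/N of 0 dB).
m_matrix = [[0.5] * len(MODULATION_FREQS_HZ) for _ in OCTAVE_BANDS_HZ]
print(len(OCTAVE_BANDS_HZ) * len(MODULATION_FREQS_HZ))  # 98
print(round(simplified_sti(m_matrix), 3))
```

With all m values at 0.5 the apparent signal/noise ratio is 0 dB in every cell, which sits in the middle of the clipped range.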
The STI values may vary between 0 and 1, indicating the degree to which a transmission channel degrades speech intelligibility. This means that perfectly intelligible speech, when transferred through a channel with an associated STI of 1, will remain perfectly intelligible. The closer the STI value approaches zero, the more information is lost.
On the other hand, the acoustical parameter D50 (Definition) is defined as the ratio of the sound energy that arrives at the receiver position in the first 50 ms to the total sound energy received [5]:

$$D_{50} = \frac{\int_{0}^{50\,\mathrm{ms}} p^{2}(t)\,dt}{\int_{0}^{\infty} p^{2}(t)\,dt} \qquad (2)$$

p(t): instantaneous sound pressure in the impulse response.
The higher the value, the better the speech
intelligibility and loudness at the point of reception.
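For a sampled impulse response, the energy ratio that defines D50 reduces to a ratio of sums of squared samples. The sketch below is a minimal illustration with a hypothetical function name and a toy impulse response; it assumes that the direct sound arrives at the sample given by onset_index, and it is not the processing used by the measurement software.

```python
def definition_d50(impulse_response, sample_rate_hz, onset_index=0):
    """Return D50 as a fraction in [0, 1] from a sampled impulse response.

    ASSUMPTION: the direct sound arrives at onset_index; the 50 ms window
    is counted from that sample.
    """
    squared = [p * p for p in impulse_response[onset_index:]]
    total_energy = sum(squared)
    if total_energy == 0.0:
        raise ValueError("impulse response has no energy")
    n_early = int(round(0.050 * sample_rate_hz))  # samples in the first 50 ms
    return sum(squared[:n_early]) / total_energy


# Hypothetical toy response sampled at 1 kHz:
# direct sound at 0 ms (amplitude 1.0) and one reflection at 100 ms (amplitude 0.5).
fs = 1000
ir = [0.0] * 200
ir[0] = 1.0
ir[100] = 0.5
print(definition_d50(ir, fs))  # 0.8
```

In the toy case the early energy is 1.0 and the late energy is 0.25, so 80% of the energy arrives within 50 ms.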
3. MEASUREMENT POSITIONS
In order to perform the measurements under realistic speech communication conditions, two different areas within the University were chosen. The first one corresponds to the second- and first-floor halls, with the sound source placed on the second floor and two receiver positions (microphones) on the first floor (Figure 2). The second one corresponds to the side stairs area (Figure 3).
Figure 2. First- and second-floor halls. Microphone and sound source locations.
Figure 3. Stairs. Microphone and sound source locations.
4. EQUIPMENT EMPLOYED
The following list describes the equipment employed to perform the impulse response measurements:
KRK Rokit8 loudspeaker, used as sound source to reproduce the Log-Sine sweep and the speech signals
Svantek 959 Class 1 sound level meter with its calibrator
Two DPA 4060 measurement microphones
Notebook HP Pavilion
MOTU audio interface
Aurora Plugins for Adobe Audition
The Aurora Plugins for Adobe Audition were used to generate the Log-Sine sweep and to post-process the recordings with the STI module.
5. MEASUREMENT PROCEDURE
The Aurora plugins compute the STI according to the standard IEC 60268-16. To comply with the requirements imposed by the standard, a "noiseless" impulse response was measured with a Log-Sine sweep, and the octave spectra of the speech signal and of the background noise were measured separately.
The speech signal consists of a set of twenty-five words taken from Phonetically Balanced Word Lists (PB) in Spanish:
Table 2. Phonetically Balanced Word Lists (Spanish)
Lastre Sexto Suela Cine Pera
Moldes Letra Diosa Vega Fina
Menta Surco Piano Dina Tero
Cinco Selva Duque Kilo Beca
Persa Cieno Milla Duna Reno
The level of the signal containing the PB words listed above was calibrated to normal (60 dBA), raised (66 dBA) and loud (72 dBA) speech levels, measuring those Leq values at 1 m from the loudspeaker as shown in Figure 4.
Figure 4. Leq calibration of the PB words signal
The three signals were reproduced through the loudspeaker in both situations described previously (second and first floor halls, and side stairs) and recorded. These are the Speech+Noise measurements.
Then, the IR measurements were performed at the same microphone and loudspeaker positions using the Log-Sine sweep signal (1 minute duration, 80-8000 Hz). This measurement is necessary to obtain the reverberation time at the microphone position. In addition, Aurora requires a calibration signal to calibrate the SPL results. For this reason, a 1 kHz, 94 dB signal was recorded using the Svantek calibrator. Finally, a one-minute background noise measurement was performed at the microphone position. There are therefore six recordings for each microphone (receiver) position in the two situations: three of the speech signals (three different Leq values), one Log-Sine sweep, one calibration signal and one background noise.
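For reference, a Log-Sine sweep of the kind described above can be generated with the standard exponential sweep formula, in which the instantaneous frequency rises exponentially from the start to the end frequency. The sketch below is an illustration under assumptions: the function names are hypothetical, the 48 kHz sample rate is assumed (the report does not state one), and the demo uses a 1 s sweep instead of the 60 s sweep of the measurements to keep it fast. It is not the generator implemented in the Aurora plugins.

```python
import math


def log_sine_sweep(f_start_hz, f_end_hz, duration_s, sample_rate_hz):
    """Exponential (logarithmic) sine sweep as a list of samples in [-1, 1]."""
    log_ratio = math.log(f_end_hz / f_start_hz)
    scale = 2.0 * math.pi * f_start_hz * duration_s / log_ratio
    n_samples = int(duration_s * sample_rate_hz)
    return [
        math.sin(scale * (math.exp(n / sample_rate_hz / duration_s * log_ratio) - 1.0))
        for n in range(n_samples)
    ]


def instantaneous_frequency(t_s, f_start_hz, f_end_hz, duration_s):
    """Frequency reached by the sweep at time t_s."""
    return f_start_hz * (f_end_hz / f_start_hz) ** (t_s / duration_s)


# Frequency limits from the text (80-8000 Hz); duration shortened for the demo.
sweep = log_sine_sweep(80.0, 8000.0, duration_s=1.0, sample_rate_hz=48000)
print(len(sweep))

# Halfway through the 60 s sweep of the report the frequency is the
# geometric mean of the limits.
print(instantaneous_frequency(30.0, 80.0, 8000.0, 60.0))
```

Because the sweep spends the same time in every octave, low frequencies receive more energy than in a linear sweep, which is one reason this signal is commonly used for impulse response measurements.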
6. USE OF THE STI MODULE IN AURORA
The STI module of Aurora is shown in Figure 5:
Figure 5. STI module of Aurora plugins.
As shown in Figure 5, all the signals explained above are required to obtain the STI [6].
First, the calibration signal is loaded to calibrate the amplitude scale of the STI module. The Leq value of that signal (94 dB) must be specified.
Then, the background noise signal is loaded and stored as Noise.
Next, the speech signal is loaded. As this measurement was performed in the presence of the background noise, it must be stored as Sig+N. In this way, the module will properly calculate the signal level by subtracting the noise estimated in the previous step.
Finally, the measured impulse response is processed to compute the STI with the Compute STI button.
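The calibration and noise subtraction steps can be illustrated with a few lines of Python. This is only a sketch of the underlying arithmetic, assuming levels are referenced to the RMS value of the 94 dB calibrator recording and that noise is removed by subtracting energies; the function names and the example levels are hypothetical, and the internal processing of the Aurora STI module may differ.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def calibrated_level_db(samples, calibration_samples, calibration_level_db=94.0):
    """Level of `samples` in dB, referenced to a calibrator recording of known level."""
    return calibration_level_db + 20.0 * math.log10(
        rms(samples) / rms(calibration_samples)
    )


def signal_level_db(sig_plus_noise_db, noise_db):
    """Energy-subtract the Noise level from the Sig+N level (both in dB)."""
    signal_energy = 10.0 ** (sig_plus_noise_db / 10.0) - 10.0 ** (noise_db / 10.0)
    if signal_energy <= 0.0:
        raise ValueError("noise level is not below the Sig+N level")
    return 10.0 * math.log10(signal_energy)


# Hypothetical band levels (not measured values from this report).
sig_plus_noise_db = 66.0
noise_db = 60.0
speech_db = signal_level_db(sig_plus_noise_db, noise_db)
print(round(speech_db, 2), round(speech_db - noise_db, 2))  # speech level, S/N
```

In this hypothetical case the speech alone is about 1.3 dB below the recorded Sig+N level, which shows why the noise must be removed before the signal/noise ratio is formed.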
7. RESULTS AND OBSERVATIONS
To evaluate the results it is necessary to know the reference values for the STI and D50 parameters. These are listed in Table 3. The preferred D50 value is > 40%. This parameter is commonly used to characterize theaters, where the preferred value is > 50%, so it can be applied to these cases with a more permissive limit [7].
Table 3. Preferred values for STI and D50 parameters
Intelligibility Rating    STI
Excellent                 > 0.75
Good                      0.60 - 0.75
Fair                      0.45 - 0.60
Poor                      0.30 - 0.45
Bad                       < 0.30
D50 (preferred value): > 40%
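The rating scale of Table 3 can be applied programmatically, as in the short sketch below. The function name is hypothetical, and since adjacent ranges in the table share their edge values (0.60 and 0.45), the sketch assumes that a value exactly on an edge belongs to the better rating.

```python
def sti_rating(sti: float) -> str:
    """Intelligibility rating for an STI value, following Table 3.

    ASSUMPTION: a value exactly on a shared edge (0.60, 0.45) gets the better rating.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    if sti > 0.75:
        return "Excellent"
    if sti >= 0.60:
        return "Good"
    if sti >= 0.45:
        return "Fair"
    if sti >= 0.30:
        return "Poor"
    return "Bad"


# Example STI values.
for value in (0.167, 0.395, 0.519):
    print(value, sti_rating(value))
```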
7.1. Second and first floor halls
Figure 6 shows the NC curves and the results of the background noise measurements.
Figure 6. Background noise results and the corresponding NC profile assigned.
The NC profile assigned is NC-50. The background noise has considerable energy at mid frequencies. This is related to human activity (people talking, professors teaching, etc.) at the time the measurements were performed.
The STI results for the male voice are presented in Table 4.
Table 4. STI results for the first speech situation
Speech Leq    STI Male
Receiver Position #1
60 dB         0.167
66 dB         0.395
72 dB         0.519
Receiver Position #2
60 dB         0.104
66 dB         0.39
72 dB         0.495
It can be observed that for the 60 dB speech Leq the STI values are extremely low, which indicates bad speech intelligibility. This is consistent with what happens in reality, as the distance between source and receiver is approximately 7 meters and both were positioned on different floors. Moreover, as detailed above, the measurement location is a space with constant foot traffic, which markedly degrades the results. Only the STI values obtained with the 72 dB signal indicate an acceptable degree of comprehension.
The D50 results are shown in Figures 7 and 8 for both microphone positions. The results are practically the same, and both indicate good speech intelligibility because they are above 60%, when the minimum recommended value is about 40%. It is important to remark that these results were obtained from the impulse response measurements employing the Log-Sine sweep. They depend strongly on the sound pressure level generated by the sound source, as defined in Eq. (2).
Figure 7. D50 [%] results. Microphone position #1
Figure 8. D50 [%] results. Microphone position #2
7.2. Side stairs area
Figure 9 shows the NC curves and the results of the background noise measurements.
Figure 9. NC curves and background noise results.
The NC profile assigned is NC-50. The background noise results for the stairs positions are very similar to those obtained in the floor halls, but in this situation the distance between source and receiver positions is much smaller, approximately 3 meters. Another important consideration is that the stairs area has very reflective boundary surfaces, resulting in a strong predominance of the reverberant field at the receiver position, which corresponds to a high reverberation time. For these reasons, better STI results are expected because of the higher sound pressure level of the speech signal at the receiver, although the improvement is limited by the longer reverberation time compared with the previous situation.
The STI results for the male voice are presented in Table 5.
Table 5. STI results for the second speech situation
Speech Leq    STI Male
60 dB         0.441
66 dB         0.445
72 dB         0.472
It can be observed that the results are very similar at all speech levels, and the speech transmission can be classified as barely Fair: by Table 3, only the 72 dB value falls within the Fair range, while the other two lie just below it. The reverberation time is the main factor that degrades the STI, because the receiver was placed in the reverberant field of the enclosure. The T30 values measured with the Log-Sine sweep at the receiver positions of the stairs and the first floor hall are shown in Figure 10. It can be observed that T30 is higher in the stairs, which is reflected in worse STI results, as follows from Eq. (1) for the MTF. Moreover, the highest T30 values lie in the mid-frequency range, which is where the human voice predominates.
Figure 10. T30 comparison for stairs and the first floor hall.
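The limiting role of the reverberation time can be made explicit with a short worked step. It assumes the usual closed form of the modulation reduction factor for reverberation plus stationary noise, with f, RT and S/N as defined in Section 2. The value RT = 2 s is a hypothetical illustration and is not a T30 value read from Figure 10.

$$m(f) = \frac{1}{\sqrt{1 + \left(\dfrac{2\pi f \, RT}{13.8}\right)^{2}}} \cdot \frac{1}{1 + 10^{-(S/N)/10}} \;\le\; \frac{1}{\sqrt{1 + \left(\dfrac{2\pi f \, RT}{13.8}\right)^{2}}}$$

$$f = 2\ \mathrm{Hz},\; RT = 2\ \mathrm{s}: \quad \frac{2\pi \cdot 2 \cdot 2}{13.8} \approx 1.82, \qquad m(2\ \mathrm{Hz}) \le \frac{1}{\sqrt{1 + 1.82^{2}}} \approx 0.48$$

Under these assumptions the noise term can only approach 1 as the speech level rises, so the reverberation term sets an upper bound on m(f) that no increase in level can exceed. This is in line with the small change in STI between 60 dB and 72 dB in Table 5.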
The D50 results are shown in Figure 11. The results are always below 40%, which indicates poor intelligibility, consistent with the STI reduction in comparison with the previous situation. Again, the main factor that impairs speech intelligibility is the reverberant sound energy, which predominates over the direct sound energy at the receiver position.
Figure 11. D50 [%] results.
8. CONCLUSIONS
The results obtained in both cases are of interest for analyzing acoustic solutions that improve speech intelligibility according to needs.
On the one hand, the results obtained for the first measurement situation (the second and first floors) indicate that it is necessary to raise the loudness of the voice for the words to be transmitted and understood by the receiver. The main factor affecting intelligibility is the distance between the sound source and the receiver point (approx. 7 m), which is
consistent with expectations. The results are also greatly influenced by the level of background noise, which was rated NC-50 and exceeds the recommended value of NC-40 for this type of enclosure. The difficulty in reducing the background noise is that it is caused by human activity itself, which is hard to attenuate.
It is also important to note that the required increase in voice loudness is not very large, but it may interfere with classes in progress, so the best solution to this problem is to avoid this kind of conversation between different floors, or to decrease the source-receiver distance.
On the other hand, the second case study (side stairs) presents other difficulties in addition to the background noise.
The high reverberation time impairs the correct comprehension of speech even at a short distance from the source. One way to solve this problem is to place absorbent material on the walls that acts mainly in the mid-frequency range, so as to reduce reverberant sound energy and improve speech intelligibility.
As in the previous case, the stairs carry heavy foot traffic, so the noise produced by human activity also impairs the understanding of the spoken message. If the amount of absorbent material is properly calculated, the background noise will also decrease, since its level is raised by the high reverberant energy.
9. REFERENCES
[1] Houtgast, T. and Steeneken, H.J.M. (1971), "Evaluation of Speech Transmission Channels by Using Artificial Signals", Acustica 25, 355-367.
[2] Steeneken, H.J.M. and Houtgast, T. (1980), "A physical method for measuring speech-transmission quality", J. Acoust. Soc. Am. 67, 318-326.
[3] http://sisbib.unmsm.edu.pe/bibvirtual/libros/medicina/cirugia/tomo_v/exp_audicion.htm
[4] IEC 60268-16. Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index.
[5] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 407.
[6] Farina, Angelo. STI - Speech Transmission Index. Aurora Plugins web forum.
[7] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 184.