Acoustics Instruments and Measurements June 2013, Caseros, Buenos Aires Province, Argentina

SPEECH TRANSMISSION INDEX MEASUREMENTS

AGUSTÍN Y. ARIAS 1

1 Universidad Nacional de Tres de Febrero, Buenos Aires, Argentina.

[email protected]

1. INTRODUCTION

The Speech Transmission Index (STI) was developed in the early 1970s by Houtgast and Steeneken as an objective measure of speech transmission quality [1][2]. One of the advantages of the STI is its wide range of application areas. The STI is extensively used in room acoustics, for instance to assess intelligibility in auditoria, churches and conference rooms. The STI is also applied to telecommunication channels, such as (mobile) telephone lines and radio transmissions.

The STI is based on the premise that the information in speech is represented acoustically in the form of amplitude modulations. Human speech is essentially a sequence of modulated tonal and noise-like sounds. If a portion of these modulations is lost, intelligibility decreases. The Modulation Transfer Function, which can be computed or measured, expresses the loss and preservation of modulations. The STI is calculated directly from the Modulation Transfer Function.

In this report, the procedures employed to obtain the STI values from measurements in two different speech situations are presented. In addition, the Definition (D50) parameter is calculated in order to evaluate speech intelligibility with a different method.

The measurements were performed inside the

National University of Tres de Febrero, Caseros,

Buenos Aires, Argentina.

2. A BRIEF DESCRIPTION OF THE STI AND THE DEFINITION D50

According to Steeneken and Houtgast, the determination of the STI values is based on measuring the reduction of the signal modulation between the sound source and the measurement position in the octave bands with center frequencies from 125 to 8000 Hz. They proceeded on the assumption that not only reverberation and noise reduce the intelligibility of speech, but generally all external signals or signal changes that occur on the path from source to listener. To ascertain this influence they employ the Modulation Transfer Function (MTF), shown in Figure 1. The available useful signal S (signal) is put into relation with the prevailing interfering signal N (noise). The resulting modulation reduction factor m(f) characterizes the interference with speech intelligibility:

$$ m(f) = \frac{1}{\sqrt{1 + \left( 2\pi f \, \dfrac{RT}{13{,}8} \right)^{2}}} \cdot \frac{1}{1 + 10^{-(S/N)/10}} \qquad (1) $$

where

f: modulation frequency in Hz
RT: reverberation time in s
S/N: signal/noise ratio in dB
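The dependence of m(f) on reverberation and noise can be illustrated with a minimal Python sketch. It assumes the classical Houtgast and Steeneken form, in which reverberation contributes a factor 1/sqrt(1 + (2·pi·f·RT/13.8)^2) and noise contributes a factor 1/(1 + 10^(-(S/N)/10)). The function name and the example values are hypothetical and are not taken from the measurements in this report.

```python
import math

def modulation_reduction(f_mod, rt, snr_db):
    """Modulation reduction factor m(f).

    f_mod  : modulation frequency in Hz
    rt     : reverberation time in s
    snr_db : signal/noise ratio in dB
    """
    # Loss of modulation depth caused by reverberation
    reverberation_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt / 13.8) ** 2)
    # Loss of modulation depth caused by stationary background noise
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverberation_term * noise_term

# Hypothetical values: same modulation frequency and S/N, two reverberation times
for rt in (0.5, 2.0):
    print(rt, modulation_reduction(2.0, rt, 10.0))
```

With RT = 0 and S/N = 0 dB the sketch returns 0.5, since signal and noise then carry equal energy and only half of the modulation depth survives.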

Figure 1. Modulation transfer function: input/output comparison.

The MTF of a sound transmission path can be determined in various ways, the principle being the derivation of the modulation reduction factor from the comparison of the intensity modulations at the output and the input of the path. Thus, a) speech signals, b) the impulse response, or c) special test signals can be used. In this report, a Phonetically Balanced (PB) word list [3] in Spanish was used as the speech signal, as described below.



The MTF is determined for the range of relevant frequencies present in the envelope of natural speech signals. The relevant range for these modulation frequencies extends from 0,63 to 12,5 Hz in 14 one-third octave bands. Each of the 7 octave bands therefore has 14 modulation frequencies, resulting in a total of 98 values, as can be seen in Table 1.

The STI calculation methods using the MTI (Modulation Transfer Index) and the specific male and female weighting factors are defined in the standard IEC 60268-16 (Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index) [4].

Table 1. Frequencies for the STI method.
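The frequency grid described above can be written out as a short Python sketch. The values are the nominal octave-band center frequencies and the nominal one-third octave modulation frequencies commonly tabulated for the STI method. The variable names are illustrative only.

```python
# Nominal octave-band center frequencies (Hz) from 125 to 8000 Hz
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# Nominal modulation frequencies (Hz) in one-third octave steps, 0.63 to 12.5 Hz
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

# One MTF value is needed for every (octave band, modulation frequency) pair
mtf_grid = [(band, f_mod) for band in OCTAVE_BANDS_HZ for f_mod in MODULATION_FREQS_HZ]

print(len(OCTAVE_BANDS_HZ), len(MODULATION_FREQS_HZ), len(mtf_grid))
```

The grid has 7 x 14 = 98 entries, which matches the total number of values stated in the text.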

The STI values may vary between 0 and 1, indicating the degree to which a transmission channel degrades speech intelligibility. This means that perfectly intelligible speech, when transferred through a channel with an associated STI of 1, will remain perfectly intelligible. The closer the STI value approaches zero, the more information is lost.
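How a matrix of modulation reduction factors maps onto a single value between 0 and 1 can be sketched as follows. This is a deliberately simplified illustration and not the IEC 60268-16 procedure: it assumes equal weighting of all octave bands and omits the male/female weighting factors, the redundancy corrections and the masking corrections of the standard. Each m value is converted to an apparent signal-to-noise ratio, limited to the range -15 dB to +15 dB, and normalised to a transmission index. The function names are hypothetical.

```python
import math

def transmission_index(m):
    """Map one modulation reduction factor m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)                    # keep the logarithm finite
    snr_apparent = 10.0 * math.log10(m / (1.0 - m))      # apparent S/N in dB
    snr_apparent = min(max(snr_apparent, -15.0), 15.0)   # limit to +/- 15 dB
    return (snr_apparent + 15.0) / 30.0

def simplified_sti(mtf_matrix):
    """Unweighted STI estimate from rows of m values (one row per octave band).

    Assumption: all octave bands are weighted equally, unlike in IEC 60268-16.
    """
    band_indices = [sum(transmission_index(m) for m in row) / len(row)
                    for row in mtf_matrix]
    return sum(band_indices) / len(band_indices)
```

A channel that preserves all modulations (m = 1 everywhere) gives 1, and a channel that removes them completely (m = 0 everywhere) gives 0, which reproduces the limits described above.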

On the other hand, the acoustical parameter D50 (Definition) is defined as the ratio of the sound energy that arrives at the receiver position in the first 50 ms to the total sound energy received [5]:

$$ D_{50} = \frac{\int_{0}^{50\,\mathrm{ms}} p^{2}(t)\,dt}{\int_{0}^{\infty} p^{2}(t)\,dt} \qquad (2) $$

p(t): instantaneous sound pressure in the impulse response.
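The D50 definition can be applied to a sampled impulse response with a minimal Python sketch. It assumes that the impulse response has been trimmed so that its first sample corresponds to the arrival of the direct sound, and it replaces the integrals by sums over squared samples. The function name is hypothetical.

```python
def d50(impulse_response, sample_rate):
    """Definition D50 as a fraction (0..1).

    impulse_response : sequence of pressure samples, starting at the direct sound
    sample_rate      : sampling frequency in Hz
    """
    n_early = int(round(0.050 * sample_rate))   # number of samples in the first 50 ms
    energy = [p * p for p in impulse_response]  # instantaneous energy p^2(t)
    total = sum(energy)
    if total == 0.0:
        raise ValueError("impulse response contains no energy")
    return sum(energy[:n_early]) / total

# Hypothetical example: two equal reflections, one at 0 ms and one at 100 ms
ir = [0.0] * 200
ir[0] = 1.0
ir[100] = 1.0
print(100.0 * d50(ir, 1000), "%")
```

Multiplying the result by 100 gives the percentage form used for the preferred values in this report.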

The higher the value, the better the speech

intelligibility and loudness at the point of reception.

3. MEASUREMENT POSITIONS

In order to perform the measurements under realistic speech communication conditions, two different areas within the University were chosen. The first one corresponds to the second and first floor halls, with the sound source placed on the second floor and two receiver positions (microphones) on the first floor (Figure 2). The second one corresponds to the side stairs area (Figure 3).

Figure 2. First and second floor halls. Microphone and sound source locations.

Figure 3. Stairs. Microphones and sound source location

4. EQUIPMENT EMPLOYED

The following list describes the equipment employed to perform the impulse response measurements:

- KRK Rokit8 loudspeaker, used as sound source to reproduce the Log-Sine sweep and the speech signals
- Svantek 959 class 1 sound level meter with its calibrator
- Two DPA 4060 measurement microphones
- HP Pavilion notebook
- MOTU audio interface
- Aurora Plugins for Adobe Audition

The Aurora Plugins for Adobe Audition were

used to generate the Log-Sine sweep and to post-process the recordings with the STI module.

5. MEASUREMENT PROCEDURE

The Aurora plugins compute the STI according to the standard IEC 60268-16. To comply with the requirements imposed by the standard, a measurement of a "noiseless" impulse response was carried out with a Log-Sine sweep, together with separate measurements of the octave spectra of the speech signal and of the background noise.

The speech signal consists of a list of twenty-five phonetically balanced (PB) words in Spanish:

Table 2. Phonetically balanced word list (Spanish).

Lastre   Sexto   Suela   Cine   Pera
Moldes   Letra   Diosa   Vega   Fina
Menta    Surco   Piano   Dina   Tero
Cinco    Selva   Duque   Kilo   Beca
Persa    Cieno   Milla   Duna   Reno

The level of the previously listed PB word signal was calibrated at normal (60 dBA), raised (66 dBA) and loud (72 dBA) levels, measuring those Leq values at 1 m from the loudspeaker, as shown in Figure 4.

Figure 4. Leq calibration of the PB words signal
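The relation between the three calibration levels can be sketched in Python. The sketch assumes that the samples are already expressed in pascals and ignores the A-weighting filter, so it computes an unweighted Leq. The function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa

def leq_db(pressure_samples):
    """Unweighted equivalent continuous level (dB) of pressure samples in Pa."""
    mean_square = sum(p * p for p in pressure_samples) / len(pressure_samples)
    return 10.0 * math.log10(mean_square / (P_REF * P_REF))

def gain_for_level_change(current_db, target_db):
    """Linear amplitude gain that moves a signal from current_db to target_db."""
    return 10.0 ** ((target_db - current_db) / 20.0)

# Gains needed to derive the raised and loud signals from the normal one
print(gain_for_level_change(60.0, 66.0), gain_for_level_change(60.0, 72.0))
```

Each 6 dB step corresponds to roughly doubling the signal amplitude, so the loud signal has about four times the amplitude of the normal one.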

The three signals were reproduced through the loudspeaker in both situations described previously (second and first floor halls, side stairs) and recorded. These are the Speech+Noise measurements.

Then, the impulse response (IR) measurements were performed at the same microphone and loudspeaker positions employing the Log-Sine sweep signal (1 minute duration, 80-8000 Hz). This measurement is necessary to obtain the reverberation time at the microphone position. In addition, Aurora requires a calibration signal to calibrate the SPL results. For this reason, a 1 kHz, 94 dB signal was recorded using the Svantek calibrator. Finally, a one-minute background noise measurement was performed at the microphone position. So, there are six recordings for each microphone (receiver) position in the two situations: three of the speech signals (three different Leq values), one Log-Sine sweep, one calibration signal and one background noise.
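The excitation signal can be illustrated with a minimal Python sketch of an exponential (logarithmic) sine sweep. In this report the sweep was generated with the Aurora plugins. The code below is only an independent illustration of the same kind of signal, and the function names and the sample rate in the usage note are assumptions.

```python
import math

def sweep_frequency(f_start, f_end, duration, t):
    """Instantaneous frequency (Hz) of the exponential sweep at time t (s)."""
    return f_start * (f_end / f_start) ** (t / duration)

def log_sine_sweep(f_start, f_end, duration, sample_rate):
    """Exponential sine sweep from f_start to f_end (Hz) as a list of samples."""
    n_samples = int(duration * sample_rate)
    rate = math.log(f_end / f_start)                      # total frequency growth (natural log)
    scale = 2.0 * math.pi * f_start * duration / rate     # phase scaling factor
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        phase = scale * (math.exp(rate * t / duration) - 1.0)
        samples.append(math.sin(phase))
    return samples
```

For the signal described above, the call would be log_sine_sweep(80.0, 8000.0, 60.0, 48000), where the 48 kHz sample rate is an assumed value. Because the frequency grows exponentially, the sweep spends the same time in every octave.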

6. USE OF THE STI MODULE IN AURORA

The STI module of Aurora is shown in Figure 5:

Figure 5. STI module of Aurora plugins.

As shown in Figure 5, all the signals explained

above are required to obtain STI [6].

First, the calibration signal is loaded to calibrate the amplitude scale of the STI module. The Leq value of that signal (94 dB) must be specified.

Then, the background signal is loaded and stored

as Noise.

Later, the speech signal is loaded. As this measurement was performed in the presence of the background noise, it must be stored as Sig+N. In this way, the module will properly calculate the signal level by subtracting the noise that was already estimated in the previous step.

Finally, the measured impulse response is

processed for computing the STI with the button

Compute STI.
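The noise subtraction step mentioned above can be sketched in Python for a single octave band. The sketch assumes that the subtraction is done on an energy basis, which is the usual way to separate uncorrelated signal and noise levels. Whether the Aurora module performs exactly this operation internally is an assumption, and the function names are hypothetical.

```python
import math

def speech_level_db(sig_plus_noise_db, noise_db):
    """Speech-only band level (dB) from the Sig+N and Noise band levels."""
    # Levels are converted to relative energies before subtracting
    signal_energy = 10.0 ** (sig_plus_noise_db / 10.0) - 10.0 ** (noise_db / 10.0)
    if signal_energy <= 0.0:
        # The speech cannot be distinguished from the noise in this band
        return float("-inf")
    return 10.0 * math.log10(signal_energy)

def band_snr_db(sig_plus_noise_db, noise_db):
    """Signal/noise ratio (dB) of the band, as used in the MTF noise term."""
    return speech_level_db(sig_plus_noise_db, noise_db) - noise_db

# Hypothetical band levels: Sig+N is 3 dB above the noise
print(band_snr_db(63.0, 60.0))
```

When the Sig+N level is only about 3 dB above the noise level, speech and noise have nearly equal energy and the band S/N is close to 0 dB.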

7. RESULTS AND OBSERVATIONS

To evaluate the results it is necessary to know the reference values for the STI and D50 parameters. These are listed in Table 3. The preferred D50 values must be > 40%. This parameter is commonly used to characterize theaters, where the preferred value must be > 50%, so it can be applied to these cases with a more permissive limit [7].



Table 3. Preferred values for the STI and D50 parameters.

Intelligibility rating   STI
Excellent                > 0,75
Good                     0,60 - 0,75
Fair                     0,45 - 0,60
Poor                     0,30 - 0,45
Bad                      < 0,30

Preferred D50: > 40%
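The rating scale of Table 3 can be expressed as a small Python helper for classifying measured values. The treatment of values that fall exactly on a boundary (assigned here to the better class) is an assumption, since the table does not specify it. The function name is hypothetical.

```python
def sti_rating(sti):
    """Intelligibility rating for an STI value, following the ranges of Table 3."""
    if sti > 0.75:
        return "Excellent"
    if sti >= 0.60:
        return "Good"
    if sti >= 0.45:
        return "Fair"
    if sti >= 0.30:
        return "Poor"
    return "Bad"

def d50_is_acceptable(d50_percent, minimum_percent=40.0):
    """True when D50 (in %) exceeds the preferred minimum of Table 3."""
    return d50_percent > minimum_percent

print(sti_rating(0.62), d50_is_acceptable(55.0))
```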

7.1. Second and first floor halls

Figure 6 shows the NC curves and the results obtained from the background noise measurements.

Figure 6. Background noise results and the corresponding

NC profile assigned.

The NC profile assigned is NC-50. It can be seen that the background noise has considerable energy at mid frequencies. This is related to the human activity (people talking, professors teaching, etc.) at the time when the measurements were performed.

The STI results for the male voice are presented in Table 4.

Table 4. STI results for the first speech situation.

Receiver position   Speech Leq   STI (male)
#1                  60 dB        0,167
#1                  66 dB        0,395
#1                  72 dB        0,519
#2                  60 dB        0,104
#2                  66 dB        0,39
#2                  72 dB        0,495

It can be observed that for the 60 dB speech Leq the STI values are extremely low, which indicates bad speech intelligibility. This is consistent with what happens in reality, as the distance between source and receiver is approximately 7 meters and both were positioned on different floors. Moreover, as detailed above, the measurement location is a space with constant transit of people, which noticeably degrades the results. Only the STI values obtained with the 72 dB signal indicate an acceptable degree of comprehension.

The D50 results are shown in Figures 7 and 8 for both microphone positions. It can be observed that the results are practically the same, and both indicate good speech intelligibility because the results are above 60%, when the minimum recommended value is about 40%. It is important to remark that these results were obtained from the impulse response measurements employing the Log-Sine sweep. They depend strongly on the sound pressure level generated by the sound source, as defined in Eq. (2).

Figure 7. D50 [%] results. Microphone position #1

Figure 8. D50 [%] results. Microphone position #2

7.2. Side stairs area

Figure 9 shows the NC curves and the results obtained from the background noise measurements.



Figure 9. NC curves and background noise results.

The NC profile assigned is NC-50. The background noise results for the stairs positions are very similar to those obtained in the floor halls, but in this situation the distance between source and receiver positions is much smaller, approximately 3 meters. Another important consideration is that the stairs area has very reflective boundary surfaces, resulting in a strong predominance of the reverberant field at the receiver position, corresponding to a high reverberation time. For these reasons, better STI results are expected because of the higher sound pressure level of the speech signal at the receiver, although this improvement is reduced by the increase in reverberation time compared with the previous situation.

The STI results for Male voice are presented in

Table 5.

Table 5. STI results for the second speech situation.

Speech Leq   STI (male)
60 dB        0,441
66 dB        0,445
72 dB        0,472

It can be observed that the results are very similar for all the speech levels, and the speech transmission can barely be classified as Fair. The reverberation time is the main factor that impairs the STI, because the receiver position was placed in the reverberant field of the enclosure. The T30 values measured with the Log-Sine sweep at the receiver positions of the stairs and the first floor hall are shown in Figure 10. It can be observed that T30 is higher in the stairs, which is reflected in worse STI results, as expected from Eq. (1) for the MTF. Moreover, the highest T30 values belong to the mid-frequency range, which is where the human voice predominates.

Figure 10. T30 comparison for stairs and the first floor hall.
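How a T30 value can be derived from a measured impulse response is sketched below in Python. The T30 values of this report were obtained with the Aurora plugins. This sketch is an independent illustration that assumes the common approach of Schroeder backward integration followed by a straight-line fit of the decay curve between -5 dB and -35 dB, extrapolated to 60 dB of decay. The function name is hypothetical.

```python
import math

def t30_from_impulse_response(impulse_response, sample_rate):
    """Estimate T30 (s) from impulse response samples."""
    energy = [p * p for p in impulse_response]
    # Schroeder backward integration: energy remaining from each sample to the end
    remaining = []
    running = 0.0
    for e in reversed(energy):
        running += e
        remaining.append(running)
    remaining.reverse()
    total = remaining[0]
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    # Collect the part of the decay curve between -5 dB and -35 dB
    times, levels = [], []
    for n, r in enumerate(remaining):
        if r <= 0.0:
            break
        level = 10.0 * math.log10(r / total)
        if -35.0 <= level <= -5.0:
            times.append(n / sample_rate)
            levels.append(level)
    if len(times) < 2:
        raise ValueError("decay range of 30 dB not available")
    # Least-squares slope of the decay in dB per second
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
             / sum((t - mean_t) ** 2 for t in times))
    return -60.0 / slope
```

In practice the impulse response is first filtered into octave bands, so that one T30 value per band is obtained, as in the comparison of Figure 10.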

The D50 results are shown in Figure 11. It is notable that the results are always below 40%, which indicates poor intelligibility, consistent with the STI reduction in comparison with the previous situation. Again, the main factor that impairs the speech intelligibility performance is the reverberant sound energy, which predominates over the direct sound energy at the receiver position.

Figure 11. D50 [%] results.

8. CONCLUSIONS

The results obtained in both cases are of interest for analyzing acoustic solutions that improve speech intelligibility according to needs.

On one hand, the results obtained for the first measurement situation (the second and first floor halls) indicate that it is necessary to raise the loudness of the voice for the words to be transmitted and understood by the receiver. The main factor affecting the intelligibility is the distance between the sound source and the receiver point (approx. 7 m), which is consistent with expectations. But the results are also greatly influenced by the level of background noise, which was rated as an NC-55 profile, exceeding the recommended value of NC-40 for this type of enclosure. The difficulty in reducing the background noise is that it is caused by human activity itself, which is difficult to attenuate.



It is also important to note that the required increase in voice loudness is not very large, but it may interfere negatively with classes, so the best solution to this problem is to avoid this kind of conversation between different floors, or to decrease the source-receiver distance.

On the other hand, the second case study (side stairs) presents other difficulties in addition to the background noise.

The high reverberation time impairs the correct

comprehension of speech at a distance not far from

the source. One way to solve this problem is to place

absorbent material on the walls that acts mainly in the

mid-frequency range, so as to reduce reverberant

sound energy and improve speech intelligibility.

As in the previous case, the stairs are a high-traffic area, so the noise produced by human activity itself also impairs the understanding of the spoken message. With a properly calculated amount of absorbent material, the background noise will also decrease, since it is raised by the high reverberant energy.
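A first estimate of the amount of absorbent material can be sketched in Python. The sketch is not part of the measurements in this report. It assumes the Sabine diffuse-field approximation RT = 0.161 V / A, with V the room volume in m3 and A the equivalent absorption area in m2, and the usual diffuse-field estimate of 10 log10(A_after / A_before) dB for the reduction of the reverberant level. The function names and all numerical values are hypothetical.

```python
import math

def sabine_rt(volume_m3, absorption_m2):
    """Reverberation time (s) from the Sabine equation (diffuse-field assumption)."""
    return 0.161 * volume_m3 / absorption_m2

def added_absorption_for_target(volume_m3, rt_now, rt_target):
    """Extra equivalent absorption area (m2) needed to go from rt_now to rt_target."""
    return 0.161 * volume_m3 * (1.0 / rt_target - 1.0 / rt_now)

def reverberant_level_change_db(absorption_before, absorption_after):
    """Change (dB) of the reverberant sound level when the absorption area changes."""
    return -10.0 * math.log10(absorption_after / absorption_before)

# Hypothetical stairwell: 100 m3, reverberation time to be halved from 2.0 s to 1.0 s
print(added_absorption_for_target(100.0, 2.0, 1.0))
print(reverberant_level_change_db(8.05, 16.1))
```

Under these assumptions, halving the reverberation time requires doubling the absorption area, which also lowers the reverberant part of the background noise by about 3 dB.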

9. REFERENCES

[1] Houtgast, T. and Steeneken, H.J.M. (1971), "Evaluation of Speech Transmission Channels by Using Artificial Signals", Acustica 25, 355-367.

[2] Steeneken, H.J.M. and Houtgast, T. (1980), "A physical method for measuring speech-transmission quality", J. Acoust. Soc. Am. 67, 318-326.

[3] http://sisbib.unmsm.edu.pe/bibvirtual/libros/medicina/cirugia/tomo_v/exp_audicion.htm

[4] IEC 60268-16. Sound system equipment. Part 16: Objective rating of speech intelligibility by speech transmission index.

[5] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 407.

[6] Farina, Angelo. STI - Speech Transmission Index. Aurora Plugins web forum.

[7] Carrión Isbert, Antoni (1998), Diseño acústico de espacios arquitectónicos, p. 184.