increasing air traffic control simulations realism through

7
Increasing Air Traffic Control simulations realism through voice transformation Mathieu Serrurier, Sylvain Neswadba, Jean-Paul Imbert To cite this version: Mathieu Serrurier, Sylvain Neswadba, Jean-Paul Imbert. Increasing Air Traffic Control sim- ulations realism through voice transformation. AudioMostly 2009, Conference on Interaction with Sound, Sep 2009, Glasgow, United Kingdom. <hal-01166944> HAL Id: hal-01166944 https://hal-enac.archives-ouvertes.fr/hal-01166944 Submitted on 23 Jun 2015 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destin´ ee au d´ epˆ ot et ` a la diffusion de documents scientifiques de niveau recherche, publi´ es ou non, ´ emanant des ´ etablissements d’enseignement et de recherche fran¸cais ou ´ etrangers, des laboratoires publics ou priv´ es. brought to you by CORE View metadata, citation and similar papers at core.ac.uk provided by Scientific Publications of the University of Toulouse II Le Mirail

Upload: others

Post on 28-Nov-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Increasing Air Traffic Control simulations realism

through voice transformation

Mathieu Serrurier, Sylvain Neswadba, Jean-Paul Imbert

To cite this version:

Mathieu Serrurier, Sylvain Neswadba, Jean-Paul Imbert. Increasing Air Traffic Control sim-ulations realism through voice transformation. AudioMostly 2009, Conference on Interactionwith Sound, Sep 2009, Glasgow, United Kingdom. <hal-01166944>

HAL Id: hal-01166944

https://hal-enac.archives-ouvertes.fr/hal-01166944

Submitted on 23 Jun 2015

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinee au depot et a la diffusion de documentsscientifiques de niveau recherche, publies ou non,emanant des etablissements d’enseignement et derecherche francais ou etrangers, des laboratoirespublics ou prives.

brought to you by COREView metadata, citation and similar papers at core.ac.uk

provided by Scientific Publications of the University of Toulouse II Le Mirail

Increasing Air Traffic Control simulations realism through voicetransformation

Mathieu SerrurierIRIT toulouse France

[email protected]

Sylvain NeswadbaDSNA RD Toulouse [email protected]

Jean-Paul ImbertDSNA RD Toulouse France

[email protected]

Abstract. Improving realism in simulations is a critical issue. In some air traffic control (ATC) simulationswe use a pseudo-pilot which pilots up to fifteen aircraft. Thus, having the same voice for different aircraft in thecase of pseudo-pilot decreases the realism of the simulation and may be confusing for the controllers especially instudy context. In research context, a virtual aircraft piloted in a flight simulator is sometime needed in additionto the pseudo pilot. For simulation needs, the flight simulator aircraft must be merged with pseudo-pilot’s one.This is not possible without voice modification since the controller can distinguish the pilot voice. In this paperwe propose a method for transforming the voices of the pilot and the pseudo-pilot in order to have one particularvoice and cabin noise for each aircraft. The two experiments that have been conducted show that, through ourvoice modification algorithm, the realism of the simulation is enhanced and the voice biases disappears.

1 IntroductionGaver shows in [2] that sounds help the engagementof the users with a system. This is especially true insimulation systems where immersion highly dependson their realism and their capacity to reproduce thetarget environement. In air traffic control (ATC) thesounds essentially come from the communicationswith the pilots.

Simulations are a critical part of ATC. They serve atleast three purposes. The first one is the training of thestudent controllers. The second one is the evaluationof new protocols or maneuvers. The last one is thetesting of new interfaces or softwares. Three humanactors perform the simulation :

• Controllers : The controllers are in charge of avirtual traffic. The conditions are as close as pos-sible to the real ones. As in reality the voice is theunique way of communication among controllersand aircraft.

• Pseudo-pilots : Since it is too costly to havea pilot in a simulator for each virtual aircraft,pseudo-pilots are used. A pseudo pilot is a humanoperator that pilots simultaneously up to fifteenaircraft. Instructions for the aircraft are transmit-ted to the traffic simulator through a dedicatedHMI (human machine interface). The pseudo-pilot is in charge of the voice communication of

all the aircraft he pilots.

• Pilot in a flight simulator : It is usual, ina research context, to have one aircraft pilotedthrough a flight simulator in order to increase therealism of the simulation or test particular scenar-ios, in this case, the simulation scenario is focusedon the aircraft controlled by the pilot. The pilotin the simulator is only in charge of the voice com-munication of unique the aircraft he pilots.

The architecture of an ATC simulation is presentedin Figure 1. Controllers, pilot and pseudo-pilotare interacting through a traffic simulator and asimulated radio based on Voice over IP (VoIP)) .Thecommunications between controllers and pilots orpseudo-pilots are entirely vocal. The signal is encodedin 8Khz and 8bits PCM using µ-law optimisation.It is obvious that the voice is a crucial issue in ATC

simulations. The architecture described previouslyintroduces biases in the simulation. First of all, havingthe same voice for all the aircraft piloted by a givenpseudo-pilot may be very confusing and damagingfor the controller. Therefore, the aircraft that ispiloted by a pilot in a simulator is the only one thatis associated with a unique voice. This makes iteasily identified by the controllers. Controllers willthus focus on this particular aircraft since difficultiesin a simulation often come from the aircraft pilotedthrough the flight simulator. Finally, the quality of

[Increasing Air Traffic Control simulations realism through voice transformation]

Figure 1: ATC simulation architecture

the communication by VoIP is greater than the qualityof the signal in real life where radios are limited to5khz frequency and are disturbed by VHF (Very HighFrequencie) and cabin noises.

In this paper we propose to morph the voice of thepseudo-pilot and the pilot in order to have a distinctvoice for each aircraft. Voice morphing has beenstudied essentially for two purposes. The first one isfor vocal recognition [5]. The system of recognitionis tuned for a specific reference voice. The voicemorphing is then used to transform the voice of theuser into the reference one in order to perform therecognition. These methods require much informationon user and reference voice. Voice morphing canalso be used for masking voice for security purpose[4]. After being morphed the voice has to be nomore identifiable as the owner’s one. The obtainedvoice is no more human and therefore this methodis not suitable for our case. We propose to use amethod named formant shifting which changes thefundamental characteristics of the voice. The methodrequires few information on the initial and the targetvoices. The only constraint of the morphing is toobtain a voice that is human. We also modify thesignal in order to limit the frequency to 5khz and addcabin and VHF noises. Speech synthesis has beenconsidered in order to have a different voice for eachaircraft. However, this approach has been given updue to a lack of realism.

The paper is organized as follows. We first describethe algorithm of voice morphing and the other treat-ment made on the signal. Then two experiments are

presented. In the first one, we show that by increasingthe realism of the simulation we increase the perfor-mance of the controllers by allowing them to quicklyidentify the calling aircraft. The second experimentshows that the voice morphing algorithm enables tomask the voice of the pilot in the simulator in order toincrease the difficulty to identify him.

2 Signal processing

As pointed out in the introduction the voice is trans-mitted through a VoIP protocol. The signal is pro-cessed at two stages. In the frequency space, obtainedafter fast Fourier transformation, the formant shiftingalgorithm is applied in order to change the characteris-tic parameters of the voice. The signal is impaired byrestricting the frequency and clipping the magnitude.In the time dimension, cabin noises are added to thesignal. Due to the manipulation in the frequency space,the signal has to be resampled. As a tradeoff betweenperformance and quality, the signal is resampled 16times. Each modification depends on some parametersthat constitute a voice modification profile. A differentmodification profile is assigned for each aircraft, eventhe one controlled by the pilot on the simulator. Theprocessing of the signal is made just before the VoIPstage. The pseudo-pilot selects the aircraft that needsto answer the controllers, then the voice is automat-ically transformed according to the parameters asso-ciated with the aircraft. The software is implementedin JAVA. It runs without specific hardware, directly onthe machine that runs the pseudo pilots, usually a dualcore computer. The processing is made in real time.

2.1 Formant shifting

The formant shifting is the most important part of thevoice transformation process. Formant shifting is anextension of the pitch shifting algorithm which shiftsall the signal in the frequency space. Formant shift-ing shifts independently each formant in the frequencyspace. Formant shifting allows us to create more dif-ferent voices than pitch shifting. The formants arehigh energy areas in the frequency spectrum. Roughlyspeaking, they are the peaks in the signal. The first for-mant is usually bounded between 90 and 450Hz. Thisis the formant that changes when people are singingfor instance. It is usual to consider that the print ofa voice is defined by the first four formant frequencies(f1,f2,f3,and f4). The formant shifting algorithm con-sists in shifting the initial formant of the signal intosome target formant that corresponds to other realis-tic voice formants. Let us consider the following ini-tial voice v with f1 = 300hz, f2 = 900, f3 = 1800,f4 = 2900 and the target voice v′ with f

′1 = 250,

- 2 -

[Increasing Air Traffic Control simulations realism through voice transformation]

f′2 = 1000, f

′3 = 1650 and f

′4 = 3000. When shifting

the formant the signal in the [0, f1] frequency interval

will be linearly compressed in the [0, f′1] frequency in-

terval, the signal in the [f1, f2] frequency interval will

be linearly expanded in the [f′1, f

′2] frequency interval

and so on. The algorithm is based on the StephaneBernsee pitch shifting algorithm [1]The only preprocess step needed for formant shifting

Figure 2: Linear predictive coding for a voice sam-ple

is to determine the formants of the initial voice. Inorder to determine the position of the formants, weuse the linear predictive coding algorithm [3] (see fig.2). Linear predictive coding is used to represent thespectral envelope of a digital signal of speech in com-pressed form, using the information of a linear predic-tive model. Having the spectral envelope of the signal,the formants can be easily localized since they corre-spond to the local maximums of the envelope (see fig.3). Thus, each aircraft of the simulation will be asso-ciated to distinct target formants.

Figure 3: Formant detection for a voice sample

2.2 Signal processing

In order to have a digital signal more realistic, threeadditional operations are done in the frequency space :

• Signal equalization. In order to simulate theenvironment of the cockpit, the signal is equalizedaccording different possible equalization schemes.

• Frequency reduction. The frequencies upperthan 5Khz are removed according to the capacityof cockpit radio.

• Magnitude clipping. The magnitude of the sig-nal is clipped in order to simulate poor quality ra-dio. The amount of clipping is a parameter thatmay vary for each aircraft.

2.3 Cabin noises

In real world, different noises disturb the signal of thepilot voice. They can come from background, VHF orthe reactors of the plane which can be heard throw theradio. Each aircraft has a particular cabin noise that iseasily identifiable by the controller. In order to simu-late this, environmental sounds are added to the signalat the end of the process. These sounds have been pre-viously sampled and contain different reactors soundsand cockpit sounds. The sample and the intensity ofthe cabin noises differ for each aircraft.

3 ExperimentsIn order to check the effectiveness of our voice morph-ing algorithm, two experiments have been made. Inthe first one, we show that, by increasing the realismof the simulation, we increase the controller’s efficiencyfor identifying aircraft. In the second one we measurethe ability to hide the pilot’s voice during the simula-tion. The voice are modified by the following ways:

• Determine the profile of the speaker voice.

• Chosing randomly target formant in a restrictedrange (up tu 10

• Chosing randomly a cabin noise and its level

• Chosing randomly a signal equalization scheme(from a database), a frequency reduction and amagnitude clipping range.

Users use an headse for hearing sounds. No specifichardware and software are used.

3.1 Experiment 1

The hypothesis of the first experiments is that modify-ing the voice implies that the aircraft are easier identi-fiable by the controller. In real world, controllers iden-tify an aircraft through the callsign announced by thepilot together with the voice of the pilot and the cabinnoises.

- 3 -

[Increasing Air Traffic Control simulations realism through voice transformation]

3.1.1 Protocol

The experiment has been performed by 20 participantswithout known hearing problems. They are aged from20 years to 40 years. 4 are controllers. 16 are male,4 are female. They are not necessarily controllers orpilots. This is not a biais in our experiments since nocompetence in ATC are needed. The experiment focuson the capacity of a user to distinguish voices. Weconsider a simulation with five aircraft with distinctcallsigns (AirFrance 2035, Twinjet 07, BritAir 46EK, Fox Bravo X-ray). These aircraft are piloted bya unique pseudo-pilot. A set of sentences has beenrecorded for each aircraft. The goal of the subject isto associate as quickly as possible a sentence to thecorresponding aircraft. The callsign of the aircraftis pronounced in each sentence. The callsign can beindicated at the beginning, in the middle or at theend of the sentence, as in reality.

The experiment has two phases. In the first one, thevoice is not modified, and then the voice is the samefor all the sentences regardless to the aircraft. In thesecond one, the voice is modified according to a set ofmodification parameters (voice formant, cabin noises,equalizer, ...) that is associated to each aircraft. Thestarting phase is chosen randomly. These parametersdon’t change during the experiment. The order of thephrases is chosen randomly. Ten sentences are pro-nounced for each aircraft. Five labels corresponding toeach aircraft are presented to the subject. The labelsare in circles, all with the same size. The circles aredispatched regularly around the launch button. Whenthe subject clicks on launch button a sentence is played.The subject must then click as quickly as possible onthe label corresponding to the aircraft without makingany error. There is no need to listen all the sentenceto select a label. The time between the two clicks ismeasured. The position of the labels is fixed randomlyat the beginning and doesn’t change during the exper-iment. If the subject makes an error, the sentence isnot replayed. The GUI of the experiment is presentedin figure 4

3.1.2 Results

As expected, when the voice is not modified, thesubject needs to wait for of the callsign. When thevoices are modified, the aircraft can be recognizedby the voice or the cabin noises before the callsignis announced. With 10 sentences per aircraft thesubjects have time to memorize the signal parametersfor each aircraft and then can identify it more quickly.The results are summarized in the following table :

Figure 4: GUI of experiment 1

Results without modification with modification

Avg time in s 4.35 3.69

Error (%) 0.5 2.7

The average time for the phrases without modifi-cation and with modifications are respectively 4.35sand 3.69s. This corresponds to a performance gainequal to 18%. Student test (threshold=0.00025) showsthat the difference between the two average rates isstatistically significant. Some subjects refer that, eventhough they have identified the aircraft thanks to thevoice, they wait for confirmation with the callsign inorder to avoid error. The error rate increases from0.5% to 2.7%. This error rate is essentially due tothe goal of the experiment and remains acceptablefor an ATC simulation since controllers may ask forconfirmation of the callsign when they are not sure.Moreover, in real life, the poor quality of the radiosignal may lead to some error. The experiment doesnot allows us to determine to what extend the increaseof performance is due to the voice transformationor to the noise added. Even the cabin and radionoises are probably more easily identifiable, the voicetransformation is fundamental for the realism of thesimulation.

It is interesting to consider the progression of the av-erage time during the experiment. The figure 5 showsthe evolution of the average time for identification ona sliding window of size 15 (i.e. the average time forthe last 15 extracts for all the subjects). The first ob-servation that can be made is that average recognition

- 4 -

[Increasing Air Traffic Control simulations realism through voice transformation]

Figure 5: Average identification time (in s) evolu-tion among time for (sliding window of size 15)

time remains stationary when the voices are not mod-ified. This illustrates that there is no learning process.On the contrary, when considering the modified voices,the average idetification time decreases along the ex-periments. It shows that the subject learns to matcha callsign with a specific voice and cabin noise.

3.2 Experiment 2

The hypothesis we want to validate is the following :if we change with our algorithm the flight simulatorpilot voice and the pseudo-pilot voice in order to haveone different voice per aircraft, controllers are no moreable to determine which aircraft is piloted through theflight simulator. If the hypothesis is validated, it willavoid the bias of pilot identification.

3.2.1 Protocol

The experiment has been performed by 20 participantswithout known hearing problems. They are aged from20 years to 40 years. 4 are controllers. 16 are male,4 are female.They were not necessarily controllers.Eight people (1 female and 7 male, not necessarilypilot or pseudo-pilot) have recorded each 10 sentences,with the aeronautic phraseology. Some of them havetaken part in the experiment. The participants knewthe voice of people who had recorded the sentences.This is not a strong bias since it works againstour hypothesis and it is similar to the situationsencountered in ATC simulations.

The subjects are split into two groups. In the firstgroup, the voices are not modified. In the second groupall the voices are modified. The experiment set as fol-

Figure 6: GUI of experiment 2

lows. A sequence of five sentences is proposed to thesubject. Four of these sentences are from a pseudo-pilot (the same for the four sentences), one is from thepilot. For the second group, each sentence pronouncedby the pseudo-pilot is modified with different parame-ters (formant, equalizer, cabin noises, ...) . The voiceof the pilot in the simulator is modified too. The fivesentences are played randomly. After hearing all thevoices, the subject tries to identify the sentence thathas been pronounced by the pilot in the flight simu-lator. This consists of identifying the voice that haspronounced only one sentence. If the subject makesan error, the sentences are not replayed. There are28 sequences of 5 sentences. The GUI of the experi-ment is presented in figure 6. The choice of the pairpilot/pseudo-pilot is made randomly with the follow-ing constraints : a pair can only be chosen once, Asentence can be played two times, but not in the samesequence and with different modifications.

3.2.2 Results

The subjects in the group 1 and the group 2 havean average error rate equal to respectively 5.5% and46.5%. Even with modifications, the experimentshows that the female voice is easily identifiable. Theresults are summarized in the following table.

Error (%) Group 1 Group 2

with female voice 5.5 46.5

without female voice 7.1 56.2

Student’s test (threshold=0.00025) shows that thedifference between the two error rates is statisticallysignificant. As expected the error rate for group one is

- 5 -

[Increasing Air Traffic Control simulations realism through voice transformation]

close to 0. This is not surprising since without modifi-cations, the pilot voice is easily identifiable. The the-oretical maximum error rate for group two is 80% (itcorresponds to a random choice of the answer). Al-though this maximum is not reached in the experi-ment, the algorithm performs very well and tends tovalidate our hypothesis. The fact that the maximumis not reached can be explained by different reasons.First some parameters such as pronunciation rhythmcan not be modified by our algorithm. Female voiceis also easily identifiable. However, some biases in ourexperiment may not appear in real situation. Firstof all, in our experiment people that are not accus-tomed with aeronautic phraseology have some hesita-tions when reading sentences. Moreover, in air trafficsimulation, controllers do not focus on recognizing thepilot as in our experiment.

4 Discussion and perspectives

In this paper, we have proposed a method to increaserealism of air traffic simulation by modifying thevoice of the pseudo-pilots. The algorithm of voicemodification is based on the shifting of the character-istic parameters of the considered voice. The voicetransformation is made in real time and does notdisturb the simulation.

The first experiment is interesting in many ways.It illustrates the fact that we can use cabin noisesand voice to easily identify the aircraft. It is worthstressing that this increase of performance has beenobtained by decreasing the quality of the signal. Evenif the identification of the aircraft is not a criticalissue for simulation, by making it easier the workloadof controllers is decreased. Its also shows that theperformances of controllers can be increased byincreasing the realism sound context of the simulation.This increase of realism is very useful in the contextof simulation for student controllers.

The second experiment shows that the modificationof voice allows us to avoid the bias due to identificationof the flight simulator pilot. This also makes thesimulation less confusing for controllers since there isone different voice for each aircraft, as in real world.

The software is currently used for real simulations.The controllers judge the modified voice as realisticand report that it highly increases the realism of thesimulation. The participants of simulation report un-formally the following feedback:

• controllers told us that voice modification inducesa quicker and better immersion in the simulation.

• following a simulation with controllers whoweren’t aware of the voice modification, theycouldn’t say how many pseudo-pilots were in-volved (there was only one).

Voice tranformation indudes an extra cost for thepseudo-pilot. Indeed, it is mandatory to select theaircraft before speaking. When pseudo pilot interfaceallows for aircraft selection, as it is in our case, thepseudo-pilot reported that this cost remains acceptablein comparison with the improvement of the simulation.

Future work will consist in making the female voicesless indentifiable. On the other hand, it would be inter-esting to propose automatic tuning of voice parametersin order to have voices that are as different as possible.

References[1] S. Bernsee. Pitch shift-

ing using the fourier transform.http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/.

[2] W. Gaver. Sound support for collaboration. In EC-SCW’91: Proceedings of the second conference onEuropean Conference on Computer-Supported Co-operative Work, pages 293–308, Norwell, MA, USA,1991. Kluwer Academic Publishers.

[3] J. Makhoul. Linear prediction: A tutorial review.Proceedings of the IEEE, 63(4):561–580, 1975.

[4] P. Perrot, G. Aversano, and G. Chollet. Voice Dis-guise and Automatic Detection: Review and Per-spectives. In Progress in Nonlinear Speech Process-ing, pages 101–117. Springer Berlin / Heidelberg,2007.

[5] H. Ye and S. Young. Perceptually Weighted Lin-ear Transformation for Voice Conversion. In Eu-rospeech, pages 2409–2412, 2003.

- 6 -