
ARTICLE IN PRESS

0003-6870/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apergo.2007.01.013

Applied Ergonomics 38 (2007) 473–480

www.elsevier.com/locate/apergo

Datalink in air traffic management: Human factors issues in communications

Alex W. Stedmon a,*, Sarah Sharples a, Robert Littlewood a, Gemma Cox a, Harshada Patel a, John R. Wilson a,b

a Human Factors Research Group, School of Mechanical, Materials and Manufacturing Engineering, University of Nottingham, Nottingham, UK
b School of Safety Science, University of New South Wales, NSW, Australia

* Corresponding author. Tel.: +44 115 951 4068. E-mail address: [email protected] (A.W. Stedmon).

Accepted 31 January 2007

Abstract

This paper examines issues underpinning the potential move in aviation away from real speech radiotelephony (R/T) communications towards datalink communications involving text and synthetic speech. Using a novel air traffic control (ATC) task, two experiments are reported. Experiment 1 compared the use of speech and text, while Experiment 2 compared the use of real and synthetic speech communications. Results indicated that, in general, there were no significant differences between speech and text communications and that either type could be used without any main effects on performance. However, a number of specific differences were observed across the different phases of the scenarios, indicating that workload levels may be more varied when speech communications are used. Experiment 2 illustrated that participants placed a greater level of trust in real speech than in synthetic speech, and trusted true communications more than false communications (regardless of whether they were real or synthetic voices). The findings are considered in terms of datalink initiatives for future air traffic management, the importance placed on real speech R/T communications, and the need to develop more natural synthetic speech in this application area.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Datalink; Air traffic management; Flightdeck of the future; Air traffic control; Speech; Text; Communications

1. Air traffic management

The management and control of air traffic is a complex problem and, with air traffic levels set to double in the next 15 years, some degree of automation will be needed to enable the desired safe increases in air traffic capacity (Siemieniuch and Sinclair, 2001; Kirwan and Rothaug, 2001). The modern flightdeck–air traffic control (FD–ATC) system encompasses the integration of other aircrew, air traffic control operators (ATCOs), ground crew, and auxiliary agencies (such as airline companies and service staff), together with their related practices and procedures (Stedmon et al., 2003). For example, during a typical flight, a pilot will be in constant communication with other members of the flightcrew and with different ATCOs; will receive information from FD instruments and displays; and may develop an awareness of other activities occurring in nearby airspace by ‘eavesdropping’ on radio communications between other aircraft and ATCOs (Cox et al., 2006). These sources of information contribute to FD crew and ATCO attention demands, mental workload and situation awareness (SA), and will affect subsequent communications and/or behaviour within the FD–ATC system.

Aircraft safety during flight is highly dependent on information exchanges via radiotelephony (R/T) between ATCOs and pilots (Navarro and Sikorski, 1999). Since R/T communication bottlenecks and delays frequently occur, there is a need to investigate alternative modes of communication such as datalink (Wickens et al., 1997). It is estimated that 37% of current communication failures could be prevented if datalink replaced all standard verbal controller–pilot communications; if additional systems were also devised to check that pilot understanding matched a controller message, this would provide an additional 30% improvement (Gibson et al., 2001). Datalink is designed to relay communications between ATCOs and pilots using text-based digital information rather than conventional R/T communication channels (Kerns, 1991). However, datalink may also incorporate other digital formats, such as automated synthetic speech output.

Fig. 1. The ATC interface.

To examine the underlying human factors issues of potential datalink initiatives, two experiments were conducted using an ATC task and different modes of presenting information to users. Both experiments were based on the same paradigm, so that the first experiment provided a basis for the second. Each experiment was nonetheless independent and investigated separate issues associated with datalink: Experiment 1 examined the use of speech and text-based communications, while Experiment 2 examined the use of real and synthetic speech. Both experiments also considered SA, performance and attention variables and, taken together, offer a comprehensive examination of datalink information presentation.

2. Experiment 1: speech and text in ATC

This experiment investigated issues associated with speech and text communications in an ATC-based task. With datalink systems delivering information in the visual modality rather than the traditional auditory modality, there is a need to address the circumstances under which operators could miss critical information, or become habituated to visual stimuli (see Thorley et al., 2001). The wider implications of workload and SA in datalink communications also needed to be considered, since any task re-distribution was expected to contribute to safety by enabling controllers and pilots to maintain an up-to-date picture of the relevant situation (Rognin et al., 2001).

2.1. Method

2.1.1. Design of Experiment 1

The independent variable was mode of information presentation (plain text, coloured text, real speech, and synthetic speech). A between-participants design was used, with each participant undergoing two of the four scenarios (i.e. one speech and one text condition). To minimise any order effects, conditions were counterbalanced between participants. Dependent variable measures were: workload (NASA-TLX and continuous ratings); performance (response times (RTs) to rate continuous workload); attention (percentage of workload cues attended to and beacons recognised); vigilance (number of aircraft deviations noticed); communications (number of comments by participants); SA (memory test scores); and preference for mode of communication.
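The design above states only that the two conditions each participant received were counterbalanced between participants. One way such a scheme could be laid out, assuming every speech condition was paired with every text condition in both orders (2 × 2 × 2 = 8 sequences, each used by 4 of the 32 participants), is sketched below; the assignment rule and the function names are hypothetical, not the authors' procedure.

```python
from collections import Counter
from itertools import product

# Hypothetical scheme: the paper reports counterbalancing but not the exact assignment.
SPEECH = ("real speech", "synthetic speech")
TEXT = ("plain text", "coloured text")


def build_sequences():
    """Every (speech, text) pairing in both orders: 2 x 2 x 2 = 8 sequences."""
    sequences = []
    for speech, text in product(SPEECH, TEXT):
        sequences.append((speech, text))  # speech condition first
        sequences.append((text, speech))  # text condition first
    return sequences


def assign(n_participants):
    """Cycle through the sequences so that each one is used equally often."""
    sequences = build_sequences()
    return [sequences[i % len(sequences)] for i in range(n_participants)]


if __name__ == "__main__":
    plan = assign(32)
    # Each of the four conditions is completed by 16 participants...
    print(Counter(cond for seq in plan for cond in seq))
    # ...and each condition is the first scenario for 8 of them.
    print(Counter(seq[0] for seq in plan))
```

Under this assumed scheme every condition is run equally often in first and second position, which is what allows an order effect (such as the t(62) comparisons reported below) to be separated from a mode effect.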

2.1.2. Participants

Thirty-two participants (16 males, 16 females) were recruited for the study from an opportunistic sample. Their age ranged from 18 to 55 years (mean = 25 years). All participants spoke English as their first language, had normal or corrected-to-normal vision, and had no prior ATC experience.

2.1.3. Apparatus and materials: development of the ATC interface

A dynamic ATC task was required that was realistic enough to be ecologically valid but not so complex that naïve participants would find it too difficult. A Microsoft PowerPoint interface was developed based on field visits to ATC centres. A series of 260 ‘slides’ was presented as an animated sequence so that the ATC task followed a precise script which progressed at a pre-determined rate. In this way, each participant conducted the task over the same length of time, using the same information. An example of the interface (reproduced in black and white) is given in Fig. 1.

The interface represented a fictional ATC sector within which participants were responsible for monitoring the aircraft. The aircraft moved through the sector as part of the animation, with the screen refreshing every 6 s. At various stages throughout the scenario, and unknown to the participants, three SA tests were triggered in which the screen went blank and participants had to remember as much information from the previous slide as possible. The SA tests provided the opportunity to split the main scenario into four phases: low, medium, and high aircraft activity, and high activity with a near-miss situation.

On other occasions, icons appeared which represented navigation beacons, and participants were asked to state which beacon was nearest to an aircraft in their sector. The ATC task was presented with plain green text and icons on a black background.

The ATC task was run as a Microsoft PowerPoint presentation on a 700 MHz PC and monitor. Standard computer speakers, set at a constant volume throughout the experiment, were used to relay speech communications. For the SA tests, participants completed a SAGAT-style memory test (Endsley et al., 1997) on blank printouts of the ATC interface.

2.1.4. Apparatus and materials: paper-based materials

The experimenter used an observation sheet to record each participant’s comments throughout the experiment. Superlab Pro software was used to collect continuous workload ratings throughout the experiment, and a workload scale was provided to assist participants with their ratings. NASA-TLX questionnaires were also administered.

2.1.5. Procedure

Participants were familiarised with the experiment before they began the main trials and were instructed to monitor aircraft and comment on any behaviour which was unexpected or in conflict with aircraft communications. Conflicts occurred when aircraft did not follow the actions stated in their communications (e.g. turning to a different heading, or ascending/descending to a flight level other than the one stated). As the behaviour of the aircraft in the presentation could not be changed, participants conducted a monitoring task by attending to relevant situations occurring on the radar screen and using other information presented to them:

• electronic flight-strips were included on the interface to support SA and workload;
• speech files were incorporated into the presentation and presented through speakers as real voice recordings or synthetic speech;
• a text box was employed to present the text-based communications.

Approximately every 30 s, a computer prompt signalled participants to perform continuous workload ratings, which were recorded via the computer keyboard. On three occasions during each scenario participants were instructed to complete an SA test, and on four occasions they had to state which beacon was nearest to an aircraft in their sector. These tasks occurred without warning so that participants could not anticipate and prepare for them. After each condition, participants completed a NASA-TLX workload questionnaire. Upon completing the experiment, participants were thanked, debriefed and paid for their time. The experiment took approximately 35–40 min to complete.

2.2. Results and discussion

The data were tested for normality and equality of variance to check that they met the assumptions for parametric analysis. Analysis of variance (ANOVA) tests were conducted on the data using 1 × 4 between-participants ANOVAs for the different modes of information presentation (plain text, coloured text, synthetic speech, and real speech), and 1 × 4 between-participants ANOVAs for each condition across the four phases within each scenario. Post-hoc Tukey tests were conducted where appropriate.
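For readers less familiar with the 1 × 4 ANOVAs reported in this section, the sketch below shows how a one-way between-groups F ratio is formed from between-group and within-group sums of squares. It is a minimal illustration with hypothetical names; the paper does not state which statistics software was used, and the Tukey post-hoc step is not shown.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

With four groups (one per presentation mode or per phase), `df_between` is 4 − 1 = 3, which is the numerator degrees of freedom in the F(3, ·) values reported in the following sections.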

2.2.1. Workload (NASA-TLX)

Workload was assessed using the abridged ‘raw’ version of the NASA-TLX (NASA-RTLX), rated along six 100-point scales (Byers et al., 1989). The scores on the different scales were combined into an overall score, which was analysed before the individual scales. No significant effects were observed for information presentation (p > 0.05). A significant effect was observed for order of condition presentation [t(62) = 1.751, p < 0.05 (one-tailed)], indicating that participants rated their workload higher in the first scenario (mean workload = 49.62) than in the second scenario (mean workload = 43.74). No other significant effects were observed (p > 0.05).
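The paper says only that the six RTLX scales were combined into an overall score. The usual raw-TLX convention, assumed here, is the unweighted mean of the six subscale ratings, with no pairwise-comparison weighting step. A minimal sketch with hypothetical key names:

```python
# Assumed keys for the six standard NASA-TLX subscales
TLX_SCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Unweighted ('raw') TLX score: the mean of the six subscale ratings (0-100)."""
    missing = set(TLX_SCALES) - set(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    for scale in TLX_SCALES:
        if not 0 <= ratings[scale] <= 100:
            raise ValueError(f"{scale} rating out of range: {ratings[scale]}")
    return sum(ratings[s] for s in TLX_SCALES) / len(TLX_SCALES)
```

For example, ratings of 60, 20, 55, 40, 65 and 30 give an overall score of 270 / 6 = 45.0, on the same 0–100 scale as the means of 49.62 and 43.74 reported above.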

2.2.2. Workload (continuous ratings)

Continuous workload ratings were collected approximately every 30 s during each condition on a 9-point scale (where 1 = low workload and 9 = high workload). No significant effects were observed for information presentation (p > 0.05). Across the four phases, significant main effects were observed for real speech [F(3,63) = 3.052, p < 0.05 (two-tailed)] and synthetic speech [F(3,63) = 3.301, p < 0.05 (two-tailed)]. Post-hoc tests revealed that workload was rated higher in Phase 2 than in Phase 4 in both the real speech and synthetic speech conditions. No other significant effects were observed (p > 0.05).

2.2.3. Performance (response time to provide continuous workload ratings)

Response time data were collected each time participants provided a continuous workload rating. No significant effects were observed for information presentation (p > 0.05). Across the four phases, a significant main effect was observed for real speech [F(3,63) = 6.144, p < 0.01 (two-tailed)]. Post-hoc tests showed that RTs in Phase 1 were significantly slower than in Phase 2 and Phase 4. No other significant effects were observed (p > 0.05).

2.2.4. Attention (to workload cues)

The percentage of continuous workload cues responded to by participants was analysed. No significant effects were observed for information presentation (p > 0.05). Across the four phases, a significant main effect was observed for real speech [F(3,63) = 4.715, p < 0.01 (two-tailed)]. Post-hoc tests showed that fewer workload cues were attended to in Phase 1 than in Phase 4. No other significant effects were observed (p > 0.05).

2.2.5. Attention (beacons recognised)

The number of beacons correctly identified was analysed. No significant effects were observed for information presentation (p > 0.05). Across the four phases, significant main effects were observed for plain text [F(3,63) = 3.413, p < 0.05 (two-tailed)], coloured text [F(3,63) = 3.413, p < 0.05 (two-tailed)], and synthetic speech [F(3,63) = 6.600, p < 0.01 (two-tailed)]. Post-hoc tests revealed that in all three conditions more mistakes were made identifying the correct beacon in Phase 3 than in either Phase 2 or Phase 4, and that in the synthetic speech condition more mistakes were also made in Phase 3 than in Phase 1. No other significant effects were observed (p > 0.05).

Table 1
Summary of Experiment 1 results

Workload (NASA-TLX)
• No differences between speech and text, but workload was higher on the first trial.

Workload (continuous ratings)
• For both real and synthetic speech, workload was rated higher in Phase 2 than in Phase 4.
• Response times for real speech were significantly slower in Phases 2 and 4.

Beacons recognised
• More beacons were recognised in the plain text, coloured text and synthetic speech conditions in Phase 3 than in either Phase 2 or 4.

Workload cues
• With real speech, fewer workload cues were attended to in Phase 1 than in Phase 4.

Aircraft deviations
• More deviations were noticed in Phase 1 than in the other phases.
• More aircraft deviations were noticed with speech than with text.

Communications
• More communications were always made in Phases 1, 2, and 3 than in Phase 4.

SA
• SA improved on the second completion of the task.
• More aircraft positions were remembered during the second completion of the task.

Communication mode preferences
• Real speech and coloured text were the most preferred modes of communication.

2.2.6. Communications (number of comments by participants)

The total number of comments made by participants was analysed. As the data were nominal, non-parametric Kruskal–Wallis analyses were conducted. No significant effects were observed for information presentation (p > 0.05). Across the four phases, significant effects were observed for plain text [χ²(3) = 30.704, p < 0.001, two-tailed], coloured text [χ²(3) = 33.389, p < 0.001, two-tailed], real speech [χ²(3) = 31.657, p < 0.001, two-tailed] and synthetic speech [χ²(3) = 37.219, p < 0.001, two-tailed]. In all cases more comments were made in Phases 1, 2, and 3 than in Phase 4. No other significant effects were observed (p > 0.05).
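The Kruskal–Wallis test ranks all comment counts together and compares the rank sums of the phases; H is referred to a χ² distribution with k − 1 degrees of freedom, which for four phases gives the χ²(3) reported above. The minimal pure-Python sketch below includes the standard correction for tied ranks, which is likely to matter with small integer counts. The function names are hypothetical and this is not presented as the authors' analysis software.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H, corrected for ties (fails if every value is tied)."""
    values = [x for g in groups for x in g]
    n = len(values)
    ranks = average_ranks(values)

    # Sum of (rank sum squared / group size) over groups, in input order
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))
```

Passing one list of comment counts per phase (four lists) would give a value to compare against the χ²(3) statistics above.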

2.2.7. Vigilance (number of aircraft deviations noticed)

The number of aircraft deviations noticed in each condition was analysed. No significant effects were observed for information presentation (p > 0.05). Across the four phases, significant main effects were observed for plain text [F(3,63) = 19.069, p < 0.001 (two-tailed)]; coloured text [F(3,63) = 12.948, p < 0.001 (two-tailed)]; real speech [F(3,63) = 22.238, p < 0.001 (two-tailed)]; and synthetic speech [F(3,63) = 43.085, p < 0.001 (two-tailed)]. Post-hoc tests revealed that in all conditions more deviations were noticed in Phase 1 than in Phases 3 and 4. A significant effect was observed for speech [t(62) = 1.997, p < 0.05 (two-tailed)], indicating that more aircraft deviations were noticed using speech than text. No other significant effects were observed (p > 0.05).

2.2.8. SA (memory test scores)

SA was assessed via a SAGAT-style memory task with reference to aircraft on the display (Endsley et al., 1997). Performance was coded according to three factors for each aircraft: position, direction, and identity (flight codes and flight levels). This provided the percentage of the maximum amount of information it was possible to recall across the three SA tests in the experiment (an overall percentage), as well as percentage scores for each of the three factors. No significant effects were observed for information presentation (p > 0.05). A significant effect was observed for the order of condition presentation [t(62) = −2.170, p < 0.05 (two-tailed)], indicating that SA improved the second time participants conducted the task. When the components of the SA test (position, direction, and identity) were analysed independently, a significant effect was observed for position [t(62) = −2.528, p < 0.05 (two-tailed)], indicating that participants remembered more aircraft positions the second time they conducted the task. No other significant effects were observed (p > 0.05).
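The SA scoring above expresses recall as a percentage of the maximum possible, both overall and per factor. A minimal sketch, assuming (hypothetically) that each aircraft on the frozen display earns one point per factor when that factor is recalled correctly; the paper does not describe how partial recall, such as a correct flight code with a wrong flight level, was scored.

```python
FACTORS = ("position", "direction", "identity")


def sa_scores(recall):
    """Percentage SA scores from per-aircraft recall judgements.

    recall: list of dicts, one per aircraft on the frozen display, mapping each
    factor to True (recalled correctly) or False. One point per factor is assumed.
    """
    n = len(recall)
    per_factor = {
        f: 100.0 * sum(int(aircraft[f]) for aircraft in recall) / n for f in FACTORS
    }
    points = sum(int(aircraft[f]) for aircraft in recall for f in FACTORS)
    overall = 100.0 * points / (n * len(FACTORS))
    return overall, per_factor
```

For four aircraft with 3, 2 and 2 correct recalls of position, direction and identity respectively, this gives per-factor scores of 75%, 50% and 50% and an overall score of 7/12 ≈ 58.3%.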

2.2.9. Preference for mode of communication

Participants were asked for their preferred mode of communication: 47% of participants preferred the speech modes, while 53% preferred the text modes. Broken down by individual mode, 31% expressed a preference for real speech, 31% for coloured text, 22% for plain text, and 16% for synthetic speech.
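The preference percentages can be checked against N = 32. The only whole-number counts that round to 31%, 31%, 22% and 16% are 10, 10, 7 and 5, which sum to 32 and give speech and text totals of 15/32 ≈ 47% and 17/32 ≈ 53%. These counts are a back-calculation for illustration, not figures reported by the authors.

```python
# Back-calculated counts (NOT reported in the paper): the only integer counts
# out of N = 32 that round to the published 31/31/22/16 percentages.
INFERRED_COUNTS = {"real speech": 10, "coloured text": 10, "plain text": 7, "synthetic speech": 5}
SPEECH_MODES = {"real speech", "synthetic speech"}


def preference_percentages(counts):
    """Whole-number percentages per mode, plus speech and text totals."""
    n = sum(counts.values())
    shares = {mode: round(100 * c / n) for mode, c in counts.items()}
    speech = sum(c for mode, c in counts.items() if mode in SPEECH_MODES)
    return shares, round(100 * speech / n), round(100 * (n - speech) / n)
```

Calling `preference_percentages(INFERRED_COUNTS)` reproduces both the per-mode figures and the 47%/53% split, which supports reading the per-mode percentages as shares of all 32 participants.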

2.3. Summary of Experiment 1

A summary of the findings is presented in Table 1. Experiment 1 illustrated that, across the measures, there was no difference between speech and text communications, other than that participants noticed more aircraft deviations when information was communicated via speech than via text. A number of specific differences were observed across the different phases of the scenarios; these are considered in Section 4.


3. Experiment 2: real and synthetic speech in ATC

This experiment investigated differences in the level of trust placed in real and synthetic communications, and also the perception of male and female synthesised speech in relation to trust (Mullennix et al., 2003). Four versions of the ATC task were used, each containing four types of speech (real male, real female, synthetic male, and synthetic female) counterbalanced across the different aircraft in each version. The text box was not used, as speech communications were used throughout, and the electronic flight-strips were also not employed, as they might have affected the trust ratings if they conflicted with verbal information. Because the accuracy of communications was manipulated in this experiment, a balanced number of correct and incorrect communications was presented in each communication mode. The ATC task was presented with plain green text and icons on a black background (i.e. the same as the plain text condition in Experiment 1).

3.1. Method

3.1.1. Design

A within-participants design was employed, as all the modes of speech communication were included in each scenario. The scenarios were counterbalanced between participants so that any order effects were minimised. The independent variables were mode of communication (real speech and synthetic speech) and accuracy of communication (true or false). True and false communications related to the subsequent behaviour of aircraft: a false communication such as “Flight BA234 turning clockwise through 90 degrees” would not be carried out as stated, whereas for a true communication the aircraft would perform the manoeuvre as stated. The true and false versions of each communication were balanced across the conditions. Dependent variable measures were: trust (subjective ratings); performance (RTs to rate trust and number/type of communications responded to); and SA (memory test scores).

3.1.2. Participants

Forty-eight participants (24 males, 24 females) were recruited for the study from an opportunistic sample. Their age ranged from 19 to 23 years (mean = 20.9 years). All participants spoke English as their first language, had normal or corrected-to-normal vision, and had no prior ATC experience.

3.1.3. Apparatus

The ATC task was run on the same apparatus as in Experiment 1. A JVC digital video camera was used to record each session so that RTs for trust ratings could be calculated. Synthetic speech was produced using the AT&T Natural Voices text-to-speech engine.

3.1.4. Procedure

Participants were instructed to monitor the aircraft and provide a verbal trust rating for each command they heard. On three occasions during the experiment, participants were instructed to complete an SA test. These occurred without warning so that participants could not prepare for them. Once the main ATC scenario ended, participants were thanked, debriefed and paid for their involvement. The experiment took approximately 35–40 min to complete.

3.2. Results and discussion

The data were tested for normality and equality of variance to check that they met the assumptions for parametric analysis. Post-hoc Tukey tests were conducted where necessary.

3.2.1. Trust (subjective ratings)

Trust in each communication was measured on a simple 10-point scale (where 0 = no trust and 10 = full trust), and mean ratings were analysed. Significant main effects were observed for mode of communication [F(1,47) = 15.986; p < 0.001 (two-tailed)] and for accuracy of communication [F(1,47) = 680.855; p < 0.001 (two-tailed)], indicating that real speech was trusted more than synthetic speech, and that true statements were trusted more than false statements. A significant interaction was observed for mode × accuracy of communication [F(1,47) = 7.948; p < 0.01 (two-tailed)]. Post-hoc analysis revealed that true communication in real speech was trusted more than false communication in real speech, true communication in synthetic speech, or false communication in synthetic speech. In addition, true communication in synthetic speech was trusted more than false communication in real or synthetic speech. No other significant effects were observed (p > 0.05); that is, there was no effect of voice gender, a surprising result which needs to be explored further in subsequent work.
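Each effect in the 2 (mode) × 2 (accuracy) within-participants analysis has one numerator degree of freedom, so its F(1, n − 1) equals the square of a one-sample t-test on a per-participant contrast. The sketch below illustrates that equivalence; the cell names are hypothetical and it is not presented as the authors' analysis. With n = 48 participants, the returned degrees of freedom are (1, 47), matching those reported above.

```python
import math
from statistics import mean, stdev


def contrasts(cells):
    """Per-participant contrasts for a 2 (mode) x 2 (accuracy) within design.

    cells: dict with hypothetical keys "real_true", "real_false", "synth_true",
    "synth_false", each holding one participant's mean trust rating in that cell.
    """
    rt, rf = cells["real_true"], cells["real_false"]
    st, sf = cells["synth_true"], cells["synth_false"]
    return {
        "mode": (rt + rf) / 2 - (st + sf) / 2,      # real minus synthetic
        "accuracy": (rt + st) / 2 - (rf + sf) / 2,  # true minus false
        "interaction": (rt - rf) - (st - sf),       # difference of differences
    }


def f_from_contrast(scores):
    """F(1, n-1) for one contrast: the square of its one-sample t against 0."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)
```

In use, one would compute `contrasts(...)` for every participant, collect, for example, the `"mode"` values into a list, and pass that list to `f_from_contrast`. Rescaling a contrast does not change its t, so the halving in the main-effect contrasts is only for readability.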

3.2.2. Performance (RTs to rate trust)

RTs for the trust ratings were calculated from the end of a communication being presented to the verbal response given by participants. A significant main effect was observed for mode of communication [F(1,31) = 10.449; p < 0.01 (two-tailed)], indicating that real speech was responded to more quickly than synthetic speech. No other significant effects were observed (p > 0.05).
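The RTs here were derived from the video recording of each session. Assuming the footage was coded frame by frame, an RT is the number of frames between the end of the communication and the onset of the verbal rating, divided by the frame rate. The 25 fps (PAL) rate and the function name below are assumptions; the paper reports neither.

```python
FPS = 25  # assumed frame rate (PAL video); the paper does not state one


def response_time_s(message_end_frame, response_onset_frame, fps=FPS):
    """Seconds from the end of a communication to the onset of the verbal rating."""
    if response_onset_frame < message_end_frame:
        raise ValueError("response starts before the message ends")
    return (response_onset_frame - message_end_frame) / fps
```

At 25 fps each frame is 40 ms, so a response beginning 45 frames after the message ends corresponds to an RT of 1.8 s. This frame duration sets the resolution of any RT difference between real and synthetic speech.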

3.2.3. Performance (response rate of number/type of commands)

Performance was also assessed by the number and type of communications to which participants responded. To analyse accuracy of response, the number of responses for each combination of communication and accuracy was calculated. The data were then transformed by taking the square root of each score. A significant main effect was observed for response [F(1,47) = 448.377; p < 0.001 (two-tailed)], indicating that the number of correctly identified communications was higher than the number of incorrectly identified communications. No other significant effects were observed (p > 0.05).
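Taking the square root of each score, as described above, is a common way to stabilise the variance of count data before an ANOVA. A minimal sketch of the counting and transform steps, using hypothetical (mode, accuracy, outcome) labels:

```python
import math
from collections import Counter


def response_counts(responses):
    """Count responses per (mode, accuracy, outcome) combination.

    responses: iterable of tuples such as ("real", "false", "correct");
    the labels are hypothetical.
    """
    return Counter(responses)


def sqrt_transform(counts):
    """Square-root transform, commonly used to stabilise the variance of counts."""
    return {key: math.sqrt(n) for key, n in counts.items()}
```

For instance, counts of 9 and 4 become 3.0 and 2.0. The transform compresses differences between large counts more than between small ones, and that compression is what evens out the variance across cells.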

3.2.4. SA (memory test scores)

As in Experiment 1, SA was assessed via a SAGAT-style memory task with reference to aircraft on the display (Endsley et al., 1997). Performance was coded in the same way, according to the three factors of position, direction, and identity of each aircraft. A significant main effect was observed between the tests [F(2,94) = 17.477; p < 0.001 (two-tailed)]. Post-hoc analyses revealed a significant increase in SA between Tests 1 and 3, as well as between Tests 2 and 3. Each of the three factors was analysed, and significant effects were observed for position [F(2,94) = 13.155; p < 0.001 (two-tailed)] and direction [F(2,94) = 6.935; p < 0.01 (two-tailed)]. Post-hoc analyses revealed that position SA increased between Tests 1 and 3 and also between Tests 2 and 3. Direction SA also increased between Tests 2 and 3. No other significant effects were observed (p > 0.05).

3.3. Summary of Experiment 2

A summary of the findings is presented in Table 2. Experiment 2 illustrated that real speech was responded to more quickly and trusted to a greater degree than synthetic speech. There were also clear performance effects for SA based on familiarity with the task.

4. General discussion

Experiment 1 investigated the use of plain text, coloured text, real speech, and synthetic speech. These were considered against workload, performance, attention, vigilance, communications, SA, and preference for mode of communication. Although no effects of information presentation were observed in the NASA-TLX data, under the real and synthetic speech conditions continuous workload was rated higher in Phase 2 than in Phase 4. This indicated that speech was more sensitive than text in assessing workload, and also validated the scenario design, in which aircraft activity was higher in Phase 2 than in Phase 4. RTs to rate continuous workload also showed an effect for real speech, indicating that participants were quicker at rating their workload in Phase 1 than in Phases 2 or 4. This may have been because there was less activity in Phase 1, so participants could concentrate on the continuous workload ratings, while in Phases 2 and 4 there was more activity, which had an impact on RTs. An interesting finding, however, is that there was a significant difference in the percentage of workload cues attended to: fewer cues were attended to in Phase 1 than in Phase 4. This may have been because participants were less familiar with the need to register their continuous workload in the early part of the experiment, even though they were quicker at responding when they did remember. A significant difference was observed across all conditions for the number of aircraft deviations: more deviations were recognised in Phase 1 than in either Phase 3 or 4. This would seem to indicate that participants were more vigilant in the early part of the scenario, when there was less activity. Taken together with the findings for RTs to rate workload and for attention to workload cues, this suggests that, as there was less activity at the start of the scenario, participants may have found it easier to attend to the aircraft behaviour without the disruption of the increased activity later in the scenario.

In relation to beacon identification, there was a significant effect of scenario phase in the plain text, coloured text, and synthetic speech conditions, with more mistakes made in Phase 3 than in either Phase 2 or 4. Whilst Phase 2 was rated higher for workload, Phase 3 contained a near-miss incident, which may have disrupted general attention as participants focused on this activity; this might explain the poor performance in this phase of the scenario. The number of communications that participants made showed that more were made in Phases 1, 2 and 3 than in Phase 4; however, this last phase was shorter than the other phases, which would have had an impact on the number of communications made.

A significant difference in NASA-TLX workload was observed according to the order in which participants conducted the trials: the second time they performed the task, workload decreased, perhaps due to a practice effect and increased familiarity with the domain and task demands. An order effect was also observed for SA, as participants performed better the second time they conducted the task. Further analysis showed that memory for aircraft position was the factor underpinning this effect.

Table 2
Summary of Experiment 2 results

Trust
• Real speech was trusted more than synthetic speech.
• True statements were trusted more than false statements.

Response times to rate trust
• Real speech was responded to more quickly than synthetic speech.

Communications response rate
• The number of correctly identified communications was higher than the number of incorrectly identified communications.

SA
• There was a significant increase in SA between Tests 1 and 3, as well as between Tests 2 and 3.
• Position SA increased between Tests 1 and 3, and also between Tests 2 and 3.
• Direction SA increased between Tests 2 and 3.

In relation to general preferences for mode of communication, participants favoured real speech and coloured text equally, and speech and text overall received approximately equal preference ratings. This supports the general finding that speech and text communications did not produce any significant main effects, and therefore either could be used. Only the greater sensitivity of speech (in relation to workload ratings and the response time to rate workload) suggests that it could be used to monitor user behaviour better than the other modes.

The results of Experiment 2 show that participants placed a greater level of trust in real than in synthetic speech. They also show that participants trusted true communications more than false ones, regardless of whether these were presented in real or synthetic voices. It was also found that participants took less time to respond to real speech than to communications presented via synthetic speech. Further, the number of correctly identified communications was higher than the number of incorrectly identified communications. The level of SA experienced by participants increased as the experiment progressed; in particular, awareness of position and direction improved throughout the experiment.

Several conclusions can be drawn from this. Participants placed a greater degree of trust in real speech than in synthetic speech. This would appear to support the notion that it is hard to establish a trusting relationship with someone who cannot be seen face to face (Riegelsberger et al., 2002). Although participants could not see the other parties, real speech was trusted more, perhaps because it could be assumed to come from a real person. With regard to the implementation of datalink, this suggests that further research is needed, perhaps into the realism of synthesised speech. Experiment 2 also showed that participants rated true statements significantly higher for trust than false statements. This implies that participants were responding to statements accurately and could distinguish between true and false statements with relative ease.

Participants also took less time to respond to real speech and longer to respond to communications presented via synthetic speech. This may have been because real speech was easier to process, or because natural prosody made it easier to track what was said and therefore to respond. The observed interaction showed that true statements presented in real speech were rated higher than either factor produced in isolation. Similarly, false statements presented in synthetic speech were rated lower than either factor produced in isolation. In both cases, the two factors appear to reinforce trust ratings beyond the effect of either one alone.

The lack of difference in accuracy between synthetic and real speech could be attributed to participants taking longer to reach the same level of accuracy. Indeed, this seems to have been the case. Participants took significantly longer to respond to synthetic statements than to real statements, even though the accuracy of their responses remained unchanged.

During periods when task load is high, a pattern known as the 'complacency effect' becomes apparent, in which users tend to over-trust an automated tool (Parasuraman et al., 1993). System designers and users need to understand how systems deal with, and represent, uncertainty. Information should then be neither blindly accepted nor treated as wholly unreliable (Cutler and Stedmon, 1999). Given the participants' lack of familiarity with the task, it can only be assumed that their workload levels were higher than they would have been had they been more practised.

The level of SA experienced by participants increased as Experiment 2 progressed. In particular, awareness of position and direction increased throughout the trials. However, this may have been the result of a number of factors. The increase in SA would appear to support the SA literature, in which decision making is more effective with increased knowledge and experience (Endsley, 1988). However, SA is also vulnerable when the environment is crowded, complex and, more importantly, unfamiliar to a user. This does not appear to have been the case here, but this may be because the design involved a monitoring task rather than a fully interactive ATCO task.

The SAGAT test was designed to test only the first level of SA, the perception of elements in the environment (Endsley et al., 1997). With this in mind, it could be argued that the test used functioned more as a short-term memory test than as a full investigation of SA. If so, differing levels of information retention could be attributed to the allocation of cognitive resources to different areas. For example, participants may have had a better memory for aircraft perceived to be more important, such as those involved in a potential collision or those about to descend into the airport (Gronlund et al., 1998). Relating this back to short-term memory, in the ATC domain it is important to maintain an overall 'picture' of the important aircraft in a particular sector, which will have an impact on SA (Isaac, 1997). Further, pilots and ATCOs need to use the visuo-spatial sketchpad to retain information about where aircraft are located in the airspace. This is essential for the controller if the display is momentarily lost from view (Wickens et al., 1998).

5. Conclusions

This paper has considered the importance of speech communications in ATC, which has traditionally relied upon direct R/T. The two experiments examined real speech, synthetic speech and text in an ATC task, focusing on performance, vigilance, attention, workload, trust, and SA. The findings suggest that there are no significant main differences between speech and text. However, real speech offers a more sensitive measure of workload and performance differences, and it is trusted more than synthetic speech.

The introduction of datalink in the form of synthetic speech has implications for the pilots and ATCOs using the system. Aspects of speech such as perceived urgency or emotion, conveyed via paralinguistic information, could affect the degree of trust and the speed of response (Littlewood, 2004). Most synthetic speech systems are approaching the realism of human voices. However, they are still readily identified as machine-generated because characteristics such as emotion, mood, and personality are missing (Murray et al., 1996). This suggests that ATCOs and pilots may find it difficult to establish an effective working relationship with this type of system in its current form (Littlewood, 2004).

With increasing demand for speech technology, there is a growing need for systems that sound natural and convey emotion and other pragmatic effects. Prototype systems have been developed that use rules to alter voice, pitch, and timing in speech via a commercial synthesiser, but such systems still require extensive research (Littlewood, 2004). In the aviation domain, this need is heightened by the pressure to achieve high levels of safety and efficiency in air traffic flow. When information is transferred from ATCO to pilot via datalink, it has to be processed more deeply and analysed more carefully. This has the potential to distract attention from the primary tasks being undertaken.

With so much automation predicted for future air traffic management, concern has been expressed about the future role of the controller. With the introduction of datalink, ATC operators would play a greater role in managing and optimising the smooth flow of communications through the use and supervision of automation tools. A better understanding is therefore needed of the relationship between the pilot, ATC, and the content of the data transmitted within the FD–ATC 'system' (Stedmon et al., 2003). When considering these aspects, it is therefore crucial to establish the extent to which datalink, along with different levels of task delegation, supports or detracts from safe operations in the FD system of the future.

Acknowledgement

The work presented in this paper is supported by the UK Engineering and Physical Sciences Research Council, Project GR/R86898/01: Flightdeck and Air Traffic Control Evaluation.

References

Byers, J.C., Bittner, A.C., Hill, S.G., 1989. Traditional and raw task load index (TLX) correlations: are paired comparisons necessary? In: Mital, A. (Ed.), Advances in Industrial Ergonomics and Safety I. Taylor & Francis, London.

Cox, G., Stedmon, A.W., Nichols, S.C., Jackson, S., Wilson, J.R., Milne, T.J., 2006. The flight deck of the future: field studies in datalink and freeflight. In: Cook, M.J., Noyes, J.M., Masakowski, Y. (Eds.), Decision Making in Complex Systems. Ashgate, Aldershot.

Cutler, H., Stedmon, A.W., 1999. Representing uncertainty in advanced navigation and situational awareness displays. In: Harris, D. (Ed.), Engineering Psychology and Cognitive Ergonomics—Vol. 3. Ashgate, Aldershot.

Endsley, M.R., 1988. Design and evaluation for situation awareness enhancement. In: Proceedings of the Human Factors Society 32nd Annual Meeting. Human Factors Society, Santa Monica, CA, pp. 97–101.

Endsley, M.R., Mogford, R., Allendoerfer, K., Stein, E., 1997. Effect of Free Flight Conditions on Controller Performance, Workload, and Situation Awareness: A Preliminary Investigation of Changes in Locus of Control Using Existing Technologies. Texas Tech University, Lubbock.

Gibson, H., Megaw, T., Donohoe, L., 2001. Failures in pilot-controller communications and their implications for datalink. In: Harris, D. (Ed.), Engineering Psychology and Cognitive Ergonomics—Vol. 5. Ashgate, Aldershot.

Gronlund, S.D., Ohrt, D.D., Dougherty, M.R., Perry, J.L., Manning, C.A., 1998. The role of memory in air traffic control. J. Exp. Psychol. 4 (3), 263–280.

Isaac, A.R., 1997. Situational awareness in air traffic control: human cognition and advanced technology. In: Harris, D. (Ed.), Engineering Psychology and Cognitive Ergonomics. Ashgate, Aldershot.

Kerns, K., 1991. Datalink communication between controllers and pilots: a review and synthesis of the simulation literature. Int. J. Aviat. Psychol. 1 (3), 181–204.

Kirwan, B., Rothaug, J., 2001. Finding ways to fit the automation to the air traffic controller. In: Hanson, M. (Ed.), Contemporary Ergonomics 2001. Taylor & Francis, London.

Littlewood, R., 2004. Evaluation of speech commands in ATC. Unpublished B.Sc. Dissertation. University of Nottingham, UK.

Mullennix, J.W., Stern, S.E., Wilson, S.J., Dyson, C., 2003. Social perception of male and female computer synthesised speech. Comput. Hum. Behav. 19 (4), 407–424.

Murray, I.R., Arnott, J.L., Rohwer, E.A., 1996. Emotional stress in synthetic speech: progress and future directions. Speech Commun. 20 (4), 85–91.

Navarro, C., Sikorski, S., 1999. Datalink communication in flight deck operations: a synthesis of recent studies. Int. J. Aviat. Psychol. 9, 361–376.

Parasuraman, R., Molloy, R., Singh, I.L., 1993. Performance consequences of automation induced complacency. Int. J. Aviat. Psychol. 3, 1–23.

Riegelsberger, J., Sasse, A., McCarthy, D., 2002. The researcher's dilemma: evaluating trust in computer-mediated communication. Int. J. Hum. Comput. Interact. 32 (4), 342–367.

Rognin, L., Grimaud, I., Hoffman, E., Zeghal, K., 2001. Implementing changes in controller-pilot task distribution: the introduction of limited delegation of separation assurance. In: Proceedings of HESSD-01 4th International Workshop on Human Error, Safety and Systems Development, 11–12 June, 2001, Linkoping, Sweden.

Siemieniuch, C.E., Sinclair, M.A., 2001. The process owner: a role to overcome problems of manufacturing complexity and organisational learning. In: Hanson, M. (Ed.), Contemporary Ergonomics 2001. Taylor & Francis, London.

Stedmon, A.W., Nichols, S.C., Cox, G., Neale, H., Jackson, S., Wilson, J.R., Milne, T.J., 2003. Framing the flightdeck of the future: human factors issues in free flight and datalink. HCI International '03. In: Proceedings of the 10th International Conference on Human–Computer Interaction. Lawrence Erlbaum Associates.

Thorley, P., Hellier, E., Edworthy, J., 2001. Habituation effects in visual warnings. In: Hanson, M. (Ed.), Contemporary Ergonomics 2001. Taylor & Francis, London.

Wickens, C.D., Mavor, A.S., Parasuraman, R., McGee, J.P., 1997. Flight to the Future: Human Factors in Air Traffic Control. National Academy Press, Washington DC.

Wickens, C.D., Gordon, S.E., Liu, Y., 1998. An Introduction to Human Factors Engineering. Addison-Wesley Longman Inc, USA.