a boostedt ree frameworkfor runway occupancy and exitprediction · 2018-12-05 · fig. 1). runway...

9
A Boosted Tree Framework for Runway Occupancy and Exit Prediction Dario Martinez, Seddik Belkoura and Samuel Cristobal Innaxis Research and Foundation Floris Herrema Delft University of Technology Philipp W¨ achter Austro Control Abstract—This paper presents a machine learning algorithm trained to predict the actual runway occupancy times at Vienna airport. Runway occupancy times are usually estimated by individual studies on previous operations or analytical methods. However, due to the uncertainty of the actual operations, wide safety margins need to be applied by air traffic controllers. Finding an acceptable compromise between the maximisation of the throughput and minimisation of the risk may improve runway utilisation. In the future, machine learning models could boost controllers’ confidence by g iving m ore a ccurate p redictions on expected runway occupancy times, resulting in a smaller buffer that ultimately increases capacity without compromising safety levels. The training of the machine learning model is focused on runway 34 of Vienna airport. Different predictive models com- pute the expected runway occupancy time and expected exit at different distances from threshold. Features are engineered based on meteorological conditions, sequence of flights, aircraft trajectories and flight p lan. At the end of the paper, a l ist of the relevance importance of precursors is presented exactly at the threshold and 2NM ahead of it. Keywords—Runway occupancy, prediction, machine learning, lightGBM. I. INTRODUCTION In the past, the most commonly used approach to increase airport capacity was to modify infrastructure, e.g. additional runway or terminals. However, this approach is difficult and costly to implement. On the other hand, recent developments in Artificial Intelligence and Machine Learning (ML) have proven to be more efficient in optimizing systems through data analytics. A representative indicator of airport performance is the Runway Utilization (RU), i.e. the sum of the Runway Occupancy Times (ROTs) of all the landing flights divided by the total time of use of the runway. For each landing flight, the ROT is defined as the time interval from the instant the flight surpasses the threshold to the instant it vacates the runway completely. Therefore, the RU is maximised by minimising the aggregated ROTs. Hence, the obvious approach to the problem is to study whether a decrease of the ROT is feasible [1]–[6]. From previous works, two drawbacks in the RU studies can be extracted: (a) all previous models try to manually classify aircrafts into groups and fail to differentiate performance characteristics; and (b) as claimed by Koenig [4] in the 70’s, a strategy based in minimising the ROT can be highly influenced by pilot/airline motivations, e.g. a pilot may spend more time on the runway for a better exit . This adds the complexity of non-trivial and most likely unknown incentives of airlines and pilots. In recent years, airport control has turned towards another type of solution for RU optimisation: controlling the time between consecutive landing flights. Time Based Separation (TBS) [7]–[10] is a new operating procedure for separating aircraft by time. While this effectively introduces a more robust and resilient strategy, it still raises some issues due to separation being predefined by weather conditions and aircraft types. In fact, the ROT might depend on several other factors unknown to the ATCO. In this context, Machine Learning can be a powerful tool in accounting for more potential factors, i.e. features, and measuring their impact on ROT predictions, i.e. precursors. In fact, the risk associated with decreased separation be- tween flights is highly dependent on the risk assessment of the controlled. This becomes particularly important in High Intensity Runway Operations (HIRO) periods where the amount of stress on the controllers and the higher level of risks might directly impact their optimisation of runway throughput. Currently, no human support system assists the arrival manager (AMAN) and departure manager (DMAN) on predicting runway exits and ROTs. We propose the first steps towards a predictive analytics tool that could help controllers deal with such complex and high risk assessment tasks. This tool specifically identifies the precursors causing higher than average ROTs, mostly by assessing the chances of a flight not taking the procedural exit. This paper presents a novel data-based approach: by using predictive modelling techniques, key predictor variables can be extracted from a set of observations. It is important to note that the predictive model only uses information available before the threshold to make the prediction and whole flight information to train the model. This paper falls under the scope of the H2020 ”Safe- Clouds.eu” project that aims to increase safety levels by using big data. This paper uses the SafeClouds platform called DataBeacon. DataBeacon enables safe data sharing using de- identification techniques called Secure Data Frames or SDFs. Because of this, the model was trained using a rich variety of data sources and the future plans include expanding over more datasets - thus the relevance of ”SafeClouds.eu” and DataBeacon. The paper is organised as follows: Section II provides a de- Eighth SESAR Innovation Days, 3 rd – 7 th December 2018

Upload: others

Post on 14-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

A Boosted Tree Framework for Runway Occupancy and Exit Prediction

Dario Martinez, Seddik Belkouraand Samuel Cristobal

Innaxis Research and Foundation

Floris HerremaDelft University of Technology

Philipp WachterAustro Control

Abstract—This paper presents a machine learning algorithm trained to predict the actual runway occupancy times at Vienna airport. Runway occupancy times are usually estimated by individual studies on previous operations or analytical methods. However, due to the uncertainty of the actual operations, wide safety margins need to be applied by air traffic controllers. Finding an acceptable compromise between the maximisation of the throughput and minimisation of the risk may improve runway utilisation. In the future, machine learning models could boost controllers’ confidence b y g iving m ore a ccurate p redictions on expected runway occupancy times, resulting in a smaller buffer that ultimately increases capacity without compromising safety levels.

The training of the machine learning model is focused on runway 34 of Vienna airport. Different predictive models com-pute the expected runway occupancy time and expected exit at different distances from threshold. Features are engineered based on meteorological conditions, sequence of flights, aircraft trajectories and flight plan. At the end of the paper, a l ist of the relevance importance of precursors is presented exactly at the threshold and 2NM ahead of it.

Keywords—Runway occupancy, prediction, machine learning, lightGBM.

I. INTRODUCTION

In the past, the most commonly used approach to increaseairport capacity was to modify infrastructure, e.g. additionalrunway or terminals. However, this approach is difficult andcostly to implement. On the other hand, recent developmentsin Artificial Intelligence and Machine Learning (ML) haveproven to be more efficient in optimizing systems through dataanalytics. A representative indicator of airport performanceis the Runway Utilization (RU), i.e. the sum of the RunwayOccupancy Times (ROTs) of all the landing flights dividedby the total time of use of the runway. For each landingflight, the ROT is defined as the time interval from the instantthe flight surpasses the threshold to the instant it vacatesthe runway completely. Therefore, the RU is maximised byminimising the aggregated ROTs. Hence, the obvious approachto the problem is to study whether a decrease of the ROT isfeasible [1]–[6]. From previous works, two drawbacks in theRU studies can be extracted: (a) all previous models try tomanually classify aircrafts into groups and fail to differentiateperformance characteristics; and (b) as claimed by Koenig [4]in the 70’s, a strategy based in minimising the ROT can behighly influenced by pilot/airline motivations, e.g. a pilot mayspend more time on the runway for a better exit . This adds the

complexity of non-trivial and most likely unknown incentivesof airlines and pilots.

In recent years, airport control has turned towards anothertype of solution for RU optimisation: controlling the timebetween consecutive landing flights. Time Based Separation(TBS) [7]–[10] is a new operating procedure for separatingaircraft by time. While this effectively introduces a morerobust and resilient strategy, it still raises some issues due toseparation being predefined by weather conditions and aircrafttypes. In fact, the ROT might depend on several other factorsunknown to the ATCO. In this context, Machine Learning canbe a powerful tool in accounting for more potential factors,i.e. features, and measuring their impact on ROT predictions,i.e. precursors.

In fact, the risk associated with decreased separation be-tween flights is highly dependent on the risk assessmentof the controlled. This becomes particularly important inHigh Intensity Runway Operations (HIRO) periods wherethe amount of stress on the controllers and the higher levelof risks might directly impact their optimisation of runwaythroughput. Currently, no human support system assists thearrival manager (AMAN) and departure manager (DMAN) onpredicting runway exits and ROTs.

We propose the first steps towards a predictive analyticstool that could help controllers deal with such complex andhigh risk assessment tasks. This tool specifically identifiesthe precursors causing higher than average ROTs, mostly byassessing the chances of a flight not taking the proceduralexit. This paper presents a novel data-based approach: by usingpredictive modelling techniques, key predictor variables can beextracted from a set of observations. It is important to note thatthe predictive model only uses information available before thethreshold to make the prediction and whole flight informationto train the model.

This paper falls under the scope of the H2020 ”Safe-Clouds.eu” project that aims to increase safety levels by usingbig data. This paper uses the SafeClouds platform calledDataBeacon. DataBeacon enables safe data sharing using de-identification techniques called Secure Data Frames or SDFs.Because of this, the model was trained using a rich varietyof data sources and the future plans include expanding overmore datasets - thus the relevance of ”SafeClouds.eu” andDataBeacon.

The paper is organised as follows: Section II provides a de-

Eighth SESAR Innovation Days, 3rd – 7th December 2018

Page 2: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

TABLE IAIP EXIT TAXIWAY DESIGN

Aircraft ICAO Category Procedural exit taxiway code Percentage

Heavy B4, B5 10%

Medium B5, B7 80%

Light B7, B9 10%

scription of the key characteristics of Vienna Airport (LOWW)and the data available. In Section III, the predictive model ispresented with an efficient and powerful classification boostedtree along with the problem tackled. In Section IV, variablesand potential precursors are explored and then extracted usingfeature engineering techniques. These features are then usedin a statistical learning approach to predict the runway exitin with a binary classification and ROT with a regressionin Section V, with a post-analysis of the results. Finally,conclusions and directions for future work are presented inSection VI.

II. VIENNA AIRPORT CHARACTERISTICS

The factors influencing actual runway occupancy times aresometimes not obvious and difficult to assess, even for experts.For example, some pilots familiar with an airport may take afurther exit in order to be closer to the assigned gate, especiallyif they are already delayed. In order to successfully build amachine learning model to predict ROT, it needs to be tunedspecifically for the target airport and its own characteristicssuch as topology, layout, design of the runways and exits. Inour case, the model covers Vienna airport (LOWW), which has2 runways used in both directions (R11/R29 and R16/R34 - seeFig. 1). Runway R34 will be the main focus as it concentrates46% of the traffic. Fortunately, because of the orientation ofthis runway, certain aspects such as the distance to the gatementioned before are not that relevant. Future studies willinclude runway R29 and test whether there is a differencebetween the influence of the assigned gate in both runways.

Landing distance and ROT are known to be influencedby airport characteristics such as runway length, slope, exittaxiway angle with respect to the runway [4]. However, withthis unique airport, those parameters can be considered asthe same for all the landing aircrafts. From a statisticalpoint, the variables remain constantly distributed for everysample in the dataset. Landing distance and ROT are also notperfectly correlated. For example, the same landing distancecan correspond to different ROTs because of different breakprofiles and velocities. The study does not take into accountspecific engineering aircraft technologies such as ”brake-to-vacate”, which could influence the ROT. In general terms,those flights will be considered outliers. In spite of this,some particular characteristics of the airport configuration arerelevant and will be confirmed by statistical analysis later on:

ATCOs use a mixed-operation configuration in R34 (i.e.both departure and arrivals on the same runway) in HIRO situ-ations. Obviously, the risk level of a higher runway utilisationin such conditions might be higher because of the reduced

Figure 1. LOWW Runway Design

buffer between planes. Therefore, airport configuration shouldbe included as a potential precursor (or a proxy of it, as theconfiguration was not available in the data and needed to beengineered - see Section IV for more details).

The Aeronautical Information Publication (AIP) of LOWWstipulates that ”To minimise the runway occupancy time, pilotsshould make use of the following procedures: (a) In general,an exit taxiway should be planned which is used after landingunder normal circumstances. Missing an earlier exit taxiwayand continuing slowly to the next exit taxiway should beavoided; and (b) If possible, the runway should be vacated viathe defined exit taxiway for each aircraft category”. In otherwords, there are advised, but not obligatory, exits dependingon the aircraft category (Heavy, Light, Medium, etc). Oneparticular problem is the imbalanced distribution of categoriesfor arrivals at LOWW airport, which is estimated at 80%Medium, 10% Heavy and 10% Light.

III. PROBLEM ASSESSMENT

To develop a reliable pattern recognition algorithm, not onlydoes the exit and/or the runway occupancy time of a aircraftneed to be predicted, but a precursors analysis must also beperformed in order to ensure that the ATCOs have relevant,reliable and detailed information before the actual landingwhen it is still useful. Experts from AUSTROCONTROL(ATCOs of the Vienna airport) have mentioned the potentialbenefits, stress-wise and risk-wise, of ATCOs getting access torobust predictions of a flight not taking the recommended exit.These predictions can cover aircraft spending additional time

Eighth SESAR Innovation Days, 3rd – 7th December 2018

2

Page 3: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

on the runway and help identify situations that could force a”go-around” if the runway is not vacated as expected.

Taking into consideration the ATCOs requirements andgoals, the problem can be summarised into two researchquestions:

• With what accuracy can we predict flights bypassing theexit taxiway defined in the AIP when the aircraft is 2NMaway from threshold?

• What is the error related to the forecast of the runwayoccupancy time (from threshold to complete exit taxiway)when the landing aircraft is 2NM away from threshold?

From the Machine Learning perspective, the first questioncorresponds to a binary classification problem, i.e. ’0’ for thecases in which the aircraft took the defined exit and ’1’ forthe cases in which it didn’t. The second question correspondsto a regression problem, i.e. the ’prediction’ of a continuousvalue.

The instant in which the prediction is made has been initiallyset to 2NM before threshold. As stated by ATCO experts,2NM before threshold represents an approximation of theirtime-window of focus. Indeed, ATCOs usually focus theirattention on the flights close to landing and do not rely asmuch on previous information. The controllers involved inSafeClouds.eu have decided that 2NM should be the referencedistance; just enough time to react but narrow enough of awindow to reduce the number of possible unwanted alerts.

In the dataset, 74% of the flights exit through the expectedtaxiway. Therefore, 26% of the flights behave ”abnormally”by not taking the expected exit. This means that the datais ”imbalanced”. As an initial study, the Machine Learningproblem presented in this section has been tackled as abalanced problem, further work is envisaged using specificimbalanced-class mathematical machinery.

IV. METHODOLOGYA. Data

The data was provided by AUSTROCONTROL and coversthe whole 2015 and segments of 2014 and 2016. It contains:

A radar track database - a concatenation of 4D trajecto-ries for flights defined by their callsign, date, aircraft reg-istration and ICAO category. Each trajectory is defined asP ∗i = [t∗i , X

∗i , Y

∗i , Z

∗i , R

∗i , V

∗i ] where each line of Pi accounts

for a new timestamp ti; (Xi, Yi, Zi) stands for the latitude,longitude (in degrees) and Flight Level of the flight; Vi recordsthe velocity of the flight and Ri is a boolean that values 1whenever the position of the aircraft is between the runwaythreshold and an exit taxiway. The resulting matrix is ofsize (ni, 6), ni being the number of observations of thatparticular flight i. The overall radar track database is thereforea concatenation of all the Pi of size (

∑i ni, 6).

An airport information database which returns the callsignand date of the flights along with some planned information asthe Estimated Time of Arrival (ETA), departure and arrivingairports, runway and gates assigned.

Different meteorological databases such as METAR,SNOWTAM or WMA. The information can be redundant

among the different datasets, but of relevance, wind speed,visibility and QNH were available.

Radar Track and Airport information databases are mergedtogether using callsigns and dates as unique keys, which filtersthe data as such: (a) flight trajectories spotted by the radar butnot landing or departing from the airport are automaticallydiscarded; (b) Matching uncertainties (e.g. redundant callsignover a same day) are also discarded. Note that, in thisdeliverable, only landings are taken into consideration andonly those that correspond to the studied runway. This finalpart of the data consists of 59.369 flights.

The parameter Zi is an altitude with respect runway. Thisaltitude is seen as a ”flight level” that is normally below thetransition altitude, and as a result depends on an isobaricpressure reference that may change. Also, the distance toarrival is absent from the data, therefore making it difficultto identify the point of prediction (2NM) as described inour forecasting problem. Thus, it is necessary to formatand complete the original data. The dataset incorporates thefollowing modifications:

Distance from arrival The vector of distances of theflight (at each time stamp) D∗

i from the reference point T,which indicates the threshold of runway 34, is added. Thethreshold T coordinates are defined as (Xthres, Ythres, Zthres)as (48.092222, 16.596667, 597). For each trajectory point, Di

is calculated as the great-circle distance from T, using theHaversine formula:

A = sin2(Xi −Xthres

2)+cos(Yi)cos(Ythres)sin

2(Yi − Ythres

2)

D = 2.H.atan2(√a,√1a)

with coordinates in radian and H = R + ELOWW , R beingthe mean radius of the earth and ELOWW = 597ft being theelevation of the runway 34.

Flight Level to Altitude Merging the data has also al-lowed to link all flights with a specific atmospheric pressuremeasured and reported in the METAR as the QNH (Query:Nautical Height). It is possible to approximate the altitudefrom the flight level using the following formula:

Zi(NM) = Zi(FL)∗100++288.15

0.0065 ∗ 0.3048(QNH

1013.25

0.0065∗2879.81

−1)

Angle to runway The angle, with respect with directionof the runway, has been computed. It shows the variationin trajectory with respect to the direction of the runway andindirectly to the ILS. The angle, in radians, is calculated withrespect to the threshold coordinates as:

φi = arctan(Xi −Xthres

Yi − Ythres)

Energy The energy of the flight might also seem a valueinteresting to manufacture. However, the kinetic energy of aflight depends on its mass. One solution around the problemis to compute the amount of energy per unit of mass. Anothercolumn has been added:

Eighth SESAR Innovation Days, 3rd – 7th December 2018

3

Page 4: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

KEi = Zi − ELOWW +V 2i

2.g)

Zi being in NM and g being the acceleration of gravity on thesurface of the earth at sea level.

B. Feature Engineering

Table II presents the most promising list of features ex-tracted from a preliminary data mining exercise. Note that,though not all of them are going to be relevant precursors,it is necessary to calculate them from the raw data in orderto assess their relevance. Table II presents both engineeredfeatures and available data within the databases. Also, pleasenote that some features are static information related to theflights and sometimes to the weather; others are extractedfrom dynamic characteristics of the flights (trajectories timeseries, distances and angles, etc.). Static information does notconvey problems and consists of unique information per flight(e.g. type of aircraft, state of the runway, etc.). Dynamicinformation, such as position, velocity, etc. are much morecomplex to understand physically in a model and to processin a learning algorithm as the amount of observations per flightis too high (>100), therefore increasing over-fitting (curse ofdimensionality [11]). Unless specific dynamic models are used(such as state space models, LTSM etc.), a strategy is neededto handle all the input time series. A two-step approach hasbeen used:

1) Time-Series are sub-sampled. This is the process ofreducing the number of samples of each series without losingsignificant information for the case study. In this particularcase, only the information at [10, 9.5, . . . , 2.5, 2]NM has beenconsidered instead of the whole time series using the alreadycomputed value Di. The sub-sampling granularity normallyreduces the computational requisites of the pre-processingsteps as the expense of losing some information. In our case,similar results are expected independent of the granularity ofthe sampling.

2) Then features extraction is performed based on a pre-liminary data inspection and on domain-experts (pilots andATCOs) experience. The extraction was also performed usinga non-supervised clustering of the time series in order torecognise patterns. However, the results of the clustering werenot satisfactory, partially due to mediocre tuning of the hyper-parameters.

In Table II, the features that have been extracted fromthe data or engineered are presented. An additional columndifferentiates between original features (raw information pre-sented the dataset) or engineered features (calculated duringthe analysis). Notice that the model is being trained with 26engineered features and only 4 ”original” features . This is in-tended as, apart from reducing the dimensionality of the time-series, the nature of the machine learning algorithm used alsoinfluences that decision. While a powerful enough algorithmcan identify how using QNH to transform the Flight Levelinto altitudes helps improve predictions, many processing stepsare too complex for an algorithm to automatically learn the

patterns without some guidance. Note that some studies liketo use data reduction techniques, eg. Principal ComponentAnalysis (PCA), for those situations. However, the featuresextracted using this methodology are not easily interpretableand not useful in the precursor analysis.

The final dataset defined to the learning algorithm is ofshape (59369, 34), with 59.369 being the number of flightsand 34 the number of features used to train the model.

C. Machine learning algorithm

Gradient Boosting (GB) Frameworks (also known as Gra-dient Boosting Machines, GBM) [12] are powerful techniquesfor building predictive models. They select an arbitrary dif-ferentiable loss as the objective function and uses an additivemodel of many weak learners - typically regression trees - tominimize this loss. The parameters of the additional decisiontrees are tuned by a gradient descent algorithm. This method-ology has gained popularity recently alongside continuousdevelopment of the well-established Random Forest algorithm[13]. The main advantage of using GBMs over other MLalgorithms is that the model is iteratively trained. For eachnew round, the model uses data samples that were ”difficult”to learn in previous iterations. This is perfect behaviour forthis particular problem in addressing the 26% of flights thatare not taking the AIP exit.

There are two iwdely-used GBMs in the data sciencecommunity: XGBoost [14] and LightGBM [15], [16]. Theformer is very popular among Kaggle community and hasbeen used for many competitions. The latter is a newcomerwith several improved features and has already been appliedsuccessfully to fields like genomics [17] or acoustics [18]. Allthese methods can be used both as a classifiers or as regressors.Specifically, LightGBM:

• Uses histogram based algorithms, which aggregates con-tinuous features into discrete bins to speed up trainingand reduce memory usage.

• Grows the tree leaf-wise, which can reduce even morethe loss than a level-wise algorithm.

They are very similar in practice and when comparingresults of using Random Forest, XGBoost and LightGBM.However, LightGBM had slightly better overall classificationaccuracy. As a result, LightGBM was chosen as the MachineLearning algorithm (for both classification and regression)used in all the presented experiments.

D. Cross-validation

When evaluating the performance of a binary classificationusing Machine Learning algorithms, classical performancemetrics, such as plain accuracy, are usually not enough. Falsepositives and false negatives are usually overlooked when onlyaccuracy is used to understand the classifier. In this casestudy, a higher rate of false-negatives has arguably biggerconsequences than a higher rate of false-positives. A falsepositive alert would only trigger the attention of the controllerto a trivial situation (potentially distracting her from a riskiersituation requesting attention). However, false negatives might

Eighth SESAR Innovation Days, 3rd – 7th December 2018

4

Page 5: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

Name Source O/E DescriptionaircraftRegistration x Radar track O Registration tail of the aircraftICAOCategory x Radar track O ICAO category of the aircraft, i.e. Light, Medium or HeavyarrivalGate Airport information O Expected Gate in the Flight Plan

mixedOperation Airport information EComputed from airport information. Mixed Operation is fixedas True when at least one departure and one arrival aredetected in the runway in the preceding 15 minutes.

groundVisibility m METAR O Visibility measured by METAR

isRunwayWet METAR E Binary feature indicating if there is information on abnormalfriction on runways R88, R34 or R16 in the METAR report.

windGust METAR E Binary feature indicating if there is a wind gust value in theMETAR report or not.

windSpeed trans kts METAR E

Wind values reported by METAR are directed to the truenorth, a simple trigonometric correction has been applied toextract the perpendicular projection of the wind with respectto the runway: vwind.cos(αwind − 250) being vwind thereported wind speed (kts) and αwind the reported winddirection (degrees).

windSpeed parallel kts METAR E

Wind values reported by METAR are directed to the truenorth, a simple trigonometric correction has been applied toextract the parallel projection of the wind with respect to therunway: vwind.sin(αwind−250) being vwind the reportedwind speed (kts) and αwind the reported wind direction(degrees).

badVisibility METAR EBinary Feature indicating if cloud layer opacity reported byMETAR is higher than 5 okt and if the cloud layer height isless than 1500 ft.

Throughput Airport information E

Based on Estimated Times of Arrivals available on the airportinformation database, the calculated estimated throughput ofarrivals within the hour (i.e. 30 minutes before to 30 minutesafter the ETA of the observed flight)

aircraftRegistration y Airport information EThe following aircraft is extracted from the airport informa-tion, based on the ETAs. See discussion on the possible noisethis introduce.

ICAOCategory y Airport information E The following aircraft category (i.e Light, Medium or Heavy)

Distance next flight Radar track E The following flight distance from arrival when the observedaircraft is at distance of prediction (e.g. 2NM)

velocity next flight Radar track E The following flight velocity when the observed aircraft is atdistance of prediction (e.g. 2NM)

number breaks Radar track E

The number of significant decelerations that the aircraftperformed during landing phase up until the distance ofprediction. The threshold for significance has been chosenmanually.

max break Radar track E The value of the highest deceleration.

max break distance Radar track E The distance from arrival at which the highest decelerationhappened.

number acceleration Radar track E The number significant accelerations the aircraft performedduring landing phase up until the distance of prediction.

max acceleration Radar track E The value of the highest acceleration.

max acceleration distance Radar track E The distance from arrival at which the highest accelerationhappened.

mean slope Radar track E The mean slope of speed during landing phase over thesubsampled trajectory.

large angle Radar track E The angle between aircraft position at 10NM and the directionof the runway.

adherence distance Radar track E The distance at which the aircraft angle with respect to thedirection of the runway is inferior to 1

num flat intervals Radar track E The number of 0.5NM intervals in which the flight remainedat the same altitude.

last energy Radar track E Normalised Energy at 2NM.mean slope energy Radar track E Slope of energy over the subsampled trajectory.last speed Radar track E Speed at the moment of prediction.last FL Radar track E Altitude (feet) at the moment of prediction.

last angle Radar track E Angle of the aircraft position with respect the the runway atthe moment of prediction.

delay Radar track + Airport information E

Delay is approximated computing the difference betweenthe moment of the prediction and the ETA. Positive valuesindicate that the aircraft is already behind schedule (i.e. actualdelay).

diff speed Radar track E Difference between speed at the moment of prediction and0.5NM before.

diff alt Radar track E Difference between altitude at the moment of prediction and0.5NM before.

TABLE IIPOTENTIAL PRECURSORS, O/E STANDS FOR ORIGINAL/ENGINEERED

Eighth SESAR Innovation Days, 3rd – 7th December 2018

5

Page 6: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

be riskier as the algorithm falsely leads the ATCOs to ex-pect normal behaviour from the incoming plane, thereforeaugmenting the risk of the situation (e.g. if the followingflight has been given the authorisation to fly closer with theexpectation of the first plane to behave normally). To take suchinformation into account, we compute the Receive OperatingCharacteristics (ROC) curves: ROC curves are created byplotting the true positive rate against the false positive rate.The most widely used metric for performance is Area Underthe Curve (AUC) which equals the probability that a classifierwill rank a randomly chosen positive instance higher than arandomly chosen negative one. In other words, the higher theAUC, the better the classifier [19].

Although these algorithms have been trained with adequatevalidation functions to measure how the Machine Learningmodel fit training data, its stability needs to be validated.This is based on the assessment of how well the learner willgeneralise an independent/unseen dataset in the future, other-wise known as cross-validation. Cross-Validation is crucial toavoid problems such as overfitting (the model contains moreparameters that can be justified by the data) or selection bias(the selection of training data not being properly randomised).Cross-validation can be performed using different strategies.In thiscase, the K-Fold Cross-Validation was used: the datais divided in k subsets and the holdout method is repeated ktimes. Each time, one of the k subsets is used as the test datasetand the other k-1 subsets are used as the train dataset. The errorestimation is averaged over all k trials to calculate the totaleffectiveness of the model. This methodology is particularlyuseful because it reduces bias by using most of the data forfitting. It also reduces the variance because most of the data isalso used in the test dataset. For the experiments, the numberof folds K were fixed at 8.

In the dataset, and due to the problem with the imbalancein the flights taking the AIP exit, the algorithm will tend topredict the output of the most numerous class. To contrast thisstatistical effect, the stratified form of K-Fold cross validationwas used where all the folds are made by preserving thepercentage of samples for each class. This methodology,in combination with the Gradient Boosting framework, willalmost completely tackle the problem of having imbalancedclasses.

V. RESULTS

A. Binary classification case

The results of the classification return an accuracy of0.772 and an AUC of 0.784. These results are representedas a confusion matrix in Figure 3, where 92% of the flightsshown taking the expected exit (i.e. stipulated in the AIP)are classified correctly. However, it also shows that otherflights are only correctly classified 34% of times. We stronglybelieve that very little improvement can be achieved throughmathematical optimisation (i.e. improving the tuning of thealgorithms or the validation strategies). The test performed bythe authors (e.g. change of number of estimates of the trees,improve the number of folds of the validation, etc.) did not

Figure 2. Normalised Confusion matrix of the classification

improve the prediction in a significant way. This implies thatthere is information about the flight that is not accounted forand relevant to the exit deviation from the AIP.

Figure 3 offers a look at the relevant features that have beenused by the lightGBM algorithm to perform the classification.Delay can be seen as the most important value. A delayedflight might be more prone to take an exit close to its assignedgate even though it is not the one stipulated by the AIP.Hence the fact that the gate assigned is the second in linefollowed by the type of aircraft. One potential reason is thatATCOs grant pilots later exits if traffic situation permits andit reduces the taxi-time needed for the aircraft concerned. Thealignment of the flight with the ILS glide slope as the velocityof the aircraft remains important features. Of course, thevelocity of the following flight (i.e. next to land) is important.AUSTROCONTROL ATCOs confirmed the importance ofsuch a feature as depending on the distance between bothflights and their respective velocities. If needed, they can askthe pilot to exit early, if possible. Radio communication isabsent from accessible data, yet this particular feature mightaccount for situations in which ATCOs might contact the pilotto request an early exit. (This does not account for ATCOexperiences or nature, such as being more cautious than normalfor example.)

Several questions are present at this stage: What informationcan we be missing? Is it present in the data at hand at leastin an approximated way? Is it present later in time?

The same analysis was performed with all the data untilreaching threshold to find out whether something specific hap-pens between 2NM and threshold. The results were improvedvery slightly (AUC of 0.793) with the speed at threshold beingregistered as the most important feature for classification (seeFig. 4). Although such a tool (i.e. prediction at threshold)would be useless for ATCOs, it shows that, except for value ofthe speed at threshold, few things change from a prediction at2NM. Actually, Figure 5 shows how the performances hardly

Eighth SESAR Innovation Days, 3rd – 7th December 2018

6

Page 7: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

Figure 3. Feature importance when the prediction is performed at 2NM fromthreshold.

decrease even when predictions are made 5NM from threshold.Such behaviour is two-fold: (a) the several static features

(i.e. does not change during the course of a flight, such as theaircraft registration number) are highly ranked in the featureimportance of the classification; (b) the values of the features(e.g. last speed) vary depending on distance of prediction.In this scenario, variability across all flights may stay equalgiven the low effect of said values on prediction, thereforeproviding an equally satisfactory classification. To push thisfurther, all the features were discarded that have been extractedfrom dynamic parameters of the flight, restraining the modelto a set of much reduced features (11 static features presentin Table 2). The result of AUC of 0.718 is surprising. In otherwords, the addition of dynamic features only improves theprediction by 9%. Figure 7. shows how the model is still ableto predict 96% of the flights following the AIP, but reducedits precision in the other class of flights by identifying only14% of them in comparison to 35% found in Figure 2. Theeffect of dynamics features is much more appreciable this waybut still too negligible to fully cover unexpected cases. Pleaserefer to the discussion section for possible ways to improvethis prediction.

B. Continuous case

Here, we wanted to directly forecast the runway OccupancyTime of the flight using the same features as before. light-

Figure 4. Feature Importance when prediction performed at threshold.

Figure 5. AUC as a function of prediction distance.

GBM and other tree decision-based algorithms also proposea solution for that. The forecast of a continuous variableinvolved performing not only the splitting of each feature, buta regression for each new sub-sample created. Based on that,a prediction has been obtained with a mean absolute error of 8seconds. Taking into account that the mean ROT in LOWW isof 49.66 seconds with a standard deviation of 14.4, the resultscan seem satisfactory at first. Figure 8. shows how slightlymore than 80% of the ROTs are predicted with less than 14

Eighth SESAR Innovation Days, 3rd – 7th December 2018

7

Page 8: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

Figure 6. Cumulative Probability of Occupancy time prediction Error

Figure 7. Absolute Error distributions

seconds of error, which is the standard deviation of the realROT.

However, a closer look shows that this model suffers fromthe same limitations as the binary classification presentedbefore. In post-analysis, dividing the flights into those thatactually followed the AIP and those that didn’t, the MAEpasses from 5.7 seconds to 15 seconds for, respectively, amean ROT of 45 and 62 seconds. In other words, the ratioerror to value passes from 12.5% to 24%, underlining onceagain how the models struggle to correctly identify the flightsnot following the norm.

VI. DISCUSSION

Within this contribution we pave the way towards a machinelearning tool that would provide ATCOs a more accurateforecast of the behaviour of the landing flights. In the future,such a tool could help ATCOs confidently reduce the sepa-ration between consecutive flights and maximise the runwayoccupancy while keeping or improving its safety.

The first step consists of a quantitative assessment of theknowledge present in the data available to ATCOs. Based ontheir operational data, we have been able to train a model thataccurately predicts (96%) when a flight will take an expectedexit. However, the model can only spot 36% of flights thatbehave against the AIP specifications. When the same problemis formulated in a continuous way, the model is able to predictthe ROT of the landing flight with an average absolute errorof 8 seconds, which is reduced to an average absolute error of

5.7 seconds when the flight behaves as expected by the AIPand 15 seconds otherwise.

In light of the results, three improvements are to be devel-oped further:

First, considering that the landing procedures are three-fold : a) the flare manoeuvre, b) the point of the main geartouchdown to the point where the nosewheel touches down,and c) the ground braking distance and roll. ROT depends onthe aircraft braking capability and on the pilots technique andpreference, but ATCOs have no information regarding brakingtechnique. The next step in SafeClouds.eu is to take advan-tage of the presence of Flight Data Monitoring informationin order to complete the features with specific informationregarding the pilot braking plans (e.g. AUTOBREAK status,whether spoilers have been armed before 2NM, etc.). If suchinformation helps improve the prediction, the next step wouldbe to find a proxy for these elements (e.g. if weight, availablein FDM, is an important parameter for the model, data miningtechniques can infer them from trajectories [20]). If such proxyis not possible, this will emphasize the need of more datasharing to the ATM/ATC community.

Second, it is still possible to improve the model presentedin this contribution. We suspect a considerable amount ofnoise to be present in the data. For example, the filtering done(i.e. matching irregularities) might cause information such asthe identification of the next landing flight to be spurious. Inaddition, the actual tool uses the estimated time of arrivals tospot the next landing flight. While such a solution helped limitthe computer requirements of the pre-processing steps, sucha gain comes at a cost as next flight identification might beerroneous. A searching algorithm should be added in order toensure the next landing flight in real-time.

Finally, the data processed here do not reflect the reality ofthe operational environment of the ATCOs. ATCOs can requestearly exits from the pilots through radio communication with-out any recording of the event. The presented work has triedto create features that would identify such events (e.g. suddenand strong deceleration), however, the results are mitigated.FDM presents a specific feature which states the status of theradio communication that may help in that sense, but fromthe ATC perspective, new features should be found in orderto better cover those cases.

These results and the rest of SafeClouds.eu project willhopefully pave the way towards a complete understanding ofrunway occupancy precursors and how to forecast them usingavailable data. More airports are joining SafeClouds.eu to helpimprove the tool presented in this study. Such tools might be agood addition to the TBS protocol already tested and validatedin several airports in Europe, such as Heathrow.

REFERENCES

[1] R Horonjeff, RC Grassi, RR Read, and G Ahlborn. A mathematicalmodel for locating exit taxiways. Institute of Transportation and TrafficEngineering, University of California, Berkeley, 1959.

[2] William E Weiss and JN Barrer. Analysis of runway occupancy timeand separation data collected at la guardia, boston, and newark airports.Technical report, MITRE CORP MCLEAN VA, 1984.

Eighth SESAR Innovation Days, 3rd – 7th December 2018

8

Page 9: A BoostedT ree Frameworkfor Runway Occupancy and ExitPrediction · 2018-12-05 · Fig. 1). Runway R34 will be the main focus as it concentrates 46% of the traffic. Fortunately, because

[3] Terry A Ruhl. Empirical analysis of runway occupancy with applicationsto exit taxiway location and automated exit guidance. TransportationResearch Record, (1257), 1990.

[4] Steven E Koenig. Analysis of runway occupancy times at major airports.Technical report, MITRE CORP MCLEAN VA METREK DIV, 1978.

[5] Byung Kim, Antonio Trani, Xiaoling Gu, and Caoyuan Zhong. Com-puter simulation model for airplane landing-performance prediction.Transportation Research Record: Journal of the Transportation ResearchBoard, (1562):53–62, 1996.

[6] Antonio A Trani, AG Hobeika, BJ Kim, V Nunna, and C Zhong.Runway exit designs for capacity improvement demonstrations. phase2: computer model development. 1992.

[7] Charles Morris, John Peters, and Peter Choroba. Validation of thetime based separation concept at london heathrow airport. In 10thUSA/Europe ATM R&D Seminar, Chicago, volume 10, 2013.

[8] Hartmut Helmke, Ronny Hann, Maria Uebbing-Rumke, Daniel Muller,and Dennis Wittkowski. Time-based arrival management for dual thresh-old operation and continous descent approaches. In 8th USA/EuropeATM Seminar, Napa, CA, USA, 2009.

[9] Milan Janic. Toward time-based separation rules for landing aircraft.Transportation Research Record: Journal of the Transportation ResearchBoard, (2052):79–89, 2008.

[10] Vincent Treve Goce Nikolovski and Floris Herrema. Local tbs delayreduction effect on global network operations. In 2018 InternationalConference on Research in Air Transportation, Barcelona, Spain, 2018.

[11] David L Donoho et al. High-dimensional data analysis: The curses andblessings of dimensionality. AMS math challenges lecture, 1(2000):32,2000.

[12] Jerome H Friedman. Greedy function approximation: a gradient boostingmachine. Annals of statistics, pages 1189–1232, 2001.

[13] Andy Liaw, Matthew Wiener, et al. Classification and regression byrandomforest. R news, 2(3):18–22, 2002.

[14] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boostingsystem. In Proceedings of the 22nd acm sigkdd international conferenceon knowledge discovery and data mining, pages 785–794. ACM, 2016.

[15] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, WeidongMa, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradientboosting decision tree. In Advances in Neural Information ProcessingSystems, pages 3146–3154, 2017.

[16] Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, andYi Zhang. Deep embedding forest: Forest-based serving with deepembedding features. In Proceedings of the 23rd ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining, pages1703–1711. ACM, 2017.

[17] Dehua Wang, Yang Zhang, and Yi Zhao. Lightgbm: an effectivemirna classification method in breast cancer patients. In Proceedingsof the 2017 International Conference on Computational Biology andBioinformatics, pages 7–11. ACM, 2017.

[18] Eduardo Fonseca, Rong Gong, Dmitry Bogdanov, Olga Slizovskaia,Emilia Gomez Gutierrez, and Xavier Serra. Acoustic scene classi-fication by ensembling gradient boosting machine and convolutionalneural networks. In Virtanen T, Mesaros A, Heittola T, Diment A,Vincent E, Benetos E, Martinez B, editors. Detection and Classificationof Acoustic Scenes and Events 2017 Workshop (DCASE2017); 2017Nov 16; Munich, Germany. Tampere (Finland): Tampere University ofTechnology; 2017. p. 37-41. Tampere University of Technology, 2017.

[19] Jin Huang and Charles X Ling. Using auc and accuracy in evaluat-ing learning algorithms. IEEE Transactions on knowledge and DataEngineering, 17(3):299–310, 2005.

[20] Junzi Sun, Joost Ellerbroek, and Jacco M Hoekstra. Aircraft initial massestimation using bayesian inference method. Transportation ResearchPart C: Emerging Technologies, 90:59–73, 2018.

Eighth SESAR Innovation Days, 3rd – 7th December 2018

9