
CONCEPTUAL FRAMEWORK USING DL FOR AIRPORT CEP

Wei Lin, Ph.D., Chief Data Scientist, Senior Manager, Americas Consulting Practice, Application and Big Data/IoT Transformation, Dell [email protected]

Bill Schmarzo, CTO, Big Data Analytics Consulting Practice, Dell [email protected]

Knowledge Sharing Article © 2017 Dell Inc. or its subsidiaries.

2018 Dell EMC Proven Professional Knowledge Sharing

Table of Contents

Abstract

1. Introduction

2. Conceptual Prototype for Airport Event Processing

2.1 Airport

2.2 Deep Learning to Process Image Data Introduction

3 Feature Recognition and Event Trigger Processing

3.1 Features Selection

3.2 DL Recognition

3.3 LSTM Responses Trigger

4 Conclusion and Future Work

References

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do

not necessarily reflect Dell EMC’s views, processes or methodologies.


Abstract

Airport event processing is known for its complexity and is termed complex event processing (CEP). In this conceptual framework, Deep Learning (DL) techniques are leveraged to simplify the processing by encapsulating the event sequence and responses: the event is first recognized, and logical responses are then generated via supervised learning. Otsu's method, a convolutional neural network (CNN), and Long Short-Term Memory (LSTM) are used to perform feature extraction, pattern recognition, and response prediction, respectively.

The emergency scenario is a fire on an airliner parked on the tarmac, captured via video feeds. The task is first to recognize the event and then to generate four responses: 1. Trigger alert to Fire Department (to extinguish fire), 2. Trigger alert to Law Enforcement (to barricade scene of fire and evacuate passengers), 3. Trigger alert to Command Post (to coordinate air traffic tower and flights), 4. Trigger alert to Airport Operations (to clear path and facilitate Fire Department and Law Enforcement activities).

1. Introduction

Airport event streaming analysis tracks and analyzes continuous information to identify areas of interest and derive conclusions from them. Airport complex event processing (CEP) combines data from a wide variety of data streams to infer events or patterns that suggest more complicated and/or dire situations. The goal of CEP is to identify meaningful events (such as opportunities or threats) and to respond to them quickly with the most relevant response.

Airports in particular provide a number of well-known examples of CEP opportunities. Airport events may occur across the various layers of an airport, such as the airport lobby, ticket counters, Customs gates, terminals, tarmac, airliners, ground operations, customer service, and rental car counters. They arrive in the form of text messages, sensor inputs, video, images, audio communications, equipment locations, traffic reports, weather reports, or other kinds of data.

An event may also be defined as a "change of state": when a measurement exceeds a predefined threshold of time, temperature, or other value, a new event could be triggered. CEP gives organizations a way to correlate and analyze patterns in real time and helps business and operations communicate better with service departments. This Knowledge Sharing paper lays out an approach for “mining” official manuals and documents to codify and enforce policies during emergency events.

In this paper, we present a simplified airport CEP scenario with an airliner on fire on the tarmac. The processing focuses on identifying, analyzing and classifying an image from the sensor inputs and then generating (predicting) the most relevant event responses. The steps are:

1. Event-pattern detection (feature extraction)

2. Event abstraction (convert image to knowledge)

3. Event aggregation (aggregate across time stamp)

4. Event response triggers (trigger alerts to departments)

The analytical cycle used in this paper consists of Descriptive, Exploratory, Predictive, and Prescriptive Analytics. These four types of analytics are used to understand emerging patterns, convert outliers into controllable variables, estimate time to events, and make real-time predictions that lead to analytics-driven smart interactions.


This paper is arranged in four sections.

Section 1 introduces the analytics framework for complex event processing.

Section 2 describes the overall entities (airport, airport CEP architecture, CNN, LSTM) in this study.

Section 3 describes the details of image (video frame) feature extraction, recognition, and response forecasting.

Section 4 presents the conclusion and future work.

2. Conceptual Prototype for Airport Event Processing

This section introduces the entities in the study: the airport, the airport CEP functional architecture, the convolutional neural network (CNN), and Long Short-Term Memory (LSTM).

2.1 Airport

Airports are divided into landside and airside areas. Landside areas include parking lots, public transportation,

train stations and access roads. Airside areas include all areas accessible to aircraft, including runways, taxiways

and aprons. Access from landside areas to airside areas is tightly controlled at airports. Passengers on

commercial flights access airside areas through terminals, where passengers can purchase tickets, clear security

check, or claim luggage and board aircraft through gates. The waiting areas which provide passenger access to

aircraft are typically called concourses, although this term is often used interchangeably with terminal. The area

where aircraft park next to a terminal to load passengers and baggage is known as a ramp (or tarmac). Parking

areas for aircraft away from terminals are called aprons. Airports may or may not have a control tower, depending on air traffic density

and available funds. Due to their high capacity and busy airspace, many international airports have air traffic

control located on site. Airports with international flights have customs and immigration facilities. International

flights often require a higher level of physical security, although in recent years, many countries have adopted

the same level of security for international and domestic travel. Airports provide commercial outlets for products

and services. Most of the vendors are located within the departure areas. These include clothing boutiques and

major fast food chains. Some airport restaurants offer regional cuisine specialties for those in transit so that they

may sample local food or culture without leaving the airport.

Aircraft and passenger boarding bridge maintenance, pilot operations, commissioning, training services, aircraft rental, and hangar rental are most often performed by a fixed-base operator (FBO). At major airports, particularly those used as hubs, airlines may operate their own support facilities. Figure 1 shows the airport functional areas.


Figure 1. Airport functional areas (using SFO as an example)

The airport conceptual complex event processing framework [3] consists of three layers. The base layer is the data integration layer, the second layer is the application integration layer, and the third layer is the presentation layer. The three layers are shown in Figure 2.

Figure 2. Airport CEP functional areas [3]


(1) Hadoop will be used as the data content and modeling repository and as the API service facility. Hadoop will also be used for analytical model development, performance monitoring, and continuous training against historical data.

(2) The data layer's primary function is to integrate or blend data from the different data sources. Two universal dimensions (time and location) are used to align asynchronous data. Data from (6) can be aligned along three core aspects: people (airport employees, airline employees, passengers), processes (staff schedules, staff training, customer services) and technologies (servers, hardware, software). The data are then correlated to events and event proximity to establish relationships, heat maps, or seasonality.

(3) The application layer interfaces with services to perform orchestration and decision support. The core of this layer is a business rule engine, which determines the possible next best actions for an event, generates alerts, and routes them to the corresponding parties along with the event content, location (GIS) and time. The operational decisions are transmitted via external system connectors, which also obtain the receiving status from the external systems.

(4) The presentation layer acts as a single pane of glass showing the status of airline schedules, gate occupancies, passenger throughput, airport RFID device streams, and facility status logs. Event monitoring can be drilled down to equipment heartbeats, equipment streaming logs, system operational ranges, KPIs, and GIS/timestamp-enriched information. The KPIs are used as a baseline to measure outliers and the ROI of improvements.

(5) The external systems fall into two categories: external streaming data services, such as weather and traffic, and integrated auxiliary (response) systems, such as the Fire Department, TSA, and Law Enforcement. The auxiliary systems can receive alerts from the airport and respond to its alert requests.

(6) Airport internal streaming data sources are ingested through the data layer. Data transmission is bidirectional; for example, CEP could adjust sensor parameters such as a camera's lens angle. The types of sensor data include, but are not limited to, WiFi, RFID, video, audio, and images.

For the scenario in this paper, the entities involved are Video, Data Layers, Business Rule Engine, Alert Service, External System Connectors, and Requested Response Units. These entities are highlighted in Figure 2.

The functional architecture can be translated into a generic Hadoop solution architecture, as shown in Figure 3. The complex event processes map onto the core data platform highlighted in red.

For internal data ingestion, external data integration, and alert communications, this reference architecture can impose different security requirements. This matches the requirements of the external system connectors as well.

The machine learning real-time analytics containers can process data feeds independently and/or collectively. Owned and shared data distributed across physical, virtual, and cloud environments can reside in the data marts, together with a visualization environment and computing power that is allocated adjustably on optimized hardware. This maps well to the business rule engine and the alert services.

The Dev/Test/Prod clusters can support DL/ML development and performance monitoring.

The presentation layer maps to subject-area workgroups, which track the KPIs and ROI of business actions through their feedback.

The inbound channels ingest airport sensor inputs.


Figure 3. Hadoop reference architecture

2.2 Deep Learning to Process Image Data Introduction

Video inputs are one of the key airport data sources, and properly monitoring and analyzing the video feeds consumes significant time and effort at the airport monitoring center. Correlating streaming video content (connected timestamps T(1 to n)) with the objects within each image is more complicated still.

Advances in Deep Learning (DL) make general image recognition possible. The basic construct of DL is connected layers of artificial neural networks, where each layer is responsible for extracting features of the image constructed in a prior layer (or the input layer).

There are two basic ways to prepare an image. 1. Greyscale: the image is converted to greyscale (a range of gray shades from white to black). Each pixel is assigned a value based on how dark it is, and the image is converted into an array for computing. 2. RGB values: the image's color is represented as RGB values (a combination of red, green and blue, each ranging from 0 to 255). Each RGB value is extracted and put in an array for interpretation.
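The two preparation routes can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the 2x2 image is made up, the function names are hypothetical, and the luminance weights (ITU-R BT.601) are one common choice that the paper does not specify.

```python
# A tiny 2x2 RGB image; each pixel is an (R, G, B) triple in the range 0..255.
# The pixel values are made up for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def to_grayscale(rgb_image):
    """Collapse each pixel to one intensity value (BT.601 luminance weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def split_channels(rgb_image):
    """Return three 2-D arrays holding the R, G and B values separately."""
    return tuple(
        [[pixel[c] for pixel in row] for row in rgb_image] for c in range(3)
    )

gray = to_grayscale(image)                  # one value per pixel
red, green, blue = split_channels(image)    # one array per colour plane
```

Plain Python lists are used so the sketch has no dependencies; an array library would normally hold these values.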

The basic procedure for matching a new image to known, annotated images is to convert the image to an array using the same technique and then compare the number patterns against those of the already-known objects. A confidence score is computed for each class, and the class with the highest confidence score is the predicted one. DL extends pattern matching to feature matching, which increases the confidence significantly.
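As a hedged illustration of the last step, the sketch below turns raw class scores into confidences with a softmax and picks the highest. The softmax is an assumption (it is the usual final layer of image classifiers, but the paper does not name it), and the class names and scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into confidences that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(class_names, scores):
    """Return the class with the highest confidence and that confidence."""
    confidences = softmax(scores)
    best = max(range(len(class_names)), key=lambda i: confidences[i])
    return class_names[best], confidences[best]

# Hypothetical class names and raw scores, for illustration only.
label, confidence = predict(["airliner", "warplane", "wing"], [4.0, 1.0, 0.5])
```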

One of the most popular techniques used to improve the accuracy of image classification is the Convolutional Neural Network (CNN) [7]. A CNN is a special type of neural network that works in the same way as a regular neural network, except that it has a convolution layer at the beginning.

Instead of feeding the entire image as an array of numbers, the image is broken up into a number of tiles and

the machine then tries to predict what each tile is. Finally, the computer tries to predict what’s in the picture

based on the prediction of all the tiles. This allows the computer to parallelize the operations and detect the

object regardless of where it is located in the image. In general, in a deep convolutional neural network, several

layers are stacked and are trained to the task at hand. The network learns several low/mid/high level features at

the end of its layers.
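The tile-by-tile idea can be illustrated with a single hand-written filter. The sketch below is a simplification under stated assumptions: one fixed 2x2 kernel stands in for learned filters, there is no padding or stride option, and the images are made up. It shows that the same pattern produces the same peak response wherever it sits in the image.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    response at every tile position (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def strongest_response(feature_map):
    """Return (row, col) of the tile with the highest response."""
    positions = [(r, c) for r in range(len(feature_map))
                 for c in range(len(feature_map[0]))]
    return max(positions, key=lambda rc: feature_map[rc[0]][rc[1]])

# A 2x2 bright blob placed at two different locations in a 5x5 image.
kernel = [[1, 1], [1, 1]]
top_left = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0] * 5, [0] * 5, [0] * 5]
bottom_right = [[0] * 5, [0] * 5, [0] * 5, [0, 0, 0, 1, 1], [0, 0, 0, 1, 1]]

map_a = convolve2d(top_left, kernel)
map_b = convolve2d(bottom_right, kernel)
```

Both feature maps peak with the same value, only at different positions, which is what lets later layers detect the object regardless of its location.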


Residual Network (ResNet) [2], shown in Figure 4, is a variant of CNN. In residual learning, instead of trying to learn the feature mapping directly, the network tries to learn a residual. The residual can be understood simply as the learned features minus the input of that layer. If H(x) is the desired mapping, the feature residual is H(x) – x. The stacked layers approximate the residual function F(x) = H(x) – x, so H(x) = F(x) + x, where F(x, {w}) can represent multiple CNN layers. ResNet does this using shortcut connections that directly connect the input of the nth layer to some (n+x)th layer. It has been shown [2] that training this form of network is easier than training plain deep convolutional neural networks, and that it also resolves the problem of degrading accuracy.
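The relation H(x) = F(x) + x can be made concrete with a toy block. This sketch makes several simplifying assumptions: matrix-vector products stand in for convolutional layers, the weights are hand-picked rather than learned, and the activation that ResNet applies after the addition is omitted.

```python
def relu(vector):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in vector]

def linear(weights, vector):
    """Matrix-vector product; stands in for a convolutional layer."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def residual_block(x, w1, w2):
    """Compute H(x) = F(x, {w}) + x, where F is two stacked layers."""
    f_x = linear(w2, relu(linear(w1, x)))      # residual function F(x)
    return [f + xi for f, xi in zip(f_x, x)]   # shortcut connection adds x

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

h_zero = residual_block(x, zeros, zeros)         # F(x) = 0, so H(x) = x
h_ident = residual_block(x, identity, identity)  # F(x) = relu(x)
```

With all weights at zero the block passes its input through unchanged, which hints at why very deep stacks of such blocks remain trainable: each block only has to learn a correction to the identity.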

Figure 4. ResNet configuration [2]

A recurrent neural network (RNN) [8] is one type of artificial neural network using its internal memory to process

sequential inputs. This makes it applicable to tasks such as predicting the most likely next events. A basic RNN is

configured as a network of neuron-like nodes, each with a directed (one-way) connection to other nodes. Each

node has a time-varying activation and each connection has a modifiable weight. Nodes are either input nodes,

output nodes or hidden nodes.

For RNN in supervised learning in discrete time settings, sequences of real-valued input vectors arrive at the

input nodes as one vector at a time. At any given time step, each non-input unit computes its current activation

(result) as a nonlinear function of the weighted sum of the activations of all units that connect to it. Supervisor-

given target activations can be supplied for output units at later time steps. For example, if the input sequence

is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence may be a

label classifying the digit.

Each sequence produces an error as the sum of the deviations of all target signals from the corresponding

activations computed by the network. For a training set of numerous sequences, the total error is the sum of the

errors of all individual sequences.
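The forward pass and error accumulation described above can be sketched as follows. The assumptions are stated in the comments: a single hidden layer, tanh as the nonlinear function, one linear output unit, and squared deviations as the error measure (the text only says "deviations"). The weights and training pairs are hypothetical.

```python
import math

def rnn_forward(inputs, w_in, w_rec, w_out):
    """Run a one-hidden-layer RNN over a sequence of input vectors and
    return the output activation at every time step."""
    hidden = [0.0] * len(w_rec)
    outputs = []
    for x in inputs:
        # Each hidden unit applies a nonlinear function (tanh, an assumption)
        # to the weighted sum of the current input and the previous hidden
        # activations; this is the network's internal memory.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        outputs.append(sum(w * h for w, h in zip(w_out, hidden)))
    return outputs

def sequence_error(outputs, targets):
    """Sum of squared deviations between targets and network outputs."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets))

def total_error(sequences, w_in, w_rec, w_out):
    """The total error of a training set is the sum over its sequences."""
    return sum(
        sequence_error(rnn_forward(xs, w_in, w_rec, w_out), ts)
        for xs, ts in sequences
    )

# Hypothetical weights: two hidden units whose outputs cancel each other.
w_in = [[0.5], [-0.5]]
w_rec = [[0.0, 0.0], [0.0, 0.0]]
w_out = [1.0, 1.0]
training_set = [([[1.0], [2.0]], [1.0, 1.0]), ([[3.0]], [1.0])]
error = total_error(training_set, w_in, w_rec, w_out)
```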


3 Feature Recognition and Event Trigger Processing

Complex event analysis focuses on recognition and prediction. One important aspect of enhancing event analysis is semantic-level video analysis for activity and event understanding, which aims to describe video content accurately using key semantic elements, such as activities and events.

Unconstrained videos are qualitatively very different from, and even more challenging than, widely used video datasets, in which each clip contains a fairly coherent single action or atomic event occurring within a short duration. A series of temporal structure analysis methods can be designed specifically to tackle these complexities. Integrated with other vision techniques, such an approach can extend the domains of video that machine vision systems can understand.

The approaches in this section lay out the steps that simplify airport complex event analysis using DL. The given time sequence of images depicts the scenario of an airplane on fire. The steps are: 1. Feature selection: identify the “fire” section of the image. 2. Sequential image recognition via CNN: recognize the content of images T1 to T4, with and without the feature selection region. 3. Response prediction via LSTM: based on the event, raise alerts to the corresponding auxiliary departments (e.g. Fire Department, Command Post, Law Enforcement and others).

3.1 Features Selection

For feature selection, Otsu's method [9] is used to perform clustering-based image thresholding automatically, reducing a gray-level image to a binary image. The algorithm assumes that the image contains two classes of pixels following a bimodal histogram (foreground pixels and background pixels). It then calculates the optimum threshold separating the two classes so that their combined spread (intra-class variance) is minimal or, equivalently, so that their inter-class variance is maximal.

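Otsu's method exhaustively searches for the threshold that minimizes the intra-class variance (the variance within the class), defined as a weighted sum of the variances of the two classes, as shown in Eq. 1:

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t) \qquad \text{(EQ 1)}$$

where $\sigma_0^2$ and $\sigma_1^2$ are the variances of the two classes (0 and 1), $\omega_0$ and $\omega_1$ are the probabilities of the two classes, and $t$ is the threshold that separates the two classes. The class probabilities $\omega_0(t)$ and $\omega_1(t)$ are computed from the $L$ bins of the histogram, where $p(i)$ is the fraction of pixels in bin $i$:

$$\omega_0(t) = \sum_{i=0}^{t-1} p(i), \qquad \omega_1(t) = \sum_{i=t}^{L-1} p(i) \qquad \text{(EQ 2)}$$

Otsu shows that minimizing the intra-class variance is the same as maximizing the inter-class variance:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_0(t)\,\omega_1(t)\,\left[\mu_0(t) - \mu_1(t)\right]^2 \qquad \text{(EQ 3)}$$

which is expressed in terms of the class probabilities $\omega$ and class means $\mu$. The class means $\mu_0(t)$ and $\mu_1(t)$ and the total mean $\mu_T$ are

$$\mu_0(t) = \frac{\sum_{i=0}^{t-1} i\,p(i)}{\omega_0(t)}, \qquad \mu_1(t) = \frac{\sum_{i=t}^{L-1} i\,p(i)}{\omega_1(t)}, \qquad \mu_T = \sum_{i=0}^{L-1} i\,p(i) \qquad \text{(EQ 4)}$$

The following relations can be easily verified:

$$\omega_0(t)\,\mu_0(t) + \omega_1(t)\,\mu_1(t) = \mu_T, \qquad \omega_0(t) + \omega_1(t) = 1 \qquad \text{(EQ 5)}$$

The class probabilities and class means can be computed iteratively. This yields an effective iterative algorithm.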
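The iterative computation can be sketched as below. This is an illustrative implementation of Otsu's search over histogram bins, not the code from the paper's figures; the function name, the default bin count and the sample intensities are assumptions.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold in [0, 1] that maximizes inter-class variance.

    `values` are pixel intensities already scaled to the range 0..1.
    """
    # Histogram p(i): fraction of pixels falling in each bin.
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    p = [c / total for c in counts]
    mu_total = sum(i * p_i for i, p_i in enumerate(p))

    best_t, best_var = 0, -1.0
    omega0 = 0.0    # class-0 probability, updated iteratively
    cum_mean = 0.0  # running sum of i * p(i) over class 0
    for t in range(1, bins):
        omega0 += p[t - 1]
        cum_mean += (t - 1) * p[t - 1]
        omega1 = 1.0 - omega0
        if omega0 == 0.0 or omega1 <= 0.0:
            continue  # one class is empty, no valid split here
        mu0 = cum_mean / omega0
        mu1 = (mu_total - cum_mean) / omega1
        var_between = omega0 * omega1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t / bins

# Made-up bimodal data: dark background pixels and bright foreground pixels.
pixels = [0.1] * 50 + [0.9] * 50
threshold = otsu_threshold(pixels, bins=10)
```

Any threshold between the two clusters separates them equally well; the sketch returns the first such bin edge it finds.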

The code segment below is used to process an image (video frame) of an airplane cargo fire. The objective is to identify the range of pixels that represents the anomaly, which is the cargo fire in this case.

The libraries listed in Figure 7 are used in the Otsu image segmentation algorithm.

Figure 7. Python libraries used in the Otsu algorithm

Given a frame (airplane_fire.png), the first step is to normalize the intensity, scaling it to between 0 and 1.

Figure 8. Image normalizing

The R, G and B channels are extracted for processing. The R channel is selected to distinguish the flame from other objects.

Figure 9. Image R/G/B filtering

Figure 10 shows the original image and Figure 11 shows the results of executing the R/G/B filtering.

Figure 10. Image of an airplane on fire.

Figure 11. Image R/G/B filtering
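The normalization and red-channel selection described above can be sketched as follows. Min-max scaling is an assumption (the paper only says the intensity is scaled to between 0 and 1), and the 2x2 frame is made up to contain one flame-coloured pixel.

```python
def normalize(channel):
    """Min-max scale a 2-D list of intensities to the range 0..1."""
    flat = [v for row in channel for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    return [[(v - lo) / span for v in row] for row in channel]

# Made-up 2x2 frame: one flame-coloured pixel among grey and blue pixels.
frame = [
    [(250, 120, 20), (90, 90, 90)],
    [(40, 60, 200), (50, 50, 50)],
]

# Keep only the R value of every pixel, then scale it.
red = normalize([[pixel[0] for pixel in row] for row in frame])
```

After scaling, the flame-coloured pixel carries the highest red intensity, which is the property the thresholding step relies on.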

Next, the Otsu algorithm is used to calculate the threshold, defined as the threshold between the image's foreground and background segments. It is assumed that the image has a bimodal distribution between foreground and background. The histogram is binned into 1000 bins. Figure 12 shows the code and the resulting foreground/background segmentation at Otsu threshold = 0.5134.

Figure 12. Image foreground/background segmentation at Otsu threshold = 0.5134

Figures 13 to 17 display the foreground/background segmentations obtained by scaling the Otsu threshold by multipliers that increase in steps of 0.2 (1.2, 1.4, 1.6, 1.8, 2.0).
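A minimal sketch of the threshold sweep is shown below. The base threshold and multipliers come from the text; the 3x3 channel is hypothetical, and the helper names are not from the paper.

```python
OTSU_THRESHOLD = 0.5134                  # value reported in the text
MULTIPLIERS = (1.2, 1.4, 1.6, 1.8, 2.0)  # multipliers listed in the text

def segment(channel, threshold):
    """Binary mask: 1 where the intensity exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in channel]

def foreground_size(mask):
    """Number of pixels kept in the foreground."""
    return sum(sum(row) for row in mask)

# Hypothetical normalized red channel (values in 0..1).
channel = [
    [0.20, 0.55, 0.60],
    [0.70, 0.95, 0.97],
    [0.30, 0.80, 0.99],
]

# Foreground size at each scaled threshold.
sizes = {
    m: foreground_size(segment(channel, round(m * OTSU_THRESHOLD, 4)))
    for m in MULTIPLIERS
}
```

Raising the multiplier keeps only the brightest pixels. On intensities normalized to 0..1, the 2.0 multiplier gives a threshold above 1, so the foreground is empty.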







Once the foreground/background image segment has been identified at 1.8 times the threshold (Otsu threshold = 0.9241), it is mapped back to the original image.

The process overlays the region of the image (at Otsu threshold = 0.9241) and returns the four corner pixel coordinates (350, 1047, 440, 1203) of the foreground object identified in Figure 18.

Figure 19. Image overlay to obtain object identified with otsu threshold =0.9241
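Mapping a binary mask back to a region of the original image amounts to finding the mask's bounding box and cropping. The sketch below assumes a (row_min, col_min, row_max, col_max) ordering, which the paper does not state for its four corner values; the mask and image are made up.

```python
def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the non-zero pixels,
    or None when the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)

def crop(image, box):
    """Cut the boxed region (inclusive corners) out of a 2-D image."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]

# Hypothetical mask and matching image (pixel value = 10 * row + column).
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
image = [[10 * r + c for c in range(5)] for r in range(4)]

box = bounding_box(mask)
patch = crop(image, box)
```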


Figure 20 shows the result.

Figure 20. The airplane cargo fire segmentation.

The selected object is further cropped, ready for recognition, as shown in Figure 21.

Figure 21. Isolated airplane cargo fire segmentation

3.2 DL Recognition

Image recognition uses pre-trained CNNs (Vgg16, Vgg19, Inception, Xception, and Resnet). The computing stack is Python, Spark, Tensorflow, and Keras [5]. The stack is shown in Figure 22.

Figure 22. The computing stack.


For the pre-trained CNN Keras code, the following code segments show the types of models under consideration.

Figure 23. First two segments of the Keras pre-trained CNN code

Figure 24. Last segment of the Keras pre-trained CNN code

The timestamped frames are in temporal format [1]. Figure 25 shows the first four timestamped (T) images.

Figure 25. Airplane cargo fire (T1 – T4)

A voting model is leveraged to increase the accuracy of classification against the general ImageNet classes. For T1, five models (Vgg16, Vgg19, Inception, Xception, and Resnet) [6] are executed, and “Airliner” is the top-ranked match for each. Figures 26 to 30 show the results.


Vgg16: T1 top ranked match is Airliner 36.38%

Figure 26. Vgg16 T1 top ranked match is Airliner 36.38%

Vgg19: T1 top ranked match is Airliner 94.25%

Figure 27. Vgg19 T1 top ranked match is Airliner 94.25%

Inception: T1 top ranked match is Airliner 92.59%

Figure 28. Inception T1 top ranked match is Airliner 92.59%

Xception: T1 top ranked match is Airliner 69.61%

Figure 29. Xception T1 top ranked match is Airliner 69.61%

Resnet: T1 top ranked match is Airliner 95.17%


Figure 30. Resnet T1 top ranked match is Airliner 95.17%
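The voting step over the five T1 results can be sketched as below. The labels and confidences are those reported above; the voting rule itself (majority of top labels, ties broken by mean confidence) is an assumption, since the paper does not spell out how the votes are combined.

```python
from collections import Counter

# Top-ranked label and confidence per model for frame T1, as reported above.
t1_results = {
    "Vgg16": ("airliner", 0.3638),
    "Vgg19": ("airliner", 0.9425),
    "Inception": ("airliner", 0.9259),
    "Xception": ("airliner", 0.6961),
    "Resnet": ("airliner", 0.9517),
}

def vote(results):
    """Majority vote over the models' top labels; ties go to the label
    with the higher mean confidence. Returns (label, mean confidence)."""
    counts = Counter(label for label, _ in results.values())

    def mean_conf(label):
        confs = [c for lab, c in results.values() if lab == label]
        return sum(confs) / len(confs)

    winner = max(counts, key=lambda lab: (counts[lab], mean_conf(lab)))
    return winner, mean_conf(winner)

label, confidence = vote(t1_results)
```

All five models agree on the label, so the vote is unanimous; the mean confidence across the models is 77.6%.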

T= 1 example

Figure 31. Example of T1.

The process is repeated for all timestamped images. T4 includes the additional image segmentation shown in Figure 21.

T=4

Figure 32. Resnet T4 top ranked match is fire_screen 99.35%

The “fire” is recognized as a fire screen by the out-of-the-box pre-trained model, so additional transfer learning would need to be conducted.

Thus, combining T1 and T4 with the video telemetry, the event is described as

Airliner is on fire at <location> at time T1.


3.3 LSTM Responses Trigger

To predict whom to involve and what to do when an airliner is on fire, we use airport emergency manuals [4] as training samples for the LSTM, so that it can learn, predict and enforce the official airport policies to follow during an emergency situation. We decompose the manual into individual characters so that the model can predict the next character for a seed sentence, and we then determine or score its relevance to the emergency situation.

The letters, digits and special characters of the emergency manual are translated into integers, as shown below.

{'\n': 0, ' ': 1, '"': 2, '#': 3, "'": 4, '(': 5, ')': 6, '*': 7, ',': 8, '-': 9, '.': 10, '/': 11, '0': 12, '1': 13, '2': 14, '3': 15, '4': 16, '5': 17, '6': 18, '7': 19, '8': 20, '9': 21, ':': 22, ';': 23, '?': 24, '[': 25, ']': 26, 'a': 27, 'b': 28, 'c': 29, 'd': 30, 'e': 31, 'f': 32, 'g': 33, 'h': 34, 'i': 35, 'j': 36, 'k': 37, 'l': 38, 'm': 39, 'n': 40, 'o': 41, 'p': 42, 'q': 43, 'r': 44, 's': 45, 't': 46, 'u': 47, 'v': 48, 'w': 49, 'x': 50, 'y': 51, 'z': 52, '–': 53, '’': 54, '“': 55, '”': 56, '•': 57}

Thus, for a sentence such as "command post for all emergencies, a joint fixed-position and/or mobile command post will be established", the first 100 characters (ending at "establis") translate to

pattern = [29, 41, 39, 39, 27, 40, 30, 1, 42, 41, 45, 46, 1, 32, 41, 44, 1, 27, 38, 38, 1, 31, 39, 31, 44, 33, 31, 40, 29, 35, 31, 45, 8, 1, 27, 1, 36, 41, 35, 40, 46, 1, 32, 35, 50, 31, 30, 9, 42, 41, 45, 35, 46, 35, 41, 40, 1, 27, 40, 30, 11, 41, 44, 1, 39, 41, 28, 35, 38, 31, 1, 29, 41, 39, 39, 27, 40, 30, 1, 42, 41, 45, 46, 1, 49, 35, 38, 38, 1, 28, 31, 1, 31, 45, 46, 27, 28, 38, 35, 45].

For example, “command” is (c to 29), (o to 41), (m to 39), (m to 39), (a to 27), (n to 40), (d to 30).
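The character-to-integer translation can be reproduced with a short sketch. The 58 characters are taken from the mapping listed above; building the table by sorting them in code-point order is an assumption that happens to reproduce the listed indices, and the function names are hypothetical.

```python
# The 58 characters of the mapping listed above (the last five are the
# en dash, right single quote, left/right double quotes and bullet).
CHARS = (
    "\n \"#'()*,-./0123456789:;?[]"
    "abcdefghijklmnopqrstuvwxyz"
    "\u2013\u2019\u201c\u201d\u2022"
)

# Sorting by code point yields the same indices as the listed mapping.
char_to_int = {ch: i for i, ch in enumerate(sorted(set(CHARS)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}

def encode(text):
    """Translate lower-cased text into a list of integer codes."""
    return [char_to_int[ch] for ch in text.lower()]

def decode(codes):
    """Translate integer codes back into text."""
    return "".join(int_to_char[i] for i in codes)

command = encode("command")
```

The inverse table is what turns the network's predicted integers back into readable text at generation time.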

The LSTM construct uses two connected layers with 256 RNN nodes in each layer. The dropout is set to 0.2. Figures 33 and 34 show the code segments in training mode.


Figure 33. Code segment of LSTM training using airport emergency procedure manual

The code segment shown above uses sequences of 100 characters to form the input/output pairs. Training assigns 64 pairs to a batch and runs for 50 epochs.
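How the pairs and batches might be formed is sketched below. It assumes, in line with the next-character prediction described earlier, that each input is a window of 100 codes and each output is the single code that follows it; the paper's figure may organize the pairs differently. The encoded text is a stand-in.

```python
def make_pairs(encoded, seq_length=100):
    """Slide a window over the encoded text: each input is `seq_length`
    consecutive codes and the target is the code that follows them."""
    inputs, targets = [], []
    for start in range(len(encoded) - seq_length):
        inputs.append(encoded[start:start + seq_length])
        targets.append(encoded[start + seq_length])
    return inputs, targets

def batches(inputs, targets, batch_size=64):
    """Yield the pairs in consecutive batches of `batch_size`."""
    for start in range(0, len(inputs), batch_size):
        yield (inputs[start:start + batch_size],
               targets[start:start + batch_size])

# Stand-in for the encoded manual: 300 integer codes.
encoded = [i % 58 for i in range(300)]
inputs, targets = make_pairs(encoded)
batch_sizes = [len(x) for x, _ in batches(inputs, targets)]
```

A text of N characters yields N minus 100 pairs, so the 300-code stand-in produces 200 pairs, split into three full batches of 64 and one partial batch.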

Figure 34. Code segment of LSTM construct and training

The epochs in training mode (the first 5 epochs are shown in Figure 35) display a monotonically decreasing loss, as shown in Figure 36. Multiple training cycles are conducted to obtain the best performance. At each epoch, the weight matrix is saved under a predefined filename.


Figure 35. LSTM training epochs and corresponding “loss” measurements

Figure 36. LSTM training epochs and corresponding “loss” reduction

For retrieval, most of the code is the same, except that the saved weights are loaded back into the LSTM.

Figure 37. The LSTM common code segments for training and retrieval

[Chart: LOSS versus training epoch, epochs 1 to 50]


The best weights matrix (weights-improvement-50-0.6724.hdf5) is loaded back into the LSTM model, which is then executed with the seed phrases.
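Selecting the best saved weights can be automated by reading the loss out of each file name. The sketch assumes the naming pattern visible in the file named above (epoch, then loss); every other file name in the example is hypothetical.

```python
import re

# Pattern inferred from the file name in the text: epoch number, then loss.
CHECKPOINT = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")

def best_checkpoint(filenames):
    """Return the checkpoint file name with the lowest recorded loss."""
    scored = []
    for name in filenames:
        match = CHECKPOINT.search(name)
        if match:
            scored.append((float(match.group(2)), int(match.group(1)), name))
    if not scored:
        raise ValueError("no checkpoint files found")
    return min(scored)[2]

files = [
    "weights-improvement-01-3.0000.hdf5",  # hypothetical
    "weights-improvement-25-1.0000.hdf5",  # hypothetical
    "weights-improvement-50-0.6724.hdf5",  # file named in the text
    "notes.txt",                           # ignored: not a checkpoint
]
best = best_checkpoint(files)
```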

Figure 38. The LSTM common code segments for retrieval

The event trigger uses the key phrase “Airliner is on fire ……”. The seeds were selected as follows:

1. " command during all fire or medical related emergencies, the senior los angeles fire department offic "

2. " above with the los angeles fire department. provide barricades or means to secure contaminated area "

The output of the LSTM is shown in Figures 39 and 40.

Figure 39. LSTM output by “fire” seed 1.

Figure 40. LSTM output by “fire” seed 2.


Combining the outputs and performing noun ranking gives the next step, shown in Figure 41.

Figure 41. Extract high frequency nouns from LSTM outputs

Using the noun ranking shown above, law enforcement is selected as the next candidate for cascaded prediction.
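The ranking-and-cascade step can be approximated with simple phrase counting. This is a simplified stand-in: true noun extraction would need a part-of-speech tagger, the candidate list is an assumption drawn from the response units named in this paper, and the two passages are made up rather than actual LSTM output.

```python
from collections import Counter

# Candidate response units to look for (assumed list for illustration).
CANDIDATES = ("fire department", "law enforcement", "command post",
              "airport operations")

def rank_terms(texts, candidates=CANDIDATES):
    """Count how often each candidate phrase occurs across the texts and
    return (phrase, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for phrase in candidates:
            counts[phrase] += lowered.count(phrase)
    return [(p, n) for p, n in counts.most_common() if n > 0]

def next_candidate(ranking, already_handled):
    """Pick the highest-ranked phrase that has not yet been used as a seed."""
    for phrase, _ in ranking:
        if phrase not in already_handled:
            return phrase
    return None

# Made-up stand-ins for the two generated passages.
outputs = [
    "the fire department will notify law enforcement. "
    "law enforcement will secure the area",
    "law enforcement will report to the command post with the fire department",
]
ranking = rank_terms(outputs)
following = next_candidate(ranking, {"fire department"})
```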

For Law Enforcement, seeds are selected as below.

1. " law enforcement, firefighting and rescue agencies, medical resources, the principal tenants at the "

2. " h the law enforcement officer-in-charge to ensure adequate clear zones are maintained and airport op "

Figure 42. LSTM output by “law enforcement” seed 1.

Figure 43. LSTM output by “law enforcement” seed 2.


Combining the outputs and performing noun ranking gives the result in Figure 44.

Figure 44. Extract high frequency nouns from LSTM outputs

Using the noun ranking shown above, command post and airport operations are selected as the next candidates for cascaded prediction.

For Command Post:

1. " command post for all emergencies, a joint fixed-position and/or mobile command post will be establis"

2. " emergency will report to the command post to assist in liaison and coordination, control tower fun"

Figure 45. LSTM output by “command post” seed 1.

Figure 46. LSTM output by “command post” seed 2.

For Operations:

Figure 47. LSTM output by “airport operations” seed 1.


Figure 48. LSTM output by “airport operations” seed 2.

From the aggregations, for an airliner on fire, the suggested sequence of event triggers is

1. Fire Department: extinguish fire, manage fuel containment

2. Law Enforcement: crowd control, scene control and passenger evacuation

3. Command Post: coordinate air traffic, active decision center

4. Airport Operations: clear paths and facilitate Fire Department and Law Enforcement activities and

airport workforce
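The resulting trigger sequence lends itself to a simple lookup table that a rule engine could consult. The sketch below is only an illustration of that idea: the units and actions are taken from the list above, while the event key, the record fields and the function name are hypothetical.

```python
# Response units and actions, in the order listed above.
RESPONSE_PLAN = {
    "airliner on fire": [
        ("Fire Department", "extinguish fire, manage fuel containment"),
        ("Law Enforcement",
         "crowd control, scene control and passenger evacuation"),
        ("Command Post", "coordinate air traffic, active decision center"),
        ("Airport Operations",
         "clear paths and facilitate Fire Department and Law Enforcement "
         "activities"),
    ],
}

def build_alerts(event, location, timestamp, plan=RESPONSE_PLAN):
    """Return one alert record per response unit, in trigger order."""
    return [
        {"sequence": i, "to": unit, "action": action, "event": event,
         "location": location, "time": timestamp}
        for i, (unit, action) in enumerate(plan.get(event, []), start=1)
    ]

alerts = build_alerts("airliner on fire", "<location>", "T1")
```

An event with no entry in the plan yields no alerts, which leaves room for a fallback such as escalation to a human operator.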

4 Conclusion and Future Work

This study showcases the potential value of, and an approach to, integrating Deep Learning into airport complex event processing support. The emergency scenario is to identify a fire on a landed airliner via a video feed. The situation is first recognized and four responses are generated: 1. Trigger alert to Fire Department (to extinguish fire), 2. Trigger alert to Law Enforcement (to barricade scene of fire and evacuate passengers), 3. Trigger alert to Command Post (to coordinate air traffic tower and flights), 4. Trigger alert to Airport Operations (to clear path and facilitate Fire Department and Law Enforcement activities).

Airport event processing is known for its complexity. Traditional tree/graph decision support for complex decision making grows over time and requires consolidation, whereas deep learning can absorb new material through training. In this conceptual framework, Deep Learning techniques are leveraged to simplify the processing by encapsulating the event sequence and responses: the event is recognized, and logical responses are generated via supervised learning. Otsu's method, CNN, and LSTM are used to perform feature extraction, pattern recognition, and response prediction.

Future work includes:

1. Train the LSTM with two types of documents: additional airport emergency documents, for better response generation, and detailed investigation reports, to train the system to predict events in addition to responses.

2. Obtain detailed airport video footage to perform transfer learning by injecting airport-specific video/images into CNN training. This will help resolve issues such as “fire” vs. “fire screen”.

3. Enhance feature selection by integrating CNN with the Otsu algorithm.


References

[1] Kang Li, Video Event Recognition and Prediction based on Temporal Structure Analysis, Dissertation, Department of Electrical and Computer Engineering, Northeastern University, January 2015.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, Microsoft Research, arXiv:1512.03385v1 [cs.CV], 10 December 2015.

[3] Gabriel Pestana, Sebastian Heuchler, Augusto Casaca, Pedro Reis, and Joachim Metter, Complex Event Processing for Decision Support in an Airport Environment, International Journal on Advances in Software, Vol. 6, No. 3 & 4, 2013, pp. 246-260.

[4] Emergency Procedure, Los Angeles World Airport VNY Rules and Regulations, March 2005, Section 5.

[5] Keras Documentation, https://keras.io/

[6] Trained image classification models for Keras, https://github.com/fchollet/deep-learning-models

[7] Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation, DeepLearning 0.1, LISA Lab. Retrieved 31 August 2013.

[8] Sepp Hochreiter, Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation, 9 (8), 1997, pp. 1735–1780.

[9] Nobuyuki Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics, 9 (1), 1979, pp. 62–66.

Dell EMC believes the information in this publication is accurate as of its publication date. The information is

subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO

REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS

PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR

FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an applicable

software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
