
A Cloud Computing Framework with Machine Learning Algorithms for Industrial Applications

Brian Xu, D. Mylaraswamy, and P. Dietrich
Honeywell Aerospace, Golden Valley, MN, USA

Abstract. In this paper, a novel cloud computing framework with machine learning (ML) algorithms is presented for aerospace applications such as condition-based maintenance, detecting anomalies, predicting the onset of part failures, and reducing total lifecycle costs. This cloud framework has been developed using MapReduce, HBase, and Hadoop Distributed File System (HDFS) technologies on a Hadoop cluster of OpenSUSE Linux machines. Its ML algorithms are based on the Mahout ML library, and its web portal is built using JBoss and the JDK. Importantly, the big data from various Honeywell data sources is managed in HBase and analyzed by various ML algorithms. Users can access this cloud-based analytic toolset through web browsers anytime and anywhere. More analytic results from using this framework will be published later.

Keywords: Cloud Computing, Big Data Analytics, Machine Learning, Algorithms, MapReduce, CBM.

    1 Introduction

To deal with data overabundance and information overload, big data analytics and cloud computing technologies are being used by the world's top IT and software companies, such as Google, IBM, and Microsoft [1]. Currently these technologies are being adopted by other industries. In this paper, we present our prototype cloud framework with machine learning (ML) algorithms for aerospace applications, specifically Condition Based Maintenance (CBM), monitoring, diagnostics, and product reliability and performance. At Honeywell, big (volume, velocity, variety) data are collected and streamed from thousands of aircraft (operational datasets, maintenance data, etc.), test cells (hundreds of sensors and measurements, etc.), and repair shops (records of electronics, avionics, and mechanical repairs, etc.). For instance, one test cell can generate 300 MB of test data daily per engine. Our approach is to combine the strengths and synergies of both cloud computing and machine learning technologies in order to effectively analyze the big data and develop capabilities for predictive analysis, actionable information, better CBM, and decision making.

Technically, by combining and leveraging cloud computing and ML technologies, our major goals include (but are not limited to): (1) detecting anomalies in parts, components, and systems; (2) predicting the onset of part failures (e.g., components, LRUs, etc.) to maximize asset usage and availability and minimize downtime; (3) sustaining better and more effective CBM policies; and (4) reducing the total lifecycle costs of our aerospace assets and networks. Our primary tasks are to realize these goals by analyzing the big data and transforming information into knowledge. In our CBM applications, after we developed our Hadoop cluster by leveraging the Apache ecosystem [2], we focused on analyzing and mining our data sources using open-source ML algorithms, including the Mahout library [3], and on developing our own ML algorithms using R and Matlab.

This paper is organized as follows: Section 2 describes our cloud-based ML framework and its components. Apache Hadoop, MapReduce, and HBase are used to develop an effective cloud computing infrastructure on which our machine learning framework is built; the technical details are described in Section 3. Mahout ML algorithms are briefly introduced in Section 4. Our conclusions are presented in Section 5.

2 Architecture and Components of the Cloud-based ML Framework

Our specific tasks are to find valuable insights, patterns, and trends in big data (large volume, velocity, and variety) that can lead to actionable information, decision making, prediction, and situation awareness and understanding. To complete these technical tasks, we have developed a cloud framework with machine learning technologies for cyber-learning, leveraging machine learning algorithms (SVM, random forests, PCA, K-means, etc.), knowledge mining, and knowledge-intensive problem solving.
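As a minimal illustration of one of the algorithms named above, the following Python sketch runs plain K-means on a handful of one-dimensional sensor-like readings. This is our own toy sketch, not the paper's implementation (which uses Mahout, R, and Matlab); the readings and the two initial centers are hypothetical.

```python
# Toy K-means sketch (pure Python, 1-D data) illustrating the clustering
# step conceptually; production runs in the paper use Mahout instead.
def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical engine-temperature readings with two obvious regimes.
readings = [20.1, 19.8, 20.5, 80.2, 79.9, 80.4]
centers, clusters = kmeans(readings, [0.0, 100.0])
```

With these well-separated readings the two centers converge to the means of the low and high regimes after the first iteration.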

We developed our cloud-based ML framework by implementing a Cloud Controller, Cluster Controllers, and Node Controllers on our Hadoop cluster of Linux machines. We used the Eucalyptus cloud tool [4] to develop our primary software framework. The framework architecture and key components are shown in Figure 1.

In Figure 1, we implemented HBase, a scalable, distributed database that supports real-time access to large data repositories such as Oracle, MySQL, etc. Currently, we have 5 major HBase tables (more big tables can be created as needed):

1. CBM_use: This table manages user credentials and access privileges.

2. Field_reports: This table contains data from operating assets installed on various aircraft and operating vehicles.

3. ListOfValues: This table contains variables (typically vehicle-installed sensors) and sampled historical data. Each data set has a unique timestamp associated with it.

4. Repair_reports: This table contains data collected during the repair of a component. Typically the data includes removal data, field observations (free text), parts replaced/repaired, and shop observations (free text).

5. Testcell_reports: This table contains data from laboratory acceptance and qualification testing. Most of the components we track undergo an acceptance test before they are shipped back to the …

In general, HBase provides two technical components here: (a) convenient base classes that support Hadoop MapReduce jobs and functions over HBase tables; and (b) query predicate push-down via server-side scan and get filters, which select the relevant data for track management systems.
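To illustrate what predicate push-down buys, the Python generator below mimics a server-side scan filter: the predicate runs where the data lives, so only matching rows are returned to the caller. This is a conceptual sketch, not the actual HBase filter API; the row contents are hypothetical.

```python
# Conceptual sketch of server-side filtering (predicate push-down): the
# predicate is evaluated next to the data, so only matching rows come back.
TABLE = [
    {"row": "APU-0042#001", "asset:model": "131-9A", "test:type": "accept"},
    {"row": "APU-0042#002", "asset:model": "131-9A", "test:type": "qual"},
    {"row": "APU-0007#001", "asset:model": "85-129", "test:type": "accept"},
]

def scan(table, predicate):
    # Like an HBase scan with a filter attached: yield only passing rows.
    for r in table:
        if predicate(r):
            yield r

accepted = list(scan(TABLE, lambda r: r["test:type"] == "accept"))
```

In a real cluster the filtering happens on the region servers, avoiding shipping non-matching rows over the network.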

As seen in Figure 1, HBase tables can work with relational databases such as SQL Server or MySQL and achieve high speed in processing and analyzing the big data. Listing 1 shows an example of the code our HBase uses to get data from our SQL Server, e.g., the Honeywell Predictive Trend Monitoring and Diagnostics (PTMD) database, and others.

    Figure 1.

    Architecture and Components of our Cloud-based ML Framework.

Listing 1. A code segment for our HBase to get data from the SQL Server.

/opt/hbase-0.92.1-security/bin/hbase classpath
${HADOOP_HOME}/bin/hadoop jar /opt/hbase-0.92.1-security/hbase-0.92.1-security.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,asset:model,asset:serialnumber,test:device,test:type,test:objective,test:operator,event:time,algorithm:name,algorithm:date \
  Device_test
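The importtsv tool in Listing 1 consumes tab-separated files whose fields line up, in order, with the -Dimporttsv.columns list. As a sketch of the export side, the Python below writes one such TSV line from a hypothetical SQL Server query result; the values and field names are invented, and only the column order mirrors Listing 1.

```python
import io

# Column order must mirror the -Dimporttsv.columns list in Listing 1
# (HBASE_ROW_KEY first, then the asset:/test:/event:/algorithm: qualifiers).
COLUMNS = ["row_key", "model", "serialnumber", "device", "type",
           "objective", "operator", "time", "name", "date"]

def to_tsv_line(record):
    # importtsv expects tab-separated values, one row per line.
    return "\t".join(str(record[c]) for c in COLUMNS)

# Hypothetical row pulled from the PTMD SQL Server database.
record = {"row_key": "APU-0042#20130115", "model": "131-9A",
          "serialnumber": "0042", "device": "testcell-3", "type": "accept",
          "objective": "vibration", "operator": "jdoe",
          "time": "2013-01-15T10:00Z", "name": "pca", "date": "2013-01-16"}

buf = io.StringIO()
buf.write(to_tsv_line(record) + "\n")
```

A field out of order or a missing tab would silently load values into the wrong HBase qualifiers, so validating field count before import is worthwhile.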



Our cloud-based ML framework works with our existing SQL databases and analytic tools, as seen in Figure 2. Major existing data sources include stream datasets from test cells of aircraft engines, Auxiliary Power Units (APUs), asset (e.g., electronic parts, mechanical parts, etc.) repair shops, and aircraft fleets. We have our SQL Servers, MySQL databases, and analytic tools (Matlab, proprietary toolboxes, etc.). Newly developed HBase tables are populated by ETL, and the selected datasets from the existing RDBMSs and HBase provide the data column families for the Mahout ML tools to analyze.
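The ETL step above can be sketched in miniature: extract rows from an RDBMS, transform them into HBase-style column-family cells, and load them into the target table. The Python below uses an in-memory sqlite database as a stand-in for the SQL Server source and a dict as a stand-in for an HBase table; all table and field names are hypothetical.

```python
import sqlite3

# Extract: an in-memory sqlite DB stands in for the SQL Server source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ptmd (serial TEXT, model TEXT, reading REAL)")
db.executemany("INSERT INTO ptmd VALUES (?, ?, ?)",
               [("APU-0042", "131-9A", 20.1), ("APU-0007", "85-129", 80.2)])

# Transform + Load: reshape each relational row into family:qualifier cells,
# keyed by serial number, into a dict standing in for an HBase table.
hbase_table = {}
for serial, model, reading in db.execute("SELECT * FROM ptmd"):
    hbase_table[serial] = {"asset:model": model, "test:reading": reading}
```

The same reshape (relational columns to family:qualifier cells) is what the importtsv pipeline in Listing 1 performs in bulk.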

    3 Cloud-Based ML Framework Built Using Apache Ecosystem

Our commodity computers were virtualized using the Xen hypervisor (www.xen.org/) as a virtualization platform. The OpenSUSE Linux operating system was installed on these virtualized computers. Our Apache Hadoop [2] software framework is installed on these virtual machines, as seen in Figure 3, to support data-intensive distributed applications in the aerospace industry.

Our first Hadoop cluster consists of three nodes, one of which was designated as the NameNode and JobTracker. The other two machines acted as both DataNodes and TaskTrackers. A distributed filesystem was configured and formatted across the three nodes.

MapReduce is the core Hadoop technology for easily writing applications that process vast amounts of data in parallel on Hadoop clusters. Our Hadoop cluster consists of a single master JobTracker and one slave TaskTracker per cluster node. The master node is responsible for scheduling the jobs' component tasks on the slave nodes, monitoring them, and re-executing failed tasks. The slave computers execute the tasks as directed by the master.
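The map/shuffle/reduce flow that the master schedules across slaves can be sketched in-process: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The Python sketch below counts records per asset; it illustrates the programming model only, not Hadoop's distributed execution, and the record contents are hypothetical.

```python
from collections import defaultdict

# Map: each input record emits a (key, 1) pair; the key is the asset serial.
def map_phase(records):
    for r in records:
        yield (r["serial"], 1)

# Shuffle: group emitted values by key (what Hadoop does between phases).
def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

# Reduce: aggregate each key's values; here a simple count per asset.
def reduce_phase(groups):
    return {k: sum(vs) for k, vs in groups.items()}

records = [{"serial": "APU-0042"}, {"serial": "APU-0042"},
           {"serial": "APU-0007"}]
counts = reduce_phase(shuffle(map_phase(records)))
```

On a real cluster the map and reduce functions are the same shape, but the TaskTrackers run them in parallel over HDFS blocks and the shuffle moves data between nodes.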

    Figure 2.

    The Cloud-based ML Framework working with our existing data sources and analytic tools.

    Figure 3.

    Screenshot of the Apache Hadoop Framework.

Figure 4 (a).

    A Screenshot of the MapReduce running on our Hadoop Cluster.

    Figure 4 (b).

    A Screenshot of our MapReduce working with HDFS.

Specifically, our MapReduce runs on the Hadoop cluster, and jobs can be submitted fro…

