aflux: big data analytics in iot mashup tools• introduction to new jvm based mashup tool: aflux...
TRANSCRIPT
Tanmaya Mahapatra
Technische Universität München
Fakultät für Informatik
Lehrstuhl für Software & Systems Engineering
Garching, 05.07.2018
aFlux: Big Data Analytics in IoT Mashup Tools
• IoT Application Development Landscape & Challenges• IoT Mashup Tools: Examples, Limitations• Introduction to new JVM based Mashup Tool: aFlux• aFlux Internals• Supporting Big Data Analytics• Towards a Unified Approach for querying Big Data• Stream Analytics in aFlux• aFlux: Supporting Spark• aFlux: Supporting Flink• Road ahead
2Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Slide Contents
IoT Application Development:Landscape & Challenges
IoT has been defined as the interconnection of ubiquitous computing devices for the realization of value to end users.
• The true potentials of IoT are realized only by its application landscape.• Data collection from devices:
• For analysis to better understand the environmental context.• Task automation for time optimization and enhancing the quality of
human life.
4Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
IoT & Application Landscape
Insights from data can allow the creation of sophisticated & high-impact applications. E.g. Traffic congestion can be avoided by using learned traffic
patterns.
5Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Usage of IoT Data
Mining individual and group preferences
Mining patterns of end-users (mobility models, …)
Analyzing the state of engineering structures (structural health monitoring, …)
Predicting the future state of the physical environment (flood prediction in rivers, …)
• Unfortunately, the development of software application for IoT is notsimple.
6Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Difficulties in IoT Application Development
Write complex boiler plate codes to access devices
Perform data mediation
Device identification & Co-ordination
A mashup is a composite application developed starting from reusable data, application logic, and/or user interfaces typically sourced from the web.
7Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Streamlining Application Development for IoT: Mashups
User Interface
Logic
Data
IoT Mashup Tools: Examples & Limitations
• Mashup Tools typically offer graphical interfaces for specifying the data flow between sensors, actuators & services which lowers the barrier of creating IoT applications for end-users.
9Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
IoT Mashup Tools
State of the Art of Existing Mashup Tools
Device Management
• Provide simplified mechanisms to register IoT devices to the platform
• Devices can be named, grouped, deleted
Mashup Creation
• Provide a visual programming environmentto combine different services
• Provide support for pseudo codes/code-snippets for enhancing the business logic
Mashup Deployment
• Provide a run-time environment to execute the mashups.• Deployed mashups can be accessed by REST or can be re-
mashed.
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra 10
• Node-RED is an open-source mashup tool developed by IBM.• Provides a GUI where users drag-and-drop blocks that represent
components of a larger system which can either be devices, software platforms or web services that are to be connected.
• These blocks are called nodes.• A node is a visual representation of a block of JavaScript code designed
to carry out a specific task. • Additional blocks(nodes) can be placed in between these components to
represent software functions that house business logic/data mediation logic.
• With Node-RED the time and effort spent on writing boilerplate code is greatly reduced, and the developer can focus on the business logic.
11Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
IBM Node-RED
12Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Flow of a Mashup App in Node-RED
Data from Sensors
Data Mediation
Business Logic Services
Mashup Design of a custom Intrusion Detection System from firewall logs
Strengths of Mashup Tools: At a Glance
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra 13
Assist the user in application development
Abstracts the low-level complexity
Flexible & Intuitive
Support Integrated administration
Developmental methodology: Mashup + Model-based
14Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Research Direction: Why Integration is Potentially Beneficial? Mashup
ToolsBig Data
Tools
Complex & massive
parallelism
Good for analysing huge data-sets
Operate on more abstract level & heterogeneous
datasets
Simple & single threaded
Good for specifying control
flow
Operate on specific datasets
Full fledged integration would enable mashup developers to specify Big Data analytics jobs and consume their results within a single
application model.
15Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Existing Limitations of Mashup Tools
Blocking execution and synchronous communication in mashups
Single-threaded mashups
Visual Programming Limitations
End-user focus in mashup tools
aFlux:New JVM based Mashup Tool
17Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
aFlux
IBM Node-RED aFlux
Asynchronous Execution
Multi-threaded Mashup
Support for Big Data Analytic
Towards Unified Big Data querying
Easy Creation of Domain Specific Apps for end-users
• In the actor model, an actor is the foundation of concurrency or rather like an agent which does the actual work.
• It is analogous to a process or thread. • Actors are very different from objects
• an object can interact directly with another object i.e. changing its values or invoking a method.
• This causes synchronization issues in multi-threaded programs • In an actor system there is no direct way to invoke or interact with an
actor. • They respond to messages.
• An actor may change its internal state, perform some computation, fork new actors or send messages to new actors.
18Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
aFlux: Conceptual Approach
19Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
How actors behave?
• In designing aFlux, we decided to go with akka, a popular actor system. • The main intuition is that when a user designs a flow we model this flow
internally in terms of actors i.e. an actor is a basic execution unit of the mashup tool for designing a flow.
• we have three actors namely A, B and C where the computation starts with actor A. On completion it sends a message to actor B and so on.
• Since the akka system can be configured in many different ways for parallel and distributed operations and it abstracts away how the actors execute within it, it makes the user created program also parallel and distributed.
20Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
aFlux
• In aFlux, a user can create a mashup flow which is called a “flux”. • A flux is analogous to a flow in IBM Node-RED. • The only requirement for designing a flux is that it should have a start
node and an end node.
21Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Execution Pattern
22Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Asynchronous Execution Patterns
• The synchronous components typically block the execution flow, i.e. when they receive a message on their input port they start execution and pass the message through their output ports on their completion.
Synchronous
• On the other hand, asynchronous-capable components have two different types of output ports namely, blocking and non-blocking ports.
Asynchronous
23Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Asynchronous Execution Patterns
• To abstract away independent logic within a main application flow, the system supports logical structuring units called sub-flows.
• A sub-flow encompasses a complete business logic and is independent from other parts of the mashup.
• The sub-flows in the system are like normal asynchronous-capable components. They have input and two sets of output ports (i.e. blocking and non-blocking).
• They encompass within themselves a complete flow of graphical components used for Big Data querying.
24Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Logical Structuring Units in aFlux
25Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
A Sub-Flow can house a complete Flux
Supporting Big Data Analytics in aFlux
• aFlux has graphical components specific to Big Data query languages.
27Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Supporting Big Data Analytics in aFlux
• The last component used in a Big Data query is the Big Data executor• This executor component parses all the prior connections and generates
the final query to be passed on to the Big Data system for execution.
28Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Supporting Big Data Analytics in aFlux
29Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
A Pig Flux in aFlux
Towards a Unified Approach for querying Big Data
• The graphical components for Big Data queries are tightly coupled to the semantics of the target query language.
• The target languages may evolve and hence we need to create more components• This complicates end-user development
• The idea is to have a set of uniform graphical components for query framing.
• Develop a unified query language to permit end-users to formulate their data queries.
• Internally use translation models to generate queries in different target languages depending on the need as well as context.
• It introduces a layer of abstraction between the language elementsshown on the user interface, i.e. visual elements, and the target querylanguage which is executed on Big Data systems
31Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Towards Unified Analytics
32Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Unified Analytics: Internals
Stream Analytics in aFlux
34Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Data Processing
• Streams are NOT collections• Stream is a sequence of ongoing events ordered in time• Streams have no beginning and no end (unbounded data)• Ever-changing sets of data and values always in motion• Traversable only once
35Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
IoT and Stream Analytics
Stream Analytics platforms
36
• Stream analytics is important in the IoT business but− IoT mashups are not optimized for stream analytics− IoT mashup tools (e.g. Node-Red) are single-threaded and
implement a synchronous model
37Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Solution
Benefits:−No complicated set ups−No coding to analyze data streams−Non-experts can use this tool−Short learning curve−Easy to use & user-friendly UI
Goals:1. Implement a stream analytics suite2. Parameterize stream analytics properties3. Evaluate our stream analytics tool
38Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Potential Benefits & Goals
• Akka: Library used for stream analytics (Scala & Java)• Higher-level abstraction over Akka’s existing actor model• Implementation of the Reactive Streams specifications
39Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Solution
3 component types:1.Processing: Filter, Count, Moving Average, Max, Min2.Fan-in: Merge, Zip3.Fan-out: Broadcast
40Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Stream Analytics in aFlux
Stream analytics component architecture
Akka component
Akka Streams component
Akka component
Akka Streams component
41
• Ensure that the receiving side is not forced to buffer arbitrary amounts of data. (overflow strategy defined by user, e.g. back-pressure)
• Non-blocking asynchronous communication between components• Allow the queues which mediate between threads to be bounded.
(queue size defined by user, no failures due to big workload of data)− The benefits of asynchronous processing would be negated if the
communication of back-pressure were synchronous (like the single-threaded Node-Red)
Property ParameterizationBuffer size, Overflow strategy, Window parameters
42Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Benefits & Property Parameterization
43Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Property Parameterization
44Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Property Parameterization • Buffer size of the source queue• Overflow strategies:
• dropHead• dropTail• dropNew• dropBuffer
• Window parameters:• Window type (content/time)• Window method
(tumbling/sliding)• Window size• Sliding step
Scenario:• A9 – highway in Munich • All cars run in the same direction (South to North)• 3 loop detectors on 3 lanes• 4th lane (shoulder lane) is initially closed• 500th tick à accident happens• When average occupancy rate of loop detectors > 30% à open shoulder-
lane
45Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Evaluation
• aFlux flow for stream processing, • SUMO traffic simulator (Traffic simulation A9 project)
− TraCI + Kafka
46Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Experiment Setup
• Testing parameters:1. Loop detectors’ occupancy2. Mean speeds of cars3. Lane state4. Time
• Analytics methods tested:1. One-by-one processing (no window)2. Content-based tumbling window processing of size 503. Content-based tumbling window processing of size 3004. Content-based tumbling window processing of size 5005. Content-based sliding window processing of size 500
47Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Testing Parameters & Analytics Methods Used
48Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Average mean speed of cars: No Window
49Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Average mean speed of cars: Diff. Windows
Method Responsiveness Settling Time ScalabilityNo Window Very Slow Long LowTumbling 50 Very Fast Very Long Very LowTumbling 300 Slow Very Short HighTumbling 500 Slow None Very HighSliding 500 Slow None Very High
50Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Summary of Results
aFlux: Supporting Spark
52Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Enabling Spark Queries in aFlux
Sample Spark Application
53Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Spark Flow in aFlux
54Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Typical Structure of Big Data Jobs
55Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Spark Jobs in aFlux: Concepts
56Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Zooming into the Core Idea
aFlux: Supporting Apache Flink
58Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Apache Flink: Overview
Smart Santander
City-scale experimental research facility• 3000 IEEE 802.15.4 devices• 200 GPRS modules• 2000 joint RFID tag/QR code labels• Static locations (streetlights, façades, bus stops) as well as on-board
of mobile vehicles (buses, taxis)
Integration of Big Data Analytics in IoT Mashup Tools | MahapatraSource: http://smartsantander.eu
59 of 20
Smart Santander
Use cases• Traffic Intensity Monitoring−Around 60 devices located at the main entrances of the city of
Santander • Environmental Monitoring−Around 2000 IoT devices installed mainly at the city center−Sensors installed in around 150 public vehicles, including buses,
taxis and police cars.• Outdoor parking area management• Parks and gardens irrigation
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra60 of 20
Smart Santander
Enabling Stream Analytics in Smart Santander with FlinkMappings of the API required:• DataStream API for stream processing• CEP Library for event processing
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Source: https://flink.apache.org/
61 of 20
Steps: Overview
Task #1: SmartSantander connector for Flink
Task #2: aFlux actors
Task #3: Flink API mapper
Task #4: Front-end Validation
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra 62 of 20
Implementation Tasks: aFlux Actors
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Actors are in charge of:• Generating Java code from the user-defined properties• Generating final Flink job• Exchanging messages with previous/posterior actors → FlinkFlowMessage
Set of aFlux tools developed:
63 of 20
Implementation Tasks: aFlux Actors
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
Structure of a Flink job
64 of 20
Implementation Tasks: Front-End Validation
Semantics between nodes:
An array of ToolSemanticsCondition can be defined when creating a new toolErrors are shown to the user when creating the flow
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
<Tool A>should
come (immediately)before
<Tool B>must after
65 of 20
Evaluation
Goal → prove how easy it is to create Flink jobs from aFluxEvaluation based on two use cases:
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra 66 of 20
UC1: Real Time Data Processing
AggregateFunction, AllWindowedStream, DataStream, FilterFunction, MapFunction, RichSourceFunction, SlidingWindow, StreamExecutionEnvironment, TumblingWindow
Code Description
UC1E1 Temperature vs. air quality in a certain area in relation with the average of the city
UC1E2 Air quality vs. traffic charge in the city center
UC1E3 Noise vs. traffic charge in the city center
UC1E4 Max/min monitor
UC2: Pattern Detection
DataStream, Pattern, PatternSelectFunction, PatternStream
Code Description
UC2E1 Traffic increasing in a certain area
UC2E2 Heatwave in the city
Evaluation
UC1E1: live data from SmartSantander API (6th June 2018 9am-10am @ 5-min Wind.)
Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra 67 of 20
Road Ahead …
69Integration of Big Data Analytics in IoT Mashup Tools | Mahapatra
What’s Next?Current Agenda:
1. Paper in Review: Stream Analytics in IoT Mashup Tools2. 2 Papers in Pipeline: Spark and Flink3. Development of a Uniform Methodology, Thesis Writing4. Thesis Submission Planned in June, 2019.
Thanks -- Questions? Ideas?