Computer Measurement Group, India
www.cmgindia.org
Performance Modeling of IoT Applications
Dr. Subhasri Duttagupta, TCS
Contents
• Introduction to IoT System
• Performance Modeling of an IoT platform
• Performance Modeling of a sample IoT Application
• Performance Modeling of a real-life WSN system
• Summary
Motivating Examples
Scalability Analysis
End-to-End Delay Analysis
WNA Lab, Amrita University
Elements of a Typical IoT Platform
[Platform diagram]
• Devices: things with embedded sensors, mobile devices, and gateway devices, connecting over http(s), tcp, udp, mqtt, LWM2M and device protocols such as OPC-UA, Modbus and Continua
• Cloud services: device management, device agents, sensor data management, message routing & event processing, analytics, and RESTful APIs
• Consumers: apps, clients & portals used by developers
A high-performance, scalable platform for the Internet of Things
Questions that Performance Modelling can answer
• When does any of the subsystems become a bottleneck
  – as the sensor data rate increases?
  – as the number of queries injected by users increases?
• How many VMs does a particular subsystem need to handle a certain load?
• Will the SLA be met for a certain growth in the number of users?
• What kind of performance modelling is useful in a specific situation?
Challenges in Performance Modelling of IoT
• Diverse technologies
• Huge number of smart devices of various types
• Lack of suitable platforms and tools for testing the end-to-end system
• Difficulty in predicting the exact workload mix
• Frequent addition of new services and devices
• Changes in the deployment platform
Modeling Background
• Closed systems are characterized by the number of users and their think time.
• Open systems are characterized by the rate of arrival of requests.
• Service demand: the amount of CPU/disk time spent serving one unit of output (resource utilization / throughput).
These are the inputs to the model.
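The service-demand definition above is the Utilization Law rearranged; a minimal sketch (the function name and sample numbers are illustrative, not from the talk):

```python
def service_demand(utilization, throughput):
    """Utilization Law: U = X * D, so D = U / X.

    utilization: fraction of time the resource (CPU/disk) is busy, 0..1
    throughput:  completed requests per second (X)
    returns:     service demand D, in seconds of resource time per request
    """
    return utilization / throughput

# e.g. a CPU that is 60% busy while the system completes 50 req/s
# spends 0.6 / 50 = 0.012 s (12 ms) of CPU time per request
print(service_demand(0.60, 50.0))
```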
What Questions Modelling Helps to Answer
• Which component becomes the bottleneck when the system is handling the peak data rate from a number of sensors?
• How many VMs are required at each layer
  – for a certain number of sensors with a response-time SLA?
  – for a certain number of API clients with a response-time SLA?
• Can the system handle a certain growth in users accessing the IoT services?
• Can the platform support different types of APIs (random-access query, range query, sequential-scan query) simultaneously without affecting performance?
Steps in Modelling Exercise
1. Understand the architecture
   • Identify the components that are significant
2. Analyse the commonly used workloads and their parameters
3. Run performance tests, or analyse available data, to obtain service demands for each type of workload
4. Analyse the workload to find any variation in service demands under certain conditions
5. Decide the flow of requests within the model
   • Attach probabilities to the various alternate flows
Architecture Diagram of a Subsystem
[Architecture diagram: Sensor Observation Services (SOS). REST clients reach Tomcat NIO servers hosting the SOS (Spring MVC, Phoenix JDBC, Derby). The SOS instances share a Hazelcast distributed cache and pull queued IDs from an API ID generation service through a message exchange. Observations and audit trails are stored in an HBase cluster, with the Phoenix coprocessor running on the region servers.]
Workloads for Sensor Observation Services
Different APIs
• GetObs – latest, by sensor, by time range
• PostObs
• Get/Post Sensor
• GetFeature – get the features of sensors
• GetCapability – get the capabilities of the SOS
Find out whether an API's output depends on the parameters passed to it.
JMT: Powerful Java Modelling Tool
• Developed since 2002 by 10+ generations of PG and UG students at Politecnico di Milano and Imperial College London
• http://jmt.sourceforge.net/
• JMT is open source: GPL v2
– size: ~4,000 classes; 21MB code; ~200k lines
• Download the jar file and simply run: java -jar JMT.jar
• M. Bertoli, G. Casale, G. Serazzi. JMT: Performance Engineering Tools for System Modeling. ACM SIGMETRICS Performance Evaluation Review, 36(4), pp. 10–15, March 2009, ACM Press.
JMT – Java Modeling Tools
• JSIMgraph – queueing network models simulator with a graphical user interface
• JSIMwiz – queueing network models simulator with a wizard-based user interface
• JMVA – Mean Value Analysis and approximate solution algorithms for queueing network models
• JABA – asymptotic analysis and bottleneck identification of queueing network models
• JWAT – workload characterization from log data
• JMCH – Markov chain simulator
Analytical Modeling of SOS Model
[Queueing network models of the SOS, in an HBase version and a Postgres version, each built from delay stations and queueing stations.]
Modeling of SOS with PostGres
SOS Modules
• Predicting maximum throughput and response time for a specific deployment
• Predicting performance for a different deployment
• Use of performance-mimicking benchmarks [Duttagupta, IoT 2016]
[Diagram: a performance testing tool drives the SOS modules with a configured number of threads and think time, and measures utilization and throughput.]
Performance of a Single API on AWS (using Postgres)
• Throughput saturates at 256 users, at 123 trans/sec.
• Response time increases beyond 1 sec past 256 users.
• To scale to a higher number of users, we need to add more Tomcat VMs.
[Charts: PostObs on AWS – throughput (txn/sec) and response time (ms) vs. number of users, actual vs. predicted. Response time reaches ~2 s at the highest load.]
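The saturation behaviour described here can be reproduced with exact Mean Value Analysis for a closed system. A minimal sketch; the demand and think-time values below are illustrative assumptions, not the measured ones:

```python
def mva(demands, think_time, users):
    """Exact single-class MVA for a closed queueing network.

    demands:    per-visit service demand (s) at each queueing station
    think_time: user think time Z (s), modelled as a delay station
    users:      closed population N
    Returns (throughput X, response time R) at population N.
    """
    queue = [0.0] * len(demands)               # mean queue length per station
    X = R = 0.0
    for n in range(1, users + 1):
        resid = [d * (1 + q) for d, q in zip(demands, queue)]
        R = sum(resid)                         # total response time
        X = n / (R + think_time)               # throughput, by Little's law
        queue = [X * r for r in resid]
    return X, R

# hypothetical demands: 8 ms at Tomcat, 3 ms at the database; Z = 1 s
for n in (64, 256, 512):
    X, R = mva([0.008, 0.003], 1.0, n)
    print(n, round(X, 1), round(R * 1000, 1))
# throughput flattens near 1/0.008 = 125 txn/s once Tomcat saturates
```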
Mixed API for a Different Datastore (HBase)
• GetObsLatest on 10% of the threads and PostObs on 90% of the threads
• Modeling helps predict the performance of the mixed API given that of each single API
[Charts: GetObsLatest + PostObs – throughput and response time (ms) vs. number of users, actual vs. predicted. Service demands: PostObs SD = 12.5 ms, GetObsLatest SD = 5.2 ms.]
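Using the service demands above and the 90/10 mix, a first-cut bound on the mixed-API throughput comes from the mix-weighted demand; this is a single-resource simplification (a multi-class queueing model is more accurate):

```python
# (probability in the mix, Tomcat service demand in seconds)
mix = {"PostObs": (0.90, 0.0125), "GetObsLatest": (0.10, 0.0052)}

d_mix = sum(p * d for p, d in mix.values())   # mix-weighted service demand
x_max = 1.0 / d_mix                           # throughput bound if a single
                                              # server with this demand is
                                              # the bottleneck
print(round(d_mix * 1000, 2), "ms ->", round(x_max, 1), "req/s upper bound")
```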
Does API support Horizontal Scalability?
With 2 Tomcat VMs, the application scales up to 512 users.
- The model predicts the API to have linear scalability, and actual test results confirm that two VMs scale to twice the number of users without increasing the response time (~0.5 s).
[Charts: PostObs with 2 VMs on AWS – throughput (txn/sec) and response time (ms) vs. number of users, actual vs. predicted.]
Number of VMs Required for a Response-Time SLA
• The Tomcat layer scales horizontally
• The PostGres VM needs to be upgraded to a bigger VM beyond 1280 users
• Response time SLA = 1 sec
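Given the earlier observation that one Tomcat VM supports about 256 users within the SLA and that scaling is linear, the VM count for the scaling tier can be sketched as follows (the helper name is illustrative, and the per-VM capacity is a parameter):

```python
import math

def tomcat_vms_needed(users, users_per_vm=256):
    """VMs needed at a horizontally scaling tier, assuming the measured
    per-VM capacity (~256 users within the 1 s SLA) and linear scaling,
    as the 2-VM experiment suggested."""
    return math.ceil(users / users_per_vm)

print(tomcat_vms_needed(512))    # 2 (matches the 2-VM experiment)
print(tomcat_vms_needed(1280))   # 5
```

Note this only sizes the scaling tier; the non-scaling PostGres VM still saturates around 1280 users and must be upgraded instead.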
Modeling Challenge – Service demand variability
Service demand varies with a higher number of threads; it also depends on the API.
API             No of threads   Tomcat Service Demand
PostObs         64              11.2 ms
PostObs         768             8.5 ms
PostObs         1024            7.2 ms
GetObsLatest    64              8.7 ms
GetObsLatest    256             3.8 ms
GetObsLatest    512             2.4 ms
GetObsbySensor  32              44 ms
GetObsbySensor  64              50.8 ms
GetObsbySensor  128             68.4 ms
Modeling of a Rule Processing Engine
• Factors Impacting Performance
• Complexity of Rule applied on messages
• Payload of messages
• Inter-arrival delay of consecutive messages
• No of rules applied to an observation
• No of message producers and consumers
• Server/VM architecture
• We consider the effect of message rate and complexity of rules
Architecture of a Rule Processing Engine
• Messages first come to the tenant RabbitMQ, based on API keys
• The rule processing engine then routes them to various topic exchanges
• Multiple topic MQs exist for multiple tenants with the same topic
Model of a Rule Processing Engine
Performance of a Message Routing System
Message latency is very low until either the RabbitMQ VM or the MR VM saturates.
Once utilization exceeds 90%, latency can increase rapidly due to queue build-up at the server.
[Chart: latency (ms) in the MQ module vs. flow rate (msg/sec), actual vs. predicted.]
Flow rate   Latency       RabbitMQ CPU%
900/s       8 ms          80%
975/s       13 ms         87.4%
1050/s      1.9 – 8 sec   93.5%
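The sharp knee in the table is the classic open-system effect; an M/M/1 sketch shows why latency explodes as utilization approaches 1. The per-message service time below is derived from the table via the Utilization Law (0.8 / 900 ≈ 0.9 ms), but the model itself is an illustrative simplification:

```python
def mm1_response_time(service_time, utilization):
    """M/M/1 mean response time W = S / (1 - rho)."""
    if utilization >= 1.0:
        return float("inf")        # unstable: queue grows without bound
    return service_time / (1.0 - utilization)

s = 0.0009                         # ~0.9 ms per message (0.8 CPU / 900 msg/s)
for rho in (0.80, 0.90, 0.95, 0.99):
    print(rho, round(mm1_response_time(s, rho) * 1000, 1), "ms")
# steady-state latency grows roughly 5x between 80% and 96% utilization,
# and the transient queue build-up seen at 93.5% CPU is worse still
```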
Combined Model with SOS sending data to MQ
A PostObs request is forked after processing at the SOS and is then processed by the RabbitMQ + MR engine.
Modeled using a combination of open and closed requests [ACM IoT 2016].
Performance Analysis of a Sample Application on IoT Platform
We have seen: performance analysis of two subsystems of the IoT platform.
Next: performance analysis of a sample application on the IoT platform.
Sample App – Energy Monitoring System
Architecture – Data flows from Platform to backend
Modeling Problems
• How many more buildings can the current infrastructure support?
• How many online users can the dashboard support with the present deployment, for typical queries?
Q: How many more buildings can be supported
• Data comes from different sources
  – Occupancy data
  – Energy meter readings
• Find the distribution of the inter-arrival time of observations
  – The SOS log gives the arrival timestamp of each observation
  – Two metrics are calculated from the inter-arrival time samples: mean and standard deviation
Backend Access Log
• We extract the timestamps of successive observations being posted.
• From the timestamps, we calculate the inter-arrival time of every observation.
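The extraction step can be sketched as follows; the timestamp list is fabricated for illustration:

```python
from statistics import mean, stdev

def interarrival_stats(arrivals_ms):
    """Mean and standard deviation of inter-arrival times,
    given sorted arrival timestamps (ms) from the access log."""
    gaps = [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]
    return mean(gaps), stdev(gaps)

ts = [0, 5, 10, 15, 20, 200]        # hypothetical arrival times (ms)
m, s = interarrival_stats(ts)
print(m, round(s, 1))               # mean 40, std ~78.3
# std > mean hints at a heavier-than-exponential tail,
# like the measured mean 37.2 ms vs std 45.6 ms on the next slide
```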
Distribution for Inter-arrival time
[Chart: PDF of inter-arrival time – probability vs. inter-arrival time (ms), in 10 ms buckets from 10 to 210 ms.]
• If this were an exponential distribution with mean 37.2, the standard deviation should also be 37.2, but it is 45.6.
• What trend does this distribution reflect?
• How do we compute the parameters of such a distribution?
What is a Hyper-Exponential Distribution?
• A probabilistic mixture of exponentials: with probability pᵢ, a sample is drawn from an exponential with rate λᵢ:
  f(x) = Σᵢ₌₁ⁿ pᵢ λᵢ e^(−λᵢ x),  with Σᵢ pᵢ = 1
• We need to find a set of exponential distributions and their probabilities that matches the distribution of the data.
Fitting a Distribution to the Data
• We can find µ1, µ2 and their probabilities so that the mixture matches the desired mean and standard deviation.
• With µ1 = 22, p1 = 0.6 and µ2 = 60.1, p2 = 0.4, we obtain mean = 37.2 and std = 45.6.
• Ideally, though, we should fit the mixture so that it also matches the shape of the distribution, not just the first two moments.
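A standard way to automate this two-moment fit is the "balanced means" two-phase hyperexponential. It is a different parameterization from the mixture quoted on the slide, but it reproduces the same mean and standard deviation:

```python
from math import sqrt

def fit_h2_balanced(mean, std):
    """Fit a 2-phase hyperexponential with balanced means to a target
    mean and std; valid when the coefficient of variation is > 1."""
    scv = (std / mean) ** 2                      # squared coeff. of variation
    p = 0.5 * (1 + sqrt((scv - 1) / (scv + 1)))  # branch probability
    lam1 = 2 * p / mean                          # rate of branch 1 (w.p. p)
    lam2 = 2 * (1 - p) / mean                    # rate of branch 2 (w.p. 1-p)
    return p, lam1, lam2

p, l1, l2 = fit_h2_balanced(37.2, 45.6)          # the measured moments
# sanity check: the fitted mixture reproduces the target mean
print(round(p / l1 + (1 - p) / l2, 1))           # 37.2
```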
Q: How many online users can be supported
• What if performance testing is not an option?
  – Rely on the utilization monitoring interface
• Find the trend of the utilization data to derive min, max, and average
• History is available for 1 day, the last 7 days, or 1 month
• Is the utilization due to one type of workload?
When No Dashboard Queries are running
• Utilization is due to data injection and alert queries
• Utilization rises every 25–30 min and stays high for about 15 min; CPU% varies between 11% and 20%
Deriving Service demands for Different Workloads
• We compute the service demand based on the mean throughput derived from the log
• The front-end client node handles traffic only from the dashboard
• The backend data node handles traffic from the dashboard as well as from the sensor backend
[Diagram: the ES backend receives dashboard queries, sensor data, and alert queries.]
Deriving Service demands for ES datanode
• Use of the least-squares technique:
  U_ES = X_Data × D_Data + X_Alert × D_Alert + X_DashQ × D_DashQ
  where X_Data is the throughput of data insertion and D_Data its service demand (and similarly for alert and dashboard queries).
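The regression above can be sketched with NumPy; the throughput and utilization samples below are synthetic, just to show the shape of the computation:

```python
import numpy as np

# rows = monitoring intervals; columns = throughputs (req/s) of
# data insertion, alert queries, dashboard queries (synthetic values)
X = np.array([[120.0, 2.0,  0.0],
              [118.0, 2.0,  5.0],
              [121.0, 8.0,  5.0],
              [119.0, 8.0, 12.0]])
true_D = np.array([0.003, 0.010, 0.008])   # per-request demands (s)
u = X @ true_D                             # synthetic measured utilization

# least-squares estimate of the three service demands from (X, u)
D, *_ = np.linalg.lstsq(X, u, rcond=None)
print(np.round(D, 4))                      # recovers [0.003 0.01 0.008]
```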
Model for Energy Monitoring Subsystem
Takeaways from Modelling the Sample App
• Throughput in an open system mostly remains unchanged
• iowait% arises from Debug/Info logging
• Utilization over a short duration can become very high even at low concurrency
• The model can be built from production logs and utilization information
What Other Modeling Techniques Can We Use?
• Markov chains can be used when the system can be modeled as a set of states
  – We need to know the states and their transition rates
• Example: a real-life landslide monitoring system
  – Several different types of sensors are used: rain gauge, pore pressure, humidity, movement
  – The system makes decisions based on the values of the sensor readings
Deriving Parameters for the Markov Chain
[State diagram: duty-cycle states OFF, Sr, S234, Smp, Srm and ON, each with an associated power drain. Transitions are driven by sensor readings, e.g. rain r(t) > Th1 moves the system out of OFF and r(t) < Th1 moves it back; r(t) > Th2 || m(t) > Thm and r(t) > Th3 || p(t) > Thp escalate to higher states, while r(t) < Th2 and r(t) < Th3 de-escalate.]
WNA Lab, Amrita University
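Once the states and transition rates are known, the steady-state probabilities of the chain give the long-run time spent in each state, and hence the expected power drain. A pure-Python sketch with a simplified 3-state chain and made-up rates (the real model has more states, as in the diagram):

```python
# Steady state of a simplified 3-state CTMC (rates are hypothetical)
# states: OFF (radios off), LOW (sparse sampling), FULL (all sensors on)
rates = {("OFF", "LOW"): 0.5,    # rain crosses the wake-up threshold
         ("LOW", "OFF"): 0.3,    # rain drops back below it
         ("LOW", "FULL"): 0.4,   # rain/pressure cross higher thresholds
         ("FULL", "LOW"): 0.6}   # readings fall back

# linear (birth-death) chain => detailed balance:
# pi_i * rate(i -> j) = pi_j * rate(j -> i)
pi = {"OFF": 1.0}
pi["LOW"] = pi["OFF"] * rates[("OFF", "LOW")] / rates[("LOW", "OFF")]
pi["FULL"] = pi["LOW"] * rates[("LOW", "FULL")] / rates[("FULL", "LOW")]
total = sum(pi.values())
pi = {s: v / total for s, v in pi.items()}   # normalize to probabilities
print({s: round(v, 3) for s, v in pi.items()})
# long-run time fractions; multiply by per-state power to estimate drain
```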
Summary – Things we covered
• Basics of performance modeling using queueing networks
• Given an architecture, how to build the model for the system
• Performance modeling of an IoT platform: closed system, open system, number of VMs required, scalability analysis
• Performance modeling of a sample application running on an IoT platform: gathering the inter-arrival time distribution, deriving service demands of workloads
• Outcome of a performance model
• Other modeling techniques: Markov chains
Important Resources
• M. Bertoli, G. Casale, G. Serazzi. User-Friendly Approach to Capacity Planning Studies with Java Modelling Tools. Int'l ICST Conf. on Simulation Tools and Techniques (SIMUTools 2009), Rome, Italy, 2009, ACM Press.
• S. Kounev and A. Buchmann. Performance Modeling and Evaluation of Large-Scale J2EE Applications. In Proceedings of the Computer Measurement Group's Conference, 2003.
• E. Lazowska, J. Zahorjan, G. Graham and K. Sevcik. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, 1984.
• M. Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action.
• W. Stallings. A Gentle Introduction to Some Basic Queueing Concepts.
• R. Mansharamani, S. Duttagupta, A. Nehete. Automatically Determining Load Test Duration Using Confidence Intervals. CMG India, Pune, 2014.
• S. Duttagupta, R. Mansharamani. Extrapolation Tool for Load Testing Results. Int'l Symposium on Performance Evaluation of Computer Systems and Telecommunication Systems, 2011.
• S. Duttagupta, M. Kumar, M. Nambiar. Performance Modeling of IoT Applications. 6th ACM Conference on Internet of Things (IoT 2016).
Open Issues
• How to account for the variability of technologies used by various sensors to connect to the IoT system
• How service demand varies with load or with a higher flow rate
• What the fundamental limits of a technology stack are for a certain kind of workload