SN Computer Science (2020) 1:249 https://doi.org/10.1007/s42979-020-00273-1


REVIEW ARTICLE

A Survey on Cloud Computing Simulation and Modeling

Ilyas Bambrik1
[email protected]

1 Laboratoire de Recherche en Informatique de Tlemcen (LRIT), Department of Computer Science, Faculty of Science, New University Pole Abou Bekr Belkaid, Tlemcen Mansourah, 13000 Tlemcen, Algeria

Received: 14 April 2020 / Accepted: 27 July 2020 / Published online: 4 August 2020 © Springer Nature Singapore Pte Ltd 2020

Abstract
Cloud Computing (CC) has attracted a massive amount of research and investment in the previous decade. The economic model proposed by this technology is a viable solution for consumers as well as being a profitable one for the provider. However, deploying real-world cloud experiments to test new policies/algorithms is time consuming and very expensive, especially for large scenarios. As a result, the research community has opted to test their contributions in CC simulators. Although the models proposed by these simulators are not exhaustive, each one is made to address a specific process. Alternatively, other tools are made to provide a platform and the necessary building blocks to model any desired sub-component (application/network model, energy consumption, scheduling and Virtual Machine provisioning). In this paper, a detailed survey of the existing CC simulators is made, discussing features, software architecture as well as the ingenuity behind these frameworks.

Keywords Cloud computing · Cloud simulation · Application model · Network model · Energy consumption · Virtual machine provisioning

Introduction

Over the last two decades, simulators have been widely used in various scientific fields. A simulator is software able to reproduce the behaviour of a specific system with respectable precision. More often than not, reproducing the real event or deploying the target system is very expensive, hard and sometimes impossible. Thus, though the simulation tool introduces a small imprecision due to the complex nature of the system, it offers several advantages such as: (a) result reproducibility, (b) cost effectiveness, (c) flexibility.

Due to the expensive cost of computer network components, simulators have been extensively used by the scientific and industrial communities in this domain. Network simulator usage varies from testing network topologies (Packet Tracer [1]), simulating general network functionalities/protocols (such as Network Simulator 2 [2], Network Simulator 3 [3] and OMNeT++ [4]), to simulating specific types of networks such as sensor networks (TOSSIM [5]), P2P (PeerSim [6]) and Grid Computing (GridSim [7], SimGrid [8]). Generally, in computer networks, a simulator is meant to enable researchers/industry to study the: (a) feasibility, (b) performance, (c) resiliency/fault tolerance of the solution. Consequently, over the years, network simulators have become a popular and valid platform to test various network configurations, components and protocols before deploying the proposed solution.

Given that Cloud Computing (CC) has attracted massive investments from the major industrial actors in recent years, developing cloud simulation tools became essential to offer designers a way to test their propositions. Limiting operational cost and efficient resource allocation can significantly increase profitability from a provider perspective. From an energy consumption point of view, reports have shown that Data Centers (DCs) in the USA reached between 1.7 and 2.2% of the total electricity consumption of the country in 2011 [9]. Further studies illustrated that, typically, only 30% of the DC resources are used while the rest are in an idle state [10]. This introduces a significant waste in operational cost, as well as negative implications for the environment. Evidently, with the constant expansion of cloud demand and the providers' infrastructure, this consumption can only inflate. Hence, researchers were motivated to develop provisioning methods meant to decrease the electricity consumption of the DCs.

Also, honouring the Service Level Agreement (SLA) established with the client can spare the provider income losses. However, testing provisioning algorithms and DC architectures/components with real hardware is costly, complex and time consuming. Therefore, using cloud simulators to test such propositions has become a popular alternative [11–13].

Cloud computing is relatively similar to grid computing in several ways. A cloud provider offers its customers resources on demand whereas the consumer pays only for the usage time, which is obviously less expensive than buying the required hardware/software to accomplish the demanded job. However, unlike grid computing, CC is meant to offer more flexibility in terms of resource allocation/isolation and service customization, in addition to a different economic model. Moreover, unlike grid computing, cloud services come in various flavours (Software as a Service, Platform as a Service and Infrastructure as a Service) to accommodate a wide range of client applications/requirements. As a result, it was necessary to develop cloud simulators able to run massive scenarios, supporting important cloud-specific functionalities such as virtualisation and service brokering which are unsupported in grid computing simulators.

Though a few surveys [14–18] have been published exploring cloud simulation tools, to the best of our knowledge, the features provided have not been sufficiently detailed by these studies to give informative options for developers. In this paper, we propose a comprehensive review of cloud simulation tools discussing their available features, model accuracy and architectural flexibility. Compared to other existing surveys, the main contributions of this paper are as follows:

• Almost all the available cloud simulators are reviewed in the paper, including the most recent ones. First, the presented simulators are categorised based on their orientation and what part of the cloud they model.

• Unlike most of the existing review papers, this paper presents the detailed architecture of the discussed simulators rather than an overview. In particular, the paper highlights the important features of the simulators and how they are implemented.

• Furthermore, emphasis is given to design choices meant to improve efficiency, accuracy or the trade-off between the two.

The remaining content of the paper is organised as follows: Section “Related Works” presents related surveys addressing cloud simulation. Section “Cloud Simulation Features” discusses the desired characteristics in a simulator and regroups the proposed simulators into six categories based on their specific purposes and features. In Sections “General Cloud Modelling”, “Middleware Supervision”, “Energy Aware Provisioning”, “VM Provisioning”, “Economical Modelling” and “Application Modelling”, the cloud simulators in their respective categories are presented in detail. Section “Comparison and Discussion” compares the currently available simulators and Section “Conclusion” concludes this paper.

Related Works

As cloud simulation tools are increasingly used by researchers [19–27], several surveys have been published comparing the available simulation frameworks. The first cloud simulation survey was published in 2012 by Zhao et al. [14]. The paper gave a brief overview and a comparison of the available cloud simulators. The discussed simulators are compared mainly by their core platform and programming language. On the other hand, the paper does not give sufficient details about the simulator models and their respective inner workings.

The paper presented by Bahwaireth et al. in [15] provides much more detail about the selected simulators. The authors illustrate the simulator features, GUI as well as simulation setup instructions. Mainly, the simulations conducted in this paper were focused on measuring response time and power consumption for each simulator separately. However, only 8 simulators were surveyed by this paper. In 2017, Byrne et al. [16] published a survey that encompasses most of the existing cloud simulation frameworks. In this work, simulators are compared based on available features such as GUI availability, software portability, built-in supported output formats, size in lines of code and last available update. Nevertheless, the paper gives very few details for each simulator. In [17], a similar study is proposed by Fakhfakh et al. that compares the features available in the most popular simulators. However, the survey is not exhaustive and the simulators are not discussed sufficiently.

The paper published by Makaratzis et al. [18] in 2018 is the most complete survey currently available. However, the focus of the paper is energy consumption modeling only rather than general cloud simulation. The authors discuss the power consumption models for each simulator as well as the available features from this perspective (power consumption models for CPU, HDD, RAM and network switches). Furthermore, the paper compares the power consumption for each simulator with results obtained from real-world experiments. The authors conclude that all the compared simulators display similar tendencies and the final results are approximately the same. However, the authors acknowledge that further simulations should be done for larger networks to study how each energy consumption model scales. The paper discusses only the power consumption of the appropriate simulators, and the remaining features of cloud simulation are understandably out of scope in this context.

The existing cloud simulation surveys are summarized in Table 1.

In this paper, we propose an in-depth review of most of the available cloud simulators. While previous surveys gave only an overview of the cloud simulators, this paper presents the important modules and inner workings of each simulator. Furthermore, we highlight important features and design choices, and how they impact performance, accuracy and other aspects of the simulations.

Cloud Simulation Features

Since a cloud system incorporates several complex components: (a) user application/workload, (b) Virtual Machine (VM)/host performance, (c) power consumption, (d) resource contention and scheduling, (e) network communication, it is infeasible to model all elements of the system with high accuracy. A cloud simulator can be designed to focus on a specific aspect, functionality or architecture of the cloud. Otherwise, such software can be developed to simulate the general behaviour of the cloud, enabling the system designer to: (a) identify and fix possible deployment issues, (b) study the system performance/Quality of Service (QoS) and profitability. Depending on the orientation of the cloud simulator, this type of software can be categorized as shown in Fig. 1.

Furthermore, to estimate how suitable a simulator is for a study, the following factors should be considered:

Flexibility Though most simulators are designed to mimic a specific aspect of the cloud, flexibility is a desired characteristic to enable future extensions. A simulator that is difficult to extend due to its architecture, programming language/paradigm or source code unavailability has limited usability and eventually becomes outdated because only its developers can contribute to its model. On the other hand, a flexible architecture goes a long way towards extending the simulator development cycle, as it is more inviting for community contributions.

Table 1 Summary of the discussed cloud simulation surveys

Authors (number of surveyed simulators): Summary
Zhao et al. (11): Short overview of the available cloud simulators
Bahwaireth et al. (6): A more detailed discussion of the available simulators; illustration of the simulator features through experiments
Byrne et al. (33): Overview of most of the available cloud simulators; comparison of the simulators based on features only
Fakhfakh et al. (22): Comparison of cloud simulators based on available features and models
Makaratzis et al. (6): Comparison of the power consumption models of cloud simulation frameworks

Fig. 1 Cloud simulator categories

Scalability As realistic cloud scenarios are massive, how the simulator scales is an important property. Three main factors are influential from a scalability perspective: (a) the trade-off between accuracy and runtime speed, (b) the programming language, (c) distributed/local execution. Enabling the user to deactivate irrelevant features or adjust the accuracy can improve the scalability of the simulator. However, executing the simulator in a distributed environment is the best way to run large scenarios.

Model exhaustivity For starters, a cloud simulator must support basic features which are: (a) job scheduling and duration estimation, (b) cost estimation, (c) network communication. However, to estimate performance with higher precision and from various perspectives/performance metrics, several additional features are needed (HDD access, power consumption and heat output, application model and communication between tasks). Including as many features as possible facilitates the user's task while enabling the mapping of the interplay between the sub-components of the cloud system.

Model accuracy It is practically impossible to model all cloud functions with exact precision as the performance of a VM is unpredictable due to resource contention, multi-tenancy and unpredictable workload patterns. Nevertheless, an estimation of the DC performance with a reasonable relative error is acceptable. Heuristics and experimental results can be used to provide a sufficiently accurate model of the internal cloud procedures without a heavy influence on the simulation execution time.

General Cloud Modelling

Although some simulators model most of the Cloud intricacies in a simplified and abstract manner, these tools are often easy to extend and highly flexible. Subsequently, only the features that are of interest are extended while those that are inconsequential are ignored. The tools presented in this section fall under this category.

CloudSim

Implemented in Java and initially developed on top of SimJava and GridSim, CloudSim [28] was the first simulator designed to mimic the general cloud behaviour. The designers of this simulator attempted to provide a general and customizable simulation tool to model any possible cloud functionality (service brokerage, task scheduling, VM allocations, etc.). The early version of this simulator was an extension layer that used predefined functionalities from GridSim such as resource reservation, workload traces and network procedures. For instance, in addition to the processing cost offered by GridSim, the initial version of CloudSim introduced: (a) cost per memory, (b) cost per storage, (c) cost per bandwidth, to compute the overall service cost. Subsequently, for performance-related reasons, a new simulation engine was implemented instead of SimJava and CloudSim became independent of GridSim.

The user layer is implemented on top of CloudSim to enable the definition of various cloud scenarios. Primarily, the user defines the cloud system configuration (host capacities and network topology) and cloud requests (number/sizes of applications and VM requirements). Applications that are scheduled in VMs are represented by the Cloudlet class and are mainly defined by: (a) processing length, (b) number of CPUs necessary to run, (c) status, (d) input/output size. A Cloudlet can be associated with UtilizationModel instances that define the evolution of the application requirements for each resource separately. The topology is modelled as a graph loadable from a BRITE format file which contains the latency matrix for the links connecting the cloud components. Consequently, when a message event is fired from one cloud component to another, the simulation engine uses the delay specified in the topology file to deliver the message accordingly.
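To make the Cloudlet/UtilizationModel relationship concrete, the sketch below shows how a task with a fluctuating CPU demand can be declared against the CloudSim 3.x API; the constants and the choice of utilization models are illustrative, and constructor signatures may differ slightly between CloudSim releases.

```java
import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.UtilizationModel;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.UtilizationModelStochastic;

public class CloudletExample {
    public static Cloudlet makeCloudlet(int id) {
        long length = 400_000;   // processing length in Million Instructions
        int pes = 2;             // number of CPU cores required
        long inputSize = 300;    // input file size in bytes
        long outputSize = 300;   // output size in bytes

        // CPU usage fluctuates randomly, RAM and bandwidth are fully used
        UtilizationModel cpuModel = new UtilizationModelStochastic();
        UtilizationModel fullModel = new UtilizationModelFull();

        return new Cloudlet(id, length, pes, inputSize, outputSize,
                            cpuModel, fullModel, fullModel);
    }
}
```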

Through class inheritance, CloudSim offers the possibility to develop customized scheduling/allocation strategies and brokerage policies whilst offering simple default implementations. For instance, VMProvisioner is an abstract class through which it is possible to define the VM allocation policy. This class can be implemented to maximise the system profitability and/or to improve the QoS. In addition to the possibility of implementing scheduling strategies, two default scheduling policies are implemented in CloudSim: (a) time-shared, (b) space-shared. One of these policies or both can be used to manage the VMs running on a host and/or the tasks running on a VM. In the time-shared policy, the VMs/tasks are allocated time slots to run on the cores of the host/VM. Conversely, in a space-shared strategy, the VMs/tasks are queued while only one VM/task is executed until completion. Besides, the energy consumption model of a processing component can be defined by implementing the PowerModel interface.
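As an illustration of the PowerModel extension point, below is a minimal linear model in which an idle host draws a fixed baseline and consumption grows linearly with CPU utilization; the class name and the two wattage parameters are our own, only the one-method PowerModel interface is assumed from CloudSim's power package.

```java
import org.cloudbus.cloudsim.power.models.PowerModel;

// A hypothetical linear power model: idle hosts draw a fixed baseline and
// consumption grows linearly with CPU utilization (0.0 .. 1.0).
public class LinearPowerModel implements PowerModel {
    private final double idlePower;   // watts drawn at 0% utilization
    private final double maxPower;    // watts drawn at 100% utilization

    public LinearPowerModel(double idlePower, double maxPower) {
        this.idlePower = idlePower;
        this.maxPower = maxPower;
    }

    @Override
    public double getPower(double utilization) throws IllegalArgumentException {
        if (utilization < 0 || utilization > 1) {
            throw new IllegalArgumentException("Utilization must be in [0, 1]");
        }
        return idlePower + (maxPower - idlePower) * utilization;
    }
}
```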

Another important class is the DatacenterBroker, which can be extended to implement customised/experimental mediating strategies between providers and customers based on the requirements of the latter. Moreover, the cooperative behaviour of a federated cloud is customizable through the CloudCoordinator class. This instance proactively monitors the current status of the DCs through Sensor objects, which are set up with minimum/maximum thresholds for the observed performance metric. As a result, it is possible to perform load balancing or workload migration depending on the status of the DC.

Simulation results illustrate the performance benefit of the cloud federation. For instance, this feature allows designers to simulate scenarios of continuous delivery during maintenance and dynamic load dissemination.

VM and Host objects are characterized by a list of Pe (Processing elements) and each Pe is defined by a processing capacity in Million Instructions Per Second (MIPS). During runtime, all DataCenter objects invoke a VM update processing method for each VM on each host. As a result, the hosts update the progress of the running tasks in their VMs while keeping track of the task closest to completion. Subsequently, before incrementing the simulation clock, the tasks that are completed are deleted and the resources allocated to them are freed.

In the initial version, the designers of CloudSim opted to use SimJava Entities (threads) for two components: (a) User, (b) DataCenter. However, to increase the size of the runnable scenarios, this design was revised to use only one thread, which is the simulation engine. Objects of classes implementing SimEntity can generate events (with different priorities), process incoming ones and set up event filters (through Predicate) to wait for a specific type of event. To enable adding entities during runtime, upon creation of a SimEntity, the latter is automatically added to the list/hashmap of CloudSim entities and a creation event is added to the event list. Test results show a significant improvement in simulation duration and memory overhead for the new CloudSim architecture by comparison with the old one.
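The event-driven entity mechanism can be summarised with a small sketch: a hypothetical monitoring entity that schedules events to itself and reacts when the single-threaded engine delivers them. The SimEntity life-cycle methods are those of CloudSim 3.x; the tag value and the monitoring logic are invented for the example.

```java
import org.cloudbus.cloudsim.core.SimEntity;
import org.cloudbus.cloudsim.core.SimEvent;

// A hypothetical monitoring entity: it registers itself with the engine on
// creation (done by the SimEntity constructor), schedules a periodic event
// to itself and reacts to it when the single-threaded engine delivers it.
public class MonitorEntity extends SimEntity {
    private static final int TICK = 999_001;   // custom event tag (assumed free)
    private final double period;

    public MonitorEntity(String name, double period) {
        super(name);          // adds the entity to CloudSim's entity list
        this.period = period;
    }

    @Override
    public void startEntity() {
        schedule(getId(), period, TICK);       // first self-addressed event
    }

    @Override
    public void processEvent(SimEvent ev) {
        if (ev.getTag() == TICK) {
            // ... inspect data center state here ...
            schedule(getId(), period, TICK);   // re-arm the periodic tick
        }
    }

    @Override
    public void shutdownEntity() {
        // nothing to clean up in this sketch
    }
}
```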

Several extensions of CloudSim have been proposed. For instance, CMCloudSim [29] enables the user to find an economically viable solution in terms of cloud providers and VM instance type. For each instance type, this simulator attaches a price per hour as well as the instance specifications. In addition to the possibility of defining new instance types, CMCloudSim is preloaded with Amazon, Azure and Google instances. Furthermore, this tool automatically updates the predefined instance prices through the providers' websites.

GroudSim

GroudSim [30] is a simulation toolkit designed to simulate application execution in cloud and grid systems. The key class of this framework is SimEngine, which implements the components of the discrete event simulator (queue of events and clock advance algorithm) and captures the status of all the system entities. Common elements between the cloud and grid are defined while specific components/procedures are derived through inheritance/overriding. An abstract GroudEntity class is provided that enables the simulator to alter/track the status of the user-defined entities. This simulator can configure event handling by probability distributions, which affect the outcome of the simulation. Thus, GroudSim enables the user to run deterministic and non-deterministic simulations.

In case failures need to be introduced in the scenario, the user defines the occurrence probability, the duration of the failure and the mean time between failures. Before initiating the simulation clock, based on the defined simulation scenario, the GroudFailureGenerator adds failure events to the simulation entities' event list. Afterwards, when the failure occurs, a new event for recovery is scheduled in the event list to re-establish the failed entity after a period of time. The GroudJob class, which represents a submitted job, can be executed on a CloudSite or a GridSite. Whereas in a grid scenario the job is queued until the CPU is free, in cloud scenarios the jobs are launched in the deployed VMs. According to the triggered event, which implements JobEventTypes, the GroudJob status changes during its life cycle (from unsubmitted to finished/failed). Subsequently, the cost is computed based on execution time and CPU core utilization in the case of a grid scenario. In the cloud scenario, on the other hand, the cost is based on the VM requirements multiplied by the number of hours, in addition to the data transfer cost.
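As a concrete reading of that billing rule, the snippet below computes a per-job cost from started VM hours plus transferred data; it is illustrative only and does not reflect GroudSim's actual classes or price constants.

```java
// Illustrative only: a billing rule in the spirit of GroudSim's cloud cost
// model (VM hours plus data transfer), not GroudSim's actual API.
public class CloudCost {
    public static double jobCost(double executionSeconds,
                                 double pricePerVmHour,
                                 double transferredGb,
                                 double pricePerGb) {
        // Cloud providers typically bill started hours, hence the ceiling.
        long billedHours = (long) Math.ceil(executionSeconds / 3600.0);
        return billedHours * pricePerVmHour + transferredGb * pricePerGb;
    }

    public static void main(String[] args) {
        // e.g. a 90-minute job on a $0.10/h instance moving 5 GB at $0.02/GB
        System.out.println(jobCost(5400, 0.10, 5.0, 0.02));  // prints 0.3
    }
}
```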

GroudSim offers three possible ways to run the simulation: (a) run the scenario until the defined simulation time, (b) run the scenario until there are no queued events, (c) set a random stoppage time. Also, the tracing module can register, in chronological order, the status of all entities or only the status changes caused by simulation events. The tracer architecture of GroudSim, which is based on event Handlers and Filters, enables the user to declare an interest in specific events and disregard the rest. This feature improves the simulation time significantly as unnecessary write operations are avoided. The main objective of the designers of GroudSim was to improve the performance of their simulator, by comparison with other simulation tools, from a scenario size perspective. Therefore, GroudSim uses only one thread, which reduces the memory requirement and renders it able to run larger scenarios in less time. Simulation of two scientific applications (WIEN2k [31] and MeteoAG [32]) illustrated a significant gain in simulation time by comparison with GridSim.

CloudAnalyst

CloudAnalyst [33] is an extension of CloudSim that offers the possibility to model the spatial/temporal distribution of the cloud application users/DCs. This feature is very important for studying applications hosted in geographically distributed DCs, such as social networks. Furthermore, this simulator models the Internet characteristics of the geographical regions as well. To ease the creation of cloud scenarios, CloudAnalyst contains a Graphical User Interface (GUI) to set up all simulation parameters.

Additionally, the performance of the solution (min, max and mean response time of all user requests, response time based on user groups) and various statistical results (usage patterns of the application) can be directly visualised through the graphical output of CloudAnalyst.

The spread of user requests, geographically and temporally, is modelled through the UserBase class. This class mainly contains: (a) a region attribute representing the geographical location of the user group, (b) peak hours and average peak usage, (c) average off-peak usage. This enables the configuration of variable usage patterns. Furthermore, the CloudAppServiceBroker is an interface meant to be implemented to define a dispatching strategy for incoming requests towards the DCs. Two predefined implementations are proposed: (a) minimal proximity/response time, (b) a load sharing approach.

The VMLoadBalancer is an abstract class that enables the user to implement a personalized load balancing strategy to dispatch the received requests to the running VMs. The default implementation of this class attributes the received requests to the running VMs in a round robin fashion. Also, a secondary load balancing policy is implemented that attributes requests only to the VMs processing a lower number of tasks than the defined threshold. Through a case study of a social network application scenario, the developers of CloudAnalyst illustrate the importance of VM provisioning based on usage patterns. By dynamically adjusting the number of VMs allocated to the application in peak/off-peak hours, the performance of the deployed system can be significantly increased and the cost can be reduced as well.
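A minimal sketch of those two dispatching ideas is given below; the class and method names are ours and do not reproduce CloudAnalyst's actual VMLoadBalancer implementation.

```java
import java.util.List;
import java.util.Map;

// A sketch of the two dispatching ideas described above (round robin and a
// threshold-based variant); names are illustrative, not CloudAnalyst's API.
public class SimpleVmLoadBalancer {
    private int next = 0;

    /** Plain round robin: requests cycle over all running VM ids. */
    public int nextVmRoundRobin(List<Integer> vmIds) {
        int vm = vmIds.get(next % vmIds.size());
        next++;
        return vm;
    }

    /** Throttled variant: only VMs below an active-request threshold are eligible. */
    public int nextVmThrottled(Map<Integer, Integer> activeRequestsPerVm, int threshold) {
        return activeRequestsPerVm.entrySet().stream()
                .filter(e -> e.getValue() < threshold)
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse(-1);   // -1 signals that every VM is saturated
    }
}
```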

iCanCloud

Written in C++ and built as an extension of OMNeT++, iCanCloud [34] is a cloud simulator mainly useful for computing Cost/Performance (C/P). Simulation scenarios can be generated through configuration scripts or by using the iCanCloud GUI. Unlike most cloud simulators, iCanCloud was designed to run the cloud scenario in a cluster environment if possible (though this feature is incomplete), thus providing the user with the possibility to run massive scenarios efficiently. Additionally, iCanCloud is not only extensible to enable adding new specifications, it also enables the user to define the degree of detail of the simulation by changing the level of detail of the hardware components.

The two bottom layers of iCanCloud are the Hardware layer and the Basic System API layer. The Hardware layer elements (CPU, memory, network, etc.) are used to construct the cloud environment, while the Basic System API layer provides a set of functions for the modelled application to access the corresponding hardware resources. For instance, to reserve the memory required by an application, the iCanCloud_allocMemory() function is called with the amount of necessary memory. Additionally, to make communication between distributed applications possible, iCanCloud implements a method to enable interfacing between applications. In the layer above, predefined VMs and custom-made ones, based on resource specifications, are located. Similarly, above the VM layer, the application layer contains predefined and personalized application models. Managing submitted jobs and attributing them to the appropriate VM is ensured by the global Hypervisor, which is the main class of this simulator. This module also includes cost policies depending on the VM specifications. Finally, the topmost layer is the cloud system defined by the user.

First, the user must implement the basic Hypervisor class methods. The designers of this simulator put a lot of work and emphasis on the Hypervisor module to permit the definition of any sort of policy. For instance, unlike its cloud simulator counterparts, it is fairly easy to add more job queues to the scheduling algorithm at the VM level. Additionally, the Hypervisor builds a VM map containing mainly: (a) the configurations/states of the running VMs, (b) the correspondence between the running jobs and VMs.

The simulation scenario is composed of the cloud system configuration and the applications/users specifications. Both specifications are loaded dynamically by the simulator instead of requiring recompilation. In the cloud system specification, the Hypervisor instance is selected and the VM characteristics are declared in terms of attributed resources/operation latencies. Subsequently, in the applications/users configuration, after defining the requirements of the application, the user can either define the types and number of VMs that must run the submitted job or delegate the matchmaking to the Hypervisor module.

With the possibility of creating multi-core processors, the CPU scheduler can execute blocks of instructions for multiple running applications simultaneously. Furthermore, in addition to traditional task queue management policies (FIFO, priority based, etc.), the user can define new scheduling policies. At the memory level, iCanCloud models the RAM characteristics and its related management procedures (allocating/freeing memory pages, memory access and disk cache management). At the hard disk level, read and write operations are performed based on the specified operation latencies and the size of the data. It is noteworthy that iCanCloud can simulate distributed file management such as RAID. Finally, the INET framework is used to simulate network routines.

Two VM provisioning algorithms are implemented in iCanCloud, namely: (a) the Min-min algorithm, (b) the Max–min algorithm. The Min-min algorithm sorts the execution times of the queued tasks for each possible VM instance. Then, the algorithm searches for a subset of tasks that can be executed with the minimum time for the selected VM type. At the end, the type of VM producing the lowest cost/execution time (performance) is selected to be instantiated.

If more than one VM type offers the same cost/performance, the solution providing the lowest cost or the highest performance is chosen. As for the scheduling algorithm, after the end of each task, a new task with the minimal execution time is selected. On the other hand, the Max–min algorithm is derived from the Min-min algorithm in such a way that the VMs are deployed to run the lengthy tasks first.
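The core Min-min selection rule can be sketched independently of iCanCloud's code: at each round, the pending task with the smallest best-case completion time is bound to the VM that achieves it (Max–min simply flips the outer selection to the task with the largest such time). The bookkeeping below is a deliberately simplified, hypothetical version of the heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// A simplified illustration of the Min-min idea: repeatedly pick the pending
// task whose best (shortest) completion time over all candidate VMs is the
// smallest, and bind it to that VM. Task/VM bookkeeping is hypothetical and
// far simpler than iCanCloud's actual implementation.
public class MinMinSketch {
    static double completionTime(double taskLengthMi, double vmMips, double vmReadyTime) {
        return vmReadyTime + taskLengthMi / vmMips;
    }

    /** Returns the order in which tasks get scheduled, mutating vmReadyTime. */
    static List<Integer> schedule(double[] taskLengths, double[] vmMips, double[] vmReadyTime) {
        List<Integer> order = new ArrayList<>();
        boolean[] done = new boolean[taskLengths.length];
        for (int round = 0; round < taskLengths.length; round++) {
            int bestTask = -1, bestVm = -1;
            double best = Double.MAX_VALUE;
            for (int t = 0; t < taskLengths.length; t++) {
                if (done[t]) continue;
                for (int v = 0; v < vmMips.length; v++) {
                    double ct = completionTime(taskLengths[t], vmMips[v], vmReadyTime[v]);
                    if (ct < best) { best = ct; bestTask = t; bestVm = v; }
                }
            }
            done[bestTask] = true;
            vmReadyTime[bestVm] = best;   // the chosen VM is busy until then
            order.add(bestTask);
        }
        return order;
    }
}
```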

In a validation series of experiments, the developers of iCanCloud compare the results of a pre-made mathematical model [35] of an astronomical application [36] with the results produced by iCanCloud, in terms of C/P. During the experiment, the number and types of VMs as well as the repartition of the intervals over sub-tasks are varied. Simulation results produced using small and medium Amazon EC2 instances follow the mathematical model results closely. For large and extra large VM instances, C/P spikes are noticed in the simulation results while both simulation and mathematical model results follow a similar pattern. The observed spikes mean that the increase in the number of VMs (which evidently increases the cost) did not improve the performance. Additionally, a comparison is made with the C/P of the same application executed in a public cloud (Amazon EC2) with small instances. Similarly, the simulation results are close enough to the C/P value of the cloud environment. To compare the efficiency of iCanCloud with CloudSim, a series of simulations was conducted where the numbers of jobs and VMs are varied. Though iCanCloud consumes more RAM, it scales better than CloudSim in large scenarios.

Cloud2Sim

To address the scalability issues of cloud simulation, Cloud2Sim [37] is a reworked version of CloudSim using an in-memory data grid to distribute the simulation objects over a cluster. This simulation toolkit is available in two implementations based on two in-memory data grid libraries: (a) Hazelcast, (b) Infinispan. First, the master node running Cloud2Sim distributes the simulation objects over the hosts joining the cluster. This repartition is done in a way that related objects are executed on the same host to avoid frequent remote calls. During runtime, Hazelcast offers the option to dynamically adjust the size of the cluster running the simulation based on the Health Monitor readings. If the user activates dynamic scaling, a separate Cluster-Sub is deployed to resize the simulation cluster based on maximum/minimum resource utilization thresholds. Moreover, a sufficient buffering time should be defined to avoid conflicting resizing decisions in short periods, and the difference between the minimum and maximum thresholds should be high enough for the same reason.

Core CloudSim classes (VM, Cloudlet, DataCenterBroker) are extended and stored in an IMap structure to be distributed for execution in the cluster. HzObjectCollection encapsulates all correspondences between the simulation objects (e.g. VMs to Host, Cloudlet to VM, etc.). StreamSerializer is extended for each distributable object to define how encoding/decoding of the object is done for transmission over the network. Furthermore, the Coordinator class implements a thread that periodically monitors the health of the instances in each cluster and launches a scaling action if required.

In a simulation series, the developers of Cloud2Sim explain that the more complex the simulation is, the more beneficial the distributed execution becomes. However, for a simple cloud scenario runnable on a single machine, running the same simulation in a cluster results in an increase in simulation time, because the overhead of distributing the simulation objects over the cluster increases the simulation time pointlessly while the simulation can run rapidly on one host. Furthermore, the designers of this simulator define a Common Case parameter, which is the number of active workers for which the performance of the simulation stabilises (i.e. involving more workers does not lower the simulation execution time).

SimIC

Developed in Java, SimIC [38] is a cloud simulator that is able to model collaborative behaviour between meta-brokers. As in the initial version of CloudSim, SimJava is used to manage simulation events, and the architecture is similar to that of CloudSim. A meta-broker entity, which communicates with other meta-brokers to distribute the users' requests, is placed on top of the cloud and is embodied by the MetaBroker class. The simulator user can define a customized topology of meta-brokers or use the default one where the meta-brokers are connected sequentially.

A user request is defined by hardware (CPU clock and number of cores, etc.), software (job length and cycles per instruction), event delays as constants or as probability distributions (e.g. VM deployment and initialisation delay, communication delay) and SLA requirements (deadline and priority). Upon request arrival, from a user or forwarded from another meta-broker, an SLA matchmaking is performed to verify whether the request can be served, which is done based on the locally available resources. At the Hyper class (hypervisor), several policies are implemented for job scheduling: (a) First Come First Served (FCFS), (b) Shortest Job First (SJF), (c) Earliest Deadline First (EDF), (d) Priority Scheduling (PS). Conversely, if a request cannot be served locally, the Bucket class is invoked to log this event. This class can be further extended to handle the unfinished job. For instance, jobs failed due to resource contention can be outsourced to the inter-cloud if the sub-cloud load is beyond a predefined threshold. Furthermore, it is possible to define VM rescheduling and migration procedures.

The VMRescheduler enables quick re-deployment of a VM for a previously requested instance. On the other hand, the VMMigrationScheduler class is used to determine a VM transfer strategy to another cloud's peripheral storage. The migration process is triggered by MigrationSensor objects, which can be set to observe an event or simply be activated after a defined period of time.

The main contribution of this simulator is the MetaBroker class, with which it is possible to implement request spreading and dynamic resource discovery in the inter-cloud. Whilst the simulation scenario is running, the Accounting class is in charge of the energy and cost model. In addition to the built-in event log feature of SimJava, SimIC incorporates a CreateResults class that saves performance metrics (makespan, energy consumption and host utilization). Subsequently, results are visualized with the JFreeChart library [39]. The major drawback of SimIC is that it has no network model.

DesktopCloudSim

To model Desktop Clouds, also known as Community Clouds, DesktopCloudSim [40] was developed as an extension of CloudSim. As failures/resource availability in an undedicated cloud are unpredictable, DesktopCloudSim simulates node failure/reboot based on an input trace file provided by the user. Subsequently, time-stamped failures are executed by the Failure Injector module. Due to the particularity of the Community Cloud, submitted jobs and running VMs of the failing machines cannot be salvaged. Consequently, the VM provisioning module can only restart the failed VMs/jobs. However, the designers of this tool argue that the VM allocation policy plays a role in reducing the effect of physical machine (PM) failures. This hypothesis is demonstrated by implementing three VM allocation methods: (a) FCFS, (b) round robin, (c) a greedy mechanism. Notably, simulation results show that the greedy mechanism produces the highest mean percentage of successful tasks. Simply put, since the greedy approach attempts to launch as many VMs as possible on a single host, host failures/disconnections had a lower probability of affecting the running VMs.

iFogSim

iFogSim [41] is a CloudSim expansion designed to simulate Fog computing. The concept of Fog computing is to extend the resources of the cloud to the intermediate devices relaying the user to the DC, hence enabling such devices to reduce the load on the cloud whilst lowering the response time. This is especially valuable for smart IoT applications, which is the context this simulator is designed for.

The designers of iFogSim focused on the most popular IoT application type, Sense-Process-Actuate. The Fog topology, which is editable through a GUI in iFogSim, is composed of: (a) Sensor instances that can transmit the sensed data reactively or proactively to their nearest gateway, (b) Actuator objects that can act on the physical environment depending on the sensed values, (c) FogDevices, which are any network component able to process the sensed data. An application in iFogSim is modelled as a Distributed Data Flow (DDF) composed of multiple application modules represented by the AppModule class, whereas the dependency between modules is represented by an AppEdge class. Furthermore, the communication that occurs between two application modules/FogDevices is made through Tuple objects. A Tuple, which is a subclass of Cloudlet, denotes the length of a processing task to be executed and the resulting output size that needs to be transmitted to the destination application module.

Depending on the remaining resources of the FogDevices as well as energy and placement in the topology, the ModulePlacement must be implemented to define how application modules are attributed to FogDevices. iFogSim presents two default placement policies: (a) Cloud-only placement, where application modules are hosted only by the cloud, (b) Edge-ward placement, where the FogDevices close to the border of the network are favoured as long as their remaining resources are enough to run the module in question. Additionally, it is possible to define new placement policies depending on one or multiple criteria (cost, energy consumption and latency).
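The edge-ward intuition can be captured in a few lines: walk the path from an edge device up towards the cloud and stop at the first device whose leftover capacity fits the module. The sketch below uses hypothetical minimal records rather than iFogSim's ModulePlacement API.

```java
import java.util.List;

// Sketch of the edge-ward intuition: favour the lowest device on the path
// from the edge to the cloud that can still host the module. Names are
// illustrative, not iFogSim's classes.
public class EdgewardPlacementSketch {
    record Device(String name, double freeMips, double freeRam) {}
    record Module(String name, double mips, double ram) {}

    /** pathToCloud is ordered from the network edge up to the data center. */
    static String place(Module m, List<Device> pathToCloud) {
        for (Device d : pathToCloud) {
            if (d.freeMips() >= m.mips() && d.freeRam() >= m.ram()) {
                return d.name();   // first fitting device, i.e. the most edge-ward one
            }
        }
        // No fog device fits: fall back to the cloud (last element by convention)
        return pathToCloud.get(pathToCloud.size() - 1).name();
    }
}
```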

CloudSimScale

In [42], CloudSimScale extends CloudSim to run DC objects in a distributed environment. The authors of this simulator use the IEEE High Level Architecture (HLA) and Run-Time Infrastructure (RTI) [43] to enable the original CloudSim components to communicate during a distributed execution. The main extended classes are DatacenterBroker, Datacenter and CIS, so that they can interact with objects running on a remote host through a Local RTI Component (LRC). During runtime, objects are created and registered in a FED file to enable remote objects to subscribe to events of interest. Likewise, when an object publishes a new event, all the corresponding subscribers are notified through HLA-RTI. The developers of this simulator further illustrate how the simulator scales by simulating 30,000 Cloudlets and up to 200 VMs in a DC composed of 1000 hosts.

DFaaSCloud

To evaluate placement strategies of functions in Functions-as-a-Service (FaaS) clouds, DFaaSCloud [44] was developed as an extension of CloudSim. Similarly to iFogSim, functions can be executed by an intermediate node (edge or network node) or in a DC.

Each type of the aforementioned nodes is represented by the NodeGroup class, which extends the Datacenter class to define the computational resources of a collection of nodes capable of running the requested function. To generate function requests, an EventSource class is added to CloudSim to request the execution of a function with a defined profile. A single function is described by: (a) deadline, (b) violation cost, (c) event inter-arrival time, (d) CPU/RAM requirements, (e) processing size, (f) input/output size and respective proportion distribution over the groups of nodes.

The execution of a function is therefore modelled to receive data from different types of nodes (edge, network and DC nodes), execute the specified number of instructions and return the output, which is represented by a DFaaSFunctionRspMsg sent to an EventSink object. The latter compares the response arrival time with the function deadline to update the total violation cost. Furthermore, DFaaSFunctionInstance is derived from the original Vm class to represent a single function that can run on a NodeGroup.

To enable developers to implement and evaluate their function scheduling strategy, DFaaSCloud adds: (a) the DFaaSFunctionScheduler class, which gives a convenient way to implement a placement strategy for functions based on their profiles, (b) the InfraManager class, which captures the performance of the placement policy in terms of deadline violations and execution cost, (c) the AnalyticsEngine, which summarises the collected performance metrics related to data transfer durations and processing duration. This simulator also extends the topology editor from iFogSim and can import/export topologies from the JSON file format. On the other hand, the function model cannot be used to model dependencies between functions.
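Below is a hypothetical container for that per-function profile together with the deadline-violation bookkeeping an EventSink-like component would perform; the record fields mirror the list above, but the names and types are our own, not DFaaSCloud's.

```java
// A hypothetical container for the per-function profile listed above, plus the
// bookkeeping an EventSink-like component would do; not DFaaSCloud's real classes.
public class FunctionProfileSketch {
    record FunctionProfile(double deadlineMs, double violationCost,
                           double meanInterArrivalMs, int cpuCores, int ramMb,
                           long lengthMi, long inputBytes, long outputBytes) {}

    /** Adds the penalty only when the observed response time misses the deadline. */
    static double updateViolationCost(FunctionProfile f, double responseMs, double totalCost) {
        return responseMs > f.deadlineMs() ? totalCost + f.violationCost() : totalCost;
    }
}
```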

Middleware Supervision

The middleware layer is obviously important as resource management decisions are made at this level. The following simulators were developed to test network and host management policies implemented at the middleware layer.

SPECI

SPECI [45] is a simple simulator built to investigate middleware supervision protocols of DCs. This simulator was written in Java and based on the SimKit [46] discrete event engine. The developers of this toolkit mapped the host status coherence problem to the DC administration problem. At the simulator level, this mechanism is modelled by status update and failure events. Proactively, update events are executed for each node in the DC to update its subscriber set (nodes that are interested in its status), with the update period randomly chosen between 0.8 and 1.2 s. In transitive protocols, where the status of a common subscription is forwarded, a time-to-live period is configurable to maintain the coherence of the information.

Failure occurrences are introduced at random to schedule failure events. Subsequently, when the handler comes across a failure event, a DC component is selected randomly to switch its status from alive to failed, or the other way around. Finally, a probing routine is executed to cross-reference the current status of the nodes with the status of the subscriptions. During runtime, each encountered inconsistency is captured and the probing event is rescheduled to be executed after 1 s. Through a case study, the designers of SPECI illustrate the shortcomings of central management protocols by comparison with the distributed approach.
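To make the jittered update mechanism tangible, here is a toy discrete-event loop in which every node re-schedules its own status-update event with a period drawn uniformly from [0.8, 1.2] s; this is purely illustrative and not SPECI's SimKit-based implementation.

```java
import java.util.PriorityQueue;
import java.util.Random;

// A toy discrete-event loop for the update mechanism described above: each
// node re-schedules its own status-update event with a period drawn uniformly
// from [0.8, 1.2] s. Purely illustrative, not SPECI's SimKit-based code.
public class UpdateEventSketch {
    record Event(double time, int nodeId) {}

    public static void main(String[] args) {
        Random rnd = new Random(42);
        PriorityQueue<Event> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
        int nodes = 3;
        for (int n = 0; n < nodes; n++) queue.add(new Event(0.0, n));

        double horizon = 5.0;                           // simulate 5 s
        while (!queue.isEmpty() && queue.peek().time() < horizon) {
            Event ev = queue.poll();
            // ... push this node's status to its subscribers here ...
            double period = 0.8 + 0.4 * rnd.nextDouble();   // U(0.8, 1.2) s
            queue.add(new Event(ev.time() + period, ev.nodeId()));
        }
    }
}
```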

CRest

Implemented in Java, the Cloud Research Simulation Toolkit [47] (CRest) is an open source cloud simulator providing a GUI to define the DC properties and display performance results. The simulation scenario is processed from a file containing a set of events (e.g. task submission and hardware failure/repair). The two main components of the DC model are Server and AirCon (air conditioner) objects. Both components implement the Failable interface, which means that they can introduce a failure event. Though the DC model offered by CRest is abstract (e.g. service deployment is done by reservation only and the server temperature is adjusted statically), it is much more detailed than the one proposed by SPECI. The user request is represented by the Service class and is essentially described by its duration, number of sequential tasks and minimum resource requirements. Furthermore, dependencies between the deployed services are stored in an array of service IDs.

CRest is composed of different modules (thermal, energy, failures, services and subscriptions) that handle the simulation events. Distinctively, this simulator is organised in a Model-View-Controller (MVC) architecture to enable its extension without modifying the existing modules. Modules observe the event queue and, depending on the nature of the event, new events can be generated. For instance, the deployment and execution of a new service (original event) will result in an increase in the load/temperature/energy consumption (resulting events) of the server. This architectural option makes it possible to create a new module communicating with existing ones without modifying them, as the communication between modules is made through events. Moreover, depending on the user's interests, modules can be turned off seamlessly.

The authors of this simulator ran a series of tests to illustrate the effect of the network topology and communication protocol (P2P, Transitive P2P and Centralised) on the inconsistencies of the node statuses (alive/dead).

To do so, the middleware subscription module, used to handle server fail/fixed events, acts as the Fabric Controller in the Microsoft Azure Platform. The obtained results demonstrate that the topology clustering coefficient affects the time frame necessary to correct the DC topology inconsistencies.

CloudSimSDN

CloudSimSDN [48] is an expansion of CloudSim developed to test resource supervision in Software Defined Networking (SDN) DCs. As efficient management of the DC components and data flows amongst VMs becomes increasingly complex, SDN offers the possibility to handle all procedures in a central controller. Hence, data flow management policies can be easily defined in a dedicated and computationally powerful central instance. This is obviously a much more flexible approach to configure and handle data flows, as the network components, such as switches, are directly manageable.

The network topology can be defined in JavaScript Object Notation (JSON) format, through the CloudSimSDN GUI or directly in the source code. The user requests, on the other hand, are introduced in a CSV file containing a list of submission times, computational lengths and the corresponding network traffic. The Request class, which represents a submitted job, is composed of a sequential list of Processing and Transmission objects. Processing objects represent computational tasks whereas Transmission instances represent transmissions between VMs. Based on the chronological order, the next Transmission/Processing is launched from the latest invoked VM, which is used to model a workflow.

PhysicalTopology, which represents the network topology, is constituted by a set of Link and Node (Switch and SDNHost) objects. To model the SDN paradigm, the central controller is implemented in the NetworkOperating class. The latter creates a virtual overlay on top of the physical topology through Channel objects. A Channel object is a route between two VMs, passing through physical components (Link and Switch objects), with a given priority that defines the bandwidth allocated to it. Link objects maintain a list of the Channels passing through them to manage the available bandwidth accordingly.

The bandwidth of a link is divided between the channels transiting through it depending on the priority of each one. Alternatively, channels without a priority setup share the remainder of the link capacity. Accordingly, as long as the bandwidth allocated to a channel does not change, the volume of data transferred through it is equal to the elapsed period of time multiplied by the allocated bandwidth. Of course, the transmission time is affected by dynamically added/deleted channels sharing the same link. Furthermore, the bandwidth of the entire channel is the minimum bandwidth allocated over its aggregate links.
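That channel arithmetic is easy to state numerically: the end-to-end rate is the minimum share granted across the traversed links, and the volume moved over an interval is simply rate multiplied by time while allocations stay constant. The snippet below illustrates this with plain numbers rather than CloudSimSDN's classes.

```java
import java.util.List;

// Numeric illustration of the channel model described above: a channel's rate
// is the minimum of the shares it was granted on each traversed link, and the
// volume moved over an interval is rate * time while allocations stay constant.
public class ChannelSketch {
    /** perLinkShareMbps: the bandwidth this channel was allocated on each traversed link. */
    static double channelRateMbps(List<Double> perLinkShareMbps) {
        return perLinkShareMbps.stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
    }

    static double megabitsTransferred(List<Double> perLinkShareMbps, double seconds) {
        return channelRateMbps(perLinkShareMbps) * seconds;
    }

    public static void main(String[] args) {
        // A channel crossing three links that granted it 400, 100 and 250 Mbps
        // is limited to 100 Mbps; in 2 s it moves 200 Mbit (i.e. 25 MB).
        System.out.println(megabitsTransferred(List.of(400.0, 100.0, 250.0), 2.0));
    }
}
```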

As a validation, four simple scenarios were executed on both Mininet [49] and CloudSimSDN. Mininet was chosen for this task because it is an accurate network emulator supporting SDN. A comparison between the delays for various traffic patterns reveals only a small margin of error for CloudSimSDN (less than 4.6% of the total delay). Additionally, to test their simulation tool, the developers implemented a simple priority-based traffic management approach to showcase the utility of SDN-driven DCs.

Energy Aware Provisioning

Energy-aware provisioning is of paramount importance to the provider as it can yield significant savings. Energy savings can be achieved at the network equipment level as well as at the host level. Predominantly, CPU load is the usual criterion to estimate energy consumption. More sophisticated simulators include air conditioning, as this component is a major contributor to the power consumption bill. The following simulators are focused on energy consumption modelling.

MDCSim

The Multi-tier DC Simulation Platform [50] (MDCSim) is a commercial cloud simulator built upon CSIM [51], a discrete event simulator made to analyse interactions between the components of a complex system. The designers of MDCSim focused on analysing multi-tier DC architectures and how the distribution of servers affects both performance and energy consumption. This simulator is organised in three layers: (a) communication layer, (b) kernel layer, (c) user layer. In the communication layer, predefined network components and protocols are implemented, in addition to the possibility of adding customised elements/protocols. The kernel layer is responsible for managing the running queue of tasks based on task priorities. Furthermore, tasks having the same priority are managed in a round robin manner. During task scheduling, I/O tasks are prioritised over CPU-intensive tasks. Additionally, the user layer includes the main processes (Web Service, Application Service and Database Service) and secondary processes (Disk Helper and Sender/Receiver Queues).

Server instances are represented by nodes containing a CPU (kernel layer), a Network Interface (communication layer) and the running services. The data packets to be sent/received are queued in the corresponding queue to simulate the communication between server nodes. Based on the results obtained from their DC prototype, the designers of MDCSim noticed that 97% of database queries are served through the local server cache (mirroring).

Page 11: A Survey on Cloud Computing Simulation and Modeling

SN Computer Science (2020) 1:249 Page 11 of 34 249

SN Computer Science

mirroring strategy and locking mechanism were modelled in the simulator.

CPU service time is estimated as the ratio between the utilization of the CPU and the application throughput, whereas the utilization of a server is expressed as the number of dynamic/static requests multiplied by the average execution time and divided by the monitoring period. Subsequently, the power consumption and application throughput are obtained by a linear regression based on server utilization, in accordance with the model proposed by Wang et al. [52]. This simulator was validated by implementing a three-tier DC subjected to the RUBiS [53] workload. Simulation results showed that the estimated application throughput, request latency and power consumption followed the empirical values closely.
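As a worked reading of these two formulas, the sketch below computes server utilization and CPU service time from request counts; method and variable names are illustrative, not MDCSim's API.

```java
// Illustrative reading of the MDCSim estimation formulas described above:
//   utilization = (requests * avgExecTime) / monitoringPeriod
//   serviceTime = cpuUtilization / throughput
final class MdcSimFormulas {
    // Server utilization over a monitoring window (all times in seconds).
    static double serverUtilization(long requests, double avgExecTimeSec, double monitoringPeriodSec) {
        return (requests * avgExecTimeSec) / monitoringPeriodSec;
    }

    // CPU service time as the ratio of CPU utilization to application throughput
    // (throughput expressed in requests per second).
    static double cpuServiceTime(double cpuUtilization, double throughputReqPerSec) {
        return cpuUtilization / throughputReqPerSec;
    }

    public static void main(String[] args) {
        double util = serverUtilization(12_000, 0.005, 60.0);   // 12k requests of 5 ms over 1 min
        double service = cpuServiceTime(util, 12_000 / 60.0);   // ~200 requests per second
        System.out.printf("utilization=%.2f, service time=%.4f s%n", util, service);
    }
}
```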

GreenCloud

Implemented as an extension of NS2, GreenCloud [54] is a packet-level simulator focused on monitoring the energy consumption caused by intra-cloud communication and the infrastructure components. This simulator is implemented in C++ while cloud scenarios are written in OTcl. It's also possible to visualise the simulation execution through the Nam module. Based on results showing that power consumption increases linearly with CPU load while an idle server consumes around 66% of its full-load power [55], a technique referred to as Dynamic Network Shutdown (DNS) is implemented in this simulator to turn off idle servers. Besides, GreenCloud implements a power saving technique called Dynamic Voltage/Frequency Scaling (DVFS) [56] that adjusts the CPU frequency and voltage depending on the load. The remaining components, such as memory and storage devices, which do not depend on the CPU frequency, consume a constant amount of power. Similarly, as the power required for transmitting data through a link depends on the distance and data rate, the power consumed by the switches is lowered by minimizing the data transmission rate [56]. Additionally, through DNS, the data flows can be consolidated to limit the number of involved routers (if the load is not high enough to create network congestion) and the idle routers are switched off to promote energy savings.
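To illustrate why DVFS saves power, the sketch below combines a fixed idle floor (the roughly 66% of peak reported above) with a dynamic CPU part that follows the textbook relation P_dyn ≈ C·V²·f. It is a generic, hedged illustration of the mechanism, not GreenCloud's exact server energy model.

```java
// Generic DVFS illustration (not GreenCloud's exact model): total power is a
// fixed part (memory, disks, ~66% of peak as reported above) plus a dynamic
// CPU part that scales roughly with C * V^2 * f.
final class DvfsPowerSketch {
    static double serverPower(double peakPowerW, double freqFraction, double voltageFraction) {
        double fixed = 0.66 * peakPowerW;                       // idle/fixed share (assumption from [55])
        double dynAtPeak = peakPowerW - fixed;                  // dynamic share at full frequency/voltage
        double dyn = dynAtPeak * voltageFraction * voltageFraction * freqFraction;
        return fixed + dyn;
    }

    public static void main(String[] args) {
        // Scaling frequency and voltage to 70% keeps only ~34% of the dynamic power.
        System.out.printf("full: %.1f W, scaled: %.1f W%n",
                serverPower(300, 1.0, 1.0), serverPower(300, 0.7, 0.7));
    }
}
```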

The practicality of the aforementioned power saving techniques depends on the nature of the workload. Depending on the computational and communicational requirements of the submitted jobs, the designers of GreenCloud categorize a job as a: (a) Computationally Intensive Workload (CIW), (b) Data Intensive Workload (DIW), (c) Balanced Workload (BW). For the computational requirement of a job, a deadline parameter is introduced that represents the QoS. As for the communicational requirement, its model is a compound of both internal and external communication: (a) the volume of data transferred to servers to launch the execution of the job, (b) the amount of data that needs to be communicated internally in the DC, (c) the size of data that needs to be transferred to the end user once the execution ends.

In a case study, the developers of GreenCloud compare the energy consumption of three DC topologies: (a) 2T, (b) 3T, (c) 3Ths. For each topology, the same number of servers composing the DC (1536) is deployed; the workload is generated for 30% of infrastructure occupancy while no energy aware scheduling methods are applied. In a 60 min simulation, the 2T architecture consumes 25 kW less than the 3T whereas the 3Ths consumes 5 kW more than the 3T. The low 2T consumption is the outcome of bypassing the Aggregation layer switches. However, unlike the 2T topology, the 3T can scale up to 10,000 servers. For the 3Ths, power consumption increases by comparison with the 3T due to the link capacity between the Core and Aggregation layers (100 GE for the 3Ths vs 10 GE for the 3T). The efficiency of DVFS and DNS is studied as well, separately and combined. In a workload balanced between DIW and CIW, results show that DNS is the most effective as all unused resources are turned off. Combining DVFS and DNS reduced the power bill in this experiment by 64.4%. It's also worth mentioning that no workload consolidation method was implemented; otherwise, the DNS performance could be improved.

The advantage of using NS2 as a platform for network simulation is its precision in simulating network protocols/components, link characteristics and network events (congestion, transmission errors, interference, etc.), which makes the simulation results very close to a real world experiment. However, the simulation time increases significantly with the cloud size and the scalability of the scenarios is lower than that of flow based network simulators. Moreover, GreenCloud is able to simulate only single core hosts without a VM model.

DCworms

Data Center Workload and Resource Management Simulator [57] (DCworms) is a cloud simulator, developed in Java, focused on power consumption modelling. This simulator is an extension of GSSIM [58], which is a distributed computation system evaluation framework. For application specification, DCworms uses the Standard Workload Format [47] (SWF) while extending this definition in a supplementary XML file containing the application profile (e.g. SLA related properties). The application is represented with the DNA approach, which can be used to model simple applications as well as dependencies between tasks in a workflow. The unit of application specification is called a phase, which is represented by a constant set of requirements (CPU, RAM, network communication) over a period of time.

The physical layout of the DC is defined by the components' locations, computation/storage capacities as well as peripheral devices such as air conditioning. Moreover, a power consumption profile for each component can be defined and plugged in independently to follow the desired behaviour during the simulation. The power profile of a component is defined as a set of power states, with each state specified by a list of properties and the power consumed during the corresponding state. Unlike other application models, the power consumed for running the application at the lowest supported CPU frequency (PCPUbase) can be specified in the application profile.

For power consumption estimation, three methods are possible: (a) static, (b) resource load, (c) application specific. In the static mode, the simulator simply sums the power consumed by each component depending on its current power state. Second, in resource load mode, the power consumption of each power state coupled with load values (usually for lowest and full load) must be defined; the power consumption of a component is then deduced by linear interpolation depending on its current load. Finally, in application specific mode, the application power consumption is introduced as a constant value and is accumulated with the power consumed by the CPU at the current load and frequency. At the end of the simulation, time-stamped power consumption values recorded by DCworms can be fed into a Computational Fluid Dynamics (CFD) simulation for thermal behaviour analysis.
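The sketch below contrasts the three estimation modes just described; the enum, method and parameter names are hypothetical placeholders rather than DCworms' API.

```java
// Illustrative sketch of the three DCworms-style estimation modes described
// above; enum and method names are hypothetical, not DCworms' API.
enum EstimationMode { STATIC, RESOURCE_LOAD, APPLICATION_SPECIFIC }

final class PowerEstimator {
    static double estimate(EstimationMode mode,
                           double statePower,      // power of the current power state
                           double idlePower,       // resource-load mode: power at lowest load
                           double fullPower,       // resource-load mode: power at full load
                           double load,            // current load in [0, 1]
                           double appPower) {      // application-specific constant
        switch (mode) {
            case STATIC:
                return statePower;                                  // sum of per-state constants
            case RESOURCE_LOAD:
                return idlePower + (fullPower - idlePower) * load;  // linear interpolation
            case APPLICATION_SPECIFIC:
                // constant application draw accumulated with the CPU draw at the current load
                return appPower + idlePower + (fullPower - idlePower) * load;
            default:
                throw new IllegalArgumentException("unknown mode");
        }
    }

    public static void main(String[] args) {
        // 175 W for a component interpolated between 100 W (idle) and 250 W (full) at 50% load.
        System.out.println(estimate(EstimationMode.RESOURCE_LOAD, 0, 100, 250, 0.5, 0));
    }
}
```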

In a case study, the developers of this simulator tested several jobs (Tar [59], Abinit [60], etc.) on a Christmann high-density Resource Efficient Cluster Server (RECS) [61] composed of 18 CPUs of different types (Intel i7, AMD Fusion T40N and Atom D510). To compute the power consumed at various frequencies for each type of CPU, the jobs were launched in exclusive mode (to remove performance noise caused by competing processes); then the duration of the job, PCPUbase and the power consumed at each frequency were extracted. Afterwards, the jobs were submitted at various intensity levels and the three above-mentioned power consumption models were evaluated. Results illustrate that the resource load power consumption estimation mode is the most accurate, with a 5.2% relative error.

Further experiments were conducted to showcase the performance of five power aware task scheduling strategies: (a) random, (b) random with idle machine shutdown, (c) random with constant Lowest CPU Frequency (LF), (d) Energy Optimisation (EO) with lowest increase in power consumption, (e) EO with minimum power consumption over the execution period and idle machine shutdown (EO NMP). The worst performing strategy among the aforementioned ones is random LF, as the job makespan is extended due to the low CPU operational frequency. Conversely, in EO NMP, the jobs are placed for execution on high capacity CPUs to lower the product between execution time and power consumption (the i7 in this case). Consequently, the job makespan is lower and CPUs reach the idle state faster, which translates into considerable energy savings when DNS is invoked.

E‑mc2

E-mc2 [62] is an energy consumption estimation framework developed as an extension of iCanCloud. The designers of this extension first introduce several core changes to the iCanCloud architecture. For starters, instead of one global hypervisor entity, a hypervisor is created for each server node while a global Cloud Manager instance is added to handle the VM requests submitted by the users. Plus, physical nodes are modelled in this extension, unlike in the initial iCanCloud version. Also, the customer actions are represented by a set of Tenant Behaviour objects that define how jobs are attributed to the rented VMs.

Each low level hardware resource is equipped with an Energy Meter, while access to the resource is ensured first by the operating system of the VM, then through the hypervisor. Furthermore, an Energy Manager object is in charge of aggregating the energy consumed by the hardware elements of the same physical machine. In turn, the Cloud Manager can be modified to implement provisioning strategies based on the power consumption status.

Compared to other cloud simulation tools focused on energy consumption, E-mc2 presents a highly detailed model in which each hardware component is represented with its power consumption states. At the start of the simulation, Energy Meters load the defined power states for each hardware component. Afterwards, the energy consumption of any device is estimated as the sum of the energy consumed in each state. The energy consumed in a given power state is calculated as the product of the time spent in that state with the corresponding consumed power. For instance, the entire energy consumed by an HDD is the sum of the energy consumed in the active and idle states (assuming seek state power consumption is negligible). In the active state, the energy consumed by an HDD is estimated as the product of the read/write operations duration and the power consumed in the active state. The same approach is applied for the CPU and RAM. On the other hand, the Power Supply Unit (PSU) depends on an efficiency ratio (energy wasted due to the transformation of alternating current to direct current) which is defined by the manufacturer. Additionally, several power consumption models based on the manufacturers' specifications are predefined with the framework (Maxtor hard drive, Sparkle PSU SPI700ACIG, etc.). Plus, the energy consumption of each physical resource is further customizable or can be deactivated, while the power consumption of the motherboard/cooling peripherals can be added to the aggregated energy consumption.
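A minimal sketch of this per-state accounting, including the PSU efficiency ratio, is shown below; the class and method names are hypothetical, not E-mc2's actual API.

```java
import java.util.Map;

// Illustrative per-state energy accounting in the spirit of E-mc2's meters:
// energy of a device = sum over states of (time spent in state * state power).
// Class and method names are hypothetical, not E-mc2's actual API.
final class StateEnergyMeter {
    // secondsInState: e.g. {"active": 120.0, "idle": 3480.0}
    // statePowerW:    e.g. {"active": 8.0,   "idle": 5.0}
    static double energyJoules(Map<String, Double> secondsInState, Map<String, Double> statePowerW) {
        double energy = 0.0;
        for (Map.Entry<String, Double> e : secondsInState.entrySet()) {
            energy += e.getValue() * statePowerW.getOrDefault(e.getKey(), 0.0);
        }
        return energy;
    }

    // PSU losses modelled through a manufacturer efficiency ratio in (0, 1]:
    // the energy drawn from the wall is the delivered energy divided by efficiency.
    static double wallEnergy(double deliveredJoules, double psuEfficiency) {
        return deliveredJoules / psuEfficiency;
    }

    public static void main(String[] args) {
        double hdd = energyJoules(Map.of("active", 120.0, "idle", 3480.0),
                                  Map.of("active", 8.0, "idle", 5.0));
        System.out.printf("HDD energy: %.0f J, from the wall: %.0f J%n", hdd, wallEnergy(hdd, 0.85));
    }
}
```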

In a validation experiment, the designers of E-mc2 compare their simulation results with empirical values extracted with power meters and multi-meters. A noteworthy observation is that the idle power consumption declared by the manufacturer usually differs from the value reported by the power meter. Transitively, this introduces inaccuracies in the energy consumption estimated for the different power states, especially in the case of the CPU. Nevertheless, the power consumed by the devices and the aggregated energy consumption are close to the corresponding simulation results.

CloudReports

CloudReports [63] is the most sophisticated version of CloudSim. This extension offers an intuitive GUI to create the simulation scenario and monitor its execution. Simulation results are stored in an SQLite database to enable customized results interpretation. Furthermore, this simulator can launch a series of test runs and gather all the results in a convenient way. The Report Manager layer collects and structures the results, such as power consumption, in an HTML format containing illustrative charts. The most innovative feature of CloudReports is the Extension layer, which enables users to add their implementations of core CloudSim classes in the form of plug-ins. Instead of modifying the source code of CloudSim or CloudReports, the users can load their compiled classes packaged as a JAR. This feature is made possible by Java Reflection: through the reflection package it's possible to load methods/class members (regardless of their access modifiers) from a class file without needing the source code of the corresponding Java class. For example, this feature is widely used for code hints in Integrated Development Environments (IDEs). In conclusion, the designers of this tool illustrate the usefulness of CloudReports for power efficiency through a case study where the PowerModel interface is implemented to model the power consumption of a Dell PowerEdge R820.
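The sketch below shows the general plug-in idea: loading a compiled class from an external JAR at runtime and inspecting it through reflection. The JAR path and class name are placeholders, not CloudReports defaults, and the sketch is not CloudReports' actual loading code.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch of the plug-in idea described above: load a compiled class
// from an external JAR at runtime and instantiate it through reflection.
public final class PluginLoaderSketch {
    public static void main(String[] args) throws Exception {
        File jar = new File("plugins/my-power-model.jar");          // hypothetical plug-in JAR
        try (URLClassLoader loader =
                     new URLClassLoader(new URL[]{jar.toURI().toURL()},
                                        PluginLoaderSketch.class.getClassLoader())) {
            // Load the user-provided implementation without having its source code.
            Class<?> pluginClass = Class.forName("com.example.MyPowerModel", true, loader);
            Object plugin = pluginClass.getDeclaredConstructor().newInstance();

            // Inspect its members, including non-public ones, via reflection.
            for (Method method : pluginClass.getDeclaredMethods()) {
                System.out.println("found method: " + method.getName());
            }
            System.out.println("loaded plug-in: " + plugin.getClass().getName());
        }
    }
}
```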

CloudNetSim++

CloudNetSim++ [64] is a cloud simulator built upon OMNET++. Like GreenCloud, this tool is focused on power aware management techniques and energy consumption modelling. CloudNetSim++ has the advantage of a visual topology editor. Moreover, it's possible to model geographically spread DCs. A DC is composed of various infrastructure components connecting the servers in a given topology, whilst each component implements a power consumption model. Furthermore, DVFS and Adaptive Link Rate [65] (ALR) are implemented in CloudNetSim++ as well. Precision wise, CloudNetSim++ simulates the network performance/procedures in as much detail as GreenCloud. Conversely, the modular architecture of CloudNetSim++ and OMNET++ renders it more flexible than GreenCloud. Nevertheless, the server is modelled only by its processing power without a VM model, and the application model is limited to a block of instructions/data transmissions.

GDCSim

Green DC Design and Analysis [66] (GDCSim) is a modular simulator for DCs that models the dependencies between the physical environment (air flow and heat) and server status (utilization and power consumption). For starters, in addition to the specification of the workload executed during the simulation and the SLA, the user specifies the physical layout, components (e.g. air conditioning) and room architecture of the DC in Computer Infrastructure Engineering Language (CIELA). Based on the specifications, a pre-processing series of CFD simulations is carried out to generate an approximation of the Heat Recirculation Matrix [67] (HRM). For each chassis in the DC, a simulation is executed where the chassis runs at full power while the others are in idle state; in a final simulation, all servers run in idle state. Further simulations with other server settings can be added to the pre-processing phase to improve the HRM model. Figure 2 depicts how the heat recirculation is modelled in this simulator.

The core module of GDCSim is the Cyber-Physical Simulation Engine (CPSE), which handles the interplay between simulation elements. The developers modelled the following aspects of the DC: (a) performance as a function of workload arrival rate and number of servers, (b) power consumption as dependent on CPU utilization, (c) heat generation rate as quantified from the server power consumption. Accordingly, the CPSE module updates power consumption depending on the workload, and the heat map is updated based on the power consumption. For sufficiently accurate heat recirculation modelling, a heat map of the DC is expressed as inlet and outlet temperature vectors for the set of servers. The vented temperature of a server is estimated based on the air heat capacity of the chassis, the HRM, the power consumption and the temperature supplied by the Computer Room Air Conditioning (CRAC). On the other hand, the heat circulating back into a server chassis is expressed as the difference between the vented temperature and the air heat capacity of the chassis multiplied by the power consumed.
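The sketch below gives one simplified reading of such an HRM-based heat map, in which each server's inlet temperature is the CRAC supply temperature plus recirculated contributions weighted by the matrix. This is an assumption about the abstract model behind [67], with illustrative array names, not GDCSim code.

```java
// Simplified reading of the HRM-based heat map described above (not GDCSim code):
//   inlet[i] = T_crac + sum_j HRM[i][j] * power[j]
// where HRM[i][j] captures how much of server j's dissipated heat recirculates
// into server i's inlet (unit conversions folded into the matrix coefficients).
final class HeatMapSketch {
    static double[] inletTemperatures(double tCrac, double[][] hrm, double[] powerW) {
        int n = powerW.length;
        double[] inlet = new double[n];
        for (int i = 0; i < n; i++) {
            double recirculated = 0.0;
            for (int j = 0; j < n; j++) {
                recirculated += hrm[i][j] * powerW[j];
            }
            inlet[i] = tCrac + recirculated;
        }
        return inlet;
    }

    public static void main(String[] args) {
        double[][] hrm = {{0.02, 0.01}, {0.015, 0.02}};
        double[] power = {200, 150};
        double[] inlet = inletTemperatures(18.0, hrm, power);
        System.out.printf("inlet temperatures: %.1f, %.1f%n", inlet[0], inlet[1]);
    }
}
```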

The Resource Manager (RM) module is responsible for running the DC management policies defined by the user. These policies can manage one or multiple aspects affecting the DC performance (workload, power consumption and air conditioning) based on the estimations made by the CPSE module. Afterwards, the new system status affected by the management policy (e.g. workload consolidation and CRAC thermostat setting) is communicated back to the CPSE.

The defining feature of GDCSim is the accurate temperature estimation that enables the prediction of throttling-down events, which affect the response time of the server. This feature is supported by no other simulator. Alternatively, the HRM generated by GDCSim is approximated, which can introduce cooling power and performance estimation errors. Thus, the designers of GDCSim demonstrate that with practical parameter settings (e.g. the increase of temperature due to heat recirculation bounded between 5° and 10°), the cooling power could vary by 6% at most, which is not of great significance. Through a series of real world experiments, results show that the error produced by the GDCSim-HRM model is on average 8.5% from a temperature perspective. Moreover, the GDCSim-HRM model is compared with the full CFD model from an execution time perspective. Results of the comparison exhibit that GDCSim-HRM simulations are significantly less time consuming, which makes this model applicable even for online resource management, unlike the exhaustive CFD approach.

CloudSimDisk

Implemented as an expansion of CloudSim, CloudSimDisk [68] is dedicated to modelling HDD operations and their energy consumption. An abstract HDD model and an HDD power consumption model are implemented on top of CloudSim. Additionally, several HDD power consumption models are implemented based on the manufacturers' specifications. For starters, this CloudSim extension provides a much more detailed HDD model featuring: (a) manufacturer and model number, (b) capacity, (c) disk rotation latency/speed, (d) average seek time, (e) data transmission rate to the buffer. The HDD energy consumption then depends on the power consumed in the idle and active modes.

During the active mode, the duration of a read/write operation performed on the HDD is expressed as the ratio of the file size to the data transmission rate, accumulated with the average seek time and rotation latency. Afterwards, the energy consumed by the operation is computed by multiplying the timeframe length with the power consumed in active mode. The designers of this simulator further consider external effects on the HDD performance through a randomized coefficient multiplied with the average seek time and rotation latency. Figure 3 illustrates the main classes of CloudSimDisk and how they are linked.
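A direct reading of these timing and energy formulas is sketched below; parameter names are descriptive placeholders rather than CloudSimDisk's API.

```java
// Illustrative reading of the CloudSimDisk timing/energy formulas above;
// parameter names are descriptive placeholders, not CloudSimDisk's API.
final class HddOperationSketch {
    // Duration of one read/write: seek + rotation latency (both scaled by a
    // randomized coefficient for external effects) + transfer time.
    static double operationSeconds(double fileSizeMB, double transferRateMBs,
                                   double avgSeekSec, double rotationLatencySec,
                                   double randomCoefficient) {
        double mechanical = randomCoefficient * (avgSeekSec + rotationLatencySec);
        return mechanical + fileSizeMB / transferRateMBs;
    }

    // Energy of the operation: duration in active mode times active power.
    static double operationJoules(double durationSec, double activePowerW) {
        return durationSec * activePowerW;
    }

    public static void main(String[] args) {
        double t = operationSeconds(500, 120, 0.009, 0.004, 1.1);   // ~4.2 s for a 500 MB file
        System.out.printf("duration=%.2f s, energy=%.1f J%n", t, operationJoules(t, 7.5));
    }
}
```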

Moreover, the original Cloudlet class is modified to add file read/write requests. In addition to the first-found algorithm implemented in CloudSim, CloudSimDisk implements a round robin method to dispatch storage requests on HDDs. Each saved file is recorded in a file containing all existing file names and their sizes. As the user file requests arrive at the HDD module in sequential order, only one request can be handled at a time and a waiting time is computed based on the cumulative transaction times of the requests ahead of it in the queue.

DISSECT‑CF

DIScrete event based Energy Consumption simulator for Clouds and Federations [69] (DISSECT-CF) is a recently proposed cloud simulator, written in Java and designed to evaluate scheduling algorithms at every level of the cloud. Unlike other cloud simulators, demands for any cloud resource are represented independently by a resource consumption entity. This entity represents a link between a consumer of the resource and its provider, which are both represented by ResourceSpreader instances. Consequently, at low level scheduling, the scheduler calculates at each simulation tick: (a) how much of the resource can be allocated to the consumer, (b) the remaining length of the resource consumption (remaining processing until completion). Furthermore, the simulator maintains dependency sets called influence groups which contain ResourceSpreaders (consumers and providers) that are directly or transitively related through consumptions. As the consumption relationship can influence the performance of the provider, developers can use influence groups to build advanced scheduling or improve the model accuracy in terms of how a resource is shared amongst the consumers. For instance, from a network perspective, multiple PMs (providers) can be transferring data to the same destination (consumer). Subsequently, whenever a consumption reaches its end (remaining processing = 0) or a new consumption is registered between two ResourceSpreaders, DISSECT-CF updates only the influence groups affected by the change. The chart in Fig. 4 summarises how DISSECT-CF runs the simulation.
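The sketch below illustrates the per-tick progress of such resource consumptions for one provider shared among several consumers; the class names and the equal-share policy are illustrative assumptions, not the simulator's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative per-tick update in the spirit of DISSECT-CF's resource
// consumptions: each consumption receives a share of its provider's capacity
// and finishes when its remaining length reaches zero.
final class ConsumptionSketch {
    static final class Consumption {
        double remaining;            // remaining processing (e.g. instructions, bytes)
        double requestedPerTick;     // how much the consumer asks for per tick
        Consumption(double remaining, double requestedPerTick) {
            this.remaining = remaining;
            this.requestedPerTick = requestedPerTick;
        }
    }

    // One provider shared equally among its active consumptions during a tick.
    static List<Consumption> tick(List<Consumption> active, double providerCapacityPerTick) {
        double fairShare = active.isEmpty() ? 0 : providerCapacityPerTick / active.size();
        List<Consumption> stillRunning = new ArrayList<>();
        for (Consumption c : active) {
            double granted = Math.min(fairShare, c.requestedPerTick);
            c.remaining -= granted;                 // progress for this tick
            if (c.remaining > 0) {
                stillRunning.add(c);                // unfinished consumptions stay in the group
            }
        }
        return stillRunning;                        // finished ones would trigger a group update
    }

    public static void main(String[] args) {
        List<Consumption> group = new ArrayList<>(
                List.of(new Consumption(100, 50), new Consumption(30, 50)));
        int ticks = 0;
        while (!group.isEmpty()) {
            group = tick(group, 60);
            ticks++;
        }
        System.out.println("all consumptions finished after " + ticks + " ticks");
    }
}
```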

For power consumption modelling, each component has basic power states (defined by maximum/minimum power consumption) plus the transitions between them (switching on, running, switching off and off) and their durations. The user can also introduce consumptions that represent a background process during the power state. Power consumption is then estimated based on the power state of the component and its current utilization. To empower energy aware scheduling, DISSECT-CF defines several types of Energy Meters that monitor energy consumption: (a) Directed Energy Meters, which are associated with a ResourceSpreader's power consumption, (b) Energy Meter Aggregator, which encapsulates the Directed Energy Meters that constitute the same entity, (c) Energy Meter Indirect, which captures power draws of DC components that are not represented by a ResourceSpreader (e.g. air conditioning).

From a network perspective, VMs sharing the same PM communicate using their host Id, while communications between PMs/VMs are flow based. In each network node, incoming and outgoing bandwidths are represented by two separate ResourceSpreaders. Then, when a transmission (consumption) is made between an outgoing and an incoming ResourceSpreader, the transmission is registered to be processed after the link delay in order to introduce the network latency.

Upon reception of a task, the VMs create the corresponding resource consumptions (CPU/network). However, RAM occupation is only simulated through a static reservation and local disk access is not modelled; remote disk access, on the other hand, is simulated as a data transfer. Moreover, in addition to a few predefined simple scheduling policies at various levels, DISSECT-CF offers support for scheduling procedure development through various APIs which enable designers to: (a) extract resource utilization/performance at any given moment of the simulation, (b) manage the PMs' power state and utilization based on the workload, (c) manage VM state (offline migration, serialisation, suspension, etc.), (d) register/deregister PMs (can be used to simulate component failure).

To validate DISSECT-CF, several series of simulations were conducted for: (a) CPU sharing, (b) disk memory utilization, (c) network operation, (d) power consumption. For CPU sharing, a set of tasks of increasing length is executed simultaneously in a VM. The VM is set up to occupy the resources of its host entirely, the host being of two types: (a) AMD Opteron with four independent CPU cores, (b) Intel i7 with two cores and hyper-threading (HT) (which equals four logical CPUs). Whilst the simulation of the AMD Opteron reveals high accuracy (0.1% relative error), the simulation of processors with HT needs to be calibrated depending on the concurrency level, and a processing limit must be defined instead of assuming maximum (or minimum) processing capacity of the logical CPU. Transitively, power consumption accuracy depends on the accuracy of the ResourceSpreaders defined by the simulator user. The network model, on the other hand, is validated by comparison with a small real world experiment; the relative error of the latter was close to 0.5%.

Then, several series of simulations were executed to compare DISSECT-CF with CloudSim and GroudSim from a scalability and execution time perspective. The developers of this simulator point out that GroudSim and CloudSim lack precision in several aspects. For instance, both simulators do not account for the VM image transfer time and assume that the VM is instantaneously running after the arrival of the request. Despite the higher level of detail in DISSECT-CF, the simulator scales better than its counterparts when the number of parallel tasks is increased. Likewise, the simulation time of DISSECT-CF is generally lower than that of GroudSim and CloudSim. Though activating power meters increases the simulation time depending on the monitoring periods, the execution time of DISSECT-CF remains lower than those of CloudSim and GroudSim. In a simulation series under various concurrency levels and task lengths, the CloudSim time-shared/space-shared and GroudSim CPU sharing policies are compared with the way DISSECT-CF simulates task execution. The results show that the CloudSim time-shared mode increases the simulation time drastically due to recurrent calls of the task/VM processing method. Similarly, the GroudSim CPU sharing strategy increases the execution time because all task finishing events in the queue must be updated every time a task ends. Conversely, the DISSECT-CF approach of using ResourceSpreaders and Consumptions limits significantly the complexity of task completion estimation.

VM Provisioning

VM provisioning strategies can change the performance of the DC significantly. Algorithms proposed for this task can: (a) lower the probability of task failures and SLA breaches, (b) lower the number of occupied hosts to promote DNS. After deployment, the workload might be consolidated onto fewer hosts to conserve energy. Though most simulators model this process as a transfer of the VM over the network, empirical evidence shows that this operation is more complex in the case of live migration. The following simulators are focused on VM provisioning.

DCSim

DCSim [70] is a CC simulation tool written in Java and designed to test VM Placement Policy (VMPP) algorithms. The main simulated object, which is an instance of the DataCenter class, is composed of Host objects representing the physical machines responsible for running VMs. Each host is characterized mainly by: (a) utilization percentage, (b) a power consumption model depending on CPU usage, (c) a current status showing whether the host is On, Off, Suspended or in a transition between states. The VMPP can instruct hosts to over-commit their resources. This strategy aims to run as many VMs as possible on each powered-on host without violating the SLA. In such a strategy, in case a host reaches 85% of its capacity, the VMPP algorithm intervenes to offload VMs from the host under stress. Alternatively, if a host is under-used, VMs of the host in question are migrated towards other hosts and the under-used host is instructed to operate in low power mode.

In each host, each type of resource is handled separately. In case of allocation of a new VM, or an incoming migration, each resource manager determines whether the required resources can be supplied. Scheduling wise, DCSim supports both simple allocation of resources and work-conserving scheduling. In the latter, while the scheduled VM is occupying its share of a specific resource, the remaining available amount of the resource in question is used by another competing VM. As a result, this strategy is meant to avoid resource wastage and lower task execution times.

The application model of DCSim can represent a multi-tier application running in multiple VMs. Incoming workload is introduced through a trace file specifying the time and number of requests. Then, in case one of the VMs running the application is unable to process the requests due to resource shortage, the VMPP can transfer the VM to a more resourceful host or replicate it to maintain the SLA. However, the migration process supported by DCSim is an offline one, which introduces an SLA violation. Alternatively, a replication process creates a copy of the overloaded VM on another host to receive a share of the workload. Ultimately, to evaluate the performance of the running VMPP, DCSim records: (a) SLA violations, (b) the combined time the hosts were in the On state, (c) average host utilization, (d) power/energy consumption.

CloudShed

CloudShed [71] is an independent cloud simulator written in Java and developed to evaluate the efficiency of multi-criteria VM allocation algorithms. This simulator incorporates a GUI to define the number and capacities of the servers composing the DC. Likewise, user requests are set up as a number and type of VMs; alternatively, the requests can be introduced in a structured file containing a set of: (a) lifetime interval, (b) VM type. The designers of this tool argue that considering only the CPU of the server for VM allocation is ineffective. As an example, Least Imbalance-level First (LIF) is implemented, which searches for the server having the lowest average usage of resources (CPU, RAM and network). Although considering other resources for VM placement can improve the performance of the DC, giving the same priority to all three resources is far from ideal as the CPU is the most important resource, load balance and power consumption wise. Furthermore, the GridSim model [7] is used to estimate CPU power consumption, which is expressed by the utilization ratio and the idle/max power consumption. The total energy consumption is then computed as the sum of the duration of each CPU utilization state multiplied by the corresponding power consumption. Overall, this simulation tool can run massive simulation scenarios due to its simple model, although it can be used only to evaluate VM allocation algorithms.
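The sketch below transcribes this utilization-based power model and the energy summation over utilization states; the method names are placeholders, not CloudShed or GridSim APIs.

```java
// Illustrative linear CPU power model and energy accumulation as described
// above; the method names are placeholders, not CloudShed or GridSim APIs.
final class CpuEnergySketch {
    // Linear model: P(u) = P_idle + (P_max - P_idle) * u, with u in [0, 1].
    static double power(double utilization, double idleW, double maxW) {
        return idleW + (maxW - idleW) * utilization;
    }

    // Total energy: sum over utilization states of (state duration * state power).
    static double energyJoules(double[] utilizations, double[] durationsSec,
                               double idleW, double maxW) {
        double energy = 0.0;
        for (int i = 0; i < utilizations.length; i++) {
            energy += durationsSec[i] * power(utilizations[i], idleW, maxW);
        }
        return energy;
    }

    public static void main(String[] args) {
        double e = energyJoules(new double[]{0.2, 0.8, 0.0},
                                new double[]{600, 1200, 1800}, 90, 250);
        System.out.printf("total energy: %.0f J%n", e);
    }
}
```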

SimGrid VM

SimGrid VM [72] is an extension of the well-known grid computing simulator SimGrid. The main focus of this simulator is to enable VM management in cloud scenarios while realistically modelling live migration procedures. This simulator proposes both a C++ and a Java API for developers to implement the desired cloud mechanisms. Most importantly, while most simulators supporting VM migration assume that this procedure is as simple as transmitting the VM bitmap to the new host, the designers of SimGrid VM demonstrate that live migration time is strongly correlated with the memory update speed.

By extending SimGrid, this simulator incorporates all its existing functionalities (result tracer, MPI application model, etc.). To introduce virtualisation, a VM workstation model is added. The VM workstation inherits all PM workstation callbacks while implementing VM specific callbacks (destroy, migrate, etc.). The user can also define the maximum usage of a resource for a VM as well as attribute a number of CPU cores to a VM through the affinity property.

Task execution is simulated based on the same SimGrid mechanism: resource shares (CPU/network) for each task are computed to estimate execution times, whereas no HDD model is implemented in this simulator. Both PMs and VMs contain resource objects, the VM resource object holding a pointer towards the corresponding resource object of its host. The resource shares for co-located tasks/VMs can be distributed either equally or on a priority basis. As a result, by representing the VM as a task on its host, the virtualisation model is guaranteed. Additionally, this architecture supports nested VMs.

Experiment results highlight a significant loss in precision when the live migration process is simulated without accounting for memory page updates. The developers of this simulator demonstrate the correlation between CPU usage and memory update speed and, transitively, illustrate the effect of the memory update speed on the migration time. The live migration model implemented in this simulator is based on the pre-copy algorithm [73], which initially copies all memory pages of the migrated VM. Since in a live migration the migrated VM remains running in the original host, memory pages will be updated after the first transfer. Thus, the updated memory pages must be transferred repeatedly until the number of updated memory pages becomes small enough. Finally, the VM is halted and restarted at the new host after the transfer of the VM device states. Accordingly, during live migration in SimGrid VM, the size of the updated memory pages is estimated based on: (a) the CPU share attributed to the migrated VM, (b) the memory update intensity, which is defined by the simulator user as the number of memory pages updated per CPU cycle.
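The sketch below gives a rough pre-copy estimate in the spirit of this description: transfer the full memory once, then keep re-sending the pages dirtied during each round until the dirty set is small enough to stop the VM. The names and the simple dirtying model are illustrative assumptions, not SimGrid VM's implementation.

```java
// Rough sketch of a pre-copy live migration estimate (illustrative assumptions,
// not SimGrid VM code): iterate over dirtied pages until the dirty set is small
// enough to justify the final stop-and-copy round.
final class PreCopySketch {
    static double migrationSeconds(double vmMemoryMB, double linkMBs,
                                   double dirtyRateMBs,      // memory dirtied per second while running
                                   double stopThresholdMB,   // dirty data that triggers stop-and-copy
                                   int maxRounds) {
        double total = 0.0;
        double toSend = vmMemoryMB;                           // round 0: full memory copy
        for (int round = 0; round < maxRounds; round++) {
            double roundTime = toSend / linkMBs;
            total += roundTime;
            double dirtied = dirtyRateMBs * roundTime;        // pages touched during this round
            if (dirtied <= stopThresholdMB) {
                return total + dirtied / linkMBs;             // final stop-and-copy round
            }
            toSend = dirtied;
        }
        return total + toSend / linkMBs;                      // forced stop after maxRounds
    }

    public static void main(String[] args) {
        System.out.printf("estimated migration time: %.1f s%n",
                migrationSeconds(4096, 1000, 200, 64, 30));
    }
}
```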

VMPlaceS

VMPlaceS [74] is an extension of SimGrid VM intended for the evaluation of VMPP algorithms. This tool inherits the VM modelling and the live migration procedure proposed in SimGrid VM. Initially, VMs are deployed in a round robin fashion on the available PMs. Afterwards, the simulation is executed through two processes: (a) a load injector, (b) a scheduling algorithm. The load injector controls the CPU usage of the deployed VMs whilst the scheduling algorithm proactively verifies the validity of the configuration. The VM load is introduced by the user as three parameters that define a Gaussian distribution over the simulation time: (a) the time after which the VM load will change, (b) the mean value, (c) the standard deviation. For a configuration to be viable, the cumulative CPU capacity required by the hosted VMs must not exceed the maximum capacity of the host. Otherwise, once a non-viable configuration is detected, the VMPP searches for a reconfiguration plan to resolve the problem. The authors quantify the quality of the solution based on: (a) computation time, (b) reconfiguration time, (c) the number of required migrations. Also, VMPlaceS extends the trace module of SimGrid to generate a JSON trace file and enable personalized user queries.

To validate their simulation tool, the developers implemented the Entropy scheduling approach on a Grid'5000 testbed and compared its performance to that produced by the same algorithm in VMPlaceS. Over a 3600 s simulation interval, the average difference between the reconfiguration time produced by VMPlaceS and the real world experiment was 12%. The developers attribute this error margin to simultaneous migrations towards the same PM. Subsequently, three VMPP algorithms were implemented and compared: (a) Entropy (centralized), (b) Snooze [75] (hierarchical), (c) Distributed Virtual Machine Scheduler [76] (DVMS), which is a distributed method.

Results of the comparison show that the centralized method performs the worst (almost as bad as having no running VMPP algorithm). This is particularly due to the fact that the computation time increases significantly in the centralized approach as the number of simultaneous non-viable configurations rises whilst all solutions are computed by a single PM. The hierarchical and the distributed methods perform significantly better. However, DVMS has the best performance as this algorithm promotes the parallel resolution of non-viable configurations. Snooze performance, on the other hand, depends on the number of group leaders, i.e. nodes responsible for computing the reconfiguration plan.

DynamicCloudSim

DynamicCloudSim [77] is an extension of CloudSim dedicated to mimicking the variance of VM performance in a public cloud. Based on observations in a previous study [78], the developers of DynamicCloudSim introduce three types of performance instability: (a) Heterogeneity of physical hardware (Het), (b) Dynamic Changes at Runtime (DCR), (c) Stragglers and Failures (SaF). From a VM provisioning perspective, since the policy that a public cloud provider applies is unknown, DynamicCloudSim deploys a new VM randomly in a host with enough remaining resources rather than searching for the fittest host.

According to previous VM performance studies in Amazon EC2 [78, 79], the Relative Standard Deviation (RSD) of the execution times of the same CPU intensive task in homogeneous VMs is between 0.35 and 0.4. Moreover, further studies show that the RSD of I/O performance is around 0.15, and 0.2 from a network performance perspective. Thus, to randomize VM performance, the amount of resources supplied to a VM is sampled from a normal distribution rather than being a stable value as in CloudSim. The mean value is set to the default value as in CloudSim whilst the RSD is provided by the user.
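The sketch below shows this sampling idea: the supplied capacity is drawn from a normal distribution parameterised by an RSD around the nominal CloudSim value. The class and method names are placeholders, not DynamicCloudSim's API.

```java
import java.util.Random;

// Illustrative sampling of a VM's supplied capacity from a normal distribution
// parameterised by a relative standard deviation (RSD), as described above.
final class PerformanceSampler {
    private final Random rng = new Random(42);   // fixed seed for reproducible runs

    // nominalMips: the stable CloudSim value; rsd: e.g. 0.35-0.4 for CPU on EC2.
    double sampleSuppliedMips(double nominalMips, double rsd) {
        double sampled = nominalMips * (1.0 + rsd * rng.nextGaussian());
        return Math.max(0.0, sampled);           // clamp: capacity cannot be negative
    }

    public static void main(String[] args) {
        PerformanceSampler s = new PerformanceSampler();
        for (int i = 0; i < 3; i++) {
            System.out.printf("run %d: %.0f MIPS%n", i, s.sampleSuppliedMips(1000, 0.35));
        }
    }
}
```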

For DCR modelling, as reported in [80, 81], the VM performance is sampled from an exponential distribution with a rate parameter that defines the evolution of VM performance during runtime. Additionally, the noise introduced by deploying a new task in a VM is also accounted for as a decrease of performance, in the form of a normal distribution with a small RSD value. The straggler VM phenomenon is incorporated through a probability value (0.015 by default) which represents the probability of a VM constantly performing at 50% of its real capacity. Finally, task failure is represented by a simple probability value in addition to an execution time coefficient that extends the failed task duration.

To demonstrate the effects of performance variation on the throughput of a workflow schedule, several series of simulations were conducted with different types/RSDs of performance variation on two workflow applications: (a) Montage [82], (b) Genomic Sequencing [83]. Four workflow scheduling algorithms are applied: (a) round robin, (b) Heterogeneous Earliest Finishing Time [80] (HEFT), (c) Greedy, (d) Longest Approximate Time to End [81] (LATE). Particularly, LATE shows great resiliency against all types of performance variation, even in extreme conditions. This is due to the fact that LATE tracks the progress of running tasks to detect underperforming VMs; tasks with slow progress are replicated in VMs having above-average performance, which makes this scheduling algorithm less affected by task failures, VM straggling and DCR.

In a validation experiment, a series of simulations for each scheduling algorithm was carried out with the same configuration in DynamicCloudSim and in a public cloud environment (Amazon EC2). In accordance with the observations made in [84], the cloud hardware composing Amazon EC2 and the DC simulated by DynamicCloudSim are Intel Xeon E5430 and AMD Opteron 2218/Opteron 270 machines. Though the average execution time of the Montage workflow in DynamicCloudSim and Amazon EC2 is somewhat similar, the variance in execution times in DynamicCloudSim is high by comparison with the obtained Amazon EC2 output. The authors attribute this outcome to the fact that, in the public cloud experiment, there were not as many straggler VMs and task failures as in the simulation. Besides, the RSD parameter used to introduce SaF in this experiment was extracted from a larger experiment setting (20 VMs vs 200 VMs). Nonetheless, performances obtained with DynamicCloudSim follow the same patterns as the Amazon EC2 experiments.

ATAC4Cloud

ATAC4Cloud [85] (Agent Technology and Autonomic Computing for Cloud) is an extension of CloudSim supporting intelligent DC auto-management. This simulator introduces: (a) a user interface for scenario configuration and result viewing, (b) a persistence layer to save simulation data. The key contribution of this framework is the inclusion of a dynamic resource provisioning component built with agent technology. Resources (DC, PM and VM) are coupled with Multi Agent Systems (MAS) where each resource MAS is composed of a controller and a problem solver. Furthermore, each resource MAS is registered with the Data Facilitator agent to enable the Cloud Infrastructure Autonomic Manager Agent to find the appropriate resources when a new request is received. This extension also introduces environmental data (e.g. power consumption in the region) with which the provisioning agent can adapt its strategy.

NutShell

Nutshell [86] is built on top of NS3 and comes preloaded with many helper classes that enable the user to create a simulation easily. This simulator benefits mainly from the level of detail offered by the NS3 packet level simulator. In addition to existing NS3 helpers, like PointToPointHelper which defines the link/channel properties, Nutshell adds a helper class to create the DC architecture as a Fat Tree or Three Tier. For either topology, the user must create a configuration object from ThreeTierConfig or FatTreeConfig. This configuration object defines: (a) minimum/maximum processing power for hosts and VMs, (b) storage capacities, (c) sizes of the requested data per VM, (d) the splitting ratio for the VM, (e) the VM scheduling algorithm. For the Three Tier, the topology is simply defined by setting the number of hosts per access switch as well as the number of switches at each layer. After the complete setup of the configuration object, the latter is passed as a parameter to the constructor of ThreeTier/FatTree to create the DC before launching the NS3 simulator.

A VM is deployed to run only one application and can transfer data to other VMs to simulate a distributed application. A base class is implemented in VirtualMachine that is defined mainly by: (a) processing power in Tera FLOPS, (b) size of the user application, (c) local storage share. For convenience, multiple VM classes are predefined in the simulator, each one useful in a given context: (a) ComputationalLocalDataVM, (b) NetworkVM. ComputationalLocalDataVM is derived from VirtualMachine whilst adding the following attributes: (a) size of data to read from HDD, (b) quantity of RAM used to read data, (c) reading/writing rate for both RAM and HDD. This class is ideal for an application running on a VM that does not require transmissions over the network. On the other hand, to model a VM that requires data transfer/reception over the network, NetworkVM is used. This class adds a source IP and port to the VirtualMachine class. Furthermore, NetworkVM is derived into three classes: ConsumerProducerVM is used for a VM that transfers and receives data over the network, whereas ProducerVM and ConsumerVM represent VMs that either transfer or receive data, respectively.

To estimate the running time of an application, the simulator computes the time required to transfer data from HDD to RAM and from RAM to CPU, leveraging the characteristics of the VM introduced by the user. To compute the time to transfer data from HDD to RAM, the size of the data is divided by the amount of RAM dedicated to data fetching; this ratio is then multiplied by the sum of: (a) the access delay to the HDD, (b) the size of the dedicated RAM divided by the read/write rate from the HDD. Similarly, the time required to transfer data from RAM to CPU is computed by multiplying the number of accesses to RAM by the sum of: (a) the delay to access RAM, (b) the dedicated RAM for data fetching divided by the product of the read/write rate from RAM and the average number of accesses to the CPU. Finally, the sum of the two values is the estimate of the duration of data transfer from HDD to CPU.
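The sketch below transcribes this two-stage estimate as described in the text; the parameter names are descriptive placeholders, not Nutshell's actual fields, and the formula follows the description above rather than the simulator's source code.

```java
// Transcription of the transfer-time estimate described above (an assumption
// based on the text, not Nutshell's source code).
final class DataPathTimeSketch {
    static double hddToRamSeconds(double dataMB, double fetchRamMB,
                                  double hddAccessDelaySec, double hddRateMBs) {
        double passes = dataMB / fetchRamMB;                       // number of RAM-sized chunks
        return passes * (hddAccessDelaySec + fetchRamMB / hddRateMBs);
    }

    static double ramToCpuSeconds(double ramAccesses, double ramAccessDelaySec,
                                  double fetchRamMB, double ramRateMBs, double avgCpuAccesses) {
        return ramAccesses * (ramAccessDelaySec + fetchRamMB / (ramRateMBs * avgCpuAccesses));
    }

    public static void main(String[] args) {
        double total = hddToRamSeconds(2048, 256, 0.01, 150)
                     + ramToCpuSeconds(8, 0.0001, 256, 10_000, 4);
        System.out.printf("estimated HDD-to-CPU transfer time: %.2f s%n", total);
    }
}
```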

Initially, the user defines in the configuration object whether the requested VM can be split into multiple VMs of smaller sizes during scheduling. This is applied if the requested processing power for the VM cannot be accommodated in a single host. If this option is enabled, the user must specify the number of sub-VMs and the ratio of processing power for each. Moreover, the simulator comes with three VM scheduling policies: (a) First Come First Serve–First Fit, (b) Shortest Job First–First Fit, (c) Longest Job First–First Fit. Each of these algorithms checks if the VM can be deployed in a single host and, if not, whether the splitting option is set. If splitting is set and the VM cannot be deployed in a single host, multiple sub-VMs are created based on the defined ratios and the selected scheduling algorithm is applied for each sub-VM.

The advantage of this simulator is that it can be set up with less code and capture network events with a realistic level of detail whilst providing many features to the user. Additionally, some features that are neglected by flow based network models are accessible with NS3, like network queues and packet drops. Connectors, like MacTxDrop and MacRxDrop that are defined in NS3, can be implemented to register other events during the simulation. The developers offer NutshellDataCollector as an event trace class that can be configured to capture individual/collective events at the VM, host and network level. However, packet level simulation takes substantially more time to run. Moreover, the application model proposed by this simulator is limited.

GPUCloudSim

GPUCloudSim [87] is another extension of CloudSim, developed to study provisioning policies for GPU-enabled VMs. Unlike the remaining frameworks, this simulator models resource sharing at the GPU level in terms of memory, processing as well as PCI bandwidth. To model the effect of competing tasks accessing critical resources (memory and PCI bandwidth), the authors used the model proposed in [88].

To do so, the simulator is structured in a layered architecture. The first layer added on top of CloudSim extends Pe to define Pgpu, which represents a single GPU element composed of multiple Streaming Multiprocessors (SMs), where each SM is represented by a Pe. Similarly, GPUCloudlet extends Cloudlet to add GPU processing requirements. A GPUTask class is added, which is composed of multiple blocks of instructions as well as the memory transfers (represented by the MemoryTransfer class) required for the task to be executed.

GpuHost and GpuVm extend the base CloudSim classes to represent GPU enabled hosts/VMs. Furthermore, each GpuVm is associated with a Vgpu instance that represents a virtual GPU, and a PerformanceGpuHost is derived from GpuHost to introduce the performance degradation caused by multiple vGPUs sharing the physical GPU resources. Like the built-in CloudSim schedulers, time shared (VgpuSchedulerTimeShared) and space shared (VgpuSchedulerSpaceShared) schedulers are implemented to run vGPUs on a single GPU. Additionally, a VgpuSchedulerFairShare scheduler is implemented to give the same share to vGPUs in a round robin fashion. Likewise, a GpuTaskScheduler interface is added to schedule GpuTasks on vGPUs. The developers further implement a scheduling policy to share resources between tasks called GpuTaskSchedulerLeftover.

Each GPU has separate provisioning policies to provide bandwidth, processing power and RAM to the competing vGPUs. In order for the additional cost incurred by competing vGPUs to be accounted for by the model, the InterferenceModel interface must be extended by the memory/processor scheduler. The interference is computed using the model presented in [88], which is expressed by: (a) the number of Instructions Per Cycle (IPC) executed when the GPU is shared divided by the total IPC of the physical GPU, (b) the PCI bandwidth provided per vGPU divided by the entire bandwidth.

Furthermore, a power model for the GPU is introduced as a linear model expressed by: (a) the power consumed at idle state, (b) the frequency of the GPU, (c) the utilization.

Economical Modelling

Modelling the outcome of the commercial strategy is of course important for client and provider alike, and it is, relatively speaking, the simplest aspect to model. The tools discussed in this section are centred on the economical model of the cloud.

SimGrid Cloud Broker

SimGrid Cloud Broker [89] (SGCB) is a cloud simulator based on SimGrid. The main feature offered by this simulator is the possibility to represent a flexible pricing model and a detailed cost estimation of the user requested services. The AWS (Amazon Web Services) cloud is modelled as distributed regions connected through 100MBs links. Each region is composed of availability zones (data centers). Furthermore, for each region, an S3 (Simple Storage Service) and an EBS (Elastic Block Service) are made available for the corresponding availability zones.

During runtime, the Spot Instance Management (SIM) module proactively verifies whether a spot instance should be terminated due to an increased bid. The initial price of a spot instance is set to a minimum value and then incoming bids are generated randomly, loaded from a trace file or generated by a defined model (e.g. depending on resource availability and temporal demand patterns). Further, SGCB offers a detailed model of the AWS pricing policies. Based on the instance type (on-demand, reserved, spot), life cycle and used resources (network traffic and storage), the Accounting module proactively updates the user bill until the instance is terminated.

The infrastructure description as well as the VM instance specifications are defined in XML as in SimGrid. Also, SGCB offers APIs similar to the ones proposed by AWS to engineer cloud scenarios (e.g. request a storage/VM instance, terminate/restart an instance, inquire about the current bill, etc.). Simulation results are recorded in terms of task makespan for each phase (read, write, execute, etc.), price for each service (network, computing and storage) and instance life cycle. Though this simulator was initially made to model AWS specifically, its network model can be modified to represent other cloud infrastructures. Similarly, the variable pricing model of AWS can be modified to implement other cloud providers' policies. More importantly, based on the variable pricing model and resource availability, advanced cloud brokerage algorithms can be developed.

Bazaar‑Extension

Bazaar-Extension [90] was proposed to model the cloud as a marketplace with dynamic negotiation strategies. The designers of this extension map the cloud pricing model to virtual value chains, as the information about the current market is used to adjust the values/prices of resources. As it's unlikely that a cloud provider can sell all its resources at a fixed price [91], flexible pricing must be considered to sell the remaining resources. To fully model this economical aspect of the cloud, a Negotiation Manager and a Negotiation Strategy are added to CloudSim to handle the exchange of offers/counter-offers between the client and the provider.

Offers and counteroffers are messages containing the VM characteristics and price. The provider's initial offer is called a template, which represents the starting price of a given VM. Subsequently, the client can start a negotiation by sending an offer to the provider with the VM characteristics and the value, i.e. the price that the consumer is willing to pay. The offer is then handled by the Negotiation Manager by either creating a new Negotiation instance or forwarding the offer to the corresponding Negotiation if it has already been created. Upon offer/counteroffer arrival, the implemented Negotiation Strategy makes a decision (accept, reject or make a counteroffer). The offer is evaluated with a utility function and the response depends on user defined thresholds which subsequently evolve with the market status (remaining unsold resources).

Additionally, the designers of Bazaar-Extension propose a Negotiation Strategy where counteroffers are generated with a Genetic Algorithm (GA). In this approach, the initial population is generated based on the client offer. Mutations (changing one characteristic of the requested VM) and crossovers (taking half the VM characteristics from each of the two parents) are applied on the population for a defined number of iterations before sending back only the individuals with the highest fitness values to the corresponding negotiating instance. Obviously, the higher the utility value of an offer, the higher the probability of the offer being accepted. However, based on the premise that the provider does not know how the consumer computes the utility and vice versa, each negotiating instance uses an estimation of the utility function of its corresponding partner. Hence, the fitness value of each individual is the cumulative value of its residual utility with the estimated utility value for the partner. The estimated utility value for the partner is multiplied by a weight coefficient to introduce a cooperativeness parameter: the higher the weight, the more the fitness value depends on the partner's utility. Finally, the market is evaluated based on the Bazaar Score, which reflects the sum of the clients' surplus and the provider's surplus. Additionally, the VM characteristics generated by offers/counteroffers are visualised with f(x)yz [92].
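The fitness evaluation described above can be summarised by the small sketch below: an individual's fitness combines its own residual utility with an estimate of the partner's utility, weighted by a cooperativeness coefficient. Names are placeholders, not Bazaar-Extension classes.

```java
// Illustrative fitness evaluation for the GA-based counteroffer generation
// described above (placeholder names, not Bazaar-Extension code).
final class NegotiationFitnessSketch {
    // ownUtility:              utility of the candidate offer for this side
    // estimatedPartnerUtility: guess of how the partner values the same offer
    // cooperativeness:         0 = purely selfish, 1 = weighs the partner fully
    static double fitness(double ownUtility, double estimatedPartnerUtility, double cooperativeness) {
        return ownUtility + cooperativeness * estimatedPartnerUtility;
    }

    public static void main(String[] args) {
        // A slightly less profitable offer can still win if it is much more
        // attractive to the partner and the strategy is cooperative.
        System.out.println(fitness(0.60, 0.30, 0.5));   // 0.75
        System.out.println(fitness(0.55, 0.70, 0.5));   // 0.90
    }
}
```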

Application Modelling

Modelling application execution and estimating completion time is a very complex task, due mainly to concurrency and multi-tenancy; moreover, communication between tasks is needed for distributed applications. Simulators' task representation is primarily based on the amount of computation required to complete the task as well as the amount of data that needs to be transferred upon completion. Though this is sufficient to represent a simple application, more sophisticated application models are required if, for instance, the task is data intensive. The subsequent simulators are centred around application modelling.

NetworkCloudSim

Based on the observation that CloudSim models an application as a number of instructions only, NetworkCloudSim [93] was proposed as an extension of CloudSim to model communication between tasks. Furthermore, CloudSim has a limited network model where each VM is supposed to be connected directly with all other VMs without any intermediate network components. Hence, this extension also presents a DC interconnection model.

The original Cloudlet class from CloudSim has been extended in a new class called NetworkCloudlet, which represents a task unit composing a complex/distributed cloud application (represented by the AppCloudlet class). During the execution of a NetworkCloudlet, if a packet needs to be sent, the packet is first queued by the running VM. Afterwards, the packet is transferred to the VM running the destination NetworkCloudlet either directly (if the latter is running within the same host) or forwarded through the switches until the destination host is reached. Moreover, NetworkCloudSim can run user-tailored routing protocols as well.

For network modelling, the designers opted to model traffic as flows for performance reasons. When a communication needs to be made, the flow duration is estimated as the flow size divided by the bandwidth. If multiple flows are transmitted simultaneously through the same link, then the bandwidth of the corresponding link is shared by the concurrent flows, either equally or through user-defined priority rules. Three main classes are added to model the DC topology and intra-DC communications: (a) Switch, (b) HostPacket, (c) NetworkPacket. First, the Switch object is added to model the topology of the DC. HostPackets are introduced to model communications between VMs running within the same host; this type of flow is therefore subjected to no delay. On the other hand, NetworkPacket represents communication between VMs running on separate hosts. This type of flow travels through Switch objects, which evidently generates delay.
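The flow-based estimate reduces to dividing the flow size by each flow's bandwidth share; the minimal sketch below assumes equal sharing and uses illustrative names rather than NetworkCloudSim's API.

```java
// Flow completion time when a link's bandwidth is split equally among concurrent flows.
class FlowTimeSketch {
    static double flowDuration(double flowSizeMB, double linkBandwidthMBps, int concurrentFlows) {
        double share = linkBandwidthMBps / Math.max(1, concurrentFlows);
        return flowSizeMB / share;
    }

    public static void main(String[] args) {
        // A 100 MB flow on a 1000 MB/s link shared by 4 flows -> 0.4 s
        System.out.println(flowDuration(100, 1000, 4));
    }
}
```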

To validate this simulation toolkit, the designers compared the execution time of a distributed application on a small infrastructure (8 VMs and 4 hosts) with the output of the corresponding simulation scenario. The tested application is composed of multiple tasks that periodically generate random numbers and send them to the rest of the tasks. First, the number of communicating tasks and sent messages are varied in two series of simulations. The results show that the execution times in the simulation match the empirical values. In a secondary case study, two task scheduling strategies are tested: (a) random non-overlap, where the scheduled NetworkCloudlet is executed until it is finished; (b) random overlap, where if the scheduled NetworkCloudlet is waiting for a packet, the next NetworkCloudlet is executed. Results illustrate that random overlap performs significantly better, as the response time is lowered by shifting from a task on I/O standby to a task that can continue its execution.

SmartSim

SmartSim [94] introduces a Smart Mobile Device (SMD) model to simulate Mobile Cloud Computing (MCC) on top of CloudSim. The SMD model is composed of two processors: one dedicated to network communication tasks (Baseband Processor), while the other runs the operating system alongside the user applications. Each application is a compound of Operation objects, each of which is defined by its CPU and RAM requirements. Furthermore, a pattern attribute defines whether an Operation can only be executed locally or can be outsourced to the MCC. In a scenario where the requirements of the running applications exceed the SMD capacity, the application scheduler (DApplicationScheduling class) is able to delegate Operations to the cloud. Though the model proposed in SmartSim is very simple, it enables developers to test custom-made scheduling policies for SMD applications without hardware or programming-language knowledge specific to the SMD.
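The offloading decision can be sketched as follows; the class and field names are illustrative and the pattern attribute is simplified to a boolean flag, so this is not SmartSim's actual API.

```java
// Illustrative sketch: offloadable operations are delegated to the cloud once
// the SMD's CPU or RAM capacity would be exceeded.
import java.util.ArrayList;
import java.util.List;

class Operation {
    final int cpuMips, ramMB;
    final boolean offloadable;           // SmartSim's "pattern" attribute, reduced to a flag
    Operation(int cpuMips, int ramMB, boolean offloadable) {
        this.cpuMips = cpuMips; this.ramMB = ramMB; this.offloadable = offloadable;
    }
}

class SmdSchedulerSketch {
    static List<Operation> selectForCloud(List<Operation> ops, int cpuCapacity, int ramCapacity) {
        List<Operation> toCloud = new ArrayList<>();
        int cpuUsed = 0, ramUsed = 0;
        for (Operation op : ops) {
            boolean fitsLocally = cpuUsed + op.cpuMips <= cpuCapacity
                               && ramUsed + op.ramMB <= ramCapacity;
            if (fitsLocally || !op.offloadable) {    // local-only operations always stay on the SMD
                cpuUsed += op.cpuMips;
                ramUsed += op.ramMB;
            } else {
                toCloud.add(op);                     // delegate to the MCC
            }
        }
        return toCloud;
    }
}
```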

WorkflowSim

WorkflowSim [95] is a cloud simulator designed to investigate workflow scheduling and the effect of job clustering on workflow duration. This extension of CloudSim includes layered overhead (queuing/clustering delay) for horizontal job clustering to estimate the entire workflow execution time more accurately. From a fault tolerance perspective, this toolkit contains a job/task failure generator and a failure monitor. The failure generator is responsible for introducing task/job failures depending on a probability distribution defined by the user, while the monitor feeds the information about the failure event to the workflow scheduler. Then, through the workflow scheduler, it is possible to define an adaptive strategy that lowers the effect of sub-task failures on the workflow duration.
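To make the failure-injection idea concrete, here is a minimal, hypothetical sketch of a probability-driven failure generator; WorkflowSim's actual classes and supported distributions differ.

```java
// Bernoulli-style failure generator: each completed task fails with a user-defined probability.
import java.util.Random;

class FailureGeneratorSketch {
    private final Random rnd = new Random();
    private final double failureProbability;   // user-defined per-task failure rate

    FailureGeneratorSketch(double failureProbability) {
        this.failureProbability = failureProbability;
    }

    // Returns true when the completed task should be flagged as failed, so that a
    // monitor can notify the workflow scheduler and trigger a retry or re-clustering.
    boolean taskFails() {
        return rnd.nextDouble() < failureProbability;
    }
}
```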

To validate their work, the designers of WorkflowSim compare the runtime of the Montage workflow in a real distributed environment with the values of three series of workflow simulations: (a) without task/job dependencies, (b) with task/job dependencies but without layered overhead, (c) with all delays accounted for (overhead and dependencies). Initially, clustering delays are extracted from real-world experiment values. The results show significant inaccuracies in workflow durations when the overhead is not accounted for. Alternatively, when all features of WorkflowSim are activated, the execution times are close to the empirical values. Further tests show that the clustering overhead depends on k, the maximum number of jobs at a horizontal level, which means that the clustering delay extracted from one experiment cannot be reused for a simulation with a different k value. Moreover, the effect of task re-clustering on the execution time is illustrated. Results show that decreasing the number of tasks in a job decreases the effect of task failures on the duration of the workflow.

EMUSIM

EMUSIM [96] is an application performance prediction toolkit based on emulation and simulation. First, the application response times, under various concurrency levels (simultaneous requests) and numbers of sub-tasks dispatched on the available VMs (parallelism), are gathered through emulation. To do so, the user application is launched in a cluster using the Automated Emulation Framework [97] (AEF). Based on the specification of the user, a number of VMs with a predefined configuration are deployed in the cluster. Latencies are then adapted as well, depending on the configuration input file. Subsequently, the workload is directed to the QoS Aware Application Deployer [98] (QAppDeployer) module, which is responsible for deploying the tasks composing the emulated application on the running VMs. Depending on the emulation configuration, a VM can only serve a limited number of requests simultaneously. Accordingly, QAppDeployer deploys tasks only on VMs handling fewer tasks than the maximum threshold. Otherwise, if one of the tasks composing the application cannot be deployed, the request is rejected and this event is recorded. Ultimately, once the results of all the sub-tasks of the same request are delivered to the front end, the response time is recorded in addition to the level of concurrency and the number of deployed VMs. This process is repeated while increasing the number of available VMs (until the maximum number of deployed VMs is reached) and with an incrementally increased concurrency.

Using CloudSim and the application model generated by emulation, EMUSIM launches simulation scenarios identical to the emulation configurations for model validation. This step showcases the accuracy of the application model extracted from the emulation step. Next, the simulation phase is launched with workloads and resources different from the configurations tested during the emulation phase. During this phase, the execution time of a submitted application is estimated by extrapolating over the concurrency level and parallelism, multiplied by the allocated CPU capacity.
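The survey does not reproduce EMUSIM's exact extrapolation formula, so the sketch below shows only one plausible reading: emulated response times indexed by (concurrency, number of VMs) are reused and scaled by the ratio of emulated to simulated CPU capacity. All names are hypothetical and this is not EMUSIM's code.

```java
// One possible profile-based estimator built from emulation measurements.
import java.util.HashMap;
import java.util.Map;

class EmulationProfileSketch {
    // (concurrency, vmCount) -> observed response time in seconds during emulation
    private final Map<String, Double> profile = new HashMap<>();
    private final double emulatedMips;      // CPU capacity used during emulation

    EmulationProfileSketch(double emulatedMips) { this.emulatedMips = emulatedMips; }

    void record(int concurrency, int vmCount, double responseTimeSec) {
        profile.put(concurrency + ":" + vmCount, responseTimeSec);
    }

    // Scale the recorded measurement by the ratio of CPU capacities; the real tool
    // extrapolates between measured points rather than requiring an exact match.
    double estimate(int concurrency, int vmCount, double simulatedMips) {
        Double base = profile.get(concurrency + ":" + vmCount);
        if (base == null) throw new IllegalArgumentException("no emulated point recorded");
        return base * (emulatedMips / simulatedMips);
    }
}
```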

Simulation results showed significant accuracy for the application model provided by emulation. In a real-world experiment, the difference compared with the response time of the EMUSIM simulation was between 3% and 7%. Alternatively, in a real cloud environment (AWS), the application model displayed slightly less accuracy, especially when the concurrency level increases. The designers of EMUSIM argue that this is due mainly to the black-box nature of the provider infrastructure (hardware configuration, intra-network latency and multi-tenancy). Nonetheless, the model is sufficiently accurate from this perspective as well.

EMUSIM offers the SaaS provider significant insight into how the application will cope under various workloads and how many resources should be leased. As a result, the cloud provider and customer can agree on how the resources attributed to the application will adjust dynamically. The authors of EMUSIM also predict that future SLA advances might enforce hardware disclosure, which would further empower accurate emulation/simulation. However, the application model in EMUSIM is limited to a Bag of Tasks (BoT) model. Besides, the simulation results only showed the accuracy of the emulation phase when requests are submitted in bursts/simultaneously. Further investigation must be done to evaluate the resulting application model under random workloads.

CloudExp

CloudExp [99] is yet another CloudSim extension; it introduces a GUI to define the cloud scenario and network topology. The two main features of this extension are the Rain workload generator [100] and the implementation of a MapReduce application model. Usually, workload generation is coupled with request execution, which makes generating realistic workload patterns unfeasible, because the workload generation thread must wait for the submitted request to be served before submitting the subsequent ones. Instead, the designers of the Rain workload generator separate the submission of requests from their execution, using multiple threads for each user submitting requests and separate threads to handle the collection of the request execution results. Thus, any workload pattern can be generated through this tool.
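The decoupling idea attributed to Rain can be sketched as follows: the submission thread emits requests on its own schedule and never blocks on their completion. This is illustrative code, not Rain's implementation.

```java
// Submission and execution are decoupled: requests arrive every 50 ms regardless
// of how long earlier requests take to be served, so any arrival pattern is possible.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class DecoupledWorkloadSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executionPool = Executors.newFixedThreadPool(8); // serves requests

        Thread submitter = new Thread(() -> {
            for (int i = 0; i < 20; i++) {
                final int id = i;
                executionPool.submit(() -> {
                    try { Thread.sleep(200); } catch (InterruptedException ignored) { }
                    System.out.println("request " + id + " served");
                });
                try { Thread.sleep(50); } catch (InterruptedException ignored) { return; }
            }
        });

        submitter.start();
        submitter.join();
        executionPool.shutdown();
        executionPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```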


For SLA definition, CloudExp includes a visual interface to determine the agreed demands (service availability, network performance, maximum number of simultaneous users, etc.). Furthermore, a MapReduce application model is included, the Map and Reduce tasks being defined through inheritance from the Cloudlet class. The Master class encapsulates all scheduled Map and Reduce tasks while memorizing their status and the placement of their tasks/output files. The workload scenario defined by the user must contain the input/output file sizes and the task length for each Map/Reduce. Once all Map tasks are finished, the Master object feeds the output files to the Reduce tasks. Afterwards, the result is assembled once all Reduce tasks are finished. Besides, from a network topology point of view, CloudExp also supports popular cloud network topologies (e.g. BCube [101] and DCell [102]).
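A minimal sketch of the coordination role attributed to the Master class is given below: reduce tasks start only after every map task has produced its output. The Task interface and method names are illustrative, not CloudExp's API.

```java
// Two-phase coordination: run all mappers, then feed their combined output to the reducers.
import java.util.ArrayList;
import java.util.List;

class MapReduceMasterSketch {
    interface Task { List<String> run(List<String> input); }

    static List<String> execute(List<Task> mappers, List<Task> reducers, List<String> inputSplits) {
        List<String> mapOutputs = new ArrayList<>();
        for (int i = 0; i < mappers.size(); i++) {
            List<String> split = List.of(inputSplits.get(i % inputSplits.size()));
            mapOutputs.addAll(mappers.get(i).run(split));      // map phase
        }
        List<String> result = new ArrayList<>();
        for (Task reducer : reducers) {
            result.addAll(reducer.run(mapOutputs));            // reduce phase starts after all maps
        }
        return result;                                          // assembled final result
    }
}
```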

GloudSim

Written in Java, GloudSim [103] is built to simulate task behaviour based on the Google trace and to enable researchers to study cloud application check-pointing for fault tolerance support. Two simulation profiles are available in GloudSim: (a) numerical, (b) real system setting. The numerical mode is executable on a single computer to simulate application behaviour based on the Google trace. On the other hand, the real system setting enables the user to run the simulation in a cluster, where each task is represented by a thread executed on a slave host.

In real system setting mode, each slave host serves in a Network File System (NFS). Thus, when a checkpoint operation is executed, a slave host is selected randomly to save the checkpoint, which limits I/O congestion. On the server side, the job emulator creates the corresponding tasks to be executed based on the trace events (kill, fail, etc.), which are sent to the slave hosts. On the slave host end, multiple VMs are deployed to emulate the received tasks, where each VM is equipped with: (a) a task processor, (b) a memory sensor, (c) a research algorithm (task check-point policy). The task processor is in charge of executing/restarting the tasks based on the event specifications of the server, as well as performing periodic check-pointing. To avoid over-allocation of memory, which leads to VM crashes, the remaining VM memory value is synchronised between the VM sensor and the value collected by the server by taking the minimum of the two.

Unlike other simulators, GloudSim puts much more detail into simulating task check-pointing, since the Google trace shows that task failures are very frequent. At the server level, a Task Failure Monitor is responsible for checking jobs that have failed and triggering a restart procedure when a failure transpires. Hence, the slave host is notified to restart the task from its last checkpoint. Furthermore, the Resource Monitor at the server level is responsible for monitoring the host/VM aliveness state and their memory. In case a VM/host fails, the tasks running on the corresponding instance are migrated to another VM/host.

Since check-pointing is the process of saving the progress (memory values) of a job, this process must be optimised to limit its overhead. Through analysis of the Google trace, the designers of GloudSim illustrated that jobs with low priority tend to fail much more frequently due to resource contention with higher-priority jobs. Based on this observation, the authors proposed to compute a check-point interval depending on the Mean Time Between Failures, the task length and the priority. Additionally, the authors explain that memory over-allocation is the main cause of VM failure; thus, the designers focused on memory allocation instead of CPU usage. For memory allocation, the authors exhibit that the memory size of a Java program can be controlled by the size of the file it loads, and this approach is used to simulate memory allocation. Further, a task is simulated with a while-loop thread, while accounting for the 3% increase in task length due to the additional sleep instructions. Likewise, the checkpoint overhead is emulated through a while-loop thread.
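The survey does not reproduce GloudSim's exact interval formula; the sketch below uses a generic first-order rule (Young's approximation) purely to illustrate how a shorter Mean Time Between Failures, as observed for low-priority tasks, leads to more frequent check-pointing. It is not claimed to be GloudSim's computation.

```java
// Young's first-order checkpoint interval: sqrt(2 * checkpointOverhead * MTBF).
class CheckpointIntervalSketch {
    static double youngInterval(double overheadSec, double mtbfSec) {
        return Math.sqrt(2.0 * overheadSec * mtbfSec);
    }

    public static void main(String[] args) {
        // Lower-priority tasks fail more often (smaller MTBF), so they checkpoint more frequently.
        System.out.println(youngInterval(5.0, 3600.0));  // ~189.7 s between checkpoints
        System.out.println(youngInterval(5.0, 600.0));   // ~77.5 s between checkpoints
    }
}
```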

CEPSim

Due to the limited application model presented in most CloudSim extensions, CEPSim [104] was proposed to model Complex Event Processing (CEP) queries for data streams and Big Data. Unlike the traditional fixed-size Cloudlet, the CEP queries are implemented to run continuously and process incoming events. To represent any sort of CEP query, a Directed Acyclic Graph (DAG) is used. A user query is composed of a network of vertices, where each vertex has input and output event set queues. Each vertex contains an ipe (Instructions Per Event) value, which represents how many instructions need to be executed to process a single event. Moreover, a link between two vertices has a selectivity parameter that determines how many events are sent to the following vertex. Also, the user defines a maximum queue capacity, and a backpressure procedure can be implemented to adjust the selectivity of links towards overloaded input queues.

Communication between vertices is made by exporting an event set from the output queue of one vertex to the input queue of the successor vertex. Vertices containing incoming and outgoing links, called Operators, are of two types: (a) stateless operators, (b) windowed operators. Stateless operators process events ordinarily, whereas windowed operators combine/aggregate a list of events accumulated during a given time frame. Vertices from one or multiple queries are deployed in a VM as a Placement instance, which enables the repartition of a single query over multiple VMs. The way the vertices of a query are deployed in VMs can either be specified manually by the user or determined by implementing a mapping procedure.

Event generation is done at the entry of the DAG topology, either uniformly (a constant number of events is generated after each time frame) or uniformly increasing (the number of generated events increases after each time frame until it reaches a limit, then becomes constant). During Operator scheduling and simulation, the number of instructions in the query is divided amongst the vertices. The allocation of instructions to vertices is done either uniformly or with a weighted approach based on the ipe of each vertex. In each simulation tick and for each vertex, the number of events processed is the minimum between the number of allocated instructions divided by the ipe, and the total number of input events. Subsequently, the events are de-queued in a balanced way from the predecessors' output queues: the number of events de-queued from each predecessor queue is proportional to its queue size. The latency of the de-queued event set is updated based on the current time, and the resulting events are en-queued in each successor's input queue based on the link selectivity. In particular, a windowed operator accumulates the event sets of its predecessors in separate slots. Once the event accumulation window closes, the events gathered from each predecessor are processed with a given combination function and en-queued in the successors' input queues. The chart in Fig. 5 summarises how CEPSim updates the progression of a distributed application.
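The per-tick accounting described above reduces to a one-line rule, sketched here with illustrative names rather than CEPSim's API.

```java
// Events processed per tick = min(allocated instructions / ipe, queued input events).
class VertexTickSketch {
    static long eventsProcessed(double allocatedInstructions, double ipe, long inputEvents) {
        long byCpu = (long) Math.floor(allocatedInstructions / ipe);
        return Math.min(byCpu, inputEvents);
    }

    public static void main(String[] args) {
        // 1e6 instructions allocated, 100 instructions per event:
        System.out.println(eventsProcessed(1_000_000, 100, 5_000));   // 5000 (queue-bound)
        System.out.println(eventsProcessed(1_000_000, 100, 50_000));  // 10000 (CPU-bound)
    }
}
```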

As a Placement containing multiple vertices from potentially different queries is considered the processing unit (the same role as a Cloudlet), the time slot allocated to it (the simulation tick length) is multiplied by the VM processing capacity to infer the total number of instructions distributed amongst the vertices. Accordingly, during the CepQueryCloudlet execution, the PlacementExecutor class is invoked to execute the Operators in parallel depending on their share of instructions. Moreover, event sets transferred to a different host are subjected to network delay.

To validate CEPSim, a comparison is made with real-world experiments processing the Powersmiths WOW [105] feed using the Apache Storm CEP engine for two types of queries. During the experiments, the number of injected events is increased gradually to compare latency and throughput (the quantity of events per second reaching the back-end vertices of the query, known as consumers) between the real experiments and the simulation results. Generally, the simulation results are very close to the empirical values. However, under a high rate of input events (22,500 events/s), the latency in the CEPSim simulation is much lower than in the real experiment. This result is due to the difference between the management strategies for overloaded input queues.

Fig. 5 Vertices execution in a placement


In a separate series of simulations, the latencies of four deployed queries, subjected to various placement scenarios, are compared between real-world experiments and CEPSim simulations. Again, the simulation results are close to the empirical values. From a performance perspective, simulation time and memory overhead are measured under various numbers of queries/VMs. With 10,000 queries running simultaneously, CEPSim can execute the simulation scenario in 7 min using less than 1 GB of RAM. Finally, the effect of the simulation tick length on simulation accuracy is illustrated. As the simulation tick length increases, fewer Placement executions are invoked, which decreases the simulation time. However, the event sets transferred between vertices are accumulated over longer periods, which translates into a less accurate modelling of the interactions between vertices. This accuracy degradation is especially pronounced when two vertices situated on two different hosts communicate, as the destination vertex may already have been executed by the time the new event set is en-queued in its input queue. For the same reasons, when the simulation tick length decreases, both the simulation accuracy and the simulation time increase.

BigDataSDNSim

BigDataSDNSim [106] combines CloudSimSDN and IoTSim [107] to implement a MapReduce application model running on top of an SDN-enabled network. This simulator is organised in a layered architecture where new modules extending existing CloudSim classes are stacked. Users interested in the basic features of BigDataSDNSim must only define the DC architecture and resources (network and host layout/capabilities) in JSON format, while the MapReduce application specifications (number and sizes of mappers/reducers as well as the size of the transmitted data) are defined in CSV format. The Storage Area Network (SAN) is simulated as a host transferring data to the mappers and receiving data from the reducers upon completion. At the network layer, a central controller class is implemented with a default Dijkstra routing algorithm.

Similarly to previous extensions of CloudSim, the newly added classes implement SimEntity to fire events. To manage application scheduling and resource scaling, a new ApplicationMaster class is added. The original DatacenterBroker from CloudSim is extended in the RessourceManager class, which analyses the remaining resources before deploying the user application. A NodeManager class is coupled with each host to inform the RessourceManager about the slave node resources/status. Likewise, the BigDataTask class extends the Cloudlet class to differentiate between mappers and reducers, whilst adding a task identifier to enable communication between tasks.

At the network level, the SDNController class extends the NetworkOperatingSystem from CloudSimSDN. This class implements the behaviour of a central controller in an SDN-enabled network. In addition to the features already defined in CloudSimSDN, this implementation enables traffic shaping and transfer prioritisation through the TrafficPolicy class. Furthermore, the NetworkNIC interface is implemented by network components to make them manageable by the controller, as in an SDN southbound API. Like CloudSimSDN, the data transfer duration is estimated as the quantity of data divided by the minimum bandwidth amongst the used Link objects. As a result, the total completion time of an application is estimated as the sum of: (a) the maximum data transfer time from the SAN to the mappers, (b) the maximum time taken to complete a mapper task, (c) the maximum time taken by a reducer task. To complete this paper, the authors conducted a series of simulations to illustrate the benefit of applying SDN techniques on: (a) job completion time, (b) transfer duration, (c) power consumption.
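The completion-time estimate described above can be sketched as follows; the names are illustrative, not the simulator's API.

```java
// Estimated job time = slowest SAN-to-mapper transfer + slowest mapper + slowest reducer.
import java.util.Arrays;

class MapReduceCompletionSketch {
    static double completionTime(double[] transferTimes, double[] mapperTimes, double[] reducerTimes) {
        return max(transferTimes) + max(mapperTimes) + max(reducerTimes);
    }

    private static double max(double[] xs) {
        return Arrays.stream(xs).max().orElse(0.0);
    }

    public static void main(String[] args) {
        System.out.println(completionTime(
            new double[]{2.0, 3.5, 2.5},   // SAN -> mapper transfer times (s)
            new double[]{10.0, 12.0},      // mapper execution times (s)
            new double[]{6.0, 7.5}));      // reducer execution times (s) -> 23.0
    }
}
```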

IoTSim‑Stream

IoTSim-Stream [108] is an extension of CloudSim designed to model big data applications processing continuous streams across a multi-cloud environment. First, the Cloudlet class is extended in a ServiceCloudlet, which represents a part of a service that can be launched in parallel. The application is modelled by its computational requirements and a DAG composed of instances of the Service class. Each Service has pointers towards one or multiple parent/child Services, where child Services receive data for further processing and parent Services produce the data to be processed. A Service itself is composed of one or multiple ServiceCloudlets that can be deployed in separate VMs. The authors of this simulator add the GraphAppEngine module to deploy a distributed application, parsed from an XML file, in VMs potentially running in different DCs.

Moreover, an abstract VMOffers class is added to represent the different types of VMs available from each provider. The original VM class is inherited in the SVM (Stream VM) class, which adds an input and an output queue to the latter. Furthermore, the data queued and processed by a ServiceCloudlet is represented by a Stream class. As in CloudSimSDN, the Channel class represents the transmission medium between SVMs, whether they belong to the same DC or are hosted by different providers. StreamTransmission, on the other hand, represents a transmission between two SVMs. In addition to an abstract Policy class that enables the user to define how a ServiceCloudlet is scheduled on SVMs, the SimpleSchedulingPolicy is implemented as a default scheduling strategy. In a simulation scenario with two DCs, the developers compare the amount of data processed by a simple three-node graph application against a theoretical model. The produced results are nearly identical to the expected values. Furthermore, the scalability of IoTSim-Stream is illustrated in a series of tests investigating the influence of the simulated time and the size of the data streams on the execution time. The results show that the execution time scales linearly with the simulated time, whilst increasing the size of the processed data streams increases the execution time only slightly.


Table 2 Summary of the discussed cloud simulators

Simulator | Features | VM/PM model | Network model | Power consumption | Application model | Resource scheduler model
--- | --- | --- | --- | --- | --- | ---
CloudSim | Federated cloud/intra-cloud modelling | Yes | Yes | Abstract | Yes | CPU and Network
iCanCloud | Detailed application model; flexible hardware resources model | Only VM | Yes | No | Detailed | Full
Cloud2Sim | Distributed execution of CloudSim scenarios; auto-scalability of the cluster running the simulation | Yes | Yes | Abstract | Yes | CPU and Network
GroudSim | General Grid/cloud modelling; job failure simulation | Yes | Limited | No | Limited | CPU
SimIC | Federated cloud modelling | Yes | Limited | Limited | No | CPU
DesktopCloudSim | Simulation of community cloud volatility and random crashes | Yes | Yes | Abstract | Yes | CPU and Network
iFogSim | Fog network model; network topology editor | Yes | Yes | Abstract | Detailed | CPU and Network
CloudAnalyst | Temporal/geographical distribution of users' requests/DCs; Internet modelling | Yes | Internet model | Abstract | Limited | CPU
CloudSimScale | Distributed execution of CloudSim scenarios | Yes | Yes | Abstract | Yes | CPU and Network
DFaaSCloud | Model of FaaS cloud; cost and performance evaluation for function executions; network topology editor | Yes | Limited | Abstract | Limited | CPU and Network
SPECI | Host status supervision through a communication protocol | Only PM | Limited | No | No | None
CRest | DC supervision for multiple performance aspects; MVC architecture | Yes | Limited | Yes | Yes | Resources reservation
CloudSimSDN | SDN management of the DC components | Yes | Yes | Abstract | Yes | CPU and Network
MDCSim | Power consumption modelling and performance estimation | Only PM | Yes | Yes | Limited | CPU and Network
GreenCloud | Accurate network model; accurate energy consumption for intra-cloud communication | Only PM | Packet-level | Detailed | Limited | CPU and Network
GDCSim | Heat recirculation modelling; interplay between power consumption and heat generation; server throttling-down events | Only PM | No | Detailed | Limited | CPU
E-mc2 | Detailed/customizable power consumption model for all hardware components | Yes | Yes | Detailed | Detailed | Full
DCworms | Provides multiple power consumption models; each component can be attached to a pluggable power consumption profile | Only PM | No | Detailed | Limited | CPU
CloudNetSim++ | Accurate network model; accurate energy consumption for intra-cloud communication | Only PM | Packet-level | Detailed | Limited | CPU and Network
CloudSimDisk | Detailed HDD model; HDD energy consumption model | Yes | Yes | Concrete HDD power model | Detailed | CPU, Network and HDD
DISSECT-CF | Highly detailed resource scheduling | Yes | Yes | Yes | Detailed | CPU, Network and RAM
CloudReports | Easy extensibility through a pluggable model | Yes | Yes | Yes | Yes | CPU and Network
DCSim | VMPP performance analysis | Yes | Limited | Limited | Limited | CPU
CloudShed | Multi-criteria VM placement | Yes | No | Yes | No | Resources reservation only
SimGrid VM | Live VM migration model | Yes | Yes | No | Yes | CPU, Network and RAM
VMPlaces | Dynamic VM provisioning | Yes | Yes | No | Yes | CPU, Network and RAM
DynamicCloudSim | VM performance changes model in public cloud | Yes | Yes | Abstract | Yes | CPU, Network and HDD
ATAC4Cloud | Introduction of an intelligent auto-management layer | Yes | Yes | Abstract | Yes | CPU and Network
Nutshell | Packet-level cloud simulator; easy simulation setup | Yes | Detailed | No | Yes | CPU and Network
CMCloudSim | VM cost estimation | Yes | Yes | Abstract | Yes | CPU and Network
GPUCloudSim | GPU VM scheduling model; defines overhead caused by resource sharing at GPU level | Yes | Detailed | Yes | Detailed | GPU, CPU and Network
Bazaar-Extension | Negotiation heuristic implementation | Yes | Yes | Abstract | Yes | CPU and Network
SimGrid Cloud Broker | Customizable pricing model | Yes | Yes | No | Yes | Full
GloudSim | Task emulation; distributed simulation; application check-pointing simulation | No | No | No | Emulation | CPU and RAM
EMUSIM | Accurate application performance estimation under different levels of concurrency/parallelism | Yes | Yes | Abstract | Emulation | CPU and Network
NetworkCloudSim | Extended network model for CloudSim; task communication | Yes | Yes | Abstract | Yes | CPU and Network
SmartSim | Smart device/mobile application model | Yes | Limited | Abstract | Yes | CPU, Network and RAM
CloudExp | MapReduce application model; SLA/customizable workload pattern definition; advanced network topology models | Yes | Yes | Abstract | Yes | CPU and Network
WorkflowSim | Workflow application model; clustering overhead inclusion for better performance estimation | Yes | Yes | Abstract | Yes | CPU and Network
CEPSim | Big data/stream processing model | Yes | Yes | Abstract | Detailed | CPU and Network
BigDataSDNSim | Application model for MapReduce; includes SDN network model | Yes | Yes | Abstract | Yes | CPU and Network
IoTSim-Stream | Application model for stream processing | Yes | Yes | Abstract | Yes | CPU and Network


Comparison and Discussion

None of the aforementioned simulators offers a complete model of the cloud, as it is too complex to implement and would probably defeat a primary purpose of simulation: time gain. However, each simulator presents an advantage in a given orientation, and some simulation tools are relatively more complete than others. For general CC modelling, CloudSim offers a flexible platform to model the cloud. Moreover, due to the numerous extensions proposed for this toolkit, it is theoretically possible to combine elements from various implementations depending on the requirements, as was done in BigDataSDNSim. For instance, all NetworkCloudSim classes (NetworkCloudlet, Switch, Host/Network Packet, etc.) have been included in subsequent releases of CloudSim. On the other hand, using implementations from different extensions is not straightforward and requires additional programming effort. Conversely, iCanCloud offers a rich cloud model (except for physical server modelling) but is not widely extended.

From a power consumption model perspective, as well as middleware provisioning, the most complete toolkit is GDCSim. Though this simulator offers fewer details from a general cloud model perspective, it is founded on two very important concepts which are neglected by its counterparts: (a) air conditioning power consumption, (b) dynamic thermal effects. As stated in most of the studies oriented towards limiting the operational cost of DCs [109], the energy consumed by the air conditioning system constitutes a significant portion of the provider's power bill. Some simulators consider the effect of heat generation, but the latter is predominantly modelled as a static value. Alternatively, GDCSim models all elements affecting the temperature (heat recirculation and the power consumed by the air conditioning system) while considering throttling-down events. Other than GDCSim, E-mc2 provides the most detailed power consumption model for a server, with all components accounted for (CPU, PSU, RAM and HDD). Similarly, DCworms provides a flexible framework where each resource can be associated with a power consumption model.

Relatively, the most accurate application model is the one proposed in EMUSIM. The effects of different levels of concurrency and parallelism are combined to predict how the application will respond, rather than trying to model resource sharing at high accuracy. Moreover, due to the black-box nature of the cloud infrastructure and multi-tenancy, the approach proposed by EMUSIM is simple and provides enough accuracy. From a VM provisioning perspective, SimGrid VM and VMPlaces are the most accurate, as they model live VM migration, which is a very important process when maintaining the SLA is at stake.

Performance-wise, most of the cloud simulators are implemented in Java, which hinders their scalability and performance. Though implementing the simulation framework in Java has several advantages (portability, ease of extension, etc.), an implementation in C++ improves the simulator's scalability. Therefore, though iCanCloud offers a more detailed simulation framework than CloudSim, iCanCloud outperforms the latter when the cloud topology expands. On the other hand, executing the simulation in a cluster, as in Cloud2Sim and CloudSimScale, is the best way to run large scenarios rapidly. Conversely, CloudSim has the upper hand from a flexibility perspective. For instance, it is possible to implement a new PowerModel and set it up in a PowerHost, without modifying the core CloudSim code, through polymorphism. Table 2 summarises the features presented by the cloud simulators discussed in this paper.
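As a minimal sketch of this flexibility, assuming CloudSim 3.x, where the PowerModel interface exposes getPower(double utilization) and PowerHost accepts any PowerModel implementation at construction time, a custom linear model can be defined entirely outside the core:

```java
// A custom linear power model: idle power plus a utilization-proportional share,
// written without touching CloudSim's core classes.
import org.cloudbus.cloudsim.power.models.PowerModel;

public class LinearPowerModelSketch implements PowerModel {
    private final double idleWatts;
    private final double maxWatts;

    public LinearPowerModelSketch(double idleWatts, double maxWatts) {
        this.idleWatts = idleWatts;
        this.maxWatts = maxWatts;
    }

    @Override
    public double getPower(double utilization) throws IllegalArgumentException {
        if (utilization < 0 || utilization > 1) {
            throw new IllegalArgumentException("utilization must be in [0, 1]");
        }
        return idleWatts + (maxWatts - idleWatts) * utilization;
    }
}
```

An instance of this class is then passed to the PowerHost constructor in place of any built-in model, which is the polymorphism point made above.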

Conclusion

This work presents how the existing cloud simulators model different processes and features. As stated, each was designed for a specific objective and none presents an exhaustive model. Currently, the most used simulator is CloudSim because of its flexibility. On the other hand, for developers specifically interested in workload consolidation and power consumption, more appropriate solutions are currently available (SimGrid VM and DCworms/GDCSim). Likewise, for large scenarios that cannot be launched on a single host, running the simulation in a distributed environment is a valuable option (Cloud2Sim and CloudSimScale).

Author Contributions The author conducted the entire survey solely. The proposed work encompasses collecting and reviewing papers as well as investigating simulator architectures.

Funding The author received no funding to accomplish this work.

Availability of Data and Material Most of the simulators discussed in this paper are open source. Some simulators are proprietary such as MDCSim.


Compliance with Ethical Standards

Conflicts of interest/Competing interests The authors declare that they have no conflict of interest.

References

1. Packet Tracer Website. www.cisco.com/web/learning/. Accessed 10 July 2017.

2. Network Simulator 2 Website. http://www.isi.edu/nsnam/ns. Accessed 20 Sep 2019.

3. Network Simulator 3 Website. https://www.nsnam.org/. Accessed 18 Sep 2019.

4. OMNeT++ Website. https://omnetpp.org/. Accessed 20 Mar 2019.

5. Levis P, Lee N, Welsh M, Culler D. TOSSIM: Accurate and scal-able simulation of entire TinyOS applications. In: Proceedings of the 1st international conference on Embedded networked sensor systems, 2003, https ://doi.org/10.1145/95849 1.95850 6.

6. Montresor A, Jelasity M. PeerSim: A scalable P2P simulator. In: IEEE Ninth International Conference on Peer-to-Peer Computing (P2P’09), 2009, https ://doi.org/10.1109/P2P.2009.52845 06.

7. Buyya R, Murshed M. Gridsim: a toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing. Concurr Comput Pract Exp. 2002. https ://doi.org/10.1002/cpe.710.

8. Legrand A, Marchal L, Casanova H. Scheduling distributed applications: the simgrid simulation framework. In: Proceedings of 3rd IEEE/ACM International Symposium on Cluster Comput-ing and the Grid (CCGrid 2003) IEEE., 2003, May, https ://doi.org/10.1109/CCGRI D.2003.11993 62.

9. Koomey J. Growth in data center electricity use 2005 to 2010. In: A report by Analytical Press, completed at the request of The New York Times, 9, 2011.

10. Liu J, Zhao F, Liu X, He W. Challenges towards elastic power management in internet data centers. In: Proceedings of 29th IEEE International Conference on Distributed Computing Sys-tems Workshops (ICDCS Workshops’ 09), 2009, June, https ://doi.org/10.1109/ICDCS W.2009.44.

11. Kaur G, Kaur K. An adaptive firefly algorithm for load balanc-ing in cloud computing. In: Proceedings of Sixth International Conference on soft computing for problem solving, Springer, Singapore, 2017, https ://doi.org/10.1007/978-981-10-3322-3\_7.

12. Sun DW, Chang GR, Gao S, Jin LZ, Wang XW. Modeling a dynamic data replication strategy to increase system availabil-ity in cloud computing environments. J Comput Sci Technol. 2012;27:256–72. https ://doi.org/10.1007/s1139 0-012-1221-4.

13. Khosravi A, Garg SK, Buyya R. Energy and carbon-efficient placement of virtual machines in distributed cloud data cent-ers. In: European Conference on parallel processing. Springer, Berlin, Heidelberg, 2013, August, https ://doi.org/10.1007/978-3-642-40047 -6\_33.

14. Zhao W, Peng Y, Xie F, Dai Z. Modeling and simulation of cloud computing: a review. In: Cloud Computing Congress (APCloudCC), IEEE, Asia Pacific, 2012, November, https ://doi.org/10.1109/APClo udCC.2012.64865 05.

15. Bahwaireth K, Benkhelifa E, Jararweh Y, Tawalbeh MA. Experi-mental comparison of simulation tools for efficient cloud and mobile cloud computing applications. EURASIP J Inf Secur. 2016. https ://doi.org/10.1186/s1363 5-016-0039-y.

16. Byrne J, Svorobej S, Giannoutakis K, Tzovaras D, Byrne PJ, Ostberg PO, Gourinovitch A, Lynn T. A review of cloud computing simulation platforms and related environments. CLOSER. 2017. https://doi.org/10.5220/0006373006790691.

17. Fakhfakh F, Kacem HH, Kacem AH. Simulation tools for cloud computing: A survey and comparative study. In: 2017 IEEE/ACIS 16th International Conference on Computer and Infor-mation Science (ICIS), 2017, pp. 221–226. IEEE. https ://doi.org/10.1109/ICIS.2017.79599 97.

18. Makaratzis AT, Giannoutakis KM, Tzovaras D. Energy modeling in cloud simulation frameworks. Future Gener Comput Syst. 2018. https ://doi.org/10.1016/j.futur e.2017.06.016.

19. Mahmud R, Srirama SN, Ramamohanarao K, Buyya R. Qual-ity of Experience (QoE)-aware placement of applications in Fog computing environments. J Parallel Distrib Comput. 2019;132:190–203. https ://doi.org/10.1016/j.jpdc.2018.03.004.

20. Abdel-Basset M, Abdle-Fatah L, Sangaiah AK. An improved Lévy based whale optimization algorithm for bandwidth-efficient virtual machine placement in cloud computing environment. Clust Comput. 2019;22(4):8319–34. https ://doi.org/10.1007/s1058 6-018-1769-z.

21. Alresheedi SS, Lu S, Elaziz MA, Ewees AA. Improved multiob-jective salp swarm optimization for virtual machine placement in cloud computing. Human-centric Comput Inf Sci. 2019;9(1):15. https ://doi.org/10.1186/s1367 3-019-0174-9.

22. Zhang H, Shi J, Deng B, Jia G, Han G, Shu L. MCTE: minimizes task completion time and execution cost to optimize scheduling performance for smart grid cloud. IEEE Access. 2019;7:134793–803. https ://doi.org/10.1109/ACCES S.2019.29420 67.

23. Singh BP, Kumar SA, Gao XZ, Kohli M, Katiyar S. A study on energy consumption of DVFS and Simple VM consolida-tion policies in cloud computing data centers using CloudSim Toolkit. Wirel Pers Commun. 2020. https ://doi.org/10.1007/s1127 7-020-07070 -2.

24. Garg D, Kumar P. Evaluation and improvement of load bal-ancing using proposed cuckoo search in CloudSim. In: Inter-national Conference on advanced informatics for computing research, 2019, pp 343–358, Springer, Singapore, https ://doi.org/10.1007/978-981-15-0108-1_32.

25. Annamalai S, Udendhran R (2019) Role of Cloud Computing in On-Line Social Networking and In-Depth Analysis of Cloud-Sim Simulator. In: Novel Practices and Trends in Grid and Cloud Computing, p. 34–46. IGI Global. https ://doi.org/10.4018/978-1-5225-9023-1.ch003 .

26. Xue F, Su Q. Intelligent task scheduling strategy for cloud robot based on parallel reinforcement learning. Int J Wirel Mobile Comput. 2019;17(3):293–9. https ://doi.org/10.1504/IJWMC .2019.10225 7.

27. Naranjo PGV, Pooranian Z, Shojafar M, Conti M, Buyya R. FOCAN: a Fog-supported smart city network architecture for management of applications in the Internet of Everything envi-ronments. J Parallel Distrib Comput. 2019;132:274–83. https ://doi.org/10.1016/j.jpdc.2018.07.003.

28. Calheiros RN, Ranjan R, Beloglazov A, De Rose CA, Buyya R. CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provision-ing algorithms. Softw Pract Exp. 2011. https ://doi.org/10.1002/spe.995.

29. Alves DC, Batista BG, Leite Filho DM, Peixoto ML, Reiff-Marganiec S, Kuehne BT. CM cloud simulator: a cost model simulator module for Cloudsim. In: 2016 IEEE World Congress on Services (SERVICES), 2016, June, https ://doi.org/10.1109/SERVI CES.2016.20.

30. Ostermann S, Plankensteiner K, Prodan R, Fahringer T. Groud-Sim: an event-based simulation framework for computational grids and clouds. In: European Conference on parallel process-ing, Springer, Berlin, Heidelberg, 2010, August, https ://doi.org/10.1007/978-3-642-21878 -1\_38.


31. Blaha P, Schwarz K, Luitz J. A full potential linearized aug-mented plane wave package for calculating crystal properties, WIEN97. Wien: Karlheinz Schwarz, Techn. Universitat Wien, Austria; 1999.

32. Cotton WR, Pielke RA Sr, Walko RL, Liston GE, Tremback CJ, Jiang H, Nicholls ME, Carrio GG, McFadden JP. RAMS 2001: current status and future directions. Meteorol Atmos Phys. 2003. https ://doi.org/10.1007/s0070 3-001-0584-9.

33. Wickremasinghe B, Calheiros RN, Buyya R. Cloudanalyst: a cloudsim-based visual modeller for analysing cloud computing environments and applications. In: Proceedings of 24th IEEE International Conference on advanced information networking and applications (AINA), 2010, April, https ://doi.org/10.1109/AINA.2010.32.

34. Nunez A, Vazquez-Poletti JL, Caminero AC, Castané GG, Carretero J, Llorente IM. iCanCloud: a flexible and scalable cloud infrastructure simulator. J Grid Comput. 2012. https ://doi.org/10.1007/s1072 3-012-9208-5.

35. Romero P, Barderas G, Vazquez-Poletti JL, Llorente IM. Spa-tial chronogram to detect Phobos eclipses on Mars with the MetNet Precursor Lander. Planet Sp Sci. 2011. https ://doi.org/10.1016/j.pss.2011.06.020.

36. Vazquez-Poletti JL, Barderas G, Llorente IM, Romero P. A model for efficient onboard actualization of an instrumental cyclogram for the mars metnet mission on a public cloud infrastructure. in: proceedings of the international workshop on applied Parallel Computing, Springer, Berlin, Heidelberg, 2010, June, https ://doi.org/10.1007/978-3-642-28151 -8\_4.

37. Kathiravelu P, Veiga L. An adaptive distributed simulator for cloud and mapreduce algorithms and architectures. In: Pro-ceedings of the 2014 IEEE/ACM 7th International Conference on utility and cloud computing (UCC), 2014, December, https ://doi.org/10.1109/UCC.2014.16.

38. Sotiriadis S, Bessis N, Antonopoulos N, Anjum A. SimIC: Designing a new inter-cloud simulation platform for integrat-ing large-scale resource management. In: Proceedings of the 2013 IEEE 27th International Conference on Advanced Infor-mation Networking and Applications (AINA), 2013, March, https ://doi.org/10.1109/AINA.2013.123.

39. jFreeChart package. http://www.jfree.org/jfreechart/. Accessed 25 July 2017.

40. Alwabel A, Walters RJ, Wills, G. DesktopCloudSim: simula-tion of node failures in the cloud. In: Cloud Computing 2015: the sixth international conference on cloud computing, GRIDs, and virtualization. Nice, France; 2015.

41. Gupta H, Dastjerdi AV, Ghosh SK, Buyya R. ifogsim: A toolkit for modeling and simulation of resource management techniques in internet of things, edge and fog computing envi-ronments. Softw Pract Exp. 2016. https ://doi.org/10.1002/spe.2509.

42. Elahi B, Malik AW, Rahman AU, Khan MA. Toward scal-able cloud data center simulation using high-level architecture. Softw Pract Exp. 2020;50(6):827–43. https ://doi.org/10.1002/spe.2769.

43. 1516.1-2010. IEEE Standard for modeling and simulation (M&S) High Level Architecture (HLA)–federate interface specification. https ://stand ards.ieee.org/stand ard/1516_1-2010.html. Accessed 04 June 2020.

44. Jeon H, Cho C, Shin S, Yoon S. A CloudSim-Extension for Simulating Distributed Functions-as-a-Service. In: 2019 20th International Conference on parallel and distributed computing, applications and technologies (PDCAT), 2019, pp 386–391, IEEE,. https ://doi.org/10.1109/PDCAT 46702 .2019.00076 .

45. Sriram I. SPECI, a simulation tool exploring cloud-scale data centers. Cloud Comput. 2009. https ://doi.org/10.1007/978-3-642-10665 -1\_35.

46. Buss A. Component based simulation modeling with Simkit. In: Proceedings of the Winter Simulation Conference, 2002, December, https ://doi.org/10.1109/WSC.2002.11728 91.

47. Cartlidge J, Cliff D. Comparison of cloud middleware protocols and subscription network topologies using CReST, the cloud research simulation toolkit-the three truths of cloud comput-ing are: hardware fails, software has bugs, and people make mistakes. In: CLOSER, pp 58–68, May 2013.

48. Son J, Dastjerdi AV, Calheiros RN, Ji X, Yoon Y, Buyya R. Cloudsimsdn: modeling and simulation of software-defined cloud data centers. In: Proceedings of 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Com-puting (CCGrid), 2015, May, https ://doi.org/10.1109/CCGri d.2015.87.

49. Lantz B, Heller B, McKeown N. A network in a laptop: rapid prototyping for software-defined networks. In: Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, 2010, October, https ://doi.org/10.1145/18684 47.18684 66.

50. Lim SH, Sharma B, Nam G, Kim EK, Das CR. MDCSim: a multi-tier data center simulation, platform. In: Proceedings of IEEE International Conference on Cluster Computing and Work-shops (CLUSTER’09), August 2009. https ://doi.org/10.1109/CLUST R.2009.52891 59.

51. Mesquite Software, CSIM. http://www.mesquite.com. Accessed 17 Sep 2017.

52. Wang Z, Zhu X, McCarthy C, Ranganathan P, Talwar V. Feed-back control algorithms for power management of servers. In: Third International Workshop on Feedback Control Implemen-tation and Design in Computing Systems and Networks, 2008, June.

53. Amza C, Cecchet E, Chanda A, Elnikety S, Cox A, Gil R, Mar-guerite J, Rajamani K, Zwaenepoel W. Bottleneck characteriza-tion of dynamic web site benchmarks, TR02-388, Rice Univer-sity, 2002, https ://hdl.handl e.net/1911/96296 . Accessed 16 Sep 2017.

54. Kliazovich D, Bouvry P, Audzevich Y, Khan SU. GreenCloud: a packet-level simulator of energy-aware cloud computing data centers. J Supercomput. 2012. https ://doi.org/10.1007/s1122 7-010-0504-1.

55. Chen G, He W, Liu J, Nath S, Rigas L, Xiao L, Zhao F. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In: 5th USENIX Symposium on net-worked systems design and implementation, Vol. 8, pp. 337-350, 2008, April.

56. Shang L, Peh LS, Jha NK. Dynamic voltage scaling with links for power optimization of interconnection networks. In: Proceed-ings of the Ninth International Symposium on high-performance computer architecture, (HPCA-9 2003), 2003, February, https ://doi.org/10.1109/HPCA.2003.11835 27.

57. Kurowski K, Oleksiak A, Piatek W, Piontek T, Przybyszewski A, Weglarz J. DCworms—a tool for simulation of energy efficiency in distributed computing infrastructures. Simul Model Pract The-ory. 2013. https ://doi.org/10.1016/j.simpa t.2013.08.007.

58. Bak S, Krystek M, Kurowski K, Oleksiak A, Piatek W, Waglarz J. Gssim–a tool for distributed computing experiments. Sci Progr. 2011. https ://doi.org/10.3233/SPR-2011-0332.

59. Tar - data archiving software. http://www.gnu.org/software/tar/. Accessed 12 Dec 2017.

60. Abinit. http://www.abinit.org/. Accessed 12 Dec 2017.

61. vor dem Berge M, Da Costa G, Kopecki A, Oleksiak A, Pierson JM, Piontek T, Volk E, Wesner S. Modeling and simulation of data center energy-efficiency in CoolEmAll. In: International Workshop on energy efficient data centers, 2012, https://doi.org/10.1007/978-3-642-33645-4_3.

62. Castané GG, Nunez A, Llopis P, Carretero J. E-mc2: a formal framework for energy modelling in cloud computing. Simul Model Pract Theory. 2013. https://doi.org/10.1016/j.simpat.2013.05.002.

63. Sa TT, Calheiros RN, Gomes DG. CloudReports: An extensi-ble simulation tool for energy-aware cloud computing environ-ments. In: Cloud Computing, Springer International Publish-ing, 2014. https ://doi.org/10.1007/978-3-319-10530 -7\_6.

64. Malik AW, Bilal K, Aziz K, Kliazovich D, Ghani N, Khan SU, Buyya R. Cloudnetsim ++: a toolkit for data center simula-tions in omnet ++. In: Proceedings of the 2014 11th Annual High-capacity Optical Networks and Emerging/Enabling Tech-nologies (HONET), 2014, December, https ://doi.org/10.1109/HONET .2014.70293 71.

65. Bilal K, Khan SU, Madani SA, Hayat K, Khan MI, Min-Allah N, Kolodziej J, Wang L, Zeadally S, Chen D. A survey on green communications using adaptive link rate. Cluster Com-put. 2013. https ://doi.org/10.1007/s1058 6-012-0225-8.

66. Gupta SK, Banerjee A, Abbasi Z, Varsamopoulos G, Jonas M, Ferguson J, Gilbert RR, Mukherjee T. Gdcsim: a simulator for green data center design and analysis. ACM Trans Model Com-put Simul (TOMACS). 2014. https ://doi.org/10.1145/25530 83.

67. Tang , Mukherjee, T, Gupta SK, Cayton P. Sensor-based fast thermal evaluation model for energy efficient high-perfor-mance datacenters. In: Proceedings of the Fourth International Conference on intelligent sensing and information processing (ICISIP 2006), 2006, October, https ://doi.org/10.1109/ICISI P.2006.42860 97.

68. Louis B, Mitra K, Saguna S, Ahlund C. Cloudsimdisk: Energy-aware storage simulation in cloudsim. In: Proceedings of the 2015 IEEE/ACM 8th International Conference on utility and cloud computing (UCC), 2015, December, https ://doi.org/10.1109/UCC.2015.15.

69. Kecskemeti G. DISSECT-CF: a simulator to foster energy-aware scheduling in infrastructure clouds. Simul Model Pract Theory. 2015. https ://doi.org/10.1016/j.simpa t.2015.05.009.

70. Tighe M, Keller G, Bauer M, Lutfiyya H. DCSim: A data center simulation tool for evaluating dynamic virtualized resource management. In: Network and service management (cnsm), 2012 8th international conference and 2012 workshop on systems virtualiztion management (svm), pp. 385–392, 2012, October.

71. Tian W, Zhao Y, Xu M, Zhong Y, Sun X. A toolkit for mod-eling and simulation of real-time virtual machine allocation in a cloud data center. IEEE Trans Autom Sci Eng. 2015. https ://doi.org/10.1109/TASE.2013.22663 38.

72. Hirofuchi T, Lebre A, Pouilloux L. SimGrid VM: virtual machine support for a simulation framework of distributed sys-tems. IEEE Trans Cloud Comput. 2015. https ://doi.org/10.1109/TCC.2015.24814 22.

73. Clark C, Fraser K, Hand S, Hansen JG, Jul E, Limpach C, Pratt I, Warfield A. Live migration of virtual machines. In: Proceedings of the 2nd Conference on Symposium on networked systems design & implementation-volume 2, USENIX Association, pp. 273–286, 2005, May.

74. Lebre A, Pastor J, Sudholt M. VMPlaceS: a generic tool to inves-tigate and compare VM placement algorithms. In: Proceedings of Euro-Par 2015: parallel processing, 2015, August, https ://doi.org/10.1007/978-3-662-48096 -0\_25.

75. Feller E, Rilling L, Morin C. Snooze: A scalable and autonomic virtual machine management framework for private clouds. In: Proceedings of the 2012 12th IEEE/ACM International Sym-posium on Cluster, Cloud and Grid Computing (ccgrid 2012), IEEE Computer Society, 2012, May, https ://doi.org/10.1109/CCGri d.2012.71.

76. Quesnel F, Lèbre A, Südholt M. Cooperative and reactive sched-uling in large-scale virtualized platforms with DVMS. Concurr Comput Pract Exp. 2013. https ://doi.org/10.1002/cpe.2848.

77. Bux M, Leser U. Dynamiccloudsim: simulating heterogeneity in computational clouds. Future Gener Comput Syst. 2015. https ://doi.org/10.1016/j.futur e.2014.09.007.

78. Schad J, Dittrich J, Quiané-Ruiz JA. Runtime measurements in the cloud: observing, analyzing, and reducing variance. In: Proceedings of the VLDB Endowment, 2010, https ://doi.org/10.14778 /19208 41.19209 02.

79. Dejun J, Pierre G, Chi CH. EC2 performance analysis for resource provisioning of service-oriented applications. In: Proceedings of 2009 Workshops Service-Oriented Computing (ICSOC/ServiceWave), Springer Berlin/Heidelberg, 2010, https ://doi.org/10.1007/978-3-642-16132 -2\_19.

80. Topcuoglu H, Hariri S, Wu MY. Performance-effective and low-complexity task scheduling for heterogeneous com-puting. IEEE Trans Parallel Distrib Syst. 2002. https ://doi.org/10.1109/71.99320 6.

81. Zaharia M, Konwinski A, Joseph AD, Katz RH, Stoica, I. Improving MapReduce performance in heterogeneous envi-ronments. In: 8th USENIX Symposium on Operating Systems Design and Implementation, Vol. 8 No. 4, p. 7, 2008, December.

82. Berriman GB, Deelman E, Good J, Jacob J, Katz DS, Kesselman C, Laity A, Prince TA, Singh G, Su M. Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand. In: Proceedings of the SPIE Conference on Astronomical Tel-escopes and Instrumentation, Glasgow, Scotland, 2004, https ://doi.org/10.1117/12.55055 1.

83. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014. https ://doi.org/10.1093/bib/bbs08 6.

84. Jackson KR, Ramakrishnan L, Muriki K, Canon S, Cholia S, Shalf J, Wright NJ. Performance analysis of high performance computing applications on the amazon web services cloud. In: IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), 2010, November, https ://doi.org/10.1109/Cloud Com.2010.69.

85. Chainbi W, Chihi H, Azaiez M. ATAC4Cloud: a framework for modeling and simulating autonomic cloud. Soft Comput. 2017. https ://doi.org/10.1007/s0050 0-016-2451-0.

86. Rahman UU, Bilal K, Erbad A, Khalid O, Khan SU. Nut-shell—simulation toolkit for modeling data center networks and cloud computing. IEEE Access. 2019;7:19922–42. https ://doi.org/10.1109/ACCES S.2019.28947 25.

87. Siavashi A, Momtazpour M. GPUCloudSim: an extension of CloudSim for modeling and simulation of GPUs in cloud data centers. J Supercomput. 2019;75(5):2535–61. https ://doi.org/10.1007/s1122 7-018-2636-7.

88. Hu Q, Shu J, Fan J, Lu Y. Part-time performance estimation and fairness-oriented scheduling policy for concurrent GPGPU appli-cations. In: 2016 45th International Conference on Parallel Pro-cessing (ICPP), 2016, pp. 57–66, IEEE, https ://doi.org/10.1109/ICPP.2016.14.

89. Desprez F, Rouzaud-Cornabas J. SimGrid Cloud Broker: simu-lating the Amazon AWS Cloud (Doctoral dissertation, INRIA), 2013. https ://hal.inria .fr/hal-00909 120/. Accessed 05 Sep 2017.

90. Pittl B, Mach W, Schikuta, E. Bazaar-extension: A cloudsim extension for simulating negotiation based resource allocations. In: Proceedings of the 2016 IEEE International Conference on Services Computing (SCC), 2016, June, https ://doi.org/10.1109/SCC.2016.62.

91. Bonacquisto P, Di Modica G, Petralia G, &Tomarchio O. A strategy to optimize resource allocation in auction-based cloud markets. In: Proceedings of the 2014 IEEE International Con-ference on Services Computing (SCC), 2014, June, https ://doi.org/10.1109/SCC.2014.52.


92. F(X)yz by Birdasaur. http://birdasaur.github.io/FXyz/. Accessed 28 Sep 2017.

93. Garg SK, Buyya R. Networkcloudsim: Modelling parallel appli-cations in cloud simulations. In: Proceedings of the 2011 Fourth IEEE International Conference on Utility and Cloud Computing (UCC), 2011, December, https ://doi.org/10.1109/UCC.2011.24.

94. Shiraz M, Gani A, Khokhar RH, Ahmed E. An extendable simu-lation framework for modeling application processing potentials of smart mobile devices for mobile cloud computing. In: Pro-ceedings of the 2012 10th International Conference on Frontiers of Information Technology (FIT), 2012, December, https ://doi.org/10.1109/FIT.2012.66.

95. Chen W, Deelman E. Workflowsim: A toolkit for simulating scientific workflows in distributed environments. In: Proceed-ings of the 2012 IEEE 8th International Conference on E-Sci-ence (e-Science), 2012, October, https ://doi.org/10.1109/eScie nce.2012.64044 30.

96. Calheiros RN, Netto MA, De Rose CA, Buyya R. EMUSIM: an integrated emulation and simulation environment for modeling, evaluation, and validation of performance of cloud computing applications. Softw Pract Exp. 2012. https ://doi.org/10.1002/spe.2124.

97. Calheiros RN, Buyya R, De Rose CAF. Building an automated and self-configurable emulation testbed for grid applications. Softw Pract Exp. 2010. https ://doi.org/10.1002/spe.964.

98. Emeakaroha VC, Calheiros RN, Netto MAS, Brandic I, De Rose CAF. DeSVi: an architecture for detecting SLA violations in cloud computing infrastructures. In: Proceedings of the 2nd International ICST Conference on Cloud Computing (Cloud-Comp’10), 2010.

99. Jararweh Y, Jarrah M, Alshara Z, Alsaleh MN, Al-Ayyoub M. CloudExp: a comprehensive cloud computing experimen-tal framework. Simul Model Pract Theory. 2014. https ://doi.org/10.1016/j.simpa t.2014.09.003.

100. Beitch A, Liu B, Yung T, Griffith R, Fox A, Patterson DA. Rain: A workload generation toolkit for cloud computing applications. University of California, Tech. Rep. UCB/EECS-2010-14, 2010.

101. Guo C, Lu G, Li D, Wu H, Zhang X, Shi Y, Tian C, Zhang Y, Lu S. BCube: a high performance, server-centric network architecture for modular data centers. ACM SIGCOMM Comput Commun Rev. 2009. https://doi.org/10.1145/1592568.1592577.

102. Guo C, Wu H, Tan K, Shi L, Zhang Y, Lu S. Dcell: a scalable and fault-tolerant network structure for data centers. In: ACM SIGCOMM Computer Communication Review, 2008, August, https ://doi.org/10.1145/14029 58.14029 68.

103. Di S, Cappello F. GloudSim: Google trace based cloud simula-tor with virtual machines. Softw Pract Exp. 2015. https ://doi.org/10.1002/spe.2303.

104. Higashino WA, Capretz MA, Bittencourt LF. CEPSim: mod-elling and simulation of complex event processing systems in cloud environments. Future Gener Comput Syst. 2016. https ://doi.org/10.1016/j.futur e.2015.10.023.

105. Powersmiths, Powersmiths WOW—build a more sustainable future. http://www.power smith swow.com/. Accessed 11 Aug 2017.

106. Alwasel K, Calheiros RN, Garg S, Buyya R, Ranjan R (2019) BigDataSDNSim: a simulator for analyzing big data applica-tions in software-defined cloud data centers. arXiv preprint arXiv :1910.04517 .

107. Zeng X, Garg SK, Strazdins P, Jayaraman PP, Georgakopoulos D, Ranjan R. IOTSim: a simulator for analysing IoT applications. J Syst Architect. 2017;72:93–107. https ://doi.org/10.1016/j.sysar c.2016.06.008.

108. Barika M, Garg S, Chan A, Calheiros RN, Ranjan R. IoTSim-stream: modelling stream graph application in cloud simula-tion. Future Gener Comput Syst. 2019;99:86–105. https ://doi.org/10.1016/j.futur e.2019.04.004.

109. Moore JD, Chase JS, Ranganathan P, Sharma RK. Making Scheduling “Cool”: temperature-aware workload placement in data centers. In: USENIX Annual Technical Conference, General Track, 2005, Apr, pp 61–75.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.