
Resource Allocation in a Network-based Cloud Computing Environment (Research Proposal)

Table of Contents

I. Chapter 1: Introduction

II. Chapter 2: Research Problem 1: Resource Allocation in a Network-Based Cloud Computing Environment
    2.1 Related Work
        2.1.1 Efforts with a Focus on Data Center Processing Resources
        2.1.2 Efforts with a Focus on Data Center Network Resources
    2.2 Motivation
        2.2.1 A Comprehensive Solution for Network/Processing Resource Allocation
    2.3 Network-Aware Resource Allocation: Methodology and Design Challenges
        2.3.1 Research Strategy/Methodology
        2.3.2 Methodology for External Challenges
        2.3.3 Methodology for Internal Challenges
    2.4 Objective
        2.4.1 Static Case
        2.4.2 Dynamic Case
    2.5 Preliminary Results
        2.5.1 Detailed Formulation
        2.5.2 Client Request Types

III. Chapter 3: Research Problem 2: Energy Efficient Network-Based Resource Allocation
    3.1 Related Work
    3.2 Motivation
        3.2.1 A Comprehensive Solution for Energy Efficient Network-Based Resource Allocation
    3.3 Energy Efficient Network-Based Resource Allocation: Methodology and Design Challenges
        3.3.1 Research Strategy/Methodology
        3.3.2 Common Solutions and Trade-offs
        3.3.3 Energy Consumption vs. Optimal Performance
        3.3.4 Idle State as a Major Source of Wasted Power
    3.4 Objective
        3.4.1 Static Case
        3.4.2 Dynamic Case

IV. Bibliography

Chapter 1

Introduction

Cloud computing is an increasingly popular computing paradigm, now proving a necessity for utility computing services. Several providers offer cloud computing (CC) solutions in which a pool of virtualized, dynamically scalable computing power, storage, platforms, and services is delivered on demand to clients over the Internet in a pay-as-you-go manner. This is implemented using large data centers (DCs) where thousands of servers reside. Clients can choose between private clouds, which are data centers dedicated to the internal needs of a particular business organization, and public clouds, which are open to the public over the Internet on a pay-as-you-go or pay-per-demand basis. A practice gaining momentum is "surge computing", in which clients mainly use their private clouds and outsource tasks to public clouds only when the private clouds are saturated [32][1].

The plethora of clients moving to the cloud partially or fully has attracted leading IT providers with solid computation technology and software bases, such as Google, Microsoft and Amazon, to implement diverse solutions. Services are offered under several deployment models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and Network as a Service (NaaS), which is sometimes referred to as Software Defined Networking (SDN) [18]. Each provider offers a unique service portfolio with a range of options that include virtual machine (VM) instance configurations, the nature of network services, the degree of control over the rented machine, supporting software/hardware security services, additional storage, etc.

More recently, the emphasis has shifted to more comprehensive solutions. To move completely to the cloud, clients demand guarantees with regard to achieving the required improvements in scale, cost control and reliability of operations. Despite its importance, providing computation power alone is not a sufficient competitive advantage. Other factors have gained more weight recently, such as networking solution offerings. Network performance and resource availability could be the tightest bottleneck for any cloud. As shown in [1], an under-performing network can delay applications with heavy data requirements to the degree that sending data via a mail courier becomes the more viable option. This is seen as an opportunity for network service providers, many of whom are planning and building their own clouds using a distributed cloud (D-Cloud) architecture [31].

Here we see the need for a comprehensive resource allocation and scheduling system for cloud computing data center networks. This system would handle all the resources in the cloud provider's data center network: it would manage client requests, dictate resource allocation, ensure network QoS conditions, and eliminate performance hiccups. It would execute these tasks while minimizing the service provider's cost and controlling the level of consumed energy. The diversity of instance types across multiple geographically distributed data centers makes resource management an even more complicated matter. Nevertheless, managing the data centers' servers and network resources while scheduling and serving tens of thousands of client requests on virtual machines residing on data center servers is a critical success factor. First, it is a main revenue source for the service provider, as excess resources translate directly to revenue. Second, it can make or break potential clients' decision to move fully to the cloud.

Chapter 2

Research Problem I: Resource Allocation in a Network-based Cloud Computing Environment

In the first problem, we propose a model for network-based cloud computing environments. This consists of a mixed network of public and private clouds where clients have the freedom to use public cloud resources on demand. Clients have the option to reserve virtual machines of multiple types, where the types are based on the functionality or primary application of the VMs (high-memory VMs, high-CPU VMs, etc.). Clients also have the ability to request data connections to move data between their private clouds, from a private cloud to a public cloud, or in the other direction. For connection requests, clients define the connection requirements such as source, destination, requested start time, duration, and performance or QoS constraints. This can be done in an advance reservation manner (i.e., the requested connection start time is in the future) or an immediate reservation manner (i.e., the requested connection start time equals the request arrival time, or as soon as the network controller can schedule it). In our work, we aim to couple resource allocation with the concepts of software defined networking (SDN). SDN is a networking paradigm in which the forwarding behaviour of a network element is determined by a software control plane decoupled from the data plane [34]. SDN leads to many benefits, such as increasing network and service customizability, supporting improved operations, and increasing performance [35][36]. The software control plane can be implemented using a central network controller. We propose that this central controller handle the task of resource allocation in the data center network. This can be done by directing all client requests to the SDN controller, which executes the resource allocation algorithms and then sends the allocation commands across the network.
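To make this control flow concrete, the following is a minimal sketch (in Python; the class and method names are hypothetical, since the proposal does not prescribe an implementation) of a central SDN controller that queues client requests, runs a pluggable allocation algorithm, and pushes the resulting commands to the network:

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Request:
    """A client request: reserve a VM, or establish a connection."""
    kind: str          # "vm" or "connection"
    source: str        # client node or VM id
    destination: str   # server or VM id (connections only)
    start_time: float  # requested start time (advance or immediate)
    duration: float

class SDNController:
    """Hypothetical central controller: all client requests are
    directed here; it runs the allocation algorithm and then sends
    the allocation commands across the network."""

    def __init__(self, allocate, dispatch):
        self.pending = Queue()
        self.allocate = allocate   # resource allocation algorithm
        self.dispatch = dispatch   # programs switches / hypervisors

    def submit(self, request: Request) -> None:
        self.pending.put(request)

    def run_once(self) -> None:
        while not self.pending.empty():
            req = self.pending.get()
            decision = self.allocate(req)   # e.g. server + path choice
            if decision is not None:
                self.dispatch(decision)     # push flow rules / placements
```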

Figure 2.1: A sample network of private and public clouds connected through the Internet or VPNs

2.1 Related Work

2.1.1 Efforts with a Focus on Data Center Processing Resources

The problem of resource allocation in a CC environment has been discussed before, and multiple models have been proposed in which resources are scheduled based on user requests. In [3], a queuing model is proposed where a client requests virtual machines for a predefined duration. VM pre-emption and both single and multiple job queues are considered. Network resources are not considered at all: jobs are assumed not to communicate with each other or to transmit or receive data, and no preference is expressed as to where the VMs are to be scheduled. In [7], an algorithm is proposed to optimally distribute VMs in order to minimize the distance between user VMs in a data center grid. The only network constraint used is the Euclidean distance between data centers; no specific connection requests or user differentiation is used. In the same paper, an algorithm is proposed to schedule VMs on racks, blades and processors within one data center to minimize communication cost. In [6], three consecutive phase queues are used to schedule prioritized job categories. No network topology is used; rather, only the monetary cost of transmitting data is considered for network requests. The proposed heuristic's results are compared to the equal allocation method, where resources are divided equally between computation, scheduling and transmission tasks.

2.1.2 Efforts with a Focus on Data Center Network Resources

In [4], the authors tackle the problem where a client may have multiple jobs being processed at the same time but not necessarily on the same server. These requests are abstracted as a virtual network (VN), where every request (placed on a separate VM) is considered a node and the path between two nodes is considered a link (edge) in the VN. The problem then turns into provisioning a virtual network, and a revenue maximization objective is introduced. The problem is formulated as a mixed integer linear program, and a genetic algorithm is proposed as a heuristic method to solve it. The time factor is not provisioned for, since no reservation start time or duration is introduced. Also, users are assumed to request computation and network resources in one step; the scenario where a user requests more connectivity for an already reserved VM is not considered. In [8], the authors tackle the problem of proposing the best virtual network with IP over a wavelength division multiplexing (WDM) network. Only network demands are considered, with three different network demand profiles evaluated. The constraints are based on propagation delay, flow conversion and capacity. User demand for processing resources is not considered. The authors target minimizing the network power consumption by minimizing the number of network components that are switched on.

2.2 Motivation

2.2.1 A Comprehensive Solution for Network/Processing Resource Allocation

Provisioning for cloud services in a comprehensive way is of crucial importance to any resource allocation model. Any model should consider both computational resources and network resources to accurately represent practical needs. First, excluding the processing (computational) resources while designing the RA model deprives the model of the main and most important cloud service. Cloud data centers are built first and foremost as a way to outsource computational tasks. Any model that optimizes data center resources should answer questions like: how are VMs allocated? How are processing resources modeled? What is the resource portfolio being promoted to clients? And how are the data center resources distributed physically? [10]. The other side of the coin is networking services. Network services are the backbone of cloud computation services. As clients ask for tasks to be processed in the data center, they need networking services with adequate QoS standards to send and receive their application data. Network services seem to be getting less attention than necessary, and "bandwidth bottlenecks loom large in the cloud" [40]. In the report prepared by Jim Frey [41], only 54% of the IT professionals surveyed about their use of cloud services indicated that they involve network engineering/operations personnel, down from 62% in 2009. This directly affects the implementation of network best practices and the attention paid to the health of overall traffic delivery. Frey mentions that "only 28% of survey respondents believe collecting packet traces between virtual machines for monitoring and troubleshooting is absolutely required. And only 32% felt that collecting data about traffic, from virtual switches for monitoring and troubleshooting is absolutely required". There is a clear lack of insight into how the network is performing.

This oversight does not only affect performance; bandwidth costs deeply affect the cloud clients' financial structure too. A study performed by the authors of [11] shows that a client who downloads a relatively small amount of 10 GB per day (1.83 Mb/s, assuming all clients are in North America) would be charged $30/month when using MS Azure. This works out to roughly $16 per Mb/s ($30 / 1.83 Mb/s), while the market price per Mb/s is around $8 [11]. This margin more than covers the provider's operational network costs. Therefore, optimizing the bandwidth cost represents a profit opportunity for providers and a savings opportunity for clients.

The weight of network resources in the cloud market has pushed network service providers [33] to start building their own distributed data centers with a vision to enter the cloud computing market. Their idea is based on replacing a large data center with multiple smaller data centers, with the aim of being closer to clients. The report explains: "cloud is about delivering multi-tenant IT services. [Network] Service providers already know how to sell multi-tenant communications services. Important advantages will come from service providers' ability to deliver services that tie the network together with compute and storage. The network infrastructure effectively becomes a distributed cloud that helps to reduce costs and increase service differentiation. Service providers can offer compute and storage service options as well as network capabilities that ensure application performance in the cloud" [33]. The report summarizes a group of very promising savings (on both the CAPEX and OPEX sides) for network service providers when they build their own distributed clouds and use them to operate (network and cloud) services for their clients.

CAPEX savings in the cloud:

1- Investments in network hardware that can be virtualized can be reduced by 25% to 80%.
2- Virtualization of customer premises equipment can deliver 30% savings.
3- Base station virtualization can reduce civil works costs by 60%.
4- Incremental capital investment for adding a subscriber can be reduced by 70%.

OPEX savings in the cloud:

1- Data center operations costs can be reduced by 40%.
2- Services operations costs can be reduced by 25%.
3- Network planning and engineering expenses can be reduced by 20%.
4- 100% savings in energy and real estate expenses is achieved for the network elements eliminated from the network through hardware reduction. A 90% reduction in maintenance charges and a 45% reduction in network operations are also realized for these elements.
5- Base station virtualization leads to 60% savings in site rental costs and a 50% reduction in power consumption expenses.

The Alcatel-Lucent researchers also found a connection between internal transformation and incremental revenue opportunities in the cloud.

As mentioned earlier, a cloud service provider would have to offer network services to clients to support one of three functions:

A- Connecting the clients' private clouds to the VMs the clients have reserved in the data centers. This could be over the Internet or VPNs, as shown in Figure 2.1.

B- Connecting the VMs on different public clouds together to facilitate data exchange between two VMs reserved by the same client.

C- Connecting VMs on the same public cloud together.

It is of no use to clients if their application produces the needed results in the required time but those results cannot be delivered to the client base through a stable network connection. In [1], data transfer bottlenecks are stated as one of the main obstacles facing the growth of the cloud client base. In that paper, the authors show that when moving large amounts of data in a distributed data center environment, network service performance is a critical point for the whole process. In the example mentioned, the authors reach the conclusion that for some configurations, the data transmission tardiness would cause the client to prefer options like sending data disks with a courier (FedEx, for example).

2.3 Network-aware resource allocation: methodology and design challenges

Targeting a network-aware resource allocation system brings to the fore multiple challenges that have faced the cloud computing community. Addressing those issues is of utmost importance to form a complete solution. We can classify these design challenges into two categories:

External challenges: imposed by factors outside the resource allocation process

1- Regulative and geographical challenges
2- Charging model issues

Internal Challenges:

1- Data locality: an opportunity and a challenge (combining compute and data management)
2- Reliability of network resources inside a data center
3- SDN design challenges inside the data centers

Before we discuss these challenges one by one, we will briefly discuss the general research strategy that we will use.

2.3.1 Research Strategy/Methodology

Our first goal is to focus on the design and validation of a cloud resource allocation system that tackles these challenges. We will execute a well-defined set of tasks that leverage optimization techniques, graph theory, stochastic analysis, and data center network simulation. We will generally follow these steps, which will differ slightly from phase to phase throughout the project:

1- In our work, we will not assume any specific networking substrate technology. Instead, we will build a solution for a generalized network.
2- Also, no specific restrictions on the data center architecture or the build/type of servers will be imposed.
3- As a preliminary step, we will simulate the data center environment in order to get the key research components up and running (see the sketch after this list).
4- Next, for each one of these challenges, we will adjust the environment parameters in order to isolate the dominant factor that causes the challenge.
5- Then, exploratory solutions will be tested and their effectiveness recorded.
6- As we tackle these challenges one by one, we will gain a better understanding of the data center's moment-to-moment dynamics. This will help in pursuing and constructing a complete solution that gets the best combined result for all the issues as per the needed combination function.
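As a rough illustration of step 3, the skeleton below sketches what such a simulation driver could look like (Python; the Poisson arrival model and the fixed acceptance probability are placeholder assumptions, to be replaced by the actual allocation algorithms under test):

```python
import heapq
import random

def simulate(num_requests=1000, arrival_rate=5.0, accept_prob=0.9, seed=1):
    """Skeleton discrete-event simulation of the data center workload.
    Poisson arrivals are generated up front, then handled in time order.
    The accept/block decision is a placeholder where a real allocation
    algorithm (and departure events that free resources) would plug in,
    so individual environment parameters can be varied to isolate the
    dominant factor behind each challenge."""
    random.seed(seed)
    clock, events = 0.0, []
    for _ in range(num_requests):
        clock += random.expovariate(arrival_rate)  # exponential inter-arrivals
        heapq.heappush(events, (clock, "arrival"))
    served = blocked = 0
    while events:
        time, kind = heapq.heappop(events)
        if kind == "arrival":
            if random.random() < accept_prob:  # placeholder for the allocator
                served += 1
            else:
                blocked += 1
    return served, blocked

print(simulate())  # roughly 900 served / 100 blocked with these defaults
```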

2.3.2 Methodology for External challenges

1- Regulative and Geographical challenges

In the virtualization model used in cloud offerings, the client does not manage the physical location of data. There is no guarantee given by the provider as to the data's physical location at a given moment [1]. In fact, it is a common practice to distribute client data over multiple geographically distant data centers, with the Internet as the communication medium more often than not. Splitting the data enhances fault tolerance, but it presents regulative and security challenges. An example is the regulative obligation of complying with the U.S. Health Insurance Portability and Accountability Act (HIPAA) (the Health Information Protection Act (HIPA) is Canada's version). Both were enacted to ensure the security and confidentiality of personal health information in order to protect patient privacy [37].

Although HIPAA does not apply directly to third-party service providers, it is imperative that health care organizations require these providers to sign contracts obligating them to handle all patient data in adherence with HIPAA standards.

Complying with HIPAA imposes some constraints on handling and storing data:

A- Geographical constraints: HIPAA requires that patient data does not leave US soil [38]. This constraint limits the choice of data centers to which a VM can be allocated, as well as the data movement manoeuvres made while trying to optimize performance. Additionally, "When data is stored in the cloud, you need to make sure there is a way for you to know exactly where the data is physically stored, how many copies of the data have been made, whether or not the data has been changed, or if the data has been completely deleted when requested." [39]

B- Client actions: To get more assurance about data security, clients may require guarantees like instant wiping (overwriting byte by byte) of the data instead of deletion. They might also require storing encrypted data on the cloud [39]. This puts extra pressure on performance and makes it harder to comply with QoS requirements.

C- Under HIPAA, patients have a right to access any information stored about them [38]. A careful study of the locations of the patients and their usage distribution would be crucial for the resource allocation system: considering this factor when placing the data would minimize the distance patient data travels in the network. Here, deciding where the data is to be located has a direct effect on minimizing the cost.

2- Charging model issues

The resource management system should incorporate the client's charging model. For example, when using Amazon EC2, a client has the choice to pay for instances completely on demand, reserve an instance for a term contract (1 year or 3 years), or even choose Spot Instances, which enable bidding for unused Amazon EC2 capacity. The Spot Instance price is set by Amazon EC2 and fluctuates periodically depending on the supply of, and demand for, Spot Instance capacity [2]. We would like to investigate issues like the following (a cost-comparison sketch follows the list):

A- Finding the service portfolio design/offering that maximizes the revenue weight of excess resources in the data center. Examining the options available in the market, and from the example above, we observe that cost is not calculated based on static consumption.

B- Finding the best way to integrate virtual network usage into the cost analysis. Challenges arise because a virtual link's length/distance (and in turn cost) varies from link to link. A virtual link could even change to use another physical path on the substrate network, depending on the methodology used. More detailed examples are found in [4].
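As a toy illustration of issue A, the sketch below compares the yearly cost of an on-demand versus a reserved charging model at different utilization levels; all prices are hypothetical, not actual Amazon EC2 rates:

```python
def yearly_cost_on_demand(hourly_rate, hours_used):
    """Pay only for the hours actually used."""
    return hourly_rate * hours_used

def yearly_cost_reserved(upfront_fee, reserved_hourly, hours_used):
    """One-time fee plus a discounted hourly rate."""
    return upfront_fee + reserved_hourly * hours_used

# Hypothetical prices: on-demand $0.10/h; reserved $60 upfront + $0.04/h
HOURS_PER_YEAR = 8760
for utilization in (0.1, 0.3, 0.5, 0.8):
    h = utilization * HOURS_PER_YEAR
    od = yearly_cost_on_demand(0.10, h)
    rs = yearly_cost_reserved(60.0, 0.04, h)
    print(f"utilization {utilization:.0%}: on-demand ${od:,.0f}, reserved ${rs:,.0f}")
```

At low utilization, on-demand wins; past a break-even utilization, the reserved contract's upfront fee pays for itself. This is exactly the kind of non-static cost structure a service portfolio design would have to exploit.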

2.3.3 Methodology for Internal challenges

1- Data locality: an opportunity and a challenge (combining compute and data management)

There is a pressing need for systems to implement data locality features "the right way". By that we mean combining the management of compute (processing) and data (network) resources using data locality features to minimize the amount of data movement and, in turn, improve application performance and scalability while meeting end users' performance and security concerns. It is very important to schedule computational tasks close to the data, and to understand the cost of moving the work as opposed to moving the data [10].

To have a full view of how to use data locality, we need to investigate the following:

A- A data-aware scheduler is critical in achieving good scalability and performance. A more specific perspective needs to be reached: questions such as how much the scheduler would know at a certain moment, what the policies and decision criteria for moving data are, and what data integration policies should be enforced are to be investigated.

B- Analyzing the behaviour of data-intensive applications is a very good starting point for understanding data locality and data movement patterns.

C- Another idea to be investigated is moving the application itself to servers in the data center where the needed data resides. This raises questions about the availability of servers in the other data center, policy/algorithm specifications on when to move (considering that future demand might need data stored in the original location), and decision criteria as to whether to migrate the whole VM or just move the concerned application (see the sketch after this list).

D- As discussed previously, regulative and security issues should be considered while moving data to increase data locality.
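The following is a minimal sketch of the move-the-data versus move-the-work comparison behind items A and C (Python; all parameters, including the remote-slowdown factor, are illustrative assumptions rather than measured values):

```python
def should_move_compute(data_size_gb, link_throughput_gbps,
                        vm_image_gb, remote_slowdown=1.0):
    """Compare the transfer time of moving the data set to the compute
    site against moving the (usually much smaller) application/VM image
    to where the data already is. A real scheduler would also weigh
    server availability at the remote site, future demand at the
    original location, and the regulatory constraints discussed above."""
    seconds_per_gb = 8.0 / link_throughput_gbps   # GB -> Gb, then divide by rate
    move_data_cost = data_size_gb * seconds_per_gb
    move_compute_cost = vm_image_gb * seconds_per_gb * remote_slowdown
    return move_compute_cost < move_data_cost

# Example: 500 GB of data vs. a 10 GB VM image over a 1 Gb/s link
print(should_move_compute(500, 1.0, 10))  # True: move the work, not the data
```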

2- Reliability of network resources inside a data center

While discussions of the network from data center to data center are abundant, we notice that discussion of the data center's internal network is often ignored. It is vital for consumers to obtain guarantees on service delivery inside a data center. We aim to investigate:

A- The design of a resilient virtual network inside a data center.

B- Ways in which we can design a data center network that provides different types of service level agreements (SLAs) that can be negotiated between providers and consumers. These SLAs must be enforced by the resource manager. The network controller inside the data centre should always have updated and precise network and cloud resource usage information, and the resource manager should employ fast and effective optimization algorithms.

C- A critical subject that must be analyzed here is the trade-off between complexity and efficiency.


3- SDN design challenges inside the data centers

As discussed earlier, using the new concepts of software defined networking would enhance network performance. But since it is a relatively new idea, the community has yet to deeply tackle the following issues regarding SDN:

A- Reliability: As a centralized method, using SDN controllers affects reliability if the controller fails. Although solutions like standby controllers or using multiple controllers for the network have been suggested, practical investigation is needed to reveal the problems, find the decision criteria, and analyze the trade-offs of using such solutions.

B- Scalability: [21] determines that when the network scales up in the number of switches and end hosts, the SDN controller can become a key bottleneck. For example, [22] estimates that a large data center consisting of 2 million virtual machines may generate 20 million flows per second, but current controllers can support about 10^5 flows per second in the optimal case [18]. Scaling up also results in losing visibility of the network traffic, making troubleshooting nearly impossible.

C- Visibility: Prior to SDN, the network team could quickly spot, for example, that a backup was slowing the network; the solution would then be to simply reschedule it to after hours. With SDN, unfortunately, only a tunnel source and a tunnel endpoint with UDP traffic are visible, and one cannot see who is using the tunnel. There is no way to determine whether the problem is the replication process, the email system or something else. The true top talker is shielded from view by the UDP tunnels, which means that when traffic slows and users complain, pinpointing the problem area in the network is not possible. With this loss of visibility, troubleshooting is hindered, scalability is decreased, and a delay in resolution could be quite detrimental to the business [19][23].

D- The controller placement problem influences every aspect of a decoupled control plane, from state distribution options to fault tolerance to performance metrics. This problem includes the placement of controllers with respect to the available network topology and the number of controllers needed. The placement is evaluated against metrics defined by the user, which could vary from latency to the number of nodes in the network, etc. According to [20], a random placement for a small value of k (in a k-median formulation) results in an average latency between 1.4x and 1.7x larger than that of the optimal placement (a brute-force placement sketch follows below).

As a result, cloud clients will see network service specifications as a decisive factor in their choice to move to the cloud, or in choosing a cloud provider. Factors like bandwidth options, port speed, number of IP addresses, load balancing options, and availability of VPN access, among others, should be considered by any comprehensive model.
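As a toy illustration of the k-median placement problem studied in [20], the sketch below exhaustively evaluates controller placements on a small topology by average switch-to-controller latency (pure Python; the topology and the latency-only metric are illustrative assumptions):

```python
from itertools import combinations

def floyd_warshall(n, edges):
    """All-pairs shortest paths; edges are (u, v, latency) tuples."""
    INF = float("inf")
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def best_placement(n, edges, k):
    """Exhaustive k-median: choose k controller sites minimizing the
    average node-to-nearest-controller latency. Only feasible for small
    topologies, which is why [20] studies the problem in depth."""
    d = floyd_warshall(n, edges)
    def avg_latency(sites):
        return sum(min(d[i][s] for s in sites) for i in range(n)) / n
    return min(combinations(range(n), k), key=avg_latency)

# Toy 5-node ring with unit latencies: place 2 controllers
print(best_placement(5, [(0,1,1), (1,2,1), (2,3,1), (3,4,1), (4,0,1)], 2))
```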

The proposed model tackles the resource allocation challenges faced when provisioning computational resources (CPU, memory), storage and network resources. Advance or immediate reservation requests for resources are sent by clients. A central controller manages these requests with the objective of minimizing average tardiness and request blocking. The solution aims at solving the provider's cost challenges and the cloud applications' performance issues. To the best of our knowledge, this proposal is the first to address the resource allocation problem in cloud computing data centers in this form. The proposed model schedules network connection requests for the network between data centers and client nodes; a future step will be to tackle scheduling for internal data center networks.

2.4 Objective

We have two cases in which we can tackle this problem:

2.4.1 Static case

Here, the system configuration, data center design parameters and the network topology are known. Also, all client requests are known in advance for a certain period, and the objective becomes to maximize the capacity results of the data center network. The problem will be modeled using optimization theory. The main objective would be to find the maximum performance point. This could take the shape of the maximum number of client requests served, the minimum blocking percentage, the minimum average request service tardiness (latency), or the maximum revenue generated over a certain period. An important objective is also to understand the dynamics of the system and find the bottlenecks of the network and computational resources of the data center; network resources both inside and outside the data center will be used.

2.4.2 Dynamic Case

In this case, client requests are not known in advance. The objective is to get the best possible result as measured against the optimal case. We aim at finding heuristic algorithms that achieve good levels of performance in dynamic scenarios. So far, we have suggested five heuristics that can be used in the process of network-based resource allocation and scheduling; detailed steps and comparison results for each are in the attached paper. We aim at enhancing the functionality and improving the performance of these algorithms, along with suggesting more heuristic algorithms that serve the ideas discussed earlier.

2.5 Preliminary Results

In the first step, we modeled the optimization problem of minimizing the average tardiness of all advance reservation connection requests. In this case, we try to reach the least average tardiness possible regardless of which path is used, while satisfying the requirements of the different clients' virtual connection requests.

2.5.1 Detailed formulation

First, the underlying physical network of data centers and client nodes can be described as N(Q, L, M), where:

Q: the set of servers in all the data centers.

L: the set of physical links. We assume each link is divided with a granularity set before the experiment. All links are bidirectional.

M: the server resource matrix. The server resources we consider here are memory size, number of CPUs and storage. Therefore, Mq1 is the amount of memory units on server q, Mq2 is the number of CPUs on server q, and Mq3 is the amount of storage units on server q.
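A minimal sketch of how N(Q, L, M) could be represented in code (Python; the container choices are ours, not part of the formulation):

```python
from dataclasses import dataclass, field

# Resource indices in the server resource matrix M, following the text:
# M[q][0] = memory units, M[q][1] = CPUs, M[q][2] = storage units.
MEMORY, CPUS, STORAGE = 0, 1, 2

@dataclass
class PhysicalNetwork:
    """The network N(Q, L, M): servers Q across all data centers,
    bidirectional links L (capacity granularity fixed before the
    experiment), and the server resource matrix M."""
    servers: list                                    # Q: server identifiers
    links: dict = field(default_factory=dict)        # L: (a, b) -> capacity units
    resources: dict = field(default_factory=dict)    # M: server -> [mem, cpu, storage]

net = PhysicalNetwork(
    servers=["q1", "q2"],
    links={("q1", "q2"): 10},
    resources={"q1": [64, 16, 500], "q2": [32, 8, 250]},
)
print(net.resources["q1"][CPUS])  # 16 CPUs on server q1
```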

2.5.2 Client request types

As discussed earlier, client requests are divided into three categories: 1- a request to reserve or create a VM, where the VM type is defined in the request; 2- a request to establish a connection between a client node and a VM; 3- a request to establish a connection between a VM and another VM. To describe VM reservation requests we will use the following notation: V is the set of vertices, representing the requested VMs. K is a set that describes the amount of resources needed for every requested virtual machine: Kvm is the amount of resource m requested by VM v. For example, the memory required by VM v is Kv1 = 7 GB.

We will also need the set P, which represents the set of all paths that can be used in the network. The connection requests come as part of the problem: every connection request i specifies a source s, a destination d, a requested start time r and a connection duration (time needed) t. Fi represents our scheduled starting time for connection request i. Other variables required for the problem formulation include: alp, which equals 1 if link l is on path p and 0 otherwise; bqcp, which equals 1 if path p is an alternate path from server q to server c and 0 otherwise; and TARD, which represents the allowed tardiness level, measured in time units, for the experiment. This is modeled as a MILP as follows:
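The formulation itself is not reproduced in the text; the following is a sketch reconstructed from the notation above and the constraint walkthrough below, with assumed binary placement variables x_vq (VM v on server q) and path variables y_ip (request i on path p). Constraints (5) and (6)-(7) are written in compact nonlinear form, and the actual MILP would linearize them (e.g., with big-M terms); the original constraints (1) and (8) are not described here, so they are omitted:

```latex
\begin{align}
\text{minimize}\quad & \frac{1}{|R|}\sum_{i \in R} \left(F_i - r_i\right) && \notag\\
\text{s.t.}\quad & \sum_{q \in Q} x_{vq} = 1 && \forall v \in V \tag{2}\\
& \sum_{p \in P} y_{ip} = 1 && \forall i \in R \tag{3}\\
& \sum_{v \in V} K_{vm}\, x_{vq} \le M_{qm} && \forall q \in Q,\; m \in \{1,2,3\} \tag{4}\\
& y_{ip} \le \sum_{q,c \in Q} b_{qcp}\, x_{s_i q}\, x_{d_i c} && \forall i \in R,\; p \in P \tag{5}\\
& a_{lp}\, y_{ip} + a_{lp'}\, y_{jp'} \le 1 \;\;\text{whenever}\;\; [F_i, F_i + t_i) \cap [F_j, F_j + t_j) \ne \varnothing && \forall i \ne j,\; l \in L \tag{6--7}\\
& r_i \le F_i \le r_i + \mathrm{TARD} && \forall i \in R \tag{9}\\
& x_{vq},\, y_{ip} \in \{0,1\} && \notag
\end{align}
```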

In (2), we ensure that a VM will be assigned to only one server. In (3), we ensure that a connection request will be assigned exactly one physical path. In (4), we guarantee that VM resource requirements won't exceed those of the servers they reside on. In (5), we ensure that a connection is established on only one of the legitimate alternate paths from one VM to another. In (6) and (7), we ensure that at most one request can be scheduled on a certain link at a time and that no other request will be scheduled on that link until the duration is finished. In (9), we ensure that the scheduled time for a request is within the tardiness window allowed in this experiment.

Chapter 3

Research Problem II: Energy Efficient Network-based Resource Allocation

As the number and average size of data centers expand, so does their power and energy consumption. Electricity used by servers doubled between 2000 and 2005, from 12 billion to 23 billion kilowatt hours [9]. This is not only due to the increasing number of servers per DC; individual server energy consumption has increased too. Before the year 2000, servers drew on average about 50 watts of electricity; by 2008, they were averaging up to 250 watts [9]. The increase in energy consumption is of major concern to data center owners because of its effect on operational cost. It is also a major concern of governments because of the increase in data centers' carbon footprint. As cloud service providers aim for and expect, the cloud client base is expanding by the day, and this demand will lead to building new data centers and developing current data centers to include more servers and to upgrade existing servers with more functionality and higher power draw. Power-related costs are estimated to represent approximately 50% of the data center operational cost, and they are growing faster than other hardware costs like server or network costs [24]. Thus, energy consumption is proving to be a major obstacle limiting providers' ability to expand. Recently, the response to this fact can be seen in the practical landscape, as major players in the cloud market take more serious steps. Companies as large as Microsoft and Google aim to deploy new data centers near cheap power sources to mitigate energy costs [24]. Leading computing service providers have also formed a global consortium known as The Green Grid, which aims at tackling this challenge by advancing energy efficiency in data centers [13]. This is further pushed by governments in an attempt to decrease carbon footprints and the effect on climate change; for example, the Japan Data Center Council has been established by the Japanese government to mitigate the soaring energy consumption of data centers in Japan [17].

We intend to investigate available data center energy consumption optimization models. Moreover, we will provide a model of our own to tackle the problem of energy consumption in distributed clouds; this will be integrated with our solution to the first problem. We aim at providing an algorithm to distribute the cloud resources in a way that minimizes the energy consumed by these resources. The resources involved will include not just data center computational resources (processors, memory, disks, etc.) but also the power consumption of the cloud network resources. The power needed to establish connections between virtual machines in different data centers, or in the same data center, will be provisioned for. There is a lack of a comprehensive solution for energy-efficient network-based resource allocation, as most existing solutions concentrate on the architecture and power usage of the computational resources.

3.1 Related Work

Multiple solutions have been proposed with the aim of reaching an energy-efficient resource allocation scheme. A common concept is the idea used in [15], where an algorithm is proposed to consolidate different applications on cloud computing data center servers. The idea is to consolidate tasks or VMs on the least number of servers and then switch the unused servers off or change them to the idle state. The problem is modeled as a bin-packing problem, with the assumption that the servers are the bins and that they are full when their resources reach a predefined optimal utilization level. This utilization level is calculated and set beforehand. The resources used are processor and disk. A heuristic is used to allocate workloads to servers (bins); this heuristic tries to maximize the Euclidean distance between the current allocations of the servers and the optimal point of each server. There were no comparisons to the optimal solution. Also, power consumption by network components is not considered, and it is debatable whether an optimal point for each server can be based on utilization alone without considering the type of application.
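A minimal sketch of that heuristic, as paraphrased here (Python; the two-dimensional (CPU, disk) utilization model and the specific optimal-point value are assumptions drawn from the description above, not taken from [15] directly):

```python
import math

def place(workload, utilization, optimal=(0.7, 0.5)):
    """Bin-packing-style placement: each server is a bin over
    (CPU, disk) utilization, considered 'full' once it reaches a
    precomputed optimal point. Among servers that can still host
    the workload, pick the one whose new allocation maximizes the
    Euclidean distance to the optimal point; servers left unused
    can then be switched off or idled."""
    best, best_score = None, -1.0
    for server, (cpu, disk) in utilization.items():
        new = (cpu + workload[0], disk + workload[1])
        if new[0] > optimal[0] or new[1] > optimal[1]:
            continue  # bin is full at the optimal utilization level
        score = math.dist(new, optimal)
        if score > best_score:
            best, best_score = server, score
    return best  # None => wake up / open a new server

print(place((0.2, 0.1), {"s1": (0.3, 0.2), "s2": (0.0, 0.0)}))
```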

Other works took a hardware planning approach to the problem: instead of trying to reach the highest performance possible, the aim is to execute a certain workload with as little energy as possible. The concept used in [27], for example, was to build a cluster of embedded devices that use little energy. The results were acceptable for tasks with low computational content, but this would not suit cloud clients' needs, as such an architecture cannot support applications with high computational demands. In [24], the authors try to produce a hybrid design that mixes low-power platforms with high-performance platforms. They have shown an improvement over the low-power platforms, tested on different categories of tasks (compute intensive, non-compute intensive). The power usage breakdown across system components was not discussed, and network components' energy consumption was not considered.

An economic approach to managing shared resources and minimizing the energy consumption in hosting centers is described in [28]. The authors present a solution that dynamically resizes the set of active servers and responds to thermal or power supply events by downgrading the service within the bounds of the SLA. In practical scenarios, this approach alone would not be sufficient: with tens of thousands of requests arriving every time unit, and with the scheduling component already allocating requests at the lower limit of the SLAs to keep enough resources free, it will not be easy to find active requests that can tolerate having their service downgraded. In [25], the authors consider heterogeneous clusters where a number of servers of different types are deployed in each cluster. They aim at allocating resources while considering the consumed energy. The operation cost of a server is modeled as a constant operating cost plus a cost factor linearly related to the server's utilization in the processing domain. The same calculation method is considered in [26].

In [16], the authors suggest heuristics for dynamic adaptation of VM allocation at run-time according to the current utilization of resources. The authors apply live migration, switching idle nodes to sleep mode, and thus minimize energy consumption. That approach can handle a heterogeneous infrastructure and heterogeneous VMs: "The algorithms do not depend on a particular type of workload and not require any knowledge about applications running in VMs" [16]. However, this approach considers only the CPU among the energy-consuming parts of the system, and it does not consider the energy consumed by network components.

3.2 Motivation

3.2.1 A Comprehensive Solution for Energy Efficient Network-based Resource Allocation

Provisioning for cloud services in a comprehensive way is of crucial importance to any resource allocation model. Any model that aims at allocating resources while minimizing energy consumption in a distributed cloud should consider all sources of energy consumption. The model should include analysis of the power used by the CPU, memory, hard disks, and power supply unit, which are the main power-consuming components in a server. An illustration of the power consumption of these components is shown in Figure 3.1 (based on the results of studies performed by the authors of [9]).

Figure 3.1: Server Power Consumption (Source: Intel Labs, 2008)

The model should also investigate the power consumed by network components to transmit data both inside the data center and outside it (connecting data centers together). Any energy gain obtained from any of these components is an important achievement, since a single data center's operational cost and environmental impact are both very high: an average data center is estimated to consume as much energy as 25,000 households [29].

3.3 Energy Efficient Network-based resource allocation: methodology and design challenges

3.3.1 Research Strategy/Methodology

Our goal for this problem is to focus on the design and validation of a cloud resource allocation system that is energy efficient. We will generally follow the same strategy and apply the same methodologies used for the first problem. We summarize the main adjustments in our approach as follows:

1- No specific restrictions on the data center architecture or the build/type of hardware will be imposed.

2- Next, for each of the energy efficiency challenges discussed hereafter, we will adjust the environment parameters in order to isolate the dominant factors that cause the challenge. The aim here is to find the main points of energy leakage in a data center. These could involve design problems or specific situations/circumstances that maximize power consumption.

3- Then, exploratory solutions will be tested and their effectiveness recorded. As we tackle these challenges one by one, we will gain a better understanding of the data center's moment-to-moment dynamics, which will help in pursuing and constructing a complete solution that gets the best combined result for the issues as per the needed combination function.

4- Our final step will be to integrate this solution with the solution of the first problem to arrive at a full network-based energy-aware resource allocation component that can be deployed as part of any distributed cloud computing management system.

3.3.2 Common solutions and common trade-offs

1- A solution with many variations in the literature is the consolidation of applications on fewer servers. This concept, despite its simplicity, has the potential to impact performance negatively. There are three main issues here:

A- Consolidation could quickly cause I/O bottlenecks. Concentrating VMs increases the competition for physical server resources, which causes performance to suffer, given the high probability of I/O bottlenecks. This threatens the promised performance level and can cause more power consumption because of the latency in task completion.

B- Network bottlenecks: Connection blocking would increase visibly as connections from and to all the consolidated VMs compete for the links available at the physical node where the server is. This will be clear for applications with heavy data transactions, as a higher blocking percentage would be found around the servers carrying the consolidated VMs. This would cause even more latency and would consume more network-related power.

C- The method used to hibernate or shut down the unused servers should be considered: there is the latency caused by the time needed for system hibernation and wake-up, and there is also the power this consumes. If used, consolidation should be part of a more sophisticated solution that takes these issues into consideration, along with client priorities.

2- What about VM migration? This is the core of the consolidation process. The methodology might differ based on VM size and configuration variations. Nevertheless, trade-offs have to be considered between the power gained by moving the VM and hibernating the machine it was on, and the total losses caused by this migration.

These losses include:

A- Time lost moving the VM through the network.

B- Power consumed by network components during the move.

C- Latency of the task completion caused by the changed node on the network and the need to provision new network resources.

D- Cost of bandwidth in case of large sets of data.

Clear decision criteria are needed for knowing when it is beneficial to migrate a VM, considering not only short-term gains but also the long-term situation (a back-of-the-envelope sketch follows).
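One simple version of this break-even test might look as follows (Python; every parameter is illustrative, and a real criterion would also account for the long-term demand situation and losses C and D above):

```python
def migration_worthwhile(vm_gb, link_gbps, host_idle_watts,
                         network_watts, expected_idle_hours,
                         extra_latency_cost_j=0.0):
    """Migrate only if the energy saved by hibernating the vacated
    host over its expected idle period exceeds the energy spent
    moving the VM (network components active during the transfer)
    plus a penalty term for the task-completion latency."""
    transfer_s = vm_gb * 8.0 / link_gbps                 # time on the wire
    migration_cost_j = network_watts * transfer_s + extra_latency_cost_j
    saving_j = host_idle_watts * expected_idle_hours * 3600.0
    return saving_j > migration_cost_j

# 8 GB VM over 1 Gb/s; 150 W idle host expected idle for 2 h;
# 300 W of network gear busy during the ~64 s transfer
print(migration_worthwhile(8, 1.0, 150, 300, 2))  # True
```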

3.3.3 Energy consumption vs. optimal performance: hardware contradictions

The way processors currently work, higher performance (execution speed) is achieved by maximizing the use of the processor cache memory and minimizing the use of main memory and disks, and the number and capacity of cache memory modules is going to increase in the future. In addition, using mechanisms like out-of-order execution, high-speed buses, and support for a large number of pending memory requests increases transistor counts, which leads to more wasted power. Thus, the question arises: where is the optimal point between performance and power consumption in cases like this?

3.3.4 Idle state as a major source of wasted power

Buying or adding new resources will not solve the problem; rather, it complicates it. As calculated by the authors of [15], the power consumption of the main server components (CPU, memory, etc.) starts at a constant and then increases linearly as utilization goes up. In [30], the authors explain the energy waste that happens because of idle servers: "Even at a very low load, such as 10% CPU utilization, the power consumed is over 50% of the peak power." We should be especially wary of this when there is a bottleneck, since all the other idle resources are wasting power. Therefore, we should pay special attention to optimizing all the resources in the DC. We can see that energy consumption in this case is not really additive: adding more resources (servers, etc.) might support performance, but generally it will not help minimize the power consumption of the distributed clouds. On the contrary, unused servers will consume significant power while idle.
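The constant-plus-linear relationship described here is commonly written as follows (a standard approximation consistent with [15] and the quote from [30]; $P_{\text{idle}}$ and $P_{\text{peak}}$ are assumed parameters, not values from this proposal):

```latex
P(u) \;=\; P_{\text{idle}} \;+\; \bigl(P_{\text{peak}} - P_{\text{idle}}\bigr)\, u,
\qquad 0 \le u \le 1 .
```

With $P_{\text{idle}} = 0.5\,P_{\text{peak}}$, a server at $u = 0.1$ already draws $0.55\,P_{\text{peak}}$, over half of peak power, matching the quoted observation.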

3.4 Objective

We have two cases in which we can tackle this problem:

3.4.1 Static case

Here, the system configuration, data center design parameters and the network topology are known. Also, all client requests are known in advance for a certain period, and the objective becomes one of two:

A- To maximize capacity results of the data center network and minimize the total consumed energy.

B- To minimize the consumed energy while maintaining a certain level of request service rate, for example, minimizing the total energy used to allocate and serve N requests such that the blocking rate does not exceed 5% of the total requests that arrived. An important objective is also to understand the dynamics of the system and find the causes of power consumption bottlenecks in the network. Detailed tracking of the network and computational resources will be crucial.

3.4.2 Dynamic Case

In this case, client requests are not known in advance. The objective is to get the best possible result based on the comparison with the optimal case. We aim at finding heuristic algorithms that achieve good levels of performance in dynamic scenarios.

Bibliography

[1] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, Feb. 2009.

[2] "Amazon Elastic Compute Cloud (Amazon EC2),'' available online from : http://aws.amazon.com/ec2/.

[3] S. Maguluri, R. Srikant, and L. Ying, "Stochastic Models of Load Balancing and Scheduling in Cloud Computing Clusters," IEEE INFOCOM 2012 Proceedings, pp. 702-710, 25-30 Mar. 2012.

[4] G. Sun, V. Anand, H. Yu, D. Liao, and L. M. Li, "Optimal Provisioning for Elastic Service Oriented Virtual Network Request in Cloud Computing," IEEE GLOBECOM 2012.

[5] T. D. Wallace, A. Shami, and C. Assi, "Scheduling advance reservation requests for wavelength division multiplexed networks with static traffic demands," IET Communications, 2008, vol. 2, no. 8, pp. 1023-1033.

[6] X. Nan, Y. He, and L. Guan, "Optimal resource allocation for multimedia cloud in priority service scheme," IEEE International Symposium on Circuits and Systems (ISCAS), 2012, pp. 1111-1114.

[7] M. Alicherry and T. V. Lakshman, "Network aware resource allocation in distributed clouds," IEEE INFOCOM 2012 Proceedings, pp. 963-971, 25-30 Mar. 2012.

[8] B. Kantarci and H. T. Mouftah, "Scheduling advance reservation requests for wavelength division multiplexed networks with static traffic demands," IEEE Symposium on Computers and Communications (ISCC), 2012, pp. 806-811, 1-4 Jul. 2012.

[9] L. Minas and B. Ellison, "The Problem of Power Consumption in Servers," prepared by Intel Labs, Dr. Dobb's Journal, Mar. 2009.

[10] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree compared," in Grid Computing Environments Workshop (GCE'08), pp. 1-10. IEEE, 2008.

[11] A. Leinwand, "The Hidden Cost of the Cloud: Bandwidth Charges," http://gigaom.com/2009/07/17/the-hidden-cost-of-the-cloud-bandwidth-charges/, 2009.

[12] T. Dillon, C. Wu, and E. Chang, "Cloud computing: Issues and challenges," in Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pp. 27-33. IEEE, 2010.

[13] The Green Grid consortium, 2011. URL: http://www.thegreengrid.org.

[14] I. Raicu, Y. Zhao, I. T. Foster, and A. Szalay, "Accelerating large-scale data exploration through data diffusion," in Proceedings of the 2008 International Workshop on Data-Aware Distributed Computing, pp. 9-18. ACM, 2008.

[15] S. Srikantaiah, A. Kansal, and F. Zhao, "Energy aware consolidation for cloud computing," in Proceedings of the 2008 Conference on Power Aware Computing and Systems, USENIX Association, 2008.

[16] A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing," Future Generation Computer Systems, vol. 28, no. 5 (2012): 755-768.

[17] Ministry of Economy, Trade and Industry, "Establishment of the Japan Data Center Council," Press Release.

[18] G. Ferro, "OpenFlow and Software Defined Networking," SDN and OpenFlow Webinar by Big Switch Networks, December 2011.

[19] V. Yazici, O. Sunay, and A. O. Ercan, "Controlling a Software-Defined Network via Distributed Controllers," NEM Summit, Istanbul, Turkey, October 2012.

[20] B. Heller, R. Sherwood, and N. McKeown, "The Controller Placement Problem," ACM, 2012.

[21] A. Voellmy and J. Wang, "Scalable Software Defined Network Controllers," SIGCOMM '12, August 2012, Helsinki, Finland.

[22] A. Tavakoli, M. Casado, et al., "Applying NOX to the Datacenter," Hot Topics in Networks Workshop, 2009.

[23] H. Bae, "SDN promises revolutionary benefits, but watch out for the traffic visibility challenge," http://www.networkworld.com/, January 2013.

[24] B.-G. Chun, G. Iannaccone, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini, "An energy case for hybrid datacenters," ACM SIGOPS Operating Systems Review, vol. 44, no. 1, Jan. 2010.

[25] H. Goudarzi and M. Pedram, "Maximizing profit in the computing system via resource allocation," Intl. Workshop on Data Center Performance, Minneapolis, MN, Jun. 2011.

[26] H. Goudarzi and M. Pedram, "Multi-dimensional SLA-Based Resource Allocation for Multi-tier Cloud Computing Systems," in Cloud Computing (CLOUD), 2011 IEEE International Conference on, pp. 324-331, 4-9 Jul. 2011.

[27] D. G. Andersen et al., "FAWN: A fast array of wimpy nodes," in SOSP, 2009.

[28] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing energy and server resources in hosting centers," presented at the 18th ACM Symposium on Operating Systems Principles (SOSP'01), October 21, 2001.

[29] J. Kaplan, W. Forrest, and N. Kindler, "Revolutionizing Data Center Energy Efficiency," McKinsey & Company, Tech. Rep.

[30] G. Chen et al., "Energy-aware server provisioning and load dispatching for connection-intensive internet services," in NSDI, 2008.

[31] SCOPE Alliance, "Telecom grade cloud computing," www.scopealliance.org, 2011.

[32] R. Van den Bossche, K. Vanmechelen, and J. Broeckhove, "Cost-Optimal Scheduling in Hybrid IaaS Clouds for Deadline Constrained Workloads," IEEE 3rd International Conference on Cloud Computing (CLOUD), 2010, pp. 228-235, 5-10 July 2010.

[33] "The carrier cloud: driving internal transformation and new cloud revenue," strategic white paper, Alcatel-Lucent, 2011.

[34] I. Monga, "Software-Defined Networking: A view from the Summer Joint Techs focus day," Energy Sciences Network, August 14, 2012.

[35] G. Ferro, "OpenFlow and Software Defined Networking," SDN and OpenFlow Webinar by Big Switch Networks, December 2011.

[36] A. Tootoonchian, S. Gorbunov, et al., "On Controller Performance in Software-Defined Networks," USENIX Association, Berkeley, CA, USA, 2012.

[37] Ontario Ministry of Health and Long-Term Care, Health Information Protection Act, 2004.

[38] E. Moyle, "Why Cloud Computing Changes the Game for HIPAA Security," http://www.technewsworld.com/story/72291.html.

[39] P. Rudo, "How Cloud Computing Affects HIPAA Compliance," http://enterprisefeatures.com/2011/08/how-cloud-computing-affects-hipaa-compliance/, published 28/08/11.

[40] S. Gittlen, "Bandwidth bottlenecks loom large in the cloud," http://www.computerworld.com, Jan. 4, 2012.

[41] J. Frey, "Network Management and the Responsible, Virtualized Cloud," research report, Feb. 2011.