CHAPTER 5
PARALLEL GENETIC ALGORITHM AND COUPLED
APPLICATION USING COST OPTIMIZATION
5.1 INTRODUCTION
Cloud Computing provides on-demand access to resources over the network. The main characteristics of the virtualization technologies employed in the Cloud environment are the consolidation and proficient management of resources. The current work employs an optimized scheduling algorithm, which concentrates on the efficient utilization of resources for cloud scheduling problems. A Parallel Genetic Algorithm with the Dynamic Deme model is used for scheduling the resources dynamically. The investigation shows that the scheduling procedure improves both the utilization rate of the system resources and the pace of resource allotment. The user can access the computing resources as general utilities, which can be acquired and released at any time.
Easy access to Cloud resources enables the simultaneous use of many clouds. The system analyzes the viability, from the viewpoint of scalability, performance and cost, of deploying large virtual cluster infrastructures distributed over different cloud providers for solving loosely coupled Many Task Computing (MTC) applications. The performance of different cluster configurations can be evaluated using the cluster throughput as a performance metric.
5.2 PARALLEL GENETIC ALGORITHM FOR RESOURCE
SCHEDULING IN CLOUD
Resource scheduling is a crucial process in cloud services such as IaaS. The existing approach used a Parallel Genetic Algorithm (PGA) for resource allocation and utilization of system resources. The proposed model therefore addresses a novel approach, PGA with Dynamic Deme, for efficiently scheduling the resources in the cloud environment dynamically.
The most important advantage of PGAs is that in many cases they
provide better performance than single population-based algorithms, even
when the parallelism is simulated on conventional machines. The reason is
that, multiple populations permit speciation, a process by which different
populations evolve in different directions. For these reasons, PGAs are not only an extension of the traditional sequential GA model; they represent a new class of algorithms that search the space of solutions differently.
This proposed work focuses on the analysis of the performance of
the Dynamic Demes algorithm for cloud resource scheduling in an efficient
manner. The investigation shows that the scheduling procedure improves both the utilization rate of the system resources and the pace of resource allotment.
5.2.1 Architecture Diagram
Figure 5.1 explains the new architecture of the Parallel Genetic Algorithm (PGA), which is scalable to the large systems commonly found in clouds. The initial step begins with analyzing the process using the simulation kit. From that, the input resources and instance requests can be divided, which form the input of the PGA with the Dynamic Deme model. The next block consists of the PGA with the coarse-grained and Dynamic Deme model, which performs the optimization of resource allocation; the allocated resources are recorded in the allocation sequence block. The performance parameters of these resources are gathered and sent to the performance analysis block, where the actual performance evaluation is carried out. The next block, the performance report block, is meant for logging performance. After the performance report is taken, the allocation of resources is terminated, which is indicated by the termination block.
Figure 5.1 Architecture Diagram of Parallel Genetic Algorithm
[Figure blocks: Initiate the Process using the Simulation tool → Input Resource & Instance Request → PGA Scheduler with Coarse-Grained & Dynamic Deme model → Allocation Sequence → Performance Analysis → Performance Report → Termination]
5.2.2 Methodology
The management of a cloud leaves providers with difficult tasks of
dynamically provisioning a large-scale system to meet customer’s demands.
Traditional optimization techniques cannot properly handle the scale of
leading cloud environments. The research examines stochastic optimization
strategies using Parallel Genetic Algorithm which are scalable to the large
systems and commonly found in clouds to optimize utilization of available
servers and improve the timely service of customer requests.
The basic idea behind most parallel programs is to divide a task
into chunks and to solve the chunks simultaneously using multiple processors.
This divide-and-conquer approach can be applied to GAs in many different
ways, and the literature contains many examples of successful parallel
implementations. Some parallelization methods use a single population, while
others divide the population into several relatively isolated subpopulations.
Some methods massively exploit parallel computer architectures, while others
are better suited to multicomputer with fewer and more powerful processing
elements. A novel attempt of implementing Parallel Genetic Algorithm with
the Dynamic Deme model has been absorbed in the current work for
scheduling the resources.
This method principally aims at allocating the resources, in a more
competent way by utilizing the available resources in Cloud Environment
(IaaS). Allocation of the resource is based on the instance request, provided
by the user. The PGA Scheduler uses the Dynamic Deme Model of the
Parallel Genetic Algorithm for scheduling the resources. The project is
implemented in java language with the help of the Integrated Development
Environment (IDE) Jcreator.
5.2.2.1 Genetic Algorithm
Genetic algorithms are inspired by Darwin's theory about evolution.
Genetic Algorithms (GAs) are efficient search methods based on principles of
natural selection and genetics. GAs are generally able to find good solutions
in reasonable amount of time, but as they are applied to harder and bigger
problems, there is an increase in the time required to find adequate solutions.
As a consequence, there have been multiple efforts to make GAs faster, and
one of the most promising choices is to use parallel implementations.
Components of a Genetic Algorithm
• Encoding technique
• Initialization procedure
• Evaluation function
• Selection of parents
• Genetic operators
A typical algorithm might consist of the following:
A number of randomly chosen guesses of the solution to
problem - the Initial Population.
A means of calculating how good or bad each guess is within
the population - a Population Fitness Function.
A method for mixing fragments of the better solutions to form
new and on average even better solutions - Crossover.
An operator to avoid permanent loss of (and to introduce new)
diversity within the solutions - Mutation.
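The components listed above can be illustrated with a minimal, self-contained sketch (not the thesis implementation): a GA that evolves bit-strings toward all ones, with random initialization, a population fitness function, tournament selection of parents, single-point crossover and mutation. All names and parameter values here are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal GA sketch: evolves bit-strings toward all-ones, illustrating
// initialization, fitness evaluation, selection, crossover and mutation.
public class SimpleGA {
    static final Random RNG = new Random(42);

    static int fitness(boolean[] ind) {            // population fitness function
        int f = 0;
        for (boolean b : ind) if (b) f++;
        return f;
    }

    static boolean[] crossover(boolean[] a, boolean[] b) {  // single-point crossover
        int cut = 1 + RNG.nextInt(a.length - 1);
        boolean[] child = Arrays.copyOf(a, a.length);
        System.arraycopy(b, cut, child, cut, a.length - cut);
        return child;
    }

    static void mutate(boolean[] ind, double rate) {        // flip bits with given rate
        for (int i = 0; i < ind.length; i++)
            if (RNG.nextDouble() < rate) ind[i] = !ind[i];
    }

    static boolean[] tournament(boolean[][] pop) {          // selection of parents
        boolean[] a = pop[RNG.nextInt(pop.length)];
        boolean[] b = pop[RNG.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    public static boolean[] run(int popSize, int genes, int generations) {
        boolean[][] pop = new boolean[popSize][genes];      // random initial population
        for (boolean[] ind : pop)
            for (int i = 0; i < genes; i++) ind[i] = RNG.nextBoolean();
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[popSize][];
            for (int i = 0; i < popSize; i++) {
                boolean[] child = crossover(tournament(pop), tournament(pop));
                mutate(child, 0.01);
                next[i] = child;
            }
            pop = next;
        }
        boolean[] best = pop[0];
        for (boolean[] ind : pop) if (fitness(ind) > fitness(best)) best = ind;
        return best;
    }

    public static void main(String[] args) {
        System.out.println(fitness(run(30, 20, 50)));
    }
}
```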
5.2.2.2 Parallel Genetic Algorithm
For some kind of problems, the population needs to be very large
and the memory required to store each individual may be considerable. In
some cases this makes it impossible to run an application efficiently using a
single machine, so some parallel form of GA is necessary. Fitness evaluation
is usually time-consuming and the only practical way to provide the required
CPU power is to use parallel processing. The most important advantage of
Parallel Genetic Algorithms (PGA) is that in many cases they provide better
performance than single population-based algorithms, even when the
parallelism is simulated on conventional machines. The reason is that,
multiple populations permit speciation, a process by which different
populations evolve in different directions. For these reasons, Parallel GAs are not only an extension of the traditional sequential GA model; they represent a new class of algorithms that search the space of solutions differently.
Master-Slave Parallelisation: The master-slave parallelisation method is one of the first successful applications of parallel GAs. It is also known as global parallelisation, the master-slave model, or distributed fitness evaluation.
The algorithm uses a single population and the evaluation of
individuals and the application of genetic operators are
performed in parallel. The selection and mating are done globally; hence each individual may compete and mate with any other individual. The operation that is most commonly parallelised is the evaluation of the fitness function, because it normally requires only knowledge of the individual being evaluated (not the whole population), and so there is no
need to communicate during this phase. This is usually
implemented using master slave programs, where the master
stores the population and the slaves evaluate the fitness, apply
mutation, and sometimes exchange bits of the genome (as part
of crossover). Parallelisation of fitness evaluation is done by
assigning a fraction of the population to each of the processors
available (in the ideal case one individual per processing
element). Communication occurs only as each slave receives
the individual (or subset of individuals) to evaluate and when
the slaves return the fitness values, sometimes after mutation
has been applied, with the given probability.
The algorithm is said to be synchronous, if the master stops
and waits to receive the fitness values for all the population,
before proceeding with the next generation. A synchronous
master-slave GA has exactly the same properties as a simple
GA, except for its speed, i.e. this form of parallel GA carries
out exactly the same search as a simple GA.
An asynchronous version of the master-slave GA is also
possible. In this case, the algorithm does not stop to wait for
any slow processors. For this reason the asynchronous master
slave PGA does not work exactly like a simple GA, but is
more similar to parallel steady-state GAs. The difference lies
only in the selection operator. In an asynchronous master-
slave algorithm, selection waits until a fraction of the
population has been processed, while in a steady-state GA
selection does not wait, but operates on the existing
population. A synchronous master-slave PGA is relatively
easy to implement and a significant speedup can be expected,
if the communication cost does not dominate the computation
cost.
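The synchronous master-slave scheme described above can be sketched as follows, assuming a thread pool stands in for the slave processors and a trivial stand-in fitness function; this is an illustration of the idea, not the scheduler implemented in this work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of synchronous master-slave (global) parallelisation: the master
// holds a single population and farms out only the fitness evaluations to
// a pool of slave threads, then blocks until every value has returned.
public class MasterSlaveEval {
    // Stand-in fitness function; assumed expensive in a real application.
    static int fitness(int[] ind) {
        int f = 0;
        for (int gene : ind) f += gene;
        return f;
    }

    public static int[] evaluateAll(int[][] population, int slaves) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(slaves);
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int[] ind : population) tasks.add(() -> fitness(ind));
        // invokeAll is synchronous: the master waits for the slowest slave,
        // so the search is identical to a simple GA, only faster.
        List<Future<Integer>> results = pool.invokeAll(tasks);
        pool.shutdown();
        int[] values = new int[population.length];
        for (int i = 0; i < values.length; i++) values[i] = results.get(i).get();
        return values;
    }

    public static void main(String[] args) throws Exception {
        int[][] pop = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(java.util.Arrays.toString(evaluateAll(pop, 2)));
        // prints [6, 15]
    }
}
```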
Drawback: However, there is a classical bottle-neck effect.
The whole process has to wait for the slowest processor to
finish its fitness evaluations. After that, the selection operator
can be applied. The asynchronous master-slave PGA overcomes this but, as stated before, the algorithm significantly changes the GA dynamics and, as a result, is difficult to analyse.
Subpopulations with Migration: The important
characteristics of the class of static subpopulations with
migration parallel GAs are the use of multiple demes and the
presence of a migration operator. Multiple-deme GAs are the most popular parallelization method, and many concepts have been proposed describing the details of their implementation.
These algorithms are usually referred to as subpopulations
with migration, static subpopulations, multiple-deme GAs,
coarse-grained GAs and even just parallel GAs. This
parallelisation method requires the division of a population
into some number of demes (subpopulations). Demes are
separated from one another (geographic isolation), and
individuals compete only within a deme. An additional
operator called migration is introduced from time to time;
some individuals are moved (copied) from one deme to
another. If individuals can migrate to any other deme, the
model is called an island model. If individuals can migrate
only to neighbouring demes, it is termed the stepping-stone model. There are other possible migration models.
The migration of individuals from one deme to another is governed by a topology that defines the connections between the subpopulations. Commonly used topologies include the hypercube, two- and three-dimensional meshes, the torus, etc. The migration scheme controls how many individuals migrate (the migration rate), which individuals from the source deme (best, worst, random) migrate to another deme, and which individuals are replaced (worst, random, etc.). A migration interval determines the frequency of migrations.
Coarse-grained algorithms are a general term for a subpopulation model with a relatively small number of demes with many individuals. These models are characterised by the relatively long time they require for processing a generation within each ("sequential") deme, and by their occasional communication for exchanging individuals. Coarse-grained parallel GAs are sometimes known as distributed GAs, because they are usually implemented on distributed-memory Multiple Instruction Multiple Data (MIMD) computers. This approach is also well suited to heterogeneous networks.
Fine-grained algorithms function in the opposite way. They require a large number of processors, because the population is divided into a large number of small demes. Inter-deme communication is realised either by using a migration operator or by using overlapping demes. Recently, the term fine-grained GAs was redefined and is now used to indicate massively parallel GAs.
Constraint: The multiple-deme model presents one problem: scalability. If one has only a few machines, it is efficient to use a coarse-grained model. However, if one has hundreds of machines available at a time, it is difficult to scale up the size and number of subpopulations so as to use the hardware platform efficiently. Despite this problem, the multiple-deme model is very popular. From the implementation point of view, multiple-deme GAs are simple extensions of the serial GA. It is enough to take a few conventional (serial) GAs, run each of them on a node of a parallel computer, and apply migration at some predetermined times.
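The migration operator of the stepping-stone variant can be sketched as follows; the demes, the ring topology, the "migrate best, replace worst" policy, and the use of an integer's value as its own fitness are all illustrative assumptions, not details from this work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the stepping-stone migration operator: each deme periodically
// copies its best individual into the next deme on a ring, replacing that
// deme's worst individual. Fitness here is just the integer value itself.
public class SteppingStoneMigration {
    public static void migrate(List<List<Integer>> demes) {
        int n = demes.size();
        int[] best = new int[n];                               // snapshot before copying
        for (int d = 0; d < n; d++) best[d] = Collections.max(demes.get(d));
        for (int d = 0; d < n; d++) {
            List<Integer> target = demes.get((d + 1) % n);     // neighbouring deme only
            target.set(target.indexOf(Collections.min(target)), best[d]); // replace worst
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> demes = new ArrayList<>();
        demes.add(new ArrayList<>(Arrays.asList(1, 9)));
        demes.add(new ArrayList<>(Arrays.asList(2, 5)));
        migrate(demes);
        System.out.println(demes); // deme 0's best (9) now also lives in deme 1
    }
}
```

An island model differs only in the choice of target: individuals may migrate to any deme, not just the ring neighbour.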
Dynamic Demes: Dynamic Demes is a new parallelization
method for GAs which allows the combination of global
parallelism with a coarse-grained GA. In this model, there is
no migration operator as such, because the whole population
is treated during evolution, as a single collection of
individuals, and information between individuals is exchanged
via a dynamic reorganization of the demes, during the
processing cycles. From the parallel processing point of view, the dynamic demes approach fits perfectly into the MIMD category (Flynn's classification) as an asynchronous multiple master-slave algorithm. The main idea behind this approach is to cut down the waiting time for the last (slowest) individuals to arrive in the master-slave model, by dynamically splitting the population into demes, which can then be processed without delay. This is efficient in terms of processing speed.
In addition, the algorithm is fully scalable. Starting from global parallelism with fitness-processing distribution, one can scale the algorithm up to a fine-grained version, with few individuals within each deme and large numbers of demes. The algorithm can be used on shared- and distributed-memory parallel machines. Its scalability can prove vital in systems with a few Processing Elements, as well as in massively parallel systems with a large number of Processing Elements, and everything in between. Dynamic Demes (DD) is a scalable and easily implemented method of GA parallelization.
Advantage: The main advantages of dynamic demes are:
• High scalability and flexibility (DDs can be used to implement a broad range of algorithms, from coarse-grained to highly fine-grained models)
• Fault tolerance (some of the processors can crash, but the algorithm will correctly continue the operation)
• Dynamic load balancing and easy monitoring.
Algorithm Description: Each individual is represented by a separate process, called a slave, which is capable of performing the following:
i) Fitness evaluation
ii) Applying mutation to itself (with a predefined mutation
rate)
iii) Performing crossover with another individual (this is
done by passing to each individual, the process ID of
another individual, with which it should perform
crossover)
All the individuals run concurrently. The ideal case is when a single processing element processes a single individual. There are additional processes, called masters, which are responsible for selection and mating. Masters handle a fixed fraction of the population and apply selection and mating to it. Therefore, each master represents a separate deme. However, unlike other PGAs, in Dynamic Demes (DD) the individuals belonging to each deme change dynamically. The number of masters is a parameter of the algorithm. If there is only one master, DD functions as a classic distributed fitness evaluation algorithm. Each master process performs selection and mating concurrently with the other masters.
Mating requires sending the appropriate slave ID to the individuals
chosen for crossing over. When the slaves receive a partner ID they perform
crossover, and then proceed with fitness evaluation and mutation. In addition
to masters and slaves, there is also a process (possibly more) responsible for
load balancing, called counters. After crossover, fitness evaluation and
mutation, each individual is dynamically assigned to a deme (possibly
different from the one it belonged to previously). This happens when the
individual notifies the counter process. The counter process knows which master processes are currently idle, waiting for their subpopulations to be filled, and it sends the individual the process ID of one such master.
The last process within the system is called the sorter. This process is informed by each individual finishing its evaluation, takes its genotype and fitness, and saves them in appropriate log files. The sorter process is also responsible for stopping the search when a termination criterion is met.
5.2.2.3 Modules
The flow of the work consists of three different modules: resource specification, followed by the execution of the PGA, resulting in the allocation sequence. The virtual machine request is provided as the input parameter to the system, along with the number of iterations to be carried out for the execution of the genetic algorithm. Based on the virtual machine, the specified cloudlets are created. The scheduling of the resource is carried out using the parallel genetic algorithm, and the final allocation sequence of the resources to the instance requests is obtained when the simulation ends.
5.2.2.4 Modules Description
i) Creation of Cloud Environment: The initial process is used
to create the Cloud Environment for the execution of the
algorithm with the help of the simulation tool.
ii) Resource Listing: The available resource list is updated when allocation or de-allocation of resources takes place. Requests from the clients are collected and updated in the VM request list whenever new VM requests arrive. Each request is identified by a separate VM id. The instance request is provided in terms of Cloudlet ids. The Cloudlet id is defined in the system based on the Virtual Machine request.
iii) Sequencing: A Parallel Genetic Algorithm is implemented using the Dynamic Deme model, to calculate the fitness and to find the optimal allocation sequence among the available pool of resources. The genetic operations are performed using the concept of threads. A thread called Slave performs the fitness evaluation, mutation and crossover operations. Another thread, called Master, performs the selection operation and the mating process. A third thread performs the load balancing operation, and the last thread indicates the process of termination. Based on the instance request, the PGA finds the optimal resource from the available resources.
iv) Allocation: This module focuses on launching the optimal resource provided by the PGA for the corresponding instance requests. The VM id which is most optimal for the instance request is assigned. Likewise, all the VM ids are assigned to the corresponding instance requests based on the expected resource.
Figure 5.2 Modules of the Simulator
5.2.2.5 Procedure for PGA Implementation
The implementation of the proposed concept is in the form of a simulation using the simulation tool CloudSim. In the simulation, a PGA simulator acts as a scheduler for the cloud. The goal of the scheduler is to find the allocation sequence for each computing node in a cloud, so that the instances run on the proper physical computers. The automated scheduling model is divided into three steps. First, the scheduler updates the available resource list when allocation or de-allocation happens, and updates the VM request list each time new VM requests come. Then, the scheduler uses a PGA to find a fit and economical allocation. Finally, the cloud launches the corresponding VMs at the physical resources for the VM requests.
[Figure blocks: IR Request → Scheduler (PGA Algorithm) → Allocation Listing → Computing Resource]
To run a GA, the two most important factors are Chromosome
Representation and Fitness Function Evaluation.
Chromosome representation: Integer notation is used to represent the computing resources. The chromosome pattern maps the corresponding IRs to the VMs; for example, in the chromosome (7 2 1), the first request is assigned to VM id 7, the second to VM id 2, and so on.
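The integer encoding can be read off directly: position i of the chromosome holds the VM id assigned to the i-th instance request. A minimal sketch (class and method names are illustrative assumptions):

```java
// Sketch of the integer chromosome encoding: position i holds the VM id
// assigned to instance request i, so (7 2 1) maps IR0->VM7, IR1->VM2, IR2->VM1.
public class Chromosome {
    public static int vmFor(int[] chromosome, int instanceRequest) {
        return chromosome[instanceRequest];
    }

    public static void main(String[] args) {
        int[] chromosome = {7, 2, 1};
        System.out.println(vmFor(chromosome, 0)); // prints 7
        System.out.println(vmFor(chromosome, 2)); // prints 1
    }
}
```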
Fitness Function Evaluation: The fitness function provides the mechanism for evaluating each chromosome in the problem domain. It is calculated as a summation and is used to select the best resource series; upon selecting the best resource series, the basic genetic operations are performed.

F = Σ_{j=1..m} Σ_{i=1..n} C_ij X_ij,   where i = 1, 2, …, n and j = 1, 2, …, m   (5.1)
F is the total fitness of an allocation scheme, m represents the number of nodes, and n the number of instance requests. The value of X_ij is either 0 (if the i-th IR is not assigned to the j-th node) or 1 (if the i-th IR is assigned to the j-th node).
C_ij = Σ_{K=1..3} P_K   (5.2)

where P_K = a if VM_K / node_K = 1
      P_K = b if VM_K / node_K < 1
      P_K = c if VM_K / node_K > 1
K is a label: when K equals 1 it represents the CPU, when K equals 2 it represents the memory capacity, and K equal to 3 represents the capacity of the disk. C_ij is the fitness of assigning the i-th IR to the j-th node, and F is the total fitness of an allocation scheme. Finally, a large constant is added to the calculated fitness value to keep it positive.
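Equations (5.1) and (5.2) can be sketched in code as follows; the reward constants a, b, c and the resource figures are illustrative assumptions, not values from this work.

```java
// Sketch of the fitness computation of Equations (5.1)-(5.2). The reward
// constants and the demand/capacity figures below are illustrative only.
public class AllocationFitness {
    static final double A = 3, B = 2, C = 1;  // P_K for exact fit, under-use, over-commit

    // C_ij of Eq. (5.2): fitness of assigning one IR (demands) to one node
    // (capacities), summed over K = 1 (CPU), 2 (memory), 3 (disk).
    static double cellFitness(double[] demand, double[] capacity) {
        double c = 0;
        for (int k = 0; k < 3; k++) {
            double ratio = demand[k] / capacity[k];
            c += (ratio == 1) ? A : (ratio < 1 ? B : C);
        }
        return c;
    }

    // F of Eq. (5.1): X_ij = 1 only where the chromosome assigns IR i to node j.
    static double totalFitness(double[][] demands, double[][] capacities, int[] assign) {
        double f = 0;
        for (int i = 0; i < demands.length; i++)
            f += cellFitness(demands[i], capacities[assign[i]]);
        return f;
    }

    public static void main(String[] args) {
        double[][] demands = {{2, 4, 10}, {1, 2, 20}};
        double[][] capacities = {{2, 4, 10}, {4, 4, 10}};
        // IR 0 -> node 0: exact fit on all three resources, C = 3a = 9
        // IR 1 -> node 1: CPU and memory under-used, disk over-committed, C = 2b + c = 5
        System.out.println(totalFitness(demands, capacities, new int[]{0, 1})); // prints 14.0
    }
}
```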
Genetic Operations: The basic operations include replication,
crossover and mutation. Usually single point crossover is
performed. The above mentioned procedure is repeated
concurrently till the optimal solution is obtained.
Steps for Dynamic Deme Algorithm Execution:
1. Input: initial population of individuals.
2. Evaluate the fitness of all individuals.
3. While the termination condition is not met, do
{
4. The slave processes perform (i) fitness evaluation, (ii) mutation and (iii) crossover.
5. The master processes perform (i) selection and (ii) mating.
6. The counter process performs the dynamic re-organization of the population.
7. The sorter process copies the fitness values and is responsible for terminating the search.
}
End while
The master and slave processes run concurrently.
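The loop above can be sketched with threads, assuming a completion service plays the role of the counter: slave tasks finish in arbitrary order, and each finished individual joins whichever deme is currently filling, so deme membership follows arrival order rather than a fixed assignment. This is an illustrative skeleton, not the CloudSim implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the Dynamic Deme cycle: slaves evaluate concurrently and, as
// each finishes, the "counter" assigns it to the deme currently filling.
public class DynamicDemeSketch {
    public static List<List<Integer>> formDemes(int[] individuals, int demeSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        for (int id : individuals)   // slave task: stand-in for fitness evaluation
            done.submit(() -> { Thread.sleep((long) (Math.random() * 20)); return id; });

        List<List<Integer>> demes = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < individuals.length; i++) {
            current.add(done.take().get());          // counter: next finished individual
            if (current.size() == demeSize) {        // deme full -> hand to a master
                demes.add(current);                  // (selection/mating would run here)
                current = new ArrayList<>();
            }
        }
        pool.shutdown();
        return demes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formDemes(new int[]{1, 2, 3, 4, 5, 6}, 3, 3));
    }
}
```

Because membership depends on completion order, the printed demes vary from run to run; no individual ever waits for a slow peer, which is the point of the approach.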
5.2.2.6 Performance Evaluation
The performance of the PGA with the DD model was compared with that of the existing PGA algorithm, using the evaluation time for resource allocation, under various resource requests, as the performance metric. Figure 5.3, shown below, gives the details of the Cloud Environment created for the proposed Dynamic Deme model using CloudSim, and Figure 5.4 shows the simulation of the DD algorithm for resource allocation using the PGA.
Figure 5.3 Cloud Environments for Dynamic Deme
Figure 5.4 Simulations for Dynamic Deme Algorithm
Table 5.1, shown below, gives the evaluation time required for the number of tests conducted for resource allocation using the existing PGA method and the proposed PGA with DD model. The results show that the evaluation time consumed for resource allocation using PGA with DD is comparatively better than that of the existing approach.
Table 5.1 Performance Evaluation for the Resource Allocation in PGA with DD

No. of Tests   PGA Evaluation Time (ms)   PGA with DD Evaluation Time (ms)
      5                 3000                         2800
      6                 3800                         3200
      7                 4000                         3700
      8                 4500                         4200
      9                 5000                         4600
     10                 5600                         4800
     11                 6050                         5100
     12                 6400                         5300
Based on the obtained evaluation times for the number of tests conducted, the graph has been plotted as shown in Figure 5.5, with the number of tests on the X-axis and the evaluation time on the Y-axis. The performance curve clearly shows that the proposed method takes considerably less time to allocate resources in the cloud environment.
Figure 5.5 Evaluation Time for Number of Tests
5.3 COUPLED APPLICATIONS USING COST OPTIMIZATION
Cloud computing technologies can offer important benefits for IT organizations and data centers running MTC applications. In the earlier system, the challenges and viability of deploying computing clusters for loosely coupled MTC applications are analyzed with the help of three different Cloud networks: private, public and hybrid. The system analyzes the performance of different cluster configurations, using the cluster throughput as the performance metric.

The Multi-Cloud deployment involves several challenges. A performance and cost analysis is carried out for different configurations of a real implementation of a multi-cloud cluster infrastructure running a real workload. However, due to hardware limitations in the local infrastructure, and the high cost of renting many cloud resources for long periods, the tested cluster configurations are limited to a reduced number of computing resources (up to 16 worker nodes in the cluster) running a reduced number of tasks (up to 128 tasks).
The upfront challenges are the constraints of the Cloud Interface Standard, the distribution and management of the service master and images, and the interconnection links between the service components. The clusters are deployed in a hybrid setup, which combines local physical nodes with virtual nodes deployed in another compute cloud. Comparing the different cluster configurations proves the viability and cost effectiveness of the Multi-Cloud solution.
5.3.1 Architecture Diagram
Figure 5.6 Hybrid Cluster Architecture Diagram
A new approach for hybrid clusters, the Path Clustering Heuristic (PCH) algorithm, is used for the initial schedule, to overcome the above stated problems and to achieve cost optimization. Hybrid Cloud systems are a novel research challenge arising from the merging of private and Public Clouds. In this method, the different cluster configurations are considered with the PCH algorithm dynamically, and the cluster nodes can be provisioned with resources from different clouds, to improve the cost effectiveness of the deployment or to implement high-availability strategies.
[Figure blocks: Submitted Jobs for Processing → Clustering Jobs → Private Cluster / Public Cluster / Hybrid Cluster]
5.3.2 Methodology
The implementation of the PCH algorithm has different modules, which are as follows:
5.3.2.1 Modules
The modules are,
i) Creation of Cloud Environment
ii) Implementation of Scheduling process in Private Cloud
iii) Implementation of Scheduling process in Public Cloud
iv) Implementation of Scheduling process in Hybrid Cloud using
PCH algorithm
5.3.2.2 Modules Description
The module descriptions for the above stated modules are as
follows:
i) Creation of Cloud Environment: Creating a Cloud network model for the simulation of the new approach is the initial step. The cloud computing paradigm is widely used for the execution of many types of applications, including ones with data dependencies, which can be represented by workflows. To execute such workflow applications in a hybrid cloud, the scheduling algorithm must take both cost and execution time into consideration; both play an important role in the cloud environment.
High-availability and fault tolerance: The cluster worker nodes can be spread over different cloud sites. In case of cloud downtime or failure, the cluster operation will not be disrupted. Furthermore, in this situation, it is possible to dynamically deploy new cluster nodes in a different cloud, to avoid degradation of the cluster performance.
Infrastructure cost reduction: Different cloud providers can follow different pricing strategies, and even variable pricing models (based on the level of demand for a particular resource type, daytime versus night-time, weekdays versus weekends, spot prices, and so forth). The different cluster nodes can change their locations dynamically, from one cloud provider to another, in order to reduce the overall infrastructure cost.
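The relocation idea above can be sketched as a simple "pick the cheapest provider" decision; the provider names and hourly prices below are purely illustrative assumptions, not real quotes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of cost-driven placement: given hypothetical hourly prices per
// provider, a cluster node would be relocated to the cheapest cloud.
public class CheapestProvider {
    public static String cheapest(Map<String, Double> hourlyPrice) {
        return Collections.min(hourlyPrice.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Double> prices = new HashMap<>();
        prices.put("provider-A-ondemand", 0.10);  // illustrative prices only
        prices.put("provider-B-spot", 0.04);
        prices.put("local-cluster", 0.07);
        System.out.println(cheapest(prices)); // prints provider-B-spot
    }
}
```

In a variable-pricing setting, the same comparison would simply be re-run at each migration interval with refreshed prices.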
A flexible and generic cluster architecture that combines the use of virtual machines and cloud computing dynamically delivers heterogeneous computational environments. Moreover, the introduction of a new virtualization layer between the computational environments and the physical infrastructure makes it possible to adjust the capacity allocated to each environment and to supplement it with resources from an external cloud provider.
ii) Implementation of Scheduling Process in Private Cloud: The scheduling process for the Private Cloud network model is implemented. A Private Cloud is infrastructure operated solely for a single organization, whether managed internally or by a third party, and hosted internally or externally. Here, resources can be accessed and used by individuals inside an organization, similar to data farms or private grids. The scheduler also tries to balance the use of private resources with the ones available from the Public Cloud.
Private Cloud (also called internal cloud or corporate cloud) is a marketing term for a proprietary computing architecture that provides hosted services to a limited number of people behind a firewall. The Condor job queue is monitored in real time, and virtual machines belonging to individual Virtual Organizations are provisioned and booted. Jobs belonging to each Virtual Organization are then executed on the organization-specific virtual machines, which form a cluster dedicated to the specific organization.
Once the queued jobs have been executed, the virtual machines are terminated, thereby allowing the physical resources to be reclaimed. Tests of this system were conducted using synthetic workloads, which demonstrate that dynamic provisioning of virtual machines preserves system throughput for all but the shortest-running grid jobs, without undue increase in scheduling latency. The deployment requires root privileges on remote resources, which makes dynamic deployment on those sites difficult.
iii) Implementation of Scheduling Process in Public Cloud: The implementation of the scheduling process for the Public Cloud network model is considered in this module. A Public Cloud is one based on the standard cloud computing model, in which a service provider makes resources, such as applications and storage, available to the general public over the internet.
Public Cloud services may be free or offered on a pay-per-usage model. Public Cloud describes cloud computing in the traditional mainstream sense, whereby resources are dynamically provisioned to the general public on a fine-grained, self-service basis over the internet, via web applications/web services, or from an off-site third-party provider who bills on a fine-grained utility computing basis. To users and applications, the process of borrowing nodes is transparent.
A VM running as part of a VioCluster is practically
indistinguishable from a physical machine running inside the
same domain. Dynamic machine trading is activated between
mutually isolated virtual domains. VioCluster creates
software-based network components, which seamlessly
connect physical and virtual machines, to create isolated
virtual domains. Machines can be traded dynamically, through
the on-demand creation, deletion, and configuration of VMs
and network components.
Dynamic negotiation of machine trades: Each virtual
domain includes a machine broker which interacts with
other domains. Requests and offers are made through
these brokers based on workload and on configurable
lending and borrowing policies. A prototype of the
VioCluster system has demonstrated its effectiveness
using two independent Portable Batch System (PBS)
based job-execution clusters. The performance
evaluation results show benefits to both clusters:
their resource utilization increases and their job
execution times decrease.
Physical Domain: An autonomous set of networked
computers managed as a unit. Physical domains have a
single administrator and support a user base performing
specific computational activities. For example, a physical
domain belonging to a biology department may be
optimally configured for cellular simulations, while a
physical domain belonging to a network research group
may be designed for short, network-intensive
experiments.
Virtual Domain: An autonomous set composed of
virtual and physical machines, managed as a unit.
Machines in a virtual domain are connected through a
virtual private network, to which both the virtual and the
physical machines have access. Virtual domains are able
to grow and shrink on demand, and to the administrator
they appear identical to physical domains. A
one-to-one mapping exists between physical and virtual
domains; every virtual domain is hosted upon a physical
domain.
Machine Broker: A software agent that represents a
virtual domain when negotiating trade agreements with
other virtual domains. A machine broker consists of a
borrowing policy, which determines under which
circumstances it will attempt to obtain more machines,
and a lending policy, which governs when it is willing to
let another virtual domain make use of machines within
its physical domain. Both policies are defined by the
domain’s administrator.
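The broker's borrowing and lending policies described above can be sketched as follows. This is a minimal illustration, not VioCluster's actual implementation: the class name, the load-ratio thresholds, and the one-machine-per-trade rule are all assumptions standing in for administrator-defined policies.

```python
class MachineBroker:
    """Toy broker for one virtual domain in machine-trade negotiations.
    Thresholds are illustrative; in VioCluster they are set by the
    domain's administrator."""
    def __init__(self, name, machines, borrow_at=0.9, lend_above=2):
        self.name = name
        self.machines = machines      # machines currently in the domain
        self.load = 0                 # queued jobs
        self.borrow_at = borrow_at    # borrow when load/machines exceeds this
        self.lend_above = lend_above  # never lend below this machine count

    def wants_to_borrow(self):
        # Borrowing policy: the domain is overloaded.
        return self.load / max(self.machines, 1) > self.borrow_at

    def can_lend(self):
        # Lending policy: spare capacity and a lightly loaded queue.
        return self.machines > self.lend_above and self.load / self.machines < 0.5

    def negotiate(self, other):
        """If this domain is overloaded and the other can lend,
        trade one machine between the two virtual domains."""
        if self.wants_to_borrow() and other.can_lend():
            other.machines -= 1
            self.machines += 1
            return True
        return False
```

Separating `wants_to_borrow` from `can_lend` mirrors the text's distinction between the borrowing policy and the lending policy, each configurable per domain.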
iv) Implementation of Scheduling Process in Hybrid Cloud
using PCH Algorithm: A new approach for hybrid clusters,
called the Path Clustering Heuristic (PCH) algorithm, is
introduced for the initial schedule scheme. Hybrid Cloud
systems are a novel research challenge that arises from the
merging of Private and Public Clouds. The algorithm first
checks whether the private resources alone already satisfy the
deadline. Deploying a Hybrid Cloud offers support for automatic
service installation in the resources, which are dynamically
provided by the grid or by the cloud, to execute the PCH
algorithm. In the PCH algorithm, all the information necessary
to compute the scheduling attributes is given by the
programming model or by the infrastructure.
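The core idea of path clustering, computing a priority for each task from its computation and communication costs and then grouping the highest-priority path of the workflow onto one resource, can be sketched in Python. This is an illustrative reading of the heuristic, not the thesis's implementation; the function name, the dictionary-based DAG encoding, and the greedy path-following step are assumptions.

```python
def pch_initial_schedule(tasks, succ, comp, comm):
    """Sketch of the Path Clustering Heuristic idea: each task's
    priority is its computation cost plus the largest of
    (communication cost + priority) over its successors; the
    highest-priority path is then clustered onto one resource."""
    priority = {}

    def prio(t):
        if t not in priority:
            priority[t] = comp[t] + max(
                (comm.get((t, s), 0) + prio(s) for s in succ.get(t, [])),
                default=0)
        return priority[t]

    for t in tasks:
        prio(t)

    # Follow the path: start from the highest-priority task and at
    # each step descend to the successor with the highest priority.
    path, current = [], max(tasks, key=lambda t: priority[t])
    while current is not None:
        path.append(current)
        nxt = succ.get(current, [])
        current = max(nxt, key=lambda t: priority[t]) if nxt else None
    return path  # these tasks are clustered onto the same VM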
A new cluster management architecture for shared mixed-use
clusters is followed. The key feature of Cluster-on-Demand
(COD) is its support for configurable dynamic virtual clusters,
which associate variable shares of cluster resources with
application service environments, e.g., batch schedulers and
other grid services. The COD site manager assigns nodes to
virtual clusters according to demand and site policies, based
on dynamic negotiation with a pluggable service manager for
each dynamic virtual cluster. Experimental results with the COD
prototype and a service manager for the Sun Grid Engine
(SGE) batch service demonstrate the potential of dynamic
virtual clusters and resource negotiation as a basis for dynamic
provisioning and other advanced resource management
operations for future grid systems. The results show that the
key needs of grid resource management can be met directly
by generic site management features which are independent of
any specific application or middleware environment.
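A toy version of the site manager's node-assignment step can illustrate how demand and site policy interact. This sketch is hypothetical and much simpler than COD's negotiation protocol: the demand-ordered greedy allocation and the per-cluster policy cap are assumptions for illustration only.

```python
def assign_nodes(free_nodes, requests, site_policy_max):
    """Toy COD-style site manager: grant each virtual cluster's node
    request, capped by a per-cluster site policy limit, serving the
    largest demands first until the free node pool is exhausted."""
    grants = {}
    for vcluster, demand in sorted(requests.items(), key=lambda kv: -kv[1]):
        grant = min(demand,
                    site_policy_max.get(vcluster, demand),  # policy cap
                    free_nodes)                             # pool limit
        grants[vcluster] = grant
        free_nodes -= grant
    return grants, free_nodes
```

In COD itself the assignment is renegotiated dynamically with each cluster's service manager rather than computed in one pass.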
A Well Known Address (WKA) based membership discovery and
management scheme can be used in environments where
multicasting is not possible. One or more members are assigned
well known IP addresses, and all other members are aware of
these well known members. At least one well known member
should be started up before any other member. It is also
possible to assign a well known address to the member which
starts up first; an elastic IP address can be assigned to this
first member. When other members boot up, they contact one
or more well known members by sending a JOIN message.
The well known member will add this new member to its
membership list, notify all other members about the new
member by sending a MEMBERJOINED message to the group,
and send a MEMBERLIST message to the newly joined member.
Now all group members are aware of the new member, and the
new member learns about its group membership. Auto-scaling
Axis2 Web service applications on Amazon EC2 is a very
appealing idea from a business point of view. Such an approach
makes efficient use of resources in a cloud computing
environment and achieves an optimal balance between
performance, cost, availability and scalability guarantees.
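The JOIN / MEMBERJOINED / MEMBERLIST exchange described above can be sketched as follows. The class and the `send` callback are hypothetical; a real implementation would carry these messages over TCP to the well known IP addresses.

```python
class WellKnownMember:
    """Minimal sketch of the WKA membership exchange: a new member
    sends JOIN to a well known member, which records it, broadcasts
    MEMBERJOINED to the existing group, and replies with MEMBERLIST."""
    def __init__(self, address):
        self.address = address
        self.members = [address]  # the WKA member knows itself

    def handle_join(self, new_member, send):
        # Notify everyone already in the group about the newcomer.
        for m in self.members:
            if m != self.address:
                send(m, ("MEMBERJOINED", new_member))
        self.members.append(new_member)
        # Tell the newcomer the full current membership.
        send(new_member, ("MEMBERLIST", list(self.members)))
```

After this exchange, both invariants from the text hold: every existing member has seen a MEMBERJOINED for the newcomer, and the newcomer holds the complete member list.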
A virtual homogeneous system is assumed, composed of an
unbounded number of the best available processors connected
by links with the highest available bandwidth. Each task is
scheduled on a different processor of the virtual system, and
then the algorithm computes the initial attribute values of
each node. The scheduling decision in the Hybrid Cloud using
the PCH algorithm is based on performance, cost, and the
number of services to be scheduled.
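On such an unbounded homogeneous virtual system, the initial attributes reduce to a simple forward pass over the task graph: each task's earliest start time is the latest predecessor finish plus the communication cost on the connecting link. The sketch below is illustrative; the attribute names and the dictionary-based DAG encoding are assumptions.

```python
def initial_attributes(tasks, pred, comp, comm):
    """Compute earliest start (EST) and earliest finish (EFT) times,
    assuming an unbounded homogeneous system: every task on its own
    best processor, all links at the highest available bandwidth."""
    est, eft = {}, {}

    def finish(t):
        if t not in eft:
            # Start after the latest (predecessor finish + communication).
            est[t] = max((finish(p) + comm.get((p, t), 0)
                          for p in pred.get(t, [])), default=0)
            eft[t] = est[t] + comp[t]
        return eft[t]

    for t in tasks:
        finish(t)
    return est, eft
```

These per-node values are the attributes the PCH algorithm then uses when deciding, on performance and cost grounds, where each service is placed in the Hybrid Cloud.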
5.3.2.3 Performance Evaluation
The new approach is evaluated against the earlier approaches for
identifying the utilization of resources. The system analyzes and compares
the performance offered by different configurations of the computing cluster;
the comparison was performed by evaluating metrics such as viability from
the point of view of scalability, execution time, performance and cost. Based
on the comparison and results, it is clear that the proposed approach works
better than the earlier systems.
Figure 5.7 shown below represents the creation of the Hybrid Cloud
environment, which consists of Cloudlet and VM creation for the performance
evaluation.
Figure 5.7 Cloudlet and Virtual Machine Creation
Figure 5.8 gives the details of the simulation results of the cost
optimization technique in Hybrid with PCH.
Figure 5.8 Simulation Result of the Cost Optimizing Technique
The performance evaluations of the metrics stated above are as
follows, comparing Hybrid with PCH against the Hybrid Cloud.
Table 5.2 shown below gives the cost for the number of tasks
performed, for the utilization of resources in Hybrid with PCH and the
Hybrid Cloud. The cost optimization for the proposed work is better when
compared to the existing system in the cloud environment.
Table 5.2 Cost Optimization for the Number of Task
Hybrid with PCH Hybrid Cloud No. of Task Cost No. of Task Cost
10 7200 10 450014 8700 14 500015 10,000 15 600025 15,000 25 12,00050 25,500 50 16,50080 32,500 80 25,000
Figure 5.9 illustrates the graphical representation of the cost for the
number of tasks executed, for the utilization of resources in both Hybrid with
PCH and the Hybrid Cloud. The X-axis denotes the number of tasks and the
Y-axis denotes the cost. The graphical representation shows that the cost
optimization is comparatively better.
Figure 5.9 Cost optimization for different Tasks
Table 5.3 shown below gives the throughput for the utilization of
resources, carried out for the number of tasks, for both Hybrid with PCH and
the Hybrid Cloud. Here, the throughput of the proposed system is
comparatively higher than that of the existing system.
Table 5.3 Throughput obtained for various Tasks

              Hybrid with PCH               Hybrid Cloud
        No. of Tasks  Throughput     No. of Tasks  Throughput
             10            500            10            600
             14            650            14            750
             15          1,200            15          1,350
             25          1,500            25          1,600
             50          2,200            50          2,500
             80          3,000            80          3,050
            100          5,000           100         95,000
Figure 5.10 represents the throughput for the number of tasks in
both Hybrid with PCH and the Hybrid Cloud in the cloud environment. An
increase in throughput leads to a performance improvement of the system.
The graphical representation of the throughput is depicted with the number of
tasks on the X-axis and the throughput on the Y-axis.
Figure 5.10 Throughput obtained for different Tasks
The scalability values of the systems for the utilization of resources
are shown in Table 5.4 for the number of tasks. Here, the efficiency of the
system increases with the scalability.
Table 5.4 Scalability obtained for various Tasks

              Hybrid with PCH               Hybrid Cloud
        No. of Tasks  Scalability    No. of Tasks  Scalability
             10            600            10            650
             14            750            14            770
             15            900            15            920
             25          1,100            25          1,200
             50          1,250            50          1,250
             80          2,200            80          2,250
            100          3,000           100          5,000
The scalability for the number of tasks, obtained for both Hybrid
with PCH and the Hybrid Cloud in the cloud environment, is demonstrated in
Figure 5.11, with the X-axis representing the number of tasks and the Y-axis
representing the scalability.
Figure 5.11 Scalability of the system for different Tasks
The resource utilization values of both Hybrid with PCH and the
Hybrid Cloud are shown in Table 5.5, representing the utility usage for the
number of tasks.
Table 5.5 Resource Utilization for the Number of Tasks
Hybrid with PCH Hybrid Cloud
No. of Task Utility No. of Task Utility
10 700 10 800
14 750 14 850
15 770 15 870
25 950 25 1250
50 1500 50 3000
80 2000 80 3950
Figure 5.12 showcases the utilization of the system resources for the
number of tasks executed in both Hybrid with PCH and the Hybrid Cloud.
From the graphical representation, it is clear that the utility usage of the
proposed system is less than that of the existing system. The X-axis denotes
the number of tasks and the Y-axis denotes the utility usage.
Figure 5.12 Utility of the system for different Tasks
5.4 SUMMARY
The main characteristic of the virtualization technologies applied in
the cloud environment is the consolidation of resources, which leads to
efficient management of the resources. Here, two methods employing
optimized scheduling algorithms are addressed. The first method focuses on
the efficient utilization of resources by using the Parallel Genetic Algorithm
with the Dynamic Deme model. This method investigates the scheduling
procedure to improve the utilization rate of the system resources; as a result,
the allotment and releasing of the resources are done efficiently. The second
method is used to analyze the viability, from the viewpoints of scalability,
performance, and cost, of deploying large virtual infrastructures distributed
over different cloud providers for solving loosely coupled MTC. The
performance of different cluster configurations is evaluated with the
performance metrics (cost optimization, throughput, scalability and utility).
Based on the evaluations, the proposed method performs resource scheduling
very effectively.