
A Multi-objective Cat Swarm Optimization Algorithm for Workflow Scheduling in Cloud Computing Environment

Saurabh Bilgaiyan, Santwana Sagnika and Madhabananda Das

Abstract As the world progresses towards faster and more efficient computing techniques, cloud computing has emerged as an efficient and cheaper solution to such increasing and demanding requirements. Cloud computing is a computing model that gives end-users, as well as organizational and other enterprise users, high availability of resources on an on-demand basis. This involves the use of scientific workflows that require a large amount of data processing, which can be costly and time-consuming if not properly scheduled in a cloud environment. Various scheduling strategies have been developed, including swarm-based optimization approaches. Because users have multiple and conflicting requirements, multi-objective optimization techniques have become popular for workflow scheduling. This paper presents a cat swarm-based multi-objective optimization approach to schedule workflows in a cloud computing environment. The objectives considered are minimization of cost, makespan and CPU idle time. The proposed technique gives improved performance compared with the multi-objective particle swarm optimization (MOPSO) technique.

Keywords Cloud computing · Workflow scheduling · Multi-objective cat swarm optimization (MOCSO) · Cost minimization · Makespan · CPU idle time

S. Bilgaiyan · S. Sagnika · M. Das
School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India

© Springer India 2015
L.C. Jain et al. (eds.), Intelligent Computing, Communication and Devices, Advances in Intelligent Systems and Computing 308, DOI 10.1007/978-81-322-2012-1_9


1 Introduction

In the last two decades, online computing services have become more popular than traditional offline computing services. Cloud computing has emerged as a dominant processing environment that makes it easy for demanding users to access the services they require from a wide variety of available and configurable computing resources. It shifts computation and information from client machines to the network, where cloud service providers are connected [1–3].

Scheduling of tasks on resources in a network-based computing environment, such as cloud computing, is always a challenging task. It is the process of allocating a limited number of resources among a set of tasks that require the services of these resources. The main aim of scheduling is to minimize the cost and time of task completion while maximizing the quality of service (QoS). Task scheduling on cloud is termed an NP-complete problem because of the dynamic nature of the cloud environment [4, 5].

Most real-life scheduling applications require a schedule that satisfies multiple objectives, which are generally contradictory. Hence, the requirement arises to perform multi-objective scheduling that can satisfy such objectives together. For such problems, there is no single solution; instead, a group of distinct solutions that achieve a trade-off between the objectives can generally be found. This set of solutions is known as the set of non-dominated solutions and is represented graphically by the Pareto front. Multi-objective workflow scheduling consists of a set of user-defined conflicting requirements [4, 6].
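As an illustration of non-dominated solutions, the following minimal Python sketch filters a set of candidate schedules, each summarized by a tuple of objective values that are all to be minimized. The function names `dominates` and `non_dominated` and the sample numbers are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization).

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, makespan, idle time) triples for three schedules.
candidates = [(10.0, 3.0, 1.0), (12.0, 2.5, 1.5), (11.0, 3.5, 1.2)]
front = non_dominated(candidates)
```

Here the third triple is worse than the first in every objective, so only the first two remain; neither of them dominates the other, which is the trade-off situation described above.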

This paper aims to achieve optimization with three objectives: minimization of computation cost, makespan and CPU idle time. The authors propose a multi-objective cat swarm optimization (MOCSO) technique to achieve scheduling. The proposed technique has been implemented, and the results have been compared with the multi-objective particle swarm optimization (MOPSO) technique. MOCSO achieved optimal results in fewer iterations than MOPSO. Good convergence is achieved, as discussed in the results.

2 Related Work

Scientific workflows represent a set of computational tasks having dependencies between them. Different applications are modelled as workflows for computation [7]. A major challenge is to allocate these tasks in a manner that reduces the execution time and cost. The size of the data to be relocated and the associated overhead both lead to a huge amount of data processing. Various techniques for scheduling workflows in cloud computing systems are therefore discussed in this section [8].

ACO, BCO, GA and PSO have been applied to solve scheduling problems in market-oriented cloud systems. ACO was found to show the best performance among them [9].


A multi-objective PSO model has been designed to find an optimal solution that minimizes task execution cost, transfer time and task execution time. It finds the best trade-off, utilizes cloud resources effectively and improves QoS [10].

A heuristic algorithm for static workflow scheduling that considers energy consumption, makespan, reliability and economic cost has been defined; it performs better than bicriteria heuristic algorithms [11].

A PSO algorithm has been proposed for batch processing workflow scheduling. In this technique, generational distance (GD) and execution time are taken as performance measures, where GD represents the proximity between the obtained solution and the actual Pareto solution. A substitute deep memory with particle swarm optimization (DMPSO) is implemented using dynamic grouping and scheduling optimization (DGSO) and standard PSO. The experimental results show that DGSO gives better searching and average GD, while the execution time of the algorithm increases with the number of iterations [12, 13].

An artificial bee colony optimization algorithm has been proposed for workflow scheduling in a cloud computing environment. The algorithm optimizes the computation time and server utilization. The experiment is done using the CloudSim tool, and the results are compared with an existing GA. The proposed technique shows better performance than the GA [14].

An ant colony optimization (ACO) algorithm has been introduced for task scheduling in a cloud computing environment. The basic ACO is enhanced to minimize the execution time and makespan of all the scheduled tasks. The experiment is carried out using the CloudSim toolkit, and the results show better performance of the extended ACO over the existing simple ACO [15].

3 Workflow Design

The authors characterize a general workflow as a directed acyclic graph (DAG), denoted by G = (W, D). W represents the set of tasks in the workflow, where W = {W1, W2, W3, …, Wn}. D represents the dependencies among these tasks. These tasks need to be mapped onto a set of available resources S = {S1, S2, S3, …, Sm}, all geographically distributed throughout the world. Each resource has its own storage/memory, designated as M = {M1, M2, M3, …, Mm}.

The execution time and cost of every task on every resource are known, as per a hypothetically assumed pricing policy within the range defined by some policies of Gogrid and Amazon Web Services. The cost of transferring data between tasks is also known.

Figure 1 shows a sample workflow model having 14 tasks along with their interdependencies, which the authors have used for experimenting. The tasks are represented by W1, W2, …, W14, and their dependencies are shown as $d_{i,j}$, which denotes a dependency of task Wj on task Wi. In Fig. 2, the 4 available resources are depicted as S1, …, S4, and the per-unit data transfer cost between any two resources Si and Sj is denoted by $tc_{S_i,S_j}$. The size of transferred data is assumed to be constant.


4 Multi-objective Scheduling Model

The authors represent the current multi-objective problem in the following mathematical format.

$$\mathrm{T\_cost}(S_x) = \sum_{y} \Big( ec_{yx} + \sum_{z} tc_{yz} \Big), \quad \forall\; Sch(W_y) = S_x \text{ and } Sch(W_z) \neq S_x \qquad (1)$$

where $\mathrm{T\_cost}(S_x)$ is the total cost of executing the tasks assigned to resource $S_x$ and transmitting data to other tasks dependent on those tasks. $ec_{yx}$ is the cost of executing task $W_y$ on $S_x$. $tc_{yz}$ represents the transmission cost between tasks $W_y$ and $W_z$, which are on different resources. $Sch$ is a specific scheduling map, and $Sch(W_y)$ is the resource where $W_y$ is being executed.

Fig. 1 A sample experimental workflow

Fig. 2 A sample resource distribution architecture with local storages

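$$\mathrm{Max\_cost}(Sch) = \max\big(\mathrm{T\_cost}(S_x)\big), \quad \forall\; S_x \qquad (2)$$

where $\mathrm{Max\_cost}(Sch)$ is the largest per-resource cost under schedule $Sch$; keeping it small signifies that the tasks are fairly distributed over the resources.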

$$\mathrm{T\_time}(S_x) = \sum_{y} tt_{yx} \qquad (3)$$

where $\mathrm{T\_time}(S_x)$ signifies the total time of executing all tasks running on $S_x$. Here, $tt_{yx}$ gives the total running time of task $W_y$ on $S_x$.

$$\mathrm{Makespan}(Sch) = \max\big(\mathrm{T\_time}(S_x)\big), \quad \forall\; S_x \qquad (4)$$

where $\mathrm{Makespan}(Sch)$ indicates the total time between the start and finish of the whole schedule.

$$\mathrm{T\_idle}(Sch) = \sum_{x} \big( \mathrm{Makespan}(Sch) - \mathrm{T\_time}(S_x) \big) \qquad (5)$$

where $\mathrm{T\_idle}(Sch)$ indicates the amount of time the resources remain idle, found by summing the idle time of each resource $S_x$ until all tasks are completed.

$$\mathrm{Minimize}\,\big(\mathrm{Max\_cost}(Sch),\; \mathrm{Makespan}(Sch),\; \mathrm{T\_idle}(Sch)\big), \quad \forall\; Sch \qquad (6)$$

Equation 6 is the fitness function for this problem.
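Equations 1 to 6 can be evaluated for one candidate schedule as in the following Python sketch. It is only an illustration under stated assumptions: the function name `objectives` and all sample numbers are hypothetical; the transmission cost $tc_{yz}$ is taken to be the per-unit cost between the two resources multiplied by a constant data size (`data_mb`), charged to the resource of the parent task; and precedence delays are ignored, following Eqs. 3 and 4 as written.

```python
def objectives(schedule, exec_cost, exec_time, comm_cost, deps, data_mb=1.0):
    """Return (Max_cost, Makespan, T_idle) for one schedule.

    schedule[y]     -> index x of the resource running task W_y
    exec_cost[y][x] -> ec_yx, exec_time[y][x] -> tt_yx
    comm_cost[a][b] -> per-unit transfer cost between resources a and b
    deps            -> pairs (y, z) meaning W_z depends on W_y
    """
    n_resources = len(comm_cost)
    t_cost = [0.0] * n_resources
    t_time = [0.0] * n_resources

    # Execution part of Eq. 1 and the sum of Eq. 3.
    for y, x in enumerate(schedule):
        t_cost[x] += exec_cost[y][x]
        t_time[x] += exec_time[y][x]

    # Transmission part of Eq. 1: only when the two tasks are on
    # different resources (assumption: cost = unit cost * data size).
    for y, z in deps:
        sx, sz = schedule[y], schedule[z]
        if sx != sz:
            t_cost[sx] += comm_cost[sx][sz] * data_mb

    max_cost = max(t_cost)                       # Eq. 2
    makespan = max(t_time)                       # Eq. 4
    t_idle = sum(makespan - t for t in t_time)   # Eq. 5
    return max_cost, makespan, t_idle


# Hypothetical toy instance: 3 tasks, 2 resources.
exec_cost = [[2.0, 3.0], [4.0, 1.0], [5.0, 2.0]]
exec_time = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
comm_cost = [[0.0, 0.5], [0.5, 0.0]]
deps = [(0, 1), (0, 2)]
result = objectives([0, 1, 0], exec_cost, exec_time, comm_cost, deps)
```

For this toy schedule, resource 0 runs tasks 0 and 2 (cost 2 + 5, plus 0.5 for sending data to task 1 on resource 1), so the result is (7.5, 3.0, 2.0). The tuple is the vector that Eq. 6 asks to minimize.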

5 General CSO Algorithm

The common behaviour of cats in the real world has inspired the development of a swarm-based optimization technique known as cat swarm optimization (CSO), proposed in 2007 by Chu and Tsai. Cats spend most of their time resting but alert, moving slowly and deliberately (seeking mode); when chasing targets, they move with high velocity, converging towards the target (tracing mode) [16–18]. CSO makes use of this behaviour to search complex solution spaces for optimal solutions, using an initial population of cats that are randomly divided into seeking and tracing modes as per a defined mixture ratio (MR). The cats move closer to solutions by updating the best results in memory. This process continues iteratively, redistributing cats into either mode each time, till all cats achieve the best solution. A general CSO algorithm is given by the steps listed under "CSO algorithm" [19–21].

6 Proposed Algorithm

The authors put forward an approach that utilizes this CSO technique to address the multi-objective scheduling problem described in Sect. 4. The different tasks of a workflow are represented by dimensions, and each cat denotes a schedule, i.e. a mapping between the set of tasks and the available resources. All cats follow the steps of CSO till the best schedule is achieved or a termination criterion is met. Here, the best schedule refers to a set of non-dominated optimal schedules that achieve a balance between the required objectives, namely computation cost, makespan and CPU idle time. These solutions form a graph known as the Pareto front.

6.1 Seeking Mode

Seeking mode represents the resting condition of cats, wherein they remain alert and look around at their surroundings. This mode uses certain parameters that determine a cat's behaviour:

• Seeking memory pool (SMP): the number of replicas to be made of each cat. One of them will later replace the original cat.

• Count of dimension to change (CDC): the number of allotments in each copy that will be modified. Each copy will be evaluated, and one of the best will be selected for replacement.

When a cat is in seeking mode, it performs the steps listed under "Seeking mode steps".

CSO algorithm

1. Generate N cats over required number of dimensions D

2. Allocate random velocities to all cats

3. Randomly distribute cats to seeking and tracing modes as per defined MR

4. Calculate fitness of all cats and memorize the non-dominated cats

5. For each cat, if the cat is in seeking mode, perform seeking mode operations, else perform tracing mode operations on it and move it to its new position

6. If the termination condition is not satisfied, go to step 3, else stop
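Step 3 of the list above could be realized as in this small Python sketch. It assumes that MR is the fraction of cats placed in tracing mode, which the text does not state explicitly; `assign_modes` is a hypothetical helper name.

```python
import random


def assign_modes(n_cats, mr, rng):
    """Randomly label cats as 'tracing' or 'seeking'.

    Assumption: mr is the fraction of the population put in tracing
    mode; all remaining cats are put in seeking mode.
    """
    n_tracing = round(mr * n_cats)
    tracing = set(rng.sample(range(n_cats), n_tracing))
    return ["tracing" if i in tracing else "seeking" for i in range(n_cats)]


# Re-drawn at the start of every iteration (step 3).
modes = assign_modes(50, 0.2, random.Random(0))
```

Passing the random generator in explicitly keeps runs reproducible, which helps when two algorithms are compared over the same number of iterations.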

Seeking mode steps

1. Generate replicas of the cat as per SMP

2. Randomly change dimensions of each replica as per CDC

3. Assess fitness values of all replicas

4. Find the best non-dominated replicas

5. Substitute the cat with a randomly selected non-dominated replica
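The seeking mode steps can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: a cat is assumed to be a list whose entry `cat[y]` is the index of the resource assigned to task y, and `seeking_mode`, `dominates` and `toy_fitness` are hypothetical names. The toy fitness exists only to make the example runnable.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def seeking_mode(cat, smp, cdc, n_resources, fitness, rng):
    """Return the schedule that replaces `cat` after one seeking step."""
    # Step 1: make SMP replicas of the cat.
    replicas = [list(cat) for _ in range(smp)]
    # Step 2: reassign CDC randomly chosen tasks in each replica.
    for replica in replicas:
        for task in rng.sample(range(len(replica)), cdc):
            replica[task] = rng.randrange(n_resources)
    # Step 3: evaluate every replica.
    scores = [fitness(r) for r in replicas]
    # Step 4: keep the replicas that no other replica dominates.
    front = [r for r, s in zip(replicas, scores)
             if not any(dominates(t, s) for t in scores)]
    # Step 5: pick one non-dominated replica at random.
    return rng.choice(front)


def toy_fitness(schedule):
    """Hypothetical two-objective fitness, for demonstration only."""
    load = [schedule.count(r) for r in range(4)]
    return (max(load), sum(schedule))


new_cat = seeking_mode([0] * 6, smp=5, cdc=2, n_resources=4,
                       fitness=toy_fitness, rng=random.Random(1))
```

The non-dominated set in step 4 is never empty for a finite set of replicas, so the random choice in step 5 is always well defined.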


6.2 Tracing Mode

Tracing mode depicts the movement of cats towards targets with high velocity, while spending a high amount of energy. In tracing mode, a cat performs the steps listed under "Tracing mode steps".

Figure 3 represents the complete algorithm flow.

7 Input Data for Experiment

The authors have assumed a hypothetical workflow having 14 tasks to be distributed over 4 resources located in different countries across the world, as illustrated previously. The costs of execution and communication are presumed on the basis of pricing policies followed by some well-known service providers, e.g. Amazon Web Services, Mosso and Gogrid. The execution time of each task on each resource is presumed as suitable for experimenting. The experiments have been performed, and the resultant graphs generated, using MATLAB. The input data are given in the following tables: Table 1 gives the execution cost of each task on each resource, Table 2 gives the communication costs between different resources and Table 3 gives the execution time of each task on each resource.

8 Results and Discussion

The authors have performed various experiments using the proposed algorithm in MATLAB and have compared the results with an existing MOPSO algorithm for the same problem. The population size was fixed at 50, while the number of iterations was varied from 100 to 300. The following results have been obtained.

Tracing mode steps

1. Find the new velocity $V_{i,d}^{t+1}$ for cat i by using the formula

$$V_{i,d}^{t+1} = w \cdot V_{i,d}^{t} + c \cdot r \cdot \big( X_{best,d} - X_{i,d}^{t} \big) \qquad (7)$$

where $V_{i,d}^{t}$ is the velocity at the t-th iteration, $X_{i,d}^{t}$ is the position in the t-th iteration, $X_{best,d}$ is the current global best position in the d-th dimension, w is the weight applied to the previous velocity, c represents a constant and r is a random number between 0 and 1

2. Move the cat to its next position $X_{i,d}^{t+1}$ by adding the new velocity as per

$$X_{i,d}^{t+1} = X_{i,d}^{t} + V_{i,d}^{t+1} \qquad (8)$$

3. Limit the updated position of cat within desired range

4. Calculate fitness of all cats

5. Update the set of solutions with non-dominated cats
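Steps 1 to 3 of the tracing mode can be sketched in Python as below. The sketch follows Eqs. 7 and 8 term by term for one cat and clamps each coordinate to a range `[lo, hi]`. The function name `tracing_step`, the parameter values and the choice of drawing a fresh r per dimension are assumptions; how a continuous position is mapped back to a resource index is not specified in the text and is left out.

```python
import random


def tracing_step(position, velocity, best, w, c, lo, hi, rng):
    """Apply Eq. 7 and Eq. 8 to one cat and clamp the new position.

    position, velocity and best are equal-length lists with one entry
    per dimension d. Returns (new_position, new_velocity).
    """
    new_position, new_velocity = [], []
    for x, v, x_best in zip(position, velocity, best):
        r = rng.random()                       # random number in [0, 1)
        v_next = w * v + c * r * (x_best - x)  # Eq. 7
        x_next = x + v_next                    # Eq. 8
        x_next = min(max(x_next, lo), hi)      # step 3: keep within range
        new_velocity.append(v_next)
        new_position.append(x_next)
    return new_position, new_velocity


# Hypothetical values: one cat in two dimensions, range [0, 3].
pos, vel = tracing_step([0.0, 2.0], [10.0, 0.0], [0.0, 2.0],
                        w=1.0, c=2.0, lo=0.0, hi=3.0,
                        rng=random.Random(0))
```

In this example the cat already sits at the best position, so the attraction term vanishes: the first coordinate is carried by its old velocity and then clamped to 3.0, while the second stays at 2.0.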


8.1 Output Graphs Generated by MATLAB

See Figs. 4, 5, and 6.

8.2 Analysis of Resultant Graphs

Figures 4, 5, and 6 show pairs of graphs where, in each figure, (a) represents the Pareto front obtained by MOPSO and (b) represents the Pareto front obtained by MOCSO for a particular number of iterations. Comparing the graph pairs shows that the MOCSO approach generally provides a larger number of solutions than MOPSO for a fixed number of iterations, and that these solutions are also closer to and better distributed over the Pareto front. Analysing the results for different numbers of iterations shows that MOCSO achieves good convergence and attains a convex Pareto front faster than MOPSO. The authors attribute the better performance of MOCSO to its intelligent updating of positions, in contrast to the random updating mechanism followed by MOPSO, which saves energy and hence increases the speed and efficiency of reaching the best solutions.

Fig. 3 Basic steps of MOCSO (flowchart: Start; create and randomly initialize cat population; randomly assign seeking and tracing modes to cats; calculate fitness for all cats and store non-dominated cats; if a cat is in tracing mode, calculate new velocity and add velocity to change position, otherwise generate copies and modify them, calculate fitness for each copy and substitute the current cat with one among the non-dominated copies; calculate new fitness for all cats and update in memory; repeat until the termination condition is attained; Stop)

9 Conclusion and Future Scope

This paper has presented a new and more efficient approach that uses a swarm-based technique to solve the multi-objective scheduling problem in a cloud computing environment. In the reported experiments, this technique converged faster than the existing MOPSO technique, which is already among the dominant optimization methods. This is owing to the smart mechanism of position updating in MOCSO, which reduces unnecessary energy expenditure and moves closer towards the solution in each iteration. Future work in this field can consist of improving the running time of this algorithm and involving a larger number of real-time conflicting objectives, which the authors believe can be effectively handled by this technique. Finding competent solutions to the workflow scheduling problem on cloud can increase the QoS and extend the reach of cloud computing over even more business and enterprise areas.

Table 1 Execution cost matrix (in cents)

        S1     S2     S3     S4
T1      1.56   1.59   1.59   1.72
T2      1.83   2.01   2.24   2.14
T3      1.65   1.68   1.71   1.82
T4      2.21   2.00   2.33   2.48
T5      2.11   2.03   1.98   1.87
T6      1.79   2.19   1.63   1.50
T7      1.50   1.57   1.88   1.72
T8      2.50   2.03   2.16   2.42
T9      1.57   1.93   1.82   1.75
T10     2.19   2.03   2.25   2.36
T11     1.50   1.83   1.76   2.23
T12     2.31   2.98   2.50   2.27
T13     1.52   1.93   1.61   1.74
T14     2.39   2.14   2.04   1.96

Table 2 Communication cost matrix (in cents/MB)

        S1     S2     S3     S4
S1      0      0.12   0.17   0.07
S2      0.12   0      0.20   0.11
S3      0.17   0.20   0      0.13
S4      0.07   0.11   0.13   0

Table 3 Execution time matrix (in hours)

        S1     S2     S3     S4
T1      0.51   0.44   0.31   0.44
T2      0.93   0.84   0.46   0.84
T3      0.75   0.66   0.52   0.66
T4      1.20   1.15   1.11   1.15
T5      1.08   0.92   0.78   0.92
T6      0.63   0.59   0.52   0.59
T7      0.32   0.24   0.10   0.24
T8      0.91   0.78   0.62   0.78
T9      0.74   0.66   0.91   0.66
T10     0.42   0.36   0.23   0.36
T11     0.55   0.50   0.40   0.50
T12     0.88   0.72   0.53   0.72
T13     0.61   0.54   0.49   0.54
T14     1.16   0.98   0.81   0.98

Fig. 4 Optimal results for 100 iterations: (a) MOPSO, (b) MOCSO

Fig. 5 Optimal results for 200 iterations: (a) MOPSO, (b) MOCSO

References

1. Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.: Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener. Comput. Syst. 25, 599–616 (2009)

2. Dikaiakos, M.D., Pallis, G., Katsaros, D., Mehra, P., Vakali, A.: Distributed internet computing for IT and scientific research. IEEE Internet Comput. 13, 10–13 (2009)

3. Chaisiri, S., Lee, B.S., Niyato, D.: Optimization of resource provisioning cost in cloud computing. IEEE Trans. Serv. Comput. 5, 164–177 (2012)

4. Fard, H.M., Prodan, R., Fahringer, T.: A truthful dynamic workflow scheduling mechanism for commercial multicloud environments. IEEE Trans. Parallel Distrib. Syst. 24, 1203–1212 (2013)

5. Jangra, A., Saini, T.: Scheduling optimization in cloud computing. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3, 62–65 (2013)

6. Wang, X.J., Zhang, C.Y., Gao, L., Li, P.G.: A survey and future trend of study on multi-objective scheduling. In: Fourth IEEE International Conference on Natural Computation, pp. 382–391 (2008)

7. Gil, Y., Deelman, E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., Myers, J.: Examining the challenges of scientific workflows. IEEE Comput. Soc. 40, 24–32 (2007)

8. Szabo, C., Sheng, Q.Z., Kroeger, T., Zhang, Y., Yu, J.: Science in the cloud: allocation and execution of data-intensive scientific workflows. J. Grid Comput. (2013)

9. Singh, L., Singh, S.: A survey of workflow scheduling algorithms and research issues. Int. J. Comput. Appl. 0975–8887(74), 21–28 (2013)

10. Ramezani, F., Lu, J., Hussain, F.: Task Scheduling Optimization in Cloud Computing Applying Multi-objective Particle Swarm Optimization. Service-Oriented Computing. Lecture Notes in Computer Science 8274, pp. 237–251. Springer, Berlin (2013)

Fig. 6 Optimal results for 300 iterations: (a) MOPSO, (b) MOCSO

11. Fard, H.M., Prodan, R., Barrionuevo, J.J.D., Fahringer, T.: A Multi-objective approach for workflow scheduling in heterogeneous environments. In: 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 300–309 (2012)

12. Wen, Y., Chen, Z., Chen, T., Liu, J., Kang, G.: A particle swarm optimization algorithm for batch processing workflow scheduling. In: Second IEEE International Conference on Cloud and Green Computing, pp. 645–649 (2012)

13. Shi, Y.H., Eberhart, R.: A modified particle swarm optimizer. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 63–69 (1998)

14. Kumar, P., Anand, S.: An approach to optimize workflow scheduling for cloud computing environment. J. Theor. Appl. Inf. Technol. 57, 617–623 (2013)

15. Tawfeek, M.A., El-Sisi, A., Keshk, A.E., Torkey, F.A.: An Ant Algorithm for cloud task scheduling. In: Proceedings of International Workshop on Cloud Computing and Information Security (CCIS), pp. 169–172 (2013)

16. Shojaee, R., Faragardi, H.R., Alaee, S., Yazdani, N.: A new cat swarm optimization based algorithm for reliability-oriented task allocation in distributed systems. In: 6th IEEE International Symposium on Telecommunications, pp. 861–866 (2012)

17. Sharafi, Y., Khanesar, M.A., Teshnehlab, M.: Discrete binary cat swarm optimization algorithm. In: 3rd IEEE International Conference on Computer, Control & Communication (IC4), pp. 1–6 (2013)

18. Chu, S.C., Tsai, P.W.: Computational intelligence based on the behavior of cats. Int. J. Innov. Comput. Inf. Control 3, 163–173 (2007)

19. Pradhan, P.M., Panda, G.: Solving multiobjective problems using cat swarm optimization. Expert Syst. Appl. 39, 2956–2964 (2011)

20. Santosa, B., Ningrum, M.K.: Cat Swarm optimization for clustering. In: IEEE International Conference on Soft Computing and Pattern Recognition, pp. 54–59 (2009)

21. Tsai, P.W., Pan, J.S., Chen, S.M., Liao, B.Y., Hao, S.P.: Parallel cat swarm optimization. In: Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, pp. 3328–3333 (2008)
