
Smart CloudBench - Automated Performance Benchmarking of the Cloud

Mohan Baruwal Chhetri, Sergei Chichin, Quoc Bao Vo, Ryszard Kowalczyk

Faculty of Information & Communication Technologies, Swinburne University of Technology, Melbourne, Australia
{mchhetri, schichin, bvo, rkowalczyk}@swin.edu.au

Abstract—As the rate of cloud computing adoption grows, so does the need for consumption assistance. Enterprises that are looking to migrate their IT systems to the cloud would like to quickly identify providers that offer resources with the most appropriate pricing and performance levels to match their specific business needs. However, no two vendors offer the same resource configurations, pricing and provisioning models, making the task of selecting appropriate computing resources complex, time-consuming and expensive. In this paper, we present Smart CloudBench - a platform that automates the performance benchmarking of cloud infrastructure, helping potential consumers quickly identify the cloud providers that can deliver the most appropriate price/performance levels to meet their specific requirements. Users can estimate the actual performance of the different cloud platforms by testing representative benchmark applications under representative load conditions. Experimentation using the prototype implementation shows that higher price does not necessarily translate to better or more consistent performance, and benchmarking results can provide more information to help enterprises make better informed decisions.

Keywords-Cloud computing, Benchmarking, Cloud Infrastructure, Cloud Performance, Performance Evaluation, Automated Benchmarking

I. INTRODUCTION

In recent years, cloud computing has emerged as a major disruptive technology that has changed the way in which enterprises procure and consume computing resources. There has been an exponential growth in the number of Infrastructure-as-a-Service (IaaS) vendors and their offerings, with a corresponding increase in the number of enterprises looking to migrate some, or all, of their IT systems to the cloud. However, as the rate of cloud computing adoption grows, so does the need for consumption assistance. Different providers offer different resource configurations and use different pricing and provisioning models, and, while information about pricing levels and supported resource configurations is publicly available, there is limited information about resource performance levels. This is important for organizations looking for first opportunities to migrate their in-house IT systems to the cloud. They would like to obtain a quick assessment of the price/performance levels of different IaaS providers before making any migration decisions.

A naive approach would be to deploy their own application on the target platforms, benchmark it with the range of possible workloads, measure the actual performance and analyse the test results. However, such an approach is complex, time-consuming and expensive, and very few organizations possess the time, resources and in-house expertise to do a thorough and proactive evaluation on multiple cloud platforms. A practical alternative is to test representative applications against representative workloads to estimate the performance of different cloud providers. For example, the representative application for an e-commerce website would be TPC-W, a transactional web e-commerce benchmark, and the representative workload would be the estimated or measured number of concurrent requests to the website. The results of such representative benchmarks can be used to quantify application performance on the different IaaS platforms and to obtain valuable insights into the differences in performance across providers. By combining the benchmark results with pricing information, enterprises can better identify the most appropriate cloud providers and offerings for their specific business needs.

In this paper, we present the Smart CloudBench platform, a system that enables the measurement of infrastructure performance in an efficient, quick and cost-effective manner, through the automated execution of representative benchmarks on multiple IaaS clouds to measure their performance levels under different workload conditions. Smart CloudBench offers an extensible suite of benchmark applications corresponding to the most common types of applications hosted on the cloud¹. Prospective cloud consumers can use Smart CloudBench to (i) select the representative application/s to use for evaluating cloud performance, (ii) configure the test harness, (iii) select and acquire instances on the cloud platforms to be tested, (iv) launch the tests, and (v) gather and use the results to build a price/performance matrix that can help with decision-making for provider and resource selection.

¹ Today, the cloud is increasingly being used for hosting web applications, high performance computing applications, social networks, media streaming applications, and for the development and testing of complex enterprise applications. Ideally, there will be a benchmark application corresponding to each type of application in the suite of benchmark applications.


The key benefits of using Smart CloudBench are:

• Reduced time and effort involved in benchmarking cloud platforms. If the number of cloud instances to benchmark is high, and the number of representative applications is large, then manually executing the benchmarking process becomes a very cumbersome exercise.

• Reduced cost of performance testing. Since the cloud resources to be tested can be commissioned just in time and decommissioned immediately after completion of the tests, there are significant cost savings.

• Minimised human error due to simplified repetition of the benchmarking process. While the initial investment is large, subsequent executions become easy and quick.

• Automated report generation based on the test results, for consumption by non-technical audiences.

• Centralised storage of performance data over time, which enables analysis of performance evolution.

• Benchmarking as a Service (BaaS) - the entire process of performance benchmarking of cloud infrastructure is offered as a service.

The rest of this paper is organised as follows. In Section II we summarize the related work. In Section III we give an overview of cloud performance benchmarking, followed by a description of our proposed approach in Section IV. In Section V we describe the experimental environment used to validate the usefulness of Smart CloudBench. We discuss the results of the experiments in Section VI and draw conclusions and identify future research in Section VII.

II. RELATED WORK

There has been significant research activity on the measurement and characterization of cloud infrastructure performance to enable decision support for provider and resource selection.

In [1][2], the authors present CloudCmp, a framework to compare cloud providers based on the performance of the various components of the infrastructure including computation, scaling, storage and network connectivity. The same authors present the CloudProphet tool [3] to predict the end-to-end response time of an on-premise web application when migrated to the cloud. The tool records the resource usage trace of the application running on-premise and then replays it on the cloud to predict performance. In [4], the authors present CloudSuite, a benchmark suite for emerging scale-out workloads. While most work on cloud performance looks at performance bottlenecks at the application level [1][2][8], this work focusses on analysing the micro-architecture of the processors used.

In [11], the authors propose CloudRank-D, a benchmark suite for benchmarking and ranking the performance of cloud computing systems hosting big data applications. The main difference between CloudRank-D and our work is that CloudRank-D specifically targets big-data applications while our framework applies to any application.

In [6] and [7], the authors present their results on the analysis of resource usage from the service provider and service consumer perspectives. They study two models for resource sharing - the t-shirt model and the time-sharing model. While we look at the performance of the different cloud providers from a cloud consumer's perspective, the resource usage results can be included as part of the benchmarking results to highlight the resource usage under different load conditions. The resource usage levels could also potentially affect the resource and provider selection process.

In [10], the authors propose a methodology and process to implement custom tailored benchmarks for testing different cloud providers. Using this methodology, any enterprise looking to examine the different cloud service offerings can manually go through the process of selecting providers, selecting and implementing (if necessary) a benchmark application, deploying it on multiple cloud resources, performing the tests and recording the results. Evaluation is done at the end of the tests. Our work differs in that it offers prospective cloud consumers a service that does all of this without requiring them to go through the entire setup process. Additionally, it gives users the flexibility to try out different what-if scenarios to get additional information about the performance.

In [15], the authors discuss the IaaS cloud-specific elements of benchmarking from the user's perspective. They propose a generic approach for IaaS cloud benchmarking which supports rapidly changing black box systems, where resource and job management is provided by the testing infrastructure and tests can be conducted with complex workloads. Their tool SkyMark provides support for micro performance benchmarking in the context of multi-job workloads based on the MapReduce model. In [16], the authors provide a theoretical discussion on what cloud benchmarking should, can and cannot be. They identify the actors involved in cloud benchmarking and analyse a number of use cases where benchmarking can play a significant role. They also identify the challenges of building scenario-specific benchmarks and propose some solutions to address them.

In [17], the authors present the Cloud Architecture Runtime Evaluation (CARE) framework for evaluating cloud application development and runtime platforms. Their framework includes a number of pre-built, pre-configured and reconfigurable components for conducting performance evaluations across different target platforms. The key difference between CARE and our work is that while CARE looks at micro performance benchmarking, we look at performance benchmarking across the complete application stack.

III. OVERVIEW

In this section, we give a brief overview of performance benchmarking of cloud infrastructure. In the IaaS service model, the service provider gives consumers the capability to provision processing, storage, network and basic computing resources on demand. While the consumer has control over the operating system, assigned storage and the deployed applications, it has no control over the underlying cloud infrastructure. When a client requests and receives virtual machines from a cloud provider, it perceives the provisioned resource as a black box whose run-time behaviour is unknown.

Therefore, there is a need for tools and techniques to measure the actual performance of the computing resources offered by different cloud providers. One way to do this is through benchmarking, which is a traditional approach for verifying that the performance of a system meets the expected levels. In our current work, we look at benchmarking from the consumer's perspective, i.e. a black-box view of cloud performance, where the tester has limited or no knowledge of the underlying hardware specifications².

There are two ways to benchmark cloud infrastructure: application stack benchmarking and micro benchmarking. We focus on benchmarking the entire application stack instead of looking at individual services, like IO, CPU or RAM performance. While a set of micro benchmarks can offer a good starting point in evaluating the performance of a server, application stack benchmarking offers more customer-specific results and is easier for a non-technical audience to understand. Thus, if prospective consumers can find representative benchmarks for their in-house applications, they can design experiments to match the internal load levels and load variations, and then test the representative application to estimate how the different clouds compare in terms of performance and cost/performance. By doing representative performance benchmarking, consumers can quickly assess multiple cloud providers and their offerings in an objective, consistent and fully automated manner without having to deploy their own applications on the various cloud platforms.

Consumers can conduct performance benchmarking before migrating to the cloud to determine whether selected providers offer appropriate price/performance levels. Once they have selected a particular cloud provider and migrated their in-house applications, they can continue to benchmark the provided infrastructure to ensure that there is no degradation of performance over time.

IV. SMART CLOUDBENCH SYSTEM

In this section, we present the main features and reference architecture of Smart CloudBench. Smart CloudBench is a configurable, extensible and portable system for the automated benchmarking of cloud infrastructure performance. We start with an overview of the system in Section IV-A, followed by a description of the benchmarking process in Section IV-B.

² Different providers use different virtualization techniques to provision resources, which can affect performance significantly [12][14].

Figure 1: Smart CloudBench Architecture

A. Overview

The main components of the Smart CloudBench system, presented in Figure 1, include:

1) Cloud Comparator (CC): This module allows users to compare different cloud provider offerings based on their specific requirements in terms of cost, geographic location, infrastructure configuration, etc. It helps users shortlist potential candidates for benchmarking.

2) Benchmark Orchestrator (BO): This is the main module of the Smart CloudBench system. It orchestrates the automated performance benchmarking of IaaS clouds. It controls the entire process including benchmark selection, provider selection, workload description, resource management, workload generation, workload execution and result collection. It automates all the tasks that would be manually carried out in a normal benchmarking exercise.

3) Cloud Manager (CM): The cloud manager module performs fundamental cloud resource management. It receives resource provisioning instructions from the Benchmark Orchestrator, based on which it procures appropriate instances on the different providers - both for the System Under Test (SUT) and the Test Agents (TA). It is responsible for the decommissioning of the instances at the end of each test. It uses the cloud purchaser module, which we have previously developed [18][19], to procure resources according to user constraints regarding test completion time and available budget.

4) Result Analyser (RA): This module collects the results of benchmark tests, and delivers the performance results both graphically and as textual reports.

5) Cloud Interface (CI): This module provides interfaces to the different IaaS providers to enable automated management of cloud instances, including instantiation and termination before and after the execution of the benchmarks.

6) Provider & Benchmark Catalogs: Smart CloudBench maintains a catalog of the different IaaS providers and their offerings. It also maintains a catalog of supported benchmarks for the different types of representative applications.

Figure 2: Smart CloudBench Workflow

B. Benchmarking Process

The steps involved in executing a typical benchmark using Smart CloudBench are depicted in Figure 2, with screenshots of the user interface in Figure 3.

1) Provider Selection: (Figure 3a) As a first step, the user selects the specific cloud providers and resource configurations to test. This selection could be based on user requirements, which could include resource configuration, cost, geographic location, supported operating systems, etc.

2) Benchmark Selection: (Figure 3b) The user then selects the representative benchmark application/s from those proposed.

3) Workload Selection: (Figure 3b) The user can define different scenarios to be tested against the selected benchmark on the shortlisted cloud providers. The request (comprising the selected benchmark/s, the scenarios to test, and the cloud resources to be tested on) is submitted to the BO.

4) Instance Procurement: Upon receiving the benchmarking request, the BO procures the required server instances from the selected providers. Technically, the back-end engine issues requests to the required cloud providers' APIs to launch VMs of the specified type in the required location, using pre-built images that contain the packaged applications needed to start up the SUT and the TA. Different rules could be used to procure these instances depending upon the request context, e.g. available time or available budget.

5) Benchmark Execution: The BO then executes the benchmark by issuing remote calls to the web service that runs on the newly started cloud machines, and waits for the benchmark results to be returned.

Figure 3: Smart CloudBench UI - (a) Provider and Instance Selection; (b) Benchmark & Workload Selection; (c) Results Visualisation: Test Summary Report

6) Result Collection: The TAs return the benchmarking results to the BO as the response to the web service invocation.

7) Report Generation: The BO generates reports based on the returned results. These reports combine the formatted benchmarking data with static data about cloud provider prices.

8) Report Visualisation: (Figure 3c) Generated reports are pushed back to the user and the data is visualised as graphs and data tables to support analysis and decision-making.

9) Instance Decommissioning: Once the tests have been completed, the BO decommissions the instances that were started up for conducting the tests by issuing calls to the cloud providers' APIs.
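Steps 5 and 6 hinge on a simple remote interaction: the BO calls a web service on the newly started machines and waits for the results. A minimal sketch of such a call is shown below; the endpoint path, port and query parameters are hypothetical, since the paper does not specify the Test Agent's web service contract.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Minimal sketch of the BO -> Test Agent interaction in steps 5 and 6.
// The endpoint and parameters are hypothetical placeholders.
public class BenchmarkInvocationSketch {
    public static void main(String[] args) throws Exception {
        String testAgentHost = args.length > 0 ? args[0] : "203.0.113.10"; // TA public IP (placeholder)
        HttpClient client = HttpClient.newHttpClient();

        // Trigger one benchmark cycle (e.g. TPC-W with 500 emulated browsers) on the TA.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://" + testAgentHost + ":8080/benchmark?app=tpcw&rbes=500"))
                .timeout(Duration.ofMinutes(10))   // wait until the cycle finishes
                .GET()
                .build();

        // The TA returns the collected metrics, which the BO then hands to the Result Analyser.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Benchmark results: " + response.body());
    }
}
```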

V. EXPERIMENTAL ENVIRONMENT

In this section, we describe the experimental environment used to validate the usefulness of Smart CloudBench. The representative benchmark that we have used in our experiments is TPC-W [22], which is an e-commerce application. This kind of application is the most popular type of application running on the cloud and its behaviour is relatively simple and well understood. To achieve diversity and comprehensiveness in our experiments, we have tested TPC-W on a representative set of resources. We describe the experimental setup and the measured metrics here, and present the results of the experiments in the next section.

A. TPC-W Benchmark

The TPC-W application models an online bookstore which is representative of a typical enterprise web application. It includes a web server to render the web pages, an application server to execute business logic, and a database to store application data. It is designed to test the complete application stack and does not make any assumptions about the technologies and software systems used in each layer. The benchmark consists of two parts. The first part is the TPC-W application, which supports a mix of 14 different web interactions and three workload mixes, covering searching for products, shopping for products and ordering products. The second part is the remote browser emulation (RBE) system which generates the workload to test the application. One RBE emulates a single customer and generates the same HTTP network traffic as would be seen from a real customer using a browser.

Certain inherent characteristics of our setup are inherited from the Java implementation of the TPC-W application that we use, which is available online at http://www.cs.virginia.edu/~th8k/downloads/.

Provider (Region) | Instance Type | Code | Price ($/hr) | CPU | Memory (GB)
Amazon EC2 (US-West, N. California) | m1.small | S1 | 0.096 | 1 EC2 | 1.7
Amazon EC2 (US-West, N. California) | m1.medium | S2 | 0.192 | 2 EC2 | 3.75
Amazon EC2 (US-West, N. California) | m1.large | S3 | 0.384 | 4 EC2 | 7.5
Amazon EC2 (US-West, N. California) | m1.xlarge | S4 | 0.768 | 8 EC2 | 15
Amazon EC2 (US-West, N. California) | m2.xlarge | S5 | 0.560 | 6.5 EC2 | 17.1
Amazon EC2 (US-West, N. California) | m2.2xlarge | S6 | 1.120 | 13 EC2 | 34.2
Amazon EC2 (US-West, N. California) | m2.4xlarge | S7 | 2.240 | 26 EC2 | 68.4
Amazon EC2 (US-West, N. California) | c1.medium | S8 | 0.245 | 5 EC2 | 1.7
Amazon EC2 (US-West, N. California) | c1.xlarge | S9 | 0.980 | 20 EC2 | 7
GoGrid (US-West 1) | Medium | S10 | 0.160 | 2 Core | 2
GoGrid (US-West 1) | Large | S11 | 0.32 | 4 Core | 4
GoGrid (US-West 1) | X-Large | S12 | 0.64 | 8 Core | 8
GoGrid (US-West 1) | XX-Large | S13 | 1.28 | 8 Core | 16
GoGrid (US-West 1) | XXX-Large | S14 | 1.92 | 8 Core | 24
Rackspace (Chicago) | 1GB | S15 | 0.08 | 1 Core | 1
Rackspace (Chicago) | 2GB | S16 | 0.16 | 2 Core | 2
Rackspace (Chicago) | 4GB | S17 | 0.32 | 2 Core | 4
Rackspace (Chicago) | 8GB | S18 | 0.58 | 4 Core | 8
Rackspace (Chicago) | 15GB | S19 | 1.08 | 6 Core | 15
Rackspace (Chicago) | 30GB | S20 | 1.56 | 8 Core | 30

Table I: Configurations of the TPC-W server instances used in the benchmarking experiments (prices correct on the 22nd of April 2013)

Code | Avg Response Time (ms): 100/500/1000 RBEs | Max Response Time (ms): 100/500/1000 RBEs | Successful Interactions: 100/500/1000 RBEs | Timeouts: 100/500/1000 RBEs | Std. Deviation of ART (ms): 100/500/1000 RBEs
S1 | 2213 / 8583 / 8123 | 15361 / 23681 / 24353 | 1024 / 906 / 1174 | 23 / 1354 / 3253 | 345 / 470 / 654
S2 | 202 / 6082 / 6481 | 3277 / 24205 / 24358 | 1401 / 1741 / 2068 | 2 / 988 / 2807 | 76 / 344 / 230
S3 | 64 / 4768 / 6041 | 1240 / 23301 / 24472 | 1425 / 2413 / 2703 | 3 / 780 / 2490 | 23 / 203 / 155
S4 | 49 / 99 / 3669 | 600 / 1879 / 21452 | 1438 / 7103 / 7558 | 2 / 13 / 730 | 2 / 18 / 305
S5 | 41 / 755 / 3724 | 613 / 8560 / 21501 | 1426 / 6431 / 7640 | 2 / 38 / 837 | 12 / 553 / 1726
S6 | 34 / 42 / 1256 | 321 / 1028 / 15491 | 1423 / 7146 / 11814 | 2 / 13 / 104 | 1 / 6 / 359
S7 | 36 / 43 / 52 | 402 / 1057 / 1670 | 1435 / 7099 / 14312 | 3 / 13 / 26 | 5 / 13 / 13
S8 | 57 / 4670 / 5869 | 732 / 23387 / 24453 | 1429 / 2896 / 3043 | 2 / 572 / 2328 | 6 / 234 / 293
S9 | 56 / 100 / 2737 | 735 / 1534 / 19798 | 1426 / 7095 / 9039 | 3 / 13 / 451 | 3 / 53 / 516
S10 | 64 / 4427 / 5955 | 707 / 23574 / 24409 | 1420 / 3019 / 3329 | 0 / 587 / 3034 | 19 / 462 / 359
S11 | 54 / 133 / 3552 | 617 / 2164 / 21794 | 1437 / 7069 / 7737 | 0 / 0 / 709 | 7 / 72 / 495
S12 | 51 / 68 / 741 | 428 / 723 / 14898 | 1431 / 7128 / 12405 | 0 / 0 / 243 | 2 / 4 / 381
S13 | 93 / 2957 / 3778 | 1495 / 11020 / 17650 | 1418 / 4443 / 7395 | 0 / 499 / 1498 | 62 / 2973 / 3248
S14 | 57 / 154 / 2084 | 483 / 3490 / 18654 | 1435 / 7054 / 9963 | 0 / 2 / 498 | 6 / 152 / 1240
S15 | 236 / 6989 / 7076 | 4989 / 23976 / 24130 | 1385 / 1558 / 1840 | 1 / 1063 / 2942 | 106 / 229 / 485
S16 | 97 / 5825 / 6827 | 1438 / 23686 / 23930 | 1431 / 2286 / 2823 | 0 / 749 / 2344 | 31 / 430 / 163
S17 | 79 / 5492 / 6618 | 998 / 23690 / 23909 | 1446 / 2494 / 2960 | 0 / 667 / 2307 | 14 / 242 / 211
S18 | 64 / 736 / 4531 | 573 / 8547 / 22426 | 1418 / 6474 / 6330 | 0 / 0 / 988 | 2 / 135 / 236
S19 | 60 / 89 / 2878 | 509 / 1405 / 18358 | 1433 / 7113 / 9187 | 0 / 0 / 323 | 3 / 13 / 252
S20 | 60 / 71 / 1299 | 418 / 834 / 15817 | 1443 / 7114 / 11727 | 0 / 0 / 138 | 2 / 1 / 230

Table II: Test Results (each cell shows the values measured at workloads of 100/500/1000 RBEs)

Each benchmark cycle runs for 2 minutes. During this period, the TPC-W client generates a random number of simultaneous requests to the server, depending on the specified number of RBEs. A single RBE can request only one web page at a time. The client also simulates the waiting time between the browsing sessions of each emulated user. The server responds to the requests of the client by generating the corresponding web pages. If a request takes longer than 25 seconds, it is dropped due to timeout. The total number of requests that fit in a single benchmarking cycle varies depending on the response time. If the server cannot cope with the workload, the average response time and the number of timeouts will be high. In that case, the number of generated requests will be lower than when the server is capable of handling the generated workload and responds faster to the incoming requests.
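The sketch below mimics the RBE behaviour just described (one outstanding request per emulated browser, a 25-second timeout, think time between pages, and a fixed 2-minute cycle). It is a simplified stand-in for illustration only, not the actual TPC-W client; the SUT URL and the think-time distribution are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified RBE-style load generator: each emulated browser requests one page
// at a time, waits a random think time, and drops requests after 25 seconds.
public class RbeSketch {
    static final AtomicInteger successes = new AtomicInteger();
    static final AtomicInteger timeouts = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        String sutUrl = args.length > 0 ? args[0] : "http://203.0.113.20:8080/tpcw/home"; // placeholder SUT
        int rbes = 100;                                              // number of emulated browsers
        long cycleEnd = System.currentTimeMillis() + 2 * 60 * 1000;  // 2-minute benchmark cycle

        Thread[] browsers = new Thread[rbes];
        for (int i = 0; i < rbes; i++) {
            browsers[i] = new Thread(() -> emulateBrowser(sutUrl, cycleEnd));
            browsers[i].start();
        }
        for (Thread t : browsers) t.join();
        System.out.printf("successful interactions=%d, timeouts=%d%n", successes.get(), timeouts.get());
    }

    static void emulateBrowser(String url, long cycleEnd) {
        HttpClient client = HttpClient.newHttpClient();
        Random rnd = new Random();
        while (System.currentTimeMillis() < cycleEnd) {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(25))    // request dropped after 25 s
                    .build();
            try {
                client.send(req, HttpResponse.BodyHandlers.discarding());
                successes.incrementAndGet();
            } catch (Exception e) {                     // timeout or connection failure
                timeouts.incrementAndGet();
            }
            try {
                Thread.sleep(rnd.nextInt(7000));        // simulated think time between pages
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```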

B. Experimental Setup

All our experiments were conducted on three cloud providers - Amazon Elastic Compute Cloud (EC2), GoGrid Cloud Hosting, and Rackspace Hosting & Cloud. Resources on all providers were provisioned in the United States (N. California for Amazon, San Francisco for GoGrid, Chicago for Rackspace). We have implemented the cloud interface using the JClouds API, which currently supports 17 IaaS providers with datacenters in more than 30 geographical regions. Three different workloads of 100, 500 and 1000 RBEs were used to test the TPC-W application on 20 different types of cloud instances (refer to Table I). In these experiments we used the TA (client) provided by TPC-W. For all tests, the cloud instance selected for running the TPC-W client was m1.medium on EC2, Large on GoGrid, and 2GB on Rackspace. The scenario chosen for the tests was page browsing, with the property get-images set to false [22]. All tests on EC2 and Rackspace were repeated 10 times and the average was calculated. The benchmark on GoGrid was conducted 20 times because of high deviations in the performance of particular virtual instances. To capture this, we report standard deviation values as a measure of service consistency, or variability of performance.

The benchmarks on all three providers were executed in parallel. The tests on Amazon and Rackspace completed within 2 hours. Due to the limit on free IP addresses on GoGrid, the tests there were executed in two stages and took 4 hours. The total cost of running the tests was USD $53.23.
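Since the cloud interface is implemented with the JClouds API, the provisioning and decommissioning performed for each test can be sketched roughly as below. This is a minimal sketch against the jclouds compute abstraction; the credentials, group name and hardware/location identifiers are placeholders rather than the values used in the paper's experiments.

```java
import java.util.Set;

import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeService;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.compute.domain.NodeMetadata;
import org.jclouds.compute.domain.Template;
import org.jclouds.compute.predicates.NodePredicates;

// Rough sketch: start one SUT node on EC2 via jclouds, then tear it down.
public class JcloudsProvisioningSketch {
    public static void main(String[] args) throws Exception {
        ComputeServiceContext context = ContextBuilder.newBuilder("aws-ec2")
                .credentials("ACCESS_KEY_ID", "SECRET_KEY")   // placeholder credentials
                .buildView(ComputeServiceContext.class);
        ComputeService compute = context.getComputeService();

        // Ask for an m1.medium in the N. California region (ids are provider-specific).
        Template template = compute.templateBuilder()
                .hardwareId("m1.medium")
                .locationId("us-west-1")
                .build();

        // Start one node for the System Under Test.
        Set<? extends NodeMetadata> nodes = compute.createNodesInGroup("cloudbench-sut", 1, template);
        nodes.forEach(n -> System.out.println("Started " + n.getId() + " at " + n.getPublicAddresses()));

        // ... run the benchmark against the node, then decommission it.
        compute.destroyNodesMatching(NodePredicates.inGroup("cloudbench-sut"));
        context.close();
    }
}
```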

VI. DISCUSSION OF RESULTS

In this section we present the results of our benchmarking experiments. The full results are displayed in Table II. In the experiments, we collected the following metrics: average response time (ART), maximum response time (MRT), total number of successful interactions (SI) and total number of timeouts (T) during each benchmark cycle. In order to determine the consistency of performance, we also calculated the standard deviation of the average response time (St.Dev). To illustrate the results, we describe three scenarios: one generic scenario, where we do not consider any specific constraints, and two custom scenarios, where the customer has specific requirements. For the sake of readability, we sometimes omit some of the metrics. We discuss the measured metrics and their significance, and give recommendations below. Please note that in all figures displayed in this section, the provider offerings are sorted by increasing price, and the Y-axis in the custom scenarios uses a logarithmic scale.
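As a concrete illustration of how these metrics relate to the raw measurements, the sketch below computes ART, MRT, SI and T for one cycle and St.Dev across repeated cycles. The sample numbers are made up for illustration and are not taken from Table II.

```java
import java.util.List;

// Sketch: ART and MRT from per-request response times of one cycle, SI and T as
// counts, and St.Dev as the standard deviation of ART across repeated cycles.
public class MetricsSketch {
    static double avg(List<Long> responseTimesMs) {
        return responseTimesMs.stream().mapToLong(Long::longValue).average().orElse(0);
    }

    static long max(List<Long> responseTimesMs) {
        return responseTimesMs.stream().mapToLong(Long::longValue).max().orElse(0);
    }

    static double stdDev(List<Double> artPerCycle) {
        double mean = artPerCycle.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = artPerCycle.stream()
                .mapToDouble(a -> (a - mean) * (a - mean))
                .average().orElse(0);
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Response times (ms) of the successful interactions in one illustrative cycle.
        List<Long> cycle = List.of(48L, 62L, 55L, 71L, 49L);
        int timeouts = 1;                                  // requests dropped after 25 s
        System.out.printf("ART=%.1f ms, MRT=%d ms, SI=%d, T=%d%n",
                avg(cycle), max(cycle), cycle.size(), timeouts);

        // ART of ten repeated cycles -> consistency of performance (St.Dev).
        List<Double> artPerRun = List.of(49.0, 51.0, 47.5, 52.0, 50.0, 48.0, 53.5, 49.5, 50.5, 51.5);
        System.out.printf("St.Dev of ART=%.2f ms%n", stdDev(artPerRun));
    }
}
```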

A. Generic Scenario

In this scenario, we assume that there are no user constraints and we evaluate the entire set of cloud servers from the three providers with a workload of 1000 RBEs. We present this scenario to demonstrate how potentially good offers can be selected. Different techniques can be used to rank the offers based on the benchmarking results. We have proposed one way to do this using utility theory and preference policies in [8]. In [5], the authors propose CloudGenius, a framework for automated decision-making for the migration of web applications to the cloud. They make use of the well known multi-criteria decision-making technique called the Analytic Hierarchy Process. These are just two among several multi-criteria decision-making techniques that can be used to help automate the decision-making process.

In this example we focus on three characteristics: ART, St.Dev, and T. The results are displayed in Figure 4. Based on the displayed information, we can easily read and evaluate the server configurations. For example, S15 has an average response time of about 7000 ms, around 3000 requests dropped by timeout, and about 500 ms deviation in performance. We can notice that S5, S13 and S14 have rather high deviation in their performance compared to other instances. Thus, if consistency of performance is essential, they should not be considered. S7 is the most productive instance, with the lowest values for all three metrics, but it is also the most expensive one. S12 offers a good compromise between price and performance. If we want to achieve the lowest level of timeouts, we should consider S7, S6 and S20, with the final choice depending on the available budget.
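As a toy illustration of multi-criteria ranking (not the utility-based approach of [8] nor the AHP-based approach of [5]), the sketch below scores three offerings from Tables I and II with a normalised weighted sum over price, ART, timeouts and St.Dev at 1000 RBEs; the weights are arbitrary and chosen only for demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simple weighted-sum ranking over lower-is-better metrics, normalised by the
// maximum of each metric. Data from Tables I and II (1000 RBEs); weights are made up.
public class RankingSketch {
    public static void main(String[] args) {
        // {price $/hr, ART ms, timeouts, St.Dev ms}
        Map<String, double[]> offers = new LinkedHashMap<>();
        offers.put("S7",  new double[] {2.240,   52,   26,   13});
        offers.put("S12", new double[] {0.640,  741,  243,  381});
        offers.put("S15", new double[] {0.080, 7076, 2942,  485});

        double[] weights = {0.4, 0.3, 0.2, 0.1};       // illustrative preference weights
        double[] max = new double[4];
        for (double[] v : offers.values())
            for (int i = 0; i < 4; i++) max[i] = Math.max(max[i], v[i]);

        // Lower score = better price/performance under these weights.
        offers.forEach((code, v) -> {
            double score = 0;
            for (int i = 0; i < 4; i++) score += weights[i] * v[i] / max[i];
            System.out.printf("%s: score=%.3f%n", code, score);
        });
    }
}
```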

B. Custom Scenarios

In this subsection we assume that a customer, A, has a small online retail store selling jewellery. He currently runs a three-tier enterprise application on his own infrastructure and has now decided to migrate to the cloud because of his growing business. He has a reasonable idea about the characteristics of his application. He has several options in front of him and does not know which one to select. We consider two scenarios in which A has different budget constraints and different workload and performance requirements, and we give a recommendation for each scenario.

Figure 4: Generic Scenario Results (workload of 1000 RBEs)

Figure 5: Custom Scenarios Results - (a) Scenario 1 (workload of 100 RBEs); (b) Scenario 2 (workload of 1000 RBEs)

Scenario 1: low requirements. The customer needs to accommodate a workload of at most 100 concurrent requests with responses within 1 second, and the budget is limited to 20c/hour. By filtering the entire set of results, we get four instances that fit the requirements, which are presented in Figure 5a.

All four instances show insignificant variation in performance among each other, including S15, which is half the price of the other configurations. If the level of performance of S15 is satisfactory, it is the best choice. S2, being the most expensive option, has the highest chance of dropping requests by timeout, which does not make it particularly attractive. The best price/performance ratio is achieved by S10 and S16. Having the same price, S10 would be the better choice because its performance characteristics are slightly better. Our recommendation is given in Table III.
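The constraint filtering used in these custom scenarios can be sketched as a simple predicate over the measured results, as below. The rows are a subset of Tables I and II at 100 RBEs, the thresholds encode the Scenario 1 requirements, and the class and field names are illustrative only.

```java
import java.util.List;

// Sketch of Scenario 1 filtering: keep offerings whose measured ART at the target
// workload and whose hourly price fall within the customer's limits.
public class FilterSketch {
    record Offer(String code, double pricePerHour, double artAt100Rbes) {}

    public static void main(String[] args) {
        // Subset of Tables I and II (ART at 100 RBEs, in ms).
        List<Offer> offers = List.of(
                new Offer("S1",  0.096, 2213),
                new Offer("S2",  0.192,  202),
                new Offer("S3",  0.384,   64),
                new Offer("S10", 0.160,   64),
                new Offer("S15", 0.080,  236),
                new Offer("S16", 0.160,   97));

        double maxArtMs = 1000;      // response within 1 second
        double maxPrice = 0.20;      // budget of 20c/hour

        offers.stream()
              .filter(o -> o.artAt100Rbes() <= maxArtMs && o.pricePerHour() <= maxPrice)
              .forEach(o -> System.out.println(o.code() + " satisfies the Scenario 1 constraints"));
    }
}
```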

Scenario 2: high requirements. The customer needs to accommodate a workload of 1000 concurrent requests with responses within 3 seconds, and the budget is limited to $1.30/hour. After filtering the entire set of results, we get four instances that satisfy the conditions, as shown in Figure 5b.

According to the benchmarking results, S12 is the cheapest server and has the best performance in terms of average and maximum response time, and successful interactions. This instance seems to be the best choice among all options. S6, being the most expensive server, offers the lowest number of timeouts and the second best performance. If consistency of service is important, S19 is the best choice, with the lowest standard deviation value. Our recommendation is given in Table III.

Constraint | Scenario 1 | Scenario 2
Budget | S15 | S12
Performance | S10 | S12, S6
Consistency of service | S10 | S19

Table III: Recommendations

VII. CONCLUSION

As the rate of cloud computing adoption grows, it is becoming more important for prospective consumers to make informed decisions. Consumers would like to obtain a quick assessment of the price/performance levels of different IaaS providers before making any migration decisions. One way to do this is through benchmarking. In this paper, we have presented the Smart CloudBench system, which allows the automated execution of representative benchmarks on different IaaS clouds under representative load conditions to quickly estimate their cost/performance levels. The Smart CloudBench platform helps decision-makers make informed decisions about migrating their in-house systems to the cloud. They can design specific experiments to test the performance of representative applications using load conditions that match the load levels of their own in-house applications. Smart CloudBench is useful for organizations that do not possess the time, resources and in-house expertise to do a thorough evaluation of multiple cloud platforms.

We have implemented a proof-of-concept prototype of the Smart CloudBench system and evaluated the TPC-W benchmark on twenty different cloud server types under variable load conditions. Even though we performed simple load tests, the results show the value of having such a benchmarking tool by highlighting that price does not necessarily translate to performance (and its consistency) on the cloud, and that users do not necessarily benefit from procuring the most powerful server instances.

As future work, we plan to add more application stack benchmarks such as RUBiS³, Olio⁴ and jEnterprise2010⁵. We plan to integrate and test more IaaS providers and their offerings. We also plan to equip Smart CloudBench with a set of micro-benchmarks, which could serve as an auditing service for IaaS providers' SLAs. We are also adding a more generic testing environment using tools like JMeter.

³ http://rubis.ow2.org/index.html
⁴ http://incubator.apache.org/olio/
⁵ http://www.spec.org/jEnterprise2010/

ACKNOWLEDGMENT

This work was partially funded by the Service Delivery & Aggregation Project within the Smart Services CRC.

REFERENCES

[1] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: comparing public cloud providers. In Proc. of the 10th Annual Conference on Internet Measurement, November 2010.

[2] A. Li, X. Yang, S. Kandula, and M. Zhang. CloudCmp: Shopping for a cloud made easy. In Proc. of the 2nd USENIX Conference on Hot Topics in Cloud Computing, June 2010.

[3] A. Li, X. Yang, S. Kandula, X. Yang, and M. Zhang. CloudProphet: towards application performance prediction in cloud. In Proc. of ACM SIGCOMM 2011, Toronto, pp. 426-427.

[4] M. Ferdman et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In Proc. of 17th ASPLOS 2012, pp. 37-48.

[5] M. Menzel and R. Ranjan. CloudGenius: decision support for web server cloud migration. In Proc. of 21st WWW 2012, Lyon, France, pp. 979-988.

[6] D. Gmach, J. Rolia, and L. Cherkasova. Comparing efficiency and costs of cloud computing models. In Proc. of IEEE Network Operations and Management Symposium (NOMS), pp. 647-650, 2012.

[7] D. Gmach, J. Rolia, and L. Cherkasova. Selling T-shirts and Time Shares in the Cloud. In Proc. of 12th IEEE/ACM CCGrid, pp. 539-546, 2012.

[8] M. Baruwal Chhetri, Q. Bao Vo, R. Kowalczyk, and C. Lan Do. Cloud Broker: Helping You Buy Better. In Proc. of 12th WISE 2011, pp. 341-342, 2011.

[9] M. Baruwal Chhetri, Q. Bao Vo, and R. Kowalczyk. A Flexible Policy Framework for the QoS Differentiated Provisioning of Services. In Proc. of the 11th CCGRID-11, pp. 444-453, 2011.

[10] A. Lenk, M. Menzel, J. Lipsky, S. Tai, and P. Offermann. What are you paying for? Performance benchmarking for infrastructure-as-a-service offerings. In Proc. of 2011 IEEE CLOUD, pp. 484-491, 2011.

[11] C. Luo et al. CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications. Frontiers of Computer Science, 6(4), pp. 347-362, 2012.

[12] V. Vedam and J. Vemulapati. Demystifying Cloud Benchmarking Paradigm - An in Depth View. In Proc. of 36th IEEE COMPSAC, pp. 416-421, 2012.

[13] P. Shivam, V. Marupadi, J. Chase, T. Subramaniam, and S. Babu. Cutting corners: Workbench automation for server benchmarking. In Proc. of USENIX Annual Technical Conference, 2008.

[14] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Cloud Computing Journal, Vol. 34, pp. 115-131, 2010.

[15] A. Iosup, R. Prodan, and D. Epema. IaaS Cloud Benchmarking: Approaches, Challenges, and Experience. In Proc. of 5th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS), 2012.

[16] A. Alexandrov et al. Benchmarking in the Cloud: What it Should, Can, and Cannot Be. In Proc. of TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC), VLDB 2012.

[17] L. Zhao, A. Liu, and J. Keung. Evaluating cloud platform architecture with the CARE framework. In Proc. of 17th APSEC, pp. 60-69, 2010.

[18] M. Baruwal Chhetri, B. Q. Vo, and R. Kowalczyk. Policy-Based Automation of SLA Establishment for Cloud Computing Services. In Proc. of the 11th IEEE/ACM CCGRID-12, Ottawa, Canada, 13-16 May 2012.

[19] M. Baruwal Chhetri, B. Q. Vo, and R. Kowalczyk. AutoSLAM - A Policy-driven Middleware for Automated SLA Establishment in SOA Environments. In Proc. of the 9th SCC 2012, pp. 9-16, 2012.

[20] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proc. of the 1st ACM Symposium on Cloud Computing, pp. 143-154, ACM, 2010.

[21] SPEC Open Systems Group. Report on cloud computing to the OSG Steering Committee. Technical Report OSG-wg-final-20120214, available online at www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf, 2012.

[22] Transaction Processing Performance Council. TPC Benchmark W (Web Commerce) Specification, version 2.0r. Technical Specification, available online at http://www.tpc.org/tpcw/spec/TPCWV2.pdf, 2003.
