
PROCEEDINGS

11th IEEE/ACM International Symposium on Cluster, Cloud

and Grid Computing

—— CCGrid 2011 ——

23–26 May 2011

Newport Beach, CA, USA

Los Alamitos, California

Washington • Tokyo


Copyright © 2011 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors’ opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.

IEEE Computer Society Order Number E4395

BMS Part Number CFP11276-CDR
ISBN 978-0-7695-4395-6

Additional copies may be ordered from:

IEEE Computer Society
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: + 1 800 272 6657
Fax: + 1 714 821 4641
http://computer.org/cspress
[email protected]

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: + 1 732 981 0060
Fax: + 1 732 981 9667
http://shop.ieee.org/store/
[email protected]

IEEE Computer Society
Asia/Pacific Office
Watanabe Bldg., 1-4-2 Minami-Aoyama
Minato-ku, Tokyo 107-0062
JAPAN
Tel: + 81 3 3408 3118
Fax: + 81 3 3408 3553
[email protected]

Individual paper REPRINTS may be ordered at: <[email protected]>

Editorial production by Randall Bilof
Cover art production by Joe Daigle/Studio Productions

Printed in the United States of America by Applied Digital Imaging

IEEE Computer Society Conference Publishing Services (CPS)

http://www.computer.org/cps


2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

CCGrid 2011 — Table of Contents

Message from the General Cochairs .... xii
Message from the Program Committee Chair .... xiv
Conference Chairs .... xvi
Steering Committee .... xviii
Program Committee Members .... xix
Reviewers .... xxii
Keynotes .... xxiv

Virtual Machines

Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines .... 1
    Khaled Z. Ibrahim, Steven Hofmeyr, and Costin Iancu
CloudSpider: Combining Replication with Scheduling for Optimizing Live Migration of Virtual Machines across Wide Area Networks .... 13
    Sumit Kumar Bose, Scott Brock, Ronald Skeoch, and Shrisha Rao
Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters .... 23
    Vinicius Petrucci, Enrique V. Carrera, Orlando Loques, Julius C.B. Leite, and Daniel Mossé

GPU-Based Computing

Small Discrete Fourier Transforms on GPUs .... 33
    S. Mitra and A. Srinivasan
A Parallel Rectangle Intersection Algorithm on GPU+CPU .... 43
    Shih-Hsiang Lo, Che-Rung Lee, Yeh-Ching Chung, and I-Hsin Chung


GPGPU-Accelerated Parallel and Fast Simulation of Thousand-Core Platforms .... 53
    Christian Pinto, Shivani Raghav, Andrea Marongiu, Martino Ruggiero, David Atienza, and Luca Benini

Programming Models and Runtime Systems

Assertion Based Parallel Debugging .... 63
    Minh Ngoc Dinh, David Abramson, Donny Kurniawan, Chao Jin, Bob Moench, and Luiz DeRose
Cheetah: A Framework for Scalable Hierarchical Collective Operations .... 73
    Richard Graham, Manjunath Gorentla Venkata, Joshua Ladd, Pavel Shamis, Ishai Rabinovitz, Vasily Filipov, and Gilad Shainer
Enabling Multi-physics Coupled Simulations within the PGAS Programming Framework .... 84
    Fan Zhang, Ciprian Docan, Manish Parashar, and Scott Klasky
Multiple Services Throughput Optimization in a Hierarchical Middleware .... 94
    Eddy Caron, Benjamin Depardon, and Frédéric Desprez

Grid and Cloud Computing Performance

On the Performance Variability of Production Cloud Services .... 104
    Alexandru Iosup, Nezih Yigitbasi, and Dick Epema
The Grid Observatory .... 114
    Cécile Germain-Renaud, Alain Cady, Philippe Gauron, Michel Jouvin, Charles Loomis, Janusz Martyniak, Julien Nauroy, Guillaume Philippon, and Michèle Sebag
Grid Global Behavior Prediction .... 124
    Jesús Montes, Alberto Sánchez, and María S. Pérez

Volunteer Computing

A Robust Communication Framework for Parallel Execution on Volunteer PC Grids .... 134
    Eshwar Rohit, Hien Nguyen, Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Qian Wang, Margaret S. Cheung, and David Anderson
Non-cooperative Scheduling Considered Harmful in Collaborative Volunteer Computing Environments .... 144
    Bruno Donassolo, Arnaud Legrand, and Cláudio Geyer
Towards Real-Time, Volunteer Distributed Computing .... 154
    Sangho Yi, Emmanuel Jeannot, Derrick Kondo, and David P. Anderson


Distributed Systems and Applications

GeoServ: A Distributed Urban Sensing Platform .... 164
    Jong Hoon Ahnn, Uichin Lee, and Hyun Jin Moon
Building an Online Domain-Specific Computing Service over Non-dedicated Grid and Cloud Resources: The Superlink-Online Experience .... 174
    Mark Silberstein
Techniques for Fine-Grained, Multi-site Computation Offloading .... 184
    Kanad Sinha and Milind Kulkarni

Resource Scheduling on the Cloud

SLA-Based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments .... 195
    Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya
Improving Utilization of Infrastructure Clouds .... 205
    Paul Marshall, Kate Keahey, and Tim Freeman
Resource and Revenue Sharing with Coalition Formation of Cloud Providers: Game Theoretic Approach .... 215
    Dusit Niyato, Athanasios V. Vasilakos, and Zhu Kun
Self-Healing Distributed Scheduling Platform .... 225
    Marc E. Frincu, Norha M. Villegas, Dana Petcu, Hausi A. Müller, and Romain Rouvoy

Data Streaming

Towards Reliable, Performant Workflows for Streaming-Applications on Cloud Platforms .... 235
    Daniel Zinn, Quinn Hart, Timothy McPhillips, Bertram Ludäscher, Yogesh Simmhan, Michail Giakkoupis, and Viktor K. Prasanna
A Sketch-Based Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams .... 245
    Eugenio Cesario, Antonio Grillo, Carlo Mastroianni, and Domenico Talia

Caching and Shared Memory

APP: Minimizing Interference Using Aggressive Pipelined Prefetching in Multi-level Buffer Caches .... 254
    Christina M. Patrick, Nicholas Voshell, and Mahmut Kandemir
PAC-PLRU: A Cache Replacement Policy to Salvage Discarded Predictions from Hardware Prefetchers .... 265
    Ke Zhang, Zhensong Wang, Yong Chen, Huaiyu Zhu, and Xian-He Sun


Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT .... 275
    Simone Secchi, Antonino Tumeo, and Oreste Villa

Data-Driven Computing

Predictive Data Grouping and Placement for Cloud-Based Elastic Server Infrastructures .... 285
    Juan M. Tirado, Daniel Higuero, Florin Isaila, and Jesús Carretero
BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing .... 295
    Jiahui Jin, Junzhou Luo, Aibo Song, Fang Dong, and Runqun Xiong

Fault Tolerance and Checkpointing

On the Scheduling of Checkpoints in Desktop Grids .... 305
    Mohamed Slim Bouguerra, Derrick Kondo, and Denis Trystram
High Performance Pipelined Process Migration with RDMA .... 314
    Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, and Dhabaleswar K. Panda
Failure Avoidance through Fault Prediction Based on Synthetic Transactions .... 324
    Mohammed Shatnawi and Matei Ripeanu

Communication and Network Management

A Scalable Method for Signalling Dynamic Reconfiguration Events with OpenSM .... 332
    Wei Lin Guay and Sven-Arne Reinemo
On the Relation between Congestion Control, Switch Arbitration and Fairness .... 342
    Ernst Gunnar Gran, Eitan Zahavi, Sven-Arne Reinemo, Tor Skeie, Gilad Shainer, and Olav Lysne
Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems .... 352
    Xinyu Que, Weikuan Yu, Vinod Tipparaju, Jeffrey S. Vetter, and Bin Wang

Distributed Hash Tables

Evaluating and Optimizing Indexing Schemes for a Cloud-Based Elastic Key-Value Store .... 362
    David Chiu, Apeksha Shetty, and Gagan Agrawal
Sophia: Local Trust for Securing Routing in DHTs .... 372
    Raúl Gracia-Tinedo, Pedro García-López, and Marc Sánchez-Artigas
The Benefits of Estimated Global Information in DHT Load Balancing .... 382
    Nico Kruber, Mikael Högqvist, and Thorsten Schütt


I/O and File Systems 1

DHTbd: A Reliable Block-Based Storage System for High Performance Clusters .... 392
    George Parisis, George Xylomenos, and Theodore Apostolopoulos
Adaptive QoS Decomposition and Control for Storage Cache Management in Multi-server Environments .... 402
    Ramya Prabhakar, Shekhar Srikantaiah, Rajat Garg, and Mahmut Kandemir
A Segment-Level Adaptive Data Layout Scheme for Improved Load Balance in Parallel File Systems .... 414
    Huaiming Song, Yanlong Yin, Xian-He Sun, Rajeev Thakur, and Samuel Lang

QoS

Classification and Composition of QoS Attributes in Distributed, Heterogeneous Systems .... 424
    Elisabeth Vinek, Peter Paul Beran, and Erich Schikuta
Autonomic SLA-Driven Provisioning for Cloud Applications .... 434
    Nicolas Bonvin, Thanasis G. Papaioannou, and Karl Aberer
A Flexible Policy Framework for the QoS Differentiated Provisioning of Services .... 444
    Mohan Baruwal Chhetri, Bao Quoc Vo, and Ryszard Kowalczyk

Data Intensive Computing and MapReduce

DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications .... 454
    Zacharia Fadika and Madhusudhan Govindaraju
Cloud MapReduce: A MapReduce Implementation on Top of a Cloud Operating System .... 464
    Huan Liu and Dan Orban
Ex-MATE: Data Intensive Computing with Large Reduction Objects and Its Application to Graph Mining .... 475
    Wei Jiang and Gagan Agrawal

I/O and File Systems 2

ASDF: An Autonomous and Scalable Distributed File System .... 485
    Chien-Ming Wang, Chi-Chang Huang, and Huan-Ming Liang
Managing Distributed Files with RNS in Heterogeneous Data Grids .... 494
    Yutaka Kawai, Go Iwai, Takashi Sasaki, and Yoshiyuki Watase
DDFTP: Dual-Direction FTP .... 504
    Jameela Al-Jaroodi and Nader Mohamed


Efficient Support for MPI-I/O Atomicity Based on Versioning .... 514
    Viet-Trung Tran, Bogdan Nicolae, Gabriel Antoniu, and Luc Bougé

Security

Implementing Trust in Cloud Infrastructures .... 524
    Ricardo Neisse, Dominik Holling, and Alexander Pretschner
Detection and Protection against Distributed Denial of Service Attacks in Accountable Grid Computing Systems .... 534
    Wonjun Lee, Anna C. Squicciarini, and Elisa Bertino
Dealing with Grid-Computing Authorization Using Identity-Based Certificateless Proxy Signature .... 544
    Mohamed Amin Jabri and Satoshi Matsuoka

Social Network (SN4CCGridS) Workshop

Open Social Based Collaborative Science Gateways .... 554
    Wenjun Wu, Hui Zhang, and ZhenAn Li
Social Networks of Researchers and Educators on nanoHUB.org .... 560
    Gerhard Klimeck, George B. Adams III, Krishna P.C. Madhavan, Nathan Denny, Michael G. Zentner, Swaroop Shivarajapura, Lynn K. Zentner, and Diane L. Beaudoin
A Trustworthiness Fusion Model for Service Cloud Platform Based on D-S Evidence Theory .... 566
    Rong Hu, Jianxun Liu, and Xiaoqing Frank Liu
Engineering Incentives in Social Clouds .... 572
    Christian Haas, Simon Caton, and Christof Weinhardt

Clouds for Business, Industry, and Enterprise (C4BIE) Workshop

Utilizing “Opaque” Resources for Revenue Enhancement on Clouds and Grids .... 576
    Jose Orlando Melendez and Shikharesh Majumdar
Debunking Real-Time Pricing in Cloud Computing .... 585
    Sewook Wee
Unifying Cloud Management: Towards Overall Governance of Business Level Objectives .... 591
    Mina Sedaghat, Francisco Hernández, and Erik Elmroth
Defining a Cloud Reference Model .... 598
    Teresa Tung


Poster Abstracts

Inferring Network Topologies in Infrastructure as a Service Cloud .... 604
    Dominic Battré, Natalia Frejnik, Siddhant Goel, Odej Kao, and Daniel Warneke
Addressing Resource Fragmentation in Grids through Network-Aware Meta-scheduling in Advance .... 606
    Luis Tomás, Carmen Carrión, Blanca Caminero, and Agustín Caminero
Performance under Failures of MapReduce Applications .... 608
    Hui Jin, Kan Qiao, Xian-He Sun, and Ying Li
MPI-IO/Gfarm: An Optimized Implementation of MPI-IO for the Gfarm File System .... 610
    Hiroki Kimura and Osamu Tatebe
Diagnosing Anomalous Network Performance with Confidence .... 612
    Bradley W. Settlemyer, Stephen W. Hodson, Jeffery A. Kuehn, and Stephen W. Poole
A Performance Goal Oriented Processor Allocation Technique for Centralized Heterogeneous Multi-cluster Environments .... 614
    Po-Chi Shih, Kuo-Chan Huang, Che-Rung Lee, I-Hsin Chung, and Yeh-Ching Chung
A Hybrid Shared-Nothing/Shared-Data Storage Architecture for Large Scale Databases .... 616
    Huaiming Song, Xian-He Sun, and Yong Chen
EZTrace: A Generic Framework for Performance Analysis .... 618
    François Trahay, François Rue, Mathieu Faverge, Yutaka Ishikawa, Raymond Namyst, and Jack Dongarra
Supporting Federated Multi-authority Security Models .... 620
    John Watt and Richard O. Sinnott

Author Index .... 622


Message from the CCGrid 2011 General Cochairs

It gives us great pleasure to welcome you to the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2011) sponsored by the IEEE Computer Society, IEEE Technical Committee on Scalable Computing (TCSC), and the Association for Computing Machinery (ACM). The CCGrid conference series strives to provide a forum for all distributed computing technologies, for all technology stakeholders. Clusters, grids, and now clouds are clearly having an enormous impact on science, industry, and government. As a truly global venue, we want to enable the “cross-pollination” of researchers, developers, and users as they explore research ideas, evaluate maturing technologies, and put them to good use!

The inaugural CCGrid conference was held in Brisbane, Australia, in 2001. Since then, the conference has successfully been hosted around the world, becoming the international event it is today. From 2002 to 2010, CCGrid annual events were held in Germany, Japan, USA, UK, Singapore, Brazil, France, China, and Australia. We are delighted to host the 11th iteration of the CCGrid conference in Newport Beach, California.

This year’s symposium continues this tradition by exploring leading-edge issues in cluster, cloud, and grid computing with an outstanding technical program featuring keynote talks, tutorials, plenary panels, workshops, poster sessions, research exhibits, demos, and the IEEE SCALE competition. We are pleased to host keynotes by Prof. Larry Smarr on state-of-the-art optical interconnectivity for clusters, grids, and clouds and by Dr. Rich Wolski on open source cloud computing infrastructures. CCGrid has been extremely fortunate to serve as a venue for presentation of the prestigious IEEE Medal for Excellence in Scalable Computing award offered annually by the IEEE Technical Committee on Scalable Computing. We thank Prof. Manish Parashar for coordinating the efforts to select the TCSC’s 2011 IEEE Medal for Excellence in Scalable Computing recipient, who will also be featured as a keynote speaker at the conference.

The continued success and leadership of CCGrid requires the dedicated efforts and high standards of numerous international volunteers. Putting together CCGrid 2011 was a team effort – this event would not have been possible without the enthusiastic support and hard work of a number of colleagues. As General Cochairs of this year’s event, we would like to express our sincere gratitude to the members of the Steering Committee, in particular, the constant encouragement and input of Prof. Rajkumar Buyya, the Chair of the CCGrid Steering Committee. The excellent technical program was made possible by the dedicated leadership of Professor Carlos Varela, the Program Chair, and his excellent Program Committee. Special thanks to the Program Committee Area Vice Chairs (Kenjiro Taura, Manish Parashar, Henri Casanova, Gul Agha, David Anderson, and Pavan Balaji) who coordinated peer reviews for all submitted “full” papers and selected top-quality research papers for presentation at the conference. The CCGrid 2011 conference received 189 submissions from 36 countries around the world. After peer review of all these full papers, the Program Committee accepted 55 papers, resulting in an acceptance rate of ~29%. We thank all the authors for submitting to the conference and workshops, and helping to create an exciting technical program.

We thank Professors Omer Rana and Bruno Schulze for coordinating the organization of the satellite workshops/mini-symposiums on hot topics such as multicore clusters and clouds for business. We appreciate the efforts of the chairs of the various workshops and their PC members for attracting and selecting top-quality papers for presentation at the conference. We wish to thank the Publicity Cochairs, Drs. Johannes Watzl and Fabio Costa, and the Student Awards Chair, Dr. Anwitaman Datta, for advertising the conference extensively and helping us reach a broad community of students, researchers, and practitioners. We would like to thank our Cyber Cochairs, Reza Rahimi and Suraj Pandey, for the excellent management of the
conference website. They provided much needed information throughout the various phases of the conference organization. We would also like to thank Tutorials Cochairs Professor Sushil K. Prasad and Dr. Pavan Balaji and SCALE Challenge Chair, Dr. Shantenu Jha, for their efforts in enhancing the conference program with interesting tutorials and demos. We thank Dr. Rajkumar Buyya, Proceedings Chair, and Randall Bilof for their help and support in ensuring the publication of the conference proceedings in a timely manner, and Elizabeth Brookes Little from IEEE for enabling us to secure a good venue for the conference and assisting us with budgeting issues. We appreciate the efforts of Mary Carrillo in setting up and managing registrations for the conference.

As we all know, the local arrangements are a key aspect of any event. We would like to express our deep appreciation to leading volunteers of the local organizing committee, Mary Carrillo, Reza Rahimi, and Venita D’Souza, for their dedicated work during the last year. Thanks are also due to our sponsors, namely, IEEE, ACM, TCSC, and the organizational supporters at the Department of Computer Science and the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. We would like to thank the National Science Foundation, in particular, Dr. Gabrielle Allen, for providing student travel awards for the conference.

Ultimately, however, the success of the conference will be judged by how well the delegates have participated, learned, interacted, and established contacts with other researchers in different fields. We hope that the conference will provide you with a valuable opportunity to share ideas with other researchers and practitioners from institutions around the world. We wish everyone a successful, stimulating, and rewarding meeting and look forward to seeing you again at future CCGrid conferences. Enjoy your visit to sunny Southern California!

Craig Lee
The Aerospace Corporation and Open Grid Forum

Nalini Venkatasubramanian
University of California, Irvine


Message from the CCGrid 2011 Program Committee Chair

The 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing program contains 55 high-quality technical papers selected from 189 submissions from 36 different countries (29.1% acceptance rate). The special focus in this year’s call for papers was on adaptive elastic computing, green computing, virtualization, and GP-GPU computing. The great majority of papers received three or more reviews, and we ensured that all borderline papers received at least three reviews and were discussed electronically by the Program Committee before deciding on acceptance or rejection. Unfortunately, we had to decline many worthy submissions.

There will be three parallel morning and afternoon sessions on Tuesday, Wednesday, and Thursday (6 sessions per day), each with two to four related papers, most with three. We will have three keynote talks, one each morning preceding the technical sessions: on Tuesday, Dr. Larry Smarr will talk about dedicated optical lightpaths for clusters, grids and clouds; on Wednesday, the IEEE Medal winner will give her/his acceptance speech; on Thursday, Dr. Rich Wolski will discuss open-source software infrastructure for cloud computing. After the morning and afternoon technical paper sessions, we will have panels on exciting emerging themes: autonomic cloud computing and green/GP-GPU computing. We also invited the authors of 18 papers that were on the borderline but did not make it into the program, and received 9 high-quality poster submissions, which are accompanied by 2-page extended abstracts in these proceedings. Watch for the poster session on Tuesday.

Following are the accepted paper statistics by topic:

Statistics by Topic

Topic                                                        Submissions   Accepted   Acceptance rate   PC members
Scalable Heterogeneous Fault-Tolerant Computing                   28           11          0.39             21
Programming Models and Systems                                    35           13          0.37             29
Algorithms and Applications                                       54           20          0.37             20
Performance Modeling and Evaluation                               55           14          0.25             28
Scheduling and Resource Management                                61           18          0.30             33
Middleware, Autonomic Computing, and Cyberinfrastructure          62           22          0.35             31

Notice that paper authors chose multiple topics of relevance (avg: 1.6 topics/paper), and PC members also chose multiple topics of expertise (avg: 1.8 topics/PC member). Since some topics are highly interrelated (e.g., a paper may have both middleware and performance evaluation components, or fault-tolerant computing and applications), I was purposefully flexible in terms of "topics" as opposed to using separate "tracks" as in other
conferences. Instead, I have allowed both authors and PC members to select their topics of relevance/expertise, and I think it worked out quite well. We assigned papers in multiple topics to PC members with expertise in the same multiple topics, so authors should have received better quality reviews. For example, a paper that chose "Algorithms and Applications" and "Scheduling and Resource Management" was assigned to a PC member that declared those two topics of expertise as well. We ended up with 55 accepted papers, which means that on average, an accepted paper selected 1.78 (98/55) topics.

This high-quality program would not have been possible if not for the very hard work of 100 Program Committee members, including my colleagues, Program Vice-Chairs Gul Agha, UIUC, USA (Programming Models and Systems), David Anderson, UC Berkeley, USA (Scalable Heterogeneous Fault-Tolerant Computing), Pavan Balaji, Argonne National Lab, USA (Scheduling and Resource Management), Henri Casanova, U. Hawaii, USA (Performance Modeling and Evaluation), Manish Parashar, Rutgers University (Middleware, Autonomic Computing and Cyberinfrastructure), and Kenjiro Taura, U. Tokyo, Japan (Algorithms and Applications). We also are indebted to the additional reviewers who volunteered their time and effort to make this the best possible CCGrid technical program. I also want to especially thank Rajkumar Buyya, the Steering Committee Chair, for his never-ending guidance, and my colleagues Craig Lee and Nalini Venkatasubramanian, the General Cochairs, for a seamless conference coordination.

Last but not least, I want to thank the technical paper authors for submitting their research results and presenting them to the CCGrid community. Without your work, none of this would be possible. And now, it is your time to enjoy it! We encourage you to discuss emerging themes, meet old colleagues, make new friends, and brew new ideas in this geographically diverse conference, in the warmth of Southern California. Welcome to CCGrid 2011!

Carlos A. Varela
Rensselaer Polytechnic Institute


CCGrid 2011 Conference Chairs

General Cochairs
Nalini Venkatasubramanian, University of California, Irvine, USA

Craig Lee, Aerospace Corporation, USA

Program Committee Chair
Carlos A. Varela, Rensselaer Polytechnic Institute, USA

Program Committee Vice Chairs

Algorithms and Applications
Kenjiro Taura, University of Tokyo, Japan

Middleware, Autonomic Computing, and Cyberinfrastructure

Manish Parashar, Rutgers University, USA

Performance Modeling and Evaluation
Henri Casanova, University of Hawaii, USA

Programming Models and Systems

Gul Agha, UIUC, USA

Scalable Heterogeneous Fault-Tolerant Computing
David Anderson, University of California, Berkeley, USA

Scheduling and Resource Management

Pavan Balaji, Argonne National Laboratory, USA

Workshop Cochairs
Bruno Schulze, LNCC, Brazil

Omer Rana, Welsh eScience Center and Cardiff University, UK


Tutorial Cochairs
Pavan Balaji, Argonne National Laboratory, USA
Sushil K. Prasad, Georgia State University, USA

Student Awards Chair

Anwitaman Datta, Nanyang Technological University, Singapore

Industry Track Chair
Radha Nandkumar, National Center for Supercomputing Applications, UIUC, USA

Proceedings Chair

Rajkumar Buyya, University of Melbourne, Australia

Publicity Cochairs
Johannes Watzl, LMU, Germany

Fabio Costa, Federal University of Goias, Brazil

Cyber Cochairs
M. Reza Rahimi, University of California, Irvine, USA

Suraj Pandey, University of Melbourne, Australia

Local Arrangements Coordinator
Mary Carrillo, University of California, Irvine, USA


CCGrid 2011 Steering Committee

Rajkumar Buyya, University of Melbourne, Australia (Chair)
Craig Lee, The Aerospace Corporation, USA (Cochair)
Henri Bal, Vrije University, The Netherlands
Franck Cappello, University of Paris-Sud, France
Jack Dongarra, University of Tennessee & ORNL, USA
Dick Epema, Technical University of Delft, The Netherlands
Thomas Fahringer, University of Innsbruck, Austria
Ian Foster, Argonne National Laboratory, USA
Wolfgang Gentzsch, D-Grid, Germany, and RENCI, USA
Shantenu Jha, Louisiana State University, USA (SCALE 2011 Coordinator)
Hai Jin, Huazhong University of Science & Technology, China
Laurent Lefevre, INRIA, France
Geng Lin, Cisco Systems, USA
Shikharesh Majumdar, Carleton University, Canada
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Manish Parashar, Rutgers University, USA
Omer Rana, Cardiff University, UK
Paul Roe, Queensland University of Technology, Australia
Bruno Schulze, LNCC, Brazil


CCGrid 2011 Program Committee Members

Sherif Abdelwahed, Mississippi State University, USA Ahmad Afsahi, Queen’s University, Canada Gagan Agrawal, Ohio State University, USA

Kento Aida, National Institute of Informatics, Japan Jorn Altmann, Seoul National University, Korea

Geoff Arnold, Huawei, China David Bader, Georgia Institute of Technology, USA

Henri Bal, Vrije Universiteit, Amsterdam, Netherlands Viraj Bhat, Yahoo, USA

Carlos Jaime Barrios, Universidad Industrial de Santander, Colombia Ivona Brandic, Vienna University of Technology, Austria

Franck Cappello, INRIA/UIUC, France/USA Harold Castro, Universidad de Los Andes, Colombia

Brad Chamberlain, Cray, USA Yong Chen, Oak Ridge National Laboratory, USA

Eunmi Choi, Kookmin, Korea Michael Cummings, University of Maryland, USA

Ewa Deelman, University of Southern California, USA Travis Desell, Rensselaer Polytechnic Institute, USA

Frédéric Desprez, LIP/INRIA, France Zhihui Du, Tsinghua University, China

Kaoutar El-Maghraoui, IBM T.J. Watson Research Lab, USA Erik Elmroth, Umeå University, Sweden

Toshio Endo, Tokyo Institute of Technology, Japan Dick Epema, Technische Universiteit Delft, Netherlands

Thomas Fahringer, University of Innsbruck, Austria Gilles Fedak, INRIA, France

Wu-chun Feng, Virginia Tech, USA John Field, IBM T.J. Watson Research Lab, USA Renato Figueiredo, University of Florida, USA

Jose Fortes, University of Florida, USA Geoffrey Fox, Indiana University, USA

Keiichiro Fukazawa, Kyushu University, Japan Saurabh Kumar Garg, University of Melbourne, Australia


Ada Gavrilovska, Georgia Institute of Technology, USA Cecile Germain-Renaud, Laboratoire de Recherche en Informatique (LRI), France

Michael Gerndt, Technische Universitaet Muenchen, Germany Daniel Gonzalez, Univ. of Extremadura, Spain

Ganesh Gopalakrishnan, University of Utah, USA William Gropp, UIUC, USA Bob Grossman, UICC, USA

Seif Haridi, KTH Stockholm, Sweden Kenneth Hawick, Massey University–Albany, New Zealand

Torsten Hoefler, NCSA, USA Marty Humphrey, University of Virginia, USA

Alexandru Iosup, Delft University of Technology, Netherlands Takeshi Iwashita, Kyoto University, Japan

Shantenu Jha, Louisiana State University, USA Hai Jin, Huazhong University of Science and Technology, China

Krishna Kant, NSF, USA Takahiro Katagiri, University of Tokyo, Japan

Kate Keahey, Argonne National Laboratory, USA Thilo Kielmann, Vrije Universiteit, Amsterdam, Netherlands

Hyunjoo Kim, Rutgers University, USA Derrick Kondo, INRIA, France

Adam Kornafeld, Worcester Polytechnic Institute, USA Xiaolin Li, University of Florida, USA

Arnaud Legrand, INRIA, France Laurent Lefevre, INRIA, France

Satoshi Matsuoka, Tokyo Institute of Technology, Japan Guillaume Mercier, INRIA, France

Hidemoto Nakada, AIST, Japan Charles Norton, JPL/NASA, USA

Scott Pakin, Los Alamos National Laboratory, USA Dhabaleswar K. Panda, Ohio State University, USA Suraj Pandey, University of Melbourne, Australia

Anjaneyulu Pasala, Infosys, India Martin Quinson, University of Nancy, France

Ivan Rodero, Rutgers University, USA Rajiv Ranjan, University of Melbourne, Australia


Massoud Sadjadi, Florida International University, USA Joel Saltz, Emory University, USA

Martin Schulz, Lawrence Livermore National Laboratory, USA Bruno Schulze, LNCC, Brazil

Naveen Sharma, Xerox Innovation Group, USA Mark Silberstein, Technion, Israel

Jaspal Subhlok, University of Houston, USA Frédéric Suter, CNRS, France

Shivasubramanian Swami, Amazon Inc, USA Michela Taufer, University of Delaware, USA

Rajeev Thakur, Argonne National Laboratory, USA Vinod Tipparaju, Oak Ridge National Laboratory, USA Jordi Torres, Technical University of Catalonia, Spain

Sathish Vadhiyar, Indian Institute of Science, India Srikumar Venugopal, The University of New South Wales, Australia

Deepak Vij, Huawei, China Abhinav Vishnu, Pacific Northwest National Laboratory, USA

Jan-Jan Wu, Academia Sinica, Taiwan Wei-Jen Wang, National Central University, Taiwan

Jon B. Weissman, University of Minnesota, USA Matt Wolf, Georgia Institute of Technology, USA Albert Zomaya, University of Sydney, Australia


CCGrid 2011 Reviewers

Sriram Aananthakrishnan Sherif Abdelwahed

Vijay Srinivas Agneeswaran Muhammad Aleem

Nawab Ali Athanasios Antoniou

Anne Auger Jim Basney

Leonardo Bautista Josep Ll. Berral Laurent Bobelin Ryan Braithwaite John Bresnahan

Rodrigo N. Calheiros Miguel Camelo

Ghislain Charrier Qian Chen

Wei-Fan Chiang Nitin Chiluka

Pierre-Nicolas Clauss Xabriel J. Collazo-Mojica

Minh Quan Dang Thomas De Ruiter Simon Delamare Javier Delgado

Benjamin Depardon Marcos Dias De Assuncao

James Dinan Mohammed El Mehdi Diouri

Bruno Donassolo Abhishek Dubey

Trilce Estrada Ikki Fujiwara Ana Gainaru

Jean-Patrick Gelas Stéphane Genaud

Nathan Gnanasambandam Brice Goglin Íñigo Goiri Francesc Guim

Zach Hill Tasuku Hiraishi Alan Humphrey

Bahman Javadi Emmanuel Jeannot

Hui Jin Hideyuki Jitsumoto

Gueyoung Jung Gokul Kandiraju

Gilles Kassel Balazs Kegl Dries Kimpe

Tsafack Landry Lars Larsson

Gaël Le Mahec Damien Le Moal

Guodong Li Wubin Li

Xiaofei Liao Jason Maassen Mario Macias

Ming Mao Gabriel Marin

Michael Maurer Rajat Mehrotra

Lucas Mello Schnorr Hamid Mohammadi Fard

Ramses Morales Stephanie Moreaud

Thomas Morris Adrian Muresan Bogdan Nicolae Satoshi Ohshima

Ana-Maria Oprescu P-O Östberg

Simon Ostermann Swann Perarnau

Kassian Plankensteiner Gargi Prasad

Weizhong Qiang Jean-Noel Quintin

Thomas Ropars Arkaitz Ruiz-Alvarez

Siqi Shen Xuanhua Shi

Takao Shimayoshi


Huaiming Song Ozan Sonmez Kyle Spafford Petter Svard

Greg Szubzda Bing Tang

Christophe Thiery Johan Tordsson

Xuping Tu

Kees Van Reeuwijk Ana Lucia Varbanescu

Mario Villamizar David Villegas

Song Wu Naotaka Yamamoto

Bulent Yener Deqing Zou


The Missing Link: Dedicated End-to-End 10Gbps Optical Lightpaths for Clusters, Grids, and Clouds

Larry Smarr

Abstract: Today we are living in a data-dominated world where distributed scientific instruments, as well as clusters, generate terabytes to petabytes of data which are stored increasingly in specialized campus facilities or in the Cloud. It was in response to this challenge that the NSF funded the OptIPuter project to research how user-controlled 10Gbps dedicated lightpaths (or "lambdas") could transform the Grid into a LambdaGrid. This provides direct access to global data repositories, scientific instruments, and computational resources from "OptIPortals," PC clusters which provide scalable visualization, computing, and storage in the user's campus laboratory. The use of dedicated lightpaths over fiber optic cables enables individual researchers to experience "clear channel" 10,000 megabits/sec, 100-1000 times faster than over today's shared Internet, a critical capability for data-intensive science. The seven-year OptIPuter computer science research project is now over, but it stimulated a national and global build-out of dedicated fiber optic networks. U.S. universities now have access to high bandwidth lambdas through the National LambdaRail, Internet2's WaveCo, and the Global Lambda Integrated Facility. A few pioneering campuses are now building on-campus lightpaths to connect the data-intensive researchers, data generators, and vast storage systems to each other on campus, as well as to the national network campus gateways. I will give examples of the application use of this emerging high performance cyberinfrastructure in genomics, ocean observatories, radio astronomy, and cosmology.

Speaker Bio: Larry Smarr is the founding Director of the California Institute for Telecommunications and Information Technology (Calit2), a UC San Diego/UC Irvine partnership, and holds the Harry E. Gruber professorship in Computer Science and Engineering (CSE) at UCSD's Jacobs School. At Calit2, Smarr has continued to drive major developments in information infrastructure, including the Internet, Web, scientific visualization, virtual reality, and global telepresence, begun during his previous 15 years as founding Director of the National Center for Supercomputing Applications (NCSA). Smarr served as principal investigator on NSF's OptIPuter project and currently is principal investigator of the Moore Foundation's CAMERA project and co-principal investigator on NSF's GreenLight project. In October 2008 he was the Leadership Dialog Scholar in Australia.

Eucalyptus: Open Source Infrastructure for Cloud Computing

Rich Wolski

Abstract: Eucalyptus (Elastic Utility Computing Architecture for Linking Your Programs to Useful Systems) is an open-source software infrastructure that implements IaaS-style cloud computing. The goal of Eucalyptus is to allow sites with existing clusters and server infrastructure to host a cloud that is interface-compatible with Amazon's AWS. In addition, through its interfaces, Eucalyptus is able to host cloud platform services such as AppScale (an open source implementation of Google's AppEngine) and Hadoop, making it possible to "mix and match" different service paradigms and configurations within the cloud. Finally, Eucalyptus is capable of leveraging a heterogeneous collection of virtualization technologies within a single cloud, making it possible to incorporate resources that have already been virtualized without modifying their configuration. The talk will focus on specific features of the system that are designed to enable rapid development, prototyping, and deployment of local computing clouds, particularly for debugging and/or application development purposes. It will also discuss experiences with hosting the Eucalyptus Public Cloud (EPC) as a free public cloud platform for experimental use and the ability to use the EPC in conjunction with commercial web development services that target AWS, such as RightScale. Finally, we will discuss our experiences in building and supporting open source cloud infrastructure and point to potential future directions that we believe will enable greater innovation.

Speaker Bio: Dr. Rich Wolski is the Chief Technology Officer and co-founder of Eucalyptus Systems Inc., and a Professor of Computer Science at the University of California, Santa Barbara (UCSB). Having received his M.S. and Ph.D. degrees from the University of California at Davis (while a research scientist at Lawrence Livermore National Laboratory), he has also held positions at the University of California, San Diego, and the University of Tennessee. He has also been a strategic advisor to the San Diego Supercomputer Center and an adjunct faculty member at the Lawrence Berkeley National Laboratory. Rich has led several national scale research efforts in the area of high-performance distributed computing and grid computing, is the author of numerous research articles concerning the empirical study of distributed systems, and is the progenitor of the Eucalyptus project.
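As a rough illustration of the AWS interface compatibility mentioned in the abstract, the sketch below points the classic boto 2.x EC2 client at a Eucalyptus front end rather than at Amazon itself. The host name, credentials, and image id are placeholders, and the port/path values are the defaults used by older Eucalyptus releases; substitute the values of an actual deployment.

# pip install boto   (the classic boto 2.x library)
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

# Describe the Eucalyptus front end as an EC2-compatible "region".
region = RegionInfo(name="eucalyptus", endpoint="cloud.example.org")  # placeholder host

conn = EC2Connection(
    aws_access_key_id="YOUR-ACCESS-KEY",        # placeholder credentials
    aws_secret_access_key="YOUR-SECRET-KEY",
    is_secure=False,
    region=region,
    port=8773,                                  # default API port in older Eucalyptus releases
    path="/services/Eucalyptus",
)

# The same calls work unchanged against AWS EC2 itself.
for image in conn.get_all_images():
    print(image.id, image.location)

reservation = conn.run_instances("emi-0abc1234", instance_type="m1.small")  # placeholder image id
print("started:", reservation.instances[0].id)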


Optimized Management of Power and Performance for Virtualized Heterogeneous Server Clusters

Vinicius Petrucci∗, Enrique V. Carrera†, Orlando Loques∗, Julius C. B. Leite∗, Daniel Mossé‡

∗Instituto de Computação — Universidade Federal Fluminense, Niterói, Brazil
Email: {vpetrucci, loques, julius}@ic.uff.br
†Department of Electrical Engineering — Army Polytechnic School, P.O. Box 171-5-231B, Sangolquí, Ecuador
Email: [email protected]
‡Department of Computer Science — University of Pittsburgh, USA
Email: [email protected]

Abstract—This paper proposes and evaluates an approach for power and performance management in virtualized server clusters. The major goal of our approach is to reduce power consumption in the cluster while meeting performance requirements. The contributions of this paper are: (1) a simple but effective way of modeling power consumption and capacity of servers even under heterogeneous and changing workloads, and (2) an optimization strategy based on a mixed integer programming model for achieving improvements on power-efficiency while providing performance guarantees in the virtualized cluster. In the optimization model, we address application workload balancing and the often ignored switching costs due to frequent and undesirable turning servers on/off and VM relocations. We show the effectiveness of the approach applied to a server cluster testbed. Our experiments show that our approach conserves about 50% of the energy required by a system designed for the peak workload scenario, with little impact on the applications’ performance goals. Also, by using prediction in our optimization strategy, further QoS improvement was achieved.

I. INTRODUCTION

An increasing number of server clusters are being deployed in data centers in order to support many network services in a seamless, transparent fashion. These server architectures are becoming common in utility/cloud computing platforms [1], [2], such as Amazon EC2 and Google AppEngine. These platforms normally have high processing and performance demands, incurring huge energy costs and indirectly harming the environment by increasing CO2 generation [3].

Many of today’s server clusters rely on virtualization techniques to run different VMs (Virtual Machines) on a single physical server, allowing the hosting of multiple independent services. Server virtualization has been widely adopted in data centers around the world for improving resource usage efficiency and service isolation. In fact, several VM technologies have been developed to support server virtualization (e.g., Xen, VMware) [4], [5].

From a power and performance management point of view, the adoption of virtualization technologies has turned power-aware optimization on clusters into a new and challenging research topic. Since virtualization provides a means for server consolidation through on-demand allocation and live migration, it helps to increase server utilization, reducing the long-term use of computer resources and their associated power demands. Specifically, the ability to dynamically distribute server workloads in a virtualized server environment allows for turning off physical machines during periods of low activity, and bringing them back up when the demand increases. Moreover, this on/off mechanism can be combined with DVFS (Dynamic Voltage and Frequency Scaling) — a technique that consists of varying the frequency and voltage of the microprocessor at runtime according to processing needs — to provide even better power and performance optimizations.

However, virtual server consolidation must be carefully employed, considering individual service performance levels, in order to guarantee QoS (Quality of Service) related to their corresponding SLAs (Service Level Agreements). This is not a simple task, since large clusters include many heterogeneous machines and each machine can have different capacity and power consumption according to the number of CPU cores, their frequencies, their specific devices, and so forth. Additionally, the incoming workload of a server can change significantly over time. Recent studies show that servers in data centers are loaded between 10 and 50 percent of peak, with a CPU utilization that rarely surpasses 40 percent [6]. Our solution to this problem is to efficiently manage the cluster resources, leveraging server virtualization and CPU DVFS techniques.

Considering the issues described above, this paper proposes a simple but effective way of modeling power consumption and capacity of servers even under very heterogeneous and changing workloads. Leveraging the power and capacity models, we devise an optimization strategy to determine the most power-efficient configuration (i.e., which servers must be active and their respective CPU frequencies) that can handle time-varying workloads for different applications in the cluster. Furthermore, the optimization determines an optimized load distribution for the applications, which is a non-trivial task considering that an application workload can be distributed among multiple heterogeneous servers in a cluster.

The underlying mathematical formulation for optimizing power and performance in a virtualized cluster is given by a mixed integer programming (MIP) model. A preliminary version of this MIP model was first introduced and evaluated
by simulations in a previous work [7]. In this work, weprovide a set of experiments in a real cluster tested to applyour optimization model and control strategy, where the MIPformulation was also tailored to the power and capacity modelsproposed in this work.

The optimization problem is solved periodically and the solution is used to configure the cluster. That is, to scale the solution over time, considering that the applications have individual time-varying workloads, our optimization strategy enables the virtualized server system to react to load variations and adapt its configuration accordingly. The proposed optimization strategy also accounts for the overhead due to switching servers on/off and disruptive migration of virtual servers.

The remainder of the paper proceeds as follows. The proposed power and performance management approach is presented in Section II. The system model and server architecture testbed used to apply our optimization proposal are described in Section III. In Section IV, we evaluate our approach through experiments driven by actual workload traces. Section V summarizes related work and Section VI concludes the paper.

II. OPTIMIZED MANAGEMENT APPROACH

Our approach consists of two parts. First, we provide a simple and effective way of modeling the performance and power consumption of the servers in the cluster. Second, based on the power and performance models, we describe an optimization formulation implemented in a control loop strategy for power and performance management in a virtualized server cluster.

A. Server Cluster Modeling

1) Performance model: In order to model the performance of each real machine, several basic relationships are important. However, before discussing the required relationships, it is useful to define some terminology:

• M is the number of applications or services intended to run on the cluster.
• N is the number of physical servers in the cluster.
• $f_{ij}$ is a valid working frequency $j \in \{1 \ldots F_i\}$ for the CPU on a given server $i \in \{1 \ldots N\}$.
• $u_{ij}$ is the normalized utilization of the CPU at server-frequency $f_{ij}$.
• $r_{ij}(u_{ij})$ is the maximum number of requests of a certain type that a server i can attend per unit of time when running at frequency j and its CPU utilization is $u_{ij}$.
• $R_{ij}$ is the maximum $r_{ij}$ (i.e., when $u_{ij} = 100\%$).

Figure 1 shows the linear relationship existing between the number of completed requests per second and the normalized CPU utilization for five different machines in our server cluster (described in more detail in Section III). The web requests used in the experiments are CPU-bound and consume a fixed amount of CPU time, and the frequency of the CPU was kept constant. Specifically, we measured the performance of the servers, for each frequency, in terms of the number of requests per second (req/s) that they can handle at a given target CPU utilization. To generate the benchmark workload, we used the httperf tool. Note that the case of web servers is similar to any other CPU-intensive services in a cluster.

Fig. 1. Server performance at maximum CPU frequency as a function of the normalized CPU utilization

Fig. 2. Instance of server power consumption as a function of the server performance, varying its CPU frequency

We consider the CPU resource as the bottleneck in our system, because much of the data required by the web services is already cached in main memory after the system has been running. Since the performance of a server is proportional to its CPU utilization at constant frequency, the server performance can be modeled as:

$r_{ij}(u_{ij}) = R_{ij} \cdot u_{ij}$

The performance of a server is also proportional to its CPU frequency. Figure 2 shows the relationship existing between these two variables. Since this relationship is also linear, the server performance can be expressed as:

$r_{ij}(u_{ij}) = r_{iF_i}(u_{ij}) \cdot \dfrac{f_{ij}}{f_{iF_i}}$

Based on the two previous relationships, we can conclude that the performance of a machine can be modeled as:

$r_{ij}(u_{ij}) = R_{iF_i} \cdot u_{ij} \cdot \dfrac{f_{ij}}{f_{iF_i}}$

where $R_{iF_i}$ and $f_{iF_i}$ are the only constants defined for each server. Thus, we define the capacity of a server-frequency by

$c_{ij} = R_{iF_i} \cdot (f_{ij}/f_{iF_i})$
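As a concrete illustration of the performance and capacity model, the following minimal sketch (in Python, the language of our controller prototype) computes $c_{ij}$ and $r_{ij}(u_{ij})$ from the two per-server calibration constants $R_{iF_i}$ and $f_{iF_i}$; the numeric values are illustrative only, not measurements from our testbed.

# Minimal sketch of the performance/capacity model (illustrative values only).
# R_max stands for R_iFi (req/s at 100% CPU and maximum frequency);
# f_max stands for f_iFi (the maximum CPU frequency of the server, in GHz).

def capacity(R_max, f_max, f):
    """Capacity of a server-frequency: c_ij = R_iFi * (f_ij / f_iFi)."""
    return R_max * (f / f_max)

def requests_per_second(R_max, f_max, f, u):
    """Performance model: r_ij(u_ij) = R_iFi * u_ij * (f_ij / f_iFi)."""
    return capacity(R_max, f_max, f) * u

# Hypothetical server that serves 150 req/s at 100% CPU and 2.67 GHz:
print(requests_per_second(R_max=150.0, f_max=2.67, f=1.6, u=0.8))  # about 71.9 req/s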


Fig. 3. Server power consumption as a function of the server performance, keeping the CPU frequency constant

In addition, the normalized utilizations of the CPU at different frequencies when the workload is kept constant can be related by $u_{ik} = u_{ij} \cdot \dfrac{f_{ij}}{f_{ik}}$, where k and j are valid CPU frequency levels in a given server i.

Since we consider a shared server cluster with multiple distinct applications, it is quite impractical to assume a priori any information about the incoming workload for those applications, because they all have different types of requests with diverse processing requirements. Thus, we define and measure the workload demand of an application by considering the utilization of the bottleneck resource in the system, which is the CPU in our case. More formally, for each application $k \in \{1 \ldots M\}$, we define

$d_k = \sum_{i=1}^{N} u'_{ik} \cdot R_{iF_i}$

to represent the workload of application k basically in terms of the sum of the CPU usage, given by the variable $u'_{ik}$, which is monitored from the running system, for each server i allocated to that application k.
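For illustration, the sketch below aggregates monitored per-server CPU usage into the demand figure $d_k$; the server names, $R_{iF_i}$ values and utilizations are hypothetical.

# Sketch of the workload demand d_k = sum_i u'_ik * R_iFi (hypothetical monitoring data).
R_max = {"edison": 150.0, "farad": 90.0}                     # R_iFi per server (req/s)
monitored_util = {"app1": {"edison": 0.35, "farad": 0.10}}   # u'_ik reported by the monitor

def workload_demand(app):
    return sum(u * R_max[server] for server, u in monitored_util[app].items())

print(workload_demand("app1"))  # 0.35*150 + 0.10*90 = 61.5 req/s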

2) Power Consumption Model: To model the power consumption of each real machine, besides the previous terminology, some additional terms are required:

• $p_{ij}(u_{ij})$ is the average power consumption of a server-frequency $f_{ij}$ when its CPU utilization is $u_{ij}$.
• $PM_{ij}$ is the maximum (busy) power consumption of a server-frequency, when $u_{ij} = 100\%$ at $f_{ij}$.
• $Pm_{ij}$ is the minimum (idle) power consumption of a server-frequency, when idle at $f_{ij}$.

Figure 3 shows the measured relationship existing between the power consumed by a given server and its performance at 2.57 GHz. We obtained a similar relationship for the other available frequencies. Since the relationship between these two variables is linear when the CPU frequency is kept constant, the power consumed by a server can be modeled as:

$p_{ij}(u_{ij}) = Pm_{ij} + (PM_{ij} - Pm_{ij}) \cdot u_{ij}$

Fig. 4. Server power consumption as a function of the CPU frequency at 100% utilization

To obtain the several $PM_{ij}$ and $Pm_{ij}$ values, we can use the relationship displayed in Figure 4. It shows that the power consumed by a server is proportional (in a quadratic form) to its CPU frequency. Thus, the server power consumption can be modeled as:

$PM_{ij} = PM_{i1} + K_M \cdot (f_{ij} - f_{i1})^2$

$Pm_{ij} = Pm_{i1} + K_m \cdot (f_{ij} - f_{i1})^2$

where $K_M = (PM_{iF_i} - PM_{i1})/(f_{iF_i} - f_{i1})^2$ and $K_m = (Pm_{iF_i} - Pm_{i1})/(f_{iF_i} - f_{i1})^2$. In other words, the power consumption of a server basically depends on its working frequency ($f_{ij}$), its normalized CPU utilization ($u_{ij}$), and the constants $PM_{i1}$, $Pm_{i1}$, $PM_{iF_i}$ and $Pm_{iF_i}$.
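The model above needs only four power measurements per server; one way this interpolation could be coded is sketched below, with made-up wattages for a hypothetical server.

# Sketch of the power model built from four measurements per server (illustrative values):
# Pm1/PmF are the idle power at the lowest/highest frequency; PM1/PMF the busy power.
def power_model(f1, fF, Pm1, PmF, PM1, PMF):
    Km = (PmF - Pm1) / (fF - f1) ** 2      # quadratic coefficient for idle power
    KM = (PMF - PM1) / (fF - f1) ** 2      # quadratic coefficient for busy power
    def power(f, u):
        Pm = Pm1 + Km * (f - f1) ** 2      # Pm_ij at frequency f
        PM = PM1 + KM * (f - f1) ** 2      # PM_ij at frequency f
        return Pm + (PM - Pm) * u          # p_ij(u_ij)
    return power

p = power_model(f1=1.6, fF=2.67, Pm1=60.0, PmF=75.0, PM1=90.0, PMF=130.0)
print(p(2.0, 0.5))  # predicted whole-machine power draw (W) at 2.0 GHz and 50% utilization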

Although related works, such as [8], consider a cubic relationship between power consumption and the frequency at which a server is running, especially for the processor, we assume a quadratic relationship between these two variables. Because our main goal is to predict power consumption values based on only four power measurements (i.e., $Pm_{i1}$, $Pm_{iF_i}$, $PM_{i1}$, and $PM_{iF_i}$ for a given server i), we propose the previous power equations as a more straightforward alternative to curve-fitting processes. Using this simplified power model, we observed that the lowest mean squared errors are obtained with the quadratic relationship instead of the cubic one. As we show in the sequel, the normalized root mean squared errors for the power vs. frequency relationship are always less than 2%. When we used the cubic relationship, the normalized root mean squared errors were as large as 5%.

3) Model Validation: To validate our analytical model, we have measured the performance and power consumption of five different machines in our cluster, varying their CPU frequency and CPU utilization. The machines used in our experiments, termed Edison (Intel Core i7), Farad (Intel Core 2 Duo), Galvani (Intel Core i5), Gauss (AMD Athlon 64) and Tesla (AMD Phenom X4), are based on very different technologies and cover a wide range of working frequencies; their main characteristics are described in Section III. In our case, recording the average values for power and performance for a given load over a period of 2 minutes was sufficient to obtain a good average and low standard deviation. We measured the average power consumption and performance for different levels of load, in 10% increments, for each server at each available CPU frequency. We measured AC power directly using the WattsUp Pro power meter with 1% accuracy [9]. The power measurements thus represent the whole machines, not only their CPUs.

TABLE I. NORMALIZED ROOT MEAN SQUARED ERRORS

Machine   r vs u   r vs f   p vs u   p vs f
Farad      1.5%     0.8%     0.4%     0.0%
Galvani    1.8%     0.8%     1.5%     1.7%
Edison     1.5%     0.8%     0.3%     1.2%
Tesla      1.7%     1.7%     0.2%     1.2%
Gauss      2.5%     1.3%     0.8%     1.4%

Table I summarizes our measurements. Comparing the measured values of performance with our model, the maximum normalized root mean squared error for the performance–CPU utilization relationship (r vs u) is 2.5%. For the performance–CPU frequency relationship (r vs f), the normalized root mean squared error is less than 1.7% in all cases. Similarly, the comparison of the measured values of power with our model shows that the power–CPU utilization relationship (p vs u) has a maximum normalized root mean squared error of 1.5%. The normalized root mean squared error for the power–CPU frequency relationship (p vs f) is always less than 1.7%.

Although the normalized root mean squared errors are quite small, it is also important to mention that the maximum absolute error is always less than 8.3% (Tesla, in the performance–CPU utilization relationship), which validates our analytical model with very good accuracy. Given these accurate power and performance relationships, we simplify the extensive and time-consuming task of power and performance benchmarking, which commonly leads to high customization and setup costs for achieving power-aware optimizations in server clusters [10], [11].

B. Optimization Model and Control Strategy

The cluster optimization problem we consider is to determine the most power-efficient configuration (i.e., which servers must be active and their respective CPU frequencies) that can handle a certain set of application workloads. The underlying mathematical formulation for minimizing the power consumed while meeting performance requirements in the virtualized cluster problem is given by a MIP (Mixed Integer Programming) model.

1) Optimization Model: The information about the power consumption and capacity of servers can then be used by our optimization model to make runtime decisions. In our virtualized environment, the applications are implemented as VMs, which are assigned to physical servers. In our optimization model, several different VMs can be mapped (consolidated) onto a single server. This way, our strategy for power-aware optimization in a cluster of virtualized servers is to select the physical servers where the VMs will be running, and the frequencies of each real machine, according to the CPU utilization that every VM associated with an application requires. Our optimization model also allows an application to be implemented using multiple VMs, which are mapped and distributed to different physical servers. This is useful for load balancing purposes, that is, when an application workload demands more capacity than that supported by a single physical server.

In addition to the terminology introduced previously, the following decision variables are defined: $x_{ijk} \in [0, 1]$ denotes the fraction of the workload of application k that is assigned to server i running at frequency j ($x_{ijk} = 0$ when no workload of k is assigned to that server-frequency); $y_{ij}$ is a binary variable that denotes whether server i is active at frequency j ($y_{ij} = 1$), or not ($y_{ij} = 0$). The $u_{ij}$ variable denotes the utilization of server i running at frequency j.

The problem formulation for the cluster optimization is thus given by the following MIP model:

Minimize

$\sum_{i=1}^{N} \sum_{j=1}^{F_i} [PM_{ij} \cdot u_{ij} + Pm_{ij} \cdot (y_{ij} - u_{ij})] + \mathrm{swt\_cost}(U, y) + \mathrm{mig\_cost}(A, z)$   (1)

Subject to

$\sum_{k=1}^{M} d_k \cdot x_{ijk} \le c_{ij} \cdot u_{ij} \quad \forall i \in \{1 \ldots N\}, \; \forall j \in \{1 \ldots F_i\}$   (2)

$\sum_{i=1}^{N} \sum_{j=1}^{F_i} x_{ijk} = 1 \quad \forall k \in \{1 \ldots M\}$   (3)

$\sum_{j=1}^{F_i} y_{ij} \le 1 \quad \forall i \in \{1 \ldots N\}$   (4)

$u_{ij} \le y_{ij} \quad \forall i \in \{1 \ldots N\}, \; \forall j \in \{1 \ldots F_i\}$   (5)

$x_{ijk} \in [0, 1], \quad y_{ij} \in \{0, 1\}, \quad u_{ij} \in [0, 1]$   (6)

The objective function given by Equation (1) is to find a cluster configuration that minimizes the overall server cluster cost in terms of power consumption. The power consumption for the servers is given by the analytical model from Section II-A. The objective function also has two terms to account for server switching and VM relocation costs. To model a server switching cost, we have included in the model a new input parameter $U_{ij} \in \{0, 1\}$ to denote the previous cluster usage in terms of which machines are turned on and off; that is, $U_{ij} = 1$ if machine i is running at speed j. Similarly, the new input parameter $A_{ik} \in \{0, 1\}$ denotes which application was previously associated with which server. More precisely, we define the switching cost function as follows: $\mathrm{swt\_cost}(U, y) = \sum_{i=1}^{N} \sum_{j=1}^{F_i} [SWT\_P \cdot (y_{ij} \cdot (1 - U_{ij}) + U_{ij} \cdot (1 - y_{ij}))]$. The constant $SWT\_P$ represents a penalty for turning a machine off (if it was on) and for turning a machine on (if it was off), which can mean additional power consumed to boot the respective server machine. We do not consider the cost of changing frequencies, since the overhead incurred is quite low [10]. Actually, if $U_{ij} = 1$ for any frequency j of a given server i, we set $U_{ij} = 1$ for all $j \in \{1 \ldots F_i\}$ to avoid taking frequency switching costs into account.

To facilitate modeling the VM relocation cost, we define a new decision variable $z_{ik} \in \{0, 1\}$ to denote whether application k is to be allocated on server i. We also include in the model a new set of constraints, defined as follows: $x_{ijk} \le z_{ik} \quad \forall i \in \{1 \ldots N\}, \forall j \in \{1 \ldots F_i\}, \forall k \in \{1 \ldots M\}$. The relocation cost function can be defined similarly, based on the previous allocation input variable A and the new allocation decision z, as $\mathrm{mig\_cost}(A, z) = \sum_{i=1}^{N} \sum_{k=1}^{M} [MIG\_P \cdot (z_{ik} \cdot (1 - A_{ik}) + A_{ik} \cdot (1 - z_{ik}))]$. We assume that both server switching on/off and relocation penalties can be estimated in a real server cluster. In our case, these costs were basically defined by the average power consumption of the servers in the cluster.

In the optimization model, the constraints (2) prevent a possible solution in which the demand of all applications $k \in \{1 \ldots M\}$ running on a server i at frequency j exceeds the capacity of that server, given by our analytical model (Section II-A). The constraints (3) guarantee that the entire workload of each application k is assigned to the servers in the cluster. The constraints (4) are defined so that only one frequency j on a given server i can be chosen. The constraints (5) are used to relate the decision variable $y_{ij}$ to the $u_{ij}$ variable in the objective function. The solution is thus given by the decision variable $x_{ijk}$, where i is a server reference, j is the server's status (i.e., its operating frequency, or inactive status when j = 0), and k represents the respective allocated application. The expressions in (6) define the variables of the problem.

For example, a MIP solution that returns $x_{1,2,3} = 0.3$ and $x_{2,7,3} = 0.7$ means that 30% of the workload of application 3 is executed on server 1 running at CPU frequency 2, and 70% of that workload is executed on server 2 running at frequency 7. The optimized solution also provides useful information to implement a load balancing scheme, such as a weighted round-robin policy, for distributing the application workloads among the servers in the cluster.
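To make the formulation more tangible, the sketch below encodes the objective of Equation (1) without the switching and migration cost terms, together with constraints (2)-(5), using the open-source PuLP modeling library instead of the CPLEX interface we actually used; all input data are placeholders for a toy instance with two servers and two applications.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum

N, M = 2, 2                                   # servers and applications (toy instance)
F = {0: [1.6, 2.0], 1: [2.67]}                # available frequencies per server (GHz)
PM = {(0, 0): 95, (0, 1): 110, (1, 0): 130}   # busy power per (server, frequency index)
Pm = {(0, 0): 60, (0, 1): 65, (1, 0): 75}     # idle power per (server, frequency index)
c  = {(0, 0): 80, (0, 1): 100, (1, 0): 150}   # capacities c_ij (req/s)
d  = {0: 60.0, 1: 40.0}                       # application demands d_k (req/s)

sf = list(PM.keys())                          # valid (server, frequency) pairs
prob = LpProblem("cluster_power", LpMinimize)
x = {(i, j, k): LpVariable(f"x_{i}_{j}_{k}", 0, 1) for (i, j) in sf for k in range(M)}
y = {ij: LpVariable(f"y_{ij[0]}_{ij[1]}", cat="Binary") for ij in sf}
u = {ij: LpVariable(f"u_{ij[0]}_{ij[1]}", 0, 1) for ij in sf}

# Objective (1), without the swt_cost and mig_cost terms
prob += lpSum(PM[ij] * u[ij] + Pm[ij] * (y[ij] - u[ij]) for ij in sf)
# (2) the demand allocated to a server-frequency must fit its capacity
for (i, j) in sf:
    prob += lpSum(d[k] * x[i, j, k] for k in range(M)) <= c[i, j] * u[i, j]
# (3) the whole workload of each application must be assigned
for k in range(M):
    prob += lpSum(x[i, j, k] for (i, j) in sf) == 1
# (4) at most one active frequency per server; (5) utilization only on active server-frequencies
for i in range(N):
    prob += lpSum(y[i, j] for j in range(len(F[i]))) <= 1
for ij in sf:
    prob += u[ij] <= y[ij]

prob.solve()
print({k: v.varValue for k, v in x.items() if v.varValue})

The non-zero $x_{ijk}$ values returned by the solver can be fed directly to the weighted round-robin balancer as allocation shares.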

2) Optimization control loop: The dynamic optimization behavior is attained by periodically monitoring the proper inputs of the optimization model and solving for a new optimal configuration given the updated values of the inputs. In other words, as some of the input variables may change over time, such as the application workload demands, a new instance of the optimization problem is constructed and solved dynamically at run-time.

Specifically, for the optimization control strategy to work, as shown in Figure 5, a Monitor module is employed to collect run-time data of the server cluster system, such as service arrival rate and resource utilization. Next, the monitored values are evaluated by the Predictor module to filter and estimate future values from past observed values for the workload time series. Several prediction techniques can be adopted, such as well-known linear regression, exponential smoothing, or autoregressive moving average (ARMA) models [12]. For example, it is important to prevent overloaded servers in situations where the load increases faster than the time required to turn servers on. Thus, besides accurate power and performance modeling, our optimization approach requires load prediction capabilities in order to further reduce those on/off disruptions and anticipate fast load increments (see Section III-D).

Fig. 5. Optimization control loop

The MIP model is then updated using the observed and predicted values for the workload of the applications, and a new instance of the optimization problem is constructed. The Optimizer module implements an optimization algorithm and solves the new optimization problem instance, yielding a new optimal configuration. The Control module is responsible for applying the changes in the server cluster, transitioning it to a new state given by the new optimized configuration.

III. CLUSTER TESTBED

In our virtualized cluster environment, targeted at hosting multiple independent web applications, we need to maintain two correlated objectives. First, we aim to guarantee quality-of-service requirements for the applications, for example, by controlling the average cluster load. Second, to reduce costs, the set of currently active servers and their respective processor speeds should be dynamically configured to minimize the power consumption.

A. Server architecture

The target testbed architecture (shown in Figure 6) consists of a cluster of servers running CentOS Linux 5.5. The cluster presents a single view to the clients through the front-end machine, which distributes incoming requests among the actual servers that process the requests (also known as workers). The worker servers run the Xen 3.1 hypervisor, enabled to support the execution of virtual machines and capable of CPU DVFS. Our testbed includes a cluster of seven physical machines, whose characteristics and configurations are detailed in Table II. The column "Off power (W)" indicates the power dissipated by the servers when they are switched off.

The web requests from the clients are redirected, based on a load balancing scheme, to the VMs allocated to the applications, which run Apache web servers on top of the physical server machines. Each VM has a copy of a simple CPU-bound PHP script to characterize a web application. We define three different applications in the cluster, named App1, App2 and App3. To generate the workload for each application, we developed a workload generator tool to simulate realistic workload patterns (like WorldCup'98), which is described in Section III-C.

The load generator machine is physically connected to the front-end machine via a gigabit Ethernet switch. The worker machines and the front-end machine are connected via another gigabit switch. The front-end machine has two gigabit network interfaces, each one connected to one of the switches. All worker machines share an NFS (Network File System) storage mounted on the front-end to store the VM images and configuration files.

Fig. 6. Testbed server architecture

Notice that the boot time, when a server is switched off by a halt or shutdown command, is on average 1.5 minutes. In future experiments, we believe that this time could be reduced by using suspend/hibernate mechanisms on the servers. However, the suspend/hibernate mechanism is not yet supported by the Xen-enabled kernel on our servers.

The front-end machine is the main component in the architecture, including three entities: (a) VM Manager, (b) Load Balancer, and (c) Controller. The VM Manager is implemented using the OpenNebula toolkit [13], which enables the management of the VMs in the cluster, such as deployment and monitoring. Since we are running virtualized web servers in our cluster, the Load Balancer implements a weighted round-robin scheduler strategy provided by Apache's mod_proxy_balancer module [14]. Finally, the Controller implements the control loop strategy described in Section II-B2, which consists of an external module written in Python that relies on the primitives provided by the VM Manager and Load Balancer modules. The goal of the Controller is to dynamically configure the applications over the cluster's processors, in order to reduce power consumption while meeting the applications' performance requirements. The Controller: (a) monitors the applications' load, (b) decides about a possible relocation in the cluster to attend the demand, based on the proposed optimization model, and (c) applies the new configuration by executing the necessary primitives (see details of the optimization strategy in Section II-B).

For the integration of the Controller with the VM Manager module, we used the XML-RPC interface provided by OpenNebula. To dynamically manage the Load Balancer module, we implemented and installed in the front-end machine a new Apache module written in C, called mod_frontend, which relies on the Apache proxy load balancer module functionalities. This Apache module exposes a generic interface (through the XML-RPC protocol) to enable (and disable) a worker server, to assign weights to the servers in the load balancing scheme, and to monitor the cluster QoS properties, such as arrival rate, throughput and response time of web request execution. To manage the cluster machines, we issue commands remotely via SSH, for example, to turn servers off and to adjust server frequencies. To resume a machine from the shutdown state, the servers support the Wake-on-LAN mechanism, which allows a machine to be turned on remotely by a network message.
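The management primitives described above amount to a handful of remote commands; a plausible sketch is shown below, where the host names, MAC address and frequency string are placeholders, and the wakeonlan and cpufrequtils (cpufreq-set) tools are assumed to be installed on the machines.

import subprocess

def power_off(host):
    # Halt a worker remotely over SSH (the NIC stays powered for Wake-on-LAN).
    subprocess.check_call(["ssh", host, "shutdown -h now"])

def power_on(mac_address):
    # Send a Wake-on-LAN magic packet to resume a halted worker.
    subprocess.check_call(["wakeonlan", mac_address])

def set_frequency(host, freq):
    # Adjust the worker's CPU frequency via DVFS (e.g., freq = "2.67GHz").
    subprocess.check_call(["ssh", host, f"cpufreq-set -c 0 -f {freq}"])

set_frequency("edison", "2.67GHz")   # placeholder host and frequency
power_off("tesla")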

B. Cluster QoS measurement and control

In this work, the cluster system load is measured by the sum of the CPU usage of the virtual machines (running the applications) in the cluster. Because we are interested in the macro behavior of our optimization, the web applications are simplified in that we assume that there is no state information to be maintained for PHP requests. As for performance optimization guarantees, while still meeting the temporal QoS requirements of the requests in the cluster, we provide tight control on the system load, that is, the CPU utilization in the whole cluster. With such a control strategy, we show that we can also satisfy the user-perceived experience, commonly measured by the response time of the requests in the cluster.

We observed that when the CPU utilization of a VM (running a web server application) is low, the average response time is also low. This is expected, since no time is spent queuing due to the presence of many concurrent requests. On the other hand, when the utilization is high, the response time goes up abruptly as the CPU utilization gets close to 100%. To meet response time requirements, we shall perform dynamic optimizations in the cluster before a machine is close to saturation, for example, above a target of 80% of CPU utilization, which gives us a simple but effective measure to keep the response time under control. Moreover, since we measure the bottleneck resource (CPU) as a "black box", this control scheme can work for different types of HTTP requests with distinct average execution times, or even for other kinds of applications in the cluster.

The response time is defined as the difference between the time a response is generated and the moment the server accepted the request. To obtain the response time for the web applications, we have implemented a new Apache module (as mentioned in Section III-A) that collects such timing information using pre-defined hooks provided by the Apache Module API [14]. The hooks used to measure the response time are post_read_request and log_transaction. The post_read_request function allows our module to store the moment a request was accepted by Apache, and the log_transaction function allows it to store the moment a response was sent to the client.

Based on these hooks, we can also measure the workload arrival rate by accumulating the number of requests over a given interval in the post_read_request phase. Similarly, we can measure the throughput in the log_transaction phase.


TABLE II. SPECIFICATION OF THE MACHINES USED IN OUR CLUSTER

Machine  Role            Processor model     Cores  CPU frequency  RAM  Frequency steps  Boot time (s)  Off power (W)
Xingo    Load generator  Intel Core i7       4      2.93GHz        8GB  —                —              —
Henry    Front-end       AMD Athlon 64       1      2.2GHz         4GB  —                —              —
Edison   Worker          Intel Core i7       4      2.67GHz        8GB  9                103            6
Farad    Worker          Intel Core 2 Duo    2      3.0GHz         6GB  2                96             8
Galvani  Worker          Intel Core i5       4      2.67GHz        8GB  12               97             7
Gauss    Worker          AMD Phenom X4 II    4      3.2GHz         8GB  4                89             5
Tesla    Worker          AMD Athlon 64 X2    2      3.0GHz         6GB  8                88             6

To smooth out high short-term fluctuations in these measurement readings, we have implemented a filter procedure in our Apache module based on a single exponential moving average. In the filter module, we have used α = 0.6 as the default smoothing factor. Based on our experiments, this value was found suitable.
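For reference, the single exponential moving average used by the filter amounts to one line of state update; a sketch with the α = 0.6 default follows (the sample values are hypothetical).

class EMAFilter:
    # Single exponential moving average used to smooth QoS measurements.
    def __init__(self, alpha=0.6):        # alpha = 0.6 is our default smoothing factor
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample           # initialize with the first sample
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value

f = EMAFilter()
for arrival_rate in [100, 140, 90, 95]:   # hypothetical req/s samples
    print(f.update(arrival_rate))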

We mainly define the QoS for the applications in the server cluster by means of the maximum allowed response time for the respective requests, that is, the application deadline. In the optimization approach, we should satisfy this QoS metric by managing two variables. First, the cluster utilization must be below a reference value, in our case target_util = 80%. Second, in case the current response time is above the deadline response time, we determine a tardiness value as the ratio of the current response time to the deadline response time, which denotes "how far" the average application requests are from their respective deadline.

More precisely, to guarantee that the response time restriction is met, we regulate the application workload demand vector by multiplying it by the tardiness value to achieve the desired response time requirement. In like manner, to maintain the cluster utilization around its reference value, we normalize the application workload demand vector by target_util to achieve the desired cluster utilization bound. Typically, these reference values for utilization and request deadlines depend on workload demand characteristics and performance goals for a target cluster system.

That way, the target workload demand for a given application k, which was previously defined in Section II-A1, can be rewritten as $d'_k = (d_k / target\_util) \cdot tardiness \cdot \gamma$. A reinforcement constant γ can be used to give a smoothing or boosting effect, allowing the cluster to be more or less responsive to changes with respect to the response time tardiness metric. In our case, we assume γ = 1.0, but other values need to be investigated in future experiments. The modified workload demand thus aims to drive our controller to dynamically configure the cluster to meet the QoS goal accordingly.
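A direct transcription of this adjustment, with our reference values (target_util = 0.8, γ = 1.0), a hypothetical measured demand and deadline, and the tardiness clamped at 1 when the deadline is met (our reading of the rule above), could look like:

def adjusted_demand(d_k, response_time, deadline, target_util=0.8, gamma=1.0):
    # d'_k = (d_k / target_util) * tardiness * gamma
    tardiness = max(1.0, response_time / deadline)   # only boosts demand when the deadline is missed
    return (d_k / target_util) * tardiness * gamma

print(adjusted_demand(d_k=61.5, response_time=600.0, deadline=500.0))   # 61.5/0.8 * 1.2 = 92.25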

C. Application workloads

We generated three distinct workload traces using the 1998 World Cup Web logs to characterize the multiple applications in the cluster [15]. We calculated the number of requests over a fixed and customized sampling interval for different parts of the World Cup logs over time. We also adapted the original workload data to fit our cluster capacity, which is measured in requests per second, for a fixed type of request. As shown in Figure 7, the applications within the cluster can have a wide range of request demands. Each application in the cluster is assigned a maximum allowed response time specified at 500 ms. In a real system, the deadline response times are given based on particular requirements of the applications and can assume different target values, which is allowed by our approach.

Fig. 7. Workload traces for three different applications (App1, App2 and App3; load in req/s over time in seconds)

Each workload demand point in Figure 7 is an average over 36 seconds, describing an experiment execution of 2 hours in duration. In our experiments, we assume that the App1, App2, and App3 services can be distributed and balanced among all servers in the cluster. To generate the workloads, we implemented a closed-loop workload generator written in C that sends web requests to the applications. The idea employed was to dynamically change the "think time" of a given set of running web sessions (implemented using P-threads) within the system every 36 seconds, which is the time granularity of our workload traces. Specifically, the workload generator interprets each data point in Figure 7 as the target load to generate, deriving a new adjustment of the "think time" for the running web sessions during each 36-second interval. Clearly, the "think time" is inversely proportional to the workload intensity; that is, the smaller the value of the "think time", the greater the workload intensity.
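The generator itself is written in C; the sketch below shows, in Python for brevity, one plausible way to derive the per-interval think time, assuming the standard closed-loop approximation that throughput is roughly sessions / (think time + response time), with a hypothetical session count and service time.

def think_time(target_req_s, n_sessions, avg_response_s=0.05):
    # Closed-loop approximation: target_req_s ~= n_sessions / (think_time + avg_response_s)
    return max(0.0, n_sessions / target_req_s - avg_response_s)

for target in [50, 100, 150]:            # target load points taken from the trace (req/s)
    print(target, think_time(target, n_sessions=100))   # smaller think time => higher intensity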

D. Load prediction

Previous works show that CPU server utilization [12] and Web server traffic [16] can be modeled through simple linear models. Dinda and O'Hallaron [12] conclude that the host load is consistently predictable by practical models such as AR, reaching a very good prediction confidence for up to 30 seconds into the future. On the other hand, Baryshnikov et al. [16] show that even bursts of traffic can be predicted far enough in advance using a simple linear-fit extrapolation. Based on that, our predictor module implements a linear regression based on k past values to predict the load of the applications in the cluster.

The load prediction in our case should not reach too far into the future. To anticipate fast load increments, we must take into account the time for turning servers on. In order to avoid on/off disruptions, we must consider the break-even threshold of the servers, that is, the time required by an unloaded server to consume as much energy as is required for turning a server off and on immediately. In other words, our predictor needs to see as far into the future as the break-even threshold, which in our case is no more than 110 seconds, according to the maximum boot time for the machines in the cluster (see Table II).

Leveraging predictive capabilities, the optimization approach can cope with well-known patterns in measurement readings, such as trends, that indicate anticipatory conditions for triggering new optimized configurations. This plays an important role in improving the optimization decisions, guaranteeing a better quality of service for the applications in server systems [17], for example by turning on a new server in advance before resource saturation occurs. As will be shown in Section IV, our predictor, which uses a linear regression fit through the (k = 10) past workload measurements in the control loop, was enough to anticipate fast load increments, improving the QoS in the system. In the following experiments, since the controller sampling period is set to 25 seconds, our controller looks ahead 4 steps to help anticipate the time to boot a new server. We aim to evaluate the robustness and accuracy of this kind of prediction and other prediction methods in future work.
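A sketch of such a predictor, fitting a line through the last k = 10 demand samples and extrapolating 4 control periods ahead, is given below; numpy's polyfit stands in for whatever regression routine the controller actually uses, and the sample values are hypothetical.

import numpy as np

def predict_load(history, k=10, lookahead=4):
    recent = history[-k:]
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, 1)    # least-squares linear fit
    return max(0.0, slope * (len(recent) - 1 + lookahead) + intercept)

demand = [42, 45, 50, 48, 55, 60, 63, 70, 74, 80]  # hypothetical req/s samples, one per 25 s period
print(predict_load(demand))                        # extrapolated demand about 100 s ahead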

IV. EXPERIMENTAL EVALUATION

To evaluate our optimization approach, we have carried out a set of experiments in the cluster environment described in Section III. The proposed optimization formulation was implemented using the solver IBM ILOG CPLEX 11 [18], which employs very efficient algorithms based on the branch-and-cut exact method to search for optimal configuration solutions. The optimization problem worst-case execution time was less than 1 second, considering our cluster setup of 3 applications and 5 physical servers (Section III). Scalability issues concerning large-scale clusters using the adopted optimization model are discussed in the conclusion, Section VI.

In the experiments, we adopted an optimization control loop with a 25-second interval. This value was found suitable in our case, but it typically depends on the target system and workload demand characteristics, such as variance. In future work, we plan to investigate the use of two control loops with distinct intervals concerning different kinds of dynamic operations in the cluster, for example a shorter control interval for load balancing and DVFS adaptations and a longer interval for server on/off and VM migration/replication activities.

In this work, we mainly evaluated the effectiveness of our approach by means of the energy consumption reduction and QoS violations in the cluster as compared to the Linux ondemand and performance CPU governors. The performance governor keeps all servers turned on at full speed to handle peak load, and no dynamic optimization is conducted. The ondemand governor allows for managing the CPU frequency depending on system utilization, but does not include server on/off mechanisms.

The allocation scheme for the performance and ondemand governors is statically configured such that response time goals can be met under the worst-case (peak) workload arrival rate. That is, a static number of VMs were deployed and configured on each physical server for every application running in the cluster. The respective workload allocation shares for all the applications to be balanced among the physical servers in the cluster are statically defined as follows: 28% for Edison; 13% for Farad; 25% for Galvani; 26% for Gauss; and 8% for Tesla. That means we adopted a fixed weighted round-robin method for application workload balancing given the measured capacity of each server (cf. Section II-A1).

The experimental results for the optimization execution are given in Figure 8. The upper and middle plots show, respectively, the throughput and response time measured for each application in the cluster. The bottom plots show the cluster load as a function of the average CPU utilization of the currently active servers (left plot) and the cluster power consumption (right plot), measured using the WattsUp Pro [9] to sample the power data every second. The energy consumption of the cluster could thus be calculated by summing these power values over each one-second interval.

As shown in Table III, by using our approach, the energy consumption in the cluster is substantially reduced. The main reason for these large energy savings is the fact that the baseline (or idle) power consumption of current server machines is substantially high. This in turn makes server on/off mechanisms (used by our optimization) very power-efficient. The energy savings are reported as percentage reductions with respect to the performance policy. The QoS violations were calculated as the number of requests, across all applications, whose response time missed the deadline, divided by the total number of requests completed over the experiment. Clearly, there is a trade-off between QoS and energy minimization. Both the performance and ondemand policies produced zero QoS violations. On the other hand, the energy optimization achieved by our approach led to some QoS violations. By using a predictive optimization policy, about 7% more energy was consumed, although with the benefit of approximately 50% fewer QoS violations.

Based on these experiments, using realistic time-varying workload traces, we show that a cluster system managed using our approach is effective in reducing power consumption costs, when compared to built-in Linux power management policies, while still maintaining good QoS levels.



Fig. 8. Dynamic optimization execution with predictive capabilities

TABLE III. COMPARISON OF CLUSTER MANAGEMENT POLICIES

Policy                      Energy (Wh)   Savings   QoS violations
Performance                 281.45        —         0%
On-demand                   241.84        14.07%    0%
Optimization (reactive)     134.67        52.15%    9.72%
Optimization (predictive)   143.61        48.97%    4.79%

As for future experiments, we plan to compare our optimization with implementations of other power management policies found in the literature, like those described in [10], [19], which adopt heuristics to turn servers on/off in a predefined order based on a power-efficiency metric for all servers in the cluster. In that case, migration/replication schemes for the VMs and workload balancing schemes would need to be designed accordingly.

V. RELATED WORK

Optimization approaches based on the bin packing problem for configuring virtualized servers are described in the literature, such as [20], [21]. However, their models are not designed for power-aware optimization. In [22], the authors present a two-layer control architecture aimed at providing power-efficient real-time guarantees for virtualized computing environments. The work relies on a sound control-theoretic framework, but does not address dynamic virtual machine allocation or migration and machine on/off mechanisms in a server cluster context. A heuristic-based solution outline for the power-aware consolidation problem of virtualized clusters is presented in [23], but it does not provide a means to find solutions that are at least near-optimal.

The non-virtualized approach described in [24] determines predefined thresholds (given by simulation) on the CPU frequency values of the active servers that must be matched in order to change the cluster configuration to meet the performance requirements. However, their proposal does not provide an optimal solution as a combination of which servers should be active and their respective CPU frequencies. The problem of optimally allocating a power budget among servers in a cluster in order to minimize mean response time is described in [8]. In contrast to our approach, which is designed to minimize the cluster power consumption while meeting performance requirements, their problem poses a different optimization objective and does not address a virtualized environment.

A dynamic resource provisioning framework is developed in [25] based on lookahead control theory. Their approach does not consider DVFS, but the proposed optimization controller addresses interesting issues, such as machine switching costs (i.e., the overhead of turning servers on and off) and a predictive configuration model. Our approach also copes with switching costs, and it includes the ability to incorporate prediction techniques in the optimization strategy to improve the configuration decisions. A power-aware migration framework for virtualized HPC (high-performance computing) applications, which accounts for migration costs during virtual machine reconfigurations, is presented in [19]. Similarly to our approach, it relies on virtualization techniques used for dynamic consolidation, although the application domains are different and the algorithmic solution does not provide optimal cluster configurations.

Contrasting with [23], [25], [19], our approach takes advantage of dynamic voltage/frequency scaling (DVFS) mechanisms to optimize the servers' operating frequencies and reduce the overall energy consumption. An approach based on DVFS is presented in [26] for power optimization and end-to-end delay control in multi-tier web servers. Recent approaches, such as those presented in [10], [11], [27], [28], [29], [30], [31], also rely on DVFS techniques and include server on/off mechanisms for power optimization. However, these approaches are not designed (and not applicable) for virtualized server clusters. That is, they do not consider multiple application workloads in a shared cluster infrastructure.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we presented an approach for power optimization in virtualized server clusters, including power and performance models, an optimization MIP model, and a strategy for dynamic configuration. In the optimization model, we addressed application workload balancing and the often ignored switching costs due to frequent and undesirable server on/off transitions. The major goal of this work was to develop and demonstrate the feasibility of our optimization approach for power and performance management, while providing experiments with realistic Web traffic driven by time-varying demands and real system measurements.

Our experiments show that our strategy can achieve power savings of 52% compared to an uncontrolled system (used as a baseline), while simultaneously meeting the applications' performance goals as specified by their response time deadlines. Moreover, unlike our approach, many power management schemes, like [11] and [27], need extensive power and performance measurements, which would need to be repeated upon every new installation, server failure, upgrade or change. The power and performance models presented in this work can simplify those benchmarking activities.

We plan to develop additional experiments, using extra machines in a larger cluster setup, with state-aware applications considering VMs deployed in a separate database tier. To address this, we intend to use the RUBiS multi-tier benchmark [32], which models an auction site similar to eBay. Currently, our optimization models are solved using a black-box exact MIP solver (IBM ILOG CPLEX), without exploring any specific knowledge about the optimization problem. One drawback of solving MIP models to optimality is the high computation time for large instances of the problem (see scalability results in [7]). To achieve the necessary scalability for solving the optimization problem within a bounded processing time, we rely on features, such as a solution time limit and an optimality gap tolerance, provided by state-of-the-art solver packages. At present, we are working on heuristic techniques, such as those proposed in [33], integrated with our exact modeling approach for this particular problem, which is a variant of the NP-hard bin packing problem.

In large data centers, with thousands of machines, smaller sets (like hundreds) of machines can be allocated (dynamically) to support dedicated clusters for specific applications, which could be managed autonomously (as pointed out by [2]). Thus, we believe that our optimization approach can be successfully applied in this context.

REFERENCES

[1] B. Hayes, "Cloud computing," Commun. ACM, vol. 51, no. 7, pp. 9–11, 2008.
[2] K. Church et al., "On delivering embarrassingly distributed cloud services," in HotNets, 2008.
[3] McKinsey & Company, "Revolutionizing data center efficiency," http://uptimeinstitute.org, 2008.
[4] P. Barham et al., "Xen and the art of virtualization," in SOSP '03. ACM, 2003, pp. 164–177.
[5] E. L. Haletky, VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers.
[6] L. A. Barroso and U. Holzle, "The case for energy-proportional computing," Computer, vol. 40, no. 12, pp. 33–37, 2007.
[7] V. Petrucci et al., "A dynamic optimization model for power and performance management of virtualized clusters," in 1st Int'l Conf. on Energy-Efficient Computing and Networking (in cooperation with SIGCOMM). ACM, 2010.
[8] A. Gandhi et al., "Optimal power allocation in server farms," in SIGMETRICS '09. ACM, 2009.
[9] Electronic Educational Devices, "Watts Up PRO," http://www.wattsupmeters.com/, 2010.
[10] C. Rusu et al., "Energy-efficient real-time heterogeneous server clusters," in RTAS '06, 2006, pp. 418–428.
[11] T. Heath et al., "Energy conservation in heterogeneous server clusters," in PPoPP '05. ACM, 2005, pp. 186–195.
[12] P. A. Dinda and D. R. O'Hallaron, "Host load prediction using linear models," Cluster Computing, vol. 3, no. 4, pp. 265–280, 2000.
[13] OpenNebula, "The open source toolkit for cloud computing," http://opennebula.org/, 2010.
[14] The Apache Software Foundation, "Apache HTTP server version 2.2," http://httpd.apache.org/docs/2.2/, 2008.
[15] M. Arlitt and T. Jin, "Workload characterization of the 1998 World Cup web site," IEEE Network, Tech. Rep., 1999.
[16] Y. Baryshnikov et al., "Predictability of web-server traffic congestion," in WCW '05, Sep. 2005, pp. 97–103.
[17] C. Santana et al., "Load forecasting applied to soft real-time web clusters," in ACM SAC, Sierre, Switzerland, March 2010.
[18] ILOG, Inc., "CPLEX," 2009. [Online]. Available: http://www.ilog.com/products/cplex/
[19] A. Verma et al., "pMapper: power and migration cost aware application placement in virtualized systems," in Middleware '08, 2008, pp. 243–264.
[20] M. Bichler et al., "Capacity planning for virtualized servers," Workshop on Information Technologies and Systems (WITS), Milwaukee, Wisconsin, USA, 2006.
[21] G. Khanna et al., "Application performance management in virtualized server environments," 10th IEEE/IFIP Network Operations and Management Symposium, pp. 373–381, 2006.
[22] Y. Wang et al., "Power-efficient response time guarantees for virtualized enterprise servers," in RTSS '08, 2008, pp. 303–312.
[23] S. Srikantaiah et al., "Energy aware consolidation for cloud computing," in USENIX Workshop on Power Aware Computing and Systems, 2008.
[24] E. N. Elnozahy et al., "Energy-efficient server clusters," in Power-Aware Computer Systems, ser. Lecture Notes in Computer Science, 2003.
[25] D. Kusic et al., "Power and performance management of virtualized computing environments via lookahead control," Cluster Computing, vol. 12, no. 1, pp. 1–15, 2009.
[26] T. Horvath et al., "Dynamic voltage scaling in multitier web servers with end-to-end delay control," IEEE Transactions on Computers, vol. 56, no. 4, pp. 444–458, 2007.
[27] L. Bertini et al., "Power optimization for dynamic configuration in heterogeneous web server clusters," Journal of Systems and Software, vol. 83, no. 4, pp. 585–598, 2010.
[28] B. Khargharia et al., "Autonomic power and performance management for computing systems," Cluster Computing, vol. 11, no. 2, pp. 167–181, 2008.
[29] N. Kandasamy et al., "Self-optimization in computer systems via on-line control: Application to power management," in ICAC '04, 2004, pp. 54–61.
[30] V. Sharma et al., "Power-aware QoS management in web servers," in RTSS '03, 2003, pp. 63–72.
[31] J. S. Chase et al., "Managing energy and server resources in hosting centers," SIGOPS Oper. Syst. Rev., vol. 35, no. 5, pp. 103–116, 2001.
[32] RUBiS, "RUBiS: Rice University Bidding System," http://rubis.ow2.org/, 2010.
[33] M. Haouari and M. Serairi, "Heuristics for the variable sized bin-packing problem," Comput. Oper. Res., vol. 36, no. 10, pp. 2877–2884, 2009.
