2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid). 978-0-7695-4996-5/13 $26.00 © 2013 IEEE. DOI 10.1109/CCGrid.2013.44
An Autonomic and Scalable Management System for Private Clouds
Matthieu Simonin∗, Eugen Feller∗, Anne-Cecile Orgerie†, Yvon Jegou∗ and Christine Morin∗
∗INRIA, Myriads team, IRISA, Rennes, France
{matthieu.simonin, eugen.feller, yvon.jegou, christine.morin}@inria.fr
†CNRS, Myriads team, IRISA, Rennes, France
Abstract—Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. It allows users to build compute infrastructures from virtualized resources. In particular, once installed and configured, it allows its users to submit and control the life-cycle of a large number of VMs. For scalability, the system relies on a self-organizing hierarchical architecture. Moreover, it implements self-healing mechanisms in case of failure to enable high availability. It also performs energy-efficient distributed VM management through consolidation and power management techniques. This poster focuses on the experimental validation of two main properties of Snooze: scalability and fault-tolerance.
Keywords—autonomic computing; cloud computing; resource management; scalability; fault tolerance
I. INTRODUCTION
The ever-growing appetite of new applications for Cloud resources leads to managing resources at large scale while providing performance isolation and efficient use of the underlying hardware [1]. Resource management is challenging and requires autonomic mechanisms to deal with scalability and fault-tolerance [2]. Autonomic systems are self-managing, which is a highly desired feature for large-scale distributed systems such as Clouds.
In this poster, we will present an extended validation of Snooze [3], [4], an autonomic and energy-efficient management system for private clouds. Embedding a self-configuring hierarchical architecture, Snooze provides self-healing mechanisms that are able to rebuild the hierarchy on the fly in the event of failures.
II. SNOOZE ARCHITECTURE
Previous work has shown that hierarchical architectures can greatly contribute to system scalability [5], [1]. Following this principle of splitting the knowledge among independent managers, Snooze is based on the autonomic hierarchical architecture shown in Figure 1. Servers (i.e., physical nodes) are represented by light gray rectangles. Each computing resource is managed by a Local Controller (LC), which monitors the VMs of the node and enforces the VM life-cycle and node management commands coming from higher levels of the hierarchy. A Group Manager (GM) is in charge of managing a subset of LCs. It integrates energy-management mechanisms such as overload/underload mitigation and VM consolidation policies. A Group Leader (GL) manages the GMs, assigns the LCs to GMs, and handles client VM submission requests.

Figure 1. Snooze architecture
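The GL's assignment of LCs to GMs can be illustrated with a minimal sketch. The paper does not detail the assignment policy, so a simple round-robin strategy is assumed here purely for illustration; the class and method names are hypothetical, not Snooze's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Group Leader assigning Local Controllers
// to Group Managers. Round-robin is an assumed policy for illustration,
// not necessarily the one Snooze implements.
class GroupLeader {
    private final List<String> groupManagers = new ArrayList<>();
    private final Map<String, String> assignment = new HashMap<>();
    private int next = 0;

    // GMs register with the GL after its election.
    void registerGroupManager(String gm) {
        groupManagers.add(gm);
    }

    // A joining LC contacts the GL and gets the next GM in
    // round-robin order.
    String assignLocalController(String lc) {
        String gm = groupManagers.get(next % groupManagers.size());
        next++;
        assignment.put(lc, gm);
        return gm;
    }

    public static void main(String[] args) {
        GroupLeader gl = new GroupLeader();
        gl.registerGroupManager("gm1");
        gl.registerGroupManager("gm2");
        System.out.println(gl.assignLocalController("lc1")); // gm1
        System.out.println(gl.assignLocalController("lc2")); // gm2
        System.out.println(gl.assignLocalController("lc3")); // gm1
    }
}
```

Round-robin spreads LCs evenly across GMs; a load-aware policy could be substituted without changing the join protocol itself.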
To support self-configuration and self-healing, Snooze integrates bi-directional heartbeat protocols at all levels of the hierarchy. When a system service is started on a node, it is statically configured to become either an LC or a GM. In the example depicted in Figure 1, we chose to have three GMs, but this number is defined by the administrator depending on the desired level of redundancy (for fault-tolerance and performance). When the services boot, the first step in hierarchy construction involves the election of a GL among the GMs. After the GL election, the other GMs need to register with it. For an LC to join the hierarchy, it first needs to discover the current GL and get a GM assigned by contacting the GL. Once it is assigned to a GM, it can register with it. Self-healing is performed at all levels of the hierarchy. It involves the detection of missing heartbeats and the recovery from LC, GM, and GL failures.
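The missing-heartbeat detection described above can be sketched as follows. This is a minimal illustration, assuming a simple last-seen-timestamp scheme with a fixed timeout; the names and the timeout value are assumptions, not Snooze's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of heartbeat-based failure detection: the last
// heartbeat timestamp of each component (LC, GM, or GL) is recorded,
// and a component is suspected failed when no heartbeat arrived
// within a timeout. The timeout value is an assumption.
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat received from a component.
    void onHeartbeat(String component, long nowMillis) {
        lastSeen.put(component, nowMillis);
    }

    // A component is suspected failed if its last heartbeat is older
    // than the timeout, or was never received at all.
    boolean isFailed(String component, long nowMillis) {
        Long seen = lastSeen.get(component);
        return seen == null || nowMillis - seen > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor(3000);
        m.onHeartbeat("gm1", 0);
        System.out.println(m.isFailed("gm1", 2000)); // false
        System.out.println(m.isFailed("gm1", 5000)); // true: trigger recovery
    }
}
```

In Snooze, detecting such a failure would trigger the recovery path for the corresponding level: re-electing the GL, reassigning the LCs of a failed GM, or restarting the VMs of a failed LC.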
VM and LC resource utilization changes over time. In order to support VM management decisions such as VM dispatching, placement, underload/overload mitigation, and VM consolidation, VM and LC monitoring is performed at all layers of the system. At the computing layer, VMs are monitored and VM resource utilization is periodically
transferred by the LCs to their assigned GM. At the management layer, to keep monitoring light and scalable, GMs periodically send summary information to the GL, including the aggregated resource utilization of all the LCs they manage.
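The aggregation step can be sketched as follows. The field names and the choice of summing per-LC utilization are assumptions for illustration; the point is that a GM forwards one fixed-size summary to the GL, regardless of how many LCs it manages.

```java
import java.util.List;

// Sketch of a GM folding the per-LC resource utilization samples into
// a single summary destined for the GL. Summing CPU and memory
// utilization is an assumed aggregation; what matters is that the
// summary size is constant, which keeps GL-side monitoring traffic
// light and scalable.
class MonitoringSummary {
    final double totalCpu;
    final double totalMemory;

    MonitoringSummary(double totalCpu, double totalMemory) {
        this.totalCpu = totalCpu;
        this.totalMemory = totalMemory;
    }

    // Aggregate per-LC (cpu, memory) samples into one summary.
    static MonitoringSummary aggregate(List<double[]> lcSamples) {
        double cpu = 0, mem = 0;
        for (double[] s : lcSamples) {
            cpu += s[0];
            mem += s[1];
        }
        return new MonitoringSummary(cpu, mem);
    }

    public static void main(String[] args) {
        // Two LCs reporting (cpu, memory) utilization.
        MonitoringSummary s = MonitoringSummary.aggregate(
            List.of(new double[]{0.5, 0.25}, new double[]{0.25, 0.25}));
        System.out.println(s.totalCpu + " " + s.totalMemory); // 0.75 0.5
    }
}
```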
Snooze is written in Java. It currently comprises 15,000 lines of highly modular code. It is distributed as open source under the GPL v2 license [6]. It relies on the libvirt library, which provides a uniform interface to most modern hypervisors [7].
III. EXPERIMENTAL EVALUATION
Snooze has been validated on Grid'5000, the French experimental testbed supporting experiment-driven research in all areas of computer science related to parallel, large-scale, or distributed computing [8]. Grid'5000 comprises 6,500 cores geographically distributed over 10 sites linked by a dedicated gigabit network. The platform supports in-advance reservations, and users are able to deploy their own environment (i.e., operating system) on the reserved nodes.
In contrast to [4], which only showed the impact of Snooze's distributed VM management, in this poster, experiments were conducted to evaluate the scalability and fault tolerance of Snooze's hierarchy. First, to test the scalability in number of hosted VMs, we launched an experiment with 1,000 VMs. For this experiment, we used 126 physical nodes from 4 different Grid'5000 sites: 1 GL, 8 GMs, and 117 LCs (hosting the 1,000 VMs). It took about 15 minutes to deploy the nodes on Grid'5000 and completely build the hierarchy from scratch: deployment of a Debian environment on the managing nodes, NFS and SSH configuration, installation of the Snooze code, and self-configuration of the hierarchy (GL election, GM allocation for each LC). Then, it took about 15 minutes to run the 1,000 VMs: dispatching of the VMs onto LCs, deployment of the VM images, and starting of the 1,000 VMs.
We then performed a second experiment to test the scalability of the hierarchy itself. We progressively launched 50 GMs (one by one) and measured the network traffic on the GL incurred by the self-configuration (GL election and heartbeat mechanism initialization among the GMs) and self-healing mechanisms (heartbeat messages). Figure 2 shows these measurements. As we can see, the amount of traffic is proportional to the number of GMs, but it remains light (about 25 kB/s for 50 GMs) because GMs only send aggregated data for all their LCs to the GL.
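This linearity can be checked with a back-of-envelope estimate, assuming each GM contributes a fixed rate to the GL. The per-GM rate of roughly 0.5 kB/s used below is derived from the reported 25 kB/s for 50 GMs, not measured independently.

```java
// Back-of-envelope estimate of GL-side traffic under the assumption
// that it grows linearly with the number of GMs. The 0.5 kB/s per-GM
// rate is inferred from the reported ~25 kB/s for 50 GMs.
class TrafficEstimate {
    static double glTrafficKBps(int numGMs, double perGmKBps) {
        return numGMs * perGmKBps;
    }

    public static void main(String[] args) {
        System.out.println(glTrafficKBps(50, 0.5));  // 25.0, matches the measurement
        System.out.println(glTrafficKBps(100, 0.5)); // 50.0, a linear extrapolation
    }
}
```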
Similar experiments were performed to evaluate the scalability of the other levels of the hierarchy (for example, one GM with 350 LCs). We also switched off nodes to cause failures in the hierarchy and measured the time to rebuild the hierarchy depending on the failure case (crash of a GM, the GL, or an LC, or crash of several of them). In all cases (except, of course, when all the hierarchy nodes are killed), Snooze was able to rebuild the hierarchy in less than one minute, even for large-scale experiments. These experiments are not described here in detail due to lack of space.

Figure 2. Network traffic caused on the GL by the progressive start of 50 Group Managers
IV. CONCLUSION
Snooze [3], [4] is a virtual machine management framework based on a hierarchical approach. It is open-source [6], and can thus be used by any research team dealing with Cloud computing, energy efficiency, and autonomic resource management in large-scale distributed systems. This poster will present experimental validations of two key properties required by Cloud resource managers: scalability and fault-tolerance.
REFERENCES
[1] A. Gulati, G. Shanmuganathan, A. Holler, and I. Ahmad, "Cloud-scale resource management: challenges and techniques," in USENIX Conference on Hot Topics in Cloud Computing (HotCloud), 2011.
[2] R. Buyya, R. Calheiros, and X. Li, "Autonomic Cloud Computing: Open Challenges and Architectural Elements," in International Conference on Emerging Applications of Information Technology (EAIT), 2012, pp. 3–10.
[3] E. Feller, C. Rohr, D. Margery, and C. Morin, "Energy Management in IaaS Clouds: A Holistic Approach," in CLOUD: IEEE International Conference on Cloud Computing, 2012.
[4] E. Feller, L. Rilling, and C. Morin, "Snooze: A Scalable and Autonomic Virtual Machine Management Framework for Private Clouds," in CCGrid: IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2012.
[5] S. Perera and G. D., "Enforcing user-defined management logic in large scale systems," in IEEE Congress on Services, 2009, pp. 243–250.
[6] Snooze, "http://snooze.inria.fr," GPL v2 license, 2012.
[7] Red Hat, "libvirt: The Virtualization API," http://libvirt.org/, 2012.
[8] R. Bolze et al., "Grid'5000: A Large Scale and Highly Reconfigurable Experimental Grid Testbed," International Journal of High Performance Computing Applications, vol. 20, no. 4, pp. 481–494, 2006.