
CernVM – a virtual software appliance for LHC applications

P Buncic1, C Aguado Sanchez1, J Blomer1, L Franco1, A Harutyunian2,3, P Mato1, Y Yao4

1 CERN, 1211 Geneva 23, Switzerland
2 Armenian e-Science Foundation, Yerevan, Armenia
3 Yerevan Physics Institute after A.I. Alikhanyan, Yerevan, Armenia
4 Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA

Abstract. CernVM is a Virtual Software Appliance capable of running physics applications from the LHC experiments at CERN. It aims to provide a complete and portable environment for developing and running LHC data analysis on any end-user computer (laptop, desktop) as well as on the Grid, independently of Operating System platform (Linux, Windows, MacOS). The experiment application software and its specific dependencies are built independently from CernVM and delivered to the appliance just in time by means of a CernVM File System (CVMFS) specifically designed for efficient software distribution. The procedures for building, installing and validating software releases remain under the control and responsibility of each user community. We provide a mechanism to publish pre-built and configured experiment software releases to a central distribution point, from where they find their way to the running CernVM instances via a hierarchy of proxy servers or content delivery networks. In this paper we present the current state of the CernVM project, compare the performance of CVMFS to that of traditional network file systems such as AFS, and discuss possible scenarios that could further improve its performance and scalability.

1. Introduction

In order to allow physicists working on the LHC experiments at CERN to effectively use their desktops and laptops for analysis as well as a Grid[1] interface, we have created CernVM[2], a lightweight Virtual Software Appliance intended to run on any OS platform and to provide consistent and effortless installation of the experiment software.

CernVM is built using the rBuilder tool from rPath[3]. Thanks to this tool we are able to create a Virtual Software Appliance that can run on any hardware (x86 and x86_64) and OS (Windows, Mac, Linux) platform, supporting a spectrum of hypervisors (VMware[4], Xen[6], KVM[5], VirtualBox[8], Hyper-V[7]). This appliance in its simplest form fits in a compressed file of around 150 MB, containing an operating system just sufficient to bootstrap the entire experiment software stack from a built-in dedicated network file system. This approach allows us to develop and maintain a single Virtual Software Appliance as a common platform for all four LHC experiments, and lets each experiment customize and feed such an appliance with its software releases in an efficient way.

The basic building blocks of the CernVM appliance are shown in Figure 1. Its operating system is based on rPath Linux and uses the Conary[9] package manager. In addition to the usual package manager functionality, Conary is capable of analysing the build products and automatically classifying them into sub-packages, detecting all package build and runtime dependencies. The resulting appliance image contains the application together with just enough of the Operating System required to run it. This approach has a clear advantage: the fewer components we put in the image, the less maintenance is necessary.

To facilitate the appliance configuration and management tasks, the appliance runs the rPath Appliance Agent (rAA) software, which provides two user interfaces: firstly a Web based one for easy interaction with the end user, and secondly an XML-RPC one that allows interaction with a CernVM instance from a scripting language in order to carry out bulk or repetitive operations.
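As an illustration of the XML-RPC interface, the short Python sketch below applies the same configuration call to a set of CernVM instances in one go. It is only a sketch: the port, URL path and method name (configure_repository) are hypothetical placeholders, not the actual rAA API.

    import xmlrpc.client

    # Hypothetical rAA endpoint and method name, for illustration only.
    HOSTS = ["cernvm-01.example.org", "cernvm-02.example.org"]

    for host in HOSTS:
        # rAA exposes an XML-RPC interface; the port and path here are assumptions.
        agent = xmlrpc.client.ServerProxy(f"https://{host}:8003/RPC2")
        # Apply the same setting to every instance (a bulk operation).
        print(host, agent.configure_repository("atlas", {"enabled": True}))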

Figure 1. Basic components of the CernVM Virtual Software Appliance.

Since the underlying rPath Linux is binary compatible with Scientific Linux[10], we can reuse the software packages already built and tested in that environment and simply run them unmodified under CernVM. An application running in the virtual machine triggers the download of the necessary binaries and libraries by referencing them for the first time. In response, the file system downloads and caches new application files and reuses already downloaded ones. The net result is that the amount of software that has to be downloaded in order to run typical experiment tasks in the virtual machine is reduced by an order of magnitude compared to the case when a full software release is installed locally. For the LHC experiments this means a reduction from 8-10 GB to typically less than 1 GB.

Thanks to its support for various hypervisors, binary compatibility with SLC4, the capability to run unmodified software of all LHC experiments, a small OS footprint (150 MB), the use of the standard HTTP protocol for its file system with aggressive caching that minimizes network traffic, and multiple configuration interfaces, CernVM can easily be deployed and configured either on an end-user workstation or on computing fabrics that support virtualization, such as Amazon EC2[11] or the Nimbus Scientific Cloud[12].

2. CernVM File System

The file system optimized for software distribution (CVMFS) plays a key role in the "thin appliance" model used by CernVM. It is implemented as a File System in User Space (FUSE) module[13]. CVMFS has been designed to make a directory tree stored on a Web server look like a local read-only file system on the virtual machine. On the client side this requires only outgoing HTTP/HTTPS connectivity to the Web server. Assuming that the files are read-only, CVMFS performs aggressive file-level caching: both files and file metadata are cached on local disk as well as on shared proxy servers, allowing the file system to scale to a very large number of clients.
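The caching behaviour can be illustrated with a minimal Python sketch: a file from the repository tree is downloaded over HTTP on first access and served from the local disk cache afterwards. This is a simplified illustration of the idea, not the CVMFS implementation; the repository URL and cache location are made up.

    import os
    import urllib.request

    REPO_URL = "http://cvmfs.example.org/repo"          # assumed repository base URL
    CACHE_DIR = os.path.expanduser("~/.cvmfs-cache")    # assumed local cache directory

    def open_file(path):
        """Return the local path of a repository file, fetching it on first access."""
        cached = os.path.join(CACHE_DIR, path.lstrip("/"))
        if not os.path.exists(cached):                  # cache miss: fetch over HTTP
            os.makedirs(os.path.dirname(cached), exist_ok=True)
            with urllib.request.urlopen(REPO_URL + path) as resp, open(cached, "wb") as out:
                out.write(resp.read())
        return cached                                   # cache hit: purely local access

Repeated accesses to the same file therefore cause no further network traffic, which is the property that allows the local cache and the shared proxy servers to absorb most of the load.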


Figure 2. Software releases are first uploaded by the experiment's release manager to a dedicated virtual machine, tested, and then published on a Web server acting as the central software distribution point.

The procedures for building, installing and validating software releases remain in the hands of, and under the responsibility of, each user community. To each community we provide an instance of CernVM that runs CVMFS in read/write mode, allowing the experiment software manager to install, test and publish configured experiment software releases to our central distribution point (Figure 2). At present all four LHC experiments are supported and already distribute about 100 GB of their software using this method. Adding another user community is a straightforward and scalable procedure, since all users may, but do not necessarily have to, share the same central software distribution point (HTTP server).

The first implementation of CVMFS was based on the GROW-FS[14] file system, which was originally provided as one of the private file system options available in Parrot[15]. Parrot traps the system I/O calls, resulting in a performance penalty and occasional compatibility problems with some applications. Since in the virtual machine we have full access to the kernel and can deploy our own kernel modules, we have constructed CVMFS from modified Parrot and GROW-FS code and adapted it to run as a FUSE module, adding in the process some new features to improve scalability and performance and to simplify deployment. The principal new features of CVMFS compared to GROW-FS are:

- Using the FUSE kernel module allows for in-kernel caching of file attributes.
- Capability to work in offline mode (provided that all required files are cached).
- Possibility to use a hierarchy of file catalogues on the server side.
- Transparent file compression/decompression.
- Dynamic expansion of environment variables embedded in symbolic links.

In the current deployment model we envisage a hierarchy of cache servers deployed between a client and the central server (Figure 3a), allowing the delivery network to be organized as a fine-grained distribution tree. Proxy servers may serve CernVM clients as well as other proxy servers. In this model each proxy server stores the union of the data required by its subtree. By adding and removing proxy servers, the tree can be adapted to the required load.


At present such a deployment model is fully supported, and CernVM will always use the nearest proxy server as pointed to by the configuration service at start-up time. In spite of its reasonable flexibility, this approach is still static in nature, requires additional manual effort to set up and maintain the proxy hierarchy, and clearly has a single point of failure.


Figure 3. In the current software version CernVM clients fetch data files from the central repository via a hierarchy of proxy servers (a), as specified by the configuration file obtained from the configuration server at boot time. We plan to remove this single point of failure using a content distribution network on the WAN and self-organizing P2P-like cache sharing on the LAN (b).

To remove the single point of failure we can exploit existing commercial infrastructure such as Amazon's CloudFront[17] service. If we mirror our repository to the Amazon Simple Storage Service (S3)[18], the CloudFront service will ensure that users access the content from a nearby mirror. This is done by assigning a canonical DNS name to an S3 resource, through which the client transparently connects to the closest mirror. Mirroring is done on demand, and expired content is deleted from the mirror servers automatically. An alternative to CloudFront that works on similar principles is the Coral Content Distribution Network[19].

If we consider the scenario where a single site runs many CernVM instances, we may still run into a local bandwidth bottleneck if every CernVM client independently fetches data from the content delivery network. To tackle that problem we are adding to CernVM the capability to discover other CernVM instances on the local network using the Avahi[20] service discovery system. Since every CernVM instance keeps a local cache of the files it has already downloaded, it can also share this cache with its neighbours (acting as an ad-hoc proxy). This way we can achieve a self-organizing network at the LAN level, in a manner similar to peer-to-peer networks, while retaining most of the existing CernVM components and accessing a top-level repository residing on a content distribution network such as CloudFront or Coral, possibly via a chain of additional proxy caches.
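A minimal sketch of the intended LAN-level cache sharing is given below: the client asks neighbouring CernVM instances (as they would be discovered via Avahi) for a file first, and only falls back to the content delivery network if no neighbour can serve it. The discovery step itself is omitted, and the neighbour port and URL layout are assumptions for illustration.

    import urllib.request

    CDN_BASE = "http://cvmfs-cdn.example.org/repo"   # assumed top-level repository on a CDN
    NEIGHBOUR_PORT = 3128                            # assumed port of the ad-hoc neighbour proxy

    def fetch(path, neighbours):
        """Try LAN neighbours first, then fall back to the content delivery network."""
        for host in neighbours:                      # neighbour list would come from Avahi
            try:
                url = f"http://{host}:{NEIGHBOUR_PORT}{path}"
                with urllib.request.urlopen(url, timeout=1) as resp:
                    return resp.read()               # served from a neighbour's cache
            except OSError:
                continue                             # neighbour down or file not cached there
        with urllib.request.urlopen(CDN_BASE + path) as resp:
            return resp.read()                       # fall back to the CDN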

This combined model (Figure 3b) has all of the ingredients necessary to ensure scalability, supporting thousands of CernVM clients with the reliability expected from a file system and with performance matching that of traditional network file systems (such as NFS or AFS) on a local network.

3. CVMFS Benchmarks

To exercise CernVM and to compare the performance of its CVMFS file system on local and wide area networks, we use the ROOT[16] application benchmark stressHepix in version 5.22. The stressHepix benchmark is a collection of several ROOT benchmarks that cover typical HEP tasks. Our benchmarks also include compiling the stressHepix test suite with GCC 3.4. While running the benchmark, the system has to find and fetch binaries, libraries and header files from the network file system, but once loaded the files remain in a local cache. Since the local cache is an ordinary directory, it fully benefits from the Linux file system caches. Table 1 lists the properties of the stressHepix working set. The working set turns out to be a good approximation of the ROOT package in that it is dominated by many small files. However, by working on blocks rather than on entire files, AFS is still able to save about 15% of storage in the cache.

Table 1. Properties of the stressHepix working set in the client's cache. Note that AFS works on 1 KB blocks rather than on entire files. "Local Copy" refers to a minimal ROOT 5.22 installation, of which stressHepix is a subset.

                 Size [MB]   Number of Files   Median File Size [KB]   10th / 90th Percentile of File Sizes
  AFS Cache         59             n/a                  n/a                        n/a
  CVMFS Cache       69             628                  4.0                  320 B / 23.5 KB
  Local Copy       309            4330                  2.5                  400 B / 34.8 KB

With the following benchmarks we intend to measure the overhead of CVMFS under typical workload conditions. Note that we do not measure the application's I/O performance; in fact, all I/O operations are entirely performed on local storage. Instead, we measure the extra time penalty (Δt = t_CVMFS/AFS - t_local) resulting from application binaries, search paths and include paths residing on a network file system.

Figure 4. Benchmark setup.

To put the numbers into context we compare CVMFS to the well-known AFS. We use the same setup as the CernVM production environment (Figure 4). The file server runs Apache 2.2.8 and Squid 2.6 for CVMFS, and OpenAFS 1.4.8 for AFS, with the repository stored on a read-only volume. The benchmarks are performed on a dedicated CernVM 1.0.1 Xen virtual machine connected to the server via local Gbit Ethernet. The performance loss due to virtualization[21] is consistently in the region of 5%. In order to see the impact of wide area networks, we shape the outgoing network traffic using the tc tool, artificially introducing latency and bandwidth restrictions. Since we observed instabilities when shaping network traffic on Xen, we shape on the server side only. This is still valid for the simulation because the client only sends a few small request packets.

When all necessary files are cached (full cache), there is no performance loss whatsoever. In the case of CVMFS this is due to the simple two-level lookup that first checks the local cache. AFS uses its cache manager on the client side to register a call-back function on the AFS server, which is invoked on volume changes. Thus there is no network traffic caused by the file system, except for keep-alive packets every few minutes.


Figure 5. Time penalty as a function of latency with AFS and CVMFS, compared to local storage. The running time of the benchmark on local storage is about 3 minutes. With standard window sizes of 16 KiB, TCP throughput is directly affected by latency. A round trip time of 100 ms corresponds to the parameters observed between a client running on Amazon EC2 (US East Coast) and file servers located at CERN.

Figure 6. Impact of low bandwidth in wide area networks, in addition to high latency (100 ms).


In the case of an empty cache, the results shown in Figure 5 and Figure 6 begin to show the effects of network latency and limited network bandwidth. Without particularly tuned TCP parameters, we see the strong decrease of TCP throughput caused by the high bandwidth-delay product. The linear growth of the time difference can be explained by the fact that we request a sequence of many small files. In general we see CVMFS performing consistently better than AFS in wide area networks. We believe that this is caused by the better flow control of TCP compared to the Rx/UDP stack used by AFS. This can be further improved if CVMFS is modified to keep the TCP connection alive (see next section).

Noticeable performance losses due to low throughput are not seen until throughput rates of 15-18 Mbit/s, which is the download speed of current ADSL+ connections.

3.1. Improvements in CVMFS

While the current version of CVMFS already shows highly competitive performance, there is still some room for improvement. We have implemented a couple of optimisations in an experimental version of CVMFS, tackling two bottlenecks.

3.1.1. Reducing HTTP overheads

Although the HTTP protocol overhead is small in terms of data volume, in high-latency networks we suffer from the sheer number of requests. Each request-response cycle adds round trip time to the measured time penalty. In the current plain HTTP/1.0 implementation this results in at least 3 x (round trip time) of extra time per file, for the TCP handshake, the HTTP GET and the TCP connection finalisation. By keeping the TCP connection open, this ideally drops to just the round trip time overhead of the HTTP GET. Exploiting the HTTP keep-alive feature does not affect scalability: the servers and proxies may safely close idle connections at any time, in particular if they run out of resources.
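The effect of keeping the connection open can be illustrated with Python's http.client: a single TCP connection is set up once and then reused for a series of GET requests, so each additional file ideally costs one round trip instead of three. The host and file names are placeholders.

    import http.client

    # One persistent (keep-alive) connection instead of one TCP connection per file.
    conn = http.client.HTTPConnection("cvmfs.example.org")     # placeholder host

    for path in ["/repo/lib/libCore.so", "/repo/lib/libRIO.so", "/repo/include/TFile.h"]:
        conn.request("GET", path)       # only the GET round trip is paid per file
        resp = conn.getresponse()
        data = resp.read()              # the response must be read before reusing the connection
        print(path, len(data), "bytes")

    conn.close()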

With pipelined HTTP/1.1 or parallel requests we can further amortise the protocol overhead over multiple files. In CVMFS we accomplish fetching a bunch of files using a pre-fetch thread. A file foo may have a list of pre-fetch hints, i.e. files that are likely to be opened soon after opening foo. These hints are stored in the file system as metadata, introducing only minimal overhead. For binaries we have a high degree of predictability: typically 10 to 50 shared libraries will be opened together with a binary. This enables us to produce pre-fetch hints just by following the link dependencies.
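The pre-fetch idea can be sketched as follows: when a file is opened, its hint list (stored as metadata) is handed to a background thread that warms the cache while the application proceeds. The hint table and the trivial open_file() helper below are purely illustrative.

    import threading

    # Illustrative pre-fetch hints: files likely to be opened right after the key file.
    PREFETCH_HINTS = {
        "/repo/bin/root.exe": ["/repo/lib/libCore.so", "/repo/lib/libRIO.so"],
    }

    def open_file(path):
        """Stand-in for the cache-filling download sketched in section 2."""
        return path

    def prefetch(paths):
        # Warm the local cache in the background while the application continues.
        for path in paths:
            open_file(path)

    def open_with_hints(path):
        hints = PREFETCH_HINTS.get(path, [])
        if hints:
            threading.Thread(target=prefetch, args=(hints,), daemon=True).start()
        return open_file(path)          # open the requested file itself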

3.1.2. Reducing transfer volume

By decoupling the local data cache from the catalogue, we avoid downloading multiple copies of the same file. We modified the local cache to name downloaded files according to a secure hash of their content (SHA1 cache). This way, redundancy due to duplicate files is automatically detected and exploited, resulting in less space consumed by the local cache and less network traffic. In order to ensure the integrity of files, we always calculate a secure hash during download; thus the SHA1 cache does not incur additional overhead.
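A minimal sketch of such a content-addressed cache: the hash is computed while the data is received, the file is stored under its hash, and a second file with identical content maps onto the same cache entry. The cache location is illustrative.

    import hashlib
    import os

    CACHE_DIR = "/var/cache/cvmfs-sketch"            # illustrative cache location

    def store(stream):
        """Store downloaded content under the SHA-1 of its data, deduplicating copies."""
        sha1, chunks = hashlib.sha1(), []
        for chunk in iter(lambda: stream.read(64 * 1024), b""):
            sha1.update(chunk)                       # integrity hash computed during download
            chunks.append(chunk)
        name = os.path.join(CACHE_DIR, sha1.hexdigest())
        if not os.path.exists(name):                 # identical content is stored only once
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(name, "wb") as out:
                out.write(b"".join(chunks))
        return name

Network traffic is saved in the same way when the catalogue records the content hash, since a file whose hash is already present in the cache does not need to be requested at all.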

In addition to being redundant, the files we are distributing have a high compression ratio. When using compression on the server, we have to take into account the computing time for decompression on the CernVM clients. In high-performance networks it might be faster to fetch more data and just copy it, compared to fetching less data that has to be further processed. We have implemented a simple yet effective compression scheme: every file is separately compressed and stored in place of the uncompressed original in the server repository, and CVMFS decompresses files on the fly during download. This is entirely transparent to the content distribution network; we do not have to change anything on HTTP servers and proxies. We chose the Deflate algorithm, which shows an average compression ratio of 3:1 for a typical repository. Our benchmarks strongly support the decision to enable compression by default: even on Gbit Ethernet we see no additional extra time whatsoever, while getting a significant benefit in WAN networks. Overall, in slow networks the improved CVMFS exhibits between five and eight times smaller time penalties than AFS, while still being as fast as AFS on Gbit Ethernet.
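Decompression on the fly can be sketched with Python's zlib module: the compressed object is fed through a decompress object chunk by chunk as it arrives, so no intermediate compressed copy is written to disk. The URL is a placeholder, and the exact framing (zlib-wrapped Deflate) is an assumption.

    import urllib.request
    import zlib

    def fetch_compressed(url, destination):
        """Download a per-file compressed object and inflate it while it is received."""
        inflater = zlib.decompressobj()              # zlib/Deflate framing assumed
        with urllib.request.urlopen(url) as resp, open(destination, "wb") as out:
            for chunk in iter(lambda: resp.read(64 * 1024), b""):
                out.write(inflater.decompress(chunk))
            out.write(inflater.flush())              # flush any remaining buffered bytes

    # Example (placeholder URL and file):
    # fetch_compressed("http://cvmfs.example.org/repo/data/libCore.so.z", "/tmp/libCore.so")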

Figure 7. Impact of CVMFS optimisations.

Figure 7 shows the benefits of the particular optimisations, added incrementally one after another. Compared to the current version of CVMFS, the size of the repository stored on the server, as well as the network transfer volume, is reduced by at least a factor of three.

Table 2. Comparison of data volumes for CVMFS in its current version and CVMFS with our experimental improvements. Numbers are based on our ROOT benchmark.

                          Repository size [MB]    Network transfer volume [MB]
  CVMFS (current)                 309                          70
  CVMFS (experimental)            114                          19

4. Supporting infrastructure

In order to provide scalable and reliable central services in support of CVMFS and of appliance generation using the rBuilder tool, we have built an infrastructure consisting of four building blocks: a scalable proxy frontend, computing, backend storage, and networking services.

1. Front-end servers (reverse proxies) operate in a DNS load-balancing configuration and forward requests, based on URL content, to the backend servers, which run as virtual machines on top of the physical servers.

2. Computing services are used to run all CVMFS services, deployed as virtual machines hosted on a pool of VMware ESX servers in such a way that CPU time and system memory are aggregated into a common space and can be allocated to single or multiple virtual machines. This approach enables better resource utilization and simplifies the management of High Availability (HA) and scalability.

3. Another pair of servers running in a highly available configuration provides a shared file system for the rest of the infrastructure and allows for live migration of virtual machines between physical nodes. It supports features such as advanced storage management (dynamic allocation, dynamic growth, snapshots and continuous data replication) and the capability to implement consistent backup policies to some tertiary MSS.

4. The supporting network infrastructure makes use of two separate network domains and three different IP segments. Public service provisioning is decoupled from internal service requirements by using physically different channels with different QoS, with extensive use of IEEE 802.3ad and 9K jumbo frames to improve data I/O and VM live migration.


Figure 8. Supporting infrastructure for CernVM central services.

5. Conclusions

In this R&D activity we have explored how virtualization technology can be used to provide a portable user environment and interface to the Grid. We developed CernVM, a Virtual Software Appliance fulfilling the basic needs of all four LHC experiments. This "thin" appliance contains only a basic OS, required to bootstrap the LHC application software, which is delivered to the appliance using CVMFS, a file system specially crafted for software delivery. Using this mechanism, CernVM, once configured, does not need to be updated with each experiment software release, since updates are propagated automatically. This results in an easy deployment scenario, as only one image is required to support all experiments. The same is true if such virtual machines are deployed to serve as job hosting environments on voluntary resources (like BOINC[22]) or managed ones (like computing clusters, grids and clouds). By leveraging standard protocols (HTTP) for all communication, we can effectively use a proxy/cache server infrastructure where it exists, or create self-organizing proxy networks in LAN environments, to minimize the impact on the central software distribution point. Further, by using existing content delivery network infrastructure, public or commercial, we can remove the single point of failure and create a reliable and efficient file system for software delivery to thousands of CernVM clients, with performance close to or better than that of traditional network file systems such as AFS or NFS, and with the additional bonus that the system can work in disconnected mode.

6. References

[1] LHC Computing Grid, http://lcg.web.cern.ch/LCG/public/default.htm
[2] CernVM Project, http://cernvm.cern.ch
[3] rPath, http://www.rpath.org
[4] VMware, http://www.vmware.com
[5] Kernel Virtual Machine, http://www.linux-kvm.org
[6] Xen, http://www.xen.org
[7] Microsoft Hyper-V, http://www.microsoft.com/hyper-v-server
[8] VirtualBox, http://www.virtualbox.org
[9] Conary Package Manager, http://wiki.rpath.com/wiki/Conary
[10] Scientific Linux, http://www.scientificlinux.org
[11] Amazon Elastic Compute Cloud (EC2), http://aws.amazon.com/ec2
[12] Nimbus Science Cloud, http://workspace.globus.org/clouds
[13] File System in Userspace, http://fuse.sourceforge.net
[14] Moretti C, Sfiligoi I and Thain D, Transparently Distributing CDF Software with Parrot, presented at CHEP06, Mumbai, India, Feb 13-17 2006
[15] Cooperative Computing Tools, http://www.cctools.org
[16] ROOT Project, http://root.cern.ch
[17] Amazon CloudFront Service, http://aws.amazon.com/cloudfront
[18] Amazon Simple Storage Service, http://aws.amazon.com/s3
[19] The Coral Content Distribution Network, http://www.coralcdn.org
[20] Avahi, http://avahi.org
[21] Barham P et al, Xen and the art of virtualization, ACM SIGOPS Operating Systems Review, Vol 37, Issue 5 (2003)
[22] BOINC, an open-source software platform for computing using volunteered resources, http://boinc.berkeley.edu

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

10

CernVM ndash a virtual software appliance for LHC applications

P Buncic1 C Aguado Sanchez1 J Blomer1 L Franco1 A Harutyunian23

P Mato1 Y Yao3

1 CERN 1211 Geneve 23 Geneva Switzerland 2Armenian e-Science Foundation Yerevan Armenia

3Yerevan Physics Institute after AI Alikhanyan Yerevan Armenia

4 Lawrence Berkeley National Laboratory 1 Cyclotron Road CA 94720

Abstract CernVM is a Virtual Software Appliance capable of running physics applications

from the LHC experiments at CERN It aims to provide a complete and portable environment

for developing and running LHC data analysis on any end-user computer (laptop desktop) as

well as on the Grid independently of Operating System platforms (Linux Windows MacOS)

The experiment application software and its specific dependencies are built independently from

CernVM and delivered to the appliance just in time by means of a CernVM File System

(CVMFS) specifically designed for efficient software distribution The procedures for building

installing and validating software releases remains under the control and responsibility of each

user community We provide a mechanism to publish pre-built and configured experiment

software releases to a central distribution point from where it finds its way to the running

CernVM instances via the hierarchy of proxy servers or content delivery networks In this

paper we present current state of CernVM project and compare performance of CVMFS to

performance of traditional network file system like AFS and discuss possible scenarios that

could further improve its performance and scalability

1 Introduction

In order to allow physicists working on LHC experiments at CERN to effectively use their desktops

and laptops for analysis as well as Grid[1] interface we have created CernVM[2] a lightweight

Virtual Software Appliance intended to run on any OS platform providing consistent and effortless

installation of experiment software

CernVM is built using rBuilder tool from rPath[3] Thanks to this tool we are able to create a Virtual

Software Appliance that can run on any hardware (x86 and x86_64) and OS (Windows Mac Linux)

platform supportting a spectrum of hypervisors (VMware[4] Xen[6] KVM[5] VirtualBox[8]

HyperV[7]) This appliance in its simplest form fits in a compressed file of around 150 MB

containing an operating system sufficient to bootstrap the entire experiment software stack from a built

in dedicated network file system This approach allows us to develop and maintain a single Virtual

Software Appliance as a common platform for all four LHC experiments and lets each experiment

customize and feed such an appliance with their software releases in an efficient way

The basic building blocks of CernVM appliance are shown in Figure 1 Its operating system is based

on rPath Linux and uses Conary[9] package manager In addition to the usual package manager

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

ccopy 2010 IOP Publishing Ltd 1

functionality Conary is capable of analysing the build products and automatically classifying them

into sub-packages detecting all package build and runtime dependencies The resulting appliance

image contains the application together with just enough of the Operating System required to run it

This approach has a clear advantage since the fewer components we put in the image the less

maintenance is necessary

To facilitate the appliance configuration and management tasks the appliance runs rPath Appliance

Agent software (rAA) that provides two user interfaces Firstly a Web based one for easy interaction

with the end-user and secondly an XML-RPC one to allow interaction with an instance of CernVM

using scripting language to carry out bulk or repetitive operations

Figure 1 Basic components of CernVM Virtual Software Appliance

Since the underlying rPath Linux is binary compatible with Scientific Linux[10] we can reuse the

software packages already built and tested in that environment and simply run them unmodified under

CernVM The application running in the virtual machine will trigger the download of necessary

binaries and libraries by referencing them for the first time In response the file system will download

and cache new application files and reuse already downloaded ones The net result is that the amount

of software that has to be downloaded in order to run typical experiment tasks in the virtual machine is

reduced by an order of magnitude compared to the case when a full software release is installed

locally In the case of LHC experiments this means reduction from 8-10 GB to typically less than

1GB

Thanks to support for various hypervisors binary compatibility with SLC4 capability to run

unmodified software of all LHC experiments small OS footprint (150 MB) use of standard HTTP

protocol for its file system with aggressive caching that minimizes network traffic and multiple

configuration interfaces CernVM can easily be deployed and configured on either end user

workstation or various computing fabrics that support virtualization like Amazon EC2[11] or Nimbus

Scientific Cloud[12]

2 CernVM File System

The file system optimized for software distribution (CMVFS) plays a key role in the ldquothin appliancerdquo

model used by CernVM It is implemented as a File System in User Space (FUSE) module[13]

CVMFS has been designed to make a directory tree stored on a Web server look like a local read-only

file system on the virtual machine On the client side this requires only outgoing HTTPHTTPS

connectivity to the Web server Assuming that the files are read only CVMFS performs aggressive

file level caching Both files and file metadata are cached on local disk as well as on shared proxy

servers allowing the file system to scale to a very large number of clients

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

2

Figure 2 Software releases are first uploaded by the

experimentrsquos release manager to a dedicated virtual machine

tested and then published on a Web server acting as central

software distribution point

The procedures for building installing and validating software release remain in the hands of and in

the responsibility of each user community To each community we provide an instance of CernVM

that runs CVMFS in readwrite mode allowing experiment software manager to install test and

publish configured experiment software releases to our central distribution point (Figure 2) At present

all four LHC experiments are supported and already distribute about 100 GB of their software using

this method Adding another user community is a straightforward simple and scalable procedure since

all users may but do not necessarily have to share the same central software distribution point (HTTP

server)

First implementation of CVMFS was based on GROW-FS[14] file system which was originally

provided as one of the private file system options available in Parrot[15] Parrot traps the system IO

calls resulting in a performance penalty and occasional compatibility problems with some

applications Since in the virtual machine we have full access to the kernel and can deploy our own

kernel modules we have constructed CVMFS from modified Parrot and GROW-FS code and adapted

them to run as a FUSE kernel module adding in the process some new features to improve scalability

and performance and to simplify deployment The principal new features in CVMFS compared to

GROW-FS are

- Using FUSE kernel module allows for in-kernel caching of file attributes

- Capability to work in offline mode (providing that all required files are cached)

- Possibility to use the hierarchy of file catalogues on the server side

- Transparent file compressiondecompression

- Dynamical expansion of environment variables embedded in symbolic links

In the current deployment model we envisage a hierarchy of cache servers deployed between a client

and the central server (Figure 3a) allowing the delivery network to be organized as a fine-grained

distribution tree Proxy servers may serve CernVM clients as well as other proxy servers In this model

each proxy server stores the union set of required data of its sub tree By adding and removing proxy

servers the tree can be adapted to respond to required load

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

3

At present such a deployment model is fully supported and CernVM will always use the nearest proxy

server as pointed to by the configuration service at start up time In spite of reasonable flexibility this

approach is still static in nature requires additional manual effort to setup and maintain the proxy

hierarchy and clearly has a single point of failure

a) b)

Figure 3 In the current software version CernVM clients fetch the data files from central

repository via the hierarchy of proxy servers (a) as specified by the configuration file obtained

from the configuration server at boot time We plan to remove a single point of failure using

content distribution network on WAN and self organizing P2P-like cache sharing on LAN (b)

To remove a single point of failure we can exploit an existing commercial infrastructure like Amazons

CloudFront[17] service If we mirror our repository to Amazon Simple Storage Service (S3)[18]

CloudFront Service will assure that users access the content using some nearby mirror This is done by

assigning a canonical DNS name to an S3 resource by which the client transparently connects to the

closest mirror Mirroring is done on demand and expired content will be deleted from mirror servers

automatically An alternative to CloudFront that works on similar principles is the Coral Content

Distribution Network[19]

If we consider the scenario where a single site runs many CernVM instances we may still run into a

local bandwidth bottleneck if every CernVM client is independently fetching data from content

delivery network To tackle that problem we are adding to CernVM the capability to discover other

CernVM instances on a local network using Avahi[20] system for service discovery Since every

CernVM instance keeps a local cache of files it has already downloaded it can also share this cache

with neighbours (acting as an ad-hoc proxy) This way we can achieve a self-organizing network on a

LAN level in a manner similar to peer-to-peer networks while retaining most of the existing CernVM

components accessing a top level repository residing on a content distributions network such as

CloudFront or Coral possibly via a chain of additional proxy caches

This combined model (Figure 3b) has all of the ingredients necessary to assure scalability supporting

thousands of CernVM clients with the reliability expected from a file system and performance

matching that of traditional network file systems (such as NFS or AFS) on a local network

3 CVMFS Benchmarks

To exercise CernVM and to compare the performance of its CVMFS file system on local and wide

area networks we use the ROOT[16] application benchmark stressHepix in version 522 The

stressHepix benchmark is a collection of several ROOT benchmarks that cover typical HEP tasks Our

benchmarks also include compiling stressHepix test suite with GCC 34 While running the

benchmark the system has to find and fetch binaries libraries and header files from the network file

system but once loaded the files remain in a local cache Since the local cache is an ordinary

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

4

directory it fully benefits from Linux file system caches Table 1 lists properties of the stressHepix

working set The working set turns out to be a good approximation of the ROOT package in that it is

dominated by many small files However by working on blocks rather than on entire files AFS is still

able to save about 15 storage in the cache

Table 1 Properties of the stressHepix working set in the clientrsquos cache Note that AFS works

on 1KB blocks rather than on entire files ldquoLocal Copy refers to a minimal ROOT 522

installation which stressHepix is a subset of

Size [MB]

Number of

Files

Median of

File Sizes [KB]

10th

90th

Percentile

of File Sizes

AFS Cache 59 na na na

CVMFS Cache 69 628 40 320B 235 KB

Local Copy 309 4330 25 400B 348 KB

By the following benchmarks we intend to measure he overhead of CVMFS under typical workload

conditions Note that we do not measure applicationrsquos IO performance in fact all IO operations are

entirely performed on local storage Instead we measure the extra time penalty (t=tCVMFSAFS ndash tLOCAL)

resulting from application binaries search paths and include paths residing on a network file system

Figure 4 Benchmak setup

To put the numbers into context we compare CVMFS to the well-known AFS We use the same setup

as the CernVM production environment (Figure 4) The file server runs Apache 228 and Squid 26

for CVMFS and OpenAFS 148 for AFS having the repository stored on a read-only volume The

benchmarks are performed on a dedicated CernVM 101 Xen virtual machine connected to the server

via local Gbit Ethernet The performance loss due to virtualization[21] is consistently in the region of

5 In order to see the impact of wide area networks we shape the outgoing network traffic using tc

tool thereby we artificially include latency and bandwidth restrictions Since we observed instabilities

when shaping network traffic on Xen we shape on the server side only This seems to be valid for

simulation because the client only sends few small request packets

When all necessary files are cached (full cache) there is no performance loss whatsoever In the case

of CVMFS this is due to the simple 2-level lookup that first checks the local cache AFS has its cache

manager on the client side to register a call-back function on the AFS server which is invoked on

volume changes Thus there is no network traffic caused by the file system except for keep-alive

packets every few minutes

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

5

Figure 5 Time penalty as a function of latency with AFS and CVMFS

compared to local storage Running time of the benchmark on local storage

is about 3 minutes With standard window sizes of 16KiB TCP throughput

is directly affected by latency A round trip time of 100ms correspond to

observed parameters between client running on Amazon EC2 (US East

Coast) accessing file servers located at CERN

Figure 6 Impact of low bandwidth in wide area networks in addition to

high latency (100ms)

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

6

In the case of an empty cache the results shown in Figure 5 and Figure 6 begin to show effects of

network latency and limited network bandwidth Without particularly tuned TCP parameters we see

the strong TCP throughput decrease caused by the high bandwidth-delay product The linear growth of

time difference can by explained by the fact that we request a sequence of many small files In general

we see CVMFS performing consistently better than AFS in wide area networks We believe that this is

caused by the better flow control of TCP compared to the RxUDP stack which is used by AFS This

can be further improved if CVMFS is modified to keep the TCP connection alive (see next section)

Reasonable performance losses due to low throughput are not seen until throughput rates are 15-18

Mbs which is the download speed of current ADSL+ connections

31 Improvements in CVMFS

While the current version of CVMFS already shows highly competitive performance there is still

some room for improvement We have implemented a couple of optimisations into an experimental

version of CVMFS thus tackling two bottlenecks

311 Reducing HTTP overheads

Although the HTTP protocol overhead is small in terms of data volume in high latency networks we

suffer from the pure number of requests Each request-response cycle adds round trip time to the

measured time penalty In the current plain HTTP10 implementation this results in at least 3x(round

trip time) extra time per file for TCP handshake HTTP GET and TCP connection finalisation By

keeping the TCP connection open this ideally drops to just the round trip time overhead for HTTP

GET Exploiting the HTTP keep-alive feature does not affect scalability The servers and proxies may

safely close idle connections at any time in particular if they run out of resources

With pipelined HTTP11 or parallel requests we further amortise the protocol overhead over multiple

files In CVMFS we accomplish fetching a bunch of files by a pre-fetch thread A file foo may have a

list of pre-fetch hints ie files that are likely to be opened soon after opening foo These hints are

stored in the file system as metadata introducing only minimal overhead For binaries we have a high

degree of predictability typically 10 to 50 shared libraries that will be opened together with a binary

This enables us to produce pre-fetch hints just by following the link dependencies

312 Reducing transfer volume

By decoupling the local data cache from the catalogue we avoid downloading multiple copies of the

same file We modified the local cache to name downloaded files according to a secure hash of their

content (SHA1-Cache) This way redundancy due to duplicate files is automatically detected and

exploited resulting in less space consumed by the local cache and less network traffic In order to

ensure integrity of files we always calculate a secure hash during download Thus the SHA1-Cache

does not incur additional overhead

In addition to redundancy files we are distributing have a high compression ratio When using

compression on the server we have to take the computing time for decompression on CernVM clients

into account In high performance networks it might be faster to fetch more data and just copy it

compared to fetching less data that has to be further processed We have implemented a simple yet

effective compression scheme every file is separately compressed and stored in place of the

uncompressed original in the server repository and CVMFS decompresses files on the fly during

download This is entirely transparent for the content distribution network we do not have to change

anything on HTTP servers and proxies We chose the Deflate algorithm which shows an average

compression ratio of 31 for a typical repository Our benchmarks strongly support the decision to

enable compression by default Even in Gbit Ethernet we still see no additional extra time whatsoever

while getting a significant benefit in WAN networks Overall in slow networks the improved CVMFS

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

7

exhibits between five and eight times smaller time penalties than AFS while still being as fast as AFS

in Gbit Ethernet

Figure 7 Impact of CVMFS optimisations

Figure 7 shows the benefits of the particular optimisations added incrementally one after another

Compared to the current version of CVMFS the size of the repository stored on the server as well as

the mere network transfer volume is reduced at least by a factor of three

Table 2 Comparison of data volumes of CVMFS in its current version and CVMFS with our

experimental improvements Numbers are taken based on our ROOT benchmark

Repository size [MB] Repository size [MB]

CVMFS (current ) 309 70

CVMFS (experimental) 114 19

4 Supporting infrastructure

In order to provide scalable and reliable central services in support of CVMFS and appliance

generation using the rBuilder tool we have built an infrastructure consisting of four building blocks

scalable proxy frontend computing backend storage and networking services

1 Front end servers (reverse proxies) operating in DNS load balancing configuration and

forwarding the request based on URL content to the backend servers which run as virtual

machines on top of the physical servers

2 Computing Services are used to run all CVMFS services deployed as virtual machines

hosted on a pool of VMware ESX servers in such a way that CPU time and system memory

are aggregated into a common space and can be allocated for single or multiple virtual

machines This approach enables better resource utilization and simplifies the management

for High Availability (HA) and scalability

3 Another pair of servers running in a highly available configuration provides a shared file

system for the rest of the infrastructure and allows for live migration of virtual machines

between physical nodes It supports features such as advance storage management (dynamic

allocation dynamic growth snapshots and continuous data replication) and the capability to

implement consistent backup policies to some tertiary MSS

4 The supporting network infrastructure makes use of two separate network domains and

three different IP segments Public service provisioning is decoupled from internal service

requirements by using physically different channels with different QoS with extensive use

of IEEE 8023ad and 9K Jumbo frames to improve data IO and VM live migration

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

8

Figure 5 Supporting infrastructure for CernVM central services

5 Conclusions

In this RampD activity we have explored how virtualization technology can be used to provide a portable

user environment and interface to Grid We developed CernVM a Virtual Software Appliance

fulfilling the basic needs of all four LHC experiments This ldquothinrdquo appliance contains only a basic OS

required to bootstrap the LHC application software delivered to the appliance using CVMFS a file

system specially crafted for software delivery Using this mechanism CernVM once configured does

not need to be updated with each experiment software release since updates are automatically

propagated This results in an easy deployment scenario as only one image is required to support all

experiments The same is true if such virtual machines would be deployed to serve as job hosting

environments on voluntary resources (like BOINC[22]) or managed (like computing clusters grids

and clouds) By leveraging the use of standard protocols (HTTP) for all communication we can

effectively use a proxycache server infrastructure where it exists or create self-organizing proxy

networks in LAN environments to minimize the impact on a central software distribution point

Further by using existing content delivery network infrastructure public or commercial we can

remove a single point of failure and create a reliable and efficient file system for software delivery to

the thousands of CernVM clients with performance close to or better than traditional network file

systems such as AFS or NFS and the additional bonus that the system can work in disconnected

mode

6 References

[1] LHC Computing Grid httplcgwebcernchLCGpublicdefaulthtm

[2] CernVM Project httpcernvmcernch

[3] rPath httpwwwrpathorg

[4] VMware httpwwwvmwarecom

[5] Kernel Virtual Machine httpwwwlinux-kvmorg

[6] Xen httpwwwxenorg

[7] Microsoft HyperV httpwwwmicrosoftcomhyper-v-server

[8] VirtuaBox httpwwwvirtaulboxorg

[9] Conary Package Manager httpwikirpathcomwikiConary

[10] Scientific Linux httpwwwscientificlinuxorg

[11] Amazon Elastic Computing Cloud (EC2) httpawsamazoncomec2

[12] Nimbus Science Cloud httpworkspaceglobusorgclouds

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

9

[13] File System in Userspace httpfusesourceforgenet

[14] Moretti C Sfiligoi I Thain D Transparently Distributing CDF Software with Parrot Feb 13-17

2006 Presented at CHEP06 Mumbay India

[15] Cooperative Computing Tools httpwwwcctoolsorg

[16] ROOT Project httprootcernch

[17] Amazon CloudFront Service httpawsamazoncomcloudfront

[18] Amazon Simple Storage Service httpawsamazoncoms3

[19] The Coral Content Distribution Network httpwwwcoralcdnorg

[20] Avahi httpavahiorg

[21] P Barham et al Xen and the art of virtualization ACM SIGOPS Operating Systems Review

Vol 37 Issue 5 (2003)

[22] BOINC an open-source software platform for computing using volunteered resources

httpboincberkeleyedu

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

10

functionality Conary is capable of analysing the build products and automatically classifying them

into sub-packages detecting all package build and runtime dependencies The resulting appliance

image contains the application together with just enough of the Operating System required to run it

This approach has a clear advantage since the fewer components we put in the image the less

maintenance is necessary

To facilitate the appliance configuration and management tasks the appliance runs rPath Appliance

Agent software (rAA) that provides two user interfaces Firstly a Web based one for easy interaction

with the end-user and secondly an XML-RPC one to allow interaction with an instance of CernVM

using scripting language to carry out bulk or repetitive operations

Figure 1 Basic components of CernVM Virtual Software Appliance

Since the underlying rPath Linux is binary compatible with Scientific Linux[10] we can reuse the

software packages already built and tested in that environment and simply run them unmodified under

CernVM The application running in the virtual machine will trigger the download of necessary

binaries and libraries by referencing them for the first time In response the file system will download

and cache new application files and reuse already downloaded ones The net result is that the amount

of software that has to be downloaded in order to run typical experiment tasks in the virtual machine is

reduced by an order of magnitude compared to the case when a full software release is installed

locally In the case of LHC experiments this means reduction from 8-10 GB to typically less than

1GB

Thanks to support for various hypervisors binary compatibility with SLC4 capability to run

unmodified software of all LHC experiments small OS footprint (150 MB) use of standard HTTP

protocol for its file system with aggressive caching that minimizes network traffic and multiple

configuration interfaces CernVM can easily be deployed and configured on either end user

workstation or various computing fabrics that support virtualization like Amazon EC2[11] or Nimbus

Scientific Cloud[12]

2 CernVM File System

The file system optimized for software distribution (CMVFS) plays a key role in the ldquothin appliancerdquo

model used by CernVM It is implemented as a File System in User Space (FUSE) module[13]

CVMFS has been designed to make a directory tree stored on a Web server look like a local read-only

file system on the virtual machine On the client side this requires only outgoing HTTPHTTPS

connectivity to the Web server Assuming that the files are read only CVMFS performs aggressive

file level caching Both files and file metadata are cached on local disk as well as on shared proxy

servers allowing the file system to scale to a very large number of clients

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

2

Figure 2 Software releases are first uploaded by the

experimentrsquos release manager to a dedicated virtual machine

tested and then published on a Web server acting as central

software distribution point

The procedures for building, installing and validating software releases remain in the hands and under the responsibility of each user community. To each community we provide an instance of CernVM that runs CVMFS in read/write mode, allowing the experiment software manager to install, test and publish configured experiment software releases to our central distribution point (Figure 2). At present all four LHC experiments are supported and already distribute about 100 GB of their software using this method. Adding another user community is a straightforward, simple and scalable procedure, since all users may, but do not necessarily have to, share the same central software distribution point (HTTP server).

The first implementation of CVMFS was based on the GROW-FS [14] file system, which was originally provided as one of the private file system options available in Parrot [15]. Parrot traps the system I/O calls, which results in a performance penalty and occasional compatibility problems with some applications. Since in the virtual machine we have full access to the kernel and can deploy our own kernel modules, we have constructed CVMFS from modified Parrot and GROW-FS code and adapted it to run as a FUSE module, adding in the process some new features to improve scalability and performance and to simplify deployment. The principal new features of CVMFS compared to GROW-FS are:

- use of the FUSE kernel module, which allows for in-kernel caching of file attributes;
- the capability to work in offline mode (provided that all required files are cached);
- the possibility to use a hierarchy of file catalogues on the server side;
- transparent file compression/decompression;
- dynamic expansion of environment variables embedded in symbolic links (see the sketch below).
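
As an illustration of the last feature, the sketch below expands environment-variable references embedded in a symbolic-link target before the link is followed. The $(VAR) notation and the example values are assumptions made for the sake of the example and are not necessarily the syntax used by CVMFS.

    # Sketch: expand $(VAR) references in a symlink target before following it.
    import os
    import re

    def expand_link_target(target: str) -> str:
        """Replace every $(NAME) in a symlink target with the value of $NAME."""
        return re.sub(r"\$\((\w+)\)",
                      lambda m: os.environ.get(m.group(1), m.group(0)),
                      target)

    # Example: a link pointing to "$(ROOTSYS)/lib" resolves per client at lookup time.
    os.environ["ROOTSYS"] = "/opt/root"
    print(expand_link_target("$(ROOTSYS)/lib"))   # -> /opt/root/lib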

In the current deployment model we envisage a hierarchy of cache servers deployed between a client and the central server (Figure 3a), allowing the delivery network to be organized as a fine-grained distribution tree. Proxy servers may serve CernVM clients as well as other proxy servers. In this model each proxy server stores the union of the data required by its subtree. By adding and removing proxy servers, the tree can be adapted to the required load.


At present such a deployment model is fully supported, and CernVM will always use the nearest proxy server as pointed to by the configuration service at start-up time. In spite of its reasonable flexibility, this approach is still static in nature, requires additional manual effort to set up and maintain the proxy hierarchy, and clearly has a single point of failure.

Figure 3. In the current software version, CernVM clients fetch the data files from the central repository via the hierarchy of proxy servers (a), as specified by the configuration file obtained from the configuration server at boot time. We plan to remove the single point of failure using a content distribution network on the WAN and self-organizing P2P-like cache sharing on the LAN (b).

To remove this single point of failure we can exploit an existing commercial infrastructure such as Amazon's CloudFront [17] service. If we mirror our repository to the Amazon Simple Storage Service (S3) [18], the CloudFront service will ensure that users access the content from some nearby mirror. This is done by assigning a canonical DNS name to an S3 resource, by which the client transparently connects to the closest mirror. Mirroring is done on demand, and expired content is deleted from mirror servers automatically. An alternative to CloudFront that works on similar principles is the Coral Content Distribution Network [19].

If we consider the scenario where a single site runs many CernVM instances, we may still run into a local bandwidth bottleneck if every CernVM client independently fetches data from the content delivery network. To tackle that problem we are adding to CernVM the capability to discover other CernVM instances on a local network using the Avahi [20] system for service discovery. Since every CernVM instance keeps a local cache of the files it has already downloaded, it can also share this cache with its neighbours (acting as an ad-hoc proxy). This way we can achieve a self-organizing network at the LAN level, in a manner similar to peer-to-peer networks, while retaining most of the existing CernVM components and accessing a top-level repository residing on a content distribution network such as CloudFront or Coral, possibly via a chain of additional proxy caches.
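
A minimal sketch of such LAN-level discovery, using the third-party python-zeroconf package as a stand-in for Avahi (both speak mDNS/DNS-SD), is shown below. The service type "_cvmfscache._tcp.local." is an invented name used only for illustration.

    # Sketch: discover neighbouring CernVM caches on the LAN via mDNS/DNS-SD.
    # python-zeroconf is used as a stand-in for Avahi; the service type is made up.
    import time
    from zeroconf import Zeroconf, ServiceBrowser

    class CacheListener:
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                print(f"found neighbour cache: {info.server}:{info.port}")

        def remove_service(self, zc, type_, name):
            print(f"neighbour cache gone: {name}")

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, "_cvmfscache._tcp.local.", CacheListener())
    time.sleep(10)      # browse for a while, then shut down
    zc.close()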

This combined model (Figure 3b) has all the ingredients necessary to assure scalability, supporting thousands of CernVM clients with the reliability expected from a file system and performance matching that of traditional network file systems (such as NFS or AFS) on a local network.

3. CVMFS Benchmarks

To exercise CernVM and to compare the performance of its CVMFS file system on local and wide area networks, we use the ROOT [16] application benchmark stressHepix in version 5.22. The stressHepix benchmark is a collection of several ROOT benchmarks that cover typical HEP tasks. Our benchmarks also include compiling the stressHepix test suite with GCC 3.4. While running the benchmark, the system has to find and fetch binaries, libraries and header files from the network file system, but once loaded the files remain in a local cache. Since the local cache is an ordinary directory, it fully benefits from the Linux file system caches. Table 1 lists the properties of the stressHepix working set. The working set turns out to be a good approximation of the ROOT package in that it is dominated by many small files. However, by working on blocks rather than on entire files, AFS is still able to save about 15% storage in the cache.

Table 1. Properties of the stressHepix working set in the client's cache. Note that AFS works on 1 KB blocks rather than on entire files. "Local Copy" refers to a minimal ROOT 5.22 installation, of which stressHepix is a subset.

                  Size [MB]   Number of Files   Median File Size [KB]   10th / 90th Percentile of File Sizes
    AFS Cache        59            n/a                  n/a                        n/a
    CVMFS Cache      69            628                  4.0                 320 B / 23.5 KB
    Local Copy       309           4330                 2.5                 400 B / 34.8 KB

With the following benchmarks we intend to measure the overhead of CVMFS under typical workload conditions. Note that we do not measure the application's I/O performance; in fact, all I/O operations are performed entirely on local storage. Instead, we measure the extra time penalty (Δt = t_CVMFS/AFS - t_LOCAL) resulting from application binaries, search paths and include paths residing on a network file system.

Figure 4. Benchmark setup.

To put the numbers into context, we compare CVMFS to the well-known AFS. We use the same setup as the CernVM production environment (Figure 4). The file server runs Apache 2.2.8 and Squid 2.6 for CVMFS, and OpenAFS 1.4.8 for AFS, with the repository stored on a read-only volume. The benchmarks are performed on a dedicated CernVM 1.0.1 Xen virtual machine connected to the server via local Gbit Ethernet. The performance loss due to virtualization [21] is consistently in the region of 5%. In order to see the impact of wide area networks, we shape the outgoing network traffic using the tc tool, thereby artificially introducing latency and bandwidth restrictions. Since we observed instabilities when shaping network traffic on Xen, we shape on the server side only. This is valid for the simulation because the client only sends a few small request packets.
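
For reference, latency and bandwidth restrictions of this kind can be imposed with tc by chaining a netem and a tbf queueing discipline, roughly as in the following sketch. The device name and the numerical values are placeholders, and this is not necessarily the exact configuration used for our measurements.

    # Sketch: add WAN-like latency and a bandwidth cap on the server's outgoing traffic.
    # Device name and values are placeholders; requires root privileges.
    import subprocess

    DEV = "eth0"

    def sh(cmd: str) -> None:
        subprocess.run(cmd.split(), check=True)

    # 100 ms delay on all outgoing packets (netem).
    sh(f"tc qdisc add dev {DEV} root handle 1:0 netem delay 100ms")
    # Cap the rate to ~2 Mbit/s by chaining a token bucket filter below netem.
    sh(f"tc qdisc add dev {DEV} parent 1:1 handle 10: tbf rate 2mbit burst 32kbit latency 400ms")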

When all necessary files are cached (full cache), there is no performance loss whatsoever. In the case of CVMFS this is due to the simple two-level lookup that first checks the local cache. AFS uses its cache manager on the client side to register a call-back function on the AFS server, which is invoked on volume changes. Thus there is no network traffic caused by the file system, except for keep-alive packets every few minutes.


Figure 5. Time penalty as a function of latency with AFS and CVMFS compared to local storage. The running time of the benchmark on local storage is about 3 minutes. With standard window sizes of 16 KiB, TCP throughput is directly affected by latency. A round trip time of 100 ms corresponds to the parameters observed between a client running on Amazon EC2 (US East Coast) and file servers located at CERN.

Figure 6. Impact of low bandwidth in wide area networks, in addition to high latency (100 ms).


In the case of an empty cache, the results shown in Figure 5 and Figure 6 begin to show the effects of network latency and limited network bandwidth. Without particularly tuned TCP parameters, we see a strong decrease of TCP throughput caused by the high bandwidth-delay product. The linear growth of the time difference can be explained by the fact that we request a sequence of many small files. In general, we see CVMFS performing consistently better than AFS in wide area networks. We believe that this is caused by the better flow control of TCP compared to the Rx/UDP stack used by AFS. This can be further improved if CVMFS is modified to keep the TCP connection alive (see next section). Noticeable performance losses due to low throughput are not seen until throughput rates fall to 15-18 Mbit/s, which is the download speed of current ADSL+ connections.

3.1. Improvements in CVMFS

While the current version of CVMFS already shows highly competitive performance, there is still some room for improvement. We have implemented a couple of optimisations in an experimental version of CVMFS, tackling two bottlenecks.

3.1.1. Reducing HTTP overheads

Although the HTTP protocol overhead is small in terms of data volume, in high-latency networks we suffer from the sheer number of requests. Each request-response cycle adds a round trip time to the measured time penalty. In the current plain HTTP/1.0 implementation this results in at least 3 x (round trip time) extra time per file, for the TCP handshake, the HTTP GET and the TCP connection finalisation. By keeping the TCP connection open, this ideally drops to just the round-trip overhead of the HTTP GET. Exploiting the HTTP keep-alive feature does not affect scalability: the servers and proxies may safely close idle connections at any time, in particular if they run out of resources.

With pipelined HTTP/1.1 or parallel requests we further amortise the protocol overhead over multiple files. In CVMFS we accomplish fetching a bunch of files by a pre-fetch thread. A file foo may have a list of pre-fetch hints, i.e. files that are likely to be opened soon after opening foo. These hints are stored in the file system as metadata, introducing only minimal overhead. For binaries we have a high degree of predictability: typically 10 to 50 shared libraries will be opened together with a binary. This enables us to produce pre-fetch hints just by following the link dependencies.
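
A rough sketch of this pre-fetching idea is given below: the link dependencies of a binary are turned into pre-fetch hints and requested over a single persistent HTTP connection. The use of ldd and of the third-party requests library, as well as the repository URL, are illustrative choices and not the CVMFS implementation.

    # Sketch: pre-fetch a binary's shared-library dependencies over one keep-alive connection.
    import subprocess
    import requests

    REPO_URL = "http://cvmfs-server.example.org/repo"   # placeholder

    def prefetch_hints(binary_path: str):
        """Derive pre-fetch hints by following the binary's link dependencies (via ldd)."""
        out = subprocess.run(["ldd", binary_path], capture_output=True, text=True).stdout
        return [line.split("=>")[1].split()[0]
                for line in out.splitlines() if "=>" in line and "/" in line]

    def prefetch(paths):
        # A Session reuses the underlying TCP connection (HTTP keep-alive),
        # so only the first request pays the full handshake cost.
        with requests.Session() as session:
            for p in paths:
                session.get(f"{REPO_URL}{p}")            # warm the local and proxy caches

    prefetch(prefetch_hints("/usr/bin/root.exe"))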

3.1.2. Reducing transfer volume

By decoupling the local data cache from the catalogue, we avoid downloading multiple copies of the same file. We modified the local cache to name downloaded files according to a secure hash of their content (SHA1 cache). This way, redundancy due to duplicate files is automatically detected and exploited, resulting in less space consumed by the local cache and less network traffic. In order to ensure the integrity of files we always calculate a secure hash during download; thus the SHA1 cache does not incur additional overhead.
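
The content-addressed cache can be sketched as follows. The hash-while-downloading logic mirrors the description above; the cache directory and the URL handling are placeholders.

    # Sketch: content-addressed ("SHA1") cache -- files are stored under the hash of
    # their content, so duplicate files are fetched and stored only once.
    import hashlib
    import os
    import urllib.request

    CACHE_DIR = "/var/cache/cvmfs-sketch"                # placeholder

    def fetch_by_hash(url: str, expected_sha1: str) -> str:
        """Download url unless its content hash is already cached; verify on the fly."""
        cached = os.path.join(CACHE_DIR, expected_sha1)
        if os.path.exists(cached):                       # duplicate content: no download
            return cached
        sha1 = hashlib.sha1()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with urllib.request.urlopen(url) as resp, open(cached + ".tmp", "wb") as out:
            for chunk in iter(lambda: resp.read(64 * 1024), b""):
                sha1.update(chunk)                       # integrity check comes for free
                out.write(chunk)
        if sha1.hexdigest() != expected_sha1:
            os.remove(cached + ".tmp")
            raise IOError(f"hash mismatch for {url}")
        os.rename(cached + ".tmp", cached)
        return cached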

In addition to being redundant, the files we are distributing have a high compression ratio. When using compression on the server, we have to take into account the computing time for decompression on CernVM clients: in high-performance networks it might be faster to fetch more data and just copy it than to fetch less data that has to be further processed. We have implemented a simple yet effective compression scheme: every file is separately compressed and stored in place of the uncompressed original in the server repository, and CVMFS decompresses files on the fly during download. This is entirely transparent to the content distribution network; we do not have to change anything on HTTP servers and proxies. We chose the Deflate algorithm, which shows an average compression ratio of 3:1 for a typical repository. Our benchmarks strongly support the decision to enable compression by default: even on Gbit Ethernet we see no additional extra time whatsoever, while getting a significant benefit in WAN networks. Overall, in slow networks the improved CVMFS exhibits between five and eight times smaller time penalties than AFS, while still being as fast as AFS on Gbit Ethernet.
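
Transparent decompression during download amounts to running a streaming decompressor over the data as it arrives, roughly as sketched below. Whether CVMFS stores raw Deflate streams or zlib-framed data is not specified here, so the zlib wrapper used in the sketch is an assumption.

    # Sketch: decompress a Deflate-compressed file on the fly while downloading it.
    import urllib.request
    import zlib

    def download_compressed(url: str, dest: str) -> None:
        """Stream url to dest, inflating each chunk as it arrives."""
        inflater = zlib.decompressobj()                  # zlib framing assumed here
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            for chunk in iter(lambda: resp.read(64 * 1024), b""):
                out.write(inflater.decompress(chunk))
            out.write(inflater.flush())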

Figure 7. Impact of CVMFS optimisations.

Figure 7 shows the benefits of the individual optimisations, added incrementally one after another. Compared to the current version of CVMFS, both the size of the repository stored on the server and the network transfer volume are reduced by at least a factor of three.

Table 2. Comparison of data volumes of CVMFS in its current version and of CVMFS with our experimental improvements. Numbers are based on our ROOT benchmark.

                            Repository size [MB]   Transfer volume [MB]
    CVMFS (current)                 309                     70
    CVMFS (experimental)            114                     19

4. Supporting infrastructure

In order to provide scalable and reliable central services in support of CVMFS and of appliance generation using the rBuilder tool, we have built an infrastructure consisting of four building blocks: a scalable proxy front-end, computing, backend storage, and networking services.

1. Front-end servers (reverse proxies) operate in a DNS load-balancing configuration and forward requests, based on URL content, to the backend servers, which run as virtual machines on top of the physical servers.

2. Computing services are used to run all CVMFS services, deployed as virtual machines hosted on a pool of VMware ESX servers in such a way that CPU time and system memory are aggregated into a common space and can be allocated to single or multiple virtual machines. This approach enables better resource utilization and simplifies the management for High Availability (HA) and scalability.

3. Another pair of servers running in a highly available configuration provides a shared file system for the rest of the infrastructure and allows for live migration of virtual machines between physical nodes. It supports features such as advanced storage management (dynamic allocation, dynamic growth, snapshots and continuous data replication) and the capability to implement consistent backup policies to some tertiary MSS.

4. The supporting network infrastructure makes use of two separate network domains and three different IP segments. Public service provisioning is decoupled from internal service requirements by using physically different channels with different QoS, with extensive use of IEEE 802.3ad and 9K jumbo frames to improve data I/O and VM live migration.


Figure 8. Supporting infrastructure for CernVM central services.

5. Conclusions

In this R&D activity we have explored how virtualization technology can be used to provide a portable user environment and interface to the Grid. We developed CernVM, a Virtual Software Appliance fulfilling the basic needs of all four LHC experiments. This "thin" appliance contains only a basic OS, required to bootstrap the LHC application software that is delivered to the appliance using CVMFS, a file system specially crafted for software delivery. Using this mechanism, CernVM, once configured, does not need to be updated with each experiment software release, since updates are automatically propagated. This results in an easy deployment scenario, as only one image is required to support all experiments. The same is true if such virtual machines are deployed to serve as job hosting environments on voluntary resources (like BOINC [22]) or managed ones (like computing clusters, grids and clouds). By leveraging standard protocols (HTTP) for all communication, we can effectively use a proxy/cache server infrastructure where it exists, or create self-organizing proxy networks in LAN environments, to minimize the impact on a central software distribution point. Further, by using existing content delivery network infrastructure, public or commercial, we can remove the single point of failure and create a reliable and efficient file system for software delivery to thousands of CernVM clients, with performance close to or better than traditional network file systems such as AFS or NFS, and with the additional bonus that the system can work in disconnected mode.

6. References

[1] LHC Computing Grid, http://lcg.web.cern.ch/LCG/public/default.htm
[2] CernVM Project, http://cernvm.cern.ch
[3] rPath, http://www.rpath.org
[4] VMware, http://www.vmware.com
[5] Kernel Virtual Machine, http://www.linux-kvm.org
[6] Xen, http://www.xen.org
[7] Microsoft Hyper-V, http://www.microsoft.com/hyper-v-server
[8] VirtualBox, http://www.virtualbox.org
[9] Conary Package Manager, http://wiki.rpath.com/wiki/Conary
[10] Scientific Linux, http://www.scientificlinux.org
[11] Amazon Elastic Compute Cloud (EC2), http://aws.amazon.com/ec2
[12] Nimbus Science Cloud, http://workspace.globus.org/clouds
[13] File System in Userspace, http://fuse.sourceforge.net
[14] Moretti C, Sfiligoi I and Thain D, Transparently Distributing CDF Software with Parrot, presented at CHEP06, 13-17 Feb 2006, Mumbai, India
[15] Cooperative Computing Tools, http://www.cctools.org
[16] ROOT Project, http://root.cern.ch
[17] Amazon CloudFront Service, http://aws.amazon.com/cloudfront
[18] Amazon Simple Storage Service (S3), http://aws.amazon.com/s3
[19] The Coral Content Distribution Network, http://www.coralcdn.org
[20] Avahi, http://avahi.org
[21] P Barham et al, Xen and the art of virtualization, ACM SIGOPS Operating Systems Review, Vol. 37, Issue 5 (2003)
[22] BOINC, an open-source software platform for computing using volunteered resources, http://boinc.berkeley.edu


Figure 2 Software releases are first uploaded by the

experimentrsquos release manager to a dedicated virtual machine

tested and then published on a Web server acting as central

software distribution point

The procedures for building installing and validating software release remain in the hands of and in

the responsibility of each user community To each community we provide an instance of CernVM

that runs CVMFS in readwrite mode allowing experiment software manager to install test and

publish configured experiment software releases to our central distribution point (Figure 2) At present

all four LHC experiments are supported and already distribute about 100 GB of their software using

this method Adding another user community is a straightforward simple and scalable procedure since

all users may but do not necessarily have to share the same central software distribution point (HTTP

server)

First implementation of CVMFS was based on GROW-FS[14] file system which was originally

provided as one of the private file system options available in Parrot[15] Parrot traps the system IO

calls resulting in a performance penalty and occasional compatibility problems with some

applications Since in the virtual machine we have full access to the kernel and can deploy our own

kernel modules we have constructed CVMFS from modified Parrot and GROW-FS code and adapted

them to run as a FUSE kernel module adding in the process some new features to improve scalability

and performance and to simplify deployment The principal new features in CVMFS compared to

GROW-FS are

- Using FUSE kernel module allows for in-kernel caching of file attributes

- Capability to work in offline mode (providing that all required files are cached)

- Possibility to use the hierarchy of file catalogues on the server side

- Transparent file compressiondecompression

- Dynamical expansion of environment variables embedded in symbolic links

In the current deployment model we envisage a hierarchy of cache servers deployed between a client

and the central server (Figure 3a) allowing the delivery network to be organized as a fine-grained

distribution tree Proxy servers may serve CernVM clients as well as other proxy servers In this model

each proxy server stores the union set of required data of its sub tree By adding and removing proxy

servers the tree can be adapted to respond to required load

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

3

At present such a deployment model is fully supported and CernVM will always use the nearest proxy

server as pointed to by the configuration service at start up time In spite of reasonable flexibility this

approach is still static in nature requires additional manual effort to setup and maintain the proxy

hierarchy and clearly has a single point of failure

a) b)

Figure 3 In the current software version CernVM clients fetch the data files from central

repository via the hierarchy of proxy servers (a) as specified by the configuration file obtained

from the configuration server at boot time We plan to remove a single point of failure using

content distribution network on WAN and self organizing P2P-like cache sharing on LAN (b)

To remove a single point of failure we can exploit an existing commercial infrastructure like Amazons

CloudFront[17] service If we mirror our repository to Amazon Simple Storage Service (S3)[18]

CloudFront Service will assure that users access the content using some nearby mirror This is done by

assigning a canonical DNS name to an S3 resource by which the client transparently connects to the

closest mirror Mirroring is done on demand and expired content will be deleted from mirror servers

automatically An alternative to CloudFront that works on similar principles is the Coral Content

Distribution Network[19]

If we consider the scenario where a single site runs many CernVM instances we may still run into a

local bandwidth bottleneck if every CernVM client is independently fetching data from content

delivery network To tackle that problem we are adding to CernVM the capability to discover other

CernVM instances on a local network using Avahi[20] system for service discovery Since every

CernVM instance keeps a local cache of files it has already downloaded it can also share this cache

with neighbours (acting as an ad-hoc proxy) This way we can achieve a self-organizing network on a

LAN level in a manner similar to peer-to-peer networks while retaining most of the existing CernVM

components accessing a top level repository residing on a content distributions network such as

CloudFront or Coral possibly via a chain of additional proxy caches

This combined model (Figure 3b) has all of the ingredients necessary to assure scalability supporting

thousands of CernVM clients with the reliability expected from a file system and performance

matching that of traditional network file systems (such as NFS or AFS) on a local network

3 CVMFS Benchmarks

To exercise CernVM and to compare the performance of its CVMFS file system on local and wide

area networks we use the ROOT[16] application benchmark stressHepix in version 522 The

stressHepix benchmark is a collection of several ROOT benchmarks that cover typical HEP tasks Our

benchmarks also include compiling stressHepix test suite with GCC 34 While running the

benchmark the system has to find and fetch binaries libraries and header files from the network file

system but once loaded the files remain in a local cache Since the local cache is an ordinary

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

4

directory it fully benefits from Linux file system caches Table 1 lists properties of the stressHepix

working set The working set turns out to be a good approximation of the ROOT package in that it is

dominated by many small files However by working on blocks rather than on entire files AFS is still

able to save about 15 storage in the cache

Table 1 Properties of the stressHepix working set in the clientrsquos cache Note that AFS works

on 1KB blocks rather than on entire files ldquoLocal Copy refers to a minimal ROOT 522

installation which stressHepix is a subset of

Size [MB]

Number of

Files

Median of

File Sizes [KB]

10th

90th

Percentile

of File Sizes

AFS Cache 59 na na na

CVMFS Cache 69 628 40 320B 235 KB

Local Copy 309 4330 25 400B 348 KB

By the following benchmarks we intend to measure he overhead of CVMFS under typical workload

conditions Note that we do not measure applicationrsquos IO performance in fact all IO operations are

entirely performed on local storage Instead we measure the extra time penalty (t=tCVMFSAFS ndash tLOCAL)

resulting from application binaries search paths and include paths residing on a network file system

Figure 4 Benchmak setup

To put the numbers into context we compare CVMFS to the well-known AFS We use the same setup

as the CernVM production environment (Figure 4) The file server runs Apache 228 and Squid 26

for CVMFS and OpenAFS 148 for AFS having the repository stored on a read-only volume The

benchmarks are performed on a dedicated CernVM 101 Xen virtual machine connected to the server

via local Gbit Ethernet The performance loss due to virtualization[21] is consistently in the region of

5 In order to see the impact of wide area networks we shape the outgoing network traffic using tc

tool thereby we artificially include latency and bandwidth restrictions Since we observed instabilities

when shaping network traffic on Xen we shape on the server side only This seems to be valid for

simulation because the client only sends few small request packets

When all necessary files are cached (full cache) there is no performance loss whatsoever In the case

of CVMFS this is due to the simple 2-level lookup that first checks the local cache AFS has its cache

manager on the client side to register a call-back function on the AFS server which is invoked on

volume changes Thus there is no network traffic caused by the file system except for keep-alive

packets every few minutes

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

5

Figure 5 Time penalty as a function of latency with AFS and CVMFS

compared to local storage Running time of the benchmark on local storage

is about 3 minutes With standard window sizes of 16KiB TCP throughput

is directly affected by latency A round trip time of 100ms correspond to

observed parameters between client running on Amazon EC2 (US East

Coast) accessing file servers located at CERN

Figure 6 Impact of low bandwidth in wide area networks in addition to

high latency (100ms)

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

6

In the case of an empty cache the results shown in Figure 5 and Figure 6 begin to show effects of

network latency and limited network bandwidth Without particularly tuned TCP parameters we see

the strong TCP throughput decrease caused by the high bandwidth-delay product The linear growth of

time difference can by explained by the fact that we request a sequence of many small files In general

we see CVMFS performing consistently better than AFS in wide area networks We believe that this is

caused by the better flow control of TCP compared to the RxUDP stack which is used by AFS This

can be further improved if CVMFS is modified to keep the TCP connection alive (see next section)

Reasonable performance losses due to low throughput are not seen until throughput rates are 15-18

Mbs which is the download speed of current ADSL+ connections

31 Improvements in CVMFS

While the current version of CVMFS already shows highly competitive performance there is still

some room for improvement We have implemented a couple of optimisations into an experimental

version of CVMFS thus tackling two bottlenecks

311 Reducing HTTP overheads

Although the HTTP protocol overhead is small in terms of data volume in high latency networks we

suffer from the pure number of requests Each request-response cycle adds round trip time to the

measured time penalty In the current plain HTTP10 implementation this results in at least 3x(round

trip time) extra time per file for TCP handshake HTTP GET and TCP connection finalisation By

keeping the TCP connection open this ideally drops to just the round trip time overhead for HTTP

GET Exploiting the HTTP keep-alive feature does not affect scalability The servers and proxies may

safely close idle connections at any time in particular if they run out of resources

With pipelined HTTP11 or parallel requests we further amortise the protocol overhead over multiple

files In CVMFS we accomplish fetching a bunch of files by a pre-fetch thread A file foo may have a

list of pre-fetch hints ie files that are likely to be opened soon after opening foo These hints are

stored in the file system as metadata introducing only minimal overhead For binaries we have a high

degree of predictability typically 10 to 50 shared libraries that will be opened together with a binary

This enables us to produce pre-fetch hints just by following the link dependencies

312 Reducing transfer volume

By decoupling the local data cache from the catalogue we avoid downloading multiple copies of the

same file We modified the local cache to name downloaded files according to a secure hash of their

content (SHA1-Cache) This way redundancy due to duplicate files is automatically detected and

exploited resulting in less space consumed by the local cache and less network traffic In order to

ensure integrity of files we always calculate a secure hash during download Thus the SHA1-Cache

does not incur additional overhead

In addition to redundancy files we are distributing have a high compression ratio When using

compression on the server we have to take the computing time for decompression on CernVM clients

into account In high performance networks it might be faster to fetch more data and just copy it

compared to fetching less data that has to be further processed We have implemented a simple yet

effective compression scheme every file is separately compressed and stored in place of the

uncompressed original in the server repository and CVMFS decompresses files on the fly during

download This is entirely transparent for the content distribution network we do not have to change

anything on HTTP servers and proxies We chose the Deflate algorithm which shows an average

compression ratio of 31 for a typical repository Our benchmarks strongly support the decision to

enable compression by default Even in Gbit Ethernet we still see no additional extra time whatsoever

while getting a significant benefit in WAN networks Overall in slow networks the improved CVMFS

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

7

exhibits between five and eight times smaller time penalties than AFS while still being as fast as AFS

in Gbit Ethernet

Figure 7 Impact of CVMFS optimisations

Figure 7 shows the benefits of the particular optimisations added incrementally one after another

Compared to the current version of CVMFS the size of the repository stored on the server as well as

the mere network transfer volume is reduced at least by a factor of three

Table 2 Comparison of data volumes of CVMFS in its current version and CVMFS with our

experimental improvements Numbers are taken based on our ROOT benchmark

Repository size [MB] Repository size [MB]

CVMFS (current ) 309 70

CVMFS (experimental) 114 19

4 Supporting infrastructure

In order to provide scalable and reliable central services in support of CVMFS and appliance

generation using the rBuilder tool we have built an infrastructure consisting of four building blocks

scalable proxy frontend computing backend storage and networking services

1 Front end servers (reverse proxies) operating in DNS load balancing configuration and

forwarding the request based on URL content to the backend servers which run as virtual

machines on top of the physical servers

2 Computing Services are used to run all CVMFS services deployed as virtual machines

hosted on a pool of VMware ESX servers in such a way that CPU time and system memory

are aggregated into a common space and can be allocated for single or multiple virtual

machines This approach enables better resource utilization and simplifies the management

for High Availability (HA) and scalability

3 Another pair of servers running in a highly available configuration provides a shared file

system for the rest of the infrastructure and allows for live migration of virtual machines

between physical nodes It supports features such as advance storage management (dynamic

allocation dynamic growth snapshots and continuous data replication) and the capability to

implement consistent backup policies to some tertiary MSS

4 The supporting network infrastructure makes use of two separate network domains and

three different IP segments Public service provisioning is decoupled from internal service

requirements by using physically different channels with different QoS with extensive use

of IEEE 8023ad and 9K Jumbo frames to improve data IO and VM live migration

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

8

Figure 5 Supporting infrastructure for CernVM central services

5 Conclusions

In this RampD activity we have explored how virtualization technology can be used to provide a portable

user environment and interface to Grid We developed CernVM a Virtual Software Appliance

fulfilling the basic needs of all four LHC experiments This ldquothinrdquo appliance contains only a basic OS

required to bootstrap the LHC application software delivered to the appliance using CVMFS a file

system specially crafted for software delivery Using this mechanism CernVM once configured does

not need to be updated with each experiment software release since updates are automatically

propagated This results in an easy deployment scenario as only one image is required to support all

experiments The same is true if such virtual machines would be deployed to serve as job hosting

environments on voluntary resources (like BOINC[22]) or managed (like computing clusters grids

and clouds) By leveraging the use of standard protocols (HTTP) for all communication we can

effectively use a proxycache server infrastructure where it exists or create self-organizing proxy

networks in LAN environments to minimize the impact on a central software distribution point

Further by using existing content delivery network infrastructure public or commercial we can

remove a single point of failure and create a reliable and efficient file system for software delivery to

the thousands of CernVM clients with performance close to or better than traditional network file

systems such as AFS or NFS and the additional bonus that the system can work in disconnected

mode

6 References

[1] LHC Computing Grid httplcgwebcernchLCGpublicdefaulthtm

[2] CernVM Project httpcernvmcernch

[3] rPath httpwwwrpathorg

[4] VMware httpwwwvmwarecom

[5] Kernel Virtual Machine httpwwwlinux-kvmorg

[6] Xen httpwwwxenorg

[7] Microsoft HyperV httpwwwmicrosoftcomhyper-v-server

[8] VirtuaBox httpwwwvirtaulboxorg

[9] Conary Package Manager httpwikirpathcomwikiConary

[10] Scientific Linux httpwwwscientificlinuxorg

[11] Amazon Elastic Computing Cloud (EC2) httpawsamazoncomec2

[12] Nimbus Science Cloud httpworkspaceglobusorgclouds

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

9

[13] File System in Userspace httpfusesourceforgenet

[14] Moretti C Sfiligoi I Thain D Transparently Distributing CDF Software with Parrot Feb 13-17

2006 Presented at CHEP06 Mumbay India

[15] Cooperative Computing Tools httpwwwcctoolsorg

[16] ROOT Project httprootcernch

[17] Amazon CloudFront Service httpawsamazoncomcloudfront

[18] Amazon Simple Storage Service httpawsamazoncoms3

[19] The Coral Content Distribution Network httpwwwcoralcdnorg

[20] Avahi httpavahiorg

[21] P Barham et al Xen and the art of virtualization ACM SIGOPS Operating Systems Review

Vol 37 Issue 5 (2003)

[22] BOINC an open-source software platform for computing using volunteered resources

httpboincberkeleyedu

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

10

At present such a deployment model is fully supported and CernVM will always use the nearest proxy

server as pointed to by the configuration service at start up time In spite of reasonable flexibility this

approach is still static in nature requires additional manual effort to setup and maintain the proxy

hierarchy and clearly has a single point of failure

a) b)

Figure 3 In the current software version CernVM clients fetch the data files from central

repository via the hierarchy of proxy servers (a) as specified by the configuration file obtained

from the configuration server at boot time We plan to remove a single point of failure using

content distribution network on WAN and self organizing P2P-like cache sharing on LAN (b)

To remove a single point of failure we can exploit an existing commercial infrastructure like Amazons

CloudFront[17] service If we mirror our repository to Amazon Simple Storage Service (S3)[18]

CloudFront Service will assure that users access the content using some nearby mirror This is done by

assigning a canonical DNS name to an S3 resource by which the client transparently connects to the

closest mirror Mirroring is done on demand and expired content will be deleted from mirror servers

automatically An alternative to CloudFront that works on similar principles is the Coral Content

Distribution Network[19]

If we consider the scenario where a single site runs many CernVM instances we may still run into a

local bandwidth bottleneck if every CernVM client is independently fetching data from content

delivery network To tackle that problem we are adding to CernVM the capability to discover other

CernVM instances on a local network using Avahi[20] system for service discovery Since every

CernVM instance keeps a local cache of files it has already downloaded it can also share this cache

with neighbours (acting as an ad-hoc proxy) This way we can achieve a self-organizing network on a

LAN level in a manner similar to peer-to-peer networks while retaining most of the existing CernVM

components accessing a top level repository residing on a content distributions network such as

CloudFront or Coral possibly via a chain of additional proxy caches

This combined model (Figure 3b) has all of the ingredients necessary to assure scalability supporting

thousands of CernVM clients with the reliability expected from a file system and performance

matching that of traditional network file systems (such as NFS or AFS) on a local network

3 CVMFS Benchmarks

To exercise CernVM and to compare the performance of its CVMFS file system on local and wide

area networks we use the ROOT[16] application benchmark stressHepix in version 522 The

stressHepix benchmark is a collection of several ROOT benchmarks that cover typical HEP tasks Our

benchmarks also include compiling stressHepix test suite with GCC 34 While running the

benchmark the system has to find and fetch binaries libraries and header files from the network file

system but once loaded the files remain in a local cache Since the local cache is an ordinary

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

4

directory it fully benefits from Linux file system caches Table 1 lists properties of the stressHepix

working set The working set turns out to be a good approximation of the ROOT package in that it is

dominated by many small files However by working on blocks rather than on entire files AFS is still

able to save about 15 storage in the cache

Table 1 Properties of the stressHepix working set in the clientrsquos cache Note that AFS works

on 1KB blocks rather than on entire files ldquoLocal Copy refers to a minimal ROOT 522

installation which stressHepix is a subset of

Size [MB]

Number of

Files

Median of

File Sizes [KB]

10th

90th

Percentile

of File Sizes

AFS Cache 59 na na na

CVMFS Cache 69 628 40 320B 235 KB

Local Copy 309 4330 25 400B 348 KB

By the following benchmarks we intend to measure he overhead of CVMFS under typical workload

conditions Note that we do not measure applicationrsquos IO performance in fact all IO operations are

entirely performed on local storage Instead we measure the extra time penalty (t=tCVMFSAFS ndash tLOCAL)

resulting from application binaries search paths and include paths residing on a network file system

Figure 4 Benchmak setup

To put the numbers into context we compare CVMFS to the well-known AFS We use the same setup

as the CernVM production environment (Figure 4) The file server runs Apache 228 and Squid 26

for CVMFS and OpenAFS 148 for AFS having the repository stored on a read-only volume The

benchmarks are performed on a dedicated CernVM 101 Xen virtual machine connected to the server

via local Gbit Ethernet The performance loss due to virtualization[21] is consistently in the region of

5 In order to see the impact of wide area networks we shape the outgoing network traffic using tc

tool thereby we artificially include latency and bandwidth restrictions Since we observed instabilities

when shaping network traffic on Xen we shape on the server side only This seems to be valid for

simulation because the client only sends few small request packets

When all necessary files are cached (full cache) there is no performance loss whatsoever In the case

of CVMFS this is due to the simple 2-level lookup that first checks the local cache AFS has its cache

manager on the client side to register a call-back function on the AFS server which is invoked on

volume changes Thus there is no network traffic caused by the file system except for keep-alive

packets every few minutes

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

5

Figure 5 Time penalty as a function of latency with AFS and CVMFS

compared to local storage Running time of the benchmark on local storage

is about 3 minutes With standard window sizes of 16KiB TCP throughput

is directly affected by latency A round trip time of 100ms correspond to

observed parameters between client running on Amazon EC2 (US East

Coast) accessing file servers located at CERN

Figure 6 Impact of low bandwidth in wide area networks in addition to

high latency (100ms)

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

6

In the case of an empty cache the results shown in Figure 5 and Figure 6 begin to show effects of

network latency and limited network bandwidth Without particularly tuned TCP parameters we see

the strong TCP throughput decrease caused by the high bandwidth-delay product The linear growth of

time difference can by explained by the fact that we request a sequence of many small files In general

we see CVMFS performing consistently better than AFS in wide area networks We believe that this is

caused by the better flow control of TCP compared to the RxUDP stack which is used by AFS This

can be further improved if CVMFS is modified to keep the TCP connection alive (see next section)

Reasonable performance losses due to low throughput are not seen until throughput rates are 15-18

Mbs which is the download speed of current ADSL+ connections

31 Improvements in CVMFS

While the current version of CVMFS already shows highly competitive performance there is still

some room for improvement We have implemented a couple of optimisations into an experimental

version of CVMFS thus tackling two bottlenecks

311 Reducing HTTP overheads

Although the HTTP protocol overhead is small in terms of data volume in high latency networks we

suffer from the pure number of requests Each request-response cycle adds round trip time to the

measured time penalty In the current plain HTTP10 implementation this results in at least 3x(round

trip time) extra time per file for TCP handshake HTTP GET and TCP connection finalisation By

keeping the TCP connection open this ideally drops to just the round trip time overhead for HTTP

GET Exploiting the HTTP keep-alive feature does not affect scalability The servers and proxies may

safely close idle connections at any time in particular if they run out of resources

With pipelined HTTP11 or parallel requests we further amortise the protocol overhead over multiple

files In CVMFS we accomplish fetching a bunch of files by a pre-fetch thread A file foo may have a

list of pre-fetch hints ie files that are likely to be opened soon after opening foo These hints are

stored in the file system as metadata introducing only minimal overhead For binaries we have a high

degree of predictability typically 10 to 50 shared libraries that will be opened together with a binary

This enables us to produce pre-fetch hints just by following the link dependencies

312 Reducing transfer volume

By decoupling the local data cache from the catalogue we avoid downloading multiple copies of the

same file We modified the local cache to name downloaded files according to a secure hash of their

content (SHA1-Cache) This way redundancy due to duplicate files is automatically detected and

exploited resulting in less space consumed by the local cache and less network traffic In order to

ensure integrity of files we always calculate a secure hash during download Thus the SHA1-Cache

does not incur additional overhead

In addition to redundancy files we are distributing have a high compression ratio When using

compression on the server we have to take the computing time for decompression on CernVM clients

into account In high performance networks it might be faster to fetch more data and just copy it

compared to fetching less data that has to be further processed We have implemented a simple yet

effective compression scheme every file is separately compressed and stored in place of the

uncompressed original in the server repository and CVMFS decompresses files on the fly during

download This is entirely transparent for the content distribution network we do not have to change

anything on HTTP servers and proxies We chose the Deflate algorithm which shows an average

compression ratio of 31 for a typical repository Our benchmarks strongly support the decision to

enable compression by default Even in Gbit Ethernet we still see no additional extra time whatsoever

while getting a significant benefit in WAN networks Overall in slow networks the improved CVMFS

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

7

exhibits between five and eight times smaller time penalties than AFS while still being as fast as AFS

in Gbit Ethernet

Figure 7 Impact of CVMFS optimisations

Figure 7 shows the benefits of the particular optimisations added incrementally one after another

Compared to the current version of CVMFS the size of the repository stored on the server as well as

the mere network transfer volume is reduced at least by a factor of three

Table 2 Comparison of data volumes of CVMFS in its current version and CVMFS with our

experimental improvements Numbers are taken based on our ROOT benchmark

Repository size [MB] Repository size [MB]

CVMFS (current ) 309 70

CVMFS (experimental) 114 19

4 Supporting infrastructure

In order to provide scalable and reliable central services in support of CVMFS and appliance

generation using the rBuilder tool we have built an infrastructure consisting of four building blocks

scalable proxy frontend computing backend storage and networking services

1 Front end servers (reverse proxies) operating in DNS load balancing configuration and

forwarding the request based on URL content to the backend servers which run as virtual

machines on top of the physical servers

2 Computing Services are used to run all CVMFS services deployed as virtual machines

hosted on a pool of VMware ESX servers in such a way that CPU time and system memory

are aggregated into a common space and can be allocated for single or multiple virtual

machines This approach enables better resource utilization and simplifies the management

for High Availability (HA) and scalability

3 Another pair of servers running in a highly available configuration provides a shared file

system for the rest of the infrastructure and allows for live migration of virtual machines

between physical nodes It supports features such as advance storage management (dynamic

allocation dynamic growth snapshots and continuous data replication) and the capability to

implement consistent backup policies to some tertiary MSS

4 The supporting network infrastructure makes use of two separate network domains and

three different IP segments Public service provisioning is decoupled from internal service

requirements by using physically different channels with different QoS with extensive use

of IEEE 8023ad and 9K Jumbo frames to improve data IO and VM live migration

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

8

Figure 5 Supporting infrastructure for CernVM central services

5 Conclusions

In this RampD activity we have explored how virtualization technology can be used to provide a portable

user environment and interface to Grid We developed CernVM a Virtual Software Appliance

fulfilling the basic needs of all four LHC experiments This ldquothinrdquo appliance contains only a basic OS

required to bootstrap the LHC application software delivered to the appliance using CVMFS a file

system specially crafted for software delivery Using this mechanism CernVM once configured does

not need to be updated with each experiment software release since updates are automatically

propagated This results in an easy deployment scenario as only one image is required to support all

experiments The same is true if such virtual machines would be deployed to serve as job hosting

environments on voluntary resources (like BOINC[22]) or managed (like computing clusters grids

and clouds) By leveraging the use of standard protocols (HTTP) for all communication we can

effectively use a proxycache server infrastructure where it exists or create self-organizing proxy

networks in LAN environments to minimize the impact on a central software distribution point

Further by using existing content delivery network infrastructure public or commercial we can

remove a single point of failure and create a reliable and efficient file system for software delivery to

the thousands of CernVM clients with performance close to or better than traditional network file

systems such as AFS or NFS and the additional bonus that the system can work in disconnected

mode

6 References

[1] LHC Computing Grid httplcgwebcernchLCGpublicdefaulthtm

[2] CernVM Project httpcernvmcernch

[3] rPath httpwwwrpathorg

[4] VMware httpwwwvmwarecom

[5] Kernel Virtual Machine httpwwwlinux-kvmorg

[6] Xen httpwwwxenorg

[7] Microsoft HyperV httpwwwmicrosoftcomhyper-v-server

[8] VirtuaBox httpwwwvirtaulboxorg

[9] Conary Package Manager httpwikirpathcomwikiConary

[10] Scientific Linux httpwwwscientificlinuxorg

[11] Amazon Elastic Computing Cloud (EC2) httpawsamazoncomec2

[12] Nimbus Science Cloud httpworkspaceglobusorgclouds

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

9

[13] File System in Userspace httpfusesourceforgenet

[14] Moretti C Sfiligoi I Thain D Transparently Distributing CDF Software with Parrot Feb 13-17

2006 Presented at CHEP06 Mumbay India

[15] Cooperative Computing Tools httpwwwcctoolsorg

[16] ROOT Project httprootcernch

[17] Amazon CloudFront Service httpawsamazoncomcloudfront

[18] Amazon Simple Storage Service httpawsamazoncoms3

[19] The Coral Content Distribution Network httpwwwcoralcdnorg

[20] Avahi httpavahiorg

[21] P Barham et al Xen and the art of virtualization ACM SIGOPS Operating Systems Review

Vol 37 Issue 5 (2003)

[22] BOINC an open-source software platform for computing using volunteered resources

httpboincberkeleyedu

17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09) IOP PublishingJournal of Physics Conference Series 219 (2010) 042003 doi1010881742-65962194042003

10

directory it fully benefits from Linux file system caches Table 1 lists properties of the stressHepix

working set The working set turns out to be a good approximation of the ROOT package in that it is

dominated by many small files However by working on blocks rather than on entire files AFS is still

able to save about 15 storage in the cache

Table 1 Properties of the stressHepix working set in the clientrsquos cache Note that AFS works

on 1KB blocks rather than on entire files ldquoLocal Copy refers to a minimal ROOT 522

installation which stressHepix is a subset of

Size [MB]

Number of

Files

Median of

File Sizes [KB]

10th

90th

Percentile

of File Sizes

AFS Cache 59 na na na

CVMFS Cache 69 628 40 320B 235 KB

Local Copy 309 4330 25 400B 348 KB

By the following benchmarks we intend to measure he overhead of CVMFS under typical workload

conditions Note that we do not measure applicationrsquos IO performance in fact all IO operations are

entirely performed on local storage Instead we measure the extra time penalty (t=tCVMFSAFS ndash tLOCAL)

resulting from application binaries search paths and include paths residing on a network file system

Figure 4 Benchmak setup

To put the numbers into context we compare CVMFS to the well-known AFS We use the same setup

as the CernVM production environment (Figure 4) The file server runs Apache 228 and Squid 26

for CVMFS and OpenAFS 148 for AFS having the repository stored on a read-only volume The

benchmarks are performed on a dedicated CernVM 101 Xen virtual machine connected to the server

via local Gbit Ethernet The performance loss due to virtualization[21] is consistently in the region of

5 In order to see the impact of wide area networks we shape the outgoing network traffic using tc

tool thereby we artificially include latency and bandwidth restrictions Since we observed instabilities

when shaping network traffic on Xen we shape on the server side only This seems to be valid for

simulation because the client only sends few small request packets

When all necessary files are cached (full cache), there is no performance loss whatsoever. In the case of CVMFS this is due to the simple two-level lookup that first checks the local cache. AFS has a cache manager on the client side that registers a call-back function on the AFS server, which is invoked on volume changes. Thus there is no network traffic caused by the file system except for keep-alive packets every few minutes.


Figure 5. Time penalty as a function of latency with AFS and CVMFS compared to local storage. The running time of the benchmark on local storage is about 3 minutes. With standard window sizes of 16 KiB, TCP throughput is directly affected by latency. A round-trip time of 100 ms corresponds to the parameters observed between a client running on Amazon EC2 (US East Coast) and file servers located at CERN.

Figure 6. Impact of low bandwidth in wide area networks, in addition to high latency (100 ms).


In the case of an empty cache, the results shown in Figure 5 and Figure 6 begin to show the effects of network latency and limited network bandwidth. Without particularly tuned TCP parameters, we see the strong decrease of TCP throughput caused by the high bandwidth-delay product. The linear growth of the time difference can be explained by the fact that we request a sequence of many small files. In general, we see CVMFS performing consistently better than AFS in wide area networks. We believe that this is caused by the better flow control of TCP compared to the Rx/UDP stack used by AFS. This can be further improved if CVMFS is modified to keep the TCP connection alive (see next section). Noticeable performance losses due to low throughput are not seen until throughput rates drop to 15-18 Mb/s, which is the download speed of current ADSL+ connections.

3.1 Improvements in CVMFS
While the current version of CVMFS already shows highly competitive performance, there is still room for improvement. We have implemented a couple of optimisations in an experimental version of CVMFS, tackling two bottlenecks.

3.1.1 Reducing HTTP overheads
Although the HTTP protocol overhead is small in terms of data volume, in high-latency networks we suffer from the sheer number of requests. Each request-response cycle adds a round trip time to the measured time penalty. In the current plain HTTP/1.0 implementation, this results in at least 3 × (round trip time) of extra time per file, for the TCP handshake, the HTTP GET, and the TCP connection finalisation. By keeping the TCP connection open, this ideally drops to just the round trip time overhead of the HTTP GET. Exploiting the HTTP keep-alive feature does not affect scalability: servers and proxies may safely close idle connections at any time, in particular if they run out of resources.

With pipelined HTTP/1.1 or parallel requests, we further amortise the protocol overhead over multiple files. In CVMFS we fetch such batches of files with a pre-fetch thread. A file foo may have a list of pre-fetch hints, i.e. files that are likely to be opened soon after opening foo. These hints are stored in the file system as metadata, introducing only minimal overhead. For binaries we have a high degree of predictability: typically 10 to 50 shared libraries will be opened together with a binary. This enables us to produce pre-fetch hints simply by following the link dependencies.
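The following sketch illustrates the idea, but is not CVMFS source code: a requested file plus its pre-fetch hints are downloaded over one persistent HTTP/1.1 connection, the hinted files being fetched by a background thread; the host name, paths, and hint table are made-up placeholders.

import http.client
import queue
import threading

REPO_HOST = "cvmfs.example.org"     # hypothetical repository server
PREFETCH_HINTS = {                  # hypothetical hint metadata
    "/repo/bin/root.exe": ["/repo/lib/libCore.so", "/repo/lib/libRIO.so"],
}

def fetch_over_keepalive(host, paths):
    """Download several small files while reusing a single TCP connection."""
    conn = http.client.HTTPConnection(host, timeout=30)
    blobs = {}
    for path in paths:
        conn.request("GET", path)       # HTTP/1.1 keeps the connection open
        response = conn.getresponse()
        blobs[path] = response.read()   # drain fully before the next request
    conn.close()
    return blobs

def prefetch_worker(host, work_queue):
    """Background thread that warms the local cache with hinted files."""
    while True:
        hinted = work_queue.get()
        if hinted is None:              # sentinel: shut the worker down
            break
        fetch_over_keepalive(host, hinted)
        work_queue.task_done()

def open_file(path, work_queue, host=REPO_HOST, hints=PREFETCH_HINTS):
    """Fetch 'path' and enqueue its pre-fetch hints for the background thread."""
    if path in hints:
        work_queue.put(hints[path])
    return fetch_over_keepalive(host, [path])[path]

# Hypothetical usage:
# q = queue.Queue()
# threading.Thread(target=prefetch_worker, args=(REPO_HOST, q), daemon=True).start()
# data = open_file("/repo/bin/root.exe", q)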

3.1.2 Reducing transfer volume
By decoupling the local data cache from the catalogue, we avoid downloading multiple copies of the same file. We modified the local cache to name downloaded files according to a secure hash of their content (SHA1-Cache). This way, redundancy due to duplicate files is automatically detected and exploited, resulting in less space consumed by the local cache and less network traffic. Since we always calculate a secure hash during download in order to ensure file integrity, the SHA1-Cache does not incur any additional overhead.
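A content-addressed cache of this kind can be sketched as follows; this is an illustration under our assumptions rather than the CVMFS implementation, and the URL and cache directory are placeholders.

import hashlib
import os
import shutil
import tempfile
import urllib.request

CACHE_DIR = "/var/cache/example-sha1-cache"   # hypothetical cache location

def fetch_into_cache(url, expected_sha1=None, cache_dir=CACHE_DIR):
    """Download 'url', hashing on the fly, and store it under its SHA-1 name.

    Duplicate content maps to a single cached object; if the catalogue already
    provides the content hash, a cached object is reused without any download.
    """
    os.makedirs(cache_dir, exist_ok=True)

    if expected_sha1:
        cached = os.path.join(cache_dir, expected_sha1)
        if os.path.exists(cached):
            return cached

    sha1 = hashlib.sha1()
    with urllib.request.urlopen(url) as response, \
         tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
        for chunk in iter(lambda: response.read(64 * 1024), b""):
            sha1.update(chunk)                 # integrity check comes for free
            tmp.write(chunk)
        tmp_path = tmp.name

    digest = sha1.hexdigest()
    if expected_sha1 and digest != expected_sha1:
        os.unlink(tmp_path)
        raise IOError("checksum mismatch for %s" % url)

    final_path = os.path.join(cache_dir, digest)
    if os.path.exists(final_path):             # duplicate content: drop the copy
        os.unlink(tmp_path)
    else:
        shutil.move(tmp_path, final_path)
    return final_path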

In addition to being redundant, the files we distribute have a high compression ratio. When using compression on the server, we have to take the computing time for decompression on CernVM clients into account: in high-performance networks it might be faster to fetch more data and just copy it than to fetch less data that has to be further processed. We have implemented a simple yet effective compression scheme: every file is separately compressed and stored in place of the uncompressed original in the server repository, and CVMFS decompresses files on the fly during download. This is entirely transparent to the content distribution network; we do not have to change anything on HTTP servers and proxies. We chose the Deflate algorithm, which shows an average compression ratio of 3:1 for a typical repository. Our benchmarks strongly support the decision to enable compression by default. Even on Gbit Ethernet we see no additional extra time whatsoever, while getting a significant benefit in WAN networks. Overall, in slow networks the improved CVMFS


exhibits between five and eight times smaller time penalties than AFS, while still being as fast as AFS on Gbit Ethernet.
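The per-file scheme can be illustrated with the zlib module, which implements Deflate; this is a sketch under our assumptions rather than the actual CVMFS code, and the real repository format may differ in its details.

import zlib

def compress_for_repository(src_path, dst_path):
    """Store a per-file Deflate stream in place of the uncompressed original."""
    compressor = zlib.compressobj(level=zlib.Z_BEST_COMPRESSION)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            dst.write(compressor.compress(chunk))
        dst.write(compressor.flush())

def decompress_stream(chunks):
    """Decompress a downloaded file on the fly, chunk by chunk."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:                 # e.g. chunks read from the HTTP response
        yield decompressor.decompress(chunk)
    yield decompressor.flush()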

Figure 7. Impact of CVMFS optimisations.

Figure 7 shows the benefits of the individual optimisations, added incrementally one after another. Compared to the current version of CVMFS, both the size of the repository stored on the server and the network transfer volume are reduced by roughly a factor of three.

Table 2. Comparison of data volumes for CVMFS in its current version and for CVMFS with our experimental improvements. Numbers are based on our ROOT benchmark.

                       Repository Size [MB]   Network Transfer Volume [MB]
CVMFS (current)                309                        70
CVMFS (experimental)           114                        19

4 Supporting infrastructure
In order to provide scalable and reliable central services in support of CVMFS and of appliance generation with the rBuilder tool, we have built an infrastructure consisting of four building blocks: a scalable proxy front end, a computing back end, storage, and networking services.

1. Front-end servers (reverse proxies) operate in a DNS load-balancing configuration and forward requests, based on URL content, to the back-end servers, which run as virtual machines on top of the physical servers.

2. Computing services run all CVMFS services, deployed as virtual machines hosted on a pool of VMware ESX servers in such a way that CPU time and system memory are aggregated into a common pool and can be allocated to single or multiple virtual machines. This approach enables better resource utilization and simplifies management for High Availability (HA) and scalability.

3. Another pair of servers, running in a highly available configuration, provides a shared file system for the rest of the infrastructure and allows for live migration of virtual machines between physical nodes. It supports features such as advanced storage management (dynamic allocation, dynamic growth, snapshots, and continuous data replication) and the capability to implement consistent backup policies to a tertiary MSS.

4. The supporting network infrastructure makes use of two separate network domains and three different IP segments. Public service provisioning is decoupled from internal service requirements by using physically different channels with different QoS, with extensive use of IEEE 802.3ad link aggregation and 9K jumbo frames to improve data I/O and VM live migration.


Figure 8. Supporting infrastructure for CernVM central services.

5 Conclusions
In this R&D activity we have explored how virtualization technology can be used to provide a portable user environment and interface to the Grid. We developed CernVM, a Virtual Software Appliance fulfilling the basic needs of all four LHC experiments. This "thin" appliance contains only the basic OS required to bootstrap the LHC application software, which is delivered to the appliance using CVMFS, a file system specially crafted for software delivery. Using this mechanism, CernVM, once configured, does not need to be updated with each experiment software release, since updates are propagated automatically. This results in an easy deployment scenario, as only one image is required to support all experiments. The same is true if such virtual machines are deployed to serve as job-hosting environments on volunteer resources (like BOINC [22]) or managed ones (like computing clusters, grids, and clouds). By relying on standard protocols (HTTP) for all communication, we can effectively use a proxy/cache server infrastructure where it exists, or create self-organizing proxy networks in LAN environments, to minimize the impact on a central software distribution point. Further, by using existing content delivery network infrastructure, public or commercial, we can remove a single point of failure and create a reliable and efficient file system for software delivery to the thousands of CernVM clients, with performance close to or better than traditional network file systems such as AFS or NFS, and the additional bonus that the system can work in disconnected mode.

6 References

[1] LHC Computing Grid, http://lcg.web.cern.ch/LCG/public/default.htm
[2] CernVM Project, http://cernvm.cern.ch
[3] rPath, http://www.rpath.org
[4] VMware, http://www.vmware.com
[5] Kernel Virtual Machine, http://www.linux-kvm.org
[6] Xen, http://www.xen.org
[7] Microsoft Hyper-V, http://www.microsoft.com/hyper-v-server
[8] VirtualBox, http://www.virtualbox.org
[9] Conary Package Manager, http://wiki.rpath.com/wiki/Conary
[10] Scientific Linux, http://www.scientificlinux.org
[11] Amazon Elastic Compute Cloud (EC2), http://aws.amazon.com/ec2
[12] Nimbus Science Cloud, http://workspace.globus.org/clouds


[13] File System in Userspace, http://fuse.sourceforge.net
[14] Moretti C, Sfiligoi I and Thain D, Transparently Distributing CDF Software with Parrot, presented at CHEP06, Mumbai, India, Feb 13-17 2006
[15] Cooperative Computing Tools, http://www.cctools.org
[16] ROOT Project, http://root.cern.ch
[17] Amazon CloudFront Service, http://aws.amazon.com/cloudfront
[18] Amazon Simple Storage Service, http://aws.amazon.com/s3
[19] The Coral Content Distribution Network, http://www.coralcdn.org
[20] Avahi, http://avahi.org
[21] Barham P et al, Xen and the art of virtualization, ACM SIGOPS Operating Systems Review, Vol. 37, Issue 5 (2003)
[22] BOINC, an open-source software platform for computing using volunteered resources, http://boinc.berkeley.edu
