

Future Generation Computer Systems 29 (2013) 323–329


Automatic software deployment using user-level virtualization for cloud-computing

Youhui Zhang ∗, Yanhua Li, Weimin Zheng

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Article info

Article history: Received 1 November 2010; Received in revised form 26 March 2011; Accepted 5 August 2011; Available online 5 September 2011

Keywords: Cloud computing; User-level virtualization; Virtual machine; Deployment

Abstract

Cloud Computing offers a flexible and relatively cheap solution to deploy IT infrastructure in an elastic way. An emerging cloud service allows customers to order virtual machines to be delivered virtually in the cloud; in most cases, besides the virtual hardware and system software, it is necessary to deploy application software in a similar way to provide a fully-functional work environment. Most existing systems use virtual appliances to provide this function, which couple application software closely with virtual machine (VM) images.

This paper proposes a new method based on user-level virtualization technology to decouple application software from the VM and improve deployment flexibility. User-level virtualization isolates applications from the OS (and thus from the lower-level VM), so a user can choose which software to use after setting the virtual machine's configuration. Moreover, the chosen software is not pre-installed (or pre-stored) in the VM image; instead, it is streamed from the application depository on demand when the user launches it in a running VM, saving storage. During the whole process, no software installation is needed. Further, the enormous body of existing desktop software can be converted into such on-demand versions without any modification of source code.

We present the whole framework, including the application preparation, the runtime system design, the detailed deployment and usage workflow, and some optimizations. Finally, test results show that this solution is efficient in both performance and storage.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Infrastructure cloud service providers (e.g., [1,2]) deliver virtual hardware and software in their datacenters, based on demand from customers. Customers thus avoid capital expenditure by renting usage from the provider, consuming resources as a service.

Usually, besides virtual hardware and system software, it is necessary to deploy application software in a similar way, so that customers can conveniently get a fully-functional work environment with the required application software.

Most existing solutions allow cloud customers to order Virtual Appliances (VAs) [2–4] to be delivered virtually on the cloud. For example, VA marketplaces [2,5,6] provide many categories of appliances, each a pre-built software solution comprised of one or more packaged Virtual Machines. VA-based methods can remarkably reduce the time and expense associated with application deployment.

∗ Corresponding author. Tel.: +86 10 62783505x3. E-mail address: [email protected] (Y. Zhang).

0167-739X/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2011.08.012

However, because a VA couples the application software and VMs closely, it also has some drawbacks:

(1) Lack of flexibility. For example, a customer needs software A and B to work together in one virtual machine, while the provider only has two separate VAs containing A and B respectively. The provider then has to create a new VM template combining A and B. In theory, such combinations are countless.

(2) Inefficiency of storage. Each VA comprises at least one VM image, which means the OS has to be included in the image. The storage overhead is therefore larger, although some technologies (e.g., Just enough OS [7], de-duplication [8,9]) have been employed to reduce it.

The essential reason for these drawbacks is that the VA solution depends heavily on virtual machine technology, which only isolates system software from hardware. Application software therefore has to be packaged with the whole system for deployment.

To solve this problem, this paper introduces a double-isolation mechanism that uses user-level virtualization technology to further isolate application software from the OS, while the VM-level isolation is still kept. Application software can therefore be deployed at a fine granularity, increasing the flexibility and decreasing


the storage overhead. In this paper, we call such application software on-demand software.

Based on this design philosophy, we make the following contributions:

(1) The whole deployment framework based on the double-isolation mechanism.

The deployment of application software on user-level virtualization is the focus. It includes the on-demand software preparation, deployment, runtime system, customization and usage accounting.

(2) User-level virtualization of on-demand software.

Some essential technologies, like converting legacy software into the on-demand style and the runtime system of user-level virtualization, are implemented. In particular, our methods can support existing application software without any modification of source code.

(3) A central distribution system for on-demand software.

One or more central data servers are used to provide software on demand for the deployed virtual machines, rather than placing software within VMs in advance. Because of the commonality of frequently-used applications in the Cloud Computing environment, this technology can decrease storage consumption significantly.

Moreover, some access optimizations, including content-addressable storage and a local cache, are presented too.

(4) The system prototype.

In addition, tests show that this solution is efficient in performance and storage.

In the following sections, we first present the whole framework and the user-level virtualization technology for on-demand software. The central distribution system for Cloud Computing and the related optimizations are given in Section 3. The prototype is introduced in Section 4, as well as the performance tests. Section 5 gives related work; the conclusion and future work are presented last.

2. The framework

2.1. Software deployment overview

To deploy on-demand software in the Cloud Computing environment, it is necessary to provide a system with the following functions:

(1) Software preparation.

Most existing software needs to be installed before it can run normally. However, in our design, the on-demand software requested by a customer can be used instantly, without any installation process. Thus, we convert software into the on-demand mode in advance, and all on-demand software is stored in the software depository for users' selection.

The details are presented in Section 2.2.

(2) Software selection.

With most existing cloud service providers, a customer usually chooses one or more VAs before deployment, which means that the required software and its lower-level VM(s) are selected at once.

In contrast, we provide a more flexible selection procedure: a customer can choose the wanted OS, as well as any number of software packages, in separate stages. For example, Lisa orders a Windows VM as her remote work environment on the cloud; she can then select any on-demand software (as long as it can run on the Windows OS) that she will use in the VM. This means we can provide any combination of VM and software, rather than depending on the limited number of existing VM templates.

(3) On-demand deployment and usage accounting.

Fig. 1. On-demand software and virtual appliance.

After the preparation and selection, software is not stored in the VM image (as a Virtual Appliance does). Instead, one or more central data servers provide software on demand for the deployed virtual machines. Only when the customer actually uses the chosen software will it be streamed from the data server and run locally without installation. In other words, on-demand software is stored remotely and run locally; a local cache is also used to improve access performance.

Inherently, this deployment mode enables a fine-grained billing mechanism: the accurate running time of any on-demand software can be obtained and used as the accounting basis.

The technical details are presented in Section 2.3 on the runtime design.

(4) Software customization.

Another problem of the VA-based solution is how to save the user's customization. When Lisa finishes her work, she wants to terminate the rental agreement but preserve her customization of application software, like the default homepage, browser bookmarks/history, cookies and even toolbar positions; it is then possible for her to restore these favorites when she rents the same virtual environment again.

For the VA-based solution, it is difficult to implement this function efficiently. One way is to use application-specific tools to extract the customized configurations [10]. Another is to save the difference between the current VM image and the original one, which will contain too much unrelated data.

We solve this problem through the runtime environment based on user-level virtualization, which is independent of the concrete software and achieves higher storage efficiency.

The details are presented with the runtime design in Section 2.3.

2.2. Preparation of on-demand software

According to the on-demand software model we presented in [11], any software can be regarded as containing three parts: Part 1 includes all resources provided by the OS; Part 2 contains what is created/modified/deleted by the installation process; and Part 3 is the data created/modified/deleted during run time. The resources here mainly refer to files/folders, environment variables and/or the related system registry keys/values (for Windows OS).

Because the traditional solution depends only on the VM, it has to carry the OS image in order to take Part 1, as well as Part 2, to construct the whole virtual appliance.

In the new solution, user-level virtualization isolates application software from the OS, and our solution only makes software run on compatible hosts (which implies that all resources of Part 1 are available on the local system), so only Part 2 needs to be extracted to build the on-demand software. The difference between the two solutions is illustrated in Fig. 1.

An installation snapshot is taken to build the on-demand software: we start with a machine in a known state (for example, immediately after the OS was installed); then we install the software and finally identify everything that was added to the system by the installation process. Typical additions mainly consist of directories and files and/or registry entries in Windows. These additions (Part 2) form the on-demand software at that time.

Fig. 2. User-level virtualization runtime environment.

How to deal with Part 3 will be presented in the next section.
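The snapshot step above can be sketched in a few lines. The following is a minimal Python illustration, not the authors' tool: it records only file paths and sizes (a real implementation would also walk the Windows registry and environment variables), and all names are hypothetical.

```python
import os

def snapshot(root):
    """Record every file path and its size under `root`."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.path.getsize(path)
    return state

def diff_snapshots(before, after):
    """Return what the installer added or changed: an approximation of Part 2."""
    added = {p for p in after if p not in before}
    modified = {p for p in after if p in before and after[p] != before[p]}
    return added, modified
```

Running `snapshot` before and after the installer and diffing the two states yields the file-level additions that form the on-demand package.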

2.3. The runtime environment of on-demand software

To make the on-demand software run smoothly in a compatible OS, it is necessary to construct a runtime environment where software can locate and access any necessary resources transparently, just as if it had been installed.

Another issue is how to capture Part 3, which is created dynamically, to reflect the user's customization.

In our system, on-demand software runs in a user-level virtualization environment layered on top of the local machine's OS. This environment intercepts all resource-accessing APIs from the software, including those accessing the system registry and files/directories. In detail, the environment redirects all accesses to Parts 2 and 3 to their actual storage positions (like the local cache or the central data server) and guides other visits (for Part 1) to the local OS (as presented in Fig. 2).

For UNIX and UNIX-like OSes (for example, Linux), the ptrace [12] mechanism can be used to intercept system APIs dynamically; for Windows, the Detours library [13] provided by Microsoft performs a similar function.

During run time, the software instance accesses resources of all parts on the fly: some resources are read-only, while some may be modified/added/deleted. Therefore no part is fixed: any resource that is modified is moved into Part 3. The principle is that any modification is always saved in a separate position (Part 3); any browsing operation (like listing all files in one folder) returns the combination of the corresponding results from all parts (if there is any duplication, Part 3 has the highest priority and Part 1 the lowest); for any read, the same strategy is adopted.
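The priority rule above can be sketched as a small layered-lookup structure. This is an illustrative Python sketch, with the three parts modeled as plain dictionaries; the class and method names are assumptions, not the paper's implementation.

```python
class LayeredResources:
    """Resolve resource accesses across the three parts: Part 3 (runtime
    changes) overrides Part 2 (installed resources), which overrides
    Part 1 (the host OS); browsing unions all three layers."""

    def __init__(self, part1, part2, part3):
        self.layers = [part3, part2, part1]  # highest priority first

    def read(self, name):
        # A read is served by the highest-priority layer that holds the name.
        for layer in self.layers:
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def browse(self):
        # Browsing combines the results from all parts.
        names = set()
        for layer in self.layers:
            names |= set(layer)
        return sorted(names)

    def write(self, name, data):
        # Copy-on-write: every modification lands in Part 3.
        self.layers[0][name] = data
```

A write never touches Part 1 or Part 2, which is what keeps the host OS and the central depository clean.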

The on-demand software can then run without installation, as the runtime system provides all necessary resources transparently. And no trace is left on the host, because any modification is intercepted and stored in Part 3 instead of the system's default position(s).

Compared with the VA solution, our system has some extra features besides deployment flexibility and storage efficiency:

(1) On-demand software can see all local resources (except for those overlaid by its Parts 2 and 3) and can communicate with other programs running on the same host machine (including other on-demand software running in the virtualization environment, and native software) through the local file system and local IPC.

In contrast, applications in one VA cannot communicate directly with those in another VA. The comparison is given in Fig. 3.

(2) A fine-grained difference between the original image and the current one can be extracted. Because the runtime environment operates above the OS, it can distinguish the modifications made by different software (for example, based on the program name). Then the user's customization of each application can be extracted accurately for reuse, as requested in Section 2.1.

Fig. 3. Execution stack compared to VMM.

Fig. 4. System overview.

For the VA-based system, it is difficult to do so, because the virtual machine monitor works under the guest OS and lacks the necessary semantic information.

The whole system of the above-mentioned functions and procedures is described in Fig. 4. First, normal software is converted into the on-demand version and stored in the central depository; then the user can select any needed software at configuration time. After system deployment, the user can launch software streamed from the central server(s) on demand, and the user-level virtualization environment makes the software run without installation and captures the customization transparently.

3. Central distribution system

Elasticity [14] is one of the key features of Cloud Computing; it usually means adding or removing resources at a fine grain within a short lead time, allowing resources to match the workload much more closely.

Combining software with the VM image (as a VA does) is not an efficient method. It brings management complexity and storage inefficiency: on-demand software is distributed to multiple running instances, and many copies of the same software may exist in different instances.

Therefore, a central deployment mechanism is employed: on-demand software is located on central depository server(s); the storage position can be mounted as a virtual local disk in a customer's VM (through a user-level file system framework); and the customer owns the access right to visit the software he/she has chosen. When the customer boots up the VM and launches software, it is streamed from the depository, which means that the transfer of the software's bits to the local VM is overlapped with its execution. This enables fast software deployment, without waiting for an entire image to be downloaded before starting execution.

One issue introduced by the central mode is write conflict: various instances of the same software run simultaneously and may modify the same data object on the central depository. To solve this problem, a copy-on-write (COW) method is adopted:


Fig. 5. Framework of the user-level file system.

as mentioned in Section 2.3, any modification happening during run time is considered to belong to Part 3, which is stored in a separate position in the local VM. The central depository is thus read-only storage, and any modification happens locally.

3.1. User-level virtual file system

A user-level file system usually works as a proxy for file system accesses: file operation requests from the application to the target files/folders/partitions are forwarded to corresponding user-space callback functions that do the real work and send the results back.

The user-level file system often suffers some performance loss, because it introduces a longer data-transfer path and more context switches. However, it reduces development complexity and, more importantly, is a flexible solution that depends on the OS to the minimum extent.

The most famous user-space file system framework is FUSE [15], which works only on Linux. For Windows, DOKAN [16] is a good alternative (Fig. 5).

Owing to the user-level file system framework, on-demand software is stored in the central depository and presented as files/folders on a virtual local drive in the customer's VM. The customer can then use them just like any locally-installed software.

The interval between the opening of the main program file of any on-demand software and its closing is regarded as the usage time for accounting.
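As a rough illustration, this open-to-close accounting interval could be tracked as below. This is a hypothetical Python sketch; in the actual system the open and close events come from the user-level file system's callbacks.

```python
import time

class UsageMeter:
    """Bill by the interval between opening a program's main file and closing it."""

    def __init__(self):
        self.open_at = {}   # program -> monotonic timestamp at open
        self.total = {}     # program -> accumulated billable seconds

    def on_open(self, program):
        self.open_at[program] = time.monotonic()

    def on_close(self, program):
        started = self.open_at.pop(program)
        elapsed = time.monotonic() - started
        self.total[program] = self.total.get(program, 0.0) + elapsed
```

Each open/close pair adds one usage interval, so repeated sessions of the same application accumulate naturally.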

The user-level file system can also implement the COW method directly: when any modification of the virtual drive is captured, the system fetches the whole original file from the central server to a local position (outside the virtual disk) and redirects any following access (including the current modification) to this local version. This means that Part 3 of any on-demand software is stored in the local VM.
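The fetch-whole-file COW behavior can be sketched as follows. A minimal Python sketch under assumed names (`CowFile`, the local Part-3 directory); the real system performs this inside the virtual-drive callbacks.

```python
import os
import shutil

class CowFile:
    """Copy-on-write for one file on the virtual drive: the first modification
    copies the whole file from the read-only central depository to a local
    Part-3 store; all later accesses are redirected to the local copy."""

    def __init__(self, remote_path, local_dir):
        self.remote_path = remote_path
        self.local_path = os.path.join(local_dir, os.path.basename(remote_path))

    def _materialize(self):
        # Fetch the whole original file to the local position, once.
        if not os.path.exists(self.local_path):
            shutil.copyfile(self.remote_path, self.local_path)

    def read(self):
        # Prefer the local (possibly modified) copy; fall back to the depository.
        path = self.local_path if os.path.exists(self.local_path) else self.remote_path
        with open(path, "rb") as f:
            return f.read()

    def write(self, data):
        self._materialize()
        with open(self.local_path, "wb") as f:
            f.write(data)
```

The depository file is never written, which is what allows many VM instances to share one read-only central copy.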

3.2. Optimizations

(1) Content-addressable storage (CAS).

The central depository is CAS-enabled. CAS is a mechanism for storing information that can be retrieved based on its content. It usually uses cryptographic hashing to reduce storage requirements by exploiting commonality across multiple data objects.

In detail, we adopt a CAS strategy similar to that of [17]: on-demand software is partitioned into shards, where a shard may correspond to a single file or registry entry. We compute the hash value of every shard; equal values mean the corresponding shards are identical. Identical shards are given the same physical name and are stored only once. An example is shown in Fig. 6, where the two C shards are the same.

The depository maintains a key data structure mapping shard names to physical names (a many-to-one mapping); when any shard is to be accessed by the user-level file system, its physical data can then be located.
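The many-to-one mapping can be sketched with a content hash as the physical name. A minimal Python sketch; the class name, the use of SHA-256, and the in-memory dictionaries are illustrative assumptions (the paper's depository is file-level and persistent).

```python
import hashlib

class CasStore:
    """Content-addressable depository: identical shards (identified by a
    cryptographic hash of their content) are stored once; a shard-name ->
    digest map (many-to-one) locates each shard's physical data."""

    def __init__(self):
        self.blobs = {}   # digest -> bytes, each distinct content stored once
        self.names = {}   # shard name -> digest (many-to-one mapping)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # duplicate content is not re-stored
        self.names[name] = digest

    def get(self, name):
        return self.blobs[self.names[name]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())
```

Two applications shipping the same DLL thus consume the space of one copy, which is the source of the storage savings reported in Section 4.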

Fig. 6. CAS storage.

Because the depository is read-only (as mentioned in Section 3.1), the CAS mechanism works well.

(2) Data cache.

The central mode would decrease the running performance of on-demand software if every file access had to be redirected to the remote depository. To overcome this drawback, a local cache in the customer's VM is enabled.

Based on detailed analyses of software access patterns, we found that for much commonly-used desktop software (for example, Office applications, PhotoShop, some media players, network applications and so on), the most frequently-used files are those accessed during the start-up process, which occupy only a limited ratio (usually between 20% and 40%) of the whole storage capacity. A local cache with limited space can therefore achieve a fairly high hit rate and improve access performance remarkably. For example, in our test cases, a 200 MB local cache reaches about an 80% hit rate while the storage capacity of the on-demand software is more than 800 MB.

The local cache technology is straightforward: any remote data accessed during a run is stored in a local position; during subsequent runs, the data can be read from that local position. To simplify management, the offset and size of any remote access are both 32 KB-aligned. Therefore, any remote read during run time is converted into a request whose size is an integer multiple of 32 KB. This design also means that some pre-fetch is implied when a small piece of data is wanted, which reduces the number of remote visits. The replacement strategy is based on usage frequency.
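The 32 KB alignment can be made concrete with a short sketch. This is an assumed Python illustration of the alignment rule and the block cache, not the paper's implementation; `fetch` stands in for the remote depository access.

```python
BLOCK = 32 * 1024  # remote reads are 32 KB-aligned, per the design above

def aligned_range(offset, size):
    """Expand a read request to 32 KB boundaries, so every remote fetch
    covers an integer multiple of 32 KB (an implicit pre-fetch)."""
    start = (offset // BLOCK) * BLOCK
    end = ((offset + size + BLOCK - 1) // BLOCK) * BLOCK
    return start, end - start

class BlockCache:
    def __init__(self, fetch):
        self.fetch = fetch    # fetch(offset, size) -> bytes from the depository
        self.blocks = {}      # block start offset -> 32 KB of cached data

    def read(self, offset, size):
        start, span = aligned_range(offset, size)
        for block_start in range(start, start + span, BLOCK):
            if block_start not in self.blocks:     # miss: one aligned remote read
                self.blocks[block_start] = self.fetch(block_start, BLOCK)
        data = b"".join(self.blocks[b] for b in range(start, start + span, BLOCK))
        return data[offset - start : offset - start + size]
```

A small read near an already-cached region is then served locally, because the whole surrounding 32 KB block was fetched on the first miss.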

Another beneficial fact is that, for any given piece of software, the access pattern of its start-up process is almost fixed, which means the pre-fetch mechanism can work well. Based on profiling, our virtual file system can learn the access sequence of any software and guide the pre-fetch accurately during subsequent executions.
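The profiling-guided pre-fetch could look roughly like the following. A hypothetical Python sketch: record the block-access sequence once, then, on later runs, pre-fetch the blocks that historically follow the one just requested.

```python
class PrefetchProfile:
    """Learn the (almost fixed) start-up access sequence during a profiling
    run, then predict which blocks to pre-fetch on subsequent runs."""

    def __init__(self):
        self.sequence = []
        self.recording = True

    def on_access(self, block):
        if self.recording:
            self.sequence.append(block)

    def finish_profiling(self):
        self.recording = False

    def prefetch_plan(self, current_block, lookahead=4):
        # Return the next few blocks observed after `current_block` during
        # profiling; an unknown block yields no prediction.
        try:
            i = self.sequence.index(current_block)
        except ValueError:
            return []
        return self.sequence[i + 1 : i + 1 + lookahead]
```

Because the start-up sequence barely changes between runs, even this naive replay predicts most of the blocks the application will touch next.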

4. Prototype

We have implemented such a Cloud Computing prototype based on the XEN VMM [18] for course experiments. Many courses ask students to complete study assignments and/or software experiments, so computer systems with specific software are required. We plan to construct such a platform to provide lecturers and students with the required systems instantly; the deployed resources will be revoked when the course finishes. For now, this prototype is mainly for desktop applications based on Windows OS.

Fig. 7. System overview of the prototype.

Here we focus on its software deployment system.

4.1. Implementation

For user-level virtualization on the client end, we use the Detours library to intercept Windows APIs that access the system registry and files/folders. In detail, interception code is applied dynamically at run time: Detours replaces the first few instructions of the target API with an unconditional jump to the user-provided detour function, inserted at execution time.

Moreover, we employ DOKAN to implement the user-level virtual file system based on the design described in Section 3; it is pre-installed in every VM image.

A further optimization is that we move the local cache from the file level into the virtualization environment: since every file API of on-demand software is already intercepted, the cached data can be located above the file level. This enhancement improves performance because a cache hit can be handled with fewer context switches.

Many existing applications have been converted into the on-demand mode, including Office applications, SUN JVM, MATLAB, Lotus Notes, PhotoShop, Internet Explorer, Outlook Express, Winzip, UltraEdit, FlashGet, Skype, Bittorrent and lots of other frequently-used software (Fig. 7).

All software is stored in a central storage server, where a file-level CAS [19] is implemented to improve storage efficiency. Owing to this feature, the storage space occupied by the above applications is reduced by about 11%.

The concrete workflow is as follows:

(1) A user, Lisa, chooses some on-demand software, as well as the VM.

(2) The system assigns a physical server as the running host for Lisa and assigns an IP to the VM; the corresponding VM image (with no on-demand software included) is copied onto the assigned server before boot-up.

(3) With the RDP client-end program, Lisa logs into her VM through the campus network.

(4) In the background, the pre-installed user-level file system in the VM connects to the storage server and mounts the local virtual drive.

(5) The module of the user-level file system also acts as a shell: Lisa can browse her chosen software and launch any item through the shell. During the start-up process, the module applies interception code to construct the virtualization runtime environment.

(6) When a read operation is captured, the runtime environment first tries to locate it in the local cache; on a miss, the central server is visited through the user-level file system. For any modification, the COW mechanism is carried out as described in Section 3.

From then on, Lisa can use any chosen software without installation in her VM, because the central depository and the local runtime environment transparently provide any necessary resource.

Moreover, any modification (Part 3 of the on-demand software) is stored in a local position, and Lisa can keep it separately. Then, after the system revokes this VM, Lisa can still restore the customization of her work environment when she requests the VM and on-demand software again.

4.2. Open problems

In the current version, only two mirrored central storage servers are deployed, and each user-level file system is randomly assigned one server as its backend. We believe that, as the system expands, this simple method will lack scalability, although for now it is enough.

Another shortcoming is that we only implement the central deployment mechanism for application software, not for VM images. We therefore have to copy images to the local storage of the assigned server, which is not efficient in storage.

4.3. Performance test and analysis

Four PC servers are used as the VMs' hosts. All are Linux-XEN PCs equipped with 2 GB DDR2 RAM and one Intel Core Duo CPU. The hard disk is one 160 GB SATA drive.

One 64-bit Linux storage server, equipped with an Intel Core 2 Duo E5500 CPU (2800 MHz) and 16 GB DDR3 RAM, provides 430 GB of RAID-5 storage space (based on SAS disks). All machines are connected by 1 Gbit Ethernet.

In each VM, a local cache of 200 MB is reserved.

(1) Performance metrics and test methods.

The most important measurement is the running performance of on-demand software compared with its original version. Two kinds of time are measured.

The first is start-up time: we launch an on-demand software application through the shell of the user-level file system in four VMs on the PC servers (one VM on each server) and record their start-up times respectively. One issue here is how to judge whether the start-up process has finished. Fortunately, Microsoft provides a special API, WaitForInputIdle, to judge whether the new process has finished its initialization and is ready to respond to a user's input. When it returns, the average elapsed time is recorded as the start-up overhead.

The second is running time: after start-up, we use scripts to control the software to complete a series of operations (such as opening a document and editing it before closing it, taking an Office application as the example), which looks as if triggered by a real user. Some software tools can record the user's keyboard and mouse inputs and replay them, which helps us do so. Moreover, between any two consecutive operations, some random waiting time (less than 1 s) is inserted to simulate the human's thinking time. The auto-execution time is then recorded.

Ten applications are used for the tests: OpenOffice, Photoshop, Lotus Notes, Firefox, VLC (a powerful media player), WinZip, UltraEdit, Skype, GIMP (an open-source picture editor) and Acrobat Reader.

Another measurement is of the storage efficiency. We found that one VM image containing only Windows XP occupies about 1.5 GB, and if all the above-mentioned applications are installed in the VM, more than 800 MB of additional space is used. Owing to the central distribution mode, 600 MB of storage space can be saved per VM, considering the local 200 MB cache.

(2) Test cases.

The running time and start-up time of locally-installed software in a VM are measured as the baseline for comparison.


Fig. 8. Comparisons of start-up time.

Fig. 9. Comparisons of running time.

Then, we launch the software in the virtualization environment without the user-level file system, which means all on-demand software is stored in the VM and no remote access happens. This case is used to show the performance loss caused by the user-level virtualization itself.

Finally, the user-level file system is employed together with the local cache, and the cache hit rate is set to 0%, 80% and 100% respectively.

All measured results are compared with the corresponding baseline values, and the average ratios are presented in the next section.

(3) Results.

Fig. 8 presents the ratios of start-up times. We can see that the runtime environment itself causes very limited overhead, about 1%, because it works entirely in user space.

The user-level file system itself, combined with the runtime environment, introduces about 19% overhead (the case where the hit ratio is 100%). When the hit ratio is 80%, which is the common case with the 200 MB local cache in our test, the extra overhead is 26%, because 20% of accesses have to reach the remote server. The worst case is 91%, which happens when the cache is empty, for example when the VM is used for the first time.
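For intuition, a naive linear model anchored at the two measured endpoints (19% overhead at a 100% hit ratio, 91% at 0%) would predict roughly 33% overhead at an 80% hit ratio; the measured 26% is lower, which suggests the system does better than a simple weighted average of local and remote access costs. The following sketch implements this hypothetical model (the linear form is our illustration, not the paper's analysis):

```python
def expected_overhead(hit_ratio, all_hit=0.19, all_miss=0.91):
    """Naive linear estimate of relative start-up overhead as a function of
    the local cache hit ratio, interpolating between the two measured
    endpoints: 19% when every access hits the cache, 91% when none do."""
    return all_miss - hit_ratio * (all_miss - all_hit)

# At an 80% hit ratio the linear model predicts ~33.4%,
# above the 26% actually measured.
predicted = expected_overhead(0.8)
```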

Fig. 9 gives the results for running time (after start-up). It shows that, in this respect, our system introduces even less relative overhead: the runtime environment itself causes almost no extra overhead, less than 0.3%. For the user-level file system, because most frequently-used files are among those accessed during the start-up process, the actual hit ratio at run time is much higher, more than 90%. Moreover, the interval inserted between any two consecutive simulated operations can hide pre-fetch latencies, which further decreases the relative overhead.

(4) Analysis.

As we know, the initialization of a process interleaves a code-execution phase with an I/O phase: if the execution phase accesses an invalid page, a page fault is triggered to read the data in.

The first phase is mainly completed in physical memory, while the other interacts with the I/O system. During the start-up stage, the system cache is almost empty, so most data has to be read from external I/O modules at a much lower speed than code execution. Therefore, the analysis focuses on the performance of data access.

Three types of data are accessed during the start-up stage:
1. Part 1.
2. Local-cached data.
3. Networked data.

For any given software and host, the amount and the access rate of Part 1 are fixed. Therefore, increasing the other two access rates and the hit ratio of the local cache is the key to improving the performance. Fortunately, for much desktop software, the most frequently-used files are among those accessed during the start-up process, and the access pattern of the start-up process is almost fixed. Therefore, our local cache with limited space can achieve a fairly high hit rate, and the test results prove it.
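The argument above, that a nearly fixed start-up access pattern lets even a small cache reach a high hit rate, can be sketched with a toy LRU simulation; the block IDs, cache capacity and launch count below are invented for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache that only tracks hit/miss counts per block ID."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def access(self, block):
        if block in self.data:
            self.hits += 1
            self.data.move_to_end(block)      # mark as most recently used
        else:
            self.misses += 1
            self.data[block] = True
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# A fixed start-up pattern of 100 blocks, replayed over 5 launches.
pattern = list(range(100))
cache = LRUCache(capacity=150)
for launch in range(5):
    for block in pattern:
        cache.access(block)
# Only the first launch misses; every later launch hits entirely.
```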

5. Related work

5.1. On-demand software

On-demand software is regarded as the future usage mode for software. Most on-demand software consists of web-based applications, and existing desktop software cannot be used in this mode.

Therefore, user-level virtualization technologies have been developed to convert legacy software into on-demand software, like [17,20,11,21].

Microsoft Application Virtualization [20] allows applications to be deployed in real time to any client from a virtual application server. It removes the need for local installation of the applications to reduce the labor involved in deploying, updating, and managing them.

Alpern and Auerbach [17] present the Progressive Deployment System (PDS), a virtual execution environment and infrastructure designed for deploying software on demand. PDS intercepts a selected subset of system calls on the target machine to provide a partial virtualization at the operating-system level.

Our previous work [11,21] provides a solution to stream software to local PCs across the Internet, based on lightweight virtualization and P2P transportation technologies. Some of their technologies and models are employed here to implement the runtime virtualization environment.

5.2. Automatic service deployment for Cloud Computing

Most existing service deployment systems for Cloud Computing are based on VAs.

An early work on VAs is [3]. It attempted to address the complexity of system administration by making the labor of applying software updates independent of the number of computers. It developed a compute utility, called Collective, which assigns virtual appliances to hardware dynamically and automatically.

Later, VAs were employed by some grid researchers to deploy services or software for grid systems, including [4,22,23], and have gradually moved to cloud computing [24–26].

For example, VMPlant [4] provides automated configuration and creation of flexible VAs that can be dynamically instantiated to provide homogeneous execution environments across Grid resources. Bradshaw et al. [22] describe the requirements and services needed to ensure the scalable management and deployment of VAs for grid computing.

Kecskemeti et al. [23] describe an extension to the Globus Workspace Service to create virtual appliances and deploy them for Grid services. The same research group later proposed an automated virtual appliance creation service [24] that helps developers create their own virtual appliances efficiently for infrastructure-as-a-service cloud systems.

Epstein et al. [25] give a framework for virtual appliance distribution for a distributed cloud infrastructure service, which addresses a fundamental storage staging problem in this context. Rodero-Merino et al. [26] propose a mechanism that allows services' automatic deployment and escalation depending on the service status, and implement such a service management system sitting on top of different cloud providers.

In contrast, our solution uses user-level virtualization to improve the flexibility and storage efficiency of deployment.

Compared with our preliminary deployment system [12], this paper gives a more complete solution with several optimization enhancements.

6. Conclusion and future work

This paper provides a framework that uses user-level virtualization technology to decouple application software from VM images to improve the deployment flexibility for cloud computing. The main functions and procedures, including application preparation, the runtime system, and the deployment and usage workflow, are presented. Compared with VA-based solutions, it also improves the storage efficiency, and users' customizations can be separated inherently and efficiently for reuse.

Moreover, a central deployment mechanism is designed to manage and distribute all software, which cooperates with the user-level file system in VMs to provide software data on demand. In addition, the CAS method is used to decrease the storage capacity required, and a local cache on the VM end improves the access performance remarkably.

We implemented such a prototype and constructed some tests to obtain performance metrics. Results show that this solution is efficient in running performance: for start-up time, the runtime environment itself causes very limited overhead, about 1%, while the whole system introduces 26% extra overhead with one 200 MB local cache on each VM; for running time, the extra overhead is less, about 10%.

In the next step, we plan to solve the existing problems mentioned in Section 4.2: to design more advanced storage strategies with adaptive scheduling to support a larger-scale system, and to employ CAS technologies to improve the efficiency of management and assignment of VM images.

Acknowledgments

The work is supported by the Open Research Fund Program of the Beijing Key Lab of Intelligent Telecommunications Software and Multimedia, the High Technology Research and Development Program of China under Grant No. 2008AA01A201 and the National Grand Fundamental Research 973 Program of China under Grant No. 2007CB310900.

References

[1] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems 25 (6) (2009) 599–616.

[2] VMWare public virtual appliances, 2010. URL: http://www.vmware.com/appliances/.

[3] C. Sapuntzakis, D. Brumley, R. Chandra, N. Zeldovich, J. Chow, M.S. Lam, M. Rosenblum, Virtual appliances for deploying and maintaining software, in: LISA'03: Proceedings of the 17th USENIX Conference on System Administration, USENIX Association, Berkeley, CA, USA, 2003, pp. 181–194.

[4] I. Krsul, A. Ganguly, J. Zhang, J.A.B. Fortes, R.J. Figueiredo, VMPlants: providing and managing virtual machine execution environments for grid computing, in: Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, IEEE Computer Society, 2004, pp. 1–7.

[5] Public EC2 Amazon machine images, 2010. URL: http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=171.

[6] TurnKey Linux, 2010. URL: http://www.turnkeylinux.org/.

[7] Ubuntu JeOS instructions, 2010. URL: http://www.ubuntu.com/server/features/virtualisation.

[8] Anthony Liguori, Eric Van Hensbergen, Experiences with content addressable storage and virtual disks, in: WIOV'08: Proceedings of the First Workshop on I/O Virtualization, USENIX Association, San Diego, CA, USA, 2008.

[9] N. Tolia, M. Kozuch, M. Satyanarayanan, et al., Opportunistic use of content addressable storage for distributed file systems, in: Proceedings of the 2003 USENIX Annual Technical Conference, San Antonio, TX, USA, 2003, pp. 127–140.

[10] Migosoftware, 2010. URL: http://www.migosoftware.com/default.php.

[11] Youhui Zhang, Gelin Su, Weimin Zheng, Converting legacy desktop applications into on-demand personalized software, IEEE Transactions on Services Computing, IEEE Computer Society Digital Library (14 Jun. 2010) doi:10.1109/TSC.2010.32.

[12] ptrace — Linux man page, 2010. URL: http://linux.die.net/man/2/ptrace.

[13] Galen Hunt, Doug Brubacher, Detours: binary interception of Win32 functions, in: Proceedings of the Third USENIX Windows NT Symposium, July 1999.

[14] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, Above the clouds: a Berkeley view of cloud computing, Tech. Rep. UCB/EECS-2009-28, University of California at Berkeley, February 2009.

[15] Filesystem in Userspace, 2010. URL: http://fuse.sourceforge.net/.

[16] Dokan — user mode file system for Windows, 2010. URL: http://dokan-dev.net/en/.

[17] Bowen Alpern, Joshua Auerbach, PDS: a virtual execution environment for software deployment, in: Proceedings of the First ACM/USENIX International Conference on Virtual Execution Environments, Chicago, Illinois, USA, 2005.

[18] Computer Laboratory — Xen virtual machine monitor, 2010. URL: http://www.cl.cam.ac.uk/research/srg/netos/xen/.

[19] Youhui Zhang, Dongsheng Wang, Applying file information to block-level content addressable storage, Tsinghua Science & Technology 14 (1) (2009) 41–49.

[20] Microsoft Application Virtualization, 2010. URL: http://www.microsoft.com/systemcenter/appv/default.mspx.

[21] Youhui Zhang, Xiaoling Wang, Liang Hong, Portable desktop applications based on P2P transportation and virtualization, in: LISA'08: Proceedings of the 22nd Large Installation System Administration Conference, USENIX Association, San Diego, CA, USA, 2008, pp. 133–144.

[22] R. Bradshaw, N. Desai, T. Freeman, K. Keahey, A scalable approach to deploying and managing appliances, in: Proceedings of the TeraGrid '07 Conference, Madison, Wisconsin, USA, 2007.

[23] G. Kecskemeti, P. Kacsuk, G. Terstyanszky, T. Kiss, T. Delaitre, Automatic service deployment using virtualisation, in: Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2008, IEEE Computer Society, Toulouse, France, 2008, pp. 628–635.

[24] Gabor Kecskemeti, Gabor Terstyanszky, Peter Kacsuk, Zsolt Németh, An approach for virtual appliance distribution for service deployment, Future Generation Computer Systems (2010) doi:10.1016/j.future.2010.09.009.

[25] A. Epstein, D.H. Lorenz, E. Silvera, I. Shapira, Virtual appliance content distribution for a global infrastructure cloud service, in: Proceedings of 2010 IEEE INFOCOM, San Diego, CA, USA, 2010, pp. 1–9.

[26] Luis Rodero-Merino, Luis M. Vaquero, Victor Gil, Fermín Galán, Javier Fontán, Rubén S. Montero, Ignacio M. Llorente, From infrastructure delivery to service management in clouds, Future Generation Computer Systems 26 (8) (2010) 1226–1240.

Youhui Zhang is an Associate Professor in the Department of Computer Science at Tsinghua University, China. His research interests include cloud computing, network storage and microprocessor architecture. He received his Ph.D. degree in Computer Science in 2002.

Yanhua Li is a Ph.D. student in the Department of Computer Science at Tsinghua University, China. His research interests include cloud computing and microprocessor architecture. He received his Bachelor's degree in Computer Science in 2009.

Weimin Zheng is a Professor in the Department of Computer Science at Tsinghua University, China. His research interests include high performance computing, network storage and parallel compilers.