
Evaluation of Xen: Performance and

Use in Parallel Applications

EECE 496 Project Report

Prepared by Caleb Ho (38957023)

Supervisor: Matei Ripeanu

Date: April 12, 2007


ABSTRACT

Xen is an open-source virtual machine monitor under heavy development. In this project, the performance of Xen and its use in parallel applications are investigated. It is found that Xen performs close to native speed for computation but lags behind in other areas. Furthermore, to increase the fault tolerance of parallel applications, a naïve checkpoint technique based on Xen's save/restore functionality is analyzed and determined to be feasible.


TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
GLOSSARY
LIST OF ABBREVIATIONS
1.0 INTRODUCTION
2.0 METHODOLOGY
  2.1 Performance Evaluation of Xen
  2.2 Checkpoint techniques for parallel applications
3.0 EXPERIMENTS
  3.1 Performance Evaluation of Xen
    3.1.1 UnixBench v4.0.1
    3.1.2 Intel MPI Benchmark Suite (IMB) v3.0
  3.2 Checkpoint techniques for parallel applications
4.0 RESULTS
  4.1 Performance Evaluation of Xen
    4.1.1 UnixBench Results
    4.1.2 Intel MPI Benchmark (IMB) results
  4.2 Checkpoint techniques for parallel applications
    4.2.1 Save/restore time and disk space results
    4.2.2 Naïve checkpoint results
  4.3 Difficulties and Challenges
  4.4 Future work
5.0 CONCLUSIONS
6.0 REFERENCES
APPENDICES
  Appendix A: UnixBenchResultsParse.py
  Appendix B: genShellScript.py
  Appendix C: MPIResultsParse.py
  Appendix D: save.py
  Appendix E: restore.py


LIST OF ILLUSTRATIONS

Figure 1. Raw Output of the Benchmark Suite
Figure 2. PingPong Operation [9]
Figure 3. UnixBench Results
Figure 4. UnixBench Results Test Legend
Figure 5. MPI PingPong results
Table 1. Naïve checkpoint results


GLOSSARY

Checkpoint – saving the state of an operation so that it can be restored later in the case of a failure

Cluster – a collection of nodes that work on a computation problem together by dividing

the problem into smaller tasks

Guest – a virtual machine created in Xen

Initrd – a temporary file system used by the Linux kernel during boot

Kernel – a piece of software responsible for providing secure access to the machine's

hardware to various computer programs

Native machine – the machine running the operating system without virtualization

Node – a computational processor or machine in parallel computing

Open-source – a program whose source code is made available for use or modification

Para-virtualization – a virtualization technique in which the virtual machine monitor presents a software interface similar, but not identical, to the underlying hardware, requiring the guest operating system to be ported to it

Full-virtualization – a complete simulation of the underlying hardware by the virtual

machine monitor that requires special hardware support

Parallel application – a program that uses cooperative nodes to perform parallel

computing

Parallel computing – the simultaneous execution of the same task on multiple processors or machines in order to obtain results faster [1]

PingPong – message passing between two nodes, where the nodes take turns sending a message to each other

Virtual Machine – also called “hardware virtual machine”, is a self-contained operating

environment that behaves as if it is a separate computer. In Xen, virtual machines that are

created are called guests.

Xen – an open-source virtual machine monitor


LIST OF ABBREVIATIONS

CPU – Central processing unit

I/O – input/output

IMB – Intel MPI Benchmark Suite

MPI - Message Passing Interface

MPICH2 – version 2 of MPICH, an implementation of the MPI standard


1.0 INTRODUCTION

Virtual machines are often used in software development, testing, and analysis, as they provide benefits such as isolation, standardization, consolidation, ease of testing, and mobility [2]. Several virtual machine monitors are currently available on the market, and one of the most popular is Xen, an open-source virtual machine monitor under heavy development that has shown an exceptional level of performance [3]. Furthermore, Xen has built-in save/restore functionality that allows a user to save and restore the state of a virtual machine.

In this project, the performance of Xen and its use in parallel applications are investigated. Specifically, one of the main objectives of this project is a quantitative performance comparison between the native machine and Xen. Because only a handful of characterizations of Xen currently exist, as it is still a developing product, its performance results are of interest to the virtual machine community.

Another objective of this project is to analyze the feasibility of using Xen's save/restore functionality to checkpoint parallel applications. During parallel computing, node failures are common due to factors such as hardware faults, power outages, and software problems, and the failure of a single node can cause the entire computation to fail. To preserve the computational effort expended before a node fails, common techniques such as duplication, logging, and check-pointing can be used [3]. Using Xen, users might be able to implement check-pointing in their parallel computing applications to increase fault tolerance.

In this project, several Xen virtual machines are installed and configured on one physical machine. Afterwards, the performance of Xen is evaluated using two benchmark suites: UnixBench [5] and the Intel MPI Benchmark Suite [9]. Lastly, different test cases are designed and executed to analyze the feasibility of using Xen's save/restore functionality to checkpoint parallel applications. This project was performed alone and was supervised by Matei Ripeanu.

This project deals specifically with Xen and with checkpoint techniques that use Xen's save/restore functionality. The report is divided into the following primary sections: methodology, experiments, results, and conclusions.


2.0 METHODOLOGY

This project naturally divides into two parts: first, a performance evaluation of Xen; second, a feasibility analysis of using Xen's save/restore functionality to checkpoint parallel applications.

In order to design appropriate tests, the two modes under which Xen can create a virtual machine must be considered: para-virtualization and full virtualization. Para-virtualization requires the operating system to be explicitly ported to run on top of Xen, which provides a software interface similar to that of the underlying hardware; full virtualization provides a complete simulation of the underlying hardware but requires special hardware support. In order to evaluate both modes, the test hardware must support full virtualization. For this project, a Dell E520 with an Intel processor that supports full virtualization was chosen; this was the only real constraint on the selection of the machine.

Before conducting the experiments, the testbed must be configured and set up. The testbed is a Dell E520 with virtualization technology, running the Fedora Core 6 distribution of Linux. First, a new hard-drive partition is created using GParted [4] so that Linux can be installed. Xen is then installed: the fc6 kernel (2.6.19-1.2911.fc6) is used for native tests, the Xen kernel (2.6.19-1.2911.fc6xen) for para-virtualized tests, and the Xen HVM loader (/xen/boot/hvmloader) for full-virtualized tests.

2.1 Performance Evaluation of Xen

The performance evaluation of Xen can be further categorized into the following areas: computation, process creation and execution, file-system operations, concurrency, process I/O, and network I/O. In order to cover all of these areas, two benchmark suites were chosen: UnixBench 4.0.1 and the Intel MPI Benchmark Suite (IMB) v3.0. UnixBench consists of a series of tests that cover all the areas mentioned except I/O, whereas IMB consists of over ten benchmarks that measure I/O performance and is based on the MPICH2 implementation of the Message Passing Interface (MPI) standard. IMB was chosen because MPI is needed for the network I/O tests, as will be detailed later. While other benchmarks could have been chosen, these two suites are free, complete, and simple to run.

In order to evaluate network I/O, a virtual cluster (a network of virtual machines) is configured and set up using Xen on a single physical machine. Using a single physical machine instead of several reduces the equipment that must be purchased for testing; in addition, Xen provides built-in functions for saving and restoring a virtual machine's state. For coordination between the virtual machine nodes during parallel computing, MPICH2, an implementation of the Message Passing Interface (MPI) standard that is widely used in the parallel computing community, is used.
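
For reference, launching a two-process IMB job across such a cluster follows the usual MPICH2 pattern. The sketch below is illustrative only: the host file name is an assumption, and MPICH2 1.x used the mpd process manager.

import os
# hypothetical launch of a two-process IMB run across the virtual cluster;
# mpd.hosts is assumed to list the virtual nodes, one per line
os.system("mpdboot -n 2 -f mpd.hosts")        # start the MPD ring on the nodes
os.system("mpirun -n 2 ./IMB-MPI1 PingPong")  # run the PingPong benchmark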

2.2 Checkpoint techniques for parallel applications

Parallel computing is often used to speed up computation problems that could take days,

months, or even years to complete. Some practical applications of parallel computing in

the scientific and engineering computing field include computational electromagnetics,

industrial environmental flows, and groundwater flow models [2].

During parallel computing, node failures are common due to factors such as hardware faults, power outages, and software problems, and the failure of a single node can cause the entire computation to fail. To preserve the computational effort expended before a node fails, common techniques such as duplication, logging, and check-pointing can be used [3].

Check-pointing is a common technique that allows a user to save the current state of an operation and later restore it to a pre-failure state if an error occurs. Xen's built-in functions, “save” and “restore”, allow a user to save the state of a virtual machine to a file and to restore that virtual machine from the saved file at a later time.

A difficulty with check-pointing parallel applications arises because every node needs to save a consistent state. In other words, for a parallel application with nodes A and B, if the user invokes a checkpoint at time t, the saved states of A and B may be inconsistent. For example, at the moment of the checkpoint a message might be in transit from node A to node B: node A knows about the transfer, but node B does not, since it has not yet received the message. The two nodes would therefore have different views of the system and save different states.

There are checkpoint techniques that do not rely on Xen's save/restore. For example, synchronization can be done at the application level, where the application signals its internal functions to take a checkpoint. Although this method is more reliable, it also makes the developer's job more difficult. In the rest of this report, check-pointing refers to the use of Xen's save/restore functionality unless otherwise specified.

The original intent was to design and implement various checkpoint techniques using Xen's save/restore functionality. Because of time constraints, only one technique is analyzed: checkpointing naively by scripts, without any explicit synchronization between the nodes. Specifically, “save” is called for all the nodes involved at the same instant, and “restore” is called immediately after “save” completes, which is equivalent to a simple checkpoint with no guarantees.

In order to evaluate the success or failure of a checkpoint technique, a modified version of IMB's PingPong test is used. The test runs continuously, is check-pointed, and resumes running after various idle intervals. If the test continues to run after being restored, the checkpoint is considered successful.


3.0 EXPERIMENTS

3.1 Performance Evaluation of Xen

To evaluate Xen, the two benchmark suites, UnixBench 4.0.1 and Intel MPI Benchmark Suite (IMB) v3.0, are used. The following sections briefly describe the tests within these suites.

3.1.1 UnixBench v4.0.1

UnixBench is an open-source benchmarking tool [5] consisting of 27 tests that cover the areas of computation, process creation and execution, file-system operations, and concurrency. The reported metrics are either bytes per second or loops per second; in both cases, a higher number denotes better performance.

Some of the computation benchmarks include Dhrystone [6], arithmetic tests on integer, double, float, and various other types, a compiler throughput test, and a Tower of Hanoi recursion test [7]. For processes, tests that measure system call overhead, process creation, execl throughput [8], pipe throughput, and context switching are used. For file-system operations, various block sizes are tested for both reads and writes. For concurrency, shell scripts running concurrently are used. Overall, this benchmark suite is a good tool for evaluating these areas of a system.

The benchmark suite is run ten times each on the native machine (i.e., with no virtualization), on one para-virtualized guest, and on one full-virtualized guest. The average and standard deviation of each test are computed, and the measurements for the virtualized guests are normalized to native performance so that a comparison can be done easily.
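
For illustration, the normalization step is a simple division of each guest's mean score by the native mean for the same test; a minimal sketch (the two dictionaries are hypothetical placeholders with dummy values, not part of Appendix A):

# minimal sketch of the normalization step; the dictionaries stand in for
# the per-test averages computed by Appendix A (values are dummies)
native_avg = {"testA": 2.0, "testB": 4.0}
guest_avg = {"testA": 1.9, "testB": 2.0}
for test in sorted(native_avg):
    print("%s\t%.2f" % (test, guest_avg[test] / native_avg[test]))  # 1.0 == native speed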

In order to run the tests and capture the results, simple shell scripts were used. The following figure is a screenshot of the raw output from running the suite once.

Figure 1. Raw Output of the Benchmark Suite
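
For reference, a loop in the style of Appendix B could drive the ten runs; a sketch, assuming ./Run is UnixBench's driver script (the output file names are hypothetical):

import os
# run the UnixBench driver ten times, keeping each run's raw output
for i in range(10):
    os.system("./Run | tee run%d.txt" % i)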

In order to format the data and compute the averages and standard deviations, a Python script was written to parse the results and generate a summary file. The script opens all the related test results, collects the corresponding results of each test, and computes the average and standard deviation of each test. The script is included in Appendix A.

3.1.2 Intel MPI Benchmark Suite (IMB) v3.0

The Intel MPI Benchmark Suite (IMB) is an open-source benchmarking tool targeted at benchmarking the I/O of a system. Specifically, it is based on the MPICH2 implementation of the Message Passing Interface (MPI) standard, which is often used in parallel applications. It consists of thirteen benchmarks; for this project, only the most basic one, PingPong, was used to compare the I/O performance of Xen. The following diagram illustrates PingPong, where X bytes is the variable size of the message sent in each ping-pong. The time for the message to be sent and received back is used to measure the performance of the operation; hence, a shorter time denotes better performance.

Figure 2. PingPong Operation [9]
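
The measurement pattern itself is simple enough to sketch. The following is an illustrative Python equivalent using mpi4py (an assumption made for illustration only; the project used the compiled IMB binary), which could be launched as, e.g., mpirun -n 2 python pingpong.py:

# illustrative ping-pong between ranks 0 and 1 (mpi4py assumed available)
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
msg = bytearray(1024)            # "X bytes": the variable message size

t0 = MPI.Wtime()
if rank == 0:
    comm.Send(msg, dest=1)       # ping
    comm.Recv(msg, source=1)     # pong
elif rank == 1:
    comm.Recv(msg, source=0)
    comm.Send(msg, dest=0)
if rank == 0:
    print("round-trip time: %f s" % (MPI.Wtime() - t0))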

As with UnixBench, the tests are performed on the native machine, a para-virtualized guest, and a full-virtualized guest. Unlike the UnixBench benchmarks, however, the PingPong benchmark requires two processes. Hence, the benchmark is run ten times in each of the following setups:

- 2 processes on 1 machine (no virtualization)
- 2 processes on 1 para-VM
- 2 processes on 2 para-VMs
- 2 processes on 1 full-VM
- 2 processes on 2 full-VMs

By performing the above tests, we can evaluate the I/O performance of Xen. Ideally, a test case with 2 processes on 2 physical machines would also be run, but such a case requires two physical machines. Again, scripts were written to deploy the tests and format the results; they can be found in Appendix B and Appendix C.

3.2 Checkpoint techniques for parallel applications

First, the time and disk-space costs of Xen's save/restore functionality are measured. By running “time <command>”, the user time and the CPU usage can be determined.
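
Equivalently, the cost can be scripted in the style of Appendices D and E; a minimal sketch (the domain name is hypothetical, and unlike “time” this reports wall-clock time only):

import os, time
# time a single save of one guest; os.system blocks until "xm save" returns
t0 = time.time()
os.system("xm save nodeA")
print("save took %.2f s" % (time.time() - t0))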


In order to evaluate checkpoint techniques, a modified version of the Intel MPI Benchmark Suite's PingPong test is again used. Specifically, the messages used in the PingPong test are configured to be large (150 MB) so that the test runs for more than 5 minutes, long enough for failure conditions to be observed. For simplicity, only two nodes are used in the evaluation.

In order to determine the success criterion for a checkpoint technique, the failure condition of the PingPong test is first identified. Xen provides pause/unpause functionality for virtual machines, so the transient failure of a node can be simulated by pausing a node's execution for a desired period of time. To measure the time required for the PingPong test to detect a failure, one node is paused indefinitely until an error occurs on the other node. Over five trials, the test detected a failure within 20 to 22 minutes, which can be explained by the default 20-minute time-out of the Transmission Control Protocol (TCP).

Because of time constraints, only one checkpoint technique is analyzed: checkpointing naively by scripts, without any explicit synchronization between the nodes. Specifically, “save” is called for all the nodes involved at the same instant. This can be done with the following commands,

xm save nodeA &

xm save nodeB &

which save both virtual machine nodes at the same time in the background. The saving of the nodes is handled by the operating system and Xen, which provide no guarantees on the consistency of the saved states. The full script can be found in Appendix D.

The nodes can then be restored after a desired amount of time with the following commands,

xm restore nodeA &

xm restore nodeB &

which restore both virtual machine nodes at the same time in the background. Again, the two nodes are not guaranteed to be restored simultaneously. The full script can be found in Appendix E. Because of the nature of TCP, lost packets are retransmitted; hence, the application can naturally tolerate some loss, and the failure condition can still be evaluated.

The following test cases are performed to evaluate the naive check-pointing, where the two nodes are identified as node A and node B (a driver sketch follows the list):

- save and stop node B, then restore it after 5 minutes
- save and stop node B, then restore it after 10 minutes
- save and stop node B, then restore it after 20 minutes
- save and stop nodes A and B, then restore them after 5 minutes
- save and stop nodes A and B, then restore them after 10 minutes
- save and stop nodes A and B, then restore them after 1 hour
- save and stop nodes A and B, then restore them after 1 day

By performing the above tests, the feasibility of this checkpoint technique for a PingPong-style application can be determined; each case reduces to a short driver script, sketched below.
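
A minimal driver for the first case, built from the scripts in Appendices D and E (the node name and the 5-minute interval are those of the test list):

import os, time
# naive checkpoint test case: save node B, idle, then restore it
os.system("python save.py nodeB")       # Appendix D: backgrounds "xm save"
time.sleep(5 * 60)                      # idle interval: 5 minutes
os.system("python restore.py nodeB")    # Appendix E: backgrounds "xm restore"
# the PingPong run on nodes A and B is then checked for continued progress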


4.0 RESULTS

4.1 Performance Evaluation of Xen

4.1.1 UnixBench Results

The following graph shows the results of the UnixBench benchmark, where the different colors represent the different testbeds: native, para-virtualized, and full-virtualized. The x-axis is the benchmark test number, as listed in Figure 4 below; the y-axis is the normalized performance value. A higher number denotes better performance.

[Figure: bar chart of normalized UnixBench results for tests 1–26; x-axis: Tests, y-axis: Normalized values (0 to 2); series: Native, Para-Virtualized, Fully Virtualized]

Figure 3. UnixBench Results

As seen in the above graph, the performance of para-virtualization is close to native for computation, and the file-system performance of para-virtualization even exceeds native performance. A possible explanation is that, since para-virtualization is performed in software, Xen likely cached the operation request and reported completion to the application before actually completing the task.

However, for performance relating to processes/pipes (tests 5–7) and concurrency (tests 17–19), para-virtualization achieves only half of native performance, while full virtualization performs even worse.

Figure 4. UnixBench Results Test Legend

4.1.2 Intel MPI Benchmark (IMB) results

The following graph shows the results of the MPI PingPong benchmark, where the different colors represent the different testbeds:

- 2 processes on 1 machine (native1Machine)
- 2 processes on 1 para-VM (para1VM)
- 2 processes on 2 para-VMs (para2VM)
- 2 processes on 1 full-VM (Full1VM)
- 2 processes on 2 full-VMs (Full2VM)

The x-axis is the message block size; the y-axis is the throughput value. A higher number denotes better performance.

Figure 5. MPI PingPong results

As seen in the above graph, para-virtualization is slightly slower than native, while full virtualization is significantly slower than native for all block sizes.

4.2 Checkpoint techniques for parallel applications

4.2.1 Save/restore time and disk space results

On the test machine, Xen's “save” function takes on average less than one second of user time and 3% CPU, but requires ~133 MB of disk space; the “restore” function likewise takes less than one second of user time.


4.2.2 Naïve checkpoint results

Test case                                                         Result
save and stop node B, then restore it after 5 minutes             No failure
save and stop node B, then restore it after 10 minutes            No failure
save and stop node B, then restore it after 25 minutes            Failure
save and stop nodes A and B, then restore them after 5 minutes    No failure
save and stop nodes A and B, then restore them after 10 minutes   No failure
save and stop nodes A and B, then restore them after 1 hour       No failure
save and stop nodes A and B, then restore them after 1 day        No failure

Table 1. Naïve checkpoint results

According to the results above, naïve check-pointing using Xen's save/restore functionality is feasible for applications similar to the PingPong benchmark. Because total failure occurs after roughly 20 minutes (the TCP time-out), checkpoints should be taken at least once every 20 minutes; a periodic driver is sketched below.
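
A minimal sketch of such a periodic driver, reusing the scripts in Appendices D and E (the node names and interval are hypothetical, with the interval chosen inside the 20-minute window):

import os, time
# checkpoint both nodes periodically, well within the ~20-minute TCP time-out
while True:
    os.system("python save.py nodeA nodeB")       # Appendix D
    os.system("python restore.py nodeA nodeB")    # Appendix E: resume immediately
    time.sleep(15 * 60)                           # next checkpoint in 15 minutes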

4.3 Difficulties and Challenges

There were numerous challenges and roadblocks in the process of setting up the testbed. Several factors contributed to the problem: hardware incompatibility, my unfamiliarity with the platform and software, and the immaturity of Xen. The details of these challenges are described in this section.

At first, the testbed was to be installed and deployed on an IBM ThinkPad laptop, which was chosen for its portability. However, during setup it was found that Xen 3.0 required Physical Address Extension (PAE), a feature the laptop did not support. This requirement was not obvious in the Xen documentation at the time the laptop was purchased for this project. Consequently, identifying the problem and looking for workarounds delayed the original project schedule. After considerable effort to get Xen operational on the laptop, a new desktop computer was purchased instead: the Dell E520 with full virtualization support.

The installation of Linux on the test hardware (the Dell E520) did not go as smoothly as expected. There was trouble partitioning the disk with the tool used previously (QtParted [11]) because the Linux rescue CD would not mount for hardware-configuration reasons. An alternative tool (GParted [4]) was found, but the problem had already delayed the schedule.

There were also problems during the installation and configuration of Xen. Because para-virtualization requires a software interface that mimics the underlying hardware, a modification to the Xen kernel was required for my hardware. Originally, third-party pre-built disk images [10] were to be used to reduce development time; however, due to hardware differences, these images did not function. Finally, after many failed attempts and trials of workarounds suggested by the Xen community, it was found that the recent Xen kernel was missing modules essential for my setup, and the workaround was to build a new initrd, based on the original initrd, that included the missing modules.

Problems were also encountered with the virtual Ethernet hardware. To date, there is still no active Internet connection from the nodes to the outside world. As a workaround, a virtual local area network between the nodes has been set up using static IP addresses, which is sufficient for this project's purposes.

4.4 Future work

In this project, the Fedora Core 6 distribution of Linux was used. Future experiments could instead use different versions of Windows, as well as other operating systems that support Xen. Furthermore, only one physical machine was used in this project; in the future, the effect of using two physical machines for testing network I/O could be investigated.

Since only one checkpoint technique was investigated, more techniques should be designed and tested in the future. In addition, instead of using a third-party test for checkpoint feasibility, custom software could be developed. Such software would give the tester more control over the tests and provide more debugging information, such as the number of messages lost and retransmitted.


5.0 CONCLUSIONS

In this project, the performance of Xen and its use in parallel applications were investigated. First, Xen was installed on the Fedora Core 6 distribution of Linux, and the performance of native, para-virtualized, and full-virtualized machines was compared using benchmark suites. Second, different test cases were designed and executed to analyze the feasibility of using Xen's save/restore functionality to checkpoint parallel applications; specifically, a naïve checkpoint approach was used.

As Xen is still a maturing product, difficulties were faced while setting up the testbed due to bugs in Xen and its lack of documentation.

For the performance evaluation, it was found that para-virtualization and full virtualization performed close to native in the area of computation. However, para-virtualization performed only half as well as native in the areas of processes/pipes and concurrency, and full virtualization performed only half as well as para-virtualization in those same areas. In the area of I/O, para-virtualization performed close to native, while full virtualization performed ten times worse. So while Xen performs well for computation, the overheads introduced by Xen virtualization, especially full virtualization, may be significant for user applications.

For the checkpoint evaluation, it was found that the naïve checkpoint technique can be used for parallel applications similar to the PingPong test performed. In the future, the feasibility test could be refined and more checkpoint techniques could be investigated.


6.0 REFERENCES

[1] “Parallel Computing”, http://en.wikipedia.org/wiki/Parallel_computing, April 2007.

[2] Chao, Wellie. “The Pros and Cons of Virtual Machines in the Datacenter”, http://www.devx.com/vmspecialreport/Article/30383, January 2006.

[3] “The Difference Between Xen & VMware”, http://linux.inet.hr/the_difference_between_xen_and_vmware.html, November 2006.

[4] “GParted”, http://gparted.sourceforge.net/, April 2007.

[5] “UnixBench”, http://www.unixbench.org/, April 2007.

[6] “Dhrystone”, http://en.wikipedia.org/wiki/Dhrystone, April 2007.

[7] “Tower of Hanoi”, http://en.wikipedia.org/wiki/Tower_of_Hanoi, April 2007.

[8] “execl()”, http://mkssoftware.com/docs/man3/execl.3.asp, April 2007.

[9] Intel Corporation. “Intel Cluster Toolkit 3.0 for Linux”, April 2007.

[10] “Jailtime.org: Downloadable Images for Xen”, http://www.jailtime.org, April 2007.

[11] “QtParted”, http://qtparted.sourceforge.net/, April 2007.


APPENDICES

Appendix A: UnixBenchResultsParse.py

import sys

# Parse the UnixBench result files named on the command line and print,
# for each test, its average, standard deviation, and relative error.
testList = []

for filename in sys.argv[1:]:
    newList = []
    curFile = open(filename)
    fileList = curFile.readlines()
    curFile.close()
    start = False
    for line in fileList:
        # Results begin at the Dhrystone line and end at the Recursion Test line.
        if line.find('Dhrystone') != -1 and line.find('lps') != -1:
            start = True
        if start is True:
            line = line.replace(" lps", "lps").replace(" KBps", "KBps").replace(" lpm", "lpm")
            splitLine = line.split(" ")
            i = 0
            for eachItem in splitLine:
                if eachItem.find("lps") != -1 or eachItem.find("KBps") != -1 or \
                   eachItem.find("lpm") != -1:
                    number = splitLine[i]
                    newList.append([" ".join(splitLine[0:i]).rstrip(), number])
                    break
                i = i + 1
        if line.find('Recursion Test') != -1 and line.find('lps') != -1:
            break
    testList.append(newList)

numTests = len(testList)
for i in range(0, len(testList[0])):
    total = 0.0
    for j in range(0, len(testList)):
        number = float(testList[j][i][1].replace("lps", "").replace("KBps", "").replace("lpm", ""))
        total = total + number
    average = total / numTests
    total = 0.0   # reset before accumulating squared deviations
    for j in range(0, len(testList)):
        number = float(testList[j][i][1].replace("lps", "").replace("KBps", "").replace("lpm", ""))
        total = total + pow((number - average), 2)
    stddev = pow(total / numTests, 0.5)
    errorRange = 0.0
    if average != 0.0:
        errorRange = stddev / average
    print testList[0][i][0], "\t", "%.2f" % average, "\t", "%.2f" % stddev, "\t", "%.2f" % errorRange


Appendix B: genShellScript.py

test = "";

for i in range(10):

test = test + "mpirun -n 2 ./IMB-MPI1 | tee test" + str(i) + ".txt;";

print test

Appendix C: MPIResultsParse.py

import sys

# Parse IMB result files named on the command line and print, for each
# benchmark of interest, the first column of every row of its result table.
fullList = []
benchmark = ["PingPong", "PingPing", "Sendrecv"]

for current in range(0, len(benchmark)):
    testList = []
    for filename in sys.argv[1:]:
        newList = []
        curFile = open(filename)
        fileList = curFile.readlines()
        curFile.close()
        start1 = False   # seen the "Benchmarking <name>" banner
        start2 = False   # seen the header row containing "sec"
        i = 0            # row counter within the result table
        for line in fileList:
            if line.find(benchmark[current]) != -1 and line.find('Benchmarking') != -1:
                start1 = True
            if start2 is True:
                splitLine = line.replace("\n", "").split(" ")
                if len(splitLine) <= 3:
                    break                      # a short line ends the table
                splitLineMod = []
                for item in splitLine:
                    if item != '':
                        splitLineMod.append(item)
                value = splitLineMod[0]
                if len(newList) <= i:
                    newList.append(value)
                else:
                    newList[i] = value
                i = i + 1
            if start2 is False and start1 is True:
                if line.find("sec") != -1:     # header row precedes the data
                    start2 = True
                    i = 0
        testList.append(newList)
    fullList.append([benchmark[current], testList])

for result in fullList:
    print result[0]
    for run in result[1]:
        for value in run:
            print value

Appendix D: save.py

import sys, os

# Save every Xen domain named on the command line, in the background.
for i in sys.argv[1:]:
    os.system("xm save " + i + " &")

Appendix E: restore.py

import sys, os

# Restore every Xen domain named on the command line, in the background.
for i in sys.argv[1:]:
    os.system("xm restore " + i + " &")