
Porting of µCernVM to AArch64
Felix Scheffler, 23/09/2016
Main Supervisor: Jakob Blomer ([email protected])
2nd Supervisor: Gerardo Ganis ([email protected])

Background

Virtual Machines (VMs)
Virtualisation plays a vital role in computing. The aim is to distribute physical resources, e.g. CPU power, RAM, or disk space, among several virtual appliances. A so-called hypervisor is responsible for allocating and managing resources for several guest operating systems (OSs), i.e. VMs. Virtualisation is motivated by the ease of setting up new testing or production environments across different physical platforms or OSs, by the isolation of individual resources, and by improved efficiency [1]. In particular, the same physical resources can be used for different applications on demand. This is why virtualisation is an integral part of data centres world-wide. At CERN, hundreds of VMs are created and destroyed every hour [2]; this flexibility is impossible with physical machines.

Virtualisation in High Energy Physics (HEP)
VMs that comprise whole OSs are usually several GB in size, which makes them hard to distribute efficiently and slow to start. In HEP, this shortcoming is addressed by µCernVM [3]. The image to be distributed comprises a stripped-down Linux OS that connects to a CernVM-Filesystem (CVMFS) [4] repository residing on a dedicated web server. In contrast to “usual” VMs, anything needed from this repository is downloaded only on demand, aggressively cached, and eventually released again.

ARM and Virtualisation
ARM has been the market leader in mobile computing for several years and has recently started to enter the server market as well. In this segment, however, the predominant architecture is still x86-64. In 2011, ARM introduced virtualisation support with ARMv7 to increase competitiveness and to harness the benefits of virtualisation outlined above. Since then, considerable effort has been put into the development of a native Linux-based virtualisation solution. Because Linux runs on nearly every ARM device, building such a solution on existing Linux features greatly enhances portability and standardisation. The native hypervisor in Linux is its Kernel-based VM (KVM). With kernel version 3.9, the Linaro Enterprise Group (LEG) successfully merged a KVM-based implementation upstream [5]. Currently, LEG is also working on an OpenStack cloud [6] based on ARM’s first 64-bit (AArch64) architecture, ARMv8 [7]. Beyond operating at a small scale (e.g. on a single machine), this enables developers and users to create, test, and run VMs in a large-scale open-source cloud environment.

Project Motivation
ARM has a strong standing in mobile devices and computing, a market driven in particular by energy considerations. The HEP community is confronted with computations in the range of millions of jobs per day [8]; as a result, computing is not only a technical but also an economic challenge. In terms of performance-to-energy ratio, ARM already compares well with Intel and AMD [9], and even in raw performance, ARM is getting closer to x86 systems [5]. Porting µCernVM to ARM thus potentially opens up a new market to harness. Should ARM become established in the server market, it is desirable to have an HEP virtualisation solution for AArch64. As for HEP software (independent of virtualisation), CMSSW has already been ported to AArch64 [10].

Development environment
Today, the market for physical ARM hardware is still comparatively immature. In the server segment, HP is represented with the HPE ProLiant m400 (Moonshot) [11]; among small-scale development boards, the GeekBox [12] provides a good price-performance ratio. Both platforms are used for porting µCernVM to AArch64. The HPE ProLiant m400 ships with an AppliedMicro X-Gene 8-core 64-bit System-on-Chip (SoC) running at up to 2.4 GHz per core. The GeekBox comes with the RK3368, an 8-core 64-bit SoC with up to 1.5 GHz per core, produced by Rockchip.


Porting of µCernVM
The entire porting process is broken down into the following tasks: 1) compiling a custom Linux kernel, 2) creating the µCernVM image by combining the kernel with a custom initrd, and 3) compiling a minimum set of packages to set up a preliminary CVMFS repository.

1) Custom Linux kernel
µCernVM is based on a lightweight Linux kernel compiled from source. Together with the initrd and a small set of device drivers, it is about 15 MB in size and thus much smaller than a standard Scientific Linux 6 kernel (over 100 MB). Kernel configuration options are primarily taken from the existing x86-64 kernel; the remaining options are adjusted interactively through the Linux kernel build system.
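
A minimal sketch of such a build on an AArch64 host might look as follows (the config file name is a placeholder, not the actual µCernVM one):

  # start from the existing x86-64 configuration, let the build system fill in
  # architecture-specific defaults, then adjust the remainder interactively
  cp ucernvm-x86_64.config .config
  make ARCH=arm64 olddefconfig
  make ARCH=arm64 menuconfig
  make ARCH=arm64 -j$(nproc) Image modules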

2) µCernVM image
The kernel is combined with a custom initrd into a distributable image. The initrd contains BusyBox [13], CVMFS, a collection of Bash scripts, and a few additional packages. Its purpose is to boot only a preliminary OS that eventually loads the full OS by connecting to the preconfigured CVMFS repository. Since CVMFS is designed as a read-only filesystem, any user running µCernVM also needs a writable scratch area. This disk space is hosted on the local storage set up by the initrd, and both layers are combined through a union file system, which in the case of µCernVM is AUFS [14]. The fundamental difference between the AArch64 and x86-64 distributions of µCernVM is the system startup: AArch64 uses the UEFI/GPT standards instead of BIOS/MBR (see Appendix). Beyond that, the virtualised boot process follows the higher-level software stack of the x86-64 µCernVM framework, so other parts of the initrd, e.g. contextualisation, did not need to be adapted.
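
Conceptually, the union mount performed by the initrd can be sketched like this (the paths are illustrative, not the ones actually used by µCernVM):

  # writable scratch layer on local disk, read-only OS tree delivered via CVMFS
  mount /dev/vda1 /scratch
  mount -t aufs -o br=/scratch=rw:/cvmfs/cernvm.cern.ch=ro none /newroot
  # /newroot now presents the CVMFS-provided OS; all writes land in /scratch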

3) CVMFS repository
The CVMFS repository set up as a test environment for this project is a customised CentOS 7 installation¹. Currently, the environment is bound to a limited set of packages, sufficient to run CMS and ROOT in a command-line user interface.
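
For reference, a repository of this kind is typically created and filled on a CVMFS release-manager machine along the following lines (the repository name is a placeholder):

  cvmfs_server mkfs test.cern.ch          # create an empty repository
  cvmfs_server transaction test.cern.ch   # open it for writing
  # ... install the CentOS 7 aarch64 base packages under /cvmfs/test.cern.ch ...
  cvmfs_server publish test.cern.ch       # sign and publish the new revision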

Benchmarking
To compare VM and host runtime performance, a subset of ROOT6 [15] and CMS² benchmarks is run natively and virtualised. Results are shown in Figure 1. As expected, VM performance is worse than host performance in all cases. To help pinpoint where performance is lost, a low-level I/O benchmark is performed as well. Figure 2 shows that especially network bandwidth is significantly lower for the VM³. This is remarkable since network paravirtualisation, i.e. virtio [16], is enabled, and it requires further investigation. However, since the CMS benchmark is run with a warm cache, it is assumed that network performance is not a major driver of this result. For serial disk I/O, caching effects are reduced by issuing an fsync() call before measuring the actual throughput⁴; in addition, the VM is configured such that caching on the host is eliminated⁵. Apart from the network, no other devices are paravirtualised. In particular, disk virtio drivers only start to pay off with multiple threads [17]; for single-threaded workloads (as with dd), the effect of using virtio is thus expected to be negligible, which is also verified experimentally (data not shown). It can further be assumed that the comparably bad CMS result is not connected to the use of disk virtio drivers. This can be concluded from Figure 1 (CMS2 and CMS3), where the same CMS benchmark is run in parallel (2 runs) on a VM with bus=scsi and on one with bus=virtio.

¹ The upstream repositories can be found at http://mirror.centos.org/altarch/7/os/aarch64/Packages/.
² Code obtained from https://github.com/cvmfs/cvmfs/test
³ Network performance was measured using iperf3.
⁴ time dd bs=1M count=4000 if=/dev/zero of=test.log conv=fsync
⁵ virt-install allows setting the cache value to either ‘none’, ‘writeback’ or ‘writethrough’. In this test case, ‘none’ was chosen (which is also configured to be the default value).
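
The low-level measurements thus boil down to command pairs of the following shape (the address is a placeholder):

  # network bandwidth (footnote 3): iperf3 server on the host, client in the VM
  iperf3 -s                 # on the host
  iperf3 -c 192.168.122.1   # inside the VM
  # serial disk throughput with reduced caching (footnote 4)
  time dd bs=1M count=4000 if=/dev/zero of=test.log conv=fsync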


Figure 1: Comparison of ROOT6 and CMS benchmarks run on AArch64 µCernVM and host

Figure 2: Low-level I/O benchmark


Future work and conclusions
Following the successful porting of µCernVM to AArch64, the next step is to get the image running on cloud infrastructure, preferably OpenStack. In addition, it is recommended to run further benchmarks under varying conditions (VM or host configurations, load configurations, parallel threads, etc.) to gain more insight into the current bottlenecks. With regard to porting µCernVM to other architectures, we now have experience and empirical data on how much effort is involved. The entire work is merged upstream.

Acknowledgements
Special thanks to TechLab [18] for providing access to ARM64 infrastructure.

Appendix

Porting µCernVM to IA-32
Beyond the AArch64 port, µCernVM is also ported to the Intel 32-bit architecture (IA-32). This is motivated by bringing Test4Theory (also known as Virtual LHC@Home or LHC@home 2.0) [19] to the latest µCernVM framework. This affects 15,000+ users and about 30,000 machines [20], of which around 100 to 200 are connected at any given point in time [21].

UEFI boot process
To manage µCernVM instances on AArch64, QEMU is used as the hypervisor, with KVM as accelerator. Upon booting, the virtualisation software sitting on top of KVM (in our case libvirt and virt-install) loads an architecture-dependent firmware image. With CentOS 7 and libvirt, this image is located at /usr/share/AAVMF/AAVMF_CODE.fd, and NVRAM variables are stored in /usr/share/AAVMF/AAVMF_VARS.fd. Both files ship in the upstream repositories and are thus automatically available once the packages are installed. AAVMF is essentially a port of OVMF, which enables UEFI support for VMs on x86-64 systems [22], to AArch64.

In compliance with the UEFI specification [23], µCernVM is required to be distributed as an EFI System Partition (ESP), essentially a partition formatted with a FAT32 variant. Together with the EFISTUB boot option (CONFIG_EFI_STUB=y), the Linux kernel can then be treated as just another UEFI application to be executed. After loading a predefined chain of UEFI images/applications, the last application in this chain is the UEFI shell, which is started in the root directory of the ESP. In compliance with the UEFI shell specification [24], the shell searches for a file called startup.nsh. It contains a command line (<path-to-kernel-image> initrd=<path-to-initrd> other-kernel-command-line-parameters) that can be interpreted by AAVMF. Based on this command line, AAVMF starts the kernel with the initrd and other (optional) parameters. All three components, i.e. kernel, initrd and startup.nsh, are located in the root of the ESP.
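
A minimal ESP could therefore look as follows (file names and kernel parameters are illustrative, not the actual µCernVM ones):

  vmlinuz       # EFISTUB-enabled kernel (CONFIG_EFI_STUB=y)
  initrd.img    # the custom initrd
  startup.nsh   # one line, e.g.: vmlinuz initrd=initrd.img console=ttyAMA0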

The command for starting the VM is:

  virt-install -n <VM name> --boot uefi --memory <RAM in MB> --vcpus <no. of CPUs> --cpu host --disk path=<path-to-hdd>,format=raw --cdrom <path-to-contextualisation-iso> --virt-type kvm --accelerate

Note that the initrd also needs to take care of possibly repairing a GPT-partitioned disk prior to mounting CVMFS. This is because cloud providers usually resize the image with a simple

  dd if=/dev/zero of=<path-to-hdd> bs=<bs> count=0 seek=<seek>; dd if=<path-to-cernvm-image> of=<path-to-hdd>

While this does not pose a problem for MBR-partitioned disks, it certainly does for GPT-partitioned ones. In this case, two actions need to be taken: first, the secondary GPT table needs to be moved to the (new) end of the disk; second, the primary table needs to be updated with the corresponding position of the secondary table. Otherwise, the newly created space cannot be used. This pitfall was resolved by issuing sgdisk -e <path-to-hdd>.
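
A minimal sketch of this repair step inside the initrd, assuming the boot disk appears as /dev/vda and that sgdisk and partprobe are available there:

  # relocate the secondary GPT table to the new end of the disk and update the
  # primary table to point at it, then re-read the now-consistent layout
  sgdisk -e /dev/vda
  partprobe /dev/vda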


References

[1] J. Shuja, A. Gani, K. Bilal, A. U. R. Khan, S. A. Madani, S. U. Khan and A. Y. Zomaya, “A Survey of Mobile Device Virtualization: Taxonomy and State of the Art,” ACM Computing Surveys, vol. 49, no. 1, April 2016.

[2] CERN, “Data Centre (in numbers),” [Online]. Available: http://information-technology.web.cern.ch/about/computer-centre. [Accessed 20 September 2016].

[3] J. Blomer, D. Berzano, P. Buncic, I. Charalampidis, G. Ganis, G. Lestaris, R. Meusel and V. Nicolaou, “Micro-CernVM: slashing the cost of building and deploying virtual machines,” Journal of Physics: Conference Series, vol. 513, no. 032009, pp. 1-7, 2014.

[4] CERN, “CernVM File System (CernVM-FS),” [Online]. Available: https://cernvm.cern.ch/portal/filesystem. [Accessed 20 September 2016].

[5] C. Dall and J. Nieh, “KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor,” March 2014. [Online]. Available: http://systems.cs.columbia.edu/files/wpid-asplos2014-kvm.pdf. [Accessed 5 September 2016].

[6] Linaro Enterprise Group, “Linaro announces ARM Based Developer Cloud,” 7 March 2016. [Online]. Available: http://www.linaro.org/news/linaro-announces-arm-based-developer-cloud-2/. [Accessed 20 September 2016].

[7] ARM Ltd., “ARMv8-A Architecture,” [Online]. Available: http://www.arm.com/products/processors/armv8-architecture.php. [Accessed 5 September 2016].

[8] CERN, “Computing,” [Online]. Available: https://home.cern/about/computing. [Accessed 18 September 2016].

[9] B. Tudor and Y. M. Teo, “On understanding the energy consumption of ARM-based multicore servers,” SIGMETRICS Perform. Eval. Rev., vol. 41, no. 1, pp. 267-278, 2013.

[10] D. Abdurachmanov, “ARM64/AArch64 for Scientific Computing at the CERN CMS Particle Detector,” September 2015. [Online]. Available: https://indico.cern.ch/event/443246/contributions/1098100/attachments/1154061/1658004/Linaro.SFO.Preparation.Talk.Draft.pdf. [Accessed 20 September 2016].

[11] Hewlett Packard Enterprise, “HPE ProLiant m400 Server Cartridge,” [Online]. Available: http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=7398907. [Accessed 6 September 2016].

[12] GeekBox, “GeekBox - The pioneering versatile open source TV box,” [Online]. Available: http://www.geekbox.tv/. [Accessed 20 September 2016].

[13] R. Landley, B. Reutner-Fischer and D. Vlasenko, “BusyBox: The Swiss Army Knife of Embedded Linux,” [Online]. Available: https://busybox.net/about.html. [Accessed 20 September 2016].


[14] J. R. Okajima, “AUFS,” [Online]. Available: http://aufs.sourceforge.net/. [Accessed 12 September 2016].

[15] R. Brun and F. Rademakers, “Running the ROOT benchmark suite,” CERN, 8 September 2006. [Online]. Available: https://root.cern.ch/root/Benchmark.html. [Accessed 7 September 2016].

[16] M. T. Jones, “Virtio: An I/O virtualization framework for Linux - Paravirtualized I/O with KVM and lguest,” 29 January 2010. [Online]. Available: http://www.ibm.com/developerworks/library/l-virtio/. [Accessed 15 September 2016].

[17] K. Huynh and S. Hajnoczi, “KVM / QEMU Storage Stack Performance Discussion,” IBM, 3 November 2010. [Online]. Available: http://www.ibm.com/support/knowledgecenter/linuxonibm/liaav/LPCKVMSSPV2.1.pdf. [Accessed 15 September 2016].

[18] CERN, “TechLab,” [Online]. Available: https://twiki.cern.ch/twiki/bin/viewauth/IT/TechLab. [Accessed 20 September 2016].

[19] CERN, “Test4Theory,” [Online]. Available: http://lhcathome.web.cern.ch/projects/test4theory. [Accessed 20 September 2016].

[20] CERN, “Test4Theory - Server status,” [Online]. Available: http://lhcathome2.cern.ch/vLHCathome/server_status.php. [Accessed 20 September 2016].

[21] CERN, “MC Production,” [Online]. Available: http://mcplots-dev.cern.ch/production.php. [Accessed 21 September 2016].

[22] L. Ersek, “Open Virtual Machine Firmware (OVMF) Status Report,” July 2014. [Online]. Available: http://www.linux-kvm.org/downloads/lersek/ovmf-whitepaper-c770f8c.txt. [Accessed 20 September 2016].

[23] UEFI.org, “Unified Extensible Firmware Interface Specification,” January 2016. [Online]. Available: http://www.uefi.org/sites/default/files/resources/UEFI%20Spec%202_6.pdf. [Accessed 12 September 2016].

[24] UEFI.org, “UEFI Shell Specification,” 26 January 2016. [Online]. Available: http://www.uefi.org/sites/default/files/resources/UEFI_Shell_2_2.pdf. [Accessed 12 September 2016].