Thought Leadership White Paper
IBM Systems and Technology Group November 2012
Could the “C” in HPC stand for Cloud?
By Christopher N. Porter, IBM Corporation
Introduction
Most IaaS (infrastructure as a service) vendors such as Rackspace, Amazon and Savvis use various virtualization technologies to manage the underlying hardware they build their offerings on. Unfortunately, the virtualization technologies used vary from vendor to vendor and are sometimes kept secret. Therefore, the question of virtual machines versus physical machines for high performance computing (HPC) applications is germane to any discussion of HPC in the cloud.
This paper examines aspects of computing important in HPC (compute and network bandwidth, compute and network latency, memory size and bandwidth, I/O, and so on) and how they are affected by various virtualization technologies. The benchmark results presented will illuminate areas where cloud computing, as a virtualized infrastructure, is sufficient for some workloads and inappropriate for others. In addition, the paper provides a quantitative assessment of the performance differences between a sample of applications running on various hypervisors, so that data-based decisions can be made for datacenter and technology adoption planning.
A business case for HPC clouds
HPC architects have been slow to adopt virtualization technologies for two reasons:
1. The common assumption that virtualization impacts application performance so severely that any gains in flexibility are far outweighed by the loss of application throughput.
2. Utilization on traditional HPC infrastructure is very high (between 80 and 95 percent). Therefore, the typical driving business cases for virtualization (for example, utilization of hardware, server consolidation or license utilization) simply did not hold significant enough merit to justify the added complexity and expense of running workload on virtualized resources.
In many cases, however, HPC architects would be willing to lose some small percentage of application performance to achieve the flexibility and resilience that virtual machine based computing would allow. There are several reasons architects may make this compromise, including:
• Security: Some HPC environments require data and host isolation between groups of users or even between the users themselves. In these situations VMs and VLANs can be used in concert to isolate users from each other and to restrict data to the users who should have access to it.
• Application stack control: In a mixed application environment where multiple applications share the same physical hardware, it can be difficult to satisfy the configuration requirements of each application, including OS versions, updates and libraries. Using virtualization makes that task easier, since the whole stack can be deployed as part of the application.
• High value asset maximization: In a heterogeneous HPC system the newest machines are often in highest demand. To manage this demand, some organizations use a reservation system to minimize conflicts between users. When using VMs for computing, however, the migration facility available within
most hypervisors allows opportunistic workloads to use high value assets even after a reservation window opens for a different user. If the reserving user submits workload against a reservation, then the opportunistic workload can be migrated to other assets to continue processing without losing any CPU cycles.
• Utilization improvement: If the losses in application performance are very small (single digit percentages), then adoption of virtualization technology may enable incremental steps forward in overall utilization in some cases. In these cases, virtualization may offer an increase in overall throughput for the HPC environment.
• Large execution time jobs: Several HPC applications offer no checkpoint restart capability. VM technology, however, can capture and checkpoint the entire state of the virtual machine, allowing these applications to be checkpointed. If jobs run long enough to approach the MTBF of the solution as a whole, then the checkpoint facility available within virtual machines may be very attractive. Additionally, if server maintenance is a common or predictable occurrence, then checkpoint migration or suspension of a long running job within a VM could prevent loss of compute time (see the sketch following this list).
• Increases in job reliability: Virtual machines, if used on a 1:1 basis with batch jobs (meaning each job runs within a VM container), provide a barrier between their own environment, the host environment and any other virtual machine environments running on the hypervisor. As such, “rogue” jobs which try to access more memory or CPU cores than expected can be isolated from well behaved jobs that were allocated resources as expected. Without virtual machine containment, jobs sharing a physical host often cause problems in the form of slowdowns, swapping or even OS crashes.
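
As an illustration of the checkpoint facility mentioned above, the following Python sketch uses the libvirt bindings to save the full state of a running KVM guest to disk and later restore it. The domain name and state file path are hypothetical, and this is only a minimal sketch of the hypervisor capability, not the management tooling discussed below.

# Minimal sketch: save and restore a running VM's state with libvirt.
# The domain name and state file path are hypothetical example values.
import libvirt

STATE_FILE = "/var/lib/libvirt/save/hpc-job-vm.sav"

def checkpoint_vm(domain_name: str) -> None:
    """Write the VM's memory and device state to disk; the VM stops afterwards."""
    conn = libvirt.open("qemu:///system")
    try:
        dom = conn.lookupByName(domain_name)
        dom.save(STATE_FILE)
    finally:
        conn.close()

def restore_vm() -> None:
    """Resume the VM from the previously saved state file."""
    conn = libvirt.open("qemu:///system")
    try:
        conn.restore(STATE_FILE)
    finally:
        conn.close()

if __name__ == "__main__":
    checkpoint_vm("hpc-job-vm")   # for example, just before host maintenance
    # ... perform maintenance, or copy the state file to another host ...
    restore_vm()                  # the long running job resumes where it left off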
Management tools
Achieving HPC in a cloud environment requires a few well chosen tools, including a hypervisor platform, a workload manager and an infrastructure management toolkit. The management toolkit provides policy definition, enforcement, provisioning management, resource reservation and reporting. The hypervisor platform provides the foundation for the virtual portion of cloud resources, and the workload manager provides the task management.
The cloud computing management tools of IBM® Platform Computing™—IBM® Platform™ Cluster Manager – Advanced Edition and IBM® Platform™ Dynamic Cluster—turn static clusters, grids and datacenters into dynamic shared computing environments. The products can be used to create private internal clouds or hybrid private clouds, which use external public clouds for peak demand. This is commonly referred to as “cloud bursting” or “peak shaving.”
Platform Cluster Manager – Advanced Edition creates a cloud computing infrastructure to efficiently manage application workloads applied to multiple virtual and physical platforms. It does this by uniting diverse hypervisor and physical environments into a single dynamically shared infrastructure. Although this document describes the properties of virtual machines, Platform Cluster Manager – Advanced Edition is not in any way limited to managing virtual machines. It unlocks the full computing potential lying dormant in existing heterogeneous virtual and physical resources according to workload-intelligent and resource-aware policies.
Platform Cluster Manager – Advanced Edition optimizes infrastructure resources dynamically based on perceived demand and critical resource availability using an API or a web interface. This allows users to enjoy the following business benefits:
• Resource utilization is improved by eliminating silos
• Batch job wait times are reduced because of additional resource availability or flexibility
• Users perceive a larger resource pool
• Administrator workload is reduced through multiple layers of automation
• Power consumption and server proliferation are reduced
Subsystem benchmarks
Hardware environment and settings
KVM and OVM testing
Physical hardware: (2) HP ProLiant BL465c G5 with dual socket quad core AMD 2382 + AMD-V and 16 GB RAM
OS installed: RHEL 5.5 x86_64
Hypervisor(s): KVM in RHEL 5.5, OVM 2.2, RHEL 5.5 Xen (para-virtualized)
Number of VMs per physical node: Unless otherwise noted, benchmarks were run on a 4 GB memory VM.
Interconnects: The interconnect between VMs or hypervisors was never used to run the benchmarks. The hypervisor hosts were connected to a 1000baseT network.
Citrix Xen testing
Physical hardware: (2) HP ProLiant BL2x220c in a c3000 chassis with dual socket quad core 2.83 GHz Intel® CPUs and 8 GB RAM
OS installed: CentOS Linux 5.3 x86_64
Storage: Local disk
Hypervisor: Citrix Xen 5.5
VM configuration: (Qty 1) 8 GB VM with 8 cores, (Qty 2) 4 GB VMs with 4 cores, (Qty 4) 2 GB VMs with 2 cores, (Qty 8) 1 GB VMs with 1 core
NetPIPE
NetPIPE is an acronym that stands for Network Protocol Independent Performance Evaluator.[1] It is a useful tool for measuring two important characteristics of networks: latency and bandwidth. HPC application performance is becoming increasingly dependent on the interconnect between compute servers. Because of this trend, not only does parallel application performance need to be examined, but also the performance level of the network alone, from both the latency and the bandwidth standpoints.
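
To make the two quantities concrete, the sketch below measures them in the simplest possible way: round-trip time for small messages and streaming rate for large ones over a plain TCP socket. It is not NetPIPE itself; the peer host, port and message sizes are arbitrary values assumed for the example, and the peer is assumed to run a simple echo service.

# Rough TCP ping-pong illustration of latency and bandwidth; not NetPIPE itself.
# The peer host, port and message sizes are arbitrary example values.
import socket
import time

HOST, PORT = "peer.example.com", 5000        # hypothetical peer running an echo service
SMALL, LARGE = 64, 4 * 1024 * 1024           # 64 B for latency, 4 MB for bandwidth

def ping_pong(size: int, rounds: int) -> float:
    """Return the average round-trip time in seconds for echoes of `size` bytes."""
    msg = b"x" * size
    with socket.create_connection((HOST, PORT)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            s.sendall(msg)
            received = 0
            while received < size:            # wait for the complete echo
                chunk = s.recv(size - received)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received += len(chunk)
        return (time.perf_counter() - start) / rounds

latency_us = ping_pong(SMALL, 100) / 2 * 1e6              # one-way latency estimate
rtt_large = ping_pong(LARGE, 10)
bandwidth_mbps = (2 * LARGE * 8) / rtt_large / 1e6         # bits moved per round trip
print(f"latency ~{latency_us:.1f} microseconds, bandwidth ~{bandwidth_mbps:.0f} Mbit/s")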
The terms used for each data series in this section are defined as follows:
• no_bkpln: Refers to communications happening over a 1000baseT Ethernet network
• same_bkpln: Refers to communications traversing a backplane within a blade enclosure
• diff_hyp: Refers to virtual machine to virtual machine communication occurring between two separate physical hypervisors
• pm2pm: Physical machine to physical machine
• vm2pm: Virtual machine to physical machine
• vm2vm: Virtual machine to virtual machine
Figures 1 and 2 illustrate that the closer the two communicating entities are, the higher the bandwidth and the lower the latency between them. Additionally, they show that when there is a hypervisor layer between the entities, the communication is slowed only slightly, and latencies stay in the expected range for 1000baseT communication (60 - 80 µsec). When two different VMs on separate hypervisors communicate—even when the backplane is within the blade chassis—the latency is more than double. The story gets even worse (by about 50 percent) when the two VMs do not share a backplane and communicate over TCP/IP.
This benchmark illustrates that not all HPC workloads are suitable for a virtualized environment. When applications run in parallel and are latency sensitive (as many MPI based applications are), virtualized resources should be avoided. If there is no choice but to use virtualized resources, then the scheduler must have the ability to choose resources that are adjacent to each other on the network, or the performance is likely to be unacceptable. This conclusion also applies to transactional applications where latency can be the largest part of the ‘submit to receive cycle time.’
Figure 1: Network bandwidth between machines
Figure 2: Network latency between machines
IOzone
IOzone is a file system benchmarking tool which generates and measures a variety of file operations.[2] In this benchmark, IOzone was only run for write, rewrite, read and reread, to mimic the most popular functions an I/O subsystem performs.
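
As a rough illustration of what the write and read phases of such a test measure (this is not IOzone, and the scratch path, block size and file size are arbitrary values chosen for the example), a sequential throughput check can be sketched as follows:

# Simplified sequential write/read throughput check; illustrative only, not IOzone.
# The scratch path, block size and file size are arbitrary example values.
import os
import time

PATH = "/scratch/iotest.bin"
BLOCK = 1024 * 1024                   # write and read in 1 MB blocks
FILE_SIZE = 1024 * 1024 * 1024        # 1 GB test file
block = os.urandom(BLOCK)

start = time.perf_counter()
with open(PATH, "wb") as f:
    for _ in range(FILE_SIZE // BLOCK):
        f.write(block)
    f.flush()
    os.fsync(f.fileno())              # force the data to disk, not just the page cache
write_mb_s = FILE_SIZE / (time.perf_counter() - start) / 1e6

start = time.perf_counter()
with open(PATH, "rb") as f:
    while f.read(BLOCK):
        pass
read_mb_s = FILE_SIZE / (time.perf_counter() - start) / 1e6

os.remove(PATH)
print(f"write ~{write_mb_s:.0f} MB/s, read ~{read_mb_s:.0f} MB/s")

A small file like this would be served largely from the page cache on the read pass; the 32 GB files used in Figures 3 and 4 are larger than the hosts’ memory, which keeps the results dominated by the disk and the hypervisor’s I/O path.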
This steady state I/O test clearly demonstrates that KVM hypervisors are severely lacking when it comes to I/O to disk, in both reads and writes. Even in the OVM case, in a best case scenario the performance of the I/O is nearing 40 percent degradation. Write performance for Citrix Xen is also limited. However, read performance exceeds that of the physical machine by over 7 percent. This can only be attributed to a read-ahead function in Xen, which worked better than the native Linux read-ahead algorithm.
Figure 3: IOzone 32 GB file (Local disk)
Figure 4: IOzone 32 GB file (Local disk)
Regardless, this benchmark, more than others, provides a warning to early HPC cloud adopters of the performance risks of virtual technologies. HPC users running I/O bound applications (Nastran, Gaussian, certain types of ABAQUS jobs, and so on) should steer clear of virtualization until these issues are resolved.
Application benchmarks
Software compilation
Compiler used: gcc-4.1.2
Compilation target: Linux kernel 2.6.34 (with the ‘defconfig’ option).
All transient files were put in a run specific subdirectory using the ‘O’ option in make. Thus the source is kept in a read-only state and writes go into the run specific subdirectory.
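
A minimal sketch of how such a timed, out-of-tree build can be scripted is shown below. The source and build paths and the job count are placeholder values; the ‘O=’ argument is what keeps the source tree read-only while all generated files land in the run specific directory.

# Timed out-of-tree kernel build; paths and job count are placeholder values.
import os
import subprocess
import time

SRC = "/nfs/src/linux-2.6.34"         # read-only kernel source tree (placeholder path)
OUT = "/scratch/kbuild-run-01"        # run specific build directory (placeholder path)

def make(*args: str) -> None:
    subprocess.run(["make", f"O={OUT}", *args], cwd=SRC, check=True)

os.makedirs(OUT, exist_ok=True)
make("defconfig")                     # generate the default configuration in OUT
start = time.perf_counter()
make("-j8")                           # compile with eight parallel jobs
print(f"kernel build took {time.perf_counter() - start:.1f} s")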
Figure 5 shows the difference in compilation performance for a physical machine running a compile on an NFS volume compared to Citrix Xen doing the same thing on the same NFS volume. Citrix Xen is roughly 11 percent slower than the physical machine performing the task. Also included is the difference between compiling to a local disk target versus compiling to the NFS target on the physical machine. The results illustrate how NFS performance can significantly affect a job’s elapsed time. This is of crucial importance because most virtualized private cloud implementations use NFS as the file system instead of local drives, in order to facilitate migration.
SIMULIA® Abaqus
SIMULIA® Abaqus[3] is the standard of the manufacturing industry for implicit and explicit non-linear finite element solutions. SIMULIA publishes a benchmark suite that hardware vendors use to distinguish their products.[4] The “e2” and “s6” models were used for these benchmarks.
Figure 5: Compilation of kernel 2.6.34
Figure 6: Parallel ABAQUS explicit (e2.inp)
The ABAQUS explicit distributed parallel runs were performed using HP MPI (2.03.01), and scratch files were written to local scratch disk. This comparison, unlike the others presented in this paper, was done in two different ways:
1. The data series called “Citrix” is for a single 8 GB RAM VM with 8 cores, where the MPI ranks communicated within a single VM.
2. The data series called “Citrix – Different VMs” represents multiple separate VMs defined on the hypervisor host intercommunicating.
Figure 7: Parallel ABAQUS standard (s6.inp)
As expected, the additional layers of virtualized networking slowed the communication speeds (also shown in the NetPIPE results) and reduced scalability when the job had higher rank counts. For communications within a single VM, on the other hand, the performance of the virtual machine was almost identical to that of the physical machine.
ABAQUS has a different algorithm for solving the implicit Finite Element Analysis (FEA) problem, called “ABAQUS Standard.” This method does not run distributed parallel, but can be run SMP parallel, which was done for the “s6” benchmark.
Figure 8: Serial FLUENT 12.1
Typically, ABAQUS Standard does considerably more I/O to scratch disk than its explicit counterpart. However, this is dependent upon the amount of memory available in the execution environment. It is clear again that when an application is only CPU or memory constrained, a virtual machine has almost no detectable performance impact.
ANSYS® FLUENT
ANSYS® FLUENT[5] belongs to a large class of HPC applications referred to as computational fluid dynamics (CFD) codes. The “aircraft_2m” FLUENT model was selected based on size and run for 25 iterations. The “sedan_4m” model was chosen as a suitably sized model for running in parallel. One hundred iterations were performed using this model.
Figure 9: Distributed parallel FLUENT 12.1 (sedan_4m - 100 iterations)
Though CFD codes such as FLUENT are rarely run serially, because of memory requirements or solution time requirements, the comparison in Figure 8 shows that the solution times for a physical machine and a virtual machine differ by only 1.9 percent, with the virtual machine being the slower of the two. The “aircraft_2m” model was simply too small to scale well in parallel and provided strangely varying results, so the sedan_4m model was used instead.[6]
The results for the parallel case (Figure 9) illustrate that at two CPUs the virtual machine outperforms the physical machine. This is most likely caused by the native Linux scheduler moving processes around on the physical host. If the application had been bound to particular cores, then this effect would disappear. In the four and eight CPU runs, the difference between physical and virtual machines is negligible. This supports the theory that the Linux CPU scheduler is impacting the two CPU job.
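
On Linux, that kind of core binding can be expressed directly. The sketch below is a generic illustration (not what the benchmark harness actually did) of pinning the current process, and anything it later launches, to two specific cores using the standard library; the core numbers are arbitrary for the example.

# Pin the current process to fixed cores so the scheduler cannot migrate it.
# Generic Linux illustration; the core numbers are arbitrary example values.
import os

os.sched_setaffinity(0, {0, 1})                 # pid 0 means "this process"
print("allowed cores:", sorted(os.sched_getaffinity(0)))

# A solver process launched from here inherits the same affinity mask,
# so its work stays on cores 0 and 1 instead of being moved by the scheduler.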
LS-DYNA®
LS-DYNA®[7] is a transient dynamic finite element analysis program capable of solving complex real world time domain problems on serial, SMP parallel, and distributed parallel computational engines. The “refined_neon_30ms” model was chosen for the benchmarks reviewed in this section. HP MPI 2.03.01, now owned by IBM Platform Computing, was the message passing library used.
Figure 10: LS-DYNA - MPP971 - Refined Neon
The MPP-DYNA application responds well when run in a low latency environment. This benchmark supports the notion that distributed parallel LS-DYNA jobs are still very sensitive to network latency, even when using the backplane of a VM. A serial run shows the virtual machine is 1 percent slower. Introduce message passing, however, and at eight CPUs the virtual machine is nearly 40 percent slower than the physical machine. The expectation is that if the same job were run on multiple VMs, as was done for the ABAQUS explicit parallel jobs, the effect would be even greater, with physical machines significantly outperforming virtual machines.
Conclusion
As with most legends, there is some truth to the notion that VMs are inappropriate for HPC applications. The benchmark results demonstrate that latency sensitive and I/O bound applications would perform at levels unacceptable to HPC users. However, the results also show that CPU and memory bound applications, and parallel applications that are not latency sensitive, perform well in a virtual environment. HPC architects who dismiss virtualization technology entirely may therefore be missing an enormous opportunity to inject flexibility and even a performance edge into their HPC designs.
The power of Platform Cluster Manager – Advanced Edition and IBM® Platform™ LSF® is their ability to work in concert to manage both of these types of workload simultaneously in a single environment. These tools allow their users to maximize resource utilization and flexibility through provisioning and control at the physical and virtual levels. Only IBM Platform Computing technology allows for environment optimization at the job-by-job level, and only Platform Cluster Manager – Advanced Edition continues to optimize that environment after jobs have been scheduled and new jobs have been submitted. Such an environment could realize orders of magnitude increases in efficiency and throughput while reducing the overhead of IT maintenance.
Significant results
• The KVM hypervisor significantly outperforms the OVM hypervisor on AMD servers, especially when several VMs run simultaneously.
• Citrix Xen I/O reads and rereads are very fast on Intel servers.
• OVM outperforms KVM by a significant margin for I/O intensive applications running on AMD servers.
• I/O intensive and latency sensitive parallel applications are not a good fit for virtual environments today.
• Memory and CPU bound applications are at performance parity between physical and virtual machines.
For more information
To learn more about IBM Platform Computing, please contact your IBM marketing representative or IBM Business Partner, or visit the following website: ibm.com/platformcomputing
© Copyright IBM Corporation 2012
IBM Corporation, Systems and Technology Group, Route 100, Somers, NY 10589
Produced in the United States of America, November 2012
IBM, the IBM logo, ibm.com, Platform Computing, Platform Cluster Manager, Platform Dynamic Cluster and Platform LSF are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.
The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.
Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.
[1] http://www.scl.ameslab.gov/netpipe/
[2] http://www.iozone.org/
[3] ABAQUS is a trademark of Simulia and Dassault Systemes (http://www.simulia.com)
[4] See http://www.simulia.com/support/v67/v67_performance.html for a description of the benchmark models and their availability
[5] Fluent is a trademark of ANSYS, Inc. (http://www.fluent.com)
[6] The largest model provided by ANSYS, “truck_14m”, was not an option for this benchmark as the model was too large to fit into memory.
[7] LS-DYNA is a trademark of LSTC (http://www.lstc.com/)
Please Recycle
DCW03038-USEN-0