migration techniques in hpc environments - stefan's techblog · migration techniques in hpc...

35
Migration Techniques in HPC Environments Simon Pickartz 1 , Ramy Gad 2 , Stefan Lankes 1 , Lars Nagel 2 , Tim Süß 2 , André Brinkmann 2 , and Stephan Krempel 3 1 RWTH Aachen University 2 Johannes Gutenberg Universität 3 ParTec Cluster Competence Center GmbH In Conjunction with the FAST Project 08/26/2014

Upload: others

Post on 16-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Migration Techniques in HPC Environments

Simon Pickartz1, Ramy Gad2, Stefan Lankes1, Lars Nagel2,Tim Süß2, André Brinkmann2, and Stephan Krempel3

1RWTH Aachen University 2Johannes Gutenberg Universität3ParTec Cluster Competence Center GmbH

In Conjunction with the FAST Project

08/26/2014

Agenda

The FAST Project

Migration in HPC EnvironmentsProcess-levelVirtualizationContainer-based

EvaluationOverheadMigration Time

Conclusion

08/26/2014 | ACS Automation of Complex Power Systems | 2Simon Pickartz

The FAST Project

08/26/2014 | ACS Automation of Complex Power Systems | 3Simon Pickartz

Exascale Systems

Characteristics/AssumptionsIncreasing amount of cores per nodeIncrease of CPU performance will not be matched by I/O performance

→ Most applications cannot exploit this parallelismConsequences

Exclusive node assignment has to be revokedDynamic scheduling during runtime will be necessary

→ Migration of processes between nodes indispensable

08/26/2014 | ACS Automation of Complex Power Systems | 4Simon Pickartz

Find a Suitable Topology forExascale Applications

A twofold scheduling approach1. Initial placement of job by a global

scheduler according to KPIsLoad of the individual nodesPower consumptionApplications’ resource requirements

2. Dynamic runtime adjustmentsMigration of processesTight coupling with the applicationsFeedback to the global scheduler

Cent

ralS

ched

uler

...

Agen

tAg

ent

Agen

t

App 1

App 2

App 1

App 1

App 2

App 2

08/26/2014 | ACS Automation of Complex Power Systems | 5Simon Pickartz

Migration in HPC Environments

08/26/2014 | ACS Automation of Complex Power Systems | 6Simon Pickartz

Process-level Migration

Process

OS

Host A

Process

OS

Host B

ProcessProcessProcess

IB Fabric

08/26/2014 | ACS Automation of Complex Power Systems | 7Simon Pickartz

Process-level Migration

Move a process from one node to anotherIncluding its execution context(i. e., register state and physical memory)

A special kind of Checkpoint/Restart (C/R) operationSeveral frameworks available

Condor’s checkpoint librarylibckptBerkley Lab Checkpoint/Restart (BLCR)

08/26/2014 | ACS Automation of Complex Power Systems | 8Simon Pickartz

Berkley Lab Checkpoint/Restart

Specifically designed for HPC applicationsTwo components

Kernel module for performing the C/R operationShared-library enabling user-space access

Cooperation with the C/R procedure via callback interfaceClose/open file descriptorsTear down/reestablish communication channels. . .

→ Residual dependencies that have to be resolved!

08/26/2014 | ACS Automation of Complex Power Systems | 9Simon Pickartz

Virtual Machine Migration

Process

VM

Hypervisor

Host A

Process

VM

Hypervisor

Host B

Process

VMProcess

VM

Process

VM

IB Fabric

08/26/2014 | ACS Automation of Complex Power Systems | 10Simon Pickartz

Virtual Machine Migration

Migration of a virtualized execution environmentReduction of the residual dependencies

File descriptors still valid after the migrationCommunication channels are automatically restored (e. g. TCP)

Performance degradation of I/O devices can be compensated byPass-through (e. g. Intel VT-d)Single Root I/O virtualization (SR-IOV)

Various hypervisors availableXenKernel Based Virtual Machine (KVM). . .

08/26/2014 | ACS Automation of Complex Power Systems | 11Simon Pickartz

Kernel Based Virtual Machine

A Linux kernel module that benefits from existing resourcesSchedulerMemory management. . .

Implements full-virtualization requiring hardware supportVMs are scheduled by the host like any other processAlready provides a migration framework

Live-migrationCold-migration

08/26/2014 | ACS Automation of Complex Power Systems | 12Simon Pickartz

Software-based Sharing

Hardware Physical NIC

Hypervisor Software Virtual Switch

eth0VMeth0VM

08/26/2014 | ACS Automation of Complex Power Systems | 13Simon Pickartz

Pass-Through

Hardware Physical NIC

Hypervisor

eth0VMeth0VM

08/26/2014 | ACS Automation of Complex Power Systems | 14Simon Pickartz

Single Root I/O Virtualization

Hardware VFVF PF

Hypervisor

eth0VMeth0VM

08/26/2014 | ACS Automation of Complex Power Systems | 15Simon Pickartz

Container-based Virtualization/

Hardware

Host Kernel

Hypervisor

GuestKernel

GuestKernel

App App App

(a) Virtual Machines

Hardware

Shared Kernel

Virtualization Layer

App AppApp

(b) Containers

08/26/2014 | ACS Automation of Complex Power Systems | 16Simon Pickartz

Container-based Virtualization/

Hardware

Host Kernel

Hypervisor

GuestKernel

GuestKernel

App App App

(a) Virtual Machines

Hardware

Shared Kernel

Virtualization Layer

App AppApp

(b) Containers

08/26/2014 | ACS Automation of Complex Power Systems | 16Simon Pickartz

Container-based Migration

Full-virtualization results in multiple kernels on one nodeIdea of container-based virtualization

→ Reuse the existing host kernel for the management of multipleuser-space instances

Host and guest have to use the same operating systemCommon representatives

OpenVZLinuX Containers (LXC)

08/26/2014 | ACS Automation of Complex Power Systems | 17Simon Pickartz

Evaluation

08/26/2014 | ACS Automation of Complex Power Systems | 18Simon Pickartz

Test Environment

4-node Cluster2 Sandy-bridge Systems2 Ivy-bridge Systems

InfiniBand FDR Mellanox FabricUp to 56 GiB/sSupport for SR-IOV

OpenMPI 1.7 with BLCR support (except for the LXC results)

08/26/2014 | ACS Automation of Complex Power Systems | 19Simon Pickartz

Throughput

128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi0

2,000

4,000

6,000

Size in Byte

Thr

ough

put

inM

iB/s

ib_write_bw (Ivy-Bridge)

NativePass-ThroughSR-IOV

08/26/2014 | ACS Automation of Complex Power Systems | 20Simon Pickartz

Throughput

128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi 32 Mi0

2,000

4,000

6,000

Size in Byte

Thr

ough

put

inM

iB/s

OpenMPI (GCC 4.4.7, Ivy-Bridge)

NativePass-ThroughSR-IOVBLCR

08/26/2014 | ACS Automation of Complex Power Systems | 21Simon Pickartz

Latency

Native Pass-Through SR-IOV BLCR

InfiniBand Open MPI0

0.5

1

1.5

Late

ncy

inµ

s(R

TT

/2)

08/26/2014 | ACS Automation of Complex Power Systems | 22Simon Pickartz

NAS Parallel Benchmarks(Open MPI)

Native BLCR KVM

FT BT LU0

10

20

30

Spee

din

GFl

ops/

s

08/26/2014 | ACS Automation of Complex Power Systems | 23Simon Pickartz

NAS Parallel Benchmarks(Parastation)

Native LXC KVM

FT BT LU0

10

20

30

Spee

din

GFl

ops/

s

08/26/2014 | ACS Automation of Complex Power Systems | 24Simon Pickartz

mpiBLAST(Open MPI)

Native BLCR KVM

8 Procs. 16 Procs. 32 Procs.0

5

10

15

20

25Ru

ntim

ein

Min

utes

08/26/2014 | ACS Automation of Complex Power Systems | 25Simon Pickartz

mpiBLAST(Parastation)

Native LXC KVM

8 Procs. 16 Procs. 32 Procs.0

5

10

15

20

25Ru

ntim

ein

Min

utes

08/26/2014 | ACS Automation of Complex Power Systems | 26Simon Pickartz

Migration Time

BLCR KVM

Overall Downtime0

0.5

1

1.5

2

2.5

Tim

ein

s

08/26/2014 | ACS Automation of Complex Power Systems | 27Simon Pickartz

Conclusion

08/26/2014 | ACS Automation of Complex Power Systems | 28Simon Pickartz

Conclusion

Inconclusive microbenchmarks analysisOverhead is highly application dependentMigration

Significant overhead of KVM concerning overall migration timeQualified by the downtime testKVM already supports live-migrationBLCR requires all processes to be stopped during migration

FlexibilityProcess-level migration generates residual dependencies

→ A non-transparent approach would be requiredVM/Container-based migration reduces these dependencies

08/26/2014 | ACS Automation of Complex Power Systems | 29Simon Pickartz

Outlook

We will focus on VM migration within FASTContainers may provide even better performance

Not as flexible as VMs (e. g., same OS)More isolation than process-level migration

Investigation of the application dependency observed during ourstudiesEnable VM migration with attached pass-through devices

→ More about FAST on http://www.en.fast-project.de

08/26/2014 | ACS Automation of Complex Power Systems | 30Simon Pickartz

Thank you for your kind attention!

Simon Pickartz1, Ramy Gad2, Stefan Lankes1, Lars Nagel2,Tim Süß2, André Brinkmann2, and Stephan Krempel3

1RWTH Aachen University 2Johannes Gutenberg Universität3ParTec Cluster Competence Center GmbH – [email protected]

Institute for Automation of Complex Power SystemsE.ON Energy Research Center, RWTH Aachen UniversityMathieustraße 1052074 Aachen

www.eonerc.rwth-aachen.de

Throughput(Fedora 20)

128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi0

2,000

4,000

6,000

Size in Byte

Thr

ough

put

inM

iB/s

ib_write_bw (Ivy-Bridge)

NativePass-ThroughSR-IOVLXC

08/26/2014 | ACS Automation of Complex Power Systems | 32Simon Pickartz

Throughput(Fedora 20)

128 1 Ki 8 Ki 65 Ki 512 Ki 4 Mi 32 Mi0

2,000

4,000

6,000

Size in Byte

Thr

ough

put

inM

iB/s

Parastation (GCC 4.4.7, Ivy-Bridge)

NativePass-ThroughSR-IOVLXC

08/26/2014 | ACS Automation of Complex Power Systems | 33Simon Pickartz

Latency(Fedora 20)

Native Pass-Through SR-IOV LXC

InfiniBand Parastation0

0.5

1

1.5

Late

ncy

inµ

s(R

TT

/2)

08/26/2014 | ACS Automation of Complex Power Systems | 34Simon Pickartz