Evaluation of Container Virtualized MEGADOCK System
in Distributed Computing Environment
March 23rd, 2017
SIG BIO 49 @ Japan Advanced Institute of Science and Technology
Kento Aoyama 1,2, Yuki Yamamoto 1,2, Masahito Ohue 1,3, Yutaka Akiyama 1,2,3
1) Department of Computer Science, School of Computing, Tokyo Institute of Technology
2) Education Academy of Computational Life Sciences (ACLS), Tokyo Institute of Technology
3) Advanced Computational Drug Discovery Unit, Institute of Innovative Research, Tokyo Institute of Technology
“Docker” 2
https://www.docker.com/what-container
No. of pulled containers from DockerHub
Docker and Bioinformatics 3
A. Paolo, D. Tommaso, A. B. Ramirez, E. Palumbo, C. Notredame, and D. Gruber, “Benchmark Report: Univa Grid Engine, Nextflow, and Docker for running Genomic Analysis Workflows.”
Docker Integration Benchmark Report
@Centre for Genomic Regulation
(Barcelona, Spain)
• Univa Grid Engine (Job Scheduler)
• Nextflow (Workflow manager)
• Docker (Linux Container)
• Reproducibility
• Portability
To develop the Container-Native HPC Bioinformatics Application
Using Linux Container
which has …
• Low Dependency on Environment
• High-Performance
• Parallel execution performance
• Overhead of virtualization
• Dynamically Scaling
Research Purpose 4
• To evaluate the performance of Docker container virtualization in a bioinformatics application
Target Application
• MEGADOCK[1]
• FFT-grid-based Protein-Protein Docking software
• Multi-threading, Multi-node, Multi-GPU (OpenMP, MPI, GPU)
• Extremely compute intensive workloads
Today’s Report 5
[1] Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking
software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.
Background
Linux Container
Docker
Container & Bioinformatics
6
Kernel-Shared Virtualization
• Lightweight: small size, fast deployment, easy sharing
• Performance: little virtualization overhead, faster than a VM
Linux Container 7
[Figure: Virtual Machines vs. Containers. Virtual machines: Hardware, Hypervisor, then per VM a Guest OS, Bins/Libs, and App. Containers: Hardware, Linux Kernel, then per container Bins/Libs and App.]
Linux Container
• virtualizes host resources as containers
• Filesystem, hostname, IPC, PID, Network, User, etc.
• can be used like Virtual Machines
Linux Kernel Features
• Containers share the same host kernel
• namespace[1], chroot, cgroup, SELinux, etc.
Container-based Virtualization 8
[1] E. W. Biederman. “Multiple instances of the global Linux namespaces.”,
In Proceedings of the 2006 Ottawa Linux Symposium, 2006.
[Figure: one machine whose Linux kernel space is shared by two containers, each running its own processes]
Linux Container – Performance [1] 9
[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual
machines and Linux containers,” IEEE International Symposium on Performance Analysis of Systems and
Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
[Figure: performance ratio relative to native (y-axis, 0.00 to 1.00) for PXZ [MB/s], Linpack [GFLOPS], and Random Access [GUPS]; series: Native, Docker, KVM, KVM-tuned. Data labels shown: 0.96, 1.00, 0.98, 0.78, 0.83, 0.99, 0.82, 0.98]
Docker [1]
• Most popular Linux Container management platform
• Many useful components and services
Linux Container Management Tools 10
[1] Solomon Hykes and others. “What is Docker?” - https://www.docker.com/what-docker
[2] W. Bhimji, S. Canon, D. Jacobsen, L. Gerhardt, M. Mustafa, and J. Porter, “Shifter : Containers for
HPC,” Cray User Group, pp. 1–12, 2016.
[3] “Singularity” - http://singularity.lbl.gov/
[Figure: Docker [1], Shifter [2], Singularity [3]]
Easy container sharing – Docker Hub 11
Portability & Reproducibility
• Easy to share the application environment via Docker Hub
• Containers can be executed on other host machines
[Figure: a Dockerfile (apt-get install …, wget …, …, make) generates an image (App + Bins/Libs) on an Ubuntu host running Docker Engine. The image is pushed to Docker Hub and pulled by a CentOS host running Docker Engine, where it runs as a container with the same App and Bins/Libs.]
AUFS (Advanced multi layered unification filesystem) [1]
• Docker uses AUFS as its default filesystem
• Layers can be reused by other container images
• AUFS helps software reproducibility
Docker - Filesystem 12
[1] Advanced multi layered unification filesystem. http://aufs.sourceforge.net, 2014.
Docker Container (image)
Layer ID | Size | Tag
f49eec89601e | 129.5 MB | ubuntu:16.04 (base image)
366a03547595 | 39.85 MB | tag: beta
ef122501292c | 133.6 MB | tag: version-1.0
e50c89716342 | 660.4 KB | tag: version-1.0.2
5aec9aa5462c | 24.17 MB | tag: version-1.2
0d3cccd04bdb | 6.07 MB | tag: latest
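The benefit of layer reuse can be made concrete with a little arithmetic on the layer sizes listed on this slide. The sketch below is only illustrative: it assumes the six layers are stacked in the order listed, that each tagged image consists of every layer up to and including its own, and that 660.4 KB equals 0.6604 MB. None of these assumptions is stated on the slide.

```python
# Layer sizes in MB, copied from the slide (660.4 KB taken as 0.6604 MB).
# Assumption: layers are stacked in this order, base image first.
LAYERS = [
    ("f49eec89601e", 129.5),   # ubuntu:16.04 base image
    ("366a03547595", 39.85),
    ("ef122501292c", 133.6),
    ("e50c89716342", 0.6604),
    ("5aec9aa5462c", 24.17),
    ("0d3cccd04bdb", 6.07),
]


def stacked_size(layers, top_index):
    """Size in MB of an image made of layers[0] .. layers[top_index]."""
    return sum(size for _, size in layers[: top_index + 1])


def storage_with_sharing(layers):
    """Every layer is stored once and shared by all images above it."""
    return sum(size for _, size in layers)


def storage_without_sharing(layers):
    """Each non-base image keeps a private copy of all its lower layers."""
    return sum(stacked_size(layers, i) for i in range(1, len(layers)))


if __name__ == "__main__":
    print(f"shared layers : {storage_with_sharing(LAYERS):.1f} MB")
    print(f"private copies: {storage_without_sharing(LAYERS):.1f} MB")
```

Under these assumptions, five image versions stored as shared layers need roughly a quarter of the space that five independent full copies would.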
Why in the field of Bioinformatics?
• Types of Applications
• Data Analysis, Machine Learning
• MD Simulation, Docking calc., etc.
• Data-centric workload
• Compute : Large
• Data I/O : Case by case
• Communication : Small
• Containers perform well on compute-intensive workloads[1]
For Bioinformatics Apps : 1 13
[1] W. Felter, et al. “An updated performance comparison of virtual
machines and Linux containers,” IEEE International Symposium on
Performance Analysis of Systems and Software, pp.171-172, 2015.
Reproducibility
• Different library versions can produce different results
• e.g. genomic analysis pipeline [Paolo, 2016]
For Bioinformatics Apps : 2 14
[Figure: two panels. Dependency isolation: Application A (requires Library A version >= 1.2) and Application B (requires version < 1.1) conflict on a shared library, but run separately in Container A and Container B. Application reproducibility: Application A with library version 1.2 (Container A) gives Result A, while Application A with library version 1.3 (Container A') gives a different Result A'.]
Dependency conflict
• Different applications can require different versions of the same library
Performance
• Little performance overhead
Reproducibility
• Dependency isolation from other applications/libraries
Portability, Generality
• Sharing/porting to other environments
Features for Bioinformatics Apps 15
Features | Native | VM | Container
Performance, Scalability | Great | Bad | Good
Reproducibility | Bad | Good | Great
Portability, Generality | Bad | Great | Great
Proposed Method
16
MEGADOCK 17
Masahito Ohue, et al. “MEGADOCK 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers”, Bioinformatics, 30(22): 3281-3283, 2014.
High-performance protein-protein interaction predictions
• FFT-grid based docking software
• Extremely compute-intensive
• OpenMP/MPI/GPU support
• Great HPC Performance
Container-based Application Distribution 18
[Figure: Test Environment and Production Environment. In both, MEGADOCK containers form the application layer on top of a compute resource layer; containers and resources can each be added or removed.]
• All application dependencies exist in the container
• Easy-to-test application
• Easy-to-scale size of resources
Experiments
19
Experiment I
Evaluate container virtualization overhead on a physical machine
• Physical Machine (single-node) + Docker
• Physical Machine (single-node, GPU) + NVIDIA-Docker
Experiment II
Evaluate container virtualization overhead in a cloud environment
• Virtual Machines (multi-node) + Docker
• Virtual Machines (multi-node, GPU) + NVIDIA-Docker
Experiments 20
Measurement
• megadock-gpu exec. time
• time command (6 times, median)
Dataset
• 100 pair-pdb (KEGG pathway)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
• GPU : 1 GPU / 1 process / 1 node
Overview of Experiment I 21
[Figure: four test cases on one physical machine: (a) native with 4 MPI processes, (b) Docker with 4 MPI processes, (c) native MEGADOCK on GPU, (d) NVIDIA Docker with MEGADOCK on GPU]
Test Case | Native | Docker
CPU (MPI) | (a) | (b)
GPU | (c) | (d)
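The measurement protocol on this slide (wall-clock time of the whole run, repeated 6 times, median reported) can be sketched as a small harness. This is a minimal illustration rather than the authors' script: the function name `median_wall_time` is hypothetical, and the command being timed is a trivial placeholder, since the slide does not give the actual MEGADOCK command lines.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(command, runs=6):
    """Run `command` `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the measurement if any run fails.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workload; substitute the native or containerized command to compare.
    placeholder = [sys.executable, "-c", "pass"]
    print(f"median of 6 runs: {median_wall_time(placeholder):.3f} s")
```

With an even number of runs, `statistics.median` returns the mean of the two middle samples, so a single outlier run does not affect the reported value.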
Hardware/Software Specification 22
Software Env. | Physical Machine | Docker | NVIDIA Docker (GPU)
OS (image) | CentOS 7.2.1511 | ubuntu:14.04 | nvidia/cuda8.0-devel
Linux Kernel | 3.10.0 | 3.10.0 | 3.10.0
GCC | 4.8.5 | 4.8.4 | 4.8.4
FFTW | 3.3.5 | 3.3.5 | 3.3.5
OpenMPI | 1.10.0 | 1.6.5 | N/A
Docker Engine | 1.12.3 | N/A | N/A
NVCC | 8.0.44 | N/A | 8.0.44
NVIDIA Docker | 1.0.0 rc.3 | N/A | N/A
NVIDIA Driver | 367.48 | N/A | 367.48
Hardware | Physical Machine
CPU | Intel Xeon E5-1630, 3.7 [GHz] × 8 [core]
Memory | 32 [GB]
Local SSD | 128 [GB]
GPU | NVIDIA Tesla K40
Execution time 23
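[Figure: execution time in seconds (y-axis, 0 to 9000) for CPU (MPI) and GPU; series: Native, Docker. Data labels: Native 7353.80 (CPU), 1646.09 (GPU); Docker 7850.57 (CPU), 1638.05 (GPU). Annotation on CPU (MPI): Docker is +6.32 % slower.]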
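The headline overhead can be recomputed from the four data labels on this chart. The sketch below assumes the labels map to Native (7353.80 s CPU, 1646.09 s GPU) and Docker (7850.57 s CPU, 1638.05 s GPU), following the legend order. It also shows that the result depends on the denominator: dividing the difference by the Docker time gives about 6.33 %, which agrees with the slide's 6.32 % up to rounding, whereas dividing by the native time gives about 6.76 %.

```python
# Execution times in seconds, read from the chart's data labels.
# Assumption: labels are listed series by series (Native first, then Docker).
NATIVE = {"CPU (MPI)": 7353.80, "GPU": 1646.09}
DOCKER = {"CPU (MPI)": 7850.57, "GPU": 1638.05}


def diff_vs_docker(native, docker):
    """Time difference as a percentage of the Docker time."""
    return (docker - native) / docker * 100.0


def diff_vs_native(native, docker):
    """Time difference as a percentage of the native time."""
    return (docker - native) / native * 100.0


if __name__ == "__main__":
    for case in NATIVE:
        n, d = NATIVE[case], DOCKER[case]
        print(
            f"{case:10s} vs Docker: {diff_vs_docker(n, d):+.2f} %   "
            f"vs native: {diff_vs_native(n, d):+.2f} %"
        )
```

For the GPU case both conventions give roughly -0.5 %, that is, the containerized run was marginally faster, which is within what one would expect from run-to-run variation.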
Profile Result (CPU time) 24
Process | native [sec] | docker [sec] | diff | Ratio (all)
FFT3D | 7.40E+04 | 7.63E+04 | +3.01% | 76.84%
MPIDP-Master | 8010.98 | 8325.9 | +3.78% | 8.38%
Create Voxel | 3743.7 | 3993.29 | +6.25% | 4.02%
FFT Convolution | 3551.08 | 3576.43 | +0.71% | 3.60%
Score Sort | 2462.61 | 2459.7 | -0.12% | 2.48%
Output Detail | 2139.94 | 2225.96 | +3.86% | 2.24%
Ligand Preparation | 1035.51 | 1849.11 | +44.00% | 1.86%
MPI_Barrier | 236.95 | 231.05 | -2.55% | 0.23%
MPI_Init | 0.94 | 4.54 | +79.30% | 0.00%
… | … | … | … | …
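Large relative differences in this profile do not necessarily mean large contributions to the total slowdown. As a rough illustration, the sketch below ranks the listed processes by added CPU seconds (docker minus native). The numbers are copied from the table; FFT3D is given to only three significant figures, so its delta is approximate, and the rows elided with "…" are not included.

```python
# (process, native CPU time [s], docker CPU time [s]) from the profile table.
# FFT3D is rounded on the slide (7.40E+04, 7.63E+04), so its delta is approximate.
PROFILE = [
    ("FFT3D", 7.40e4, 7.63e4),
    ("MPIDP-Master", 8010.98, 8325.9),
    ("Create Voxel", 3743.7, 3993.29),
    ("FFT Convolution", 3551.08, 3576.43),
    ("Score Sort", 2462.61, 2459.7),
    ("Output Detail", 2139.94, 2225.96),
    ("Ligand Preparation", 1035.51, 1849.11),
    ("MPI_Barrier", 236.95, 231.05),
    ("MPI_Init", 0.94, 4.54),
]


def added_seconds(profile):
    """Return (process, docker - native) pairs, largest increase first."""
    deltas = [(name, docker - native) for name, native, docker in profile]
    return sorted(deltas, key=lambda row: row[1], reverse=True)


def relative_diff_percent(native, docker):
    """Relative difference with the Docker time as denominator.

    This convention reproduces the table's diff column, e.g. +44.00% for
    Ligand Preparation.
    """
    return (docker - native) / docker * 100.0


if __name__ == "__main__":
    for name, delta in added_seconds(PROFILE):
        print(f"{name:20s} {delta:+10.2f} s")
```

Ranked this way, FFT3D (about +2300 s) and Ligand Preparation (about +814 s) account for most of the added CPU time among the listed rows, while MPI_Init, despite its +79.30 %, adds only a few seconds.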
(a) MEGADOCK-Azure[2]
Measurement
• megadock-dp exec. time
• time command (3 times, median)
Dataset
• ZDOCK benchmark 1.0 [1]
(59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
All file input/output in Local SSD
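The dataset size follows from docking every one of the 59 proteins against every one of the 59, giving 59 × 59 = 3481 ordered pairs (self-pairs included). The sketch below enumerates such an all-to-all job list and splits it across workers. The protein names and the static round-robin split are illustrative assumptions only; the slide does not describe how megadock-dp's master process actually assigns jobs to workers.

```python
from itertools import product


def all_to_all_pairs(receptors, ligands):
    """Every (receptor, ligand) combination, self-pairs included."""
    return list(product(receptors, ligands))


def split_round_robin(jobs, n_workers):
    """Static split of a job list into n_workers nearly equal chunks."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [jobs[i::n_workers] for i in range(n_workers)]


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the 59 benchmark structures.
    proteins = [f"protein_{i:02d}" for i in range(59)]
    pairs = all_to_all_pairs(proteins, proteins)
    chunks = split_round_robin(pairs, n_workers=3)
    print(len(pairs), [len(chunk) for chunk in chunks])
```

Because each pair is an independent docking job, the workload divides cleanly, which is why this kind of calculation is a natural candidate for strong-scaling experiments.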
Overview of Experiment II-(a) 25
[Figure: MEGADOCK-Azure layout: multiple virtual machines, each running 4 MPI processes; one process is the master and the others are workers]
[1] R. Chen, et al. “A protein-protein docking benchmark,” Proteins: Structure,
Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
[2] Masahito Ohue, et al. “MEGADOCK-Azure: High-performance protein-protein interaction prediction system on Microsoft Azure HPC”, IIBMP2016.
(b) MEGADOCK + Docker on Microsoft Azure
Measurement
• megadock-dp exec. time
• time command (3 times, median)
Dataset
• ZDOCK benchmark 1.0 [1] (59 * 59 = 3481 pairs)
Options (OpenMP, OpenMPI)
• MPI : 12 threads / 4 MPI process / 1 node
All file input/output in Local SSD
Docker Swarm
• All Containers in 1 overlay network
Overview of Experiment II-(b) 26
[Figure: MEGADOCK + Docker layout: multiple virtual machines, each running one Docker container with 4 MPI processes; all containers are joined by Docker Swarm (Docker Network); one process is the master and the others are workers]
[1] R. Chen, J. Mintseris, J. Janin, and Z. Weng, “A protein-protein docking benchmark,” Proteins: Structure, Function and Genetics, vol. 52, no. 1, pp. 88-91, 2003.
VM Instance/Software Specification 27
Software Env. | Virtual Machine | Docker
OS (image) | SUSE Linux Enterprise Server 12 | ubuntu:14.04
Linux Kernel | 3.12.43 | 3.12.43
GCC | 4.8.3 | 4.8.4
FFTW | 3.3.4 | 3.3.5
OpenMPI | 1.10.2 | 1.6.5
Docker Engine | 1.12.6 | N/A
VM Instance | Standard_D14_v2
CPU | Intel Xeon E5-2673, 2.40 [GHz] × 16 [core]
Memory | 112 [GB]
Local SSD | 800 [GB]
Execution time 28
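[Figure: execution time in seconds (y-axis, 0 to 150,000) versus number of VMs; series: VM, Docker on VM. One data point is annotated "May be a measurement mistake".]
# of VMs | 1 | 5 | 10 | 20 | 30
VM [sec] | 145,534 | 25,515 | 13,132 | 6,006 | 4,098
Docker on VM [sec] | 117,219 | 25,145 | 12,331 | 6,344 | 3,971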
Scalability (Strong Scaling, based VM=1) 29
[Figure: strong-scaling speed-up (y-axis, 0 to 45) versus number of worker cores (x-axis, 0 to 500), normalized to VM=1; series: Ideal, VM, Docker on VM; points marked at VM=1, 5, 10, 20, and 30. Annotation: comparable scalability.]
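The speed-up curve can be approximated from the execution times on the preceding slide. The sketch below assumes the first row of data labels belongs to the VM series and the second to Docker on VM (legend order), and that each series is normalized by its own 1-VM time; the slide says only "based VM=1". It also scales by number of VMs rather than by worker cores, which the chart uses on its x-axis.

```python
# Execution times in seconds for 1, 5, 10, 20, and 30 VMs.
# Assumption: first row of labels = VM, second row = Docker on VM.
N_VMS = [1, 5, 10, 20, 30]
TIME_VM = [145534, 25515, 13132, 6006, 4098]
TIME_DOCKER = [117219, 25145, 12331, 6344, 3971]


def speedups(times):
    """Strong-scaling speed-up relative to the first (1-VM) measurement."""
    baseline = times[0]
    return [baseline / t for t in times]


def efficiencies(times, n_vms):
    """Speed-up divided by the increase in VM count (1.0 means ideal scaling)."""
    return [s / (n / n_vms[0]) for s, n in zip(speedups(times), n_vms)]


if __name__ == "__main__":
    for label, times in (("VM", TIME_VM), ("Docker on VM", TIME_DOCKER)):
        pairs = zip(N_VMS, speedups(times), efficiencies(times, N_VMS))
        for n, s, e in pairs:
            print(f"{label:13s} VMs={n:2d} speed-up={s:6.2f} efficiency={e:5.2f}")
```

Under these assumptions the VM series comes out super-linear (about 5.7x on 5 VMs), which is the pattern one would see if the 1-VM baseline were over-measured; that may be what the "May be a measurement mistake" note on the execution-time chart refers to. The Docker series stays just below ideal, at about 29.5x on 30 VMs.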
Experiment I
• MEGADOCK + Docker on a physical machine showed 6.32% lower performance.
• Docker can cause a 0-4% drop in compute performance[1]
• Communications go through Docker NAT (Network Address Translation)
• MEGADOCK (GPU) + NVIDIA-Docker on a physical machine showed performance comparable to native.
• GPU calculation is independent of container virtualization
• Container virtualization has little overhead on memory bandwidth
Experiment II
• MEGADOCK + Docker on Microsoft Azure showed comparable scalability.
• Container virtualization overhead is smaller than other cloud-environment factors
Result & Discussion 30
[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual
machines and Linux containers”, IEEE International Symposium on Performance Analysis of Systems
and Software, pp.171-172, 2015. (IBM Research Report, RC25482 (AUS1407-001), 2014.)
• The performance overhead of Docker container virtualization is small.
• suitable for GPU-accelerated applications and cloud environments
• Container virtualization can isolate the application environment from the host environment.
• the same container image can be used on various machines
• Physical machine in a local environment
• Virtual machine in a cloud environment
• Docker is useful for computational research work
Conclusion 31
Multi-Node & Multi-GPU Evaluation on Cloud
• NVIDIA-Docker is not available in Docker Swarm mode
• Kubernetes[1] officially supports 1 GPU per node
• (experimental feature: multi-GPU support)
Container-based Task Distribution
• Container-based distribution, as in web service applications
• easy to scale computing resources
• easy to extend to multiple tasks (e.g. GHOST-MP, MEGADOCK)
Future Work 32
[1] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and
Kubernetes,” acmqueue, vol. 14, no. 1, p. 24, 2016.