Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute
Instances for Running MPI Applications
Yan Zhai, Mingliang Liu, Jidong Zhai
Xiaosong Ma, Wenguang Chen
Tsinghua University &
NCSU & ORNL
HPC in cloud?
Are cloud services viable for HPC applications? Yes
  Mostly for loosely-coupled codes
Has the cloud won over the majority of HPC users? No
  For tightly-coupled codes, performance is still a major concern
  Lower performance -> higher cost
Amazon EC2 CCI
Emergence of high-performance clouds such as Amazon EC2 CCI (Cluster Compute Instances)
  High-end computation hardware
  Exclusive resource usage
  Updated interconnect (10GbE network)
Has CCI changed the cloud HPC landscape?
Our work
Several months of evaluating EC2 CCI
  Comprehensive performance and cost evaluations
  Focused on tightly coupled MPI programs
  Micro and macro benchmarks, and real-world applications
Exploring I/O configurability issues
Outline
Background & Motivation
Evaluation and observations
  Will HPC cloud save you money?
  Application performance results
  Wish list to cloud service providers
Conclusion
Will HPC cloud save you money?
Cost: driving factor for going for cloud
Cloud vs. in-house cluster
  Pay-as-you-go vs. fixed hardware investment
  Workload-dependent decision
    Relative performance of individual applications
    Mixture of applications
    Expected utilization level of in-house cluster
Runtime performance
Cloud and 16-node in-house cluster configuration:

| | Cloud | Local |
|---|---|---|
| CPU | Xeon X5570 (8 cores each) | Xeon X5670 (12 cores each) |
| Memory | 23GB | 48GB |
| Network | 10GbE | QDR InfiniBand |
| FS | NFS | NFS |
| OS | Amazon Linux AMI 2011.02.1 | RHEL 5.5 |
| Virtualization | Para-virtualization | No |
Selected applications
GRAPES [1] (weather simulation)
  CPU- and memory-intensive
  Moderate communication
MPI-Blast [2] (biological sequence matching)
  Large input
  Relatively little communication
POP [3] (ocean modeling)
  Communication-intensive
  Large number of small messages
GRAPES results
[Figure: time (s, axis 0 to 450) vs. process number (32, 64, 128) for LOCAL and CLOUD; data values not recoverable from the transcript]
MPI-Blast results
[Figure: time (s, axis 0 to 500) vs. process number (18, 34, 66, 130) for LOCAL and CLOUD; data values not recoverable from the transcript]
POP results
[Figure: time (s, axis 0 to 1200) vs. process number (16, 32, 64, 128) for LOCAL and CLOUD; data values not recoverable from the transcript]
Performance summary
Cloud offers performance close to in-house cluster
  For some applications …
Communication still a severe concern
  For communication-heavy apps
  Major problem: large latency
Similar observation from benchmarking results [4]
  NPB class C and D
  Intel MPI Benchmarks
  STREAM memory benchmark
Coming back to the cost issue
Local cluster: cost depends on actual utilization level
For given application A, Cloud more cost-effective if
[Inequality shown as an image on the original slide; the slides annotate its terms one at a time:]
  Effective time elapsed in application
  Time period before the local cluster becomes out of date
  Cost of cloud per instance: $1.60/(hour × instance)
  Time to finish one job of A in the cloud
  Cost to buy and deploy the local cluster
  Effective time used to run applications
  Time to finish one job of A on the local cluster
  Right-hand side: cost of one job of application A on the local cluster. If the right-hand side is larger, the cloud is more cost-effective.
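A form of the inequality consistent with the annotated terms, offered as an assumed reconstruction rather than the authors' exact formula, is given here. All symbols are introduced for illustration: $P_{cloud}$ is the cloud price per instance-hour, $N$ the number of instances, $T_{cloud}(A)$ and $T_{local}(A)$ the times to finish one job of A, $C_{local}$ the cost to buy and deploy the local cluster, $T_{life}$ the period before it becomes out of date, and $U$ the utilization rate, so that $U \cdot T_{life}$ is the effective time used to run applications.

$$
P_{cloud} \cdot N \cdot T_{cloud}(A) \;<\; \frac{C_{local}}{U \cdot T_{life}} \cdot T_{local}(A)
$$

Under this reading, the left side is the cloud cost of one job and the right side is the local cost of one job. Solving for $U$ gives the utilization rate below which the cloud is cheaper.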
Parameters used in local cluster

| Expense item | Amount |
|---|---|
| Dell 5670 servers (service included) | $6508/node |
| InfiniBand NIC | $612/node |
| InfiniBand switch | $6891 |
| SAN with NFS server and RAID5 | $36753 |
| Hosting (energy included) | $15251/rack/year |
| Assumed life span | 3 years |
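As a rough worked example, the expense items can be folded into one total cost for the 16-node cluster. This sketch assumes the cluster fits in a single rack and that hosting is paid for each of the 3 years; neither assumption is stated on the slide, and the constant and function names are illustrative.

```python
# Total cost of the 16-node local cluster over its assumed life span.
# ASSUMPTIONS (not stated on the slide): one rack, hosting paid every year.
NODES = 16
SERVER_PER_NODE = 6508          # Dell server, service included ($/node)
IB_NIC_PER_NODE = 612           # InfiniBand NIC ($/node)
IB_SWITCH = 6891                # InfiniBand switch ($)
SAN_NFS_RAID5 = 36753           # SAN with NFS server and RAID5 ($)
HOSTING_PER_RACK_YEAR = 15251   # hosting, energy included ($/rack/year)
LIFE_SPAN_YEARS = 3
RACKS = 1                       # assumption


def local_total_cost(nodes=NODES, racks=RACKS, years=LIFE_SPAN_YEARS):
    """Hardware purchase plus hosting over the whole life span, in dollars."""
    hardware = nodes * (SERVER_PER_NODE + IB_NIC_PER_NODE) + IB_SWITCH + SAN_NFS_RAID5
    hosting = racks * years * HOSTING_PER_RACK_YEAR
    return hardware + hosting


def local_cost_per_hour(total, years=LIFE_SPAN_YEARS):
    """Cost per wall-clock hour if the cluster were busy 24x7."""
    return total / (years * 365 * 24)


total = local_total_cost()
print(total)                                  # 203317 under the one-rack assumption
print(round(local_cost_per_hour(total), 2))   # 7.74
```

For comparison, 16 cloud instances at $1.60 per instance-hour cost $25.60 per hour, which is why the local cluster only wins once it is kept busy for a sufficient fraction of the time.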
Utilization rate threshold for applications
[Figure: bar chart of utilization rate threshold (%, axis 0 to 60) for GRAPES, MPIBLAST, and POP; bar heights not recoverable from the transcript]
This means that if the local cluster would spend more than about 25% of each year running GRAPES, you are better off staying local.
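A threshold of this kind can be sketched as follows, assuming the cloud bill for a job is price × instances × job time and the local cost of a job is the cluster's total cost spread over its utilized hours. The function name, the local total cost (derived from the expense table under a one-rack assumption), and the two job times are illustrative values, not measurements from the talk.

```python
def utilization_threshold(local_total_cost, life_hours,
                          cloud_price_per_instance_hour, instances,
                          t_local, t_cloud):
    """Utilization rate below which the cloud is cheaper per job.

    Assumed cost model (a sketch, not the authors' exact formula):
      cloud cost per job = price * instances * t_cloud
      local cost per job = local_total_cost / (U * life_hours) * t_local
    Setting the two equal and solving for U gives the threshold.
    """
    cloud_rate = cloud_price_per_instance_hour * instances   # $/hour
    local_rate_full = local_total_cost / life_hours          # $/hour at 100% use
    return (local_rate_full * t_local) / (cloud_rate * t_cloud)


# Hypothetical job times: the cloud run is assumed 20% slower than the local run.
u = utilization_threshold(
    local_total_cost=203317,          # assumed total, see lead-in
    life_hours=3 * 365 * 24,
    cloud_price_per_instance_hour=1.60,
    instances=16,
    t_local=100.0,
    t_cloud=120.0,
)
print(f"{u:.1%}")
```

With equal job times the same inputs give a threshold of roughly 30%; any cloud slowdown pushes the threshold lower, which favors the local cluster.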
Further considerations in cost
Calculation biased toward local cluster
  Assumes 24x7 availability for 3 years
  No failures, maintenance, holidays …
  Labor cost not counted
Cloud provides continuous hardware upgrades
  Yesterday, Amazon announced:
    New CCI instances
    Lowered price for current configuration: $1.60 -> $1.30
Heavy HPC users may get further cloud discount
  Reserved instances on AWS
Reduced pricing effect
[Figure: stacked bar chart of utilization rate threshold (%, axis 0 to 60) for GRAPES, MPIBLAST, and POP; series: old utilization rate, increment for new cloud price; bar heights not recoverable from the transcript]
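If the cloud cost of a job is proportional to the hourly price (an assumption about the cost model, since the inequality is not preserved in this transcript), the threshold scales inversely with that price. A minimal sketch of the effect of the $1.60 -> $1.30 reduction, using the roughly 25% GRAPES threshold as the starting value; the function name is illustrative.

```python
OLD_PRICE = 1.60   # $/(hour * instance), original price
NEW_PRICE = 1.30   # $/(hour * instance), announced lower price


def rescale_threshold(old_threshold, old_price=OLD_PRICE, new_price=NEW_PRICE):
    """Threshold after a price change, assuming threshold is proportional to 1/price."""
    return old_threshold * old_price / new_price


new_threshold = rescale_threshold(0.25)   # about 25% for GRAPES before the change
print(round(new_threshold, 3))            # 0.308
```

Under this assumption every application's threshold grows by the same factor, 1.60/1.30, or about 23%.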
Reserved Instance discount
Use reserved instances for 3 years:
  $5053 up-front payment is required
  $0.45/(hour × instance) rate then applies
Cloud more effective for application A if:
[Inequality shown as an image on the original slide; the slides annotate its terms one at a time:]
  3 x 365 x 24 hours
  Under a certain utilization rate, the time required for the cloud to produce the same number of jobs as the local cluster
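The reserved-instance comparison can be sketched as below. The model is an assumption: the up-front payment is taken to be per instance, and the cloud must run for `U * life_hours * t_cloud / t_local` hours to complete the same number of jobs as a local cluster used at rate `U`. The function names, the local total cost, and the job times are hypothetical.

```python
LIFE_HOURS = 3 * 365 * 24   # 3-year reservation term in hours


def reserved_cloud_cost(utilization, instances, t_local, t_cloud,
                        upfront=5053.0, hourly=0.45, life_hours=LIFE_HOURS):
    """3-year cloud bill for the same job count as a local cluster at `utilization`.

    ASSUMPTION: the up-front payment applies to each instance.
    """
    cloud_hours = utilization * life_hours * (t_cloud / t_local)
    return instances * (upfront + hourly * cloud_hours)


def reserved_threshold(local_total_cost, instances, t_local, t_cloud,
                       upfront=5053.0, hourly=0.45, life_hours=LIFE_HOURS):
    """Utilization rate below which reserved instances are cheaper than local."""
    per_instance_budget = local_total_cost / instances - upfront
    if per_instance_budget <= 0:
        return 0.0   # the up-front fee alone already exceeds the local cost
    u = per_instance_budget / (hourly * life_hours * (t_cloud / t_local))
    return min(u, 1.0)   # utilization cannot exceed 100%


# Hypothetical inputs: $203317 local total cost, cloud job 20% slower than local.
u = reserved_threshold(203317, instances=16, t_local=100.0, t_cloud=120.0)
print(f"{u:.1%}")
```

At the returned utilization the two 3-year bills are equal; below it the reserved cloud instances cost less.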
Reserved instance discount effect
[Figure: stacked bar chart of utilization rate threshold (%, axis 0 to 60) for GRAPES, MPIBLAST, and POP; series: old utilization rate, increment for new cloud price, increment for reserved instance discount; bar heights not recoverable from the transcript]
Cost summary
Rough steps to evaluate cost effectiveness:
  Estimate the local utilization rate
  Do a short-term run to acquire the per-job time
  Calculate the threshold utilization rate
  If estimated utilization rate > calculated threshold
    Local is more cost-effective
  Else
    Cloud is more cost-effective
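The steps above can be sketched as one small function. The hourly cost figures and job times in the usage lines are hypothetical placeholders, the threshold formula assumes a simple pay-as-you-go cost model, and the name `choose_platform` is illustrative.

```python
def choose_platform(estimated_utilization, t_local, t_cloud,
                    local_cost_per_hour_full, cloud_cost_per_hour):
    """Follow the steps on the slide and return (decision, threshold).

    estimated_utilization    : expected busy fraction of the local cluster (0..1)
    t_local, t_cloud         : per-job times from short test runs, same unit
    local_cost_per_hour_full : local cost per hour if the cluster were busy 24x7
    cloud_cost_per_hour      : hourly price of the cloud instances used
    """
    if not 0.0 <= estimated_utilization <= 1.0:
        raise ValueError("estimated_utilization must be between 0 and 1")
    # Threshold: utilization at which local and cloud cost per job are equal.
    threshold = (local_cost_per_hour_full * t_local) / (cloud_cost_per_hour * t_cloud)
    decision = "local" if estimated_utilization > threshold else "cloud"
    return decision, threshold


# Hypothetical inputs: $7.74/h local at full use, $25.60/h cloud, cloud 20% slower.
print(choose_platform(0.40, 100.0, 120.0, 7.74, 25.60)[0])   # local
print(choose_platform(0.10, 100.0, 120.0, 7.74, 25.60)[0])   # cloud
```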
Our wish list to cloud service providers
Improved network latency
Pre-configured OS image
  Optimized library for specific cloud platform
More flexible charging
  Current model designed for commercial servers
  Fine-grained accounting for clusters
    To allow large-scale development and testing
System scale
  Current upper limit: dozens of nodes
Conclusion
Amazon EC2 CCI becoming a competitive choice for HPC
  Even when running tightly-coupled simulations
  May deliver performance similar to in-house clusters
    Except for codes with heavy communication
Flexibility and elasticity valuable
  Users may try out different resource types
  No up-front hardware investment
  Per-user, per-application system software
    M. Liu et al., “One Optimized I/O Configuration per HPC Application: Leveraging the Configurability of Cloud”, APSys 2011
Acknowledgment
Research sponsored by Intel
Collaborators: Bob Kuhn, Scott Macmillan, Nan Qiao
References
[1] D. Chen, J. Xue, X. Yang, H. Zhang, X. Shen, J. Hu, Y. Wang, L. Ji, and J. Chen. New generation of multi-scale NWP system (GRAPES): general scientific design. Chinese Science Bulletin, 53(22):3433-3445, 2008.
[2] A. Darling, L. Carey, and W. Feng. The design, implementation, and evaluation of mpiBLAST. In Proceedings of the ClusterWorld Conference and Expo, in conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution, 2003.
[3] LANL. Parallel Ocean Program (POP). http://climate.lanl.gov/Models/POP, April 2011.
[4] T. University. Technical report. http://www.hpctest.org.cn/resources/cloud.pdf.
Thanks!