Federated HPC Clouds Applied to Radiation Therapy
DESCRIPTION
Presentation delivered in the Research Track at ISC Cloud'13 in Heidelberg (Germany) on Sep. 24th, 2013. It describes the Virtual Cluster Architecture developed during the BonFIRE project and the reasons for developing it. Some proof-of-concept experiments are also presented.
TRANSCRIPT
Federated HPC Clouds applied to
Radiation Therapy
A. Gómez, L.M. Carril, R. Valin,
J.C. Mouriño, C. Cotelo
ISC Cloud‘13, Heidelberg (Germany)
Sep. 23-24th, 2013
Overview
Context.
Virtual Cluster Architecture.
Experiments on BonFIRE.
Conclusions.
The research leading to these results has received funding from the European Commission's
Seventh Framework Programme (FP7/2007-2013) under grant agreement number 257386
Context: eIMRT service
[Diagram labels: CTs, Treatment, TPS, Second calculation, Results]
Personalized: One patient, one treatment
eIMRT architecture
IaaS, SaaS
Workflow based on Monte Carlo simulations
eIMRT Workflow
eIMRT code: prepares inputs for BEAMnrc MC. Seconds in master computer.
BEAMnrc MC simulations. Independent jobs on CEs.
eIMRT code: collects outputs and prepares inputs for DOSXYZnrc. Seconds in master computer.
DOSXYZnrc MC simulations. Independent jobs on CEs.
eIMRT code: collects outputs and generates final output. Seconds in master computer.
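The five workflow steps above can be sketched as follows. This is only an illustrative outline, not the eIMRT code: every function name is hypothetical, the job bodies are placeholders for the real BEAMnrc and DOSXYZnrc runs, and a local thread pool stands in for the CEs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_ces(job, inputs, n_ces):
    """Run independent jobs in parallel; each worker thread stands in for one CE."""
    with ThreadPoolExecutor(max_workers=n_ces) as pool:
        return list(pool.map(job, inputs))  # map() keeps the input order


def prepare_beam_inputs(treatment, n_jobs):
    # Master step (seconds): split the treatment into independent jobs.
    return [{"treatment": treatment, "job": i} for i in range(n_jobs)]


def beam_job(inp):
    # Placeholder for one BEAMnrc Monte Carlo run (hypothetical output format).
    return {"job": inp["job"], "phase_space": f"phsp_{inp['job']}"}


def prepare_dose_inputs(beam_outputs):
    # Master step (seconds): turn the first-stage outputs into second-stage inputs.
    return [{"job": o["job"], "source": o["phase_space"]} for o in beam_outputs]


def dose_job(inp):
    # Placeholder for one DOSXYZnrc Monte Carlo run; returns a dummy dose value.
    return {"job": inp["job"], "dose": 1.0}


def collect_final(dose_outputs):
    # Master step (seconds): combine the partial results into the final output.
    return sum(o["dose"] for o in dose_outputs)


def run_workflow(treatment, n_jobs=4, n_ces=2):
    beam_inputs = prepare_beam_inputs(treatment, n_jobs)
    beam_outputs = run_on_ces(beam_job, beam_inputs, n_ces)
    dose_inputs = prepare_dose_inputs(beam_outputs)
    dose_outputs = run_on_ces(dose_job, dose_inputs, n_ces)
    return collect_final(dose_outputs)
```

The sketch makes the structure visible: the master steps are short and serial, while the two Monte Carlo stages consist of independent jobs whose wall time depends on how many CEs are available.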
SaaS issues
Local cluster:
– May not be enough with many clients.
– Interference between customers' requests.
– Shared resources: time-to-solution not guaranteed.
Grid:
– Interference between clients.
– Shared resources: time-to-solution not guaranteed.
Cloud:
– One treatment, one virtual cluster.
– No interference between treatments or customers.
– But how to guarantee the time-to-solution in a multi-tenant, out-of-control infrastructure?
IaaS issues for HPC/HTC SaaS
Failures of sites. Needs fault-tolerant design.
Application performance variability between deployments. Needs elasticity.
– Different IaaS back-end servers.
– Multi-tenancy: sharing resources among IaaS customers.
– Different Cloud providers.
– Evolution of IaaS infrastructure.
J. Schad et al., Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance, Proceedings of the VLDB Endowment, Vol. 3, No. 1, 2010.
Proposal: Autonomous Virtual
Cluster Architecture
Virtual Cluster Architecture
Virtual Cluster single site
NFS
Cluster management: OGS + custom scripts
Virtual Cluster-two sites
Fault-tolerant VC two sites
Elasticity Engine
Controls the number of CEs based on Key Application Performance measurements.
Enlarges the cluster to maintain performance and meet deadlines.
Shrinks the cluster if application performance is higher than needed, to reduce costs.
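The sizing decision described above can be illustrated with a minimal sketch. The rule used here is an assumption, not the BonFIRE Elasticity Engine itself: it supposes that throughput scales linearly with the number of CEs and that the remaining work and time to deadline are known. The function name and parameters are hypothetical.

```python
import math


def target_ces(measured_rate, current_ces, remaining_work, seconds_left,
               min_ces=1, max_ces=64):
    """Return the number of CEs needed to finish the remaining work on time.

    measured_rate  -- work units per second currently delivered by the cluster
    remaining_work -- work units still to be processed
    seconds_left   -- time remaining before the deadline
    """
    if current_ces < 1 or measured_rate <= 0:
        # No usable measurement yet: keep the current size within bounds.
        return max(min_ces, min(max_ces, current_ces))
    if seconds_left <= 0:
        # Deadline already passed: use everything that is allowed.
        return max_ces
    required_rate = remaining_work / seconds_left
    per_ce_rate = measured_rate / current_ces  # assumes linear scaling
    wanted = math.ceil(required_rate / per_ce_rate)
    # Grows the cluster when too slow, shrinks it when faster than needed.
    return max(min_ces, min(max_ces, wanted))
```

For example, a cluster of 4 CEs delivering 400 units/s with 500,000 units left and 1,000 s to the deadline needs 500 units/s, so the sketch asks for 5 CEs; a cluster of 8 CEs delivering 800 units/s that only needs 300 units/s is reduced to 3.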
Proof-of-Concept Experiments
BonFIRE Infrastructure
Vendor  Freq. (GHz)  Cores  RAM (GB)
Intel   2.33         2*2    4
AMD     1.7          2*12   48
Intel   2.5          2*4    32
Intel   2.93         2*4    24
INRIA:
Vendor     Freq. (GHz)  Cores  RAM (GB)
Intel      3.2          2*2    2
Intel      2.66         2*2    8
AMD        2.6          4*12   196
AMD        2            2      4
Intel I7   2.53         2      4
Intel I7   2.1          4      8
Intel Atom 1 2
AMD T56N   1.65         2      2
HLRS:
Cloud Manager:
OpenNebula 3.0
DISTRIBUTED VIRTUAL CLUSTER
EXPERIMENT
VCOC, FIRE Engineering Workshop, Ghent, Nov. 6th – 7th 2012
Application execution. One vs Two sites
VC Conf.: Distributed VC (_dist)
BonFIRE sites:
– INRIA: Master + CEs
– HLRS: CEs
Deployment time decreases.
Application: two sites are faster than one site, but only because the second site has better CPUs.
Deployment accounts for ~10% of total time.
SPECIFIC DEADLINE OBJECTIVE
EXPERIMENT
Horizontal elasticity
Monitoring application performance works.
We modified the software to produce progress information more frequently.
Execution with deadline.
Elasticity works.
FAULT TOLERANCE EXPERIMENT
WITH ELASTICITY
Virtual Cluster
SYNC
Fault-tolerance
BonFIRE sites:
– HLRS (Master + 4 CEs)
– INRIA (Shadow + 4 CEs)
Demanded performance (500 H/s).
Fault simulated by putting the HLRS VMs in CANCEL state.
INRIA Shadow took control of the cluster.
Elasticity worked, requesting more CEs from INRIA.
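The takeover by the Shadow can be pictured with a small sketch. The slides only state that Master and Shadow are kept in sync and that the Shadow took control after the fault; the heartbeat-with-timeout detection used here is an assumption for illustration, and the class and method names are hypothetical.

```python
class ShadowMaster:
    """Standby node that promotes itself when the master stops sending heartbeats."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s    # silence tolerated before taking over
        self.last_heartbeat = None    # time of the last heartbeat received
        self.active = False           # True once the shadow controls the cluster

    def on_heartbeat(self, now):
        # Heartbeats only matter while the shadow is still on standby.
        if not self.active:
            self.last_heartbeat = now

    def check(self, now):
        """Return True if the shadow is (or has just become) the active master."""
        if self.active:
            return True
        if (self.last_heartbeat is not None
                and now - self.last_heartbeat > self.timeout_s):
            self.active = True        # the master is presumed lost: take control
        return self.active
```

Once promoted, the shadow would run the same elasticity loop as the original master, which in the experiment meant requesting additional CEs at the surviving site to restore the demanded performance.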
CONCLUSIONS
Conclusions
A distributed VC can be used to speed up HTC applications.
An Elasticity Engine based on a Key Application Performance indicator works for HTC.
High QoS can be provided in a VC by combining a distributed VC with elasticity.
BonFIRE provides infrastructure for experiments on new Cloud concepts and services.