Federated HPC Clouds applied to Radiation Therapy

Posted on 03-Jul-2015


DESCRIPTION

Presentation delivered in the Research Track at ISC Cloud'13 in Heidelberg (Germany) on Sep. 24th, 2013. It describes the Virtual Cluster Architecture developed during the BonFIRE project and the reasons for developing it. Some proof-of-concept experiments are also presented.

TRANSCRIPT

Federated HPC Clouds applied to Radiation Therapy

A. Gómez, L.M. Carril, R. Valin, J.C. Mouriño, C. Cotelo

ISC Cloud'13, Heidelberg (Germany)

Sep. 23-24, 2013

Overview

Context.

Virtual Cluster Architecture.

Experiments on BonFIRE.

Conclusions.

The research leading to these results has received funding from the European Commission's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 257386.

Context: eIMRT service

[Figure: CTs, treatment and results exchanged between the TPS and the eIMRT service (second calculation)]

Personalized: one patient, one treatment.

eIMRT architecture

[Figure: SaaS layer on top of IaaS]

Workflow based on Monte Carlo simulations.

eIMRT Workflow

– eIMRT code: prepares inputs for the BEAMnrc MC simulations. Takes seconds on the master computer.
– BEAMnrc MC simulations. Independent jobs on CEs.
– eIMRT code: collects outputs and prepares inputs for DOSXYZnrc. Takes seconds on the master computer.
– DOSXYZnrc MC simulations. Independent jobs on CEs.
– eIMRT code: collects outputs and generates the final output. Takes seconds on the master computer.
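The workflow above alternates short serial steps on the master with fan-outs of independent Monte Carlo jobs to the CEs (scatter/gather). The following is a minimal Python sketch of that control flow only. Every function in it (`prepare_beam_inputs`, `run_beamnrc`, `prepare_dose_inputs`, `run_dosxyznrc`, `combine_results`) is a hypothetical placeholder returning toy numbers, not the real eIMRT code or a BEAMnrc/DOSXYZnrc invocation. A local thread pool stands in for the CEs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders: the real steps run BEAMnrc / DOSXYZnrc and
# exchange files on the master; here each "simulation" returns a toy number.

def prepare_beam_inputs(n_jobs):
    """Serial step on the master: one input per independent BEAMnrc job."""
    return [{"job": i, "seed": 1000 + i} for i in range(n_jobs)]

def run_beamnrc(job_input):
    """Independent job on a CE (placeholder for a BEAMnrc MC run)."""
    return job_input["seed"] % 7

def prepare_dose_inputs(beam_outputs, n_jobs):
    """Serial step on the master: gather beam outputs, split the dose work."""
    total = sum(beam_outputs)
    return [{"job": i, "source": total} for i in range(n_jobs)]

def run_dosxyznrc(job_input):
    """Independent job on a CE (placeholder for a DOSXYZnrc MC run)."""
    return job_input["source"] + job_input["job"]

def combine_results(dose_outputs):
    """Serial step on the master: gather dose outputs into the final result."""
    return sum(dose_outputs)

def run_workflow(n_jobs=4, workers=4):
    # The pool plays the role of the CEs; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=workers) as ces:
        beam_outputs = list(ces.map(run_beamnrc, prepare_beam_inputs(n_jobs)))
        dose_inputs = prepare_dose_inputs(beam_outputs, n_jobs)
        dose_outputs = list(ces.map(run_dosxyznrc, dose_inputs))
    return combine_results(dose_outputs)

if __name__ == "__main__":
    print(run_workflow())
```

Because each MC job is independent, the only synchronization points are the three short master steps, which is what makes the workload suitable for a virtual cluster of interchangeable CEs.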

SaaS issues

Local cluster:
– May not be enough with many clients.
– Interference between customers' requests.
– Shared resources: time-to-solution not guaranteed.

Grid:
– Interference between clients.
– Shared resources: time-to-solution not guaranteed.

Cloud:
– One treatment, one virtual cluster.
– No interference between treatments or customers.
– But how can the time-to-solution be guaranteed on a multi-tenant infrastructure that is outside our control?

IaaS issues for HPC/HTC SaaS

Site failures: need a fault-tolerant design.

Application performance variability between deployments: needs elasticity. Sources of variability:
– Different IaaS back-end servers.
– Multi-tenancy: resources shared among IaaS customers.
– Different cloud providers.
– Evolution of the IaaS infrastructure.

J. Schad et al., "Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance," Proceedings of the VLDB Endowment, Vol. 3, No. 1, 2010.
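One common way to quantify the performance variability between deployments described above is the coefficient of variation (sample standard deviation divided by the mean) of repeated runtimes. A minimal sketch, assuming runtimes are collected per deployment; the `threshold` in `needs_elasticity` is a hypothetical tuning parameter, not a value from the presentation.

```python
from statistics import mean, stdev

def coefficient_of_variation(runtimes):
    """Sample standard deviation divided by the mean (dimensionless).

    Higher values mean a less predictable time-to-solution across deployments.
    """
    if len(runtimes) < 2:
        raise ValueError("need at least two runtimes")
    m = mean(runtimes)
    if m <= 0:
        raise ValueError("runtimes must be positive")
    return stdev(runtimes) / m

def needs_elasticity(runtimes, threshold=0.1):
    """Hypothetical rule: flag variability above a chosen COV threshold."""
    return coefficient_of_variation(runtimes) > threshold
```

The inputs are whatever runtimes one measures; the function makes no assumption about their units, since the ratio is dimensionless.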

Proposal: Autonomous Virtual Cluster Architecture

Virtual Cluster Architecture

Virtual Cluster, single site: NFS; cluster management with OGS + custom scripts.

Virtual Cluster, two sites.

Fault-tolerant VC, two sites.

Elasticity Engine
– Controls the number of CEs based on Key Application Performance measurements.
– Enlarges the cluster to maintain performance and meet deadlines.
– Shrinks the cluster when application performance is higher than needed, to reduce costs.
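The enlarge/shrink rule above can be sketched as a deadline-driven controller. This is a minimal sketch under stated assumptions, not the BonFIRE Elasticity Engine itself: the Key Application Performance indicator is taken to be a throughput (work units per second, e.g. Monte Carlo histories per second), throughput is assumed to scale linearly with the number of CEs, and `min_ces`/`max_ces` are hypothetical bounds.

```python
import math

def target_ces(remaining_work, seconds_left, measured_rate, current_ces,
               min_ces=1, max_ces=64):
    """Number of CEs needed to finish the remaining work by the deadline.

    Assumes throughput scales linearly with the number of CEs.
    """
    if measured_rate <= 0 or current_ces <= 0:
        raise ValueError("need a positive measured rate and at least one CE")
    if seconds_left <= 0:
        return max_ces  # deadline reached: use everything allowed
    required_rate = remaining_work / seconds_left
    per_ce_rate = measured_rate / current_ces
    needed = math.ceil(required_rate / per_ce_rate)
    return max(min_ces, min(max_ces, needed))

def scaling_action(current_ces, target):
    """Enlarge to keep the deadline, shrink to cut costs, or keep as is."""
    if target > current_ces:
        return ("add", target - current_ces)
    if target < current_ces:
        return ("remove", current_ces - target)
    return ("keep", 0)
```

For example, with 600000 units left, 600 s to the deadline and 4 CEs delivering 400 units/s in total, the controller asks for 10 CEs, i.e. `("add", 6)`. In practice one would also add hysteresis so the cluster does not oscillate around the target.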

Proof-of-Concept Experiments

BonFIRE Infrastructure

Vendor        Freq. (GHz)   Cores   RAM (GB)
Intel         2.33          2*2     4
AMD           1.7           2*12    48
Intel         2.5           2*4     32
Intel         2.93          2*4     24

INRIA:

Vendor        Freq. (GHz)   Cores   RAM (GB)
Intel         3.2           2*2     2
Intel         2.66          2*2     8
AMD           2.6           4*12    196
AMD           2             2       4
Intel i7      2.53          2       4
Intel i7      2.1           4       8
Intel Atom    –             1       2
AMD T56N      1.65          2       2

HLRS:

Cloud Manager: OpenNebula 3.0

DISTRIBUTED VIRTUAL CLUSTER EXPERIMENT

VCOC, FIRE Engineering Workshop, Ghent, Nov. 6th – 7th 2012

Application execution: one vs. two sites

VC configuration: distributed VC (_dist)

BonFIRE sites:
– INRIA: Master + CEs
– HLRS: CEs

Deployment time decreases.

Application: two sites are faster than one site, but this is because the second site has better CPUs.

Deployment accounts for ~10% of the total time.
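Because the MC jobs are independent, CEs from both sites can be treated as one pool, and faster CEs simply complete more jobs. The sketch below estimates an idealized makespan for identical jobs on CEs of different speeds by always giving the next job to the CE that would finish it earliest (optimal for identical jobs). The speeds and job sizes are hypothetical inputs, not BonFIRE measurements, and real queue behaviour under OGS may differ.

```python
import heapq

def makespan(n_jobs, job_work, ce_speeds):
    """Idealized time to finish n identical, independent jobs on a CE pool.

    ce_speeds are hypothetical work units per second for each CE.
    Each job goes to the CE that would complete it earliest.
    """
    if not ce_speeds or any(s <= 0 for s in ce_speeds):
        raise ValueError("need at least one CE with positive speed")
    # Heap entries: (finish time if this CE takes its next job, CE index).
    heap = [(job_work / s, i) for i, s in enumerate(ce_speeds)]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_jobs):
        done, i = heapq.heappop(heap)
        finish = done  # popped times never decrease, so this is the latest
        heapq.heappush(heap, (done + job_work / ce_speeds[i], i))
    return finish

# Toy comparison: one site with 4 CEs vs. adding 4 faster CEs at a second site.
one_site = makespan(8, 100, [1, 1, 1, 1])
two_sites = makespan(8, 100, [1, 1, 1, 1, 2, 2, 2, 2])
```

This models only the application phase; the deployment phase (about 10% of total time in the experiment) would add a separate term.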

SPECIFIC DEADLINE OBJECTIVE EXPERIMENT


Horizontal elasticity

Monitoring application performance works. We modified the software to report performance information more frequently.

Execution with a deadline: elasticity works.
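Reporting progress more frequently matters because the performance monitor estimates throughput from recent reports. A minimal sketch of such a monitor, assuming a hypothetical report format of `(timestamp_s, cumulative_units)` for the whole cluster (e.g. histories completed so far) and a hypothetical `window_s` parameter:

```python
from collections import deque

class RateMonitor:
    """Sliding-window throughput estimate from cumulative progress reports.

    Hypothetical input: (timestamp_s, cumulative_units) for the whole cluster.
    More frequent reports keep the window filled with recent samples.
    """

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.samples = deque()

    def report(self, timestamp_s, cumulative_units):
        self.samples.append((timestamp_s, cumulative_units))
        # Drop samples older than the window, but keep at least two points.
        while (len(self.samples) > 2
               and timestamp_s - self.samples[0][0] > self.window_s):
            self.samples.popleft()

    def rate(self):
        """Units per second over the window, or None if not enough data."""
        if len(self.samples) < 2:
            return None
        (t0, u0), (t1, u1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return None
        return (u1 - u0) / (t1 - t0)
```

The resulting rate is the kind of measurement a deadline-driven elasticity rule would consume; with sparse reports the window holds stale samples and the controller reacts late.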

FAULT TOLERANCE EXPERIMENT WITH ELASTICITY


[Figure: fault-tolerant Virtual Cluster across two sites, kept in SYNC]

BonFIRE sites:
– HLRS (Master + 4 CEs)
– INRIA (Shadow + 4 CEs)

Demanded performance: 500 H/s.

A fault was simulated by putting the HLRS VMs in the CANCEL state. The INRIA shadow took control of the cluster.

Elasticity worked, requesting more CEs from INRIA.
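The takeover described above can be sketched as a heartbeat check on the shadow side. This is a minimal sketch, not the VC implementation: the heartbeat mechanism, `timeout_s` and the `ShadowMaster` class are hypothetical; only the site names and the 4 + 4 CE layout come from the experiment. After takeover the failed site's CEs are dropped, so an elasticity rule sees the reduced pool and requests more CEs at the surviving site.

```python
from dataclasses import dataclass

@dataclass
class ShadowMaster:
    """Hypothetical shadow master running at the second site.

    It records heartbeats from the active master and promotes itself if
    none arrives within timeout_s; the master site's CEs are then dropped.
    """
    timeout_s: float
    master_site: str
    ces_by_site: dict
    last_heartbeat: float = 0.0
    is_master: bool = False

    def heartbeat(self, now):
        self.last_heartbeat = now

    def check(self, now):
        """Promote the shadow if the master has been silent too long."""
        if not self.is_master and now - self.last_heartbeat > self.timeout_s:
            self.is_master = True
            self.ces_by_site.pop(self.master_site, None)
        return self.is_master

    def live_ces(self):
        return sum(self.ces_by_site.values())
```

Usage: `ShadowMaster(timeout_s=30.0, master_site="HLRS", ces_by_site={"HLRS": 4, "INRIA": 4})`; once `check()` detects the silence, only the 4 INRIA CEs remain, and meeting the demanded performance requires the elasticity engine to add CEs there.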

CONCLUSIONS


Distributed VC can be used to speed up HTC applications.

An Elasticity Engine based on a Key Application Performance indicator works for HTC.

High QoS can be provided by combining a distributed VC with elasticity.

BonFIRE provides infrastructure for experimenting with new cloud concepts and services.

THANKS

Questions?

agomez@cesga.es
