clouds at other sites - uvic.ca
TRANSCRIPT
CloudsatothersitesT2-typecomputing
RandallSobie
UniversityofVictoria
RandallSobieIPP/Victoria 1
Overview
• CloudsareusedinavarietyofwaysforTier-2typecomputing
– MCsimulation,productionandanalysis
– Commercial/private,in-house/distributed
• Motivationforusingclouds
– Easeofuse,reducedmanpowercosts,resourcesharing
– Separationofapplicationandsystemadministration
– Leveragesoftwaredevelopmentbycommercialworld
• Howarecloudsbeingused?
– VMprovisioning,jobmanagement,benchmarks,storage,networking,monitoring
RandallSobieIPP/Victoria 2
CloudcomputinginHEP
RandallSobieIPP/Victoria 3
Opportunistic
DedicatedVirtual
cluster
CloudcomputinginHEPistypicallyproviding5-20%oftheprocessingofcurrentprojects
“Dedicated”clouds(OwnedbyHEP)
“Opportunistic”clouds(privateandcommercial)
Clouddeployments
RandallSobieIPP/Victoria 4
Traditionalbare-metal
Specificpurposecloud(e.g..LTDABaBar,HLTclouds)
Standalone/privatecloud(e.g.PNNL,NorduGrid)
Bare-metalorin-housecloudwithexternalcloud(e.g..CERN,BNL)
Distributedclouds(e.g.UK,Canada,Australia,INFNClouds)
Examplesofclouddeployments(meanttoillustrateouruseofclouds)
RandallSobieIPP/Victoria 5
RandallSobieIPP/Victoria 6
Research Cloud
CREAM CE
Dynamic Torque
TORQUE + Maui
TORQUE + Maui
control VMs
distribute jobs via SSH
(Currently 700 cores)
14,000 HEPSpec ~ (1400 cores)
Australia-ATLAs Tier 2
(Belle II) LCG.Melbourne.au
Australian Belle II Grid Site
Dynamic Torque
SingleCREAMCEservicesATLASTier-2(Torque)andBelleIIsite(DynamicTorque)
RandallSobieIPP/Victoria 7
Why private cloud?
Chosen for flexibility, efficient use of compute resources for services Provides easy load-balancing and availability features Provides templating features Easy re-use of templates to test and instantiate new server instances Non-systems staff can provision their own instances of services Software Defined Networking is more malleable than physical
networking, encourages better networking practices, including security
Lessons learned
VM’s and/or containers provide needed flexibility to support multiple collaborations and different user needs Ceph storage is very robust and flexible
VM’s impose a 15%-20% performance penalty on HEP compute workload without careful tuning Move to containers on bare metal planned
OpenStack features do not help us make sure a certain number of instances are up and healthy and consistent
Kubernetes looks appealing in this respect
RandallSobieIPP/Victoria 8
GridPP(P.Love/A.McNab)UniversityOpenstackinstances• CloudsatHEPinstitutions(Oxford/Imperial).• ECDFcloudinEdinburghhasrecentlymadeavailabletotheHEPUKVacuumdeployments• Keytoourlight-weightTier-2strategywhereweoperatewithminimal
manpoweratthesite(<1000cores).
DatacentredcommercialOpenstack• ScaleofaTier-2facility.• Freeaccesstothetheirsystem(ATLAS)whilsttheywerecommissioningthings;
paidforaccesswhenfundsavailable.• NetworkconnectivitytotheUKacademicnetworkisonly1Gbitbuttheyhave
planstoupgrade
RandallSobieIPP/Victoria 9
Italy(INFN;MassimoSgaravattoetal)PrivateOpenStackCloud(Padova-Legnaro)calledCLOUDAREAPADOVANAUsedby~25usergroups/projectthatfinanciallycontributedfortheresourcesBatchprocessing• Relyingontheelastiqframework,HTCondorbatchclustersareinstantiated.
• Thesebatchclustersare'dynamic':newworkernodesareautomaticallyaddedorareremoveddependingonload.
• CMSCloudprojectisintegratedwiththelocalTier-2.• E.g.CMSVMscanaccesstheT2storage(dcache)usingthesamelocal
protocol(dCAP)usedbytheT2WNs.
• PlanstodeploytheSynergyservice,whichallowstomanagetheresourceallocationusingafair-shareapproach,withoutastaticpartitioningofsuchresourcesamongtherelevantusercommunities.
RandallSobieIPP/Victoria 10
NorduGrid
RandallSobieIPP/Victoria 11
BernSwitzerlandSWITCHengines–SwissNRENcommercialcloud(OpenStack)(freeduringdevelopmentphase)
RandallSobieIPP/Victoria 12
CloudSchedulerCloudScheduler
HTCondor
Job
VM
ComputeCloud
VMImage
Repository
Starts job
Start VMs
Submit user script
CanadaDistributedcloudsystemforATLASandBelleII• IntegratedintoPanda/DIRAC• Inproductionfor3-4years• AlsousedbyCanadianastronomy
• uCernVM,CVMFS,Squid-discovery(Shoal)• DistributedVMimagerepository• Datawrittentolocalstorageandtransferred• BenchmarksrunatVMboot• VMtimemeasurementsforaccounting• Reasonablemonitoring
• UpdatingsystemforOpenNebula• Studyingdatafederations(e.g.Dynafed)• Context-awareness
• Challengesincludemanagingresourcesacrossmanyadministrativedomains
10-15cloudsmanagedbyHTCondor/CloudScheduler(4000-5000cores)800-1000cores(each)EC2/Azure(Egressfeeswaived)
RandallSobieIPP/Victoria 13
Cloudresources10clouds4300cores
CanadianWLCG“cloud”–includesAustralianT2FridayOctober6
Jobscheduling/VMprovisioning
• VarietyofmethodsforrunningHEPworkloadsonclouds
– VM-DIRAC(LHCbandBelleII)
– VAC/Vcycle(UK)
– HTCondor/CloudScheduler(Canada)
– HTC/GlideinWMS(FNAL),HTC/VM(PNNL),HTC/APR(BNL)
– Dynamic-Torque(Australia)
– CloudAreaPadovana(INFN)
– ARC(NorduGrid)
• Eachmethodhasitsownmeritsandoftenwasdesignedtointegrated
cloudsintoanexistinginfrastructure(e.g.local,WLCGandexperiment)
RandallSobieIPP/Victoria 14
Commercialandprivateclouds
• Commercialclouduse
– PrimarilyAmazonEC2andMicrosoftAzure(withgrants)
– ATLASdiscussinguseofGCE
– OthercommercialOpenStackclouds
• DataCentred(UK),SWITCHengines(Switzerland)
– CERNcommercialcloudprocurement
• Privateclouds
– OpenStackandOpenNebularesearch-fundedcloudsbutnotinvolvedinHEP
RandallSobieIPP/Victoria 15
Networkconnectivity
• AmazonandMicrosoftcloudsareconnectedtotheresearchnetworksin
NorthAmerica(probablyGCEaswell)
– Egresschargescanbewaiveduponrequest
• Trans-borderortrans-oceantrafficcanbeanissue
– BecomeanimportantdiscussiontopicintheLHCONEmeetings
• Privateopportunisticclouds
– trafficflowsoverresearchnetworkbutnotLHCONEnetwork
RandallSobieIPP/Victoria 16
RandallSobieIPP/Victoria 17
CPUBenchmarks
Newsuiteof“fast”benchmarks
– HEPiXBenchmarkWorkingGroup
– Suiteavailableincludes“fastHS”(LHCb)andWhetstonebenchmarks
• WritetoElasticSearchDB
– RunbenchmarksinthepilotjoborduringthebootoftheVM
Datastorage
– DatawrittentolocalstorageonnodeandthentransferredtoselectedSE
– UKgrouphasdonesomeworkintegratingtheirobjectstorewithATLAS
– BNLusingS3storageonEC2forT2-SE
Monitoring
RandallSobieIPP/Victoria 18
CloudSystemmonitorSensu,Munin,RabbitMQ,Mongo-DB,Ganglia
ApplicationmonitorPandamonitoring
Application
BenchmarksandaccountingElasticSearchDB
Cloudorsitemonitor
Summary
• CloudsatHEPsites
– Typicallyintegratedintoanexistinginfrastructure
– Seenasawaytobettermanagemulti-userresources
– CloudR&Dfundingopportunities
• Opportunisticresearchclouds
– Easywaytoutilizecloudsatnon-HEPresearchcomputingfacilities
– Norequirementforon-siteapplicationspecialistsorcomplexsoftware
• Commercialclouds
– EC2/Azure/GCEdominatebutotherOpenStackclouds
– Grantandsomecontractedresources
– Trans-bordernetworkconnectivitybeingaddressed
RandallSobieIPP/Victoria 19