clouds at other sites - uvic.ca

19
Clouds at other sites T2-type computing Randall Sobie University of Victoria Randall Sobie IPP/Victoria 1

Upload: others

Post on 27-Dec-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clouds at other sites - uvic.ca

CloudsatothersitesT2-typecomputing

RandallSobie

UniversityofVictoria

RandallSobieIPP/Victoria 1

Page 2: Clouds at other sites - uvic.ca

Overview

•  CloudsareusedinavarietyofwaysforTier-2typecomputing

–  MCsimulation,productionandanalysis

–  Commercial/private,in-house/distributed

•  Motivationforusingclouds

–  Easeofuse,reducedmanpowercosts,resourcesharing

–  Separationofapplicationandsystemadministration

–  Leveragesoftwaredevelopmentbycommercialworld

•  Howarecloudsbeingused?

–  VMprovisioning,jobmanagement,benchmarks,storage,networking,monitoring

RandallSobieIPP/Victoria 2

Page 3: Clouds at other sites - uvic.ca

CloudcomputinginHEP

RandallSobieIPP/Victoria 3

Opportunistic

DedicatedVirtual

cluster

CloudcomputinginHEPistypicallyproviding5-20%oftheprocessingofcurrentprojects

“Dedicated”clouds(OwnedbyHEP)

“Opportunistic”clouds(privateandcommercial)

Page 4: Clouds at other sites - uvic.ca

Clouddeployments

RandallSobieIPP/Victoria 4

Traditionalbare-metal

Specificpurposecloud(e.g..LTDABaBar,HLTclouds)

Standalone/privatecloud(e.g.PNNL,NorduGrid)

Bare-metalorin-housecloudwithexternalcloud(e.g..CERN,BNL)

Distributedclouds(e.g.UK,Canada,Australia,INFNClouds)

Page 5: Clouds at other sites - uvic.ca

Examplesofclouddeployments(meanttoillustrateouruseofclouds)

RandallSobieIPP/Victoria 5

Page 6: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 6

Research Cloud

CREAM CE

Dynamic Torque

TORQUE + Maui

TORQUE + Maui

control VMs

distribute jobs via SSH

(Currently 700 cores)

14,000 HEPSpec ~ (1400 cores)

Australia-ATLAs Tier 2

(Belle II) LCG.Melbourne.au

Australian Belle II Grid Site

Dynamic Torque

SingleCREAMCEservicesATLASTier-2(Torque)andBelleIIsite(DynamicTorque)

Page 7: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 7

Why private cloud?

  Chosen for flexibility, efficient use of compute resources for services   Provides easy load-balancing and availability features   Provides templating features   Easy re-use of templates to test and instantiate new server instances   Non-systems staff can provision their own instances of services   Software Defined Networking is more malleable than physical

networking, encourages better networking practices, including security

Lessons learned

  VM’s and/or containers provide needed flexibility to support multiple collaborations and different user needs Ceph storage is very robust and flexible

  VM’s impose a 15%-20% performance penalty on HEP compute workload without careful tuning   Move to containers on bare metal planned

OpenStack features do not help us make sure a certain number of instances are up and healthy and consistent

Kubernetes looks appealing in this respect

Page 8: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 8

GridPP(P.Love/A.McNab)UniversityOpenstackinstances•  CloudsatHEPinstitutions(Oxford/Imperial).•  ECDFcloudinEdinburghhasrecentlymadeavailabletotheHEPUKVacuumdeployments•  Keytoourlight-weightTier-2strategywhereweoperatewithminimal

manpoweratthesite(<1000cores).

DatacentredcommercialOpenstack•  ScaleofaTier-2facility.•  Freeaccesstothetheirsystem(ATLAS)whilsttheywerecommissioningthings;

paidforaccesswhenfundsavailable.•  NetworkconnectivitytotheUKacademicnetworkisonly1Gbitbuttheyhave

planstoupgrade

Page 9: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 9

Italy(INFN;MassimoSgaravattoetal)PrivateOpenStackCloud(Padova-Legnaro)calledCLOUDAREAPADOVANAUsedby~25usergroups/projectthatfinanciallycontributedfortheresourcesBatchprocessing•  Relyingontheelastiqframework,HTCondorbatchclustersareinstantiated.

•  Thesebatchclustersare'dynamic':newworkernodesareautomaticallyaddedorareremoveddependingonload.

•  CMSCloudprojectisintegratedwiththelocalTier-2.•  E.g.CMSVMscanaccesstheT2storage(dcache)usingthesamelocal

protocol(dCAP)usedbytheT2WNs.

•  PlanstodeploytheSynergyservice,whichallowstomanagetheresourceallocationusingafair-shareapproach,withoutastaticpartitioningofsuchresourcesamongtherelevantusercommunities.

Page 10: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 10

NorduGrid

Page 11: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 11

BernSwitzerlandSWITCHengines–SwissNRENcommercialcloud(OpenStack)(freeduringdevelopmentphase)

Page 12: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 12

CloudSchedulerCloudScheduler

HTCondor

Job

VM

ComputeCloud

VMImage

Repository

Starts job

Start VMs

Submit user script

CanadaDistributedcloudsystemforATLASandBelleII•  IntegratedintoPanda/DIRAC•  Inproductionfor3-4years•  AlsousedbyCanadianastronomy

•  uCernVM,CVMFS,Squid-discovery(Shoal)•  DistributedVMimagerepository•  Datawrittentolocalstorageandtransferred•  BenchmarksrunatVMboot•  VMtimemeasurementsforaccounting•  Reasonablemonitoring

•  UpdatingsystemforOpenNebula•  Studyingdatafederations(e.g.Dynafed)•  Context-awareness

•  Challengesincludemanagingresourcesacrossmanyadministrativedomains

10-15cloudsmanagedbyHTCondor/CloudScheduler(4000-5000cores)800-1000cores(each)EC2/Azure(Egressfeeswaived)

Page 13: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 13

Cloudresources10clouds4300cores

CanadianWLCG“cloud”–includesAustralianT2FridayOctober6

Page 14: Clouds at other sites - uvic.ca

Jobscheduling/VMprovisioning

•  VarietyofmethodsforrunningHEPworkloadsonclouds

–  VM-DIRAC(LHCbandBelleII)

–  VAC/Vcycle(UK)

–  HTCondor/CloudScheduler(Canada)

–  HTC/GlideinWMS(FNAL),HTC/VM(PNNL),HTC/APR(BNL)

–  Dynamic-Torque(Australia)

–  CloudAreaPadovana(INFN)

–  ARC(NorduGrid)

•  Eachmethodhasitsownmeritsandoftenwasdesignedtointegrated

cloudsintoanexistinginfrastructure(e.g.local,WLCGandexperiment)

RandallSobieIPP/Victoria 14

Page 15: Clouds at other sites - uvic.ca

Commercialandprivateclouds

•  Commercialclouduse

–  PrimarilyAmazonEC2andMicrosoftAzure(withgrants)

–  ATLASdiscussinguseofGCE

–  OthercommercialOpenStackclouds

•  DataCentred(UK),SWITCHengines(Switzerland)

–  CERNcommercialcloudprocurement

•  Privateclouds

–  OpenStackandOpenNebularesearch-fundedcloudsbutnotinvolvedinHEP

RandallSobieIPP/Victoria 15

Page 16: Clouds at other sites - uvic.ca

Networkconnectivity

•  AmazonandMicrosoftcloudsareconnectedtotheresearchnetworksin

NorthAmerica(probablyGCEaswell)

–  Egresschargescanbewaiveduponrequest

•  Trans-borderortrans-oceantrafficcanbeanissue

–  BecomeanimportantdiscussiontopicintheLHCONEmeetings

•  Privateopportunisticclouds

–  trafficflowsoverresearchnetworkbutnotLHCONEnetwork

RandallSobieIPP/Victoria 16

Page 17: Clouds at other sites - uvic.ca

RandallSobieIPP/Victoria 17

CPUBenchmarks

Newsuiteof“fast”benchmarks

–  HEPiXBenchmarkWorkingGroup

–  Suiteavailableincludes“fastHS”(LHCb)andWhetstonebenchmarks

•  WritetoElasticSearchDB

–  RunbenchmarksinthepilotjoborduringthebootoftheVM

Datastorage

–  DatawrittentolocalstorageonnodeandthentransferredtoselectedSE

–  UKgrouphasdonesomeworkintegratingtheirobjectstorewithATLAS

–  BNLusingS3storageonEC2forT2-SE

Page 18: Clouds at other sites - uvic.ca

Monitoring

RandallSobieIPP/Victoria 18

CloudSystemmonitorSensu,Munin,RabbitMQ,Mongo-DB,Ganglia

ApplicationmonitorPandamonitoring

Application

BenchmarksandaccountingElasticSearchDB

Cloudorsitemonitor

Page 19: Clouds at other sites - uvic.ca

Summary

•  CloudsatHEPsites

–  Typicallyintegratedintoanexistinginfrastructure

–  Seenasawaytobettermanagemulti-userresources

–  CloudR&Dfundingopportunities

•  Opportunisticresearchclouds

–  Easywaytoutilizecloudsatnon-HEPresearchcomputingfacilities

–  Norequirementforon-siteapplicationspecialistsorcomplexsoftware

•  Commercialclouds

–  EC2/Azure/GCEdominatebutotherOpenStackclouds

–  Grantandsomecontractedresources

–  Trans-bordernetworkconnectivitybeingaddressed

RandallSobieIPP/Victoria 19