Cloud Computing for Research & Innovation
Project Directors Group (PDG)

David Fergusson, Francis Crick Institute
Martin Hamilton, Jisc (editor)
Philip Kershaw, STFC, CEDA & JASMIN
Steven Newhouse, EMBL-EBI & ELIXIR
Jacky Pallas, UCL, Farr Institute & eMedLab
Jeremy Yates, STFC DiRAC & SKA


Post on 22-May-2018




Contents

Executive Summary
Recommendations
1. The UK National E-Infrastructure
2. Cloud Computing within the UK National E-Infrastructure
3. Recommendations
4. Derived Actions
5. Roadmap: A 5 year vision for Cloud in UK Research
Acknowledgements
Annex A: What is the Cloud? A functional view
Annex B: Cloud Computing for Researchers
Annex C: Trust & public cloud
Annex D: NeI and cloud


Executive Summary

Today, research across diverse domains such as physics, engineering, life-science, the environment and social sciences is being driven increasingly by the ability to collect, store and analyse large datasets – so called ‘big data’. The UK’s National E-Infrastructure (NeI) needs to integrate through high-bandwidth low-latency networks the computational, data and storage services needed by researchers to support their ‘big data’ analysis to rapidly carry out their world-leading collaborative research programmes.

A key component of the NeI is cloud computing – the elastic, on-demand provisioning of infrastructure, platforms or software – to meet the needs of researchers from both the public and private sectors. Such a hybrid model requires integration of public sector institutional, community and national resources with those available internationally in both the public and private sector.

Given the strategic importance of the NeI, and the growing importance of cloud computing for big data analytics in the research community, members of the e-Infrastructure Project Directors Group (PDG) were asked by the RCUK National e-Infrastructure Group to identify a set of technical and policy recommendations that will improve the accessibility and usability of cloud resources for research, teaching and administration.

This report identifies the major technical and policy issues that are seen to be preventing widespread take up of cloud services for the UK academic and related community and provides a 5 year roadmap to investigate these issues and provide closer integration of public and private sector resources to improve the capability of the UK research community.

David Fergusson, Francis Crick Institute
Jacky Pallas, UCL & eMedLab
Martin Hamilton, Jisc (editor)
Philip Kershaw, STFC, CEDA & JASMIN
Steven Newhouse, EMBL-EBI & ELIXIR
Jeremy Yates, STFC DiRAC & SKA

(We are indebted to a number of colleagues for their contributions to this report – please see the Acknowledgements section for a full list of acknowledgements.)


Recommendations

1. Provide a clear community focus for cloud computing within the UK NeI by establishing a Cloud Computing Working Group that reports directly to the RCUK NeI Group.

A Cloud Special Interest Group (SIG) should also be set up to provide a forum for all members of the UK research community (both consumers and providers) to come together to exchange best practice and experiences. The SIG would be able to provide a grassroots view from across all research domains as to future cloud computing needs, which can then be taken up and turned into a strategy by the Working Group and communicated to the RCUK NeI Group. The Cloud Computing Working Group would be able to establish smaller focused working groups to discuss specific issues as required.

An important task of this group would be to exert influence on the providers of cloud software and scientific software vendors to include the requirements of the UK research community in future releases.

2. Provide a minimal technical integration of the UK NeI resources to promote workload mobility and to reduce technical barriers to entry.

These technical barriers can be reduced by establishing a consistent access model across all UK NeI resources. This model should be implemented so as to enable researchers to use a single set of identity credentials to access services at their home institution and to access and move their workloads between local, regional, national and international cloud computing resources as they require. An integrated authentication and authorisation infrastructure (AAI) is needed alongside consistent open interfaces to the resources so as to make workloads mobile between different cloud computing providers. This will foster competition and prevent lock-in.

Considerable investment has already been made in federated AAI initiatives such as Moonshot (which is now available as the Jisc Assent service), but further investment is needed to integrate these AAI technologies into cloud computing platforms and to make NeI resources (and those in the private sector) available.

Virtual research environments have an important role to play in enabling researchers to create software environments tailored to the needs of their application domain and user communities. At the same time, however, there is a need to facilitate the sharing of application environments to enable workloads to be easily deployed on any UK NeI compliant cloud. Container technologies such as Docker are an important enabler for the development of this capability.

3. Equip the research community with the right skills and support to fully exploit UK NeI cloud resources.

Many NeI service providers are finding it challenging to find staff with the right skills to operate their cloud infrastructures and to provide the consultancy necessary for research groups to rapidly and successfully exploit the available cloud resources. RCUK needs to invest in both basic and advanced training for service providers and those working directly with researchers to support their move to cloud computing resources. Investing in staff working in these fields provides them with skills that are potentially very transferable into the private sector as part of normal staff migration. In particular, training needs to be given to those working to support researchers in accessing cloud infrastructures – most likely the IT Support groups at an institution. Recent experience with the deployment of private and community clouds in the UK research community has highlighted the need for the devops role, a person whose skillset bridges the traditional system administration and software developer roles.

4. Policy changes needed within RCUK to grow the adoption of cloud computing and the policy actions that RCUK can initiate externally on behalf of the UK cloud computing community.

Cloud computing is acquired on an on-demand basis as required (operating expense) as opposed to an initial upfront payment (capital expense). RCUK funding models need to adapt to reflect this change for both researchers and community service providers who may consume commercial cloud resources in a hybrid model. Outside of the public sector, RCUK must continue and develop the activities that have been initiated through Jisc in establishing terms and conditions with commercial cloud providers to explore how the buying power of the community as a group can make purchasing of these services more effective, efficient and productive.
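The difference between the two funding patterns can be made concrete with a small calculation. The sketch below is illustrative only: the function names and every figure in it (purchase price, lifetime, running cost, hourly rate, hours used) are hypothetical assumptions, not values from this report or from any provider's price list. It compares an up-front capital purchase with pay-as-you-go consumption and finds the level of annual use at which the two cost the same.

```python
"""Illustrative comparison of capital (up-front) and operating (on-demand) spend.

All figures are hypothetical placeholders chosen only to show the arithmetic.
"""


def capex_annual_cost(purchase_price: float, lifetime_years: float,
                      annual_running_cost: float) -> float:
    """Annualised cost of owned hardware: straight-line depreciation plus running costs."""
    return purchase_price / lifetime_years + annual_running_cost


def opex_annual_cost(hourly_rate: float, hours_used_per_year: float) -> float:
    """Annual cost of on-demand capacity: pay only for the hours consumed."""
    return hourly_rate * hours_used_per_year


def break_even_hours(purchase_price: float, lifetime_years: float,
                     annual_running_cost: float, hourly_rate: float) -> float:
    """Hours of use per year at which owning and renting cost the same."""
    owned = capex_annual_cost(purchase_price, lifetime_years, annual_running_cost)
    return owned / hourly_rate


if __name__ == "__main__":
    # Hypothetical server: 8,000 up front, 4-year life, 1,000 per year to run.
    owned = capex_annual_cost(8000.0, 4.0, 1000.0)
    # Hypothetical on-demand equivalent at 1.00 per hour, used 1,500 hours a year.
    rented = opex_annual_cost(1.00, 1500.0)
    threshold = break_even_hours(8000.0, 4.0, 1000.0, 1.00)
    print(f"owned: {owned:.0f}/yr, rented: {rented:.0f}/yr, "
          f"break-even: {threshold:.0f} h/yr")
```

Under these made-up numbers a group using the capacity for fewer hours per year than the break-even figure would spend less on demand, but only if its grant allows operating spend of that kind, which is the funding-model point made above.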


1. The UK National E-Infrastructure

1.1 Context

The context for the National E-Infrastructure is laid out in A Strategic Vision for UK e-Infrastructure: a roadmap for the development and use of advanced computing, data and networks [1] (Tildesley, 2012).

The main recommendations from the Tildesley report are summarised as follows:

1. Create a ten-year roadmap to define the components of the infrastructure: networks; data and storage; compute; software and algorithms; security and authentication; people and skills.
2. Create secure data and information stores in strategic locations with data-analysis provided through cloud environments, working with open source software.
3. Ensure that important public databases are available to all UK researchers.
4. Provide broad access to the infrastructure for industrial partners, suppliers and Independent Software Vendors (ISVs), as well as the academic community.
5. Assist the development of a portfolio of training modules in computational science, numerical algorithms, grid-computing, parallel programming, cloud computing, data-centric computing, e-science, computer animation and computer graphics.
6. Develop a single coordinating body to drive closer cooperation and enable effective industrial access, while ensuring that UK academe has access to leading edge capability.

Investments by BIS, the Research Councils and HEIs in 2011-12 (£160M), 2012-2013 (£189M) and 2014-15 (£257M) have resulted in core elements of this vision (shown above in bold) being put in place [2].

1.2 Definition

The National E-Infrastructure (NeI) comprises those resources, linked by high bandwidth networks, which provide UK researchers with the computational, data and storage services they need to carry out their world-leading collaborative research programmes. In the future, these resources, which may be located in both the public and private sector, need to be integrated to allow researchers to access much greater capacity and capability than they could possibly locate in their own institutions.

An integrated NeI will allow researchers from all disciplines to:

• Expand the complexity of their simulations/analysis to produce better science.
• Decrease the time needed to obtain these science results.
• Provide a technology platform that will support innovation in both the public and private sector.

[1] https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/32499/12-517-strategic-vision-for-uk-e-infrastructure.pdf
[2] https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/249474/bis-13-1178-e-infrastructure-the-ecosystem-for-innovation-one-year-on.pdf

The RCUK National e-Infrastructure Group (Chair, Morrell), comprised of the 7 research councils, IUK, Jisc and the Met Office, provides strategic oversight of the activities of the NeI Projects through the NeI Project Directors Group. The NeI Project Directors Group (Chair, Yates), which comprises representatives from NeI Projects and Providers (see Annex D) and representatives of Innovate UK, Jisc and the HPC-SIG, is the body charged with integrating the projects and providers in such a way that an authorised researcher can access her/his resources via a simple interface and using only one set of credentials.

There are over 20 Large and Specialist projects that have been set up within the NeI ecosystem including the Janet6 high bandwidth low latency network, data and compute services for social and economic science, genomics in life sciences and medicine, climate research, particle physics and innovation, and compute services in particle physics. In addition there are over 35 HEIs providing services to their resident academics.

The current state of the National E-Infrastructure is described in The National E-Infrastructure 2014 Survey [3] (Yates & Hamilton, NeI Project Directors Group, 2014) and its future evolution is described in The E-infrastructure Roadmap [4] (Morrell, Chair, RCUK NeI Group, 2014).

[3] http://hpc-sig.org/?wpdmdl=492
[4] https://www.epsrc.ac.uk/newsevents/pubs/e-infrastructure-roadmap/

1.3 Requirements

In order to establish a researcher-centric infrastructure, the following are necessary:

• The researcher must be able to discover and access a broad suite of integrated compute, data and storage resources that are accessed by a high-bandwidth low-latency network.
• Virtual research environments must be developed to more readily enable researchers to transition from desktop application experience to services deployed on remote infrastructures. This is essential for the long tail of science research to effectively exploit big data.
• A virtual research environment that allows researchers to discover the available and accessible resources, to move data between resources, and to run reproducible and publishable workflows that support open science and open data.
• A common UK Authentication Infrastructure that is interoperable with international identity management infrastructures, so allowing the user to use NeI resources using a single set of identity credentials.
• An Authorisation and Allocation/Accounting Infrastructure that allows research domains and projects to authorise researchers to use appropriate resources, allocate those resources and measure their usage.
• A secure network and storage environment that can offer at-rest and in-flight information assurance to those research projects/communities that have data security concerns.
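The authorisation, allocation and accounting requirement above can be pictured with a toy data model. This is a minimal sketch under stated assumptions: the `Project` class, its fields and the example identity are all hypothetical, and it does not describe Moonshot, Assent or any real NeI service. It only shows the three steps named in the requirement: a project authorises an identity, holds an allocation, and has usage metered against it.

```python
"""Toy model of authorisation, allocation and accounting for a shared resource.

Every class, field and identifier here is hypothetical and for illustration only.
"""
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    name: str
    allocation_core_hours: float                      # allocation granted to the project
    members: Set[str] = field(default_factory=set)    # identities the project has authorised
    used_core_hours: float = 0.0                      # running total kept for accounting

    def authorise(self, identity: str) -> None:
        """Authorisation: the project decides who may draw on its allocation."""
        self.members.add(identity)

    def remaining(self) -> float:
        """Allocation still available to the project."""
        return self.allocation_core_hours - self.used_core_hours

    def record_usage(self, identity: str, core_hours: float) -> None:
        """Accounting: charge usage, refusing unauthorised or over-budget requests."""
        if identity not in self.members:
            raise PermissionError(f"{identity} is not authorised for {self.name}")
        if core_hours > self.remaining():
            raise ValueError("request exceeds the remaining allocation")
        self.used_core_hours += core_hours


if __name__ == "__main__":
    project = Project(name="example-project", allocation_core_hours=1000.0)
    # One federated identity is assumed to be recognised by every provider.
    project.authorise("alice@example.ac.uk")
    project.record_usage("alice@example.ac.uk", 250.0)
    print(project.remaining())
```

Keeping authentication (who the user is) separate from authorisation and accounting (what a project lets them consume) mirrors the split between the two infrastructure bullets above.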


2. Cloud Computing within the UK National E-Infrastructure

2.1 Importance of Cloud Computing for Researchers

Cloud computing has enormous potential as an enabling technology for the research community. Computing resources can be provisioned on demand on an as-needed basis and can be made elastic to grow and shrink as a given workload requires. Perhaps most important, it provides a key solution to the challenge of big data by bringing users to the data. In this model, cloud enables the provision of virtual computing environments co-located with centres capable of hosting the huge amounts of data associated with many research activities. These can provide vastly more computing capability than is available in users’ home institutions and avoid the need for the transfer of large data volumes.

Activities are categorised by deployment model (private, public, community or hybrid clouds) and by the service they offer (infrastructure, platform or software) to the user – see Annex A for more details. Many commercial cloud computing services that are now offered to ordinary consumers can benefit members of the research community – see Annex B for more details.

2.2 Current Status

The Cloud Computing Working Group was established in August 2013 following an action from the National e-Infrastructure Project Directors Group. Its role is to foster collaboration and establish best practice for the application of Cloud Computing in the UK research community. The executive summary provided by the working group (Sept 2013) identified a number of areas that need to be addressed:

• There is no obvious entry point for research users; co-ordinated training and guidance is needed.
• A number of centres have plans to roll out (or are in the process of rolling out) Private Cloud infrastructures. There is a need to co-ordinate and share experience between groups to establish best practice and avoid fragmentation and duplication.
• There is a need to engage and work with commercial Public Cloud providers and co-ordinate usage with partners in the research community – in terms of technology and policy.

A workshop was organised in November 2013 [5] to bring together members of the research community and Cloud computing experts and practitioners to elicit feedback on the findings of the executive summary. The workshop showed that there is a strong latent interest in the research community to be better informed and guided on how best to make practical use of the technology.

[5] https://indico.cern.ch/event/281517/timetable/#all

Presentations were co-ordinated around four key themes: use of Public Cloud, deployment of Private Cloud by e-infrastructure facilities, cloud federation and brokering – the bridging together of resources from different Cloud providers.

Table 1 lists the main cloud projects funded by UK or EU agencies that are relevant to researchers in the UK. With the increasing cross-border nature of our research collaborations and research projects the UK has an opportunity to play a leading role in such initiatives and our own domestic clouds must interoperate with these larger cloud infrastructures.

Table 1: A sample of UK and EU funded Cloud Projects

Project | Cloud Technology | Main Function
EGI (EU) | Various | Provide a single federated cloud service from many individual infrastructure providers from across Europe.
EMBL-EBI – Embassy Cloud | VMware | Data Analysis for Bioinformatics.
EUDAT (EU) | OwnCloud | Provide a standard set of services for movement and storage of data built on top of cloud infrastructure
World LHC Compute GRID (STFC) | OpenStack (CMS) | Computational and Data Services for LHC
Square Kilometre Array (STFC) | OpenStack | Computational and Data Services for the SKA
Helix Nebula (EU) | Various Commercial and Open Source | Create a federation of providers and a marketplace for European scientific application domain
JASMIN2 (NERC) | VMware | Data analysis environment for the environmental sciences community
CLIMB (MRC) | OpenStack | Computational and Data Services for Microbial DNA analysis
eMedLab (MRC) | OpenStack | Computational and Data Services for Human DNA and disease analysis
Cambridge-AWS Link | AWS | Test for Hybrid Cloud job submission
NECTAR (Australia) | OpenStack | Australian government funded cloud to support Australian research community [6]
EU T0 | Unknown | Proposal to create a hub of knowledge and expertise to coordinate technological developments to meet the e-Infrastructure needs of different Scientific Communities [7]
European Open Science Cloud | Unknown | Proposal by CERN endorsed by other research laboratories to establish a hybrid IT as a Service to the public research sector in Europe.

[6] https://www.nectar.org.au/research-cloud

2.3 Next Steps

The recent PDG report Imagining the UK National Data Infrastructure [8] suggested that the use of Cloud Technologies is a pre-requisite for the creation of a coherent National Data Infrastructure.

The Self-Service nature of Cloud and the ability to orchestrate resources make this technology particularly relevant to data driven science.

Cloud is widely seen as the next-generation IT delivery model

• Agile & Flexible
• Utility-based on-demand consumption
• Self-service driving down administrative overhead and maintenance

Public clouds are setting the benchmark of how IT could be delivered to users

• However not all organisations and/or workflows are ready for public cloud

Applications are being written differently today

• More tolerant of failure
• Making use of scale-out architecture

Our data is too large

• Volumes of data are being generated at unprecedented levels
• Most of this data is unstructured

Service requests are too large

• The time to science could get much longer without access to elastic resources
• More and more devices are coming online
• Tablets, phones, laptops, BYOD generation…

Crucially, applications weren’t written to cope with the demand!

• Traditional infrastructure capabilities are being exhausted
• Service uptime, QoS, KPIs and SLAs are slipping

[7] http://www.eu-t0.eu/
[8] https://www.scribd.com/doc/260531862/Imagining-the-UK-National-Data-Infrastructure-Recommendations

2.4 The Current Cloud Picture in the NeI

The fragmented and siloed nature of the NeI at first sight suggests that the resources could be more effectively used using cloud technologies.

In theory workloads could be distributed among the many systems of the NeI.

However several problems present themselves:

• The assets of the NeI are owned by particular research domains and projects.
  o There is no incentive to share resources and no incentive to consolidate resources.
• There is no common identity management system.
• The spare capacity is simply not there in the NeI.
  o Elasticity, which should be a benefit of cloud, is not present in a very full NeI.
• Cloud is a new technology and many of our systems staff and users are unfamiliar with it.
• There is no financial model of the NeI.
  o There is no resource brokering service.
• There is already a community, very hard to size, that makes use of public cloud provision.
  o This suggests the NeI is not working for these people.

Progress, however, has been made:

• AAAI has been invested in with significant buy-in from across the Research Councils
• Jisc have created relationships with public cloud providers, such as AWS and Google, to create management portals to these services.
• Jisc have partnered with Microsoft [9] and AWS [10] to directly peer the Janet network with their datacentres, facilitating the adoption of Hybrid Clouds that mix institutional and cloud provider resources.
• The Investments by BIS and the Research Councils have created a sense of coherence and community that did not exist 3 years ago.
  o The creation of Private Clouds in the areas of the Environmental sciences, LHC and SKA data processing, and Bioinformatics.
  o The creation of Software as a Service (SaaS) infrastructures such as the NERC Environmental Workbench, the Environmental genomics – Virtual Desktop.
  o DiRAC and the National Service ARCHER already share infrastructures and represent the Physical Sciences and Engineering groupings in the UK.
  o DiRAC and GridPP have already swapped projects between them manually in order to make best use of resources.
  o The co-operation of the Social Scientists, Medical Bioinformatics and Patient Records projects in working with Jisc to produce Safe Share is an example of how difficult projects such as Cloud uptake can be managed in the future.
  o The Infinity Data Centre in Slough has created an environment in which co-location of equipment is now possible and desirable.
  o The leading role of the UK in GÉANT and the EGI should make sure that the UK builds structures that interoperate with and leverage resources made available via EU projects.
  o The improvement to Janet Network speed and latency, and security, means that systems (and data) can now be distributed. This has been shown to work by GridPP who can now use the Network as “mobile” storage and regularly are moving PBs a week over Janet.
  o The possible future ability of the Janet Network itself to operate in Self-Service mode is a welcome addition to the Cloud deployment model.
• Jisc and the PDG have surveyed the attitude of the NeI and HEI CIOs to Cloud technologies [11]
• Courses are becoming available to introduce people to the Cloud as an operating system, including ones specifically targeting researchers, such as those run by Microsoft in partnership with the Software Sustainability Institute and Oxford e-Research Centre [12].

[9] https://www.jisc.ac.uk/news/over-18-million-students-and-staff-to-benefit-from-faster-more-secure-cloud-computing-21-may
[10] https://www.jisc.ac.uk/amazon-web-services
[11] https://www.jisc.ac.uk/news/uk-education-divided-in-its-adoption-of-the-cloud-14-jul-2015
[12] http://research.microsoft.com/en-us/projects/azure/training.aspx


3. Recommendations

Although a number of needs have been identified, the members of the Cloud WG have limited resources to address these. A co-ordination role has been proposed to assist the work of the working group chair and committee to organize and facilitate meetings.

3.1 Community Building

• Set up a RCUK NeI Cloud working group and SIG – governance arrangements to be specified. The working group is responsible for the delivery of the (fully resourced) roadmap.
• Liaise with groups in the research community who are rolling out Private Cloud interfaces to their e-infrastructure. Share experience and establish best practice in the use of software frameworks, resourcing for deployment and operations of services.
• Build the community in the UK – present a single face to funders, stakeholders and vendors.
• The working group should identify pilots to be supported – RCUK budget requested for cloud innovation fund.
• Share the knowledge and experience of successful projects in different subject domains to ensure transferability.
• Document examples of the research community’s experiences of using public cloud, providing guidance on what works and what doesn’t, and seed this activity by surveying researchers (recent RCUK grant award PIs?) on their use of public cloud.

3.2 Technical Integration

• If it is to succeed, any technical integration must take its lead from science-driven use cases. Clear focus is needed on which interfaces are required and who is the target user, be they an administrator, software developer or end user researcher.
• Co-ordinate with activities underway to broker or federate resources from multiple Cloud providers and assist e-Infrastructure providers and end-users to best exploit these initiatives. These activities include JANET Cloud, G-Cloud, EGI Federated Cloud Task Force and Helix Nebula.
• Target small amounts of funding to develop UK priorities that aren’t being met elsewhere, seed and influence product evolution.
• Help the UK to make the best use of community and public clouds – e.g. portability of applications and workloads. The purpose of this is to ensure that scientific workloads can be executed on the most efficient and effective infrastructure. The goal is an ecosystem of clouds across domains and providers, supporting inter-disciplinary science.
• Project to explore common cloud management portal – link to AAAI.
• Project to explore emerging container technologies such as Docker as standard/recommended approach to packaging and portability. We need to identify and prioritise use cases for application of this technology.
• Rather than building a single centralized research cloud, look for opportunities to collaborate with efforts already underway in the individual research domains that make more effective use of UK infrastructure and public cloud, such as eMedLab and CLIMB, and share/collaborate e.g. around documentation.
• Cost and timing aspects of data storage and egress – e.g. 500 TB from AWS would cost roughly $150,000 per annum to store, and $25,000 in egress charges to export from the cloud [13]. Large scale data transfer required to move workloads from e.g. EBI to other facilities requires specialist network infrastructure.
• Access and sharing regime – limitations in Ceph and Swift, as focus has been on high performance. Controls around sharing images are also somewhat absent. How do we act collectively to change this? It will happen eventually by itself, albeit perhaps more slowly than we would like.
• We need to develop the concept of virtual research environments: web-hosted applications that offer easy to use interfaces that can be tailored and customised to meet the needs of users who are more used to point and click desktop applications.
• We should also offer to our users the option of virtual desktops. These are simply virtual machines that are generated and allow the users to run their desktop/laptop applications and workflows on much more powerful systems.
• This would help users to transition from desktop applications that they are more familiar with to cloud hosted applications. This class of user represents our main access problem and they actually would come from across the research spectrum.
• There will always be those users who want to log in to a system and submit jobs from the command line. That should not be discouraged.
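The storage and egress figures quoted in the list above imply simple per-gigabyte rates. The sketch below back-calculates them from the report's own round numbers (500 TB, roughly $150,000 per annum to store, $25,000 to export). It assumes decimal units (1 TB = 1,000 GB), the function names are illustrative, and the derived rates are not taken from any provider's price list.

```python
"""Back-of-envelope check of the storage and egress figures quoted above.

The rates are derived from the report's round numbers, not from a price list.
"""

GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1,000 GB)


def implied_storage_rate(volume_tb: float, annual_cost_usd: float) -> float:
    """Storage price in USD per GB per month implied by an annual bill."""
    return annual_cost_usd / (volume_tb * GB_PER_TB * 12)


def implied_egress_rate(volume_tb: float, egress_cost_usd: float) -> float:
    """Egress price in USD per GB implied by a one-off export bill."""
    return egress_cost_usd / (volume_tb * GB_PER_TB)


def estimate_cost(volume_tb: float, months: float,
                  storage_rate: float, egress_rate: float) -> float:
    """Cost of storing a volume for some months and then exporting it once."""
    volume_gb = volume_tb * GB_PER_TB
    return volume_gb * storage_rate * months + volume_gb * egress_rate


if __name__ == "__main__":
    storage_rate = implied_storage_rate(500, 150_000)
    egress_rate = implied_egress_rate(500, 25_000)
    print(f"implied storage rate: ${storage_rate:.3f} per GB-month")
    print(f"implied egress rate:  ${egress_rate:.3f} per GB")
    total = estimate_cost(500, 12, storage_rate, egress_rate)
    print(f"one year of storage plus one full export: ${total:,.0f}")
```

On these figures a single full export costs about a sixth of a year's storage, which is why egress needs to be budgeted for whenever workloads are expected to move between providers.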

3.3 Training and Support

• Training needs to be carefully targeted to the right class of users. End user applications are likely to remain within the scope of a given application domain. However, at the platform and infrastructure tiers there is a need to train and equip administrators and a new class of devops users.
• Support end users in the research community to assist them in how to use services from Cloud providers and co-ordinate with training initiatives through the PDG members.
• For end users, provide a coherent route from desktop application environment through to virtual environments hosted on e-Infrastructure, including community and public cloud.
• A central programme of support akin to ARCHER CSE including training, advice and guidance for exploiting public cloud, working with cloud providers.
• The Research Software Engineering (RSE) community [14] has a pivotal role to play in helping researchers to exploit the potential of cloud technologies for research and innovation – for example by porting common packages and environments to run in the cloud, and packaging with Docker or other suitable container technologies.

[13] http://aws.amazon.com/s3/pricing/

3.4 Policy Issues

• Liaise with Public cloud providers on behalf of e-infrastructure facilities and end-users in the areas of: training – how can users best use public cloud for research, best use of technology to fit with research workloads, policy (SLAs) and funding.
• Further cloud brokerage/peering work by Jisc – building on AWS model with other providers
• Common institutional level management/billing portal, discounts for bulk usage, delegation of budget to PIs, self-service management of researchers by PI/administrator, tools for tracking spending
• Cost characterisation – refreshing the work of the Jisc/EPSRC study from 2011 [15]
• Common approach for use of public cloud in grant proposals (extends to wider NeI?)
• Establish “public cloud as an NeI resource” (Jisc could broker access to this?)
• Policy statement on what data can be put into cloud providers, compliance requirements

[14] http://www.rse.ac.uk/
[15] https://www.epsrc.ac.uk/research/ourportfolio/themes/researchinfrastructure/subthemes/einfrastructure/cloud/


4. Derived Actions

Tables 2 and 3 list suggested actions.

Table 2: Actions for the Future from the Infrastructure perspective.

| Action | Who |
| --- | --- |
| Produce a financial model of the NeI. This is the first step for any resource brokering model. | RCUK |
| Produce a logical map and description of the NeI and its services | Jisc and PDG |
| Agree AAAI rollout across PDG: (1) Adoption of appropriate Authentication model; (2) Adoption of Authorisation and Accounting Model; (3) Need to Federate with external Access Management Provider | PDG, Jisc, and RCUK |
| Ask PDG members to suggest possible private clouds and suggest how investments could be better co-ordinated. What can we do to consolidate and build economies of scale to help make ourselves much more elastic? | PDG |
| Network as a Service models need to be understood and deployed | Jisc |
| Assess Cloud Brokerage Models | Jisc |
| Deployment of Network Security services | Jisc |
| Understand how the NeI can be made elastic by resource pooling and use of Public Cloud | PDG |
| Ask RCUK to set up a Cloud Working Group to inform and guide the adoption of Cloud in the NeI. | RCUK |
| Complete NeI to Public Cloud tests at Cambridge. This will provide a prototype for the NeI | Cambridge and AWS |
| Have a meeting on Cloud Technologies and get consensus on the projects and Cloud Technologies the UK should be supporting. These Projects will include: (1) Suggestions for sensible deployment stacks; (2) Which technologies should we support; (3) NeI interfaces with Cloud Technology – Self Service, and metering; (4) AAAI interfaces with Cloud Technology; (5) Understanding which areas will offer SaaS, PaaS and IaaS; (6) A roadmap for using Cloud for HPC; (7) How can we develop Cloud technologies, particularly in the areas of data management; (8) Can we make Hybridisation easy | PDG |
| Publish Reports on NeI and the Cloud and submit to RCUK and ELC | PDG |

Table 3: Requirements from a researcher point of view

| Researcher requirement | Technical Requirement | Technology | Who does this? |
| --- | --- | --- | --- |
| Single Sign on | Agreed AAAI and interfaces to Cloud Technology | Jisc Assent for Authentication. Domain provides for Authorisation and accounting | Jisc, RCUK |
| Knowing where Resources are | “cloud” infrastructures. Looking at queues that I am allowed to look at | VMware, OpenStack, OpenNebula etc | Cloud WG, RCUK |
| Accessing those resources | Portals | ssh | Jisc, RCUK |
| Collaborative analysis and processing environments, ability to share data, analysis and results with their peers | Virtual laboratories | | Individual research domains develop with knowledge sharing through the Cloud WG |
| Building Workflows | Portals | Taverna, Apache Mesos, Apache Spark | Domain Researchers and Software Engineers, Jisc, RCUK |
| Producing Virtual images and Instances | Access to image and instance creation applications. Portals | e.g. G-Cloud, VMware, OpenStack, OpenNebula, container technologies such as Docker, etc | Domain Private Clouds, Jisc |
| Running My workflow | Virtualisation. Portals. Batch scheduling software, e.g. qsub | VMware, OpenStack, Apache Spark | Domain |

5. Roadmap: A 5 year vision for Cloud in UK Research.

Cloud will:

1. Aid the UK economy by allowing UK researchers to do more with less and provide a competitive data infrastructure underpinning UK research, cutting across research sectors and allowing industry to access the same e-infrastructure and its resources. This should increase the productivity of the UK research sector and UK Industry.

2. Enable the sharing of research and knowledge within the UK to improve efficiency and effectiveness, with impact that benefits the UK economy.

3. Provide World Class infrastructure for UK research. Easily accessible cloud infrastructures for everybody.

4. Meet the challenges of data intensive research and the development of data science, particularly the Alan Turing Institute.

This will be done by following a roadmap that will allow our e-infrastructure to make optimal use of cloud technologies to deliver our research services. By understanding which compute workloads (e.g. High Performance Computing, High Throughput Computing, database analysis) could move to the public cloud, and making “private cloud” facilities compatible with the public cloud, we can allow the end user to define their own routes through IT services to achieve their compute goals and pick the appropriate service provider(s).

Roadmap

Year 1

| Task | Actions | Stakeholders and Resource |
| --- | --- | --- |
| Data Movement | Replicate large scale data from data sources to sinks in a secure fashion, e.g. EBI to eMedLab, EBI to CLIMB | Jisc, eMedLab, CLIMB, EBI. 1 FTE for 3 months |
| Barriers to using Public Cloud | Explore issues around use of public cloud, e.g. egress charging | Jisc, WG, 1 FTE for 3 months |
| Brokerage | Brokered community agreement with public cloud providers. This requires a clear, consistent/unified statement from RCUK about how “capital” funds can be used to purchase public cloud capacity. | Jisc, WG, RCUK, 1 FTE for 3 months |
| Track Existing Activities | UK NeI Cloud projects, EU Projects, Public Cloud usage | Jisc, WG |

Year 2

| Task | Actions | Stakeholders and Resource |
| --- | --- | --- |
| Brokerage Service | Brokered cloud facilities start to become available to researchers | Jisc / 0.3 FTE p.a. |
| Group based access to resources with single identity | AAAI implementation available to NeI Projects and an API is developed to allow use by Public Cloud Providers’ AAAI infrastructure. Work is already in progress – SafeShare and the integration of Assent and SAFE. A pilot has already allowed ABFAB (Assent) to interface with OpenStack Keystone. | Jisc, WG, EPCC, ADRC, Farr, eMedLab. 4 FTE per annum in 2015-2016 |

Year 3

| Task | Actions | Stakeholders and Resource |
| --- | --- | --- |
| Standard Services Available | (1) General on demand self service available as an option for NeI users for compute and data services; (2) Management API; (3) Standardised workflow images (or containers); (4) Secure data access within private cloud, including improvements to container security | (1) WG, Jisc, RCUK, Research Domains (2-3 FTE for 2015-2016); (2) WG, Jisc (1 FTE for 1 year); (3) Research Domains (Resource TBD); (4) WG, Jisc (2.5 FTE p.a.) |

Year 4

| Task | Actions | Stakeholders and Resource |
| --- | --- | --- |
| Cloud Federation | Federated data access between private clouds | Jisc, WG, RCUK. 2 FTE for 2016-17 |

Year 5

| Task | Actions | Stakeholders and Resource |
| --- | --- | --- |
| Hybrid Model Available | Hybrid model that lets users exploit the best of community specific private cloud and public cloud resources, and seamless/efficient migration of workload and data between federated clouds. Expectation that containers will play a major role in this | Jisc, WG, RCUK. 2 FTE for 2017-18 |

Acknowledgements

We would like to thank the following people for giving up their time to provide input into this report:

Neelofer Banglawala, EPCC
Stephen Booth, EPCC
Brendan Bouffler, Amazon Web Services
Craig Box, Google
Peter Boyle, University of Edinburgh / DiRAC
Peter Braam, University of Cambridge / SKA
Steven Bryen, Amazon Web Services
David Britton, University of Glasgow / GridPP
David Chadwick, University of Kent
Tim Chown, Jisc
Jeremy Coles, CERN
Ian Collier, STFC
David Colling, Imperial College
Tim Cutts, Wellcome Trust Sanger Institute
Shaun de Witt, STFC
Ruediger Dorn, Microsoft
Matthew Dovey, Jisc
Tom Eddington, National Oceanography Centre
William Florance, Google
Paul Fretter, Norwich BioScience Institutes
Andy Grant, Atos
Scott Hamilton, Amazon Web Services
James Hetherington, UCL / Software Sustainability Institute
Terry Hewitt, STFC Hartree Centre
Adam Huffman, Francis Crick Institute
Laurence Hurst, University of Birmingham
Ed Jackson, Amazon Web Services
Claire Jenner, UCL
Jens Jensen, STFC
Owain Kenway, UCL
Iain Larmour, EPSRC
George Leaver, University of Manchester
Peter Maccallum, Cancer Research UK Cambridge Institute
Simon McIntosh-Smith, University of Bristol
Matt McNeill, Google
Aleks Nenadic, University of Manchester / Taverna
Dan Perry, Jisc
Robin Pinning, University of Manchester / N8
Alan Real, University of Leeds / HPC-SIG / N8
Barak Regev, Google
Thom Rödde, Google
Mark Rowlands, Amazon Web Services
Andy Richards, University of Oxford / OERC / SES5
David Salmon, Jisc
Jeremy Sharp, Jisc
Rhys Smith, Jisc
Kenji Takeda, Microsoft
John Taylor, University of Cambridge
Andy Turner, EPCC
Simon Thompson, University of Birmingham / CLIMB
Ash Vadgama, AWE

Annex A: What is the Cloud? A functional view

There is widespread misunderstanding about the term ‘Cloud computing’, leading at times to accusations of hype and at other times to conflation with established computing paradigms such as Grid Computing. An understanding of the concepts is essential in order to make informed choices about how it can be best exploited and in what ways it differs from or complements what has gone before.

Cloud computing powers the services of Internet giants like Microsoft, Google and Amazon. That technology is now available to institutions, to learners and to researchers. This is hugely empowering, for example by extending the reach of an individual far beyond what would historically have been possible - for a few tens of dollars a junior researcher, undergraduate or citizen scientist can create the equivalent of a million pound supercomputer for a few hours to carry out a calculation.

Cloud computing is a portmanteau term encompassing everything from infrastructure as a service (essentially renting someone else’s server equipment) through to software as a service (typically websites that someone else runs for you). In the middle, there is a platform tier that provides the micro-services that power the likes of Android and iPhone apps, and also many web delivered services.

It is important to take an informed view of cloud services, particularly where risk management, switching costs and sustainability are concerned. At the same time, there is a significant amount of fear, uncertainty and doubt about cloud computing - and some genuine concerns.

Today’s concept of cloud computing grew out of two parallel activities that eventually converged - commercial hosting of servers and storage in professionally run datacentres, and two key realisations by leading Internet firms. These were that a) they could rent out capacity that would otherwise be spare, and b) opening up their application programming interfaces (APIs) to third party developers would make it possible to create a vibrant ecosystem of applications that no one company could likely construct on its own. It is also commonplace for mobile apps to be built upon a substrate of cloud services, even if this is not readily apparent to the end user.

In the experience of the authors of this report, the US National Institute of Standards and Technology (NIST) Definition of Cloud Computing[16] document provides an excellent starting point to understand this technology. For its definition it sets out three key areas:

• Essential Characteristics: what are the properties that allow us to say one system is a cloud where another is not? This is key to understanding the capabilities of a given system and the differentiating factors that make cloud technology unique.

• Service Models: what services does a cloud offer and to whom? The three models described can be considered as successive abstractions above the underlying computer hardware upon which a cloud runs. At the lower levels, cloud computing interfaces provide administrators with powerful capability to deploy whole virtual computing infrastructures. At the opposite end of the scale, end users using an application may have little or no concept that it is cloud-hosted.

• Deployment Models: to many, ‘cloud’ may be synonymous with large public cloud providers, but a cloud may be deployed in any of a number of deployment models including, for example, a private cloud hosted for the use of a single organisation. The deployment model, or combination of deployment models, is a critical consideration for getting the most from cloud computing technology for any given research project or programme.

[16] http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

In the following all text quoted in italics is taken verbatim from the NIST document.

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

This cloud model is composed of five essential characteristics, three service models, and four deployment models.

A.1 Essential Characteristics

On-demand self-service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacentre). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity: Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Typically this is done on a pay-per-use or charge-per-use basis. Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
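
As a rough illustration of measured service, the sketch below meters usage per resource and turns it into a pay-per-use charge. The resource names, quantities and unit prices are invented for illustration and do not reflect any provider's actual tariff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered observation: how much of a resource was consumed."""
    resource: str    # e.g. "vm_hours", "storage_gb_months", "egress_gb"
    quantity: float


# Hypothetical unit prices in GBP; real tariffs vary by provider and region.
UNIT_PRICES = {
    "vm_hours": 0.10,
    "storage_gb_months": 0.02,
    "egress_gb": 0.05,
}


def charge(records, unit_prices=UNIT_PRICES):
    """Return (per-resource charges, total charge) for a list of usage records."""
    per_resource = {}
    for record in records:
        cost = record.quantity * unit_prices[record.resource]
        per_resource[record.resource] = per_resource.get(record.resource, 0.0) + cost
    return per_resource, sum(per_resource.values())


if __name__ == "__main__":
    usage = [
        UsageRecord("vm_hours", 200),
        UsageRecord("storage_gb_months", 500),
        UsageRecord("egress_gb", 100),
    ]
    breakdown, total = charge(usage)
    for resource, cost in breakdown.items():
        print(f"{resource}: GBP {cost:.2f}")
    print(f"total: GBP {total:.2f}")
```

Keeping the metered records separate from the price table means the same usage can be re-costed against a different tariff, which is one way to compare providers or to estimate egress charges before moving data.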

A.2 Service Models

Figure 1. Cloud computing tiers (CC-BY Flickr user Phil Wolff)

Software as a Service (SaaS): The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

SaaS is exemplified in the commercial sector by the likes of the Microsoft Office 365[17] and Google Apps for Education[18] communications and collaboration suites, and the SalesForce.com[19] Customer Relationship Management (CRM) system. Software vendors are increasingly moving to delivering applications through cloud computing as this reduces the friction of taking up the product - and more cynically helps them to retain customers, exploit customers’ data, and upsell customers to other products and services.

In the research sector, there is a long history of web-based applications and services to support distributed user communities. These could be classified under the category of SaaS. In recent years, a number of initiatives have grown around the concept of Virtual Research Laboratories: shared workspaces in which scientific users collaborate, hosted on a cloud infrastructure to enable access to greater computing capacity and storage than would otherwise be possible with a desktop application. Examples are the CSIRO Virtual Laboratories hosted on the Australian research cloud, Nectar[20].

[17] http://products.office.com/en-gb/business/Office
[18] https://www.google.com/edu/
[19] http://www.salesforce.com/uk/products/
[20] https://www.nectar.org.au/virtual-laboratories-1

Platform as a Service (PaaS): The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

This is essentially a kit of parts that can be used by developers to simplify the process of building and deploying applications. Examples include facilities for reliably hosting applications at scale that have been written in common programming languages, such as Amazon Elastic Beanstalk[21], Microsoft Azure Web Sites[22] or Google App Engine[23]. Providers often expose programming interfaces into their own software and services, such as the Google Maps API[24], which is widely used to integrate Google Maps into third party sites and services. For research infrastructures, PaaS is largely manifest in the form of command line or virtual desktop access to a virtual machine which has been specifically tailored with applications and libraries for a given application domain. An example is CloudBioLinux[25], a customised version of the popular Ubuntu Linux operating system distribution.

Infrastructure as a Service (IaaS): The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

This is exemplified by the likes of Amazon Web Services[26] (AWS), Google Compute Engine[27] and Microsoft Azure Virtual Machines[28]. Each of these services gives you near instant access to virtual machines hosted in one of the cloud providers’ datacentres, pre-loaded with the operating system and often the application software you require. In the research community there are examples of groups using VMware’s vCloud Director[29] software to provide IaaS, and also OpenNebula[30]. Groups are increasingly turning to OpenStack[31], as described in Annex B.

[21] http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
[22] http://azure.microsoft.com/en-gb/documentation/services/websites/
[23] https://cloud.google.com/appengine/docs
[24] https://developers.google.com/maps/
[25] http://cloudbiolinux.org/
[26] http://aws.amazon.com/
[27] https://cloud.google.com/compute/
[28] http://azure.microsoft.com/en-gb/services/virtual-machines/
[29] http://www.vmware.com/products/vcloud-suite/
[30] http://opennebula.org/
[31] http://www.openstack.org/

A.3 Deployment Models

Private cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Community cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

Public cloud: The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Hybrid cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
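
The cloud bursting idea mentioned in the hybrid cloud definition can be sketched as a simple placement rule: use private capacity while it lasts, and send the overflow to a public cloud. This is only an illustration; the function name, the core counts and the assumption that jobs never finish during placement are all simplifications introduced here.

```python
def place_jobs(job_cores, private_capacity):
    """Place each job on the private cloud while it has free cores, else burst.

    job_cores: core counts requested by each job, in submission order.
    private_capacity: total cores on a hypothetical private cloud.
    Returns a list of (cores, "private" | "public") placements.
    Simplification: jobs are assumed not to finish while placement runs,
    so private capacity is only ever consumed, never returned.
    """
    free = private_capacity
    placements = []
    for cores in job_cores:
        if cores <= free:
            free -= cores
            placements.append((cores, "private"))
        else:
            # Not enough local capacity left: burst this job to the public cloud.
            placements.append((cores, "public"))
    return placements


if __name__ == "__main__":
    for cores, target in place_jobs([8, 16, 4, 8], private_capacity=24):
        print(f"{cores:>3} cores -> {target}")
```

A production scheduler would also weigh data location, egress charges and policy on which data may leave the private cloud, all of which this sketch ignores.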

Annex B: Cloud Computing for Researchers

Many of the products and services we refer to here are either open source, or available in a free or low cost public cloud platform tier that researchers could readily experiment with.

B.1 Software as a Service

As noted above, software as a service is really another way of saying that the product is delivered as a website or web service. Software as a Service products are typically hosted by the firm which has produced them, and hence not generally available through private or community clouds. We have subdivided this category into two:

Generalised products that have relevance to researchers

Examples here include a wide range of Internet collaboration suites such as Box[32], Dropbox[33], Google Drive[34] and Microsoft OneDrive[35]. Many of these products have a dual track model that includes a freemium service aimed at consumers, and an enterprise service for paying customers. These tend to have very different terms and conditions, for example consumer terms and conditions often explicitly rule out any warranty whereas enterprise agreements will provide a Service Level Agreement including penalty clauses if SLA commitments are not met. Bespoke enterprise agreements are often created that address customers’ particular concerns, following legal and contractual review.

We will make a special mention here of the Google Apps for Education and Microsoft Office 365 collaboration suites, which have been widely taken up by research and education institutions. We can surmise that this is because they offer world-leading facilities at zero cost for the base service. These same services have a significant per-user cost for businesses, but are being widely adopted by firms because of a perception that they significantly undercut the costs of procuring conventional server and storage equipment and then installing and running the closest equivalent software locally - such as Microsoft’s Exchange product. Furthermore, these cloud products are continually being developed and evolved. Google notably take pride in making hundreds of small point releases of Google Apps in a typical twelve month period.

We have also seen a significant amount of interest from researchers in using generalised communication tools such as social media and blogs to disseminate results, to support open practices, and to help stimulate a public debate about their work. Researchers working in scientific computing often exploit products and services aimed at the developer community such as GitHub[36] version control, which offers both a freemium consumer service and an enterprise service.

[32] https://www.box.com/en_GB/home/
[33] https://www.dropbox.com/
[34] https://www.google.com/drive/
[35] https://onedrive.live.com/about/en-us/

Products that are specifically targeted at researchers

In parallel to research adoption of the sort of generalised services listed above, a number of products have emerged that specifically target researchers as a community. These include:

• Professional networking sites such as Mendeley[37], Academia.edu[38] and ResearchGate[39].

• Specialised facilities for sharing code and data such as figshare.com[40].

• Open access journals, repositories and pre-print archives such as arXiv.org[41], PLOS[42], institutional repositories and sites from traditional publishers such as Elsevier’s ScienceDirect[43].

• Equipment catalogues such as equipment.data.ac.uk[44] and Kit-Catalogue[45] from Jisc and EPSRC.

• Lab management software such as Quartzy[46] – a free service which coincidentally links users of lab equipment with vendors to reduce the friction of replacing consumables and equipment more generally.

• Lab tests “as a service” – send off your sample to be processed and download the results.

• Independent Software Vendors (ISVs) of scientific computing software and major hardware providers are in many cases starting to offer their own packages through their own cloud service – e.g. the Atos Extreme Factory[47].

[36] http://github.com
[37] https://www.mendeley.com/
[38] https://www.academia.edu/
[39] http://www.researchgate.net/
[40] http://figshare.com/
[41] http://arxiv.org/
[42] https://www.plos.org/
[43] http://www.sciencedirect.com/
[44] http://equipment.data.ac.uk/
[45] http://www.kit-catalogue.com/
[46] https://www.quartzy.com/
[47] http://www.bull.com/extreme-factory

B.2 Platform as a Service

As noted above, it has become increasingly common for cloud providers to expose the underpinnings of their internal services for other developers to build upon. These tend to be promoted first and foremost as facilities for helping developers to build high volume websites - such as load balancing and highly performant/available databases. However, cloud providers also often expose “microservices” that are of particular relevance to researchers – e.g. machine learning and data processing facilities. We have picked out some examples below. It is important to note here that many of the key platform tier products are either open source or available in an open source equivalent, facilitating their use in private and community clouds.

Relational databases

Most of the major public cloud providers have some form of relational database platform service. These are usually compatible with existing widely deployed products – e.g. Amazon RDS[48] can be used in place of MySQL simply by changing the Data Source Name (DSN) your code connects to.

NoSQL

Cloud providers often offer so called NoSQL services, which are typically non-relational (schemaless) approaches to storing and manipulating large volumes of information. Examples here include Google Cloud Datastore[49], Amazon DynamoDB[50], several Azure NoSQL services and the open source Cassandra[51], CouchDB[52] and MongoDB[53] products.

Object stores

Software built to use a cloud model has tended to avoid having permanently mounted shared filesystems such as we might see in an HPC cluster. Instead, it is common practice to use object stores such as Amazon S3[54], Google Cloud Storage[55], Azure Blob Storage[56] or open source packages like Redis[57]. Researchers might be more familiar with more niche (yet very well established in the research community) object store software like the open source iRODS[58], which is not yet offered directly as a service by the major cloud providers.

“Big data”

The gold standard software for handling big data is the Apache Hadoop[59] project, an open source suite which gives roughly equivalent capability to Google’s proprietary BigTable system via its HBase subsystem. Hadoop is widely available from cloud providers as a hosted service, and also often sold to enterprises and institutions as a big data handling solution – e.g. via a packaged appliance/cluster. Subsequent to the success of Hadoop, Google opened up its own internal service as the BigQuery[60] product. It is becoming common for cloud providers to provide “big data as a service” using Hadoop or their own proprietary technology, e.g. Amazon offer their Elastic MapReduce[61] product.

[48] http://aws.amazon.com/rds/
[49] https://cloud.google.com/datastore/docs/concepts/overview
[50] http://aws.amazon.com/documentation/dynamodb/
[51] http://cassandra.apache.org/
[52] http://couchdb.apache.org/
[53] https://www.mongodb.org/
[54] http://aws.amazon.com/s3/
[55] https://cloud.google.com/storage/
[56] http://azure.microsoft.com/storage/
[57] http://redis.io/
[58] http://irods.org/
[59] https://hadoop.apache.org/

Machine learning

Machine learning is generally defined in terms of “teaching” the algorithm to recognise a given target using training data sets. For example, Google recently used a cache of cat videos on YouTube to create a machine learning model that was reliably able to identify cats in videos – and work has now moved on to more complex concepts such as describing the contents of a photograph. Machine learning tools such as Amazon Machine Learning[62], Google Prediction API[63], Azure Machine Learning Studio[64] and Apache Mahout[65] are generally available. Mahout is open source, but not yet offered as a platform service by the major cloud providers, who have their own alternatives.

It is important to note that many platform services expose Application Programming Interfaces (APIs) that are unique to that service. Whilst this is not universally true (e.g. Amazon RDS “looks like” MySQL), there is a potentially significant effort required to migrate from one provider’s platform service to another. Furthermore, some of the associated data, such as a machine learning model, may not be readily exported to another provider. Therefore switching costs will include the regeneration of that data set.
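
To make the point about MySQL-compatible services concrete, the sketch below shows what "changing the DSN" can amount to when the DSN is held as a URL-style string: only the host (and perhaps the port) differs, while the credentials, database name and calling code stay the same. The host names and credentials are placeholders invented for this example, and the URL-style DSN format is an assumption; some drivers take separate keyword arguments instead.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget_dsn(dsn, new_host, new_port=None):
    """Return a copy of a URL-style DSN that points at a different server.

    Only the host (and optionally the port) changes; the scheme, credentials,
    database name and query string are preserved.
    """
    parts = urlsplit(dsn)
    credentials = ""
    if parts.username:
        credentials = parts.username
        if parts.password:
            credentials += ":" + parts.password
        credentials += "@"
    port = new_port if new_port is not None else parts.port
    netloc = credentials + new_host + (f":{port}" if port else "")
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Placeholder DSN for a locally run MySQL server.
    local = "mysql://analyst:secret@db.example.ac.uk:3306/results"
    # Placeholder endpoint standing in for a hosted, MySQL-compatible service.
    print(retarget_dsn(local, "mydb.rds.example"))
```

Where a platform service has no such drop-in compatibility, the equivalent change reaches into the application code itself, which is the switching cost described above.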

B.3 Infrastructure as a Service

This is the cloud tier which researchers in scientific computing may be best equipped to engage with directly, and indeed many researchers are using private, community or hybrid clouds today without perhaps realising it – for example CERN has moved its entire compute infrastructure over to the OpenStack[66] open source cloud software.

Infrastructure as a Service typically equates to running up virtual machines (VMs) on the cloud provider’s shared (multi-tenant) infrastructure but it equally applies to storage and networking configuration. Cloud providers have various strategies for preventing these virtual machines from interfering with one another, such as security groups to provide IP address level access controls.

Whilst there are a number of virtual machine image formats used, tools are also available to convert from one to another, and in most cases it is possible to take a virtual machine image from one provider and convert it to the format used by another – e.g. from a VMware VMDK disk image or an OpenStack QCOW2 image to an Amazon Machine Image (AMI).
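
The IP address level access control provided by security groups can be pictured as a short allow-list that is checked for every incoming connection. The sketch below is a simplified model using Python's standard ipaddress module, not any provider's API; the rule format and the default-deny behaviour are assumptions made for illustration, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


def is_allowed(source_ip, port, rules):
    """Return True if any rule permits traffic from source_ip to the given port.

    rules: iterable of (cidr, port) pairs. Anything not matched is denied,
    which is the default-deny behaviour assumed in this sketch.
    """
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr) and port == allowed_port
        for cidr, allowed_port in rules
    )


# Hypothetical rule set for a research VM.
RULES = [
    ("192.0.2.0/24", 22),   # ssh only from one campus network
    ("0.0.0.0/0", 443),     # https from anywhere
]

if __name__ == "__main__":
    print(is_allowed("192.0.2.17", 22, RULES))     # campus host, ssh
    print(is_allowed("198.51.100.5", 22, RULES))   # outside host, ssh
    print(is_allowed("198.51.100.5", 443, RULES))  # outside host, https
```

Because the rules are data rather than configuration on each VM, the same rule set can be attached to many machines at once, which is part of what makes this approach convenient on shared infrastructure.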

[60] https://cloud.google.com/bigquery/
[61] http://aws.amazon.com/elasticmapreduce/
[62] http://aws.amazon.com/machine-learning/
[63] https://cloud.google.com/prediction/
[64] https://studio.azureml.net/
[65] http://mahout.apache.org/
[66] http://www.openstack.org/

Orchestration

Infrastructure as a Service is very well established, with a wide range of orchestration tools from cloud providers and the open source and commercial software communities. At one extreme these permit the user to bring up a complete suite of VMs featuring different applications and configurations, to create virtual data centres. At another extreme we might simply be interested in ensuring that certain software dependencies are installed on a VM. Orchestration examples from cloud providers include Amazon’s CloudFormation [67], Google Cloud Deployment Manager [68] and Azure Automation [69]. There are also provider-agnostic orchestration tools such as OpenNebula [70] (AWS, Azure and private/hybrid clouds).

Canned VMs

One of the more daunting aspects of supporting scientific computing can be the sheer range of software packages in use by an institution’s researchers. Most major cloud providers offer a marketplace of ready-made VMs with commonly used software pre-installed on them, such as Amazon’s AWS Marketplace [71] or Microsoft’s VM Depot [72]. This has the potential to be a huge time saver for both the researcher and the scientific computing team, e.g. for an application with complex dependencies such as the Galaxy [73] bioinformatics suite. However, the researcher may need to be extremely careful about tracking software versions when using ready-made VMs, to be sure of being able to reproduce results. For example, it may be advisable to archive the VM used for a particular piece of work, just in case this turns out to no longer be available through the cloud provider at a later date when it is necessary to repeat the work.

Bring Your Own License

It is becoming increasingly common to find both open source and commercial off-the-shelf scientific computing software available packaged as VMs, e.g. ANSYS provide their FLUENT CFD software in this way through the AWS Marketplace. Commercial software is typically provided under a Bring Your Own License model, where the end user is expected to already be a licensed customer, perhaps with a negotiated license extension to use cloud facilities. Without the license key or a connection to a license server, the software is useless. In some cases the Independent Software Vendor (ISV) operates their own cloud based license server for real time checking of license entitlement against concurrent usage.

Quasi-HPC facilities

[67] http://aws.amazon.com/cloudformation/
[68] https://cloud.google.com/deployment-manager/
[69] http://azure.microsoft.com/en-gb/services/automation/
[70] http://opennebula.org/
[71] https://vmdepot.msopentech.com/List/Index
[72] https://vmdepot.msopentech.com
[73] https://galaxyproject.org/


Of course many researchers are accustomed to running their codes on traditional HPC clusters, and cloud providers have recognised this by providing tools that let the researcher conveniently bring up a large number of VMs with dedicated login nodes and shared storage in the typical HPC cluster model. Examples of these include the Azure HPC Cluster Service [74] and Amazon’s cfncluster tool [75]. These “quasi clusters” often do not have the sort of performant interconnect and high performance parallel file system that we might see on a true HPC cluster, although Microsoft now offer compute nodes with Infiniband interconnect as part of Azure.

For many classes of compute workload this may not be a blocker (e.g. embarrassingly parallel bioinformatics jobs), but further work is still required with commonly used codes to explore performance aspects. Some cloud providers have recognised that there is sufficient demand for high performance interconnects and are making specialist hardware available such as extra large nodes, low latency interconnects, GPUs and even FPGAs.
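To illustrate why an embarrassingly parallel workload is less sensitive to the interconnect, the following is a minimal Python sketch rather than a real bioinformatics pipeline. The function names (gc_content, run_independent_tasks), the executor_cls parameter and the toy sequences are assumptions made purely for illustration. Each task reads only its own input and never communicates with the others, so in principle the tasks could be spread across loosely coupled VMs just as easily as across the cores of one machine.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in one sequence (a toy per-task workload)."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def run_independent_tasks(sequences, max_workers=4,
                          executor_cls=ProcessPoolExecutor):
    """Apply gc_content to every sequence using a pool of workers.

    No task depends on any other task, so the workers never need to
    exchange data while running. executor_cls is a hypothetical knob
    that lets the same code run on processes or threads.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves input order in the returned results.
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    toy_sequences = ["GGCC", "ATAT", "ATGC", ""]
    print(run_independent_tasks(toy_sequences))
```

A tightly coupled code, by contrast, would need to exchange data between workers at every step, which is where a low latency interconnect starts to matter.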

Virtualization, containers and Docker

In an ideal world, it would be possible to simply pick up an application and move it and its dependencies to the compute platform of choice. This is something that the scientific computing community has often addressed by statically linking executables to the libraries that they depend on, but in today’s increasingly complex data processing environment this approach has its limitations. Virtualization at the machine level has made it possible to build the desired environment, e.g. on the researcher’s laptop, and then take a snapshot of it to run on the compute platform of choice. However, the wide range of machine images and hypervisors makes it difficult to generalise this approach: what is needed is some kind of packaging standard. A further wrinkle is that hypervisors typically add an unwelcome overhead to compute and data intensive workloads.

The Docker [76] project attempts to solve these problems using an operating system level virtualization approach, building on Linux kernel features to provide portable, provider-agnostic “containers” that encapsulate applications and their dependencies whilst isolating them from each other and improving on the performance of a hypervisor approach to virtualization. This is actually not a new approach, and Docker builds on ideas that may be familiar from Solaris Zones and mainframe operating systems. Docker containers are inherently portable. Support for Docker has been forthcoming from Amazon [77], Google [78] and Microsoft [79] amongst others. Whilst there are other competing technologies, Docker has reached critical mass in terms of mindshare and is widely used within the Internet industry, e.g. eBay, Yelp, Spotify, Yandex and Baidu. The formation of the Open Container Project [80] in summer 2015, with broad industry support, suggests that application virtualisation is set to rapidly become the norm.

[74] http://azure.microsoft.com/en-gb/solutions/big-compute/
[75] http://aws.amazon.com/hpc/cfncluster/
[76] http://docker.io
[77] http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
[78] https://cloud.google.com/compute/docs/containers
[79] https://azure.microsoft.com/en-gb/documentation/articles/virtual-machines-docker-vm-extension/
[80] https://www.opencontainers.org/


Annex C: Trust & public cloud

Post-Snowden, we have increasingly started to cast a critical eye over services provided by US owned tech firms: have they colluded with the FBI, CIA or NSA to build in backdoors to their products? The UK Government has provided a convenient set of Cloud Security Principles [81] that researchers and institutions can use to establish whether a provider has taken adequate care to protect their data. Leading public cloud providers like Amazon [82] and Microsoft [83] have provided their own compliance statements regarding the Cloud Security Principles.

Public cloud providers would also note that it is perfectly possible to create an insecure or unreliable cloud service simply by not following best practice, just as it was always possible to create an insecure or unreliable in-house service. Amazon codify this through their statement about shared responsibilities, shown in the figure below.

Figure 2. Amazon shared responsibility model

For researchers and institutions operating in the UK and the greater European Economic Area, there are particular issues around datasets that it has been stipulated may not leave the country, or leave Europe as a whole. An example of this would be genome data gathered as part of Genomics England’s 100,000 Genomes project, which must be kept in England [84].

Conversely, the EU-Safe Harbor Agreement [85], which creates a managed process governing the controlled release of European data to the United States, is currently being challenged in the European Court of Justice by Austrian law student Maximilian Schrems, who alleges that it contravenes EU Data Protection legislation [86]. If the Schrems case is successful, EU-Safe Harbor will be struck from the books and US service providers may be forced to open European facilities or prevented from operating in Europe.

[81] https://www.gov.uk/government/publications/cloud-service-security-principles
[82] http://d0.awsstatic.com/whitepapers/compliance/AWS_CESG_UK_Cloud_Security_Principles.pdf
[83] http://www.microsoft.com/en-gb/enterprise/it-trends/cloud-computing/articles/14-points.aspx
[84] http://www.genomicsengland.co.uk/the-100000-genomes-project/faqs/data-faqs/
[85] http://en.wikipedia.org/wiki/International_Safe_Harbor_Privacy_Principles
[86] http://cjicl.org.uk/2015/04/13/safe-harbor-before-the-eu-court-of-justice/


We might draw the conclusion from Snowden that all forms of digital technology have been or will be infiltrated by national actors, but it is also the case that the major cloud providers have been working tirelessly to reduce the attack surface, e.g. by encrypting links between data centres, using two factor authentication, and encrypting data in transit and at rest. For sound business reasons, cloud providers are trying to ensure that an auditable and judicial process is followed when providing information about users or their data to authorities, and to frustrate “trawling” efforts. For example, Microsoft mounted a legal challenge around the release of data from their Dublin Azure data centre to the US government [87], and Google publish a regular Transparency Report [88] quantifying government requests for data.

To help establish some norms around use of public cloud globally, we have provided several case studies below in which cloud technologies are being used for sensitive applications involving personal data, government data, and cases where intellectual property is a key consideration:

• “The Financial Industry Regulatory Authority (FINRA) in the US is using AWS to analyze and store approximately 30 billion market events every day, saving some $10m-$20m through the move to the cloud” – FINRA [89]

• “In the past, a simple question about genetics linked to a medical condition might take hours, or even days, to execute. By leveraging Google Cloud Platform, the analysis of 1,000 patients’ genomic data, across 218 diseases, generates near real-time results” – Northrop Grumman [90]

• “The Philips HealthSuite digital platform analyzes and stores 15 PB of patient data gathered from 390 million imaging studies, medical records, and patient inputs to provide healthcare providers with actionable data, which they can use to directly impact patient care” – Philips [91]

• “Mount Sinai and their collaborators at Station X are mining the more than 2,000 breast and ovarian tumor and germline DNA sequences (100 TB data) generated by The Cancer Genome Atlas Consortium” – Mount Sinai Medical Centre [92]

• “Instead of having to spend in the order of £50,000 per year on storage, we can expand our cloud storage or buy some tier-three storage instead. That’s an order of magnitude cheaper - we can literally knock a zero off that sum when we need to expand” – Homerton Hospital [93]

From a UK perspective we would also note that Google Apps is currently being rolled out across HM Revenue & Customs [94], and the UK Supreme Court has moved to Office 365 [95].

[87] http://www.theguardian.com/technology/2014/dec/14/privacy-is-not-dead-microsoft-lawyer-brad-smith-us-government
[88] http://www.google.com/transparencyreport/
[89] http://aws.amazon.com/solutions/case-studies/finra/
[90] http://googlecloudplatform.blogspot.co.uk/2015/03/personalized-medicine-with-Northrop-Grumman-and-Google-Cloud-Platform.html
[91] http://aws.amazon.com/solutions/case-studies/philips/
[92] http://aws.amazon.com/solutions/case-studies/mt-sinai/
[93] http://www.computing.co.uk/ctg/news/2360094/case-study-how-homerton-hospital-saved-90-per-cent-on-storage-hardware-by-shifting-from-npfit-for-clinical-imaging
[94] http://thenextweb.com/google/2015/06/05/the-uk-tax-man-switches-to-googles-cloud-and-drops-microsoft/


Annex D: NeI and cloud

D.1 Public investment in e-Infrastructure

Investments by BIS, the Research Councils and HEIs in 2011-12 (£160M), 2012-2013 (£189M) and 2014-15 (£257M) have resulted in core elements of the national e-Infrastructure being put in place.

2011-2012

Investments were made in core HPC and Networking infrastructure. In addition, investments were made in the Authentication Infrastructure Moonshot (now known as Jisc Assent). These investments are listed in Table 4 below.

Table 4: 2011-2012 HPC Investments

HPC Project                                                    RC                 Amount/£M
National Service                                               EPSRC, NERC        43
Hartree Centre                                                 STFC               30
DIRAC                                                          STFC               15
GRIDPP                                                         STFC               3
The Genome Analysis Centre (TGAC)                              BBSRC              8
Monsoon                                                        NERC/Met Office    1
JASMIN2 & CEMS                                                 NERC & UKSA        7.75
Regional Centres: N8, SES5, MID+, HPC Midlands, ARCHIE-WeSt    EPSRC              10
JANET Network and Authentication (Moonshot)                    Jisc               31
HPC Data Storage                                               EPSRC, STFC        15

[95] http://www.microsoft.com/en-gb/enterprise/it-trends/cloud-computing/articles/how-the-cloud-is-future-proofing-it-at-the-uk-supreme-court.aspx

2012-2013

Big Data projects using funds announced by the Government in December 2012 were funded at this time. Major Awards have been made to 18 centres in the UK, 16 of which are HEIs. These awards are listed in Table 5. The pre-eminent role of HEIs in managing and providing national and Large Specialist data and compute services to UK academia is emphasised by these awards.

Table 5: Big Data Investments 2012-2013

Big Data Project                                    RC       Amount/£M
Digital transformations in arts and humanities      AHRC     8
E-infrastructure for biosciences                    BBSRC    13
Research data facility and software Development     EPSRC    8
Administrative data centres                         ESRC     36
Understanding populations                           ESRC     12
Business data safe                                  ESRC     14
Biomedical informatics                              MRC      55
NERC Environmental Big Data Initiative              NERC     13
Square Kilometre Array                              STFC     11
Energy Efficiency Computing Hartree Centre          STFC     19
Total                                                        189
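As a quick arithmetic check on Table 5, the short Python sketch below sums the individual awards and compares the result with the stated total. The figures are simply those transcribed from the table; the variable and function names (TABLE_5_AMOUNTS, STATED_TOTAL, total_investment) are assumptions made for illustration.

```python
# Amounts in £M, transcribed from Table 5 (Big Data Investments 2012-2013).
TABLE_5_AMOUNTS = {
    "Digital transformations in arts and humanities (AHRC)": 8,
    "E-infrastructure for biosciences (BBSRC)": 13,
    "Research data facility and software Development (EPSRC)": 8,
    "Administrative data centres (ESRC)": 36,
    "Understanding populations (ESRC)": 12,
    "Business data safe (ESRC)": 14,
    "Biomedical informatics (MRC)": 55,
    "NERC Environmental Big Data Initiative (NERC)": 13,
    "Square Kilometre Array (STFC)": 11,
    "Energy Efficiency Computing Hartree Centre (STFC)": 19,
}

# Total as stated in the last row of the table, in £M.
STATED_TOTAL = 189


def total_investment(amounts):
    """Sum the per-project amounts (in £M)."""
    return sum(amounts.values())


if __name__ == "__main__":
    total = total_investment(TABLE_5_AMOUNTS)
    print(f"Sum of rows: £{total}M; stated total: £{STATED_TOTAL}M")
```

The ten rows sum to the stated total of £189M, which also matches the 2012-2013 figure quoted at the start of section D.1.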

Further investments were made as follows:

• The Medical Research Council (MRC) will invest £50 million in bioinformatics, which uses many areas of computer science, statistics, mathematics and engineering to process biological data. This includes £19M to the Farr Institute of Health Informatics Research at nodes in London, Manchester, Wales and Scotland; and the £32M Medical Bioinformatics initiative to fund 5 projects: eMedLab (UCL Partners-Crick-Sanger-EBI), The MRC Consortium for Medical Microbial Bioinformatics, Leeds MRC Medical Bioinformatics Centre and the MRC/UVRI Medical Informatics Centre.

• The Arts and Humanities Research Council (AHRC) invested £4 million in 21 new open data projects. These will make large datasets, which ordinarily only academics would have access to, accessible to the general public.

• The Economic and Social Research Council (ESRC) has invested £14 million in 4 new research centres at Essex, Glasgow, UCL and Leeds Universities, as well as a further £5M investment in the Administrative Data Service at Essex University. The centres will make data from private sector organisations and local government accessible to researchers investigating anything from transport to obesity. At present the data is being collected by these organisations, but is not being used for research purposes. The programme is divided into 3 phases: Phase 1 was set up to get information from government departments and set up the Administrative Data Research Network; Phase 2 sets up the Business and Local Government Data Research Centres; Phase 3 covers the collection and set-up of Social Media and Third Sector data.

• The Natural Environment Research Council (NERC) has invested £4.6 million in 24 projects to help the UK research community take advantage of existing environmental data.

• The Engineering and Physical Sciences Research Council (EPSRC) has invested £8M in the Research Data Facility (RDF), which is operated by EPCC.

The RDF is designed to provide research data management and data analysis services for ALL RCUK researchers. Access will be governed by a peer review mechanism. Early projects include the hosting of the DiRAC Code Benchmarking Project.


Other data infrastructure investments include:

• The European Bioinformatics Institute at Hinxton (Cambridge), which received £75M from BBSRC.

• The Open Data Institute, a not-for-profit funded by Innovate UK (£10M over 5 years, subject to industry investment) and by industry. It is a big data undertaking dedicated to providing open access to data from across the public sector in order to enable industrial and academic exploitation.

• In 2012, the Clinical Practice Research Datalink, a £60 million service funded by the MHRA and the National Institute for Health Research, was established to provide patient data for medical research.

• The Government earmarked £100 million for the NHS to sequence the DNA of up to 100,000 patients with cancer and rare diseases, which will include the development of appropriate data infrastructure (NHS).

2014-2015

Three major investments dominated this period:

1. Centre for Cognitive Computing at the Hartree Centre. This was funded at the £115M level with a further £230M from IBM

2. A 10 Pflop Supercomputer for the Met Office (£100M)

3. Alan Turing Centre for Data Science (£42M)

In addition it was announced that a further £100M would be made available to the SKA Project as part of Big Data Investments.

D.2 NeI, cloud technology & the access agenda

At present the NeI projects are largely hidden from the general research community and, some would say, are largely hidden from even their intended user base.

Researchers still regard HPC and HTC as difficult tools to use, even though they are probably no more difficult to use than a piece of lab equipment.

How, then, do we get researchers to regard HPC, Data Intensive Computing and HTC as part of their basic research laboratory?

This requires researchers to view these resources as:


• Viewable. They are manifested tangibly, whether on a desktop or a mobile device

• Easy to interface with, so that users can
  o Submit workflows
  o Construct workflows
  o See what resources are available
  o Check what resources they have used

• Needing only ONE user identity

The diagrams above and below give an example of how this works. As far as the user is concerned, interfacing directly with the underlying resources is no longer needed. What becomes important now are the Cloud Services described earlier in this report.

The need to interact directly with the compute and data resources is removed and is replaced by a set of generic functions in a workflow that allow the user, in effect, to use the same workflow on a multiplicity of systems.

It is this that removes a barrier to usage, by giving users the on-demand self-service they would like to use.
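The idea of one workflow running unchanged on a multiplicity of systems can be sketched in a few lines of Python. This is an illustrative assumption, not a description of any NeI service: the class names (Backend, LocalBackend, RecordingBackend, Workflow) and the toy steps are hypothetical, and a real backend would submit work to a scheduler or cloud API rather than calling a function in-process.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, List


class Backend(ABC):
    """Hypothetical wrapper around one compute or data system."""

    @abstractmethod
    def run(self, step: Callable[[Any], Any], data: Any) -> Any:
        """Execute one workflow step on this system and return its result."""


class LocalBackend(Backend):
    """Runs each step directly, e.g. on the researcher's own machine."""

    def run(self, step, data):
        return step(data)


class RecordingBackend(Backend):
    """Stand-in for a remote system: notes each submission, then runs it."""

    def __init__(self, name: str):
        self.name = name
        self.submitted: List[str] = []

    def run(self, step, data):
        self.submitted.append(step.__name__)
        return step(data)


@dataclass
class Workflow:
    """An ordered list of generic steps, independent of any backend."""

    steps: List[Callable[[Any], Any]] = field(default_factory=list)

    def add(self, step: Callable[[Any], Any]) -> "Workflow":
        self.steps.append(step)
        return self

    def execute(self, backend: Backend, data: Any) -> Any:
        # The user never talks to the system directly, only to the workflow.
        for step in self.steps:
            data = backend.run(step, data)
        return data


def drop_missing(values):
    """Toy step: discard missing readings."""
    return [v for v in values if v is not None]


def mean(values):
    """Toy step: average the remaining readings."""
    return sum(values) / len(values)


if __name__ == "__main__":
    workflow = Workflow().add(drop_missing).add(mean)
    readings = [2.0, None, 4.0]
    print(workflow.execute(LocalBackend(), readings))
    print(workflow.execute(RecordingBackend("hypothetical-cluster"), readings))
```

The design choice worth noting is that the workflow holds no reference to any particular system; the backend is supplied only at execution time, which is what lets the same workflow be pointed at different resources.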


An example in the UK of a functioning Cloud is JASMIN2 [96]. JASMIN2 supports the data analysis requirements of the UK and European climate and earth system modelling community. It consists of multi-Petabyte fast storage co-located with data analysis computing facilities, with satellite installations at Bristol, Leeds and Reading Universities.

JASMIN2 is a successful Cloud deployment that services a key UK research community.

[96] http://jasmin.ac.uk/