How the INDIGO-DataCloud computing platform aims at helping scientific communities
RIA-653549
Giacinto DONVITO, INDIGO Technical Director, INFN Bari


Post on 22-Sep-2019


INDIGO-DataCloud

• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
  • 11.1 M€, 30 months (from April 2015 to September 2017)

• Who: 26 European partners in 11 European countries
  • Coordination by the Italian National Institute for Nuclear Physics (INFN)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures

• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science.

• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology

• Where: deployable on hybrid (public or private) Cloud infrastructures
  • INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation

• Why: answer to the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.


From the Paper “Advances in Cloud”

• EC Expert Group Report on Cloud Computing, http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf

To reach the full promises of CLOUD computing, major aspects have not yet been developed and realised and in some cases not even researched. Prominent among these are open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaS levels. A second issue is managing multitenancy at large scale and in heterogeneous environments. A third is dynamic and seamless elasticity from in-house CLOUD to public CLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is data management in a CLOUD environment: bandwidth may not permit shipping data to the CLOUD environment and there are many associated legal problems concerning security and privacy. All these challenges are opportunities towards a more powerful CLOUD ecosystem. […] A major opportunity for Europe involves finding a SaaS interoperable solution across multiple CLOUD platforms. Another lies in migrating legacy applications without losing the benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.


INDIGO Addresses Cloud Gaps

• INDIGO focuses on use cases presented by its scientific communities to address the gaps identified by the previously mentioned EC Report, with regard to:
  • Redundancy/reliability
  • Scalability (elasticity)
  • Resource utilization
  • Multi-tenancy issues
  • Lock-in
  • Moving to the Cloud
  • Data challenges: streaming, multimedia, big data
  • Performance

• Reusing existing open source components wherever possible and contributing to upstream projects (such as OpenStack, OpenNebula, Galaxy, etc.) for sustainability.

Integrating distributed data infrastructures with INDIGO-DataCloud


INDIGO and other European Projects

• The INDIGO services are being developed according to the requirements collected within many multidisciplinary scientific communities, such as ELIXIR, WeNMR, INSTRUCT, EGI-FedCloud, DARIAH, INAF-LBT, CMCC-ENES, INAF-CTA, LifeWatch-Algae-Bloom, EMSO-MOIST, EuroBioImaging. However, they are implemented so that they can be easily reused by other user communities.
• INDIGO has strong relationships with complementary initiatives, such as EGI-Engage on the operational side and AARC with respect to AuthN/AuthZ policies. Users of EC-funded initiatives such as PRACE and EUDAT are also expected to benefit from the deployment of INDIGO components in such infrastructures.
• Several National/Regional infrastructures are covered by the 26 INDIGO partners, located in 11 European countries.
• INDIGO is mentioned in the recent Important Project of Common European Interest (IPCEI) for the exploitation of HPC and HTC resources at national, regional and European levels.


Work Packages


INDIGO-DataCloud General Architecture

[Figure: architecture diagram. Labels recoverable from the figure, in reading order: JSAGA/JSAGA Adaptors, FutureGateway Engine, FutureGateway REST API; Other Science Gateways; Mobile Apps; Open Mobile Toolkit; Ophidia plugin; LONI plugin; Taverna, Kepler plugin; Admin Portlets; User Portlets; Data Analytics; Workflow Portlets; SGMon, GUI Clients; FutureGateway Portal, Workflows, Mobile clients, Support services; WP6 Services; Kubernetes Cluster; IAM Service; PaaS Orchestrator; QoS/SLA; Cloud Provider Ranker; Monitoring; Infrastructure Manager; TOSCA; WP5 Services; Onedata, Dynafed, FTS; Data Services; REST/CDMI/WebDAV/POSIX/GridFTP; OIDC; Accounting; Non-INDIGO IaaS; Native IaaS API; Heat/IM; WP4 Services; Mesos Cluster; Automatic Scaling Service; Storage Service; S3/CDMI/POSIX/WebDAV/GridFTP; Smart Scheduling; Spot Instances; Native Docker; QoS Support; Identity Harmonization; Local Repository.]


IaaS Features (1)

• Improved scheduling for allocation of resources by popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will address both better scheduling algorithms and support for spot-instances. The latter are in particular needed to support allocation mechanisms similar to those available on public clouds such as Amazon and Google.

• We will also support dynamic partitioning of resources among “traditional batch systems” and Cloud infrastructures (for some LRMS).

• Support for standards in IaaS resource orchestration engines through the use of the TOSCA standard.
  • This overcomes the portability and usability problems that arise because the ways of orchestrating resources differ widely among Cloud computing frameworks.

• Improved IaaS orchestration capabilities for popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will include the development of custom TOSCA templates to facilitate resource orchestration for end users, increased scalability of deployed resources and support of orchestration capabilities for OpenNebula.


IaaS Features (2)

• Improved QoS capabilities of storage resources.
  • Better support of high-level storage requirements such as flexible allocation of disk or tape storage space and support for data life cycle. This is an enhancement also with respect to what is currently available in public clouds, such as Amazon Glacier and Google Cloud Storage.

• Improved capabilities for networking support.
  • Enhancements will include flexible networking support in OpenNebula and handling of network configurations through developments of the OCCI standard for both OpenNebula and OpenStack.

• Improved and transparent support for Docker containers.
  • Introduction of native container support in OpenNebula, development of standard interfaces using the OCCI protocol to drive container support in both OpenNebula and OpenStack.


PaaS Features (1)

• Improved capabilities in the geographical exploitation of Cloud resources.
  • End users need not know where resources are located, because the INDIGO PaaS layer hides the complexity of both scheduling and brokering.

• Standard interface to access PaaS services.
  • Currently, each PaaS solution available on the market uses a different set of APIs, languages, etc. INDIGO will use the TOSCA standard to hide these differences.

• Support for data requirements in Cloud resource allocations.
  • Resources can be allocated where data is stored.

• Integrated use of resources coming from both public and private Cloud infrastructures.
  • The INDIGO resource orchestrator is capable of addressing both types of Cloud infrastructures through TOSCA templates handled at either the PaaS or IaaS level.
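
The idea that resources can be allocated where data is stored can be pictured with a small ranking function. This is only a sketch under stated assumptions: the `Site` record, the site names and the ordering rule (sites that hold the dataset first, then the most free CPUs) are hypothetical and are not taken from the INDIGO orchestrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    # Hypothetical description of a Cloud site; not an INDIGO data structure.
    name: str
    free_cpus: int
    datasets: frozenset  # names of the datasets stored at this site


def rank_sites(sites, dataset, cpus_needed):
    """Keep sites with enough free CPUs; list those holding the dataset first."""
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    # False sorts before True, so sites that hold the dataset come first;
    # ties are broken by the largest number of free CPUs.
    return sorted(candidates, key=lambda s: (dataset not in s.datasets, -s.free_cpus))


sites = [
    Site("site-a", free_cpus=64, datasets=frozenset()),
    Site("site-b", free_cpus=8, datasets=frozenset({"climate-2015"})),
    Site("site-c", free_cpus=2, datasets=frozenset({"climate-2015"})),
]
ranked = [s.name for s in rank_sites(sites, "climate-2015", cpus_needed=4)]
print(ranked)
```

Here `site-c` is dropped because it lacks capacity, and `site-b` is preferred over the larger `site-a` because it already holds the data.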


PaaS Features (2)

• Distributed data federations supporting legacy applications as well as high-level capabilities for distributed QoS and Data Lifecycle Management.
  • This includes for example remote POSIX access to data.

• Integrated IaaS and PaaS support in resource allocations.
  • For example, storage provided at the IaaS layer is automatically made available to higher-level resource allocations performed at the PaaS layer.

• Transparent client-side import/export of distributed Cloud data.
  • This supports dropbox-like mechanisms for importing and exporting data from/to the Cloud. That data can then be easily ingested by Cloud applications through the INDIGO unified data tools.

• Support for distributed data caching mechanisms and integration with existing storage infrastructures.
  • INDIGO storage solutions are capable of providing efficient access to data and of transparently connecting to POSIX filesystems already available in data centers.


PaaS Features (3)

• Deployment, monitoring and automatic scalability of existing applications.
  • For example, existing applications such as web front-ends or R-Studio servers can be automatically and dynamically deployed in highly-available and scalable configurations.

• Integrated support for high-performance Big Data analytics.
  • This includes custom frameworks such as Ophidia (providing a high performance workflow execution environment for Big Data Analytics on large volumes of scientific data) as well as general purpose engines for large-scale data processing such as Spark, all integrated to make use of the INDIGO PaaS features.

• Support for dynamic and elastic clusters of resources.
  • Resources and applications can be clustered through the INDIGO APIs. This includes for example batch systems on-demand (such as HTCondor or Torque) and extensible application platforms (such as Apache Mesos) capable of supporting both application execution and instantiation of long-running services.


AAI Features

• Provide an advanced set of features that includes:
  • User authentication (supporting SAML, OIDC, X.509)
  • Identity harmonization (link heterogeneous AuthN mechanisms to a single VO identity)
  • Management of VO membership (i.e., groups and other attributes)
  • Management of registration and enrolment flows
  • Provisioning of VO structure and membership information to services
  • Management, distribution and enforcement of authorization policies
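
Identity harmonization, as listed above, links credentials from heterogeneous authentication mechanisms to a single VO identity. The following is a minimal sketch of that mapping only; the `IdentityRegistry` class, its method names and the example identifiers are hypothetical and do not describe the INDIGO IAM service.

```python
class IdentityRegistry:
    """Toy registry linking external credentials to one VO identity."""

    MECHANISMS = {"saml", "oidc", "x509"}  # the mechanisms named on the slide

    def __init__(self):
        self._links = {}  # (mechanism, external id) -> VO identity

    def link(self, mechanism, external_id, vo_identity):
        if mechanism not in self.MECHANISMS:
            raise ValueError(f"unsupported mechanism: {mechanism}")
        self._links[(mechanism, external_id)] = vo_identity

    def resolve(self, mechanism, external_id):
        # Returns None when the credential has not been linked yet.
        return self._links.get((mechanism, external_id))


registry = IdentityRegistry()
registry.link("oidc", "sub-1234", "vo-user-42")
registry.link("x509", "/C=IT/O=Example/CN=Some User", "vo-user-42")
same_user = registry.resolve("oidc", "sub-1234") == registry.resolve(
    "x509", "/C=IT/O=Example/CN=Some User"
)
print(same_user)
```

Whatever mechanism the user logs in with, downstream services would then see one identity to which group membership and authorization policies can be attached.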


Storage Quality of Service and the Cloud

• Amazon: S3, Glacier
• Google: Standard, Durable Reduced Availability, Nearline
• HPSS/GPFS: corresponds to the HPSS Classes (customizable)
• dCache: Resilient, TAPE, disk+tape


Next step: Data Life Cycle

• Data Life Cycle is just the time dependent change of
  • Storage Quality of Service
  • Ownership and Access Control (PI Owned, no access, Site Owned, Public access)
  • Payment model: Pay as you go; Pay in advance for rest of lifetime.
  • Maybe other things

[Figure: timeline marked at 6 months, 1 year and 10 years]
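
Treating the data life cycle as a time-dependent change of storage QoS and access control can be sketched as a lookup on the age of the data. The 6 month, 1 year and 10 year marks come from the timeline; which QoS class and access rule apply in each period is not recoverable from the slide, so the stages below are purely illustrative assumptions.

```python
from bisect import bisect_right

# (age in days at which the stage starts, storage QoS, access control).
# The stage contents are assumptions made for illustration only.
LIFECYCLE = [
    (0, "disk", "PI owned"),
    (182, "disk+tape", "PI owned"),
    (365, "tape", "site owned"),
    (3650, "tape", "public access"),
]
_STARTS = [stage[0] for stage in LIFECYCLE]


def state_at(age_days):
    """Return the (QoS, access) pair that applies to data of the given age."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    index = bisect_right(_STARTS, age_days) - 1
    _, qos, access = LIFECYCLE[index]
    return qos, access


print(state_at(10))
print(state_at(400))
```

A payment model could be added as a further column in the same table.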


Data Federation

[Figure: data federation diagram. Labels recoverable from the figure: Amazon S3 (DNS: p-aws-useast); INFN Italy; AWS USA; UPV Spain; Laptop OSX; Docker Oneclient; Docker Onezone; VM onezone; VM oneprovider; VM oneclient; VM nfs; NFS Server; POSIX Volume; VM: demo-onedata-upv-provider; SAMBA Export; boot2docker.]


Frontend Services/Toolkit


Integration schemas

• We provide graphical user interfaces in the form of scientific gateways and workflows, as well as the way to access the INDIGO PaaS services and software stack, and we allow users to define and set up the on-demand infrastructure for the WP2 use cases.
  • Setting up the whole use case infrastructure: the administrator will be provided with ready-to-use recipes that can be customized. The final users will be provided with the service end-points and will not be aware of the backend.

  • Use the INDIGO features from their own portals: user communities that have their own Scientific Gateway set up can exploit the FutureGateway REST API to deal with the whole INDIGO software stack.

  • Use of the INDIGO tools and portals, including the FutureGateway, Scientific Workflow Systems, Big Data Analytics Frameworks (like Ophidia) and Mobile Applications. In this scenario the final users as well as domain administrators will use the GUI tools. The administrator will use them as described in the first case. In addition, domain-specific users will be provided with specific portlets/workflows/apps that allow graphical interaction with their applications run via the INDIGO software stack.


From CSGF to FutureGateway

[Figure: two stacks compared.
Classic CSGF (before INDIGO): Portlets on Liferay/Glassfish, GridEngine, JSAGA. Communication Portlet-GridEngine-JSAGA is only possible with Java libraries.
FutureGateway approach (INDIGO): Portlets on Liferay/Tomcat, Web/Mobile Apps, REST APIs, API Server, JSAGA. Communication between Portlets and the API Server goes via REST APIs, which makes it possible to serve external applications. The API Server interacts with JSAGA via Java libraries.]

• The same REST APIs could be used by Mobile Apps

• Those APIs make interaction with the PaaS layer easier

• Those REST APIs make INDIGO capabilities easy to exploit from non-INDIGO applications


Ophidia framework

• Ophidia is a big data analytics framework for eScience
  • Primarily used for the analysis of climate data, exploitable in multiple domains
  • “Datacube” abstraction and OLAP-based approach for big data
  • Support for array-based data analysis and scientific data formats
  • Parallel computing techniques and smart data distribution methods
  • ~100 array-based primitives and ~50 datacube operators
    • e.g.: data sub-setting, data aggregation, array-based transformations, datacube roll-up/drill-down, datacube import, etc.
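
To make the operator names above concrete, the sketch below shows what sub-setting and roll-up mean on a tiny cube held in plain Python lists. It does not use or imitate the Ophidia API; the function names and the two-dimensional layout (time steps by grid cells) are assumptions chosen for illustration.

```python
def subset(cube, start, stop):
    """Sub-setting: keep only the time steps in [start, stop)."""
    return cube[start:stop]


def roll_up(cube, reducer):
    """Roll-up: aggregate along the time dimension, one value per grid cell."""
    return [reducer(values) for values in zip(*cube)]


def mean(values):
    return sum(values) / len(values)


# Three time steps, two grid cells each.
cube = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
result = roll_up(subset(cube, 1, 3), mean)
print(result)
```

Drill-down is the inverse navigation, going back from the aggregated values to the finer-grained time steps.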


INDIGO module for Kepler

• The Kepler scientific workflow system is an open source tool that enables creation, execution and sharing of workflows across a broad range of scientific and engineering disciplines.
• First version of the INDIGO module delivered; new functionalities are gradually being added for users.
• INDIGO module based on the FutureGateway API
• At the moment, it is possible to build workflows that define a task, prepare inputs and trigger execution. While a task is executed within INDIGO's infrastructure, it is possible to check its status.
• FutureGateway API client: https://github.com/indigo-dc/indigoclient
• Kepler based actors: https://github.com/indigo-dc/indigokepler


Use case examples


Service deployment and application execution

We are now working on adding a Calico network configuration


Application execution to the PaaS Layer

• The INDIGO approach to application distribution and execution:
  • Is based on Docker
  • Exploits Mesos + Chronos
  • All application executions are described with TOSCA templates, via simple APIs or Portlets
  • Input/output is automatically managed by the PaaS layer (via Onedata and external endpoints)
  • Dependencies and retry on failure are supported by means of Chronos
  • Geographical data-aware scheduling is provided by the INDIGO PaaS orchestrator


chronos_job_1:
  type: tosca.nodes.indigo.Container.Application.Docker.Chronos
  properties:
    schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
    description: 'Execute app'
    command: /bin/bash run.sh
    uris: []
    retries: 3
    environment_variables:
      INPUT_ONEDATA_SPACE: { get_input: InputOnedataSpace }
      INPUT_PATH: { get_input: InputPath }
      ....
  artifacts:
    image:
      file: indigodatacloud/ambertools_app
      type: tosca.artifacts.Deployment.Image.Container.Docker
  requirements:
    - host: docker_runtime1

chronos_job_upload:
  type: tosca.nodes.indigo.Container.Application.Docker.Chronos
  properties:
    schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
    description: 'Upload output data'
    command: /bin/bash run.sh
    retries: 3
    environment_variables:
      PROVIDER_HOSTNAME: <ONEDATA_PROVIDER_IP>
      ONEDATA_TOKEN: <ROBOTToken>
      ONEDATA_SPACE: <path>
      INPUT_FILENAME: <input filename>
      OUTPUT_FILENAME: <input filename --> coincides with amber-job-01 OUTPUT_FILENAME>
      OUTPUT_PROTOCOL: http(s)|ftp(s)|S3|Swift|WebDav
      OUTPUT_URL: <output URL>
      OUTPUT_CREDENTIALS: <e.g. username:password>
  artifacts:
    image:
      file: indigodatacloud/jobuploader
      type: tosca.artifacts.Deployment.Image.Container.Docker
  requirements:
    - host: docker_runtime1
    - job_predecessor: chronos_job_1

docker_runtime1:
  type: tosca.nodes.indigo.Container.Runtime.Docker
  capabilities:
    host:
      properties:
        num_cpus: 0.5
        mem_size: 512 MB
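
The `schedule` property above follows the ISO 8601 repeating-interval shape: a repetition count, a start time and a period, separated by slashes. The sketch below only splits such a string into its three parts; the helper name `parse_schedule` is hypothetical, it handles only hour, minute and second periods, and how Chronos treats a particular count such as `R0` is not covered here.

```python
import re
from datetime import datetime, timedelta, timezone

# Only time-based periods of the form PT<n>H<n>M<n>S are handled in this sketch.
_PERIOD = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_schedule(schedule):
    """Split 'R<n>/<start>/<period>' into (count, start time, period)."""
    repeat, start, period = schedule.split("/")
    if not repeat.startswith("R"):
        raise ValueError("schedule must start with R")
    # A bare 'R' carries no explicit count; report it as None.
    count = int(repeat[1:]) if len(repeat) > 1 else None
    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    match = _PERIOD.match(period)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported period: {period}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return count, start_time, timedelta(hours=hours, minutes=minutes, seconds=seconds)


count, start_time, period = parse_schedule("R0/2015-12-25T17:22:00Z/PT5M")
print(count, start_time.isoformat(), period)
```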


tosca_definitions_version: tosca_simple_yaml_1_0

imports:
  - indigo_custom_types: https://raw.githubusercontent.com/indigo-dc/tosca-types/master/custom_types.yaml

description: >
  TOSCA examples for specifying a Chronos Job that runs an application using Onedata storage.

inputs:
  input_onedata_token:
    type: string
    description: User token required to mount the user's INPUT Onedata space
    required: yes
  output_onedata_token:
    type: string
    description: User token required to mount the user's OUTPUT Onedata space. It can be the same as the input token
    required: yes
  # data_locality:
  #   type: boolean
  #   description: Flag that controls the INPUT data locality: if yes the orchestrator will select the best provider, if no the user has to specify the provider to be used
  #   required: yes
  input_onedata_providers:
    type: list
    description: List of favorite Onedata providers to be used to mount the Input Onedata space. If not provided, data locality algo will be applied.
    entry_schema:
      type: string
    default: ['']
    required: no
  output_onedata_providers:
    type: list
    description: List of favorite Onedata providers to be used to mount the Output Onedata space. If not provided, the same provider(s) used to mount the input space will be used.
    entry_schema:
      type: string
    default: ['']
    required: no
  input_onedata_space:
    type: string
    required: yes
  output_path:
    type: string
    description: Path to the output data inside the Output Onedata space
    required: yes
  output_filenames:
    type: list
    description: List of filenames generated by the application run
    entry_schema:
      type: string
    required: yes
  cpus:
    type: float
    description: Amount of CPUs for this job
    required: yes
  mem:
    type: float
    description: Amount of Memory (MB) for this job
    required: yes

topology_template:
  node_templates:
    chronos_job:
      type: tosca.nodes.indigo.Container.Application.Docker.Chronos
      properties:
        schedule: 'R0/2015-12-25T17:22:00Z/PT5M'
        name: 'JOB_ID_TO_BE_SET_BY_THE_ORCHESTRATOR'
        description: 'Execute app'
        command: '/bin/bash run.sh'
        uris: []
        retries: 3
        environment_variables:
          INPUT_ONEDATA_TOKEN: { get_input: input_onedata_token }
          OUTPUT_ONEDATA_TOKEN: { get_input: output_onedata_token }
          INPUT_ONEDATA_PROVIDERS: { get_input: input_onedata_providers }
          OUTPUT_ONEDATA_PROVIDERS: { get_input: output_onedata_providers }
          INPUT_ONEDATA_SPACE: { get_input: input_onedata_space }
          ....
      artifacts:
        image:
          file: indigodatacloud/ambertools_app
          type: tosca.artifacts.Deployment.Image.Container.Docker
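
Before submitting such a template, a client could check that every input marked `required: yes` has a value. The sketch below hard-codes the input names and required flags as they appear in the template above; the `check_inputs` helper is hypothetical and is not part of any INDIGO tool.

```python
# Input name -> required flag, copied by hand from the template's inputs section.
DECLARED_INPUTS = {
    "input_onedata_token": True,
    "output_onedata_token": True,
    "input_onedata_providers": False,
    "output_onedata_providers": False,
    "input_onedata_space": True,
    "output_path": True,
    "output_filenames": True,
    "cpus": True,
    "mem": True,
}


def check_inputs(values):
    """Return (missing required inputs, undeclared inputs), both sorted by name."""
    missing = sorted(
        name for name, required in DECLARED_INPUTS.items()
        if required and name not in values
    )
    unknown = sorted(name for name in values if name not in DECLARED_INPUTS)
    return missing, unknown


missing, unknown = check_inputs({"cpus": 1.0, "mem": 512.0, "foo": 1})
print(missing)
print(unknown)
```

Optional inputs such as `input_onedata_providers` may be left out, in which case the defaults declared in the template apply.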


UC: A web portal that exploits a batch system to run applications

• A user community maintains a “vanilla” version of portal and computing image plus some specific recipes to customize software tools and data
  • Portal and computing are part of the same image that can take different roles.
  • Customization may include creating special users, copying (and registering in the portal) reference data, installing (and again registering) processing tools.
  • Typically the web portal image also has a batch queue server installed.

• All the running instances share a common directory.
• Different credentials: end-user and application deployment.


UC Inspiration: Galaxy on the cloud

• Galaxy can be installed on a dedicated machine or as a front-end to a batch queue.
• Galaxy exposes a web interface and executes all the interactions (including data uploading) as jobs in a batch queue.
• It requires a shared directory among the working nodes and the front-end.
• It supports a separate storage area for different users, managing them through the portal.


UC: A web portal that exploits a batch system to run applications

1) The web portal is instantiated, installed and configured automatically exploiting Ansible recipes and TOSCA Templates.

2) A remote POSIX share is automatically mounted on the web portal using Onedata

3) The same POSIX share is automatically mounted also on worker nodes using Onedata

4) End-users can see and access the same files via simple web browsers or similar.

5) A batch system is dynamically and automatically configured via TOSCA Templates

6) The portal is automatically configured in order to execute jobs on the batch cluster

7) The batch cluster is automatically scaled up and down according to the job load on the batch system.


UC: Use Case Lifecycle

• Preliminary
  • The use case administrator creates the “vanilla” images of the portal + computing image.
  • The use case administrator, with the support of INDIGO experts, writes the TOSCA specification of the portal, queue, computing configuration.

• Group-specific
  • The use case administrator, with the support of INDIGO experts, writes specific modules for portal-specific configurations.
  • The use case administrator deploys the virtual appliance.

• Daily work
  • Users access the portal as if it were locally deployed and submit jobs to the system as if it had been provisioned statically.


UC: A Graphic Overview

[Figure: deployment flow for the web portal use case. Labels recoverable from the figure: User; Provider; FutureGateway API Server (WP6); Orchestrator, IM, Other PaaS Core Services (WP5); Cloud Site with OpenNebula, OpenStack, Heat, Clues, IM (WP4); Front-End with Public IP running Galaxy; Virtual Elastic Cluster of worker nodes (WN); TOSCA Documents and Dockerfiles per Use Case; INDIGO-DataCloud Docker Hub Organization; Champion + JRA. Numbered steps: 1) Stage Data; 1.a.1) build, push; 1.a.2) Dockerfile (commit); 1.b) Automated Build; 2) Deploy TOSCA with Vanilla VM/Container; 4) Install/Configure; 5) Mount; 6) Access Web Portal.]


Detailed IaaS Status

CCR Workshop, 17/3/2016, Davide Salomoni


#nova-docker

• Docker driver for OpenStack Compute (nova).
• 3rd party component, not from INDIGO; we have sent some patches and bug fixes.
• Deployment scenario:
  • Install a package in the desired compute nodes.
  • Configure nova and glance to be aware of containers.
• It will be packaged for Liberty.


#ONEDock

• Docker driver for OpenNebula.
• It requires a working OpenNebula 4 installation.
• Deployment scenario:
  • Install and configure ONEDock in the frontend node.
  • Configure the compute nodes with Docker and the required ONE config.


#Repository Synchronization

• Sync between Docker Hub organization and local registry/catalog via webhooks.
• Modules for both OpenNebula and OpenStack.
• Standalone component with REST API.
• Deployment scenario:
  • Install a package and configure the webhooks.
  • Give access to the sync user to the registry/catalog.


#opie

• Preemptible Instances extension for OpenStack Compute (nova)
• Needs a working OpenStack Compute (nova) environment.
• API, Scheduler and HostManager as pluggable external modules.
• Deployment scenario:
  • Installation of a package + a *manual* modification (i.e. applying a patch) on the compute API.
  • Configure nova (i.e. nova.conf) to use the new HostManager and PreemtibleFilterScheduler.
• It will be released for Liberty.


#Synergy

• Fair share scheduling support for OpenStack.
• Needs a working OpenStack Compute (nova) environment.
• External and standalone component, no modifications needed.
• Deployment scenario:
  • Install synergy as a separate service, configure it to interact with OpenStack.
• It will be released for Liberty (for the 1st release).


#uDocker

• Run containers in batch systems in user space, making it possible for non-privileged users to download and run containers.
• It does not require Docker at all.
• Deployment scenario:
  • No deployment. It is a simple Python tool.

• It has been successfully tested by running some containers on a local HPC system (CSIC), so it may be useful for EGI in the Grid context.


#AAI for OpenStack

• The OpenID Connect support has been improved in the Keystone authentication client libraries (it did not work out of the box).
• Deployment scenario:
  • Keystone server configuration needed.
  • Nothing required at the CLI, changes contributed upstream (although not released yet).


#AAI for OpenNebula

• Based on the Token Translation Service (TTS).
• Deployment scenario:
  • No installation required at the site level, only configuration (similar to PERUN).


INDIGO FAQ

• How does INDIGO achieve resource redundancy and high availability?
  • This is achieved at multiple levels:
    • at the data level, redundancy can be implemented by exploiting the capability of INDIGO's Onedata to replicate data across different data centers.
    • at the site level, it is possible to ask for copies of data to be, for example, on both disk and tape using the INDIGO QoS storage features.
    • for services, the INDIGO architecture uses Mesos and Marathon to provide automatic service high-availability and load balancing. This automation is easily obtainable for stateless services; for stateful services it is application-dependent, but it can normally be integrated into Mesos through, for example, a custom framework (examples of which are provided by INDIGO).

• How does INDIGO achieve resource scalability?
  • First of all, we can distinguish between vertical (scale up) and horizontal (scale out) scalability. INDIGO provides both:
    • Mesos and Marathon handle vertical scalability by deploying Docker containers with an increasing amount of resources.
    • The INDIGO PaaS Orchestrator handles horizontal scalability through requests made at the IaaS level to add resources when needed.


INDIGO FAQ

• How does INDIGO achieve resource scalability?
  • The INDIGO software does this in a smart way, i.e. for example it does not look at CPU load only:
    • In the case of a dynamically instantiated LRMS, it checks the status of jobs and queues and accordingly adds or removes computing nodes.
    • In the case of a Mesos cluster, if there are applications to start and there are no free resources, INDIGO starts up more nodes. This happens within the limits of the submitted TOSCA templates. In other words, any given user stays within the limits of the TOSCA template they have submitted; this also holds for accounting purposes.

• How do you know when and where resources are available?
  • We are extending the Information System available in the European Grid Infrastructure (EGI) to inform the INDIGO PaaS orchestrator about the available IaaS infrastructures and about the services they provide. It is therefore possible for the INDIGO orchestrator to optimally choose a certain IaaS infrastructure given, for example, the location of a certain dataset.
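
The LRMS elasticity rule in the scalability answer above (look at jobs and queues rather than CPU load, and stay within the limits of the submitted TOSCA template) can be sketched as a single function. The function name, the jobs-per-node parameter and the use of simple minimum and maximum node counts to stand for the template limits are assumptions made for illustration, not the INDIGO implementation.

```python
def target_nodes(queued_jobs, running_jobs, jobs_per_node, min_nodes, max_nodes):
    """Number of worker nodes wanted for the current job load.

    min_nodes and max_nodes stand in for the limits of the submitted template.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    total_jobs = queued_jobs + running_jobs
    needed = -(-total_jobs // jobs_per_node)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))


print(target_nodes(queued_jobs=10, running_jobs=2, jobs_per_node=4, min_nodes=1, max_nodes=5))
print(target_nodes(queued_jobs=100, running_jobs=0, jobs_per_node=4, min_nodes=1, max_nodes=5))
print(target_nodes(queued_jobs=0, running_jobs=0, jobs_per_node=4, min_nodes=1, max_nodes=5))
```

Comparing the returned value with the current cluster size tells the caller whether to add or remove nodes.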


Conclusions

• First official release will be: 1st August

• The first prototype is already available:
  • Not all the services and features are available
  • This is for internal evaluation, but some services can already be tested

• A lot of important development is being carried out together with the original developer communities, so that code maintenance is not (only) in our hands


Thank you

https://www.indigo-datacloud.eu
Better Software for Better Science.
