CHAMELEON: BUILDING A RECONFIGURABLE EXPERIMENTAL TESTBED FOR LARGE-SCALE CLOUD RESEARCH

Pierre Riteau, Chameleon Lead DevOps Engineer
www.chameleoncloud.org

Grid’5000 Winter School 2016
February 5, 2016, Grenoble, France

TO AVOID ANY MISUNDERSTANDINGS

CHAMELEON DESIGN STRATEGY

- Large-scale: “Big Data, Big Compute, Big Instrument research”
  - ~650 nodes (~14,500 cores), 5 PB disk over two sites, 2 sites connected with 100G network
- Reconfigurable: “As close as possible to having it in your lab”
  - Bare metal reconfiguration, operated as a single instrument
  - Support for repeatable and reproducible experiments
- Connected: “One stop shopping for experimental needs”
  - Workload and Trace Archive
  - Partnerships with production clouds: CERN, OSDC, Rackspace, Google, and others
  - Partnerships with users
- Complementary: “Can’t do everything ourselves”
  - Complementing GENI, Grid’5000, and other experimental testbeds
- Sustainable: “Easy to maintain, easy to share”

CHAMELEON HARDWARE

[Diagram: Chameleon hardware at two sites, Chicago and Austin, joined by the Chameleon Core Network]

- Standard Cloud Units (SCUs): switch, 42 compute, 4 storage each; x10 at one site and x2 at the other
- SCUs connect to core and are fully connected to each other
- Heterogeneous Cloud Units: alternate processors and networks
- Core Services: 3.6 PB Central File Systems, Front End and Data Mover nodes
- Chameleon Core Network: 100 Gbps uplink to public network (each site); links to UTSA, GENI, future partners
- Totals: 504 x86 Compute Servers, 48 Dist. Storage Servers, 102 Heterogeneous Servers, 16 Mgmt and Storage Nodes

CHAMELEON HARDWARE

- Standard Cloud Units (SCU) (deployed)
  - Each of the 12 Standard Cloud Units is a single 48U rack
  - 42 Dell R630 compute servers, each with dual-socket Intel Xeon (Haswell) processors (12 cores, 24 threads) and 128 GB of RAM
  - 4 Dell FX2 storage servers, each with a connected JBOD array of 16 2 TB drives (total of 128 TB per SCU), 2x10 cores, and 64 GB of RAM
  - Allocations can be an entire SCU, multiple SCUs, or within a single SCU, or across SCUs (e.g., storage servers for Hadoop configurations)
  - 48-port Force10 S6000 OpenFlow-enabled switches: 10 Gb to hosts, 40 Gb uplinks to Chameleon core network
  - ConnectX-3 InfiniBand network in one rack at TACC
- Shared infrastructure (deployed)
  - 3.6 PB global storage, 100 Gb Internet connection between sites
- Heterogeneous Cloud Units (to be procured in Y2)
  - ARM microservers, Atom microservers, SSDs, GPUs, FPGAs
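
The per-rack figures above can be cross-checked against the testbed totals with a little arithmetic. The sketch below uses only numbers quoted on the slides; the names are illustrative, and it assumes that "12 cores" is a per-socket figure.

```python
# Figures taken from the slides; all names here are illustrative only.
SCU_COUNT = 12
COMPUTE_PER_SCU = 42
STORAGE_PER_SCU = 4
DRIVES_PER_STORAGE_SERVER = 16
DRIVE_TB = 2
SOCKETS_PER_COMPUTE = 2
CORES_PER_SOCKET = 12  # assumption: "12 cores" is per socket


def scu_totals():
    """Return testbed-wide totals derived from the per-SCU figures."""
    return {
        "compute_servers": SCU_COUNT * COMPUTE_PER_SCU,
        "storage_servers": SCU_COUNT * STORAGE_PER_SCU,
        "storage_tb_per_scu": (
            STORAGE_PER_SCU * DRIVES_PER_STORAGE_SERVER * DRIVE_TB
        ),
        "compute_cores": (
            SCU_COUNT * COMPUTE_PER_SCU * SOCKETS_PER_COMPUTE * CORES_PER_SOCKET
        ),
    }


if __name__ == "__main__":
    for name, value in scu_totals().items():
        print(f"{name}: {value}")
```

The first three values agree with the 504 compute servers, 48 storage servers, and 128 TB per SCU quoted in the deck. The core count covers the SCU compute servers only, so it is lower than the ~14,500 cores quoted for the whole testbed.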

CAPABILITIES AND SUPPORTED RESEARCH

- Isolated partition, full bare metal reconfiguration: virtualization technology (e.g., SR-IOV, accelerators), systems, networking, infrastructure-level resource management, etc.
- Isolated partition, Chameleon Appliances: repeatable experiments in new models, algorithms, platforms, auto-scaling, high-availability, cloud federation, etc.
- Persistent, reliable, shared clouds: development of new models, algorithms, platforms, auto-scaling HA, etc., innovative application and educational uses

IMPLEMENTING THE EXPERIMENTAL WORKFLOW

- discover resources
  - Fine-grained
  - Complete
  - Up-to-date
  - Versioned
  - Verifiable
- provision resources
  - Advance reservations & on-demand
  - Fine-grained allocations
  - Isolation
- configure and interact
  - Bare metal
  - Deeply reconfigurable
  - Multiple appliances to a lease
  - Snapshotting
  - Complex Appliances
- monitor
  - Hardware metrics
  - Fine-grained information
  - Aggregate and archive

BUILDING A TESTBED FROM SCRATCH

- Requirements (proposal stage)
- Architecture (project start)
- Technology Evaluation and Risk Analysis
  - Many options: G5K, Nimbus, LosF, OpenStack
  - Sustainability as design criterion: can a CS testbed be built from commodity components?
  - Technology evaluation: Grid’5000 and OpenStack
  - Architecture-based analysis and implementation proposals
- CHI = OpenStack + Grid’5000 + special sauce

CHI: DISCOVERING AND VERIFYING RESOURCES

- Fine-grained, up-to-date, and complete representation
- Both machine parsable and user friendly representations
- Testbed versioning
  - “What was the drive on the nodes I used 6 months ago?”
- Dynamically verifiable
  - Does reality correspond to description? (e.g., failure handling)
- Grid’5000 registry toolkit + Chameleon portal UI
  - Automated resource description, automated export to RM/Blazar
- g5k-checks (renamed cc-checks for consistency)
  - Can be run after boot, acquires information and compares it with resource catalog description
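
The "acquire and compare" step can be illustrated with a small sketch. This is not the g5k-checks or cc-checks implementation; the function name, the dictionary layout, and the sample values are all hypothetical, chosen only to show a recursive comparison of a catalog description against what a node reports.

```python
def compare_description(catalog, acquired, path=""):
    """Return (key_path, catalog_value, acquired_value) for every mismatch.

    Both arguments are nested dicts. A key missing on one side is
    reported with None for that side.
    """
    mismatches = []
    for key in sorted(set(catalog) | set(acquired)):
        key_path = f"{path}.{key}" if path else key
        expected = catalog.get(key)
        actual = acquired.get(key)
        if isinstance(expected, dict) and isinstance(actual, dict):
            # Descend into nested sections such as "disk" or "cpu".
            mismatches.extend(compare_description(expected, actual, key_path))
        elif expected != actual:
            mismatches.append((key_path, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Hypothetical descriptions: the disk was swapped after a failure.
    catalog = {"ram_gb": 128, "disk": {"model": "example-disk-a", "size_tb": 2}}
    acquired = {"ram_gb": 128, "disk": {"model": "example-disk-b", "size_tb": 2}}
    for key_path, expected, actual in compare_description(catalog, acquired):
        print(f"{key_path}: catalog={expected!r} node={actual!r}")
```

An empty result means the node matches its catalog entry. Any mismatch points either to a hardware change or to a stale description.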

[Figures: testbed resource description versions v1 and v2]

CHI: PROVISIONING RESOURCES

- Resource leases
- Advance reservations (AR) and on-demand
  - AR facilitates allocating at large scale
- Fine-grain allocation of a range of resources
  - Different node types, switches, etc.
- Isolation between experiments
- Future extensions: matchmaking, testbed allocation management
- OpenStack Nova/Blazar, contributions to Blazar
  - Extensions to support Gantt chart displays and other features
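
At the core of advance reservations is a check that a requested time window does not collide with existing leases on the same node. The sketch below is a minimal illustration of that idea, not Blazar's code; the lease dictionaries and function names are assumptions made for the example.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def free_nodes(nodes, leases, start, end):
    """Return the nodes that have no existing lease overlapping [start, end)."""
    busy = {
        lease["node"]
        for lease in leases
        if overlaps(lease["start"], lease["end"], start, end)
    }
    return [node for node in nodes if node not in busy]


if __name__ == "__main__":
    # Hypothetical lease: node n1 is taken on the morning of February 5.
    leases = [
        {
            "node": "n1",
            "start": datetime(2016, 2, 5, 9),
            "end": datetime(2016, 2, 5, 12),
        },
    ]
    window = (datetime(2016, 2, 5, 11), datetime(2016, 2, 5, 13))
    print(free_nodes(["n1", "n2"], leases, *window))
```

With half-open intervals, a lease that ends at 12:00 does not conflict with one that starts at 12:00, so back-to-back reservations on the same node are allowed.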

CHI: CONFIGURE AND INTERACT

- Bare Metal
- Allow deep reconfigurability (access to console)
- Map multiple appliances to a lease
- Snapshotting for image sharing
- Efficient appliance deployment
- Handle complex appliances
  - Virtual clusters, cloud installations, etc.
- Interact: shape experimental conditions
- OpenStack Ironic, Glance, and user-data/meta-data

CHI: INSTRUMENTATION AND MONITORING

- Enables users to understand what happens during the experiment
- Instrumentation: high-resolution metrics
- Types of monitoring:
  - Infrastructure monitoring (e.g., PDUs)
  - User resource monitoring
  - Custom user metrics
- Aggregation and Archival
  - Easily export data for specific experiments
- OpenStack Ceilometer + custom metrics
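
Aggregation of high-resolution metrics can be pictured as folding raw samples into fixed time windows before archiving them. The following is a generic sketch, not Ceilometer's API; the sample format (timestamp in seconds, value) and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_s):
    """Average (timestamp_s, value) samples into windows of window_s seconds.

    Returns a dict mapping each window's start time to the mean value of
    the samples that fall inside it, ordered by window start.
    """
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        window_start = timestamp_s - timestamp_s % window_s
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical power readings in watts, e.g. from a PDU.
    samples = [(0, 100.0), (1, 110.0), (60, 120.0)]
    print(aggregate(samples, window_s=60))
```

Keeping the raw samples alongside the aggregates would let a user export either resolution for a specific experiment.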

CHI: OVERALL ARCHITECTURE

[Diagram: CHI components]

- Components: Portal; Identity Management; Resource discovery (Grid’5000 Reference API); Reservation Service (Blazar); Horizon; Keystone; Nova; Ironic; Neutron; Ceilometer; Glance
- Legend: special sauce, Custom development, OpenStack

HOW DOES IT WORK INTERNALLY?

[Diagram: request flow from the Chameleon user through Blazar, Nova, and Ironic to the cluster]

- Chameleon user → Blazar: reserve resources (reservations R1, R2)
- Blazar → Nova: create dedicated resource pool (host aggregate); Nova holds resource pools P1, P2 and the freepool
- Chameleon user → Nova: launch bare-metal instances in reservation
- Nova → Ironic: schedule, then request bare-metal deployment
- Ironic → Cluster: control & provision (IPMI / PXE / iSCSI)
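
The pool mechanics in this diagram can be modeled in a few lines. The sketch below is a toy model, not Blazar or Nova code: the `Pools` class and its method names are hypothetical, and it only shows hosts moving between a shared freepool and a per-reservation pool.

```python
class Pools:
    """Toy model: a shared freepool plus one dedicated pool per reservation."""

    def __init__(self, hosts):
        self.freepool = set(hosts)
        self.reserved = {}  # reservation id -> set of hosts

    def reserve(self, reservation_id, count):
        """Move `count` hosts from the freepool into a dedicated pool."""
        if reservation_id in self.reserved:
            raise ValueError(f"reservation {reservation_id} already exists")
        if count > len(self.freepool):
            raise ValueError("not enough free hosts")
        # Pick hosts in sorted order so the result is deterministic.
        chosen = set(sorted(self.freepool)[:count])
        self.freepool -= chosen
        self.reserved[reservation_id] = chosen
        return chosen

    def can_launch(self, reservation_id, host):
        """An instance may only land on a host inside the reservation's pool."""
        return host in self.reserved.get(reservation_id, set())

    def release(self, reservation_id):
        """Return the reservation's hosts to the freepool."""
        self.freepool |= self.reserved.pop(reservation_id, set())


if __name__ == "__main__":
    pools = Pools(["n1", "n2", "n3"])
    print(pools.reserve("R1", 2))
    print(pools.can_launch("R1", "n3"))
    pools.release("R1")
    print(sorted(pools.freepool))
```

Because a reserved host leaves the freepool entirely, two reservations can never share a host, which is one way to obtain isolation between experiments.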

DEVELOPED IN THE OPEN

- https://github.com/ChameleonCloud
  - OpenStack patches, Grid’5000 g5k-checks patches
  - User portal, resource discovery, Horizon extensions
  - Testbed configuration with Puppet (not yet open)
- Aim is to provide a Chameleon-in-a-box!

CHAMELEON TIMELINE AND STATUS

- 10/2014: Project starts
- 12/2014: FutureGrid@Chameleon (OpenStack KVM)
- 04/2015: Chameleon Technology Preview on FutureGrid hardware
- 06/2015: Chameleon Early User on new hardware
- 07/2015: Chameleon Public availability (bare metal)
- 09/2015: Chameleon KVM OpenStack cloud available
- 10/2015: Interoperability with GENI (1st phase)
- Today: 600+ users / 150+ projects
- 2016: Heterogeneous hardware available

IN THE PIPELINE…

- Y1 theme was “making things possible”: focus on infrastructure
- Y2 theme is “from possible to easy”: focus on users
- Outreach: webinars, tutorials, user stories
- Experiment management
  - Appliances: snapshotting, sharing, appliance marketplace, community
  - Experiment Blueprint: automation and preservation
- Functionality: from possible to easy
  - Better reconfiguration capabilities
  - Better networking capabilities
  - Better infrastructure monitoring (PDUs, etc.)
  - And others

OPENSTACK: LESSONS LEARNED

- Operating OpenStack can be difficult
  - Forget about traditional UNIX admin: even bare metal needs OVS and IP namespaces
  - Thousands of configuration switches, many with little documentation
  - Must read the code!
  - Inter-dependent components → check all logs with debug enabled
- Upstream development mostly done on KVM
  - Less testing of Ironic → bugs
- Lots of experimental projects with little upstream support
  - We were lucky as community interested in reviving Blazar
- Do not put too much hope in blueprints
  - Many abandoned or delayed for multiple releases
- Where to find help and possible fixes?
  - bugs.launchpad.net (bug reports) / review.openstack.org (patches)
  - Most developers available on IRC

VIRTUALIZATION OR CONTAINERIZATION?

- Yuyu Zhou, University of Pittsburgh
- Research: lightweight virtualization
- Testbed requirements:
  - Bare metal reconfiguration
  - Boot from custom kernel
  - Console access
  - Up-to-date hardware
  - Large scale experiments

SC15 Poster: “Comparison of Virtualization and Containerization Techniques for HPC”

TEACHING CLOUD COMPUTING

- Nirav Merchant and Eric Lyons, University of Arizona
- ACIC 2015: project-based learning course
  - Data mining to find exoplanets
  - Scaled analysis pipeline by Jared Males
  - Develop a VM/workflow management appliance and best practice that can be shared with broader community
- Testbed requirements:
  - Easy to use IaaS/KVM installation
  - Minimal startup time
  - Support distributed workers
  - Block store: make copies of many 100 GB datasets

DEFENDING COMPUTING RESOURCES

- Led by Jessie Walker, University of Arkansas at Pine Bluff
- Working on detecting cyber attacks
  - Model and visualize multi-stage intrusion attacks (MAS)
  - Create custom Snort rules to monitor traffic and detect attacks
- Complex and expensive to buy and use their own hardware
- Limited by permissions needed to run cybersecurity attacks inside campuses
- Testbed requirements:
  - Virtual machines to simulate attacks in the cloud and run intrusion detection systems

PARTING THOUGHTS

- From vision to reality with Express Delivery
  - Built from scratch within a year on a shoestring
  - Thanks to experience from other testbeds, esp. Grid’5000
  - Thanks to open-source code from other projects, esp. OpenStack and Grid’5000
- Operational testbed: 600+ users / 150+ projects
- Federation
  - Ongoing efforts with GENI
  - Grid’5000 too?

CHAMELEON TEAM

- Kate Keahey: Chameleon PI, Science Director, Architect (University of Chicago)
- Joe Mambretti: Programmable networks, Federation activities (Northwestern University)
- Dan Stanzione: Facilities Director (TACC)
- Pierre Riteau: DevOps Lead (University of Chicago)
- Paul Rad: Industry Liaison, Education and training (UTSA)
- DK Panda: High-perf networking (Ohio State University)

COME AND WORK WITH US!

- As a collaborator
  - Generalizing results: what would Kameleon or DISTEM look like in the Chameleon context?
  - Also projects in resource management for HPC & Cloud, elastic scaling platform
  - Summer internship opportunities
- As a co-worker
  - Programming postdoc or researching programmer