Pierre Riteau, Chameleon Lead DevOps Engineer

Post on 13-Feb-2017


www.chameleoncloud.org

CHAMELEON: BUILDING A RECONFIGURABLE EXPERIMENTAL TESTBED FOR LARGE-SCALE CLOUD RESEARCH

Pierre Riteau, Chameleon Lead DevOps Engineer priteau@uchicago.edu

Grid’5000 Winter School 2016 February 5, 2016 Grenoble, France

TO AVOID ANY MISUNDERSTANDINGS

CHAMELEON DESIGN STRATEGY

- Large-scale: “Big Data, Big Compute, Big Instrument research”
  - ~650 nodes (~14,500 cores), 5 PB disk over two sites, 2 sites connected with 100G network
- Reconfigurable: “As close as possible to having it in your lab”
  - Bare metal reconfiguration, operated as a single instrument
  - Support for repeatable and reproducible experiments
- Connected: “One stop shopping for experimental needs”
  - Workload and Trace Archive
  - Partnerships with production clouds: CERN, OSDC, Rackspace, Google, and others
  - Partnerships with users
- Complementary: “Can’t do everything ourselves”
  - Complementing GENI, Grid’5000, and other experimental testbeds
- Sustainable: “Easy to maintain, easy to share”

CHAMELEON HARDWARE

[Diagram: Chameleon hardware at the Chicago and Austin sites. Recoverable labels:]
- Standard Cloud Unit (switch, 42 compute, 4 storage): x10 and x2
- SCUs connect to core and fully connected to each other
- Heterogeneous Cloud Units: alternate processors and networks
- Chameleon Core Network: 100 Gbps uplink to public network (each site)
- To UTSA, GENI, future partners
- Core Services: 3.6 PB central file systems, front end and data movers
- Core Services: front end and data mover nodes
- 504 x86 compute servers, 48 distributed storage servers, 102 heterogeneous servers, 16 management and storage nodes

CHAMELEON HARDWARE

- Standard Cloud Units (SCU) (deployed)
  - Each of the 12 Standard Cloud Units is a single 48U rack
  - 42 Dell R630 compute servers, each with dual-socket Intel Xeon (Haswell) processors (12 cores, 24 threads) and 128 GB of RAM
  - 4 Dell FX2 storage servers, each with a connected JBOD array of 16 2 TB drives (total of 128 TB per SCU), 2x10 cores, and 64 GB of RAM
  - Allocations can be an entire SCU, multiple SCUs, or within a single SCU, or across SCUs (e.g., storage servers for Hadoop configurations)
  - 48-port Force10 S6000 OpenFlow-enabled switches: 10 Gb to hosts, 40 Gb uplinks to Chameleon core network
  - ConnectX-3 InfiniBand network in one rack at TACC
- Shared infrastructure (deployed)
  - 3.6 PB global storage, 100 Gb Internet connection between sites
- Heterogeneous Cloud Units (to be procured in Y2)
  - ARM microservers, Atom microservers, SSDs, GPUs, FPGAs
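The per-SCU figures on this slide can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are ours, and only numbers quoted on the slides are used.

```python
# Figures quoted on the slide (per Standard Cloud Unit).
SCU_COUNT = 12
COMPUTE_SERVERS_PER_SCU = 42
STORAGE_SERVERS_PER_SCU = 4
DRIVES_PER_STORAGE_SERVER = 16
DRIVE_SIZE_TB = 2


def scu_totals():
    """Return totals derived from the per-SCU figures."""
    storage_tb_per_scu = (
        STORAGE_SERVERS_PER_SCU * DRIVES_PER_STORAGE_SERVER * DRIVE_SIZE_TB
    )
    return {
        "compute_servers": SCU_COUNT * COMPUTE_SERVERS_PER_SCU,
        "storage_servers": SCU_COUNT * STORAGE_SERVERS_PER_SCU,
        "storage_tb_per_scu": storage_tb_per_scu,
    }


if __name__ == "__main__":
    for name, value in scu_totals().items():
        print(f"{name}: {value}")
```

The derived counts (504 compute servers, 48 storage servers) agree with the totals shown on the hardware diagram, and 4 x 16 x 2 TB gives the 128 TB per SCU quoted above.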

CAPABILITIES AND SUPPORTED RESEARCH

- Isolated partition, full bare metal reconfiguration
  - Virtualization technology (e.g., SR-IOV, accelerators), systems, networking, infrastructure-level resource management, etc.
- Isolated partition, Chameleon Appliances
  - Repeatable experiments in new models, algorithms, platforms, auto-scaling, high-availability, cloud federation, etc.
- Persistent, reliable, shared clouds
  - Development of new models, algorithms, platforms, auto-scaling HA, etc., innovative application and educational uses

IMPLEMENTING THE EXPERIMENTAL WORKFLOW

discover resources
- Fine-grained
- Complete
- Up-to-date
- Versioned
- Verifiable

provision resources
- Advance reservations & on-demand
- Fine-grained allocations
- Isolation

configure and interact
- Bare metal
- Deeply reconfigurable
- Multiple appliances to a lease
- Snapshotting
- Complex Appliances

monitor
- Hardware metrics
- Fine-grained information
- Aggregate and archive

BUILDING A TESTBED FROM SCRATCH

- Requirements (proposal stage)
- Architecture (project start)
- Technology Evaluation and Risk Analysis
  - Many options: G5K, Nimbus, LosF, OpenStack
  - Sustainability as design criterion: can a CS testbed be built from commodity components?
  - Technology evaluation: Grid’5000 and OpenStack
  - Architecture-based analysis and implementation proposals
- CHI = OpenStack + Grid’5000 + special sauce

CHI: DISCOVERING AND VERIFYING RESOURCES

- Fine-grained, up-to-date, and complete representation
- Both machine parsable and user friendly representations
- Testbed versioning
  - “What was the drive on the nodes I used 6 months ago?”
- Dynamically verifiable
  - Does reality correspond to description? (e.g., failure handling)
- Grid’5000 registry toolkit + Chameleon portal UI
  - Automated resource description, automated export to RM/Blazar
- g5k-checks (renamed cc-checks for consistency)
  - Can be run after boot, acquires information and compares it with resource catalog description
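The comparison step described for cc-checks can be illustrated with a small sketch. This is not the g5k-checks implementation: the function name `diff_description`, the dictionary layout, and the sample values are hypothetical, chosen only to show the idea of comparing what a node reports after boot against its catalog entry.

```python
def diff_description(catalog, acquired, path=""):
    """Return a list of (path, expected, actual) mismatches.

    `catalog` is the stored resource description and `acquired` is what
    was measured on the node; both are nested dicts (hypothetical layout).
    A key missing on one side is reported with None for that side.
    """
    mismatches = []
    for key in sorted(set(catalog) | set(acquired)):
        here = f"{path}.{key}" if path else key
        expected = catalog.get(key)
        actual = acquired.get(key)
        if isinstance(expected, dict) and isinstance(actual, dict):
            # Descend into nested sections such as "cpu" or "network".
            mismatches.extend(diff_description(expected, actual, here))
        elif expected != actual:
            mismatches.append((here, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Illustrative values only: a node reporting less memory than described.
    catalog = {"cpu": {"sockets": 2}, "ram_gb": 128}
    acquired = {"cpu": {"sockets": 2}, "ram_gb": 112}
    for where, expected, actual in diff_description(catalog, acquired):
        print(f"{where}: catalog says {expected!r}, node reports {actual!r}")
```

An empty result means the node matches its description; any entry in the list is a candidate for failure handling.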

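The testbed versioning question (“What was the drive on the nodes I used 6 months ago?”) amounts to looking up the description that was in effect on a given date. Below is a minimal sketch, assuming each node keeps a date-ordered history of descriptions; the class name, fields, and sample values are hypothetical and do not reflect the storage format of the Grid’5000 registry toolkit.

```python
import bisect
from datetime import date


class VersionedDescription:
    """Keeps every revision of a node description, ordered by date."""

    def __init__(self):
        self._dates = []      # effective dates, kept sorted
        self._versions = []   # description in effect from the matching date

    def record(self, effective, description):
        """Store a new revision that takes effect on `effective`."""
        index = bisect.bisect_right(self._dates, effective)
        self._dates.insert(index, effective)
        self._versions.insert(index, description)

    def as_of(self, when):
        """Return the description in effect on `when`, or None if none yet."""
        index = bisect.bisect_right(self._dates, when)
        return self._versions[index - 1] if index else None


if __name__ == "__main__":
    # Illustrative history: a drive swap between two revisions.
    history = VersionedDescription()
    history.record(date(2015, 7, 1), {"disk": "model-A"})
    history.record(date(2015, 12, 1), {"disk": "model-B"})
    print(history.as_of(date(2015, 8, 5)))
```

Keeping old revisions instead of overwriting them is what lets an experimenter recover the hardware description that applied at the time of a past run.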

CHI: PROVISIONING RESOURCES

- Resource leases
- Advance reservations (AR) and on-demand
  - AR facilitates allocating at large scale
- Fine-grain allocation of a range of resources
  - Different node types, switches, etc.
- Isolation between experiments
- Future extensions: matchmaking, testbed allocation management
- OpenStack Nova/Blazar, contributions to Blazar
  - Extensions to support Gantt chart displays and other features
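The rule underlying advance reservations is that a node can be leased only if it has no overlapping lease in the requested time window. The sketch below illustrates that rule; it is not Blazar code or its API, and the `LeaseBook` class, its methods, and the node names are hypothetical.

```python
from datetime import datetime


class LeaseBook:
    """Tracks which nodes are leased during half-open [start, end) windows."""

    def __init__(self, nodes):
        self._leases = {node: [] for node in nodes}

    def _is_free(self, node, start, end):
        # A node is free if every existing lease ends before the request
        # starts, or starts after the request ends.
        return all(end <= s or e <= start for s, e in self._leases[node])

    def reserve(self, count, start, end):
        """Lease `count` free nodes for [start, end); return them or raise."""
        if start >= end:
            raise ValueError("lease must end after it starts")
        free = [n for n in self._leases if self._is_free(n, start, end)]
        if len(free) < count:
            raise RuntimeError(f"only {len(free)} of {count} nodes free")
        chosen = free[:count]
        for node in chosen:
            self._leases[node].append((start, end))
        return chosen


if __name__ == "__main__":
    book = LeaseBook(["node1", "node2", "node3"])

    def at(day, hour):
        return datetime(2016, 2, day, hour)

    print(book.reserve(2, at(5, 9), at(5, 17)))    # two nodes for the day
    print(book.reserve(1, at(5, 12), at(5, 14)))   # overlapping window
    print(book.reserve(3, at(6, 9), at(6, 17)))    # whole testbed, next day
```

Because the check works on time windows instead of current state, the same call covers both on-demand requests (a window starting now) and reservations made well in advance.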

CHI: CONFIGURE AND INTERACT

- Bare Metal
- Allow deep reconfigurability (access to console)
- Map multiple appliances to a lease
- Snapshotting for image sharing
- Efficient appliance deployment
- Handle complex appliances
  - Virtual clusters, cloud installations, etc.
- Interact: shape experimental conditions
- OpenStack Ironic, Glance, and user-data/meta-data

CHI: INSTRUMENTATION AND MONITORING

- Enables users to understand what happens during the experiment
- Instrumentation: high-resolution metrics
- Types of monitoring:
  - Infrastructure monitoring (e.g., PDUs)
  - User resource monitoring
  - Custom user metrics
- Aggregation and Archival
  - Easily export data for specific experiments
- OpenStack Ceilometer + custom metrics
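Aggregation reduces high-resolution samples to a coarser series that is cheaper to archive and export. The following is a minimal sketch of fixed-window averaging; it does not use Ceilometer's API, and the function name, window size, and sample power readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_seconds):
    """Average (timestamp, value) samples over fixed windows.

    Returns a sorted list of (window_start, mean_value) pairs, where
    window_start is the timestamp rounded down to a multiple of
    `window_seconds`.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Illustrative one-second power readings (watts) from a PDU outlet.
    readings = [(0, 100), (1, 110), (2, 120), (60, 200), (61, 220)]
    for start, average in aggregate(readings, window_seconds=60):
        print(start, average)
```

The raw samples can be kept for the duration of an experiment and the aggregated series archived afterwards, which is one way to reconcile high-resolution instrumentation with long-term storage.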

CHI: OVERALL ARCHITECTURE

[Diagram: CHI components. Recoverable labels:]
- Portal
- Identity Management
- Resource discovery
- Grid’5000 Reference API
- Reservation Service (Blazar)
- Horizon, Keystone, Nova, Ironic, Neutron, Ceilometer, Glance
- Legend: special sauce, Custom development, OpenStack


HOW DOES IT WORK INTERNALLY?

[Diagram, labels in reading order:]
- Chameleon user → Blazar: reserve resources
- Blazar: reservations (R1, R2)
- Blazar → Nova: create dedicated resource pool (host aggregate)
- Nova: resource pools (P1, P2) and freepool
- Chameleon user → Nova: launch bare-metal instances in reservation
- Nova → Ironic: schedule, then request bare-metal deployment
- Ironic → Cluster: control & provision (IPMI / PXE / iSCSI)
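The pool mechanics in the diagram can be modelled in a few lines: hosts sit in a free pool until a reservation moves them into a dedicated pool, and instances may only be launched on hosts in that reservation's pool. This is a toy model, not Blazar or Nova code; the class, method, and host names are hypothetical.

```python
class PoolManager:
    """Toy model of a free pool plus per-reservation dedicated pools."""

    def __init__(self, hosts):
        self.freepool = set(hosts)
        self.pools = {}        # reservation id -> set of dedicated hosts
        self.instances = {}    # host -> instance name

    def start_reservation(self, reservation_id, count):
        """Move `count` hosts from the free pool into a dedicated pool."""
        if count > len(self.freepool):
            raise RuntimeError("not enough free hosts")
        chosen = set(sorted(self.freepool)[:count])
        self.freepool -= chosen
        self.pools[reservation_id] = chosen
        return chosen

    def launch(self, reservation_id, instance_name):
        """Place an instance on an idle host of the reservation's pool."""
        idle = sorted(self.pools[reservation_id] - set(self.instances))
        if not idle:
            raise RuntimeError("no idle host left in this reservation")
        self.instances[idle[0]] = instance_name
        return idle[0]

    def end_reservation(self, reservation_id):
        """Drop the pool's instances and return its hosts to the free pool."""
        hosts = self.pools.pop(reservation_id)
        for host in hosts:
            self.instances.pop(host, None)
        self.freepool |= hosts


if __name__ == "__main__":
    manager = PoolManager(["host1", "host2", "host3"])
    manager.start_reservation("R1", 2)
    print(manager.launch("R1", "experiment-a"))
    manager.end_reservation("R1")
    print(sorted(manager.freepool))
```

Restricting placement to the reservation's own pool is what provides isolation between experiments in this model: a host is never in two pools at once.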

DEVELOPED IN THE OPEN

- https://github.com/ChameleonCloud
  - OpenStack patches, Grid’5000 g5k-checks patches
  - User portal, resource discovery, Horizon extensions
  - Testbed configuration with Puppet (not yet open)
- Aim is to provide a Chameleon-in-a-box!

CHAMELEON TIMELINE AND STATUS

- 10/2014: Project starts
- 12/2014: FutureGrid@Chameleon (OpenStack KVM)
- 04/2015: Chameleon Technology Preview on FutureGrid hardware
- 06/2015: Chameleon Early User on new hardware
- 07/2015: Chameleon Public availability (bare metal)
- 09/2015: Chameleon KVM OpenStack cloud available
- 10/2015: Interoperability with GENI (1st phase)
- Today: 600+ users / 150+ projects
- 2016: Heterogeneous hardware available

IN THE PIPELINE…

- Y1 theme was “making things possible”: focus on infrastructure
- Y2 theme is “from possible to easy”: focus on users
- Outreach: webinars, tutorials, user stories
- Experiment management
  - Appliances: snapshotting, sharing, appliance marketplace, community
  - Experiment Blueprint: automation and preservation
- Functionality: from possible to easy
  - Better reconfiguration capabilities
  - Better networking capabilities
  - Better infrastructure monitoring (PDUs, etc.)
  - And others

OPENSTACK: LESSONS LEARNED

- Operating OpenStack can be difficult
  - Forget about traditional UNIX admin: even bare metal needs OVS and IP namespaces
  - Thousands of configuration switches, many with little documentation
  - Must read the code!
  - Inter-dependent components → check all logs with debug enabled
- Upstream development mostly done on KVM
  - Less testing of Ironic → bugs
- Lots of experimental projects with little upstream support
  - We were lucky that the community was interested in reviving Blazar
- Do not put too much hope in blueprints
  - Many abandoned or delayed for multiple releases
- Where to find help and possible fixes?
  - bugs.launchpad.net (bug reports) / review.openstack.org (patches)
  - Most developers available on IRC
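Checking all logs becomes more tractable when lines from different services are grouped by request. OpenStack services normally tag log lines with a request ID of the form `req-<UUID>`; the sketch below groups lines by that ID. The function name and the sample log lines are hypothetical, and the exact log layout depends on each deployment's logging configuration.

```python
import re
from collections import defaultdict

# Request IDs look like "req-" followed by a UUID.
REQUEST_ID = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def group_by_request(lines_by_service):
    """Map each request ID to the (service, line) pairs that mention it."""
    grouped = defaultdict(list)
    for service, lines in lines_by_service.items():
        for line in lines:
            # A line may mention an ID more than once; count it once.
            for request_id in sorted(set(REQUEST_ID.findall(line))):
                grouped[request_id].append((service, line))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up log lines for illustration.
    rid = "req-01234567-89ab-cdef-0123-456789abcdef"
    logs = {
        "nova-api": [f"DEBUG [{rid}] boot request accepted"],
        "nova-scheduler": [f"DEBUG [{rid}] selected host", "INFO unrelated"],
    }
    for request_id, entries in group_by_request(logs).items():
        print(request_id)
        for service, line in entries:
            print(f"  {service}: {line}")
```

In practice the input would be read from each component's log file with debug logging enabled, so that one failing request can be followed across inter-dependent services.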

VIRTUALIZATION OR CONTAINERIZATION?

- Yuyu Zhou, University of Pittsburgh
- Research: lightweight virtualization
- Testbed requirements:
  - Bare metal reconfiguration
  - Boot from custom kernel
  - Console access
  - Up-to-date hardware
  - Large scale experiments

SC15 Poster: “Comparison of Virtualization and Containerization Techniques for HPC”

TEACHING CLOUD COMPUTING

- Nirav Merchant and Eric Lyons, University of Arizona
- ACIC 2015: project-based learning course
  - Data mining to find exoplanets
  - Scaled analysis pipeline by Jared Males
  - Develop a VM/workflow management appliance and best practice that can be shared with broader community
- Testbed requirements:
  - Easy to use IaaS/KVM installation
  - Minimal startup time
  - Support distributed workers
  - Block store: make copies of many 100 GB datasets

DEFENDING COMPUTING RESOURCES

- Led by Jessie Walker, University of Arkansas at Pine Bluff
- Working on detecting cyber attacks
  - Model and visualize multi-stage intrusion attacks (MAS)
  - Create custom Snort rules to monitor traffic and detect attacks
- Complex and expensive to buy and use their own hardware
- Limited by permissions needed to run cybersecurity attacks inside campuses
- Testbed requirements:
  - Virtual machines to simulate attacks in the cloud and run intrusion detection systems

PARTING THOUGHTS

- From vision to reality with Express Delivery
  - Built from scratch within a year on a shoestring
  - Thanks to experience from other testbeds, esp. Grid’5000
  - Thanks to open-source code from other projects, esp. OpenStack and Grid’5000
- Operational testbed: 600+ users / 150+ projects
- Federation
  - Ongoing efforts with GENI
  - Grid’5000 too?

CHAMELEON TEAM

- Kate Keahey: Chameleon PI, Science Director, Architect, University of Chicago
- Joe Mambretti: Programmable networks, Federation activities, Northwestern University
- Dan Stanzione: Facilities Director, TACC
- Pierre Riteau: DevOps Lead, University of Chicago
- Paul Rad: Industry Liaison, Education and training, UTSA
- DK Panda: High-perf networking, Ohio State University

COME AND WORK WITH US!

- As a collaborator
  - Generalizing results: what would Kameleon or DISTEM look like in the Chameleon context?
  - Also projects in resource management for HPC & Cloud, elastic scaling platform
  - Summer internship opportunities
- As a co-worker
  - Programming postdoc or researching programmer
