
ExoGENI Federated Private NIaaS Infrastructure

Chris Heermann, ckh@renci.org

Overview

• ExoGENI architecture and implementation
• ExoGENI science use-cases
  – Urgent Computing: Storm Surge Predictions on GENI
  – ScienceDMZ as a Service: Creating Science Super-Facilities with GENI
• Support for SDN in ExoGENI


IaaS: clouds and network virtualization

[Diagram: cloud providers expose virtual compute and storage infrastructure through cloud APIs (Amazon EC2, …); transport network providers expose virtual network infrastructure through dynamic circuit APIs (NLR Sherpa, DOE OSCARS, I2 ION, OGF NSI, …); a controller orchestrates both across the Breakable Experimental Network]

• ORCA is a "wrapper" for off-the-shelf clouds, circuit networks, etc., enabling federated orchestration:
  – Resource brokering
  – VM image distribution
  – Topology embedding
  – Stitching
  – Federated authorization
• GENI, DOE, NSF SDCI+TC
• http://geni-orca.renci.org
• http://networkedclouds.org

Open Resource Control Architecture

[Diagram: ORCA actors — a service manager (SM) and broker (B) act as coordinators; an aggregate manager (AM) fronts each aggregate]

The APIs

• Simple API, complex description language (a hedged client sketch follows below):
  – createSlice(sliceName, Term, SliceTopology, Credentials)
  – deleteSlice(sliceName)
  – sliceStatus(sliceName)
  – modifySlice(sliceName, TopologyUpdate)
  – extendSlice(sliceName, NewTerm)
  – Together these cover topology management, debugging, elasticity, and agility
• Description language:
  – NDL-OWL, an OWL-based ontology that describes:
    • User: resource requests
    • Provider: resource descriptions, public resource advertisements, manifests
• Participating in a US-EU effort to standardize the IaaS ontology
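To make the lifecycle concrete, here is a minimal Python sketch of the calls listed above. `OrcaClient` is an invented stand-in for a native controller client (the role tools like Flukes play over XML-RPC); only the method names and argument lists come from the slide, and all file names are hypothetical.

```python
# Hedged sketch of the ORCA slice lifecycle. "OrcaClient" is invented;
# only the method names and argument lists come from the slide above.

class OrcaClient:
    """Invented wrapper; each method would issue a controller call."""
    def createSlice(self, sliceName, term, sliceTopology, credentials): ...
    def sliceStatus(self, sliceName): ...
    def modifySlice(self, sliceName, topologyUpdate): ...
    def extendSlice(self, sliceName, newTerm): ...
    def deleteSlice(self, sliceName): ...

orca = OrcaClient()
request = open("request.rdf").read()     # NDL-OWL resource request (hypothetical file)
cred = open("user-cert.pem").read()      # X.509 identity credential (hypothetical file)

orca.createSlice("demo-slice", "14 days", request, cred)  # instantiate the topology
print(orca.sliceStatus("demo-slice"))                     # poll sliver state (debugging)
orca.modifySlice("demo-slice", open("grow.rdf").read())   # update topology in place (elasticity)
orca.extendSlice("demo-slice", "21 days")                 # renew the lease term
orca.deleteSlice("demo-slice")                            # tear down when done
```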


GENI Federation

• Federated identity
  – InCommon
  – X.509 identity certificates
• Common APIs
  – Aggregate Manager
    • ExoGENI has a compatibility API layer supporting AM API v2 (a hedged sketch follows below)
  – Clearinghouse
• Federated access policies
  – ABAC
• Agreed-upon resource description language
  – RSpec
    • ExoGENI translates relevant portions from NDL-OWL to RSpec and back as needed
• Several major portions
  – ExoGENI, InstaGENI, WiMAX, Internet2 AL2S
• Federation with the EU
  – Amsterdam XO rack was part of the SDX demo at GEC21 with iMinds
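As an illustration of the compatibility layer, the sketch below drives a rack through the GENI AM API v2 (XML-RPC over SSL with an X.509 client certificate). The endpoint URL, slice URN, and file names are placeholders; the method names (CreateSliver, SliverStatus) are standard AM API v2.

```python
# Hedged sketch of calling ExoGENI's AM API v2 compatibility layer.
# Endpoint, URN, and file names are placeholders.

import ssl
import xmlrpc.client

ctx = ssl.create_default_context()
ctx.load_cert_chain("geni-cert.pem", "geni-key.pem")   # InCommon-derived identity

am = xmlrpc.client.ServerProxy(
    "https://xo-rack.example.net:11443/orca/xmlrpc",   # placeholder AM endpoint
    context=ctx,
)

slice_urn = "urn:publicid:IDN+ch.geni.net:myproject+slice+demo"   # placeholder URN
creds = [open("slice-cred.xml").read()]   # slice credential from the clearinghouse
rspec = open("request.rspec").read()      # GENI request RSpec

# ExoGENI translates the RSpec to NDL-OWL internally, embeds the topology,
# and returns a manifest RSpec describing the allocated slivers.
manifest = am.CreateSliver(slice_urn, creds, rspec, [], {})
print(am.SliverStatus(slice_urn, creds, {}))
```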


Building network topologies

[Diagram: a computed embedding builds a slice from cloud hosts with network control, a virtual network exchange, and a virtual colo (campus network to circuit fabric); the slice owner may deploy an IP network into the slice (OSPF) over an OpenFlow-enabled L2 topology]

ExoGENI

• Every Infrastructure as a Service, All Connected.
  – Substrate may be volunteered or rented
  – E.g., public or private clouds, HPC, instruments, and transport providers
  – Contribution size is dynamically adjustable
• ExoGENI principles:
  – Open substrate
  – Off-the-shelf back-ends (OSCARS, NSI, EC2, etc.)
  – Provider autonomy
  – Federated coordination
  – Dynamic contracts
  – Resource visibility

Current topology

[Map: current ExoGENI rack topology, including the Breakable Experimental Network (BEN)]

An ExoGENI cloud “rack site”

[Diagram: rack internals — a management switch, an OpenFlow-enabled L2 switch, sliverable iSCSI storage, ten worker nodes, and a management node. Connectivity: 2x10 Gbps dataplane links; 4x1 Gbps bonded management and iSCSI storage links; an uplink to the campus Layer 3 network; a dataplane to the dynamic circuit backbone (10/40/100 Gbps) via either static VLAN tunnels provisioned to the backbone (option 1) or direct L2 peering over a fiber uplink (option 2); an optional dataplane to the campus network for stitchable VLANs]

ExoGENI software structure

Current deployments

• xCAT
  – Operator node provisioning
  – User-initiated bare-metal provisioning
• OpenStack Essex++ (RedHat/CentOS version)
  – Custom Quantum plugin to support multiple dataplanes
  – Working on a Juno port
• iSCSI user slivering
  – IBM DS3512 appliance
    • NetApp iSCSI support in the works
  – Linux iSCSI stack
    • Backend support for LVM, Gluster, ZFS


Tools

• ORCA native tools (native APIs, resource descriptions)
  – Flukes
  – More flexibility
• Federation tools (federation APIs, resource descriptions)
  – Jacks, omni, jFed
  – Compatibility


Tools (continued)


ExoGENI – a federation of private clouds

• Each site is a micro-cloud
  – Adding support for HPC batch schedulers
• Owners decide what portion of resources to contribute
• Owners are free to continue using native IaaS interfaces
• Owners have the opportunity to take advantage of federated identity and inter-provider orchestration mechanisms
• What is it good for?
  – A foundation for future institutional collaborative science cyberinfrastructure (CI)


ExoGENI Science Use-cases


Computing Storm Surge

• ADCIRC Storm Surge Model
  – FEMA-approved for Coastal Flood Insurance Studies
  – Very high spatial resolution (millions of triangles)
  – Typically uses 256-1024 cores for a real-time run (one simulation!)

[Figure: ADCIRC grid for coastal North Carolina]

Tackling Uncertainty

• One simulation is NOT enough: probabilistic assessment of hurricanes
• A "few" likely hurricanes; fully dynamic atmosphere (WRF)

[Figure: research ensemble from the NSF Hazards SEES project — 22 members, Hurricane Floyd (1999)]

Why GENI?

• Current limitation: real-time demand for compute resources
  – Large demand for real-time compute resources during storms
  – Not enough demand to dedicate a cluster year-round
• GENI enables:
  – Federation of resources
  – Cloud bursting: urgent, on-demand compute
  – High-speed data transfers to/from/between remote resources
  – Replication of data/compute across geographic areas
    • Resiliency, performance

Storm Surge Workflow

• The whole workflow is 22 ensemble members
• Pegasus workflow management system (a hedged sketch follows below)

[Diagram: an Ensemble Scheduler dispatches members to a Collector]
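A minimal sketch of how such an ensemble could be expressed with the Pegasus DAX3 Python API. Transformation names, arguments, and file names are illustrative assumptions; the actual ADCIRC workflow definitions are not shown in the slides.

```python
# Illustrative 22-member ensemble in the Pegasus DAX3 Python API.
# Job and file names are assumptions, not the real ADCIRC workflow.

from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("adcirc-ensemble")

collector = Job(name="collector")          # fan-in job that merges member results
dax.addJob(collector)

for member in range(22):
    run = Job(name="adcirc")               # one ensemble member per job
    result = File("surge_%02d.nc" % member)
    run.addArguments("--member", str(member))
    run.uses(result, link=Link.OUTPUT)
    collector.uses(result, link=Link.INPUT)
    dax.addJob(run)
    dax.depends(child=collector, parent=run)

dax.writeXML(open("adcirc-ensemble.dax", "w"))   # hand the DAX to pegasus-plan
```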

Slice Topology

• 11 GENI sites (1 ensemble manager, 10 compute sites)
• Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB iSCSI storage
• HPC compute nodes: 80 compute nodes (320 cores) from 10 sites

Representative Science DMZ

Dedicated vs. Virtual resources

• GENI provides a distributed software-defined infrastructure (SDI)
  – Compute + Storage + Network

Emerging Trend: Super-Facilities, Coupled by Networks

Experimental facilities are being transformed by new detectors, advanced mathematics, robotics, automation, and advanced networks.

Today's Demonstration: Real-time data processing and visualization workflow

http://portal.nersc.gov/project/als/sc14/

[Diagram: data from the ALS experiment flows through a SPADE instance on a server at Argonne, across ESnet to an ExoGENI SPADE VM at Starlight (Chicago), then across AL2S and ESnet to an ExoGENI SPADE VM at Oakland, California, and on to the compute cluster at NERSC, LBL]

• WAN-optimized data transfer nodes and a network slice created programmatically (Science DMZ as a service)
• Application workflow instantiated to stage data at the GENI rack on the Science DMZ slice
• Data is moved optimally across the WAN¹

¹ Earlier work, like Phoebus, has demonstrated the value of this approach

Dedicated vs. Virtual resources

• GENI provides a distributed software-defined infrastructure (SDI)
  – Compute + Storage + Network
• GENI racks may be deployed on campus or in provider networks close to the campus
• 'Science DMZ as a service'
  – Applications can provision a virtual 'Science DMZ' as and when needed

Programmable infrastructure enables end-users to create dynamic 'friction-free' infrastructures without advanced knowledge/training.

Microtomography of High-Temperature Materials under Stress

[Figure: data set collected by materials scientist Rob Ritchie, LBNL/UCB]

What constitutes programmable network behavior? (i.e., what is SDN?)

• Control over virtual topology
  – A link in one layer is represented by a path in another
• Control over packet forwarding
  – Making decisions about which interface a packet/frame should be placed on
• Queue management and arbitration
  – Defining packet queues and associated service and scheduling policies

Example technologies: Layer 1/2/3 VPNs via explicit signaling (MPLS, GMPLS); bandwidth-on-demand services (OSCARS, NSI); FlowVisor; OpenFlow 1.0, Nicira Open vSwitch, Cisco ONE, OpenDaylight, Juniper Contrail; numerous vendor-proprietary APIs; OpenFlow 1.3. (A toy forwarding-control sketch follows below.)
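As a toy illustration of "control over packet forwarding," here is a minimal OpenFlow 1.0 controller app. Ryu is not named on this slide; it is used here only as one common open-source controller framework, and the app is a hub that floods every packet, the simplest possible forwarding decision.

```python
# Toy OpenFlow 1.0 forwarding-control sketch using the Ryu framework
# (an assumption; Ryu is not part of the slides). Floods every packet.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_0


class FloodHub(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_0.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg                      # the PacketIn from the switch
        dp = msg.datapath
        ofproto = dp.ofproto
        parser = dp.ofproto_parser

        # Forwarding decision: send the frame out every port except the
        # one it arrived on. A real app would install flow entries instead.
        actions = [parser.OFPActionOutput(ofproto.OFPP_FLOOD)]
        out = parser.OFPPacketOut(datapath=dp,
                                  buffer_id=msg.buffer_id,
                                  in_port=msg.in_port,
                                  actions=actions)
        dp.send_msg(out)
```

Launched with `ryu-manager flood_hub.py`; a production app would push flow_mods so the switch forwards in hardware rather than sending every packet to the controller.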


ExoGENI and OpenFlow (now)

• OpenFlow experiments using embedded topologies with OVS spanning one or more sites
  – E.g., HotSDN '14, "A Resource Delegation Framework for Software Defined Networks," Baldin, Huang, Gopidi
• Experiments with OF 1.0 in rack switches
  – Described in the ExoBlog (www.exogeni.net)


ExoGENI and OpenFlow (near future)

• OpenFlow service on BEN (ben.renci.org)
  – 40G wave using Juniper EX switches
  – FSFW, OF 1.0, multiple controllers
  – Topology embedding/VNE for ExoGENI; a path service for other projects
• Slice on AL2S with its own controller
  – Topology embedding for ExoGENI; value-added experimenter services with ExoGENI resources
• Application-specific topology embedding


Where are we going?

• More sites
  – Georgia Tech [Atlanta, GA], PUCP [Lima, Peru], Ciena [Hanover, MD]
• Updated OpenStack
• Better compute isolation
  – Take NUMA into account for placement decisions
• Better storage isolation
  – Provision storage VLANs/channels with QoS properties to provide predictable performance
• Better network isolation and performance
  – Enable SR-IOV
• More complex topology management/embedding
  – Fully dynamic slices
• More diverse substrates
  – Integration with batch schedulers (SLURM)
  – VMware, other cloud stacks
  – Public clouds


Thank you!

• http://www.exogeni.net

