cloud service level agreements and cloud federation

31
© The Aerospace Corporation 2014 Cloud Service Level Agreements and Cloud Federation Dr. Craig A. Lee, Senior Scientist, [email protected] The Aerospace Corporation GSAW, February 26, 2014

Upload: others

Post on 11-Sep-2021

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cloud Service Level Agreements and Cloud Federation

© The Aerospace Corporation 2014

Cloud Service Level Agreements

and

Cloud Federation

Dr. Craig A. Lee, Senior Scientist, [email protected]

The Aerospace Corporation

GSAW, February 26, 2014

Page 2: Cloud Service Level Agreements and Cloud Federation

Introduction

• NIST Definition of Cloud Computing – Cloud computing is a model for enabling ubiquitous, convenient, on-

demand network access to a shared pool of configurable computing

resources (e.g., networks, servers, storage, applications, and services)

that can be rapidly provisioned and released with minimal management

effort or service provider interaction.

• Potential Benefits – Improved mission-effectiveness

• Improved reliability using on-demand resources to recover after failure

• Surge capacity provided using rapid, on-demand elasticity

• Improved access to data sets, services, and other resources

– Improved cost-effectiveness

• Economies of scale achieved through consolidation

• Generic hosting environment provided for many missions

• Reduced power, space, cooling, physical infrastructure requirements

2

Page 3: Cloud Service Level Agreements and Cloud Federation

A Cloud-Based Reference Model A

SP

IS

P

3

Page 4: Cloud Service Level Agreements and Cloud Federation

Issues and Observations

• How can we manage sets of applications that will have

differing resource and performance requirements?

– How to ensure that mission goals are met (e.g., throughput), while ensuring

that aggregate cloud requirements are met (e.g., overall server utilization)?

– Individual and aggregate requirements could be competing and conflicting

– Service Level Agreements (SLAs) are commonly considered to address the

issue of individual requirements

• How to Manage Security and Governance across

Organizations

– Distributed Infrastructures and Organizations

– Federated Authentication & Authorization Are Critical

– Virtual Organizations (VOs) have been used to address this issue

4

Page 5: Cloud Service Level Agreements and Cloud Federation

More About SLAs

• Some applications will be performance-critical or performance-sensitive

– "Best effort" cloud resources may not suffice to meet app requirements

• Some applications will have dynamic requirements

– Some apps will have varying demands – surge

– Some apps may surge at unpredictable times

• Previously addressed by over-provisioning with dedicated hardware

– Dedicated system was sized for the worst-case, rather than the average case

– Drove entire system cost

• This is antithetical to cloud computing

– Multi-tenant environment where utilization and costs can be better managed

• Hence, the goal is to provide the user with a reasonable expectation that

performance requirements will be met, through mechanisms that are

reasonable for the provider to implement and support for multiple apps

• Dynamic, machine-enforceable SLAs

– These are not simply "contractual" SLAs

– These are services that a provider may provide and a user may use

5

Page 6: Cloud Service Level Agreements and Cloud Federation

Basic SLA Functions – an Autonomic Control Cycle

• Admission Control

– Mapping of app-level requirements to infrastructure-level metrics

– WS-Agreement and WS-Agreement Negotiation

– Term language needed

• Monitoring - Metrics Collection

– Where: host OS/hypervisor, guest OS, application-level

– When: upstream vs. downstream

• SLA Evaluation

– Hysteresis

– Statistical methods, e.g., Median Absolute Deviation, Interquartile

Range, Iterative Local Regression

• SLA Enforcement -- Violation Response

– Throttling

– Load migration – both process and VM

– On-demand resources

– SLA re-negotiation

6

Page 7: Cloud Service Level Agreements and Cloud Federation

A General SLA Architecture

7

Page 8: Cloud Service Level Agreements and Cloud Federation

Distributed Monitoring of

Different Mission Components

8

Page 9: Cloud Service Level Agreements and Cloud Federation

Type I Best Effort

Type II Strict

Throttling

Type III Live Migration

Type IIIa

Type IIIa

Type IIIb

Type IIIb

additional surge

capacity

w/ spare capacity

Type V Idle

Type III Live Migration

w/ back-fill

Type IV Preemptible

Spare Cap.

w/

preemptible

back-fill

Server Load Types

9

Page 10: Cloud Service Level Agreements and Cloud Federation

Migration Overhead Metrics & Policies

• Overhead Metrics

– Memory footprint

– File descriptors-- network/disk connections

– Accurate migration overhead model needed

• Possible Policies: what to move where

– First Fit Decreasing

– Fastest Migration Time (least time needed)

– Maximum Load Reduction

– Load Reduction to Just Below Maximum

– Highest Correlation with Causing Excessive Load

– Application Value (priority)

– Application Availability

10

Page 11: Cloud Service Level Agreements and Cloud Federation

An SLA Architecture for OpenStack

11

Page 12: Cloud Service Level Agreements and Cloud Federation

Nova’s Filter and Weight Design

12

Page 13: Cloud Service Level Agreements and Cloud Federation

Comments About Cloud SLAs

• Development & Test plan needed

– What are the simplest SLA mechanisms that "scratch the itch" for the most

users?

• Capacity Planning & Management

– How to estimate query requirements, load demand, time-to-completion

– How to support reasonable load requirements to produce reasonable times-

to-completion

– How to manage sets of users such that no one user is disruptive

– How to on-board requirements from other organizations

• Cyber-security Implications

– As clouds become larger and more widely used, there will be more

automated tools, i.e., autonomic behaviors

– Autonomic agents become a threat surface -- compromising an agent that

controls system behavior would have broad impact

• Track and exploit on-going SLA work in the marketplace

– OGF WS-Agreement, WS-Agreement Negotiation

– TeleManagement Forum (TMF), IEEE, EU, ...

– NSF Center for Cloud and Autonomic Computing

13

Page 14: Cloud Service Level Agreements and Cloud Federation

Changing Gears: Why Federate Clouds?

• Fundamental Cloud Technology:

– On-Demand Provisioning of Resources, e.g., servers, storage,

communication, platforms, services

• The Leap: a Global, Inter-Cloud:

– Everything is available anywhere, anytime, securely, transparently,

without having to worry about infrastructure

• Fundamental cloud technology says nothing about:

– Distributed Data and Workflow Management

– Wide-Area Network Management

– Federated Identity Management

– Single Sign-on

– Delegation of Trust

– Virtual Organizations

– Managing the Trust Ecosystem

• This requires Cloud Federation

14

Page 15: Cloud Service Level Agreements and Cloud Federation

Private

Cloud

Organization A

Private

Cloud

Organization B

Cloud Deployment Modes

Public Cloud

Hybrid

Cloud

Federated Community

Cloud

Hybrid

Cloud

15

Page 16: Cloud Service Level Agreements and Cloud Federation

Port-au-Prince, Haiti

First Responders Stakeholder Cloud #1 Stakeholder Cloud #2 Stakeholder Cloud #3

Cloud On-Demand Federated Cloud

Use Case: Disaster Response Using Federated Clouds

16

NGA/NCOIC GeoInt Community Cloud Prototype (GCC) demonstrated

the integration of cloud resources and geospatial information tools in

response to a Haiti-like earthquake event. Participants: Aerospace,

NJVC, Boeing, Raytheon, Telos, Winthrop Mgmt, OGC.

Page 17: Cloud Service Level Agreements and Cloud Federation

Use Case: GEOSS

• Ten year plan to build out architecture with data management tools

for full and open exchange of data, metadata and data products,

recognizing national and international policies and legislation.

• http://www.earthobservations.org 17

Page 18: Cloud Service Level Agreements and Cloud Federation

Necessary Security Capabilities for Federation

• Federated Identity Management (Authentication – AuthN) – “Are you who you say you are?”

– “Where are you from? Who is your Identity Provider?”

– “Can I trust your Identity Provider?”

– “How do I interact with your Identity Provider to verify your identity?”

• Federated Access Control (Authorization - AuthZ) – In a distributed env, how are user identities mapped to authorization privileges?

– Use a trusted Attribute Server to manage dynamic sets of authorization attributes

• Provides a “security context” to manage sets of authorization attributes that

are not tied to any particular home institution

• Authorization attributes must be well-known, or negotiated and agreed upon

– This is the Virtual Organization (VO) concept

• Could also be called workspaces, enclaves, …

• Trust Federations – Establishes trust among potential federation participants

– Example: International Grid Trust Federation, www.igtf.net

• Provides "Good Housekeeping Seal of Approval" to CA operators

• Certificates from approved CAs are trusted among IGTF participants

18

Page 19: Cloud Service Level Agreements and Cloud Federation

Federated Authorization Management with Virtual Organizations (VOs)

• A VO is a security and collaboration context not exclusively associated with any one physical organization or site

– Participating partners agree upon structure, rules and processes

– A VO partner can be a single person, a group or an entire organization

• A VO has members that are assigned roles and/or attributes

– Membership roles or attributes grant specific capabilities within a given VO as determined by each resource/service provider

• Partners participating in a VO contribute resources, i.e., data and services

– They retain complete control over their own resources!

– Access by VO members can be modified or revoked at any time by both the VO administrator and the resource administrator

• VOs enable federated, community clouds by being the Trusted Third Parties who assert user identity attributes, and who may authenticate users as well

19

Page 20: Cloud Service Level Agreements and Cloud Federation

Intellectual Heritage: VOs Developed for Global Grids

20

Worldwide LHC Computing Grid routinely runs over 200 VOs

(Dashboard website: http://dashb-earth.cern.ch)

Page 21: Cloud Service Level Agreements and Cloud Federation

OpenStack: an Open Source Cloud Project

21

Horizon

Nova

Keystone

Swift Glance

Simplified OpenStack Architecture

OpenStack -- www.openstack.org

• Open source cloud project run by

the OpenStack Foundation

• 87 countries

• 140 organizational members

• Hundreds of active developers

Page 22: Cloud Service Level Agreements and Cloud Federation

Adding an External VO Mgmt System to OpenStack NGA/NCOIC GeoInt Community Cloud (GCC) Prototype

22

• The Keystone project concept extended to include VO projects – Project names starting with "VO." are VO projects

– VO projects could be denoted by internal Keystone attribute

• Keystone and Swift clients and servers modified to demonstrate the management of container access using VO roles

Cloud A

Keystone

Cloud C Cloud B

Keystone

The VOMS manages info for multiple VOs

• Members

• Roles

• Participating Sites

VOMS w/ VOMS Admin

Haiti VO

Haiti VO Admin

Swift Swift Swift

List Upload

Download

Keystone

Page 23: Cloud Service Level Agreements and Cloud Federation

Adding a VO Auth Stage to the Keystone Command Pipeline

• All OpenStack services are

designed with a configurable

command pipeline

• When AO Auth sees a project

name with the “VO.” prefix, it

consults the VOMS

– Verifies user’s VO membership

– Returns member’s VO role and other

sites participating in this VO

23

Request

Response

Token

Auth

VO

Auth

Admin Token Auth

XML Body

JSON Body

Keystone Service

VOMS Admin

VOMS

VO Admin

VO Admin

VO Admin

Sites

Roles

Members

Page 24: Cloud Service Level Agreements and Cloud Federation

VOMS w/ Admin

VO.Haiti UserA: RoleA Site: Clouds A, B

… VO.Haiti Admin

Container ACL: UserA

Container ACL: RoleA

Swift

Cloud A

Keystone

Keystone

VO

Client

Swift VO Client

UserA

GCC Example:

User Authentication using Keystone with external VOMS

24

Role A Account: Service Catalogs and

AUTH_TOKENs for all VO sites

Container ACL: RoleA

Container ACL: UserB

Swift

Keystone

Cloud B

UserB

Authn as RoleA

Page 25: Cloud Service Level Agreements and Cloud Federation

Container ACL: UserA

Container ACL: RoleA

Swift

Cloud A

Keystone

Keystone

VO

Client

Swift VO Client

UserA

Role A Account: Service Catalogs and

AUTH_TOKENs for all VO sites

Container ACL: RoleA

Container ACL: UserB

Swift

Keystone

Cloud B

UserB

GCC Example: VO Container Access

25

List, Upload, Download

Local Admin Local Admin

VOMS w/ Admin

VO.Haiti UserA: RoleA Site: Clouds A, B

… VO.Haiti Admin

Page 26: Cloud Service Level Agreements and Cloud Federation

GCC Example:

Using VO Roles to Manage Container Access

26

VO Roles

Civic

Engineer

First Responder

Participating Site Access Control

Lists

Medical

Data

Building

Plans, Maps

End User

Data

Read:

Write:

Read:

Write:

Read:

Write:

Local

Admin

Data containers

could be made

available directly,

or replicated

Data

Containers

Medical

Staff

VO Members

Page 27: Cloud Service Level Agreements and Cloud Federation

Application Service Owner

VOMS

VOMS Admin

App VO w/ App VO Admin

Member Role

Site A, User k Data Processor

Site A, User m Data Processor

Site B, User n End User

Site AUTH_URL

Site A http://myclouda:5000/v2.0

Site B http://mycloudb:5000/v2.0

App User

Hypervisor

Hardware Server

App-level Service

Virtual Machine Image

VO_PEP.py

1: Instantiate

App Service

4: Verify VO

Membership 5: VO Membership

Verified

AUTH_TOKEN Service Catalog • App Svc Endpoint

8: Authz User

9: User Authz’d

Using VOs to Manage Access to

Arbitrary App-Level Services

Service Catalog

---

---

---

---

2: Register &

Administer

Endpoint

App Service Endpoint

w/ reqd. VO Roles/Attrs

Keystone

27

Page 28: Cloud Service Level Agreements and Cloud Federation

Key Design Issues for VOs in OpenStack • How to store the VO attributes

– As separate user entries with VO attributes (as per VOMS)

– As a set of general mapping rules (as per attribute mappings)

• How to Integrate VOs with multiple clouds

– Authenticating identity credentials from different Identity Providers

– Aggregating IdP attributes with VO attributes

• External VOMS DB vs. Peer-to-Peer Keystone VO DBs

– External VOMS easier to implement, quicker authz revocation process, but single

point of failure

– P2P doesn’t rely on third party VOMS, not a single point of failure, but introduces

consistency issue, and authz revocation takes longer

• Whether to check that user is a VO member or not

– Carries an overhead (esp. if external VOMS DB) so when to avoid it?

• Ensure that VOs can be used by arbitrary application-level services

– Database access, RSS feeds, any kind of standard services, e.g., geospatial tools:

Web Map Service, Web Feature Service, Web Coverage Service, …

• What should be at app-level vs. infra-level, i.e., what should be in Keystone

• Who manages the VO

– VO admin (distributed) or VOMS admin (centralized)

• Infrastructure-level federation vs. application-level federation

• Scalability 28

Page 29: Cloud Service Level Agreements and Cloud Federation

Additional Important Issues

• Many VO management and administrative issues must be

addressed

– Creating and terminating a VO

– Joining and leaving a VO

– Who is the VO Admin, and the VOMS Admin

• Common understanding of VOs

– VO resources, roles, attributes and trusted VOMS

– Data publishing and discovery

– Semantic interoperability

• Granularity of data/resource protection

– Resource protection carries an overhead

– There will be a granularity at which resource protection overhead

becomes excessive and intolerable

• Trust Federations

– Human organizations where common agreements are made in

advance of need

– Existing Example: International Grid Trust Federation, www.igtf.net

– Possible Analogy: International Disaster Response Trust Federation 29

Page 30: Cloud Service Level Agreements and Cloud Federation

Summary and Discussion

• Two major topics explored:

– Cloud SLAs

– Cloud federation – Virtual Organizations

• Capacity and performance management will be central to how

cloud-based ground systems are deployed and managed

– Autonomic control will be crucial since it will simply not be

possible -- nor desirable – to have humans in the loop

• Cloud federation will be central to how ground systems are

operated across a distributed infrastructure of inter-clouds

– Identity attributes and resource attributes (“data tagging”) will be

crucial to enforcing security and governance policies

• Lots of work to be done

30

Page 31: Cloud Service Level Agreements and Cloud Federation

© The Aerospace Corporation 2014

Thank you!

Any Questions?

All trademarks, service marks, and trade names

are the property of their respective owners.