demystifying the reliability of cloud...

26
DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella, Ph.D. Rutgers University

Upload: others

Post on 26-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

DEMYSTIFYING THE RELIABILITY OF

CLOUD SERVICES Michael Tortorella, Ph.D.

Rutgers University

Page 2: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 2

THEME Many users are suspicious of cloud services

reliability

Many service providers are not confident of their risk analyses for cloud services SLAs

A consistent, unified approach to cloud services reliability is valuable

Tools for service reliability engineering readily apply

Page 3: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 3

OVERVIEW

Some examples of cloud services

How should reliability of cloud services be described?

Modeling cloud services reliability

Cloud services SLA risk analysis

Conclusion

Page 4: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 4

CLOUD SERVICES EXAMPLES

File system backup

SaaS Remote execution of applications

Remote FTP Dropbox™-like services

Page 5: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 5

CLOUD SERVICES RELIABILITY

These are on-demand services Accessed via transactions

To understand reliability, you need to understand the user requirements Failures are violations of the user

requirements Reliability engineering concerns frequency

and duration of failures and outages

Page 6: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 6

CLOUD SERVICES USER REQUIREMENTS

Get at my data when I want to And unauthorized access prevented

Uninterrupted upload and download

Uninterrupted computation

Data are secure and uncorrupted

Page 7: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 7

CLOUD SERVICES RELIABILITY REQUIREMENTS

Each user requirement gives rise to one or more reliability requirements How often are the failure modes in the user

requirement permitted to occur? For how long are outage conditions

permitted to persist?

Page 8: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 8

CLOUD SERVICES FAILURE MODES

Can’t access stored data Including executables

Interruptions Uploads Downloads Executions

Delays Errors

Page 9: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 9

CLOUD SERVICES FAILURE MECHANISMS

Loss of some shared resource(s) Servers WAN LAN Local interfaces

CSP goes out of business

Page 10: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 10

CLOUD SERVICES RELIABILITY: KEY CONCEPT

In order to meet the service’s requirements, the cloud infrastructure needs to configure itself in specific ways to deliver the service

Frequency and duration of cloud service failures/outages is related to the frequency and duration of infrastructure failures/outages

Page 11: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 11

CLOUD SERVICES RELIABILITY MODELING

Make these connections explicit

Attach numbers based on understanding of the infrastructure hardware and software SDN, NFV,… are merely different

technologies enabling cloud services Even virtualization needs a physical and

software infrastructure

Page 12: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 12

EXAMPLE ENTERPRISE DATA BACKUP

Service: off-site backup for enterprise data

Features Many users Many databases Service provider decides where to store

Page 13: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 13

EXAMPLE ENTERPRISE DATA BACKUP

User requirements Any user may request access to his/her

data at any time Any user may request upload or download Interruptions limited Delays limited User facilities must be freed up after

session is complete

Page 14: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 14

EXAMPLE ENTERPRISE DATA BACKUP

Reliability requirements examples Accessibility

+ Probability that a user is able to access his/her data at any time is ≥ 0.9995

+ Probability that a user can access data he/she is not authorized to is ≤ 0.0001

Continuity + Probability that a download or upload is interrupted is ≤

0.001 Release

+ Probability that the service provider does not free up the user’s facilities is ≤ 0.00001

Page 15: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 15

EXAMPLE ENTERPRISE DATA BACKUP

Reliability requirements could be aggregated Requirement for an entire group of users rather

than by individual user Makes it more difficult to discern individual user

performance

Page 16: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 16

EXAMPLE ENTERPRISE DATA BACKUP

RELIABILITY MODEL

Enterprise network WAN

SL1

SL2

SL3

SL4

Users

Page 17: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 17

EXAMPLE ENTERPRISE DATA BACKUP

Number and location of storage sites may change from time to time In this example, each storage site

contains all the data WAN may not be totally within the control

of the cloud storage service provider Contracted from some other network

provider

Page 18: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 18

EXAMPLE ENTERPRISE DATA BACKUP

All of this “at time t”

P{access data} = P{user facility working} × P{reach WAN thru enterprise network} × [P{reach SL1 thru WAN & SL1 works} + ⋅ ⋅ ⋅ + P{reach SL4 thru WAN & SL4 works}]

Page 19: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 19

EXAMPLE ENTERPRISE DATA BACKUP

Each term may involve additional modeling

Independence may or may not be appropriate

Service functional decomposition can be a useful tool

Page 20: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 20

EXAMPLE SERVICE FUNCTIONAL DECOMPOSITION

VoIP with SIP

OK OK

BYE BYE

Pause OK

OK

INFO INFO

Pause ACK

ACK

OK OK

Pause Ringing

Ringing

INVITE Trying

INVITE UAC UAS AS

User Agent Client Application Server User Agent Server

Page 21: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 21

EXAMPLE ENTERPRISE DATA BACKUP

Similar models can be constructed for the interruptions failures and the release failures

As with all reliability models, simplifications and assumptions are made WAN modeling involves network capacity and

traffic demand

Page 22: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 22

FULFILLING THE PROMISE OF SERVICE RELIABILITY IN EMERGING TECHNOLOGY-BASED NETWORKS

The industry needs has a set of reliability metrics and terminology that

are common across the network for which individual service availability

requirements can be specified

Page 23: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 23

FULFILLING THE PROMISE OF SERVICE RELIABILITY IN EMERGING- TECHNOLOGY-BASED NETWORKS

Service customers don’t know and don’t care what technologies are being used to deliver the service Even when technologies change (e. g., SDN) Enterprises and individuals may have different

attitudes

The service provider is responsible for ensuring that their infrastructure is capable of delivering reliable service SLA

Page 24: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 24

CLOUD SERVICES SLA RISK ANALYSIS

SLAs for enterprise services are common

SLAs for cloud services are riskier Reliability has been less well characterized

Page 25: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 25

COMMON RELIABILITY METRICS AND TERMINOLOGY

Transaction-based services

Engineering foundations exist CQR 2004-2007 Two papers in QTQM Chapter in a forthcoming book

Page 26: DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICEScqr2014.ieee-cqr.org/ETR-RT/Tortorella_IEEE-ETR-RT-2014_Cloud Ser… · DEMYSTIFYING THE RELIABILITY OF CLOUD SERVICES Michael Tortorella,

CQR 2014 May 12, 2014 CLOUD SERVICES RELIABILITY

©2014 M. Tortorella Slide 26

CONCLUSION

Conceptually straightforward Key is understanding customer reliability

requirements for the cloud service + Accessibility + Continuity/Fulfillment + Release

Execution can be challenging Infrastructure models Networks with unreliable elements