
Experimental Methods on Performance in Clouds, …

Calton Pu

CERCS and School of Computer Science

Georgia Institute of Technology

1

2

Ancestors of Clouds (Hardware)

Data processing centers (~1960s)

Supercomputers, Grids (~1970s)

P2P, SETI@Home (~1999), botnets

Utility computing and data centers (~2000s)

Modern Clouds

Amazon data centers in the early 2000s ran at only about 10% capacity – introduction of Amazon Web Services (AWS) in 2006

2007 – Google & IBM join cloud computing research (NSF); Microsoft joins in 2010; NSFCloud in 2014

3

Cloud & Big Data (company)

4

Google Inc

Third-largest market cap in the world ($390B in 2014)

Probably more data than anyone else

13 declared data centers around the world, drawing 260 MW in 2011 (2,259,998 MWh total)

Cloud & Big Data (government)

5

NSA (maybe more than Google)

Utah Data Center, drawing 65 MW (about half of Salt Lake City)

Different Cloud Services

6

Some Concrete Offerings

7

SaaS

PaaS

IaaS

Hardware

(higher-level services toward the top of the stack)

Cloud Service Models

Software as a Service (SaaS) [not covered]: use the provider’s applications over a network

Example: Salesforce.com

Platform as a Service (PaaS) [not covered]: use system-level services (e.g., database) to develop and deploy customer applications

Examples: Google App Engine, MS Azure

Infrastructure as a Service (IaaS): rent processing, storage, and network

Examples: Amazon EC2, Emulab
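As a concrete illustration of the IaaS model, here is a minimal sketch of renting a single EC2 instance programmatically with the boto3 library (which post-dates the circa-2010 material below); the AMI ID and key-pair name are placeholders, not values from these slides:

```python
# Minimal IaaS illustration: rent one compute instance from Amazon EC2.
# Requires locally configured AWS credentials; AMI ID and key name are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",      # placeholder AMI (e.g., an Amazon Linux image)
    InstanceType="t2.micro",     # free-tier eligible instance type
    KeyName="my-keypair",        # placeholder SSH key pair name
    MinCount=1,
    MaxCount=1,
)
print("Launched:", instances[0].id)
```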

8

Amazon EC2 (circa 2010)

Elastic Block Store, CloudWatch, Automated Scaling

9

Instance Type       | Memory  | Compute (1 GHz virt.) | Local Storage | Platform | Price (hour)
Std Small           | 1.7 GB  | 1 x 1                 | 160 GB        | 32-bit   | $0.085
Std Large           | 7.5 GB  | 2 x 2                 | 850 GB        | 64-bit   | $0.34
Std X-Large         | 15.0 GB | 4 x 2                 | 1.7 TB        | 64-bit   | $0.68
High Memory X-L     | 17.1 GB | 2 x 3.25              | 420 GB        | 64-bit   | $0.50
High Memory DB X-L  | 34.2 GB | 4 x 3.25              | 850 GB        | 64-bit   | $1.00
High Memory QD X-L  | 68.4 GB | 8 x 3.25              | 1.7 TB        | 64-bit   | $2.00
High CPU Medium     | 1.7 GB  | 2 x 2.5               | 350 GB        | 32-bit   | $0.17
High CPU X-L        | 7.0 GB  | 8 x 2.5               | 1.7 TB        | 64-bit   | $0.68
Cluster Compute X-L | 23 GB   | 33.5                  | 1.7 TB        | 64-bit   | $1.60
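For a sense of scale, a quick sketch of how the hourly prices above translate into the cost of a multi-node experiment; the configuration and run length below are illustrative assumptions, not numbers from the slides:

```python
# Rough cost estimate for a hypothetical n-tier experiment on EC2 (circa-2010 prices above).
PRICES = {"std_small": 0.085, "std_large": 0.34, "high_cpu_medium": 0.17}

# Illustrative 8-node setup: 1 web + 1 load balancer (small), 2 app (high-CPU), 4 DB (large).
config = {"std_small": 2, "high_cpu_medium": 2, "std_large": 4}

hours = 3  # e.g., a three-hour benchmark run
cost = sum(PRICES[t] * n for t, n in config.items()) * hours
print(f"Estimated cost: ${cost:.2f}")   # (2*0.085 + 2*0.17 + 4*0.34) * 3 = $5.61
```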

AWS Free Tier

New AWS customers can get started with Amazon EC2 for free. Each month for 1 year:

750 hours of EC2 running Linux, RHEL, or SLES (t2.micro)

750 hours of EC2 running MS Windows Server (t2.micro)

750 hours of Elastic Load Balancing plus 15 GB of data processing

30 GB of Amazon Elastic Block Store in any combination of General Purpose (SSD) or Magnetic, plus 2 million I/Os and 1 GB of snapshot storage

15 GB of bandwidth out

1 GB of regional data transfer

10

Resources Available for Experiments

Our own cluster (about 50 nodes)

GT/CERCS cluster (about 800 nodes)

Emulab (Utah), PROBE (CMU)

A few hundred nodes, a few dozen available

CloudLab replaces Emulab (May 2015)

Other partner clusters in companies and universities

11

Challenges in Cloud Adoption

From user’s point of view

Data security/privacy (in a public cloud)

Performance concerns

From provider’s point of view

Up-front hardware costs high, rapid aging

Hardware capacity generally under-utilized

Low scalability of most enterprise applications

Negotiating SLA contracts and price structures

12

Cloud Management Challenge

High utilization brings higher ROI

Achievable by predictable/stationary workloads

Mission-critical applications need SLAs

Resource Utilization Paradox

Good ROI requires high utilization (many papers on consolidation claim >90% utilization)

Consistent reports of 18% average utilization

Cloud management is more challenging than we initially hoped

13

Representative Cloud Workloads

Cloud workload – amount of processing that a cloud has to do at a given time

Use workloads to test a particular type of application

Types of workloads:

E-commerce

OLTP

Forum/Message board

Web 2.0 application

MapReduce

14

15

Example 1: RUBiS Benchmark

E-commerce applications (eBay auctions)

N-tier (3 or more tiers): web servers, application servers, database servers

26 web interactions, requiring sophisticated models, e.g., Layered Queuing Network Models

16

Typical Execution Environment

Figure: client browsers connect via HTTP to Apache web servers, via AJP13 to Tomcat servlet engines, and via JDBC to MySQL DB servers; each tier runs in VMs (D0, VM1–VM3) on a Xen hypervisor over the hardware resources, managed through a virtual management interface.

17

Meta-Model of RUBiS

Layered Queuing Network Model of RUBiS (3-tier): one sub-model per tier for each of the 26 interactions, for a total of 78 sub-models

18

Web Server Sub-Model

3-tier: simplest implementation of RUBiS

AboutMe (1 of 26), customized for 3-tier
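The actual LQN sub-models are shown only as figures in the slides. As a much simpler stand-in, the sketch below uses an open M/M/1 approximation per tier to show how per-tier service demands combine into an end-to-end response time; the service demands are made-up numbers, not measured RUBiS values:

```python
# Toy open-queueing approximation of one 3-tier interaction (NOT the LQN from the slides).
# Each tier is modeled as an M/M/1 queue: R = D / (1 - U), with U = arrival_rate * D.
def response_time(arrival_rate, service_demands):
    """End-to-end response time for one interaction, summing per-tier M/M/1 delays."""
    total = 0.0
    for demand in service_demands:          # seconds of service per request, per tier
        utilization = arrival_rate * demand
        if utilization >= 1.0:
            return float("inf")             # tier saturated
        total += demand / (1.0 - utilization)
    return total

# Made-up demands for the web, app, and DB tiers (seconds/request).
demands = [0.002, 0.006, 0.010]
for rate in (20, 50, 80):                   # requests/second
    print(rate, round(response_time(rate, demands) * 1000, 2), "ms")
```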

Challenges in Modeling

Layered Queuing Network Models become very complex even for “simple” n-tier applications

Experiments are needed anyway

Setting the values for various sub-models

Need detailed experiments for a variety of configurations

Let’s try “pure” experiments

19

20

Example 2: RUBBoS Benchmark

Another e-commerce workload

Bulletin Board (Slashdot)

DB server bottleneck, C-JDBC as load balancer

24 web interactions

Configuration notation: 1-2-1-9

1 web server, 2 app servers, 1 C-JDBC server, 9 DB servers

Emulab (a relatively modest testbed)
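A small helper showing how the 1-2-1-9 notation maps onto tier counts; the tier names are just for illustration:

```python
# Interpret the slides' configuration notation, e.g. "1-2-1-9" ->
# 1 web server, 2 app servers, 1 C-JDBC server, 9 DB servers.
TIERS = ("web", "app", "cjdbc", "db")

def parse_config(notation):
    counts = [int(x) for x in notation.split("-")]
    return dict(zip(TIERS, counts))

print(parse_config("1-2-1-9"))   # {'web': 1, 'app': 2, 'cjdbc': 1, 'db': 9}
print(sum(parse_config("1-2-1-9").values()), "nodes total")  # 13 nodes total
```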

21

Example Hardware (Emulab)

22

Example Software Configuration

23

Sample Configuration (1-2-1-3)

Figure: Apache → Tomcat → C-JDBC → MySQL/PostgreSQL, with low-end DB servers

24

Experiment Design

Servers: 1 web server, app servers 1–3, 1 C-JDBC server, DB servers 1–9

DB engines: MySQL and PostgreSQL (normal and low-end DB server hardware)

Workloads: Browse-only, Read-Write, Wait-1st, Wait-All

25

MySQL Throughput (Low-Cost)

Better scalability (different query processing strategies)


What’s Different about Clouds (1)

Traditional benchmarks

Static configuration: HW, SW, workload range

Find the “best tuning” to achieve the highest throughput

Cloud benchmarks

Dynamic and many configurations

Find representative throughput and response time for each configuration (reproducible results by other users)
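A sketch of what “representative throughput for each configuration” could mean in practice: repeat each run and report the mean with a simple spread measure so other users can reproduce and compare. The throughput samples below are invented for illustration:

```python
# Summarize repeated benchmark runs per configuration so results are comparable
# across users. The throughput samples are invented.
from statistics import mean, stdev

runs = {
    "1-2-1-4ML": [412.0, 405.3, 418.9],   # ops/s over three repetitions
    "1-2-1-9ML": [503.1, 489.7, 497.5],
}

for config, samples in runs.items():
    print(f"{config}: {mean(samples):.1f} ± {stdev(samples):.1f} ops/s "
          f"(n={len(samples)})")
```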

26

27

MySQL Throughput for R/W Mix (read one, write all)

Figure: throughput (ops/s, 0–600) vs. workload for configurations 1-1-1-8ML, 1-2-1-4ML, 1-2-1-5ML, 1-2-1-6ML, 1-2-1-7ML, 1-2-1-9ML, and 1-3-1-9ML

28

1-2-1-9ML Configuration Data

Clear bottleneck indicated by leveled performance (previous slide)

All high workloads (more than 4,000)

Same leveling for other configurations

Average resource consumption on DB servers quite low (CPU and disk I/O)

29

Web Server CPU Utilization (1-2-1-9ML)

30

Application Server CPU Utilization (one of 1-2-1-9ML)

31

C-JDBC Server CPU Utilization (1-2-1-9ML)

32

DB Server CPU Utilization (one of 1-2-1-9ML)

33

DB Server Disk I/O Bandwidth Utilization (one of 1-2-1-9ML)

34

Observations on 1-2-1-9ML

No CPU bottlenecks anywhere

Disk I/O bandwidth on the DB servers has a slight peak at the high end of the spectrum

An infrequent disk I/O bottleneck, which cannot explain the observed lack of overall system performance

36

Maximum of Disk I/O Bandwidth Utilization (all of 1-2-1-9ML)

What’s Different about Clouds (2)

Traditional benchmarks

Balanced configuration: near-full utilization of all resources for a stable workload

Single bottleneck, high average utilization

Cloud benchmarks

Almost always start with all resources at low utilization

No stable bottlenecks

Often stays with all average utilizations low, yet performance remains low

Found new phenomenon: multi-bottlenecks
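One way to see why low average utilization can coexist with poor performance is to look at short windows instead of long-run averages. The sketch below, using synthetic utilization samples, flags a resource that saturates briefly even though its average stays low:

```python
# Detect transient saturation that a long-run average would hide.
# `samples`: utilization of one resource sampled at fine granularity (synthetic data).
def transient_saturation(samples, window=5, threshold=0.9):
    """Fraction of short windows in which mean utilization exceeds the threshold."""
    hits = 0
    windows = 0
    for i in range(0, len(samples) - window + 1, window):
        windows += 1
        if sum(samples[i:i + window]) / window >= threshold:
            hits += 1
    return hits / windows if windows else 0.0

# Mostly idle with occasional bursts: the average is low, but the bursts saturate the resource.
samples = ([0.1] * 45 + [1.0] * 5) * 4
print("average utilization:", sum(samples) / len(samples))              # 0.19
print("fraction of saturated windows:", transient_saturation(samples))  # 0.1
```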

37

Why Automation?

Traditional benchmarks such as TPC and SPEC answer the question: for a given hardware/software configuration and workload, what is the highest achievable throughput?

In the cloud this becomes very difficult due to several dimensions:

Horizontal scalability

Vertical scalability

Variety of software components
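A back-of-the-envelope count of why these dimensions make exhaustive manual benchmarking impractical; the dimension sizes below are arbitrary examples, not counts from the slides:

```python
# The configuration space is a cross product of many dimensions (sizes are examples).
from math import prod

dimensions = {
    "horizontal scaling (nodes per tier)": 9,   # e.g., 1..9 DB servers
    "vertical scaling (instance types)": 5,     # e.g., small .. cluster-compute
    "software components (DBMS choices)": 3,    # e.g., MySQL, PostgreSQL, MySQL Cluster
    "workload mixes": 4,                        # e.g., browse-only, R/W, Wait-1st, Wait-All
    "workload levels": 10,                      # e.g., a range of concurrent users
}

print(prod(dimensions.values()), "distinct experiments")  # 5400
```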

38

Solution: Expertus

A framework for large-scale benchmark measurements through flexible automation of experiments

It creates the scripts through a multi-stage code generation process

Easy to plug in new benchmarks and clouds

Enables cloud measurements at a scale that is beyond manual management of benchmarks

39

Experiment Summary

Over 500 different hardware configurations (i.e., varying node number and type)

Over 10,000 different software configurations (i.e., varying software and software settings)

Over 100,000 computing nodes in various cloud environments:

Emulab

Amazon EC2

Open Cirrus

40

41

Experimental Challenges

E-commerce example:

Many configuration variables

Many applications, clear differences among them

Many cloud offerings, non-obvious differences

Different software/hardware configurations may produce the same or different results

Experimental setup challenges:

Dependencies among components

Systematic search through the potentially large configuration space

42

43

Elba: Automating Measurements

Figure/Table: analyzed results — CPU and memory utilization (%) of the application server and DB server for the L/L, H/L, H/H, 2H/L, and 2H/H configurations

Automated, Evolutionary Staging Cycle

Figure: benchmark specs and an experiment specification language drive an automated staging cycle — (0) configuration design, (1) code generation and deployment (Mulini/TBL, deployment scripts), (2) execution of workload drivers against the system under test with monitors, (3) analysis of the monitoring data, and (4) reconfiguration/automated adaptation, which closes the loop.

Automated experiment management through extensible, flexible, and modular code generation

Extensibility: extending the framework to support specification changes, new benchmarks, computing clouds, and software packages

Flexibility: modifying the input or output configuration without changing the source code of the framework

Modularity: consists of a number of components that may be mixed and matched in a variety of configurations

44

Benefits of Automation

Abstraction mapping. External forces often drive changes:

Standards formulation/adoption

Industry evolution

Internal forces drive changes: goals, functionality refinement

Interoperable heterogeneity. Heterogeneous clouds and applications

Flexible customization. Experiment goals, API changes

45

Code Generation – Key Challenges

Expertus Approach

The code generator adopts a compiler approach with multiple serial transformation stages.

One type of transformation at any given stage (e.g., cloud, operating system, application, etc.)

The number of stages is determined by the experiment, application, software stack, operating system, and cloud.

At each stage, Expertus uses the intermediate XML document created by the previous stage as the input to the current stage.

46
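The real generator applies XSLT templates to XML documents, as shown in the following slides. As a rough sketch of the same multi-stage idea only, each stage below rewrites the intermediate document produced by the previous stage; all element and attribute names are invented:

```python
# Sketch of a multi-stage, compiler-style generation pipeline over an XML spec.
# Element/attribute names are invented; Expertus itself uses XSLT templates.
import xml.etree.ElementTree as ET

def cloud_stage(doc):
    """Resolve cloud-specific details (e.g., node allocation commands)."""
    for node in doc.iter("node"):
        node.set("allocate", f"ec2 run-instances --type {node.get('type', 't2.micro')}")
    return doc

def app_stage(doc):
    """Resolve application-specific details (e.g., which package to deploy per tier)."""
    packages = {"web": "apache", "app": "tomcat", "db": "mysql"}
    for node in doc.iter("node"):
        node.set("package", packages.get(node.get("tier"), "unknown"))
    return doc

spec = ET.fromstring('<experiment><node tier="web" type="std-small"/>'
                     '<node tier="db" type="std-large"/></experiment>')

doc = spec
for stage in (cloud_stage, app_stage):   # output of each stage feeds the next
    doc = stage(doc)

print(ET.tostring(doc, encoding="unicode"))
```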

Multi-Stage Code Generation

47

Steps Inside a Stage

48

An Example XSLT Template

49

An Intermediate XML

50

Generated Code

51

Experiment Automation Process

Create an experiment specification with the application, software packages, cloud, and experiments.

Use Expertus to generate the scripts.

Platform configuration sets up the target cloud.

Application deployment deploys the target application on the configured cloud.

Configure the application correctly.

The main script runs the test plan, which consists of multiple iterations.

Upload the resource monitoring and performance data to the data warehouse.
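A minimal driver sketch of this flow; the script names, paths, and iteration count are illustrative placeholders, not the actual Expertus-generated scripts:

```python
# Illustrative outline of the automation flow described above; all commands are placeholders.
import subprocess

def run(script, *args):
    subprocess.run(["bash", script, *args], check=True)

def automate_experiment(spec_file):
    run("generate_scripts.sh", spec_file)      # Expertus: specification -> scripts
    run("configure_platform.sh")               # set up the target cloud
    run("deploy_application.sh")               # deploy the n-tier application
    run("configure_application.sh")            # apply application settings
    for iteration in range(3):                 # test plan: multiple iterations
        run("run_test_plan.sh", str(iteration))
    run("upload_results.sh")                   # monitoring + performance data -> warehouse

if __name__ == "__main__":
    automate_experiment("rubbos_1-2-1-9.xml")
```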

52

Automation Process

53

54

Configuration levels

Evaluation Metrics

Usability of the tool: how quickly a user can change an existing specification to run the same experiment with different settings

Generated script types and magnitude: depends on the application, software packages, deployment platform, and number of experiments

Richness of the tool: magnitude of completed experiments; number of different software packages, clouds, and applications it supports

Extensibility and flexibility: supporting new clouds and new applications

55

Usability

Specification changes vs. code changes

Number of nodes vs. generated code

56


Current Status

58

59

Complexity of the tool

Table 1: Number of experiments

Table 2: Experiment size vs. NLOC


Adding a new Cloud

Template changes; changes in generated code

61

Adding a new Application

Template changes; changes in generated code

62

Adding a new DBMS

Template changes; changes in generated code

63

Magnitude of Code per Node

64

Significant strides toward realizing flexible and scalable application testing for today’s complex cloud environments.

Over 500 different hardware configurations.

Over 10,000 software configurations.

Five clouds (Emulab, EC2, Open Cirrus, Georgia Tech cluster, and Wipro).

Three representative applications (RUBBoS, RUBiS, and CloudStone).

65

Usability

Supports new clouds, applications, and software packages with only a few template line changes.

8.21% of template lines changed to support Amazon EC2 once we had support for the Emulab cloud; this caused a 25.35% change in the generated code for an application scenario with 18 nodes.

Switching from RUBBoS to RUBiS required only a 5.66% template change.

66

Flexibility and Extensibility

Configuration Planning

Provider profit model (simplified)

67

Provider Revenue Model

68
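The revenue model itself is given only as a figure in the slides. As a loose sketch of what such a simplified provider profit calculation could look like, here is one possible formulation; all prices, the SLA threshold, and the penalty are invented assumptions:

```python
# Loosely sketched provider profit for one configuration; all numbers are invented.
def profit(throughput_ops_s, rt_p95_ms, node_cost_per_hour, hours,
           revenue_per_1k_ops=0.02, sla_ms=500, sla_penalty=0.5):
    """Revenue from served requests minus infrastructure cost, discounted if the SLA is missed."""
    ops = throughput_ops_s * 3600 * hours
    revenue = ops / 1000 * revenue_per_1k_ops
    if rt_p95_ms > sla_ms:                 # crude SLA penalty on missed response-time target
        revenue *= (1 - sla_penalty)
    cost = node_cost_per_hour * hours
    return revenue - cost

# e.g., a configuration billed at $1.02/hour for a 3-hour run
print(round(profit(450, 380, node_cost_per_hour=1.02, hours=3), 2))
```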

Design of CloudXplor Tool

69

Data Refinement in CloudXplor

70

Cloud Evaluation - CloudXplor

71

Maximal Throughput of RUBBoS (1-2-4 R/W)

72

Throughput of RUBBoS (1-2-4 and R/W)

73

Response Time Distribution (1-1-2 MySQL Cluster)

74

Revenue and Cost Analysis (R/W)

75

Profit and Cost Analysis, RUBBoS (R/W)

76

Optimal Profit Configurations

77

Example: RUBBoS Benchmark

E-commerce applications (Slashdot bulletin board)

N-tier (3 or more tiers): web servers, application servers, database servers

26 web interactions, requiring sophisticated models, e.g., Layered Queuing Network Models

78

RUBBoS 4 Tier Deployment

79

Example: RUBBoS Benchmark

80

RUBBoS Software Components

81

RUBBoS Software Settings

82

Cloud Evaluation - Overview

Main idea: how and where to deploy your enterprise system, and in what scenario?

Automated empirical measurement and evaluation of alternative platforms, configurations, and architectures for n-tier apps in the cloud

Hardware platforms (IaaS): Amazon EC2, Open Cirrus (HP), and Emulab

System software configurations: LAMP, MySQL Cluster (off-the-shelf RDBMS)

Application software: e-commerce application benchmarks (RUBBoS)

83

Cloud Evaluation - Mapping

84

Cloud Evaluation - Mapping

85

Cloud Evaluation - Mapping

86

Cloud Evaluation - Deployment

87

Cloud Evaluation - Deployment

88

Cloud Evaluation – Infrastructure

89

Fast-Forward A Few Years

Using the automated experiment generation infrastructure, we ran many thousands of experiments

We found several interesting phenomena

The best: Very Short Bottlenecks that cause Very Long Response-Time requests

90

Latency Long Tail Problem

At moderate CPU utilization levels (about 60% at 9,000 users), 4% of requests take several seconds instead of milliseconds

91
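A small sketch of how such a tail shows up in measured response times; the sample data is synthetic, shaped only to mimic the roughly 4% figure above:

```python
# Quantify the long tail: fraction of very slow requests and high percentiles.
# Synthetic response times (seconds), shaped so ~4% of requests are multi-second.
import random

random.seed(0)
latencies = [random.uniform(0.002, 0.050) for _ in range(9600)] + \
            [random.uniform(2.0, 6.0) for _ in range(400)]

latencies.sort()
n = len(latencies)
p50, p99 = latencies[n // 2], latencies[int(n * 0.99)]
tail = sum(1 for x in latencies if x >= 1.0) / n

print(f"median = {p50*1000:.1f} ms, 99th percentile = {p99:.2f} s, "
      f"{tail:.1%} of requests take over 1 s")
```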

Latency Long Tail: A Serious Research Challenge

No system resource is near saturation

Very Long Response Time (VLRT) requests start to appear at moderate utilization levels (often at 50% or lower)

VLRT requests themselves are not bugs: they take only milliseconds when run by themselves

Each run presents different VLRT requests

VLRT requests appear and disappear too quickly for most monitoring tools

92

Big Data & Clouds Need Automation

Experimental approaches

Often the only choice (modeling is too complex)

Abundant resource availability

Many configurations mean many experiments and measurements

Automated experiment generation, execution, monitoring, and analysis

Very interesting phenomena found (VSB: very short bottlenecks)

93

End of Session

Any Questions?

Calton Pu ([email protected])

94