cloud workload analysis and simulation

Cloud Computing Project B: Cloud workload analysis and simulation

Group 3: Abinaya Shanmugaraj, Arunraja Srinivasan, Prabhakar Ganesamurthy, Priyanka Mehta

Instructor: Dr. I-Ling Yen

TA: Elham Rezvani

Overview

• Dataset preprocessing

• Dataset Analysis and Observations

• Important attributes in dataset

• Categorization of users and tasks

• Time series analysis

• Workload prediction

• Looking Ahead

Dataset pre-processing

• Inconsistent and ambiguous records were cleaned before analysis.

• The task-usage table holds many records for the same jobID-task index pair, because a task may be re-submitted or re-scheduled after a failure.

• Pre-processing was done so that each jobID-task index pair is read only once.

• Pre-processing: All records were grouped by jobID-task index, and only the last record of each repeated task was kept, stored as a single record.

• Time is in microseconds in the dataset.

• Pre-processing: Time was converted into days and hours for per-day analysis.
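The two pre-processing steps above can be sketched roughly as follows. This is only an illustration, not the code used in the project: the record layout (dicts with `job_id` and `task_index` keys), the helper names, and the choice to number days from 1 are assumptions, and the records are assumed to arrive in file order.

```python
MICROS_PER_HOUR = 3_600_000_000
MICROS_PER_DAY = 24 * MICROS_PER_HOUR


def keep_last_record(records):
    """Keep only the last record seen for each (job_id, task_index) pair.

    Assumes `records` is iterated in file order, so a later record for the
    same pair overwrites the earlier one.
    """
    latest = {}
    for rec in records:
        latest[(rec["job_id"], rec["task_index"])] = rec
    return list(latest.values())


def to_day_and_hour(timestamp_us):
    """Convert a microsecond timestamp to (day, hour); day 1 is the first day."""
    day = timestamp_us // MICROS_PER_DAY + 1
    hour = (timestamp_us % MICROS_PER_DAY) // MICROS_PER_HOUR
    return day, hour
```

Keeping the last record relies on dictionary overwrite, so no explicit grouping pass is needed for this small case.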


Dataset analysis and observation

• The data in each table was visualized.

• Attributes that were constant, or stayed within a small range of values for most records, were not considered for analysis.

• Attributes that play a major part in shaping the user profile and task profile are considered important attributes.

• The main attributes of each table were analyzed and visualized, and observations were drawn from them.

Data Analysis and Observation

Ignored attribute (example): Memory accesses per instruction (MAI)

Memory accesses per instruction vs. tasks per job ID: except for a few tasks, MAI is almost the same for all tasks.

Job events table

Attributes considered: time, job ID, event type, user.

• These attributes were extracted from the CSV files using Java code.

• To find the number of jobs submitted per day and per user, only records with event type = 0 were considered, since '0' means the job was submitted by the user.

• Time in microseconds was converted into days.

Visualizations: jobs submitted per day and per user.
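As a rough sketch of the counting described for the job events table (the project itself used Java; this Python version is only illustrative), the function below filters on event type 0 and tallies submissions per day and per user. The tuple layout `(time_us, job_id, event_type, user)` and the function name are assumptions.

```python
from collections import Counter

MICROS_PER_DAY = 86_400_000_000
SUBMIT = 0  # event type 0: job submitted by the user


def count_job_submissions(job_events):
    """Return (jobs per day, jobs per user) for SUBMIT events only.

    `job_events` is assumed to be an iterable of
    (time_us, job_id, event_type, user) tuples.
    """
    per_day = Counter()
    per_user = Counter()
    for time_us, _job_id, event_type, user in job_events:
        if event_type != SUBMIT:
            continue
        per_day[time_us // MICROS_PER_DAY + 1] += 1  # days numbered from 1
        per_user[user] += 1
    return per_day, per_user
```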

Task events table

Attributes considered: time, job ID, task index, event type, user, CPU request, memory request, disk space request.

• Using records with event type = 0, the number of tasks per day and per user was visualized.

• Using the distinct count of users, the number of users per day was visualized.

Average tasks per day = 1,607,694

Average users per day = 398

Visualizations: number of tasks per day and per user, number of users per day, user submission rate (total number of tasks submitted / 30), average memory requested per user, average CPU requested per user, average tasks per job per user.
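The distinct user count per day and the submission rate (total tasks submitted / 30) could be computed along these lines. This is a minimal sketch: the simplified tuple layout `(time_us, user, event_type)` and both function names are assumptions made for illustration.

```python
from collections import defaultdict

MICROS_PER_DAY = 86_400_000_000
SUBMIT = 0
TRACE_DAYS = 30  # the submission rate divides by the 30 days of data


def users_per_day(task_events):
    """Distinct users that submitted at least one task on each day."""
    users = defaultdict(set)
    for time_us, user, event_type in task_events:
        if event_type == SUBMIT:
            users[time_us // MICROS_PER_DAY + 1].add(user)
    return {day: len(names) for day, names in users.items()}


def submission_rates(task_events, trace_days=TRACE_DAYS):
    """Tasks submitted per day for each user, averaged over the whole trace."""
    totals = defaultdict(int)
    for _time_us, user, event_type in task_events:
        if event_type == SUBMIT:
            totals[user] += 1
    return {user: count / trace_days for user, count in totals.items()}
```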

Tasks per day vs. jobs per day

[Figure: count of task index (left axis, 0M to 5M) and distinct count of job ID (right axis, 0K to 40K) for each day, days 1 to 30.]

Observation: The visualization shows only a loose correlation between jobs/day and tasks/day. (Fewer jobs does not mean fewer tasks.)
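One way to put a number on such a correlation is a Pearson coefficient over the two daily series. The sketch below was not part of the original analysis, which was visual; the function name is hypothetical, and no result for the actual trace is implied.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to the 30 daily job counts and the 30 daily task counts, a value near 0 would indicate a loose relationship and a value near 1 a strong one.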

Tasks per day vs. users per day

[Figure: count of task index (left axis, 0M to 5M) and distinct count of user (right axis, 0 to 500) for each day, days 1 to 30.]

Observation: The visualization shows only a loose correlation between tasks/day and users/day. Users/day follows a weekly pattern: every 7th day has fewer users (possibly a weekend). The type of users matters more than the number of users/day for predicting the number of tasks/day.

User submission rate (tasks/day)

Observation: A few users have a very high submission rate.

Avg. tasks/job per user

Observation: Most jobs that a user submits are similar, as the number of tasks per job is the same.

Machine events

Attributes considered: time, machine ID, event type, CPU, memory.

• Records with event type = 0 give the machines that were added to the cluster and are available.

• Records with event type = 1 give the machines that were removed due to failure.

• Records with event type = 2 give the machines whose attributes were updated.

• This data is of less significance for our project.

Task usage

Attributes considered: start time, end time, job ID, task index, CPU rate, canonical memory usage, assigned memory usage, local disk space usage.

• Using the considered attributes, task length (running time × CPU rate) was computed. (Running time was converted from microseconds to seconds.)

• The user data from the task events table was extracted to get the average memory and CPU used per user.
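The task length computation above amounts to the following small function, shown as a sketch. The function and parameter names are assumptions; the formula (running time in seconds × CPU rate) is the one stated in the bullet.

```python
MICROS_PER_SECOND = 1_000_000


def task_length(start_us, end_us, cpu_rate):
    """Task length = running time in seconds x CPU rate."""
    running_seconds = (end_us - start_us) / MICROS_PER_SECOND
    return running_seconds * cpu_rate
```

For example, a task that runs for 300 seconds at a CPU rate of 0.5 has length 150.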

Visualization: Average CPU used per user, Average memory used per user

CPU requested per user Vs CPU used per user

Observation: Most users overestimate the resources they need and use less than 5% of the requested resources. A few users underestimate the resources and use more than three times the requested amount.

Memory requested per user Vs Memory used per user

Observation: Most users overestimated the resources they need and used less than 30% of the requested resources. Very few users underestimated the resources and used more than the requested amount, but tasks that use more memory than requested get killed.

Important Attributes

• Attributes that play an important part in identifying user and task shape.

• From the visualizations and observations made, the following are identified as important attributes:

• User: submission rate, CPU estimation ratio, memory estimation ratio

Estimation ratio = (requested resource - used resource) / requested resource

• Task: task length, CPU usage, memory usage
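The estimation ratio defined above can be written as a short function. This is a sketch; the function name and the guard against a zero request are assumptions added for illustration.

```python
def estimation_ratio(requested, used):
    """(requested - used) / requested; negative when usage exceeds the request."""
    if requested <= 0:
        raise ValueError("requested resource must be positive")
    return (requested - used) / requested
```

A user who requests 1.0 and uses 0.05 gets a ratio of 0.95 (heavy overestimation), while a user who requests 0.5 and uses 1.5 gets -2.0 (usage is three times the request).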

CPU Estimation ratio per User

Users with a negative (red) CPU estimation ratio used more resources than they requested. Users with a CPU estimation ratio between 0.9 and 1 left at least 90% of the requested resource unused.

Memory Estimation ratio per User

Users with a negative (orange) memory estimation ratio used more resources than they requested. Users with a memory estimation ratio between 0.9 and 1 left at least 90% of the requested resource unused.

Categorization of Users

Categorization of Tasks

Dimensions for categorization

User: submission rate, CPU estimation ratio, memory estimation ratio

Task: task length, CPU usage, memory usage

We use the following clustering algorithms to identify the optimal number of clusters for users and tasks:

1. K-means

2. Expectation-Maximization (EM)

3. Cascade Simple K-means

4. X-means

• We categorize the users and tasks using these clustering algorithms with the above dimensions for users and tasks.

• We compare the results and choose the best clustering for users and tasks.
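To make the first of these algorithms concrete, here is a minimal plain K-means sketch over three-dimensional points such as (submission rate, CPU estimation ratio, memory estimation ratio). It is not the tooling used in the project and does not cover EM, Cascade Simple K-means, or X-means; the function name, the random initialisation, and the fixed seed are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means on a list of equal-length tuples.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignment, centroids
```

Because the dimensions have very different scales (a submission rate can be in the thousands while the ratios are at most 1), scaling each dimension before clustering is probably worthwhile; that step is left out here.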

User Categorization

Users: K-means with 4 clusters

X: avg. memory est. ratio, Y: submission rate, Z: avg. CPU est. ratio

Tasks Categorization

Tasks, Day 13: K-means (3 clusters)

X: memory usage, Y: length, Z: CPU usage

Tasks, Day 13: X-means

X: memory usage, Y: length, Z: CPU usage

Clustering Comparison:

Our clustering (X-means)

K-means clustering as done in the IEEE paper "An Approach for Characterizing Workloads in Google Cloud to Derive Realistic Resource Utilization Models"

Selected User and Task clustering

Users: K-means with 4 clusters

X: avg. memory est. ratio, Y: submission rate, Z: avg. CPU est. ratio

Tasks: X-means with 3 clusters

X: memory usage, Y: length, Z: CPU usage

Time Series Analysis

Selecting Target Users & Tasks

From the clustering results we observed:

• 97% of the users have estimation ratios ranging from 0.7 to 1.0.

• That is, 97% of the users leave at least 70% of the resources they request unused.

• We targeted User Cluster 0 and Cluster 3 (more than 90% unused).

We targeted tasks that were long enough to perform efficient resource allocation.

• Performed clustering on the task lengths of these users to filter out short tasks.

User workload analysis – Dynamic Time Warping

To identify a user's tasks with similar workloads, we ran the DTW algorithm on each task of the Cluster 0 and Cluster 3 users:

• Computed the DTW distance between each of the user's tasks and a reference curve.

• Extracted the tasks of a user that have the same DTW value.

• These tasks were identified as having similar workload curves.
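The DTW step can be illustrated with the classic dynamic-programming formulation below. This is a sketch under assumptions: workload curves are taken to be plain lists of numeric samples, the local cost is the absolute difference, and the grouping helper (with its rounding to decide when two DTW values count as "the same") is hypothetical.

```python
from collections import defaultdict


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def group_by_dtw(task_curves, reference, decimals=2):
    """Group task ids whose DTW distance to the reference rounds to the same value."""
    groups = defaultdict(list)
    for task_id, curve in task_curves.items():
        groups[round(dtw_distance(curve, reference), decimals)].append(task_id)
    return dict(groups)
```

Two curves with the same shape but different pacing, such as [1, 2, 3] and [1, 2, 2, 3], get a distance of 0, which is what makes DTW suitable for comparing workloads that stretch or compress in time.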

Workload prediction

Resource allocation and de-allocation cannot be done dynamically because of:

• Huge overhead

• Delay in allocating resources

So resource allocation must happen once in every pre-determined interval of time.

Prediction:

• When a predictable user runs a task, its initial workload is compared with the reference curve associated with that user.

• Based on the slope of the predicted workload curve (reference curve), a step-up or step-down in resource allocation is determined, taking the delay in resource allocation into account.
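One possible reading of the slope-based decision is sketched below. Everything specific here is an assumption made for illustration: the reference curve is a list of predicted usage samples, decisions are taken every `interval` samples, the slope is measured up to the point where the next allocation would still be in effect (`delay + interval` samples ahead), and `tolerance` is a hypothetical dead band around a flat slope.

```python
def allocation_steps(reference_curve, interval, delay, tolerance=0.01):
    """Decide step-up / step-down / hold at each allocation interval.

    Returns a list of (sample_index, decision) pairs.
    """
    decisions = []
    last = len(reference_curve) - 1
    for t in range(0, last, interval):
        # Look ahead past the allocation delay to the end of this interval.
        ahead = min(t + delay + interval, last)
        slope = (reference_curve[ahead] - reference_curve[t]) / (ahead - t)
        if slope > tolerance:
            decisions.append((t, "step-up"))
        elif slope < -tolerance:
            decisions.append((t, "step-down"))
        else:
            decisions.append((t, "hold"))
    return decisions
```

Looking ahead by the delay means a step-up is requested before the predicted rise arrives, so the extra resources are in place when the workload actually climbs.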

Looking ahead…

• When the unhashed job name and user name are known, associations between a job name and its workload can be formed and used for better prediction.

• As observed in the user clustering, most users have poor estimation ratios. Better resource estimation processes could help users achieve better estimation ratios.

• More techniques, such as regression analysis and curve-fitting algorithms, can be used to get a better representative curve for a predictable user.

Thank you
