Anomaly Identification Using Reduced Metric Space in Cloud Computing Systems Qiang Guan, Ziming Zhang and Song Fu University of North Texas


Post on 23-Feb-2016



Page 1: Qiang  Guan,  Ziming  Zhang and Song Fu University of North Texas

Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Cloud Computing Systems
Qiang Guan, Ziming Zhang and Song Fu
University of North Texas


Introduction
Anomaly detection is a vital element of operations in large-scale datacenters.
◦ Detecting patterns in a given data set that do not conform to an established normal behavior.


Challenges
Continuous monitoring and large system scale lead to an overwhelming volume of data collected by health monitoring tools.
The large number of measured metrics makes the data model extremely complex.
◦ High metric dimensionality causes low detection accuracy and high computational complexity.


This paper
Presents a metric selection framework for online anomaly detection in a utility cloud.
◦ Select the most essential metrics by applying metric selection and extraction methods.
◦ Identify anomalies using an incremental clustering approach.
◦ Implement a prototype and evaluate its performance.


Dimensionality Reduction
Transforms the collected health-related performance data to a new metric space with only the most important metrics preserved.
In this paper:
◦ Metric selection using mutual information.
◦ Metric extraction by metric space combination and separation.


Metric Selection
Select the best subset of the original metric set based on mutual information.
◦ The mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables.
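
To make the definition concrete, here is a minimal sketch that estimates mutual information from paired samples of two discrete variables. The function name, the sample data, and the choice of bits (log base 2) are illustrative assumptions, not taken from the slides; continuous metrics would first need to be discretized.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # Each term is p(x, y) * log2( p(x, y) / (p(x) * p(y)) ).
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical samples: a discretized metric, an unrelated metric, and a label.
cpu_level = ["low", "low", "high", "high"]
noise = ["a", "b", "a", "b"]
label = [0, 0, 1, 1]

print(mutual_information(cpu_level, label))  # 1.0: the metric determines the label
print(mutual_information(noise, label))      # 0.0: the metric is independent of the label
```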


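Metric Selection (Cont.)
$$\max \; \mathrm{relevance}(S, c), \qquad \mathrm{relevance} = \frac{1}{|S|} \sum_{m_i \in S} I(m_i; c)$$
$$\min \; \mathrm{redundancy}(S), \qquad \mathrm{redundancy} = \frac{1}{|S|^2} \sum_{m_i, m_j \in S} I(m_i; m_j)$$
$$\max \; \mathrm{dependency}(S, c), \qquad \mathrm{dependency} = \mathrm{relevance}(S, c) - \mathrm{redundancy}(S)$$
However, finding the optimal metric subset is NP-hard.
=> Incremental search method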


Incremental Search Method
Given S_{k-1}, try to select the k-th metric that maximizes dependency() from the remaining metrics in (M - S_{k-1}).
→ S_1 ⊂ S_2 ⊂ ... ⊂ S_n
$$\max_{m_i \in M - S_{k-1}} \left[ I(m_i; c) - \frac{1}{k-1} \sum_{m_j \in S_{k-1}} I(m_i; m_j) \right]$$
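
The greedy step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: metrics are assumed to be already discretized, c is taken to be a class label per record, ties are broken alphabetically, and all names and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in Counter(zip(xs, ys)).items())

def incremental_search(metrics, labels, n_select):
    """Greedily build S_1, S_2, ...; returns metric names in selection order."""
    selected = []
    remaining = set(metrics)
    while remaining and len(selected) < n_select:
        def score(name):
            relevance = mutual_information(metrics[name], labels)
            if not selected:
                return relevance  # first pick: relevance only
            # len(selected) is k-1 when the k-th metric is being chosen.
            redundancy = sum(mutual_information(metrics[name], metrics[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: labels 0..3 stand for a normal state and three fault types.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
metrics = {
    "cpu_user":     [0, 0, 0, 0, 1, 1, 1, 1],
    "cpu_user_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate, fully redundant
    "disk_io":      [0, 0, 1, 1, 0, 0, 1, 1],
}
print(incremental_search(metrics, labels, 2))  # ['cpu_user', 'disk_io']
```

The duplicate metric is as relevant as the original but is skipped in the second step because its redundancy with the already selected metric cancels its relevance.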


Incremental Search Method (Cont.)
S_{n*}
◦ Find the range of i where the cross-validation error err_i has small mean and small variance.
◦ err* = min(err_i)
◦ n* equals the smallest i for which S_i has err*.
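
A minimal sketch of the last two bullets, assuming the cross-validation errors of the nested sets have already been computed and restricted to the candidate range. The function name and the error values are hypothetical.

```python
def choose_n_star(errors):
    """errors[i-1] is the cross-validation error of S_i; returns (n*, err*)."""
    err_star = min(errors)
    # list.index returns the first match, i.e. the smallest i that attains err*.
    n_star = errors.index(err_star) + 1
    return n_star, err_star

# Hypothetical errors for S_1 ... S_6.
errs = [0.30, 0.12, 0.08, 0.05, 0.05, 0.07]
print(choose_n_star(errs))  # (4, 0.05)
```

Picking the smallest i among ties keeps the selected metric set as small as possible for the same error.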


Metric Extraction
Creates new metrics by transformation or combination of the original metrics.
Two methods:
◦ Metric space combination
◦ Metric space separation


Metric Space Combination
Dataset D = [x_1, x_2, …, x_L]
Record x_i = [x_{1,i}, x_{2,i}, …, x_{n,i}]^T
Covariance matrix of D: V = D D^T
Calculate the eigenvalues {λ_i} of V and sort them in descending order.
Choose n' by:
$$\frac{\sum_{i=1}^{n'} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge \tau, \qquad \tau \in (0, 1)$$
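
The choice of n' can be sketched as below, assuming the eigenvalues are already available and that the rule keeps the fewest leading eigenvalues whose share of the total reaches a threshold in (0, 1). The "at least" direction of the comparison, the parameter name `threshold`, and the sample values are assumptions made for illustration.

```python
def choose_n_prime(eigenvalues, threshold):
    """Smallest n' whose leading eigenvalues cover at least `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)  # descending order, as on the slide
    total = sum(ordered)
    running = 0.0
    for count, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues; cumulative shares are 0.50, 0.80, 0.95, 1.00.
print(choose_n_prime([5.0, 3.0, 1.5, 0.5], 0.9))  # 3
```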


Metric Space Combination (Cont.)
The corresponding n' eigenvectors are the new metrics.
Apply the Gram-Schmidt orthogonalization process to compute the eigenvectors {e_j}.
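
For reference, a minimal sketch of Gram-Schmidt orthogonalization on plain Python lists. It only shows the orthonormalization step; how the slides combine it with the eigenvalue computation is not specified, and the function name, tolerance, and input vectors are illustrative assumptions.

```python
import math

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each basis vector found so far.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

# Hypothetical input: the third vector is the sum of the first two.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
print(len(basis))  # 2
```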


Metric Space Separation
Separate desired data from mixed data.
Record x = [x_1, x_2, …, x_L]^T
Component e = [e_1, e_2, …, e_{n'}]^T
x = Ae → e = Wx
Find an optimal transformation matrix W so that {e_j} are maximally independent.
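
The linear model x = Ae → e = Wx can be illustrated with a toy 2x2 case. This sketch assumes the mixing matrix A is known so that W is simply its inverse; in the actual separation problem A is unknown and W has to be estimated from the records alone. All names and numbers are hypothetical.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [1.0, 1.0]]   # hypothetical mixing matrix
e = [3.0, -1.0]                # hypothetical independent components
x = mat_vec(A, e)              # observed record: x = A e
W = inverse_2x2(A)             # unmixing matrix; here simply the inverse of A
recovered = mat_vec(W, x)      # e = W x

print(x)          # [5.0, 2.0]
print(recovered)  # [3.0, -1.0]
```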


Metric Space Separation (Cont.)
Independent component analysis (ICA)
◦ A computational method for separating a multivariate signal into additive subcomponents.
◦ A special case of blind source separation.


Incremental Clustering
Data points are considered one at a time and assigned to existing groups without affecting the existing groups significantly.
◦ “A data point goes into the nearest group if the Euclidean distance between this point and the centroid of the group is smaller than δ; else create a new group.”
◦ Update the centroid after a new point comes in.
◦ Adjust δ if cloud operators find a misclassification: a record that is normal but was assigned to an anomaly group.
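
The assignment rule can be sketched as a small class. The running-mean centroid update, the class and attribute names, and the sample points are assumptions for illustration; the slides say only that the centroid is updated, and the adjustment of δ by operators is not modeled here.

```python
import math

class IncrementalClusterer:
    def __init__(self, delta):
        self.delta = delta      # distance threshold δ
        self.centroids = []     # one centroid per group
        self.counts = []        # number of points in each group

    def add(self, point):
        """Assign `point` to the nearest group within delta, else open a new group."""
        best, best_dist = None, float("inf")
        for idx, centroid in enumerate(self.centroids):
            dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centroid)))
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist < self.delta:
            # Assumed update rule: running mean of the group's points.
            n = self.counts[best] + 1
            self.centroids[best] = [c + (p - c) / n
                                    for c, p in zip(self.centroids[best], point)]
            self.counts[best] = n
            return best
        self.centroids.append(list(point))
        self.counts.append(1)
        return len(self.centroids) - 1

clusterer = IncrementalClusterer(delta=1.0)
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (0.1, 0.3)]  # hypothetical records
assignments = [clusterer.add(p) for p in points]
print(assignments)  # [0, 0, 1, 0]
```

Each point is processed once and only one centroid changes per point, which is what keeps the method suitable for online use.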


Experiment Setting
362 servers.
Each server hosts up to ten VMs.
Benchmarks:
◦ RUBiS distributed online service benchmark
◦ MapReduce jobs
Fault injection
◦ CPU, memory, disk, and network faults.


Experiment Setting (Cont.)
Monitoring tools
◦ sysstat: runtime performance data in Dom0
◦ Modified perf: performance counters from the hypervisor.
Total 518 metrics.
◦ 182 + 336
◦ However, only 406 are non-constant.
Monitor every minute from 2011/01/20 to 2011/08/11.


Metric Selection Result
406 → 14
◦ Metric space reduced by 96.6%
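
The reduction figure follows directly from the two metric counts:

```latex
\frac{406 - 14}{406} = \frac{392}{406} \approx 0.9655 \approx 96.6\%
```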


Metric Extraction Results
Metric extraction and metric selection vs. metric extraction only.


Detection Precision


Conclusion
Anomaly detection is important.
◦ Self-managing cloud resources and enhancing system dependability.
They present a metric selection framework with metric selection and extraction mechanisms.
The selected and extracted metric set contributes to highly efficient and accurate anomaly detection.
