Load Shedding in Network Monitoring Applications


Page 1: Load Shedding in Network Monitoring Applications


Load Shedding in Network Monitoring Applications

Pere Barlet-Ros¹, Gianluca Iannaccone², Josep Sanjuàs¹, Diego Amores¹, Josep Solé-Pareta¹

¹ Technical University of Catalonia (UPC), Barcelona, Spain
² Intel Research, Berkeley, CA

Intel Research Berkeley, July 26, 2007

Page 2: Load Shedding in Network Monitoring Applications

Outline

1. Introduction
   - Motivation
   - Case Study: Intel CoMo

2. Load Shedding
   - Prediction Method
   - System Overview

3. Evaluation and Operational Results
   - Performance Results
   - Accuracy Results


Page 4: Load Shedding in Network Monitoring Applications

Motivation

Building robust network monitoring applications is hard:
- Network traffic is unpredictable by nature
- Anomalous traffic, extreme data mixes, highly variable data rates
- Processing requirements have greatly increased in recent years (e.g., intrusion and anomaly detection)

The problem:
- Efficiently handling extreme overload situations
- Over-provisioning is not possible


Page 6: Load Shedding in Network Monitoring Applications

Case Study: Intel CoMo

CoMo (Continuous Monitoring)¹:
- Open-source passive monitoring system
- Fast implementation and deployment of monitoring applications
- Traffic queries are defined as plug-in modules written in C and contain complex stateful computations
- Traffic queries are black boxes: arbitrary computations and data structures
- Load shedding cannot use knowledge about the queries

¹ http://como.sourceforge.net


Page 8: Load Shedding in Network Monitoring Applications

Load Shedding Approach

Main idea:
1. Find the correlation between traffic features and CPU usage; the features are query-agnostic and have deterministic worst-case cost
2. Leverage that correlation to predict CPU load
3. Use the prediction to guide the load shedding procedure

Novelty: no a priori knowledge of the queries is needed
- Preserves a high degree of flexibility
- Widens the range of possible applications and network scenarios

Page 9: Load Shedding in Network Monitoring Applications

Traffic Features vs CPU Usage

[Figure: CPU usage compared to the number of packets, bytes, and 5-tuple flows, shown as four time series over a 100 s interval]

Page 10: Load Shedding in Network Monitoring Applications

System Overview

[Figure: Prediction and Load Shedding Subsystem]

Page 11: Load Shedding in Network Monitoring Applications

Load Shedding Performance

[Figure: Stacked CPU usage from 9 am to 5 pm under predictive load shedding; series: CoMo cycles, load shedding cycles, query cycles, predicted cycles, with the CPU frequency as the upper bound]

Page 12: Load Shedding in Network Monitoring Applications

Load Shedding Performance

[Figure: CDF of the CPU usage per batch (cycles/batch) for the predictive, original, and reactive systems]

Page 13: Load Shedding in Network Monitoring Applications

Accuracy Results

Queries estimate their unsampled output by multiplying their results by the inverse of the sampling rate.
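As a minimal illustration of this inverse-rate scaling (the packet count and sampling rate below are invented, not CoMo output):

```python
def estimate_unsampled(sampled_value, srate):
    """Scale a result computed on sampled traffic back to an
    estimate of its value over the full, unsampled traffic."""
    return sampled_value / srate

# A counter query that saw 2500 packets under 25% packet sampling
# reports an estimate of 10000 packets.
print(estimate_unsampled(2500, 0.25))  # -> 10000.0
```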

Errors in the query results (mean ± stdev):

    Query                 original           reactive          predictive
    application (pkts)    55.38% ± 11.80     10.61% ± 7.78     1.03% ± 0.65
    application (bytes)   55.39% ± 11.80     11.90% ± 8.22     1.17% ± 0.76
    flows                 38.48% ± 902.13    12.46% ± 7.28     2.88% ± 3.34
    high-watermark         8.68% ± 8.13       8.94% ± 9.46     2.19% ± 2.30
    link-count (pkts)     55.03% ± 11.45      9.71% ± 8.41     0.54% ± 0.50
    link-count (bytes)    55.06% ± 11.45     10.24% ± 8.39     0.66% ± 0.60
    top destinations      21.63 ± 31.94      41.86 ± 44.64     1.41 ± 3.32

Page 14: Load Shedding in Network Monitoring Applications

Ongoing and Future Work

Ongoing work:
- Query utility functions
- Custom load shedding
- Fairness of service with non-cooperative users
- Scheduling CPU access vs. the packet stream

Future work:
- Distributed load shedding
- Other system resources (memory, disk bandwidth, storage space)

Page 15: Load Shedding in Network Monitoring Applications

Availability

The source code of our load shedding prototype is publicly available at http://loadshedding.ccaba.upc.edu

The CoMo monitoring system is available at http://como.sourceforge.net

Acknowledgments

This work was funded by a University Research Grant awarded by the Intel Research Council and by the Spanish Ministry of Education under contract TEC2005-08051-C03-01.

The authors would also like to thank the Supercomputing Center of Catalonia (CESCA) for giving them access to the Catalan RREN.


Page 17: Load Shedding in Network Monitoring Applications

Appendix Backup Slides

Work Hypothesis

Our thesis: the cost of maintaining the data structures needed to execute a query can be modeled by looking at a set of traffic features.

Empirical observations:
- Basic operations on the query state (e.g., creating or updating entries, looking for a valid match) have different overheads while processing incoming traffic
- The cost of a query is mostly dominated by the overhead of some of these operations

Our method models a query's cost by considering the right set of traffic features.

Page 18: Load Shedding in Network Monitoring Applications

Traffic Features vs CPU Usage

[Figure: CPU usage versus the number of packets per batch, with points grouped by the number of new 5-tuple flows per batch (< 500, 500-700, 700-1000, ≥ 1000)]

Page 19: Load Shedding in Network Monitoring Applications

Multiple Linear Regression (MLR)

Linear regression model:

    Yi = β0 + β1 X1i + β2 X2i + · · · + βp Xpi + εi,   i = 1, 2, . . . , n

- Yi: the n observations of the response variable (measured CPU cycles)
- Xji: the n observations of each of the p predictors (traffic features)
- βj: the p regression coefficients (unknown parameters to estimate)
- εi: the n residuals (OLS minimizes the sum of squared errors, SSE)

Feature selection:
- A variant of the Fast Correlation-Based Filter² (FCBF)
- Removes irrelevant and redundant predictors
- Significantly reduces the cost of the MLR

² L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proc. of ICML, 2003.
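Such a model can be fitted by ordinary least squares via the normal equations. A self-contained sketch in pure Python follows; the feature values and coefficients are invented for illustration, and this is not CoMo's implementation:

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def mlr_fit(rows, y):
    """OLS estimate of beta in Y = X beta + eps (X gains an intercept column)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    return solve(XtX, Xty)

def mlr_predict(beta, features):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))

# Toy history: cycles = 1000 + 2*packets + 50*new_flows (invented numbers)
feats = [(100, 10), (200, 15), (300, 30), (400, 20), (500, 40)]
cycles = [1000 + 2 * p + 50 * f for p, f in feats]
beta = mlr_fit(feats, cycles)
print(mlr_predict(beta, (250, 25)))   # ~2750 cycles for the next batch
```

Because the toy data are exactly linear, OLS recovers the coefficients; on real traffic the residuals εi are nonzero and the fit is only approximate.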


Page 20: Load Shedding in Network Monitoring Applications

System Overview

Prediction and load shedding subsystem:
1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (using multi-resolution bitmaps)
3. The most relevant features are selected (using FCBF) for use by the MLR
4. The MLR predicts the CPU cycles required to run the query
5. Load shedding is performed to discard a portion of the batch
6. The actual CPU usage is measured (using the TSC) and fed back to the prediction system
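The six steps above can be sketched as a single per-batch control loop. Everything below is a toy stand-in: the feature set, the cost model, and the sampling are deliberately simplified, and the real system's multi-resolution bitmaps, FCBF, MLR, and TSC measurement are not modeled:

```python
import random

def extract_features(batch):
    # Query-agnostic traffic features (step 2)
    return {"pkts": len(batch),
            "bytes": sum(p["len"] for p in batch),
            "flows": len({p["flow"] for p in batch})}

def predict_cycles(feats, history):
    # Stand-in for FCBF + MLR (steps 3-4): mean observed cost per packet
    if not history:
        return feats["pkts"] * 1000.0
    per_pkt = sum(c / f["pkts"] for f, c in history) / len(history)
    return feats["pkts"] * per_pkt

def run_query(batch):
    # Stand-in for a plug-in query; pretend it costs 1000 cycles/packet
    return len(batch) * 1000.0

def process_batch(batch, avail_cycles, history):
    feats = extract_features(batch)
    srate = 1.0
    pred = predict_cycles(feats, history)
    if pred > avail_cycles:                      # step 5: shed load
        srate = avail_cycles / pred
        batch = [p for p in batch if random.random() < srate]
        feats = extract_features(batch)
    used = run_query(batch)                      # step 6: measure, feed back
    history.append((feats, used))
    return srate

batch = [{"len": 64, "flow": i % 10} for i in range(100)]
print(process_batch(batch, 50_000.0, []))        # -> 0.5 (sheds half the batch)
```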


Page 21: Load Shedding in Network Monitoring Applications

Load Shedding

When to shed load:
- When the prediction exceeds the available cycles:

      avail_cycles = (0.1 × CPU_frequency) − overhead

- The budget is corrected according to the prediction error and the available buffer space
- Overhead is measured using the time-stamp counter (TSC)

How and where to shed load:
- Packet and flow sampling (hash-based)
- The same sampling rate is applied to all queries

How much load to shed:
- Use the maximum sampling rate that keeps CPU usage below avail_cycles:

      srate = avail_cycles / pred_cycles
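A minimal sketch of these three decisions (the CPU numbers are illustrative, CRC32 merely stands in for whatever hash the real system uses, and the error/buffer correction and TSC measurement are not modeled):

```python
import zlib

def avail_cycles(cpu_hz, overhead):
    # Cycles available to the queries within each 100 ms batch
    return 0.1 * cpu_hz - overhead

def sampling_rate(avail, pred_cycles):
    # Largest sampling rate that keeps predicted usage within the budget
    return min(1.0, avail / pred_cycles)

def keep_flow(five_tuple, srate):
    # Hash-based flow sampling: the keep/drop decision depends only on
    # the flow's 5-tuple, so every query sees the same subset of flows
    h = zlib.crc32(repr(five_tuple).encode()) & 0xFFFFFFFF
    return h / 2**32 < srate

avail = avail_cycles(3e9, 20e6)      # 3 GHz CPU, 20M cycles of overhead
print(sampling_rate(avail, 4e8))     # -> 0.7 for a 400M-cycle prediction
```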



Page 24: Load Shedding in Network Monitoring Applications

Load Shedding Algorithm

Load shedding algorithm (simplified version):

    pred_cycles = 0;
    foreach qi in Q do
        fi = feature_extraction(bi);
        si = feature_selection(fi, hi);
        pred_cycles += mlr(fi, si, hi);

    if avail_cycles < pred_cycles × (1 + error) then
        foreach qi in Q do
            bi = sampling(bi, qi, srate);
            fi = feature_extraction(bi);

    foreach qi in Q do
        query_cyclesi = run_query(bi, qi, srate);
        hi = update_mlr_history(hi, fi, query_cyclesi);

Page 25: Load Shedding in Network Monitoring Applications

Testbed Scenario

Equipment and network scenario:
- 2 × Intel Pentium 4 running at 3 GHz
- 2 × Endace DAG 4.3GE cards
- 1 × Gbps link connecting the Catalan RREN to the Spanish NREN

Executions:

    Execution   Date       Time     Link load (Mbps, mean/max/min)
    predictive  24/Oct/06  9am-5pm  750.4 / 973.6 / 129.0
    original    25/Oct/06  9am-5pm  719.9 / 967.5 / 218.0
    reactive    05/Dec/06  9am-5pm  403.3 / 771.6 / 131.0

Queries (from the standard distribution of CoMo):

    Name              Description
    application       Port-based application classification
    counter           Traffic load in packets and bytes
    flows             Per-flow counters
    high-watermark    High watermark of link utilization
    pattern search    Finds sequences of bytes in the payload
    top destinations  List of the top-10 destination IPs
    trace             Full-payload collection

Page 26: Load Shedding in Network Monitoring Applications

Packet Loss

[Figure: Link load and packet drops from 9 am to 5 pm: (a) original CoMo (total, DAG drops); (b) reactive load shedding (total, DAG drops, unsampled); (c) predictive load shedding (total, DAG drops, unsampled)]


Page 28: Load Shedding in Network Monitoring Applications

Related Work

Network monitoring systems:
- Only consider a pre-defined set of metrics
- Filtering, aggregation, sampling, etc.

Data stream management systems:
- Define a declarative query language (small set of operators)
- The operators' resource usage is assumed to be known
- Selectively discard tuples, compute summaries, etc.

Limitations:
- Restrict the type of metrics and the possible uses
- Assume explicit knowledge of the operators' cost and selectivity

Page 29: Load Shedding in Network Monitoring Applications

Conclusions and Future Work

Effective load shedding methods are now a basic requirement:
- Data rates, numbers of users, and the complexity of analysis methods are all increasing rapidly

Our load shedding method operates without knowledge of the traffic queries:
- It quickly adapts to overload situations by gracefully degrading accuracy via packet and flow sampling

Operational results in a research ISP network show that:
- The system is robust to severe overload
- The impact on the accuracy of the results is minimized

Limitations and future work:
- Load shedding methods for queries that are not robust to sampling
- Load shedding strategies that maximize overall system utility
- Other system resources (memory, disk bandwidth, storage space)