George Varghese (based on Cristi Estan’s work) University of California, San Diego May 2011 Internet traffic measurement: from packets to insight

Page 1

George Varghese (based on Cristi Estan’s work)

University of California, San Diego

May 2011

Internet traffic measurement: from packets to insight

Page 2

Research motivation

• The Internet in 1969. Problems: flexibility, speed, scalability. Measurement & control: ad-hoc solutions suffice.

• The Internet today. Problems: overloads, attacks, failures. Measurement & control: engineered solutions needed.

Research direction: towards a theoretical foundation for systems doing engineered measurement of the Internet

Page 3

Current solutions

[Figure: measurement pipeline. Labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator]

State of the art: simple counters (SNMP), time series plots of traffic (MRTG), sampled packet headers (NetFlow), top k reports

Concise? Accurate?

Page 4

Measurement challenges

• Data reduction – performance constraints
  • Memory (Terabytes of data each hour)
  • Link speeds (40 Gbps links)
  • Processing (8 ns to process a packet)

• Data analysis – unpredictability
  • Unconstrained service model (e.g. Napster, Kazaa)
  • Unscrupulous agents (e.g. Slammer worm)
  • Uncontrolled growth (e.g. user growth)
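The 8 ns processing budget is consistent with minimum-size packets arriving back to back on a 40 Gbps link. A quick check, assuming 40-byte packets (the packet size is an assumption; the slide does not state it):

```python
def per_packet_budget_ns(link_gbps: float, packet_bytes: int) -> float:
    """Time available to process one packet at line rate, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 Gbit/s equals 1 bit per nanosecond, so bits / Gbps gives nanoseconds.
    return bits / link_gbps


# Assumed 40-byte packets on a 40 Gbps link.
print(per_packet_budget_ns(40, 40))
```

Larger packets relax the budget proportionally, so the worst case is set by the smallest packets.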

Page 5

Main contributions

• Data reduction: Algorithmic solutions for measurement building blocks
  • Identifying heavy hitters (part 1 of talk)
  • Counting flows or distinct addresses

• Data analysis: Traffic cluster analysis automatically finds the dominant modes of network usage (part 2 of talk)
  • AutoFocus traffic analysis system used by hundreds of network administrators

Page 6

Identifying heavy hitters

[Figure: measurement pipeline. Labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator. Callout: Identifying heavy hitters with multistage filters]

Page 7

Why are heavy hitters important?

• Network monitoring: Current tools report top applications, top senders/receivers of traffic

• Security: Malicious activities such as worms and flooding DoS attacks generate much traffic

• Capacity planning: Largest elements of traffic matrix determine network growth trends

• Accounting: Usage based billing most important for most active customers

Page 8

Problem definition

• Identify and measure all streams whose traffic exceeds a threshold (0.1% of link capacity) over a certain time interval (1 minute)
  • Streams defined by fields (e.g. destination IP)
  • Single pass over packets
  • Small worst case per packet processing
  • Small memory usage
  • Few false positives / false negatives
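To pin down the definition, here is a minimal exact reference, assuming packets arrive as `(stream_key, size_bytes)` pairs for one measurement interval. It keeps one counter per stream, so it ignores the memory constraint and serves only as a ground truth to compare approximate schemes against. The function and parameter names are illustrative.

```python
from collections import Counter


def exact_heavy_hitters(packets, capacity_bytes, fraction=0.001):
    """Return {stream: bytes} for streams above fraction * capacity.

    packets: iterable of (stream_key, size_bytes) seen in one interval.
    capacity_bytes: bytes the link can carry in that interval.
    """
    counts = Counter()
    for key, size in packets:  # single pass over the packets
        counts[key] += size
    threshold = fraction * capacity_bytes
    return {key: total for key, total in counts.items() if total > threshold}


trace = [("10.0.0.1", 600), ("10.0.0.2", 5), ("10.0.0.1", 500), ("10.0.0.3", 900)]
print(exact_heavy_hitters(trace, capacity_bytes=1_000_000))
```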

Page 9

Measuring the heavy hitters

• Unscalable solution: keep hash table with a counter for each stream and report largest entries

• Inaccurate solution: count only sampled packets and compensate in analysis

• Ideal solution: count all packets but only for the heavy hitters

• Our solution: identify heavy hitters on the fly
  • Fundamental advantage over sampling: relative error proportional to 1/M instead of 1/√M (M is available memory)

Page 10

Why is sample & hold better?

[Figure: where the uncertainty lies for Sample and hold versus Ordinary sampling]
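A minimal sketch of sample and hold, following the published description of the technique: once a stream has an entry, every later packet of that stream is counted exactly, and a stream without an entry gets one with a probability that grows with packet size. The per-byte probability `byte_prob` and the class name are illustrative; the slides do not give these details.

```python
import random


class SampleAndHold:
    def __init__(self, byte_prob, rng=None):
        self.byte_prob = byte_prob      # assumed per-byte sampling probability
        self.rng = rng or random.Random()
        self.counters = {}              # stream key -> bytes counted since held

    def update(self, key, size):
        if key in self.counters:
            # Held streams are counted exactly from now on.
            self.counters[key] += size
            return
        # Probability that at least one of the packet's bytes is sampled.
        p_packet = 1.0 - (1.0 - self.byte_prob) ** size
        if self.rng.random() < p_packet:
            self.counters[key] = size


sh = SampleAndHold(byte_prob=1e-4, rng=random.Random(1))
for _ in range(1000):
    sh.update("big", 1500)
    sh.update("small", 40) if _ == 0 else None
print(sh.counters)
```

The only uncertainty for a held stream is the traffic it sent before its entry was created, which is why large streams are measured far more accurately than with ordinary sampling.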

Page 11

How do multistage filters work?

Array of counters

Hash(Pink)

Page 12

How do multistage filters work?

Collisions are OK

Page 13

How do multistage filters work?

[Figure: a stream whose counter has reached the threshold is inserted into stream memory. Stream memory entries: stream1 1, stream2 1]

Page 14

How do multistage filters work?

[Figure: two filter stages, Stage 1 and Stage 2, in front of stream memory. Stream memory entry: stream1 1]
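The mechanism shown across these slides can be sketched as follows: each stage is an array of counters indexed by an independent hash of the stream key, and a stream enters stream memory only when its counter has reached the threshold in every stage. This is a minimal sketch; the class name and the use of BLAKE2 as the per-stage hash are assumptions, not details from the talk.

```python
import hashlib


class MultistageFilter:
    def __init__(self, stages, buckets, threshold):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]
        self.stream_memory = {}  # stream key -> bytes counted after passing

    def _bucket(self, stage, key):
        # One hash per stage, made independent by mixing in the stage number.
        digest = hashlib.blake2b(f"{stage}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def update(self, key, size):
        if key in self.stream_memory:
            self.stream_memory[key] += size
            return
        idx = [self._bucket(s, key) for s in range(self.stages)]
        for s, i in enumerate(idx):
            self.counters[s][i] += size  # collisions with other streams are OK
        if all(self.counters[s][i] >= self.threshold for s, i in enumerate(idx)):
            self.stream_memory[key] = size


f = MultistageFilter(stages=4, buckets=1000, threshold=100)
f.update("small", 10)
for _ in range(10):
    f.update("big", 30)
print(f.stream_memory)
```

A small stream passes only if it shares a heavily loaded bucket in every stage, which becomes rapidly less likely as stages are added.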

Page 15

Conservative update

Gray = all prior packets

Page 16

Conservative update

Redundant

Redundant

Page 17

Conservative update
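A minimal sketch of the conservative update rule, as it is usually described for counter arrays of this kind: only the smallest of the stream's counters needs the full increment, and the others are raised just enough to stay at or above that new minimum. Increments beyond that are the "redundant" ones. The function name and the list-of-lists layout are illustrative.

```python
def conservative_update(counters, indices, size):
    """Apply one packet of `size` bytes with conservative update.

    counters: one list of counters per stage.
    indices: the bucket this stream hashes to in each stage.
    Returns the new minimum counter value for the stream.
    """
    current = [counters[stage][i] for stage, i in enumerate(indices)]
    target = min(current) + size
    for stage, i in enumerate(indices):
        # Never decrease a counter; only lift those below the new minimum.
        counters[stage][i] = max(counters[stage][i], target)
    return target


stages = [[5, 0], [2, 0], [9, 0]]
print(conservative_update(stages, [0, 0, 0], 3), stages)
```

Because the smallest counter is still an upper bound on the stream's traffic, the filter keeps its guarantee of no false negatives while counters grow more slowly.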

Page 18

Multistage filter analysis

• Question: Find the probability that a small stream (0.1% of traffic) passes a filter with d = 4 stages of b = 1,000 counters each and threshold T = 1%

• Analysis (holds for any stream distribution and packet order):
  • The stream can pass a stage only if the other streams in its bucket carry ≥ 0.9% of traffic
  • At most 111 such buckets exist in a stage, so the probability of passing one stage is ≤ 11.1%
  • The probability of passing all 4 stages is ≤ 0.111^4 ≈ 0.015%
  • The result is tight
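The arithmetic behind this bound can be reproduced directly. The sketch below assumes shares are expressed as fractions of total traffic and that stage hashes are independent; the function name is illustrative.

```python
import math


def pass_probability_bound(stream_share, threshold, buckets, stages):
    """Worst-case bound on a small stream passing every stage."""
    # Traffic the other streams in the bucket must contribute.
    needed = threshold - stream_share
    # All other traffic sums to at most 1, so few buckets can hold that much.
    max_full_buckets = math.floor(1.0 / needed)
    per_stage = min(1.0, max_full_buckets / buckets)
    return max_full_buckets, per_stage, per_stage ** stages


full, per_stage, overall = pass_probability_bound(0.001, 0.01, 1000, 4)
print(full, per_stage, overall)
```

With the slide's parameters this gives 111 buckets, 11.1% per stage, and roughly 0.015% for all four stages.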

Page 19

Multistage filter analysis results

• d – filter stages
• T – threshold
• h = C/T (C capacity)
• k = b/h (b buckets)
• n – number of streams
• M – total memory

Quantity Result

Probability to pass filter

Streams passing

Relative error

Page 20

Bounds versus actual filtering

[Figure: average probability of passing the filter for small streams (log scale, 1 down to 0.00001) versus number of stages (1 to 4). Curves: Worst case bound, Zipf bound, Actual, Conservative update]

Page 21

Comparing to current solution

• Trace: 2.6 Gbps link, 43,000 streams in 5 seconds

• Multistage filters: 1 Mbit of SRAM (4096 entries)

• Sampling: p=1/16, unlimited DRAM

Average absolute error / average stream size

Stream size | Multistage filters | Sampling
s > 0.1% | 0.01% | 5.72%
0.1% ≥ s > 0.01% | 0.95% | 20.8%
0.01% ≥ s > 0.001% | 39.9% | 46.6%

Page 22

Summary for heavy hitters

• Heavy hitters important for measurement processes

• More accurate results than random sampling: relative error proportional to 1/M instead of 1/√M (M is available memory)

• Multistage filters with conservative update outperform theoretical bounds

• Prototype implemented at 10 Gbps?

Page 23

Building block 2, counting streams

• Core idea
  • Hash streams to bitmap and count bits set
  • Sample bitmap to save memory and scale
  • Multiple scaling factors to cover wide ranges

• Result
  • Can count up to 100 million streams with an average error of 1% using 2 Kbytes of memory

[Figure: separate bitmaps accurate for 0-7 streams, 8-15 streams, and 16-32 streams]

Page 24

Bitmap counting

Does not work if there are too many flows

Hash based on flow identifier

Estimate based on the number of bits set
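A minimal sketch of a direct bitmap, assuming the standard linear-counting estimator b·ln(b/z), where b is the bitmap size and z the number of bits still zero. The slide only says the estimate is based on the number of bits set, so the exact estimator, the class name, and the hash choice are assumptions.

```python
import hashlib
import math


class DirectBitmap:
    def __init__(self, bits):
        self.bits = bits
        self.bitmap = [False] * bits

    def add(self, flow_id):
        # Hash the flow identifier to one bit; repeats of a flow hit the same bit.
        digest = hashlib.blake2b(str(flow_id).encode(), digest_size=8).digest()
        self.bitmap[int.from_bytes(digest, "big") % self.bits] = True

    def estimate(self):
        zeros = self.bitmap.count(False)
        if zeros == 0:
            return math.inf  # saturated: too many flows for this bitmap
        # Corrects for collisions between distinct flows.
        return self.bits * math.log(self.bits / zeros)


bm = DirectBitmap(4096)
for flow in range(500):
    bm.add(flow)
    bm.add(flow)  # duplicate packets do not change the estimate
print(round(bm.estimate()))
```

Once nearly every bit is set the estimate degrades and then fails outright, which is the "too many flows" limitation noted on the slide.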

Page 25

Bitmap counting

Bitmap takes too much memory

Increase bitmap size

Page 26

Bitmap counting

Too inaccurate if there are few flows

Store only a sample of the bitmap and extrapolate
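One way to read "store only a sample of the bitmap" is as a virtual bitmap: flows hash into a large virtual space, only the positions that fall inside a small physical bitmap are stored, and the estimate is scaled back up. The sketch below makes that assumption; `sampling_factor`, the class name, and the hash choice are illustrative.

```python
import hashlib
import math


class VirtualBitmap:
    def __init__(self, bits, sampling_factor):
        self.bits = bits                        # physical bits actually stored
        self.sampling_factor = sampling_factor  # virtual size / physical size
        self.virtual_bits = bits * sampling_factor
        self.bitmap = [False] * bits

    def add(self, flow_id):
        digest = hashlib.blake2b(str(flow_id).encode(), digest_size=8).digest()
        position = int.from_bytes(digest, "big") % self.virtual_bits
        if position < self.bits:  # flows outside the stored sample are ignored
            self.bitmap[position] = True

    def estimate(self):
        zeros = self.bitmap.count(False)
        if zeros == 0:
            return math.inf
        # Estimate flows within the sample, then extrapolate to all flows.
        return self.sampling_factor * self.bits * math.log(self.bits / zeros)


vb = VirtualBitmap(bits=1024, sampling_factor=8)
for flow in range(4000):
    vb.add(flow)
print(round(vb.estimate()))
```

With few flows, hardly any of them land in the stored sample, so the extrapolated estimate becomes noisy. That matches the inaccuracy for small counts noted on the slide.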

Page 27

Bitmap counting

Must update multiple bitmaps for each packet

Use multiple bitmaps, each accurate over a different range

[Figure: three bitmaps, accurate if the number of flows is 0-7, 8-15, or 16-32]

Page 28

Bitmap counting

[Figure: bitmaps for the ranges 0-7, 8-15, and 16-32]

Page 29

Bitmap counting

Multiresolution bitmap

0-32

Page 30

Future work

Page 31

Traffic cluster analysis

[Figure: measurement pipeline. Labels: Network, Fast link, Router, Memory, Raw data, Analysis Server, Traffic reports, Network Operator. Callouts: Part 1: Identifying heavy hitters, counting streams; Part 2: Describing traffic with traffic cluster analysis]

Page 32

Finding heavy hitters not enough

Rank | Destination IP | Traffic
1 | jeff.dorm.bigU.edu | 11.9%
2 | lisa.dorm.bigU.edu | 3.12%
3 | risc.cs.bigU.edu | 2.83%

Most traffic goes to the dorms …

Rank | Dest. network | Traffic
1 | library.bigU.edu | 27.5%
2 | cs.bigU.edu | 18.1%
3 | dorm.bigU.edu | 17.8%

Where does the traffic come from? …

Rank | Source IP | Traffic
1 | forms.irs.gov | 13.4%
2 | ftp.debian.org | 5.78%
3 | www.cnn.com | 3.25%

Rank | Source Network | Traffic
1 | att.com | 25.4%
2 | yahoo.com | 15.8%
3 | badU.edu | 12.2%

What apps are used? Which network uses web and which one kazaa?

Rank | Application | Traffic
1 | web | 42.1%
2 | ICMP | 12.5%
3 | kazaa | 11.5%

• Aggregating on individual fields useful but
  • Traffic reports often not at right granularity
  • Cannot show aggregates over multiple fields

• Traffic analysis tool should automatically find aggregates over right fields at right granularity

Page 33

Ideal traffic report

Traffic aggregate | Traffic | Callout
Web traffic | 42.1% | Web is the dominant application
Web traffic to library.bigU.edu | 26.7% | The library is a heavy user of web
Web traffic from forms.irs.gov | 13.4% | That’s a big flash crowd!
ICMP from sloppynet.badU.edu to jeff.dorm.bigU.edu | 11.9% | This is a Denial of Service attack!

Traffic cluster reports try to give insights into the structure of the traffic mix

Page 34

Definition

• A traffic report gives the size of all traffic clusters above a threshold T and is:

Multidimensional: clusters defined by ranges from natural hierarchy for each field

Compressed: omits clusters whose traffic is within error T of more specific clusters in the report

Prioritized: clusters have unexpectedness labels

Page 35

Unidimensional report example

Threshold = 100

Hierarchy (traffic per prefix):

/32: 10.0.0.2 = 15, 10.0.0.3 = 35, 10.0.0.4 = 30, 10.0.0.5 = 40, 10.0.0.8 = 160, 10.0.0.9 = 110, 10.0.0.10 = 35, 10.0.0.14 = 75
/31: 10.0.0.2/31 = 50, 10.0.0.4/31 = 70, 10.0.0.8/31 = 270, 10.0.0.10/31 = 35, 10.0.0.14/31 = 75
/30: 10.0.0.0/30 = 50, 10.0.0.4/30 = 70, 10.0.0.8/30 = 305, 10.0.0.12/30 = 75
/29: 10.0.0.0/29 = 120, 10.0.0.8/29 = 380
/28: 10.0.0.0/28 = 500

Clusters above the threshold: 10.0.0.8 (160), 10.0.0.9 (110), 10.0.0.8/31 (270), 10.0.0.8/30 (305), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.0/28 (500)

Prefix names shown in the figure: AI Lab, 2nd floor, CS Dept

Page 36

Unidimensional report example

Compression

[Figure: clusters above the threshold: 10.0.0.8 (160), 10.0.0.9 (110), 10.0.0.8/31 (270), 10.0.0.8/30 (305), 10.0.0.0/29 (120), 10.0.0.8/29 (380), 10.0.0.0/28 (500)]

305 - 270 < 100, so 10.0.0.8/30 is omitted

380 - 270 ≥ 100, so 10.0.0.8/29 is kept

Source IP | Traffic
10.0.0.0/29 | 120
10.0.0.8/29 | 380
10.0.0.8 | 160
10.0.0.9 | 110

Rule: omit clusters with traffic within error T of more specific clusters in the report
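The compression rule can be sketched as a bottom-up pass over the prefix hierarchy. This sketch assumes "within error T" means that a cluster's traffic, minus the traffic already accounted for by reported clusters beneath it, is below T. That reading reproduces the report on this slide; the function name and data layout are illustrative.

```python
import ipaddress
from collections import defaultdict


def compressed_report(traffic, threshold, root_len=28):
    """traffic: {ip_string: amount}. Returns {network: amount} after compression."""
    # Per network: (total traffic, traffic covered by reported descendants).
    level = {ipaddress.ip_network(f"{ip}/32"): (amount, 0)
             for ip, amount in traffic.items()}
    report = {}
    for prefix_len in range(32, root_len - 1, -1):
        parents = defaultdict(lambda: [0, 0])
        for net, (total, covered) in level.items():
            if total >= threshold and total - covered >= threshold:
                report[net] = total
                covered = total  # ancestors now see this traffic as explained
            if prefix_len > root_len:
                parent = net.supernet()  # one bit shorter prefix
                parents[parent][0] += total
                parents[parent][1] += covered
        level = {net: tuple(pair) for net, pair in parents.items()}
    return report


slide_traffic = {
    "10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
    "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75,
}
for net, amount in compressed_report(slide_traffic, threshold=100).items():
    print(net, amount)
```

On the slide's data this keeps 10.0.0.8, 10.0.0.9, 10.0.0.0/29, and 10.0.0.8/29, and drops 10.0.0.8/31, 10.0.0.8/30, and 10.0.0.0/28 because the reported clusters beneath them already explain all but less than 100 units of their traffic.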

Page 37

Multidimensional structure

[Figure: two hierarchies and part of their combination.
Source net: All traffic → US (CA, NY), EU (FR, RU).
Application: All traffic → Web, Mail.
Combined nodes shown: All traffic, EU, RU, Mail, RU Mail, RU Web]
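In the multidimensional case, a flow belongs to every combination of its ancestors in each field's hierarchy. A minimal sketch using the two hierarchies from this slide; the dictionaries and function names are illustrative, and "All" stands for "All traffic" in that field.

```python
from itertools import product

# Child -> parent maps for the two example hierarchies.
SOURCE_PARENT = {"CA": "US", "NY": "US", "FR": "EU", "RU": "EU",
                 "US": "All", "EU": "All"}
APP_PARENT = {"Web": "All", "Mail": "All"}


def ancestors(node, parent):
    """Return the node followed by its ancestors up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def clusters_containing(source, app):
    """All (source cluster, application cluster) pairs that contain the flow."""
    return list(product(ancestors(source, SOURCE_PARENT),
                        ancestors(app, APP_PARENT)))


for cluster in clusters_containing("RU", "Mail"):
    print(cluster)
```

Mail traffic from RU therefore counts toward six clusters, from the most specific (RU, Mail) up to (All, All), which is why the number of candidate clusters grows quickly with the number of fields.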

Page 38

AutoFocus: system structure

[Figure: system components: Traffic parser, Cluster miner, Grapher, Web based GUI. Inputs: Packet header trace / NetFlow data, categories, names]

Page 39

Traffic reports for weeks, days, three hour intervals and half hour intervals

Page 40

Page 41

Colors – user defined traffic categories

Separate reports for each category

Page 42

Analysis of unusual events

• Sapphire/SQL Slammer worm
  • Found worm port and protocol automatically

Page 43

Analysis of unusual events

• Sapphire/SQL Slammer worm
  • Identified infected hosts

Page 44

Related work

• Databases [FS+98]: Iceberg queries. Limited analysis, no conservative update

• Theory [GM98, CCF02]: Synopses, sketches. Less accurate than multistage filters

• Data Mining [AIS93]: Association rules. No/limited hierarchy, no compression

• Databases [GCB+97]: Data cube. No automatic generation of “interesting” clusters