an analysis of sampling effects on graph structures derived from network flow data
DESCRIPTION
An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data. Mark Meiss Advanced Network Management Laboratory Indiana University. Quick Overview. Why this study? Existing work focuses on the effects of sampling on individual flows or distributions of flows. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/1.jpg)
An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data
Mark MeissAdvanced Network Management
LaboratoryIndiana University
![Page 2: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/2.jpg)
Quick Overview
Why this study? Existing work focuses on the effects
of sampling on individual flows or distributions of flows.
Open question: How are graph structures built from flow data affected?
![Page 3: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/3.jpg)
Quick Overview
Building graphs from flow data Basic graph properties Methodology Experiments Results Take-home message: Aggregation
matters and is not your enemy.
![Page 4: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/4.jpg)
Background
“graph structures derived from network flow data”… ?
![Page 5: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/5.jpg)
Basic network
![Page 6: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/6.jpg)
Degree
![Page 7: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/7.jpg)
Clustering Coefficient
![Page 8: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/8.jpg)
Betweenness
![Page 9: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/9.jpg)
Motifs
![Page 10: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/10.jpg)
Weighted network
![Page 11: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/11.jpg)
Strength
![Page 12: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/12.jpg)
Directed network
![Page 13: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/13.jpg)
Applications
Modeling and prediction Anomaly detection Application classification Capacity planning Community identification (etc.)
![Page 14: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/14.jpg)
Motivation
So what does packet sampling have to do with this?
Isn’t knowing p(sample) = 0.01 good enough?
![Page 15: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/15.jpg)
Motivation
![Page 16: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/16.jpg)
Motivation The distributions of degree and
strength for large-scale network data generally obey a power law:
xx)Pr(
![Page 17: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/17.jpg)
Motivation
The exact value matters!
![Page 18: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/18.jpg)
Methodology
Internet2 / Abilene used as testbed Generate UDP traffic and analyze
its traces in Abilene netflow-v5 data
![Page 19: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/19.jpg)
Flow Generation Language (FGL) FGL is a scripting language for quick and easy
traffic generation:println("Bias study #4 (2008-12-10)");println();println("This FGL code will generate 100 128-byte packets to each UDP port");println("in the range 10100-10199 on the hosts 64.57.17.200 - 64.57.17.209.");println();
x = proc(pkt) begin println("Emitting 100 of ", pkt); notate(pkt); emit(pkt, 100, 0.02); delay(0.10);end;
port = range(10100, 10199);host = range(start:ip("64.57.17.200"), end:ip("64.57.17.209"));
xip = [ ip_header(src:ip("156.56.103.1"), dst:@host) ];xudp = [ udp_header(src_port:0, dst_port:@port) ];xpacket = [ udp(@xip, @xudp, size:128, data:"This is a test.") ];
output("bias-study-4.event");x(@xpacket);
![Page 20: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/20.jpg)
Experiment #1
Note: p(sample) = 0.03.
Generate flows of lengths between 1 and 200 packets; find chance of detection.
![Page 21: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/21.jpg)
![Page 22: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/22.jpg)
Experiment #2
Try to recover a power law, gamma = 2.
Send to each of 10 hosts: 256 10-packet flows 128 20-packet flows 64 40-packet flows (etc.)
![Page 23: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/23.jpg)
![Page 24: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/24.jpg)
![Page 25: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/25.jpg)
Experiment #3
Second attempt to recover gamma = 2:
Send to each of 10 hosts: 2048 10-packet flows 1024 20-packet flows 512 40-packet flows (etc.)
![Page 26: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/26.jpg)
![Page 27: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/27.jpg)
Experiment #4
Third attempt to recover gamma = 2:
Send to each of 10 hosts: 1024 100-packet flows 512 200-packet flows 256 400-packet flows (etc.)
![Page 28: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/28.jpg)
![Page 29: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/29.jpg)
![Page 30: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/30.jpg)
Result
A preponderance of very small flows will lead to an overestimate of the exponent.
All flows smaller than a critical threshold are statistically indistinguishable.
![Page 31: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/31.jpg)
![Page 32: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/32.jpg)
![Page 33: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/33.jpg)
Result
With sufficiently large flow size, a range of exponents can be recovered reliably.
![Page 34: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/34.jpg)
Is this a problem?
What if we don’t have sufficiently large flow size?
![Page 35: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/35.jpg)
Aggregation
Aggregation is necessary for accurate results!
Flows repeat themselves. Coalescing flows with identical
endpoints allows us to distinguish smaller flows.
![Page 36: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/36.jpg)
Aggregation
Failure to aggregate on the experiments described causes an over-estimate of about 0.2.
This can make a large difference for modeling!
![Page 37: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/37.jpg)
Conclusions
Given appropriate aggregation, packet sampling does not affect the large-scale properties of graphs derived from flow data.
The effectiveness of aggregation in mitigating small-flow effects depends on repeated activity.
![Page 38: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/38.jpg)
Future Work
Effects on other properties (clustering, centrality, spectral).
Effects on network growth models (preferential attachment, etc.).
Effects on traffic models (PageRank, other Markov models).
![Page 39: An Analysis of Sampling Effects on Graph Structures Derived from Network Flow Data](https://reader036.vdocuments.net/reader036/viewer/2022070406/56814241550346895dae650b/html5/thumbnails/39.jpg)
Thank you!
Any questions or observations?