TurboFlow: Information Rich Flow Record Generation on Commodity Switches

John Sonchack (1), Adam J. Aviv (2), Eric Keller (3), Jonathan M. Smith (1)

(1) University of Pennsylvania, (2) USNA, (3) University of Colorado


Post on 31-Jul-2020



Page 2: Introduction: Network Monitoring with Flow Records

Flow record: <srcIp, dstIp, srcPort, dstPort, arrivalTs, avgInterArrival, pktsDroppedCt, queueLen, …>

Page 3: Introduction: Network Monitoring with Flow Records

Flow record: <srcIp, dstIp, srcPort, dstPort, arrivalTs, avgInterArrival, pktsDroppedCt, queueLen, …>

Applications:
• Traffic Engineering
• Security
• Debugging

Page 4: Introduction: Network Monitoring with Flow Records

Applications: Traffic Engineering, Security, Debugging

Example: Low throughput because packets dropped at switch 2!

Page 5: Introduction: Network Monitoring with Flow Records

Applications: Traffic Engineering, Security, Debugging

Example: Congestion at switch 2!


Page 7: Introduction: Network Monitoring with Flow Records

Applications: Traffic Engineering, Security, Debugging

Example: Hosts 2 and 4 are in a botnet!

Figure label: From botnet

Page 8: Introduction: Network Monitoring with Flow Records

Applications: Traffic Engineering, Security, Debugging

• 3.2 Tb/s
• > 100 M packets/s
• > 10 M flows/s

Page 9: Flow Monitoring Switches: Prior Work

• Sampling (Sampled Packets → Flow Records): Inaccurate

Page 10: Flow Monitoring Switches: Prior Work

• Sampling (Sampled Packets → Flow Records): Inaccurate
• Server Offloading (Packets (or other records) → Flow Records): Expensive
• Custom Hardware Offloading (Packets (or other records) → Flow Records): Restrictive

Page 11: Introduction: TurboFlow

Main idea: Optimize instead of offload.

Q: What can we get out of the programmable hardware in next-generation commodity switches?

• Programmable Forwarding Engines
• Onboard Microservers

Page 12: Introduction: TurboFlow

Main idea: Optimize instead of offload.

Q: What can we get out of the programmable hardware in next-generation commodity switches?

A: Flow record generation for multi-terabit rate traffic without sampling or offloading.

• Programmable Forwarding Engines
• Onboard Microservers

Page 13: Introduction: TurboFlow

TurboFlow: Packets → Programmable Forwarding Engine (Pre-aggregation) → Partial Flow Records → Microserver (Flow Record Generation) → Flow Records

Page 14: Outline

• Introduction
• Architecture
• Evaluation
• Conclusion

Page 15: TurboFlow Architecture

TurboFlow: Packets → Programmable Forwarding Engine (Pre-aggregation) → Partial Flow Records → Microserver (Flow Record Generation) → Flow Records

Page 16: Background: Programmable Forwarding Engines

Components: Forwarding Engine, Switch CPU

Forwarding Engine table columns: Match | Action | Stateful Variables

Page 17: Background: Programmable Forwarding Engines

Components: Forwarding Engine, Switch CPU

Forwarding Engine table (Packet Count and Average Interarrival are the Stateful Variables):

Match: Flow (IP 5-tuple) | Action | Packet Count | Average Interarrival
A -> B | Update | 3 | 1 ms | …
E -> G | Update | 49 | 8 ms | …
F -> G | Update | 3 | 42 ms | …
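As a rough illustration of what an "Update" action does to the two stateful variables shown in the table, here is a minimal Python sketch. The `FlowState` class, the `update` function, and the floating-point incremental mean are assumptions made for readability; the slides do not say how the forwarding engine actually computes the average.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowState:
    """Per-flow stateful variables (hypothetical names, for illustration)."""
    packet_count: int = 0
    avg_interarrival_ms: Optional[float] = None  # undefined until 2 packets are seen
    last_arrival_ms: Optional[float] = None


def update(state: FlowState, arrival_ms: float) -> FlowState:
    """Apply one 'Update' action for a packet of an already matched flow."""
    if state.last_arrival_ms is not None:
        gap = arrival_ms - state.last_arrival_ms
        # Packets seen so far equals the number of gaps once this packet is counted.
        n_gaps = state.packet_count
        if state.avg_interarrival_ms is None:
            state.avg_interarrival_ms = gap
        else:
            # Incremental mean: avoids storing every gap.
            state.avg_interarrival_ms += (gap - state.avg_interarrival_ms) / n_gaps
    state.packet_count += 1
    state.last_arrival_ms = arrival_ms
    return state


# Usage: four packets with gaps of 1, 2 and 3 ms.
state = FlowState()
for t in (0.0, 1.0, 3.0, 6.0):
    update(state, t)
```

After the four packets above, the state holds a packet count of 4 and an average interarrival of 2 ms.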

Page 18: Background: Programmable Forwarding Engines

Components: Forwarding Engine, Switch CPU (Table Manager)

Forwarding Engine table (Packet Count and Average Interarrival are the Stateful Variables):

Match: Flow (IP 5-tuple) | Action | Packet Count | Average Interarrival
A -> B | Update | 3 | 1 ms | …
E -> G | Update | 49 | 8 ms | …
F -> G | Update | 3 | 42 ms | …

• Rule installation rate: < 10 K / s
• Flow arrival rate @ 1 Tb/s: > 10,000 K / s
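The two rates above imply a gap of at least three orders of magnitude. The short Python calculation below spells this out; the constant names are hypothetical, and the values are the bounds quoted on the slide, so the real gap is at least this large.

```python
# Bounds taken from the slide; names are illustrative only.
RULE_INSTALL_RATE = 10_000       # rules/s (slide: "< 10 K / s", so an upper bound)
FLOW_ARRIVAL_RATE = 10_000_000   # flows/s (slide: "> 10,000 K / s", so a lower bound)

# How many new flows arrive for every rule that can be installed.
gap_factor = FLOW_ARRIVAL_RATE / RULE_INSTALL_RATE

# Best-case fraction of new flows that could get their own match rule.
trackable_fraction = RULE_INSTALL_RATE / FLOW_ARRIVAL_RATE

print(f"flows per installable rule: {gap_factor:.0f}")
print(f"fraction of flows that can get a rule: {trackable_fraction:.1%}")
```

In other words, installing one match rule per flow from the switch CPU could cover at most about 0.1% of new flows at this traffic rate.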

Page 19: TurboFlow Architecture: Using the FE Efficiently

Components: Forwarding Engine, Switch CPU (Table Manager)

Page 20: TurboFlow Architecture: Using the FE Efficiently

Components: Forwarding Engine, Switch CPU

Forwarding Engine:
• Match: Current Flow (IP 5-tuple)
• Stateful Variables: Packet Count, Average Interarrival

Page 21: TurboFlow Architecture: Using the FE Efficiently

Components: Forwarding Engine, Switch CPU (Table Manager)

Forwarding Engine:
• Match: Flow Key Hash (example: 1234)
• Stateful Variables: Current Flow (IP 5-tuple), Packet Count, Average Interarrival

Page 22: TurboFlow Architecture: Using the FE Efficiently

Components: Forwarding Engine, Switch CPU

Incoming packet: A->B

HASH of the flow key selects a table entry (Match: Flow Key Hash, example: 1234).

Stateful Variables:

Flow Key (IP 5-tuple) | Packet Count | Average Interarrival
A -> B | 4 | 3 ms | …
C -> D | 49 | 8 ms | …
E -> F | 3 | 42 ms | …
Z -> Q | 9 | 10 ms | …

Tracked Flow: Update Counters

Page 23: TurboFlow Architecture: Using the FE Efficiently

Components: Forwarding Engine, Switch CPU (Record Aggregator)

Incoming packets: A->B, G->H

• Tracked Flow (A->B): Update Counters
• Untracked Flow (G->H): Replace colliding record, send it to CPU

Record sent to the Record Aggregator: Z -> Q: 9, 10 ms, …

Stateful Variables after the replacement:

Tracked Flow (IP 5-tuple) | Packet Count | Average Interarrival
A -> B | 4 | 3 ms | …
C -> D | 49 | 8 ms | …
E -> F | 3 | 42 ms | …
G -> H | 1 | - | …
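The tracked/untracked behaviour above can be sketched as a fixed-size, hash-indexed table in which a colliding flow evicts the current record. The Python below is a minimal sketch under stated assumptions: `Slot`, `FlowTable`, `process_packet`, and the CRC32 default hash are hypothetical names and choices, only the packet count is tracked, and real forwarding-engine code would be written in P4 rather than Python.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slot:
    """One table entry: the flow currently tracked in this slot."""
    key: str
    packet_count: int


class FlowTable:
    """Fixed-size table indexed by a hash of the flow key (illustrative)."""

    def __init__(self, n_slots: int,
                 hash_fn: Optional[Callable[[str], int]] = None) -> None:
        self.slots: List[Optional[Slot]] = [None] * n_slots
        # CRC32 is an arbitrary stand-in for the hardware hash unit.
        self.hash_fn = hash_fn or (lambda key: zlib.crc32(key.encode()))

    def process_packet(self, key: str) -> Optional[Slot]:
        """Update the table for one packet.

        Returns the evicted partial record if the packet's flow replaced a
        different flow in its slot, otherwise None.
        """
        idx = self.hash_fn(key) % len(self.slots)
        slot = self.slots[idx]
        if slot is not None and slot.key == key:
            slot.packet_count += 1       # tracked flow: update counters
            return None
        self.slots[idx] = Slot(key, 1)   # untracked flow: take over the slot
        return slot                      # old record (if any) goes to the CPU


# Usage: force Z->Q and G->H into the same slot to mirror the slide.
table = FlowTable(n_slots=4, hash_fn=lambda key: 1234)
for _ in range(9):
    table.process_packet("Z->Q")
evicted = table.process_packet("G->H")   # Slot(key='Z->Q', packet_count=9)
```

The design choice worth noting is that no rule installation is needed when a new flow appears: the slot is simply overwritten in the data path, and the displaced record becomes a partial flow record for the switch CPU.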

Page 24: TurboFlow Design

TurboFlow: Packets → Programmable Forwarding Engine (Pre-aggregation) → Partial Flow Records → Microserver (Flow Record Generation) → Flow Records

Page 25: TurboFlow Architecture: Using the CPU Efficiently

Partial Flow Records: Count: 2, Count: 10

Flow Stats Dictionary:

Key | Count
A->B | 12

Flow Records: Count: 12
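The merge step on the CPU can be sketched as a dictionary keyed by flow, where each arriving partial record adds its count to the running total. This is a minimal Python sketch; the `aggregate` function and the `(key, count)` tuple layout are assumptions for illustration, and only the packet count is merged here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate(partials: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Merge partial flow records into complete per-flow packet counts."""
    stats: Dict[str, int] = defaultdict(int)  # plays the role of the flow stats dictionary
    for key, count in partials:
        stats[key] += count
    return dict(stats)


# Usage: the two partial records from the slide combine into one record.
flow_records = aggregate([("A->B", 10), ("A->B", 2)])
print(flow_records)  # {'A->B': 12}
```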

Page 26: TurboFlow Architecture: Using the CPU Efficiently

Partial Flow Records: Count: 2, Count: 10

Flow Stats Dictionary:

Key | Count
A->B | 12

Flow Records: Count: 12

Optimization | Performance Vs. Baseline
baseline (std::unordered_map) | -
Reduce Pointer Operations | 1.64X
Vectorize Key Comparison | 3.79X
Batch and Prefetch | 4.9X

Average of 146 cycles spent per partial flow record.
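To put the 146-cycle figure in terms of throughput, the cycle cost can be divided into a core's clock rate. The slides do not state the clock rate of the switch CPU, so the 2 GHz value below is purely an assumed example, and `records_per_second` is a hypothetical helper.

```python
CYCLES_PER_RECORD = 146      # from the slide
ASSUMED_CLOCK_HZ = 2.0e9     # assumption: a 2 GHz core, not stated in the slides


def records_per_second(clock_hz: float,
                       cycles_per_record: float = CYCLES_PER_RECORD) -> float:
    """Partial flow records one core can process per second."""
    return clock_hz / cycles_per_record


rate = records_per_second(ASSUMED_CLOCK_HZ)
print(f"{rate / 1e6:.1f} M partial flow records per second per core")  # 13.7 M
```

Under that assumed clock rate, a single core would handle roughly 13.7 million partial flow records per second; a different clock rate scales the result linearly.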

Page 27: Outline

• Introduction
• Architecture
• Evaluation
• Conclusion

TurboFlow: Packets → Programmable Forwarding Engine (Pre-aggregation) → Partial Flow Records → Microserver (Optimized Flow Record Generation) → Flow Records

Page 28: Implementation and Evaluation

Benchmark Workloads
• 10 Gb/s Internet Router Traces (CAIDA 2015)
• 144 Node Simulated Datacenter Cluster (YAPS simulator)

Implementations
• P4 Switch (3.2 Tb/s Barefoot Tofino)
• P4 SmartNIC (40 Gb/s Netronome NFP)


Page 30: Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links

[Figure: Partial Flow Records per Second (Millions), 0 to 40, plotted against an unlabeled axis from 0.0 to 1.0. Series: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

Page 31: Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links

[Figure: Partial Flow Records per Second (Millions), 0 to 40, plotted against an unlabeled axis from 0.0 to 1.0. Series: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

Partial aggregation using 5 MB of FE memory reduces workload by ~4X.

Page 32: Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links

[Figure, left panel: Partial Flow Records per Second (Millions), 0 to 40, plotted against an unlabeled axis from 0.0 to 1.0. Series: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

[Figure, right panel: Partial Flow Records per Second (Millions), 0 to 40, plotted against Switch CPU Cores (1 to 4). Series: std::unordered_map; Fully Optimized. Reference levels: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

Optimizations improve performance by ~5X.

Page 33: Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links

[Figure, left panel: Partial Flow Records per Second (Millions), 0 to 40, plotted against an unlabeled axis from 0.0 to 1.0. Series: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

[Figure, right panel: Partial Flow Records per Second (Millions), 0 to 40, plotted against Switch CPU Cores (1 to 4). Series: std::unordered_map; Fully Optimized. Reference levels: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%).]

FE pre-aggregation + optimizations = terabit rate workloads using 1 core and ~26% of FE memory.

Page 34: Outline

• Introduction
• TurboFlow Design
• Implementation and Evaluation
• Conclusion

Page 35: In the Paper

• Cost analysis
• More interesting flow features
• Pipeline layouts
• Pseudocode
• Expected worst case analysis
• More Evaluation

Page 36: Conclusion (and Thank You for Listening!)

• Flow records are important for monitoring, but difficult to generate at the switch due to high traffic rates.

• TurboFlow is a flow record generator carefully optimized for next-generation commodity switch hardware that scales to multi-terabit rate traffic without sampling.