SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification

Fang Yu (1), T. V. Lakshman (2), Marti Austin Motoyama (1), Randy H. Katz (1)

(1) EECS Department, UC Berkeley; (2) Bell Laboratories, Lucent Technologies



Outline

• Introduction to multi-match classification

• Multi-match classification using TCAM
  – May consume a large amount of TCAM memory
  – May consume high power

• Set Splitting Algorithm (SSA)
  – A memory and power efficient scheme for multi-match classification

• Simulation results

• Conclusions


Packet Classification

• Single-match classification
  – Assumption: all the filters are associated with priorities
  – Only the highest-priority match matters
  – E.g., longest prefix match

• Multi-match classification
  – Report all matching results
  – No priority among filters
  – Intrusion detection systems: identify all the related rules
  – Also required by accounting applications

Example Snort rules (first line: packet header; remaining lines: packet payload)

A rule for MS-SQL Worm detection:
udp $EXTERNAL_NET any -> $HOME_NET 1434
content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"

A rule for RPC oldpassword overflow attempt:
udp $EXTERNAL_NET any -> $HOME_NET any
content:"|00 01 86 A9|"; offset:12; depth:4; content:"|00 00 00 01|"; distance:4; within:4; byte_jump:4,4,relative,align; byte_jump:4,4,relative,align; byte_test:4,>,64,0,relative; content:"|00 00 00 00|"; offset:4; depth:4; sid:2027; rev:4;
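The difference between the two modes can be sketched in a few lines of Python. This is an illustration only: the `Filter` tuple, its predicates, and the three sample filters are hypothetical, and a software linear scan stands in for whatever lookup hardware is actually used.

```python
from typing import Callable, NamedTuple

class Filter(NamedTuple):
    name: str
    priority: int                      # lower number = higher priority (assumed convention)
    matches: Callable[[dict], bool]    # predicate over packet header fields

# Hypothetical filters, not taken from the Snort rule set.
FILTERS = [
    Filter("udp-to-1434", 0, lambda p: p["proto"] == "udp" and p["dport"] == 1434),
    Filter("any-udp", 1, lambda p: p["proto"] == "udp"),
    Filter("any-tcp", 2, lambda p: p["proto"] == "tcp"),
]

def single_match(packet: dict):
    """Return only the highest-priority matching filter name, or None."""
    hits = [f for f in FILTERS if f.matches(packet)]
    return min(hits, key=lambda f: f.priority).name if hits else None

def multi_match(packet: dict):
    """Return the names of all matching filters; no priority is applied."""
    return [f.name for f in FILTERS if f.matches(packet)]

packet = {"proto": "udp", "dport": 1434}
print(single_match(packet))  # udp-to-1434
print(multi_match(packet))   # ['udp-to-1434', 'any-udp']
```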


Ternary-CAM (TCAM)

• Fully associative memory: compares the input string with all the entries in parallel
  – For multiple matches, reports the index of the first match

• Each cell takes one of three logic states: ‘0’, ‘1’, and ‘?’ (don’t care)

[Figure: a TCAM whose entries 168.100.???.???, 192.128.???.???, and 192.128.101.??? are compared in parallel against the input 192.128.101.100, with the match reported; labels: input, entry, cell, width.]
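A software model of the ternary comparison may help make the "first match only" behavior concrete. The sketch below uses hypothetical 8-bit entries and a sequential loop; real hardware compares all entries in parallel and uses a priority encoder to pick the first hit.

```python
def ternary_match(entry: str, key: str) -> bool:
    """True if every cell of `entry` is '?' or equals the corresponding key bit."""
    return len(entry) == len(key) and all(e in ("?", k) for e, k in zip(entry, key))

def tcam_lookup(entries, key):
    """Model of a TCAM lookup: report the index of the first matching entry."""
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index
    return None

# Hypothetical 8-bit entries (not from the slides).
entries = ["0110????", "1011????", "10??????"]
key = "10110101"

print(tcam_lookup(entries, key))                                    # 1
print([i for i, e in enumerate(entries) if ternary_match(e, key)])  # [1, 2]
```

The second line of output is what a multi-match application wants, while the TCAM itself only delivers the first.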


Challenges of Multi-match Classification using TCAM

• Memory efficient
  – TCAMs of 9 Mbits to 18 Mbits are priced at $200-$300

• Power efficient

• Easy update

• High speed
  – TCAM is fast (e.g., 4 ns); however, TCAM only returns the first match result
  – We want all the matching results within a few cycles

• What about returning a bit vector of the matching results?
  – Processing the bit vector can take time if the bit vector is long
  – Not efficient: it is a sparse vector in most cases

[Figure: TCAM power (in watts, axis 0 to 12) versus number of entries searched in parallel (0 to 60,000), with one curve each for 250, 207, 165, and 125 million lookups per second.]


Previous Solutions: Geometric Intersection-based Solution [Hot Interconnects 04]

• Add additional intersection filters
  – High speed
    • Return all the matching results within one cycle
  – Memory efficient
    • Create ~10N intersection filters for the Snort rule set
    • May create O(N^F) intersection filters in the worst case
  – Energy efficient
  – Easily updatable

[Figure: the TCAM stores the two rules labeled Filter 1 and Filter 2 ("tcp $SQL_SERVER 1433 $EXTERNAL_NET any" and "tcp any any any 139") plus their intersection, Filter 1&2 ("tcp $SQL_SERVER 1433 $EXTERNAL_NET 139"). The input "tcp $SQL_SERVER 1433 $EXTERNAL_NET 139" matches the intersection entry, and the SRAM stores the match list (indices of the rules).]
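The idea can be sketched as follows, assuming filters are simplified to short ternary bit strings rather than full 5-tuple rules. The helper names (`intersect`, `classify`) and the sample filters are hypothetical; the point is only that one first-match lookup plus one SRAM read yields the full match list once the intersection entry sits ahead of its parent filters.

```python
def intersect(a: str, b: str):
    """Intersection of two ternary filters, or None if they can never both match."""
    out = []
    for x, y in zip(a, b):
        if x == "?":
            out.append(y)
        elif y == "?" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)

def matches(entry: str, key: str) -> bool:
    return all(e in ("?", k) for e, k in zip(entry, key))

# Two hypothetical overlapping filters (rule indices 1 and 2).
f1, f2 = "10??", "1?1?"
tcam = [intersect(f1, f2), f1, f2]   # intersection entry placed before its parents
sram = [[1, 2], [1], [2]]            # match list stored per TCAM index

def classify(key: str):
    """One TCAM lookup (first match), then one SRAM read for the full match list."""
    for index, entry in enumerate(tcam):
        if matches(entry, key):
            return sram[index]
    return []

print(classify("1010"))  # [1, 2]
print(classify("1001"))  # [1]
print(classify("1110"))  # [2]
```

The cost is visible even in this toy: every overlapping group of filters needs its own extra TCAM entry.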


Previous Solution: MUD [Sigcomm’05]

• Encode the index of the entry and include the encoded value in each TCAM entry
  – Search the TCAM with the initial MUD set to all don’t cares
  – After finding a matching result at index j, search again with discriminator field value ‘> j’

[Figure: the input consists of the packet info plus a discriminator field; the TCAM holds Filter 3, Filter 2, and Filter 1, shown alongside the discriminator values 0011, 0010, and 0001.]


Previous Solution: MUD (Cont.)

• High speed
  – 1+d+(k-2)*(d-1) = O(dk) TCAM lookups to get k matching results
    • d is the logarithm of the number of entries in TCAM (d = log2 N)
    • Decreased to 1+d*(k-1)/r with DIRPE, where r is smaller than d

• Memory efficient

• Energy efficient
  – All the entries in the TCAM are accessed on every lookup, which leads to high power consumption

• Easily updatable

Our Goal: Find a memory and power efficient solution
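To get a feel for the lookup counts, the two expressions on this slide can be evaluated directly. The sketch below assumes d is rounded up to an integer and treats r as a plain input, since the slide only states that r is smaller than d; the function names are hypothetical.

```python
def mud_lookups(num_entries: int, k: int) -> int:
    """TCAM lookups to retrieve k matches, per the slide's 1 + d + (k-2)(d-1)."""
    if k < 2:
        raise ValueError("this sketch only covers k >= 2")
    d = (num_entries - 1).bit_length()   # ceil(log2 N); the rounding is an assumption
    return 1 + d + (k - 2) * (d - 1)

def mud_lookups_dirpe(num_entries: int, k: int, r: int) -> float:
    """Lookups with DIRPE, per the slide's 1 + d(k-1)/r."""
    d = (num_entries - 1).bit_length()
    return 1 + d * (k - 1) / r

print(mud_lookups(256, 4))           # 23
print(mud_lookups_dirpe(256, 4, 4))  # 7.0
```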


Observation

• Split filters into two sets to reduce intersections
  – Report the union of results from all sets
  – No need to include the intersections of the filters from different sets
  – Decreases the number of filters in TCAM and therefore the power consumption
  – Increases the number of TCAM accesses

[Figure: two layouts of filters F1 ... FN with regions "Matching F1", "Matching FN", and "Matching F1 and FN". Original: N filters + O(N^2) intersections, 1 TCAM lookup. Two sets: N filters + 1 intersection, 2 TCAM lookups.]
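The effect of splitting can be checked on a toy instance. In the sketch below an intersection is modeled as the set of filter names that create it, and it still needs a TCAM entry only when all of its filters land in the same set. The four filters and five intersections are made up for illustration.

```python
def residual(intersections, set_of):
    """Intersections whose filters all sit in the same set still need a TCAM entry."""
    return [i for i in intersections if len({set_of[f] for f in i}) == 1]

# Hypothetical example: four filters and the intersections they create.
intersections = [{"F1", "F2"}, {"F1", "F3"}, {"F2", "F4"}, {"F3", "F4"}, {"F1", "F4"}]

one_set = {f: 0 for f in ("F1", "F2", "F3", "F4")}
two_sets = {"F1": 0, "F4": 0, "F2": 1, "F3": 1}

print(len(residual(intersections, one_set)))   # 5
print(len(residual(intersections, two_sets)))  # 1
```

With a single set all five intersections must be stored; with this particular split only the F1/F4 intersection remains, at the price of a second lookup.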


Problem Definition

• Given a set of filters F = (F1, F2, ..., FN)

• Filters create a set of intersections I = (I1, I2, ..., IM)
  – e.g., I1 = intersection of (F1, F5, F6)

• How to divide the filters into several sets
  – Residual intersection set I’: intersections from filters in the same set
  – N + |I’| < TCAM size
  – Number of sets (TCAM accesses) is minimum
  – NP hard problem!
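One way to write the problem down more formally is shown below. The symbols p (number of sets), S_t (the t-th set), and T (TCAM size in entries) are notation introduced here for illustration, and each intersection I_j is identified with the subset of filters that creates it.

```latex
\begin{aligned}
&\text{Given } F=\{F_1,\dots,F_N\} \text{ and } I=\{I_1,\dots,I_M\},\ I_j \subseteq F,\\
&\text{find a partition } F = S_1 \cup \dots \cup S_p \ (\text{pairwise disjoint}) \text{ that minimizes } p\\
&\text{subject to } N + |I'| < T, \qquad
  I' = \{\, I_j \in I : I_j \subseteq S_t \text{ for some } t \,\}.
\end{aligned}
```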


Split Rules into Two Sets

• Still an NP-hard problem (known as maximum set splitting or maximum hypergraph cut)

• Best known approximation algorithms
  – Yield a performance ratio of 0.72 to the optimum solution
  – Require quadratic programming, which is slow when the number of filters is large

• Our SSA algorithm
  – Removes at least half of the intersections
  – O(NM) complexity, where N is the total number of filters and M is the total number of intersections


Maximum Satisfiability Problem

• Maximum Satisfiability Problem
  – A set of literals {F1, ¬F1, F2, ¬F2, ..., FN, ¬FN}
  – A set of clauses; each clause is a subset of the literals
    • E.g., C1 = {F1, F5, F6}
  – Goal: find a truth assignment to F1, ..., FN that satisfies the maximum number of clauses


Johnson’s Algorithm to Maximum Satisfiability Problem

• Assign each clause c a weight = 2^(-|c|)
  • E.g., the weight of C1 = {F1, F5, F6} is 2^(-3)

• Let Fi be any literal which hasn’t been assigned a value yet
  – If the total weight of all clauses containing Fi is higher than that of the clauses containing ¬Fi
    • Assign Fi a true value and remove all clauses containing Fi
    • Multiply the weight of all the clauses containing ¬Fi by 2
  – Otherwise
    • Assign Fi a false value and remove all clauses containing ¬Fi
    • Multiply the weight of all the clauses containing Fi by 2
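The steps above can be sketched in Python as follows. This is a minimal rendering under stated assumptions, not the authors' code: literals are encoded as integers (+i for Fi, -i for ¬Fi), variables are visited in index order, and a clause whose literals have all become false is dropped.

```python
def johnson_maxsat(num_vars, clauses):
    """Greedy MAX-SAT assignment; literals are ints, +i for Fi and -i for its negation."""
    live = [(set(c), 2.0 ** -len(c)) for c in clauses]   # (remaining literals, weight)
    assignment = {}
    for var in range(1, num_vars + 1):
        w_pos = sum(w for c, w in live if var in c)
        w_neg = sum(w for c, w in live if -var in c)
        value = w_pos > w_neg             # ties go to False, following the slide's "otherwise"
        assignment[var] = value
        sat, unsat = (var, -var) if value else (-var, var)
        nxt = []
        for c, w in live:
            if sat in c:
                continue                  # clause satisfied: remove it
            if unsat in c:
                c = c - {unsat}           # this literal is now false: drop it ...
                w *= 2                    # ... and double the clause weight
                if not c:
                    continue              # clause can no longer be satisfied
            nxt.append((c, w))
        live = nxt
    return assignment

def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal under the assignment."""
    return sum(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# All four 2-literal clauses over two variables: at most 3 can be satisfied.
clauses = [{1, 2}, {-1, 2}, {1, -2}, {-1, -2}]
result = johnson_maxsat(2, clauses)
print(result, count_satisfied(clauses, result))  # {1: False, 2: False} 3
```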


Johnson’s Theorem

• If all the clauses have at least k literals
  – Johnson’s algorithm satisfies at least a fraction (2^k - 1)/2^k of the total clauses
  – e.g., k=2: at least ¾ of the clauses are satisfied
  – It is proved that (2^k - 1)/2^k is the best approximation bound for k>2


Filter Set Split Algorithm (SSA)

• Convert the set splitting problem into a maximum satisfiability problem
  – Each filter Fi corresponds to a variable, with literals Fi and ¬Fi
  – For any intersection (e.g., I1 = intersection of F1, F5, and F6), add two clauses
    • C = {F1, F5, F6} and C’ = {¬F1, ¬F5, ¬F6}
    • Total number of clauses is 2M, where M is the number of intersections

• Run Johnson’s algorithm and assign each filter Fi either a true value (put in set one) or a false value (put in set two)
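The conversion step is small enough to sketch and check exhaustively. The code below assumes the same integer encoding of literals as a typical SAT solver (+i for Fi, -i for ¬Fi); the function names are hypothetical. It builds the clause pair for one intersection and confirms, over all eight assignments, that both clauses are satisfied exactly when the three filters do not all end up in the same set.

```python
from itertools import product

def to_clauses(intersections):
    """Each intersection yields C (all positive literals) and C' (all negated)."""
    clauses = []
    for filters in intersections:
        clauses.append(frozenset(filters))                 # C
        clauses.append(frozenset(-f for f in filters))     # C'
    return clauses

def satisfied(clause, assignment):
    return any(assignment[abs(l)] == (l > 0) for l in clause)

def is_split(filters, assignment):
    """True if the filters do not all land in the same set."""
    return len({assignment[f] for f in filters}) > 1

# Intersection I1 of filters 1, 5, and 6, as in the slide's example.
i1 = (1, 5, 6)
c, c_prime = to_clauses([i1])

# Exhaustively confirm: both clauses satisfied <=> the intersection is split.
for values in product([True, False], repeat=3):
    assignment = dict(zip(i1, values))
    both = satisfied(c, assignment) and satisfied(c_prime, assignment)
    assert both == is_split(i1, assignment)
```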


Filter Set Split Algorithm (SSA) (cont.)

• According to Johnson’s theorem
  – Every intersection involves at least two filters, so every clause has at least two literals (k ≥ 2)
  – At least ¾ of the clauses are satisfied: 2M * 3/4 = 1.5M
  – At least 0.5M of the intersections have both clauses satisfied

• Suppose that for the intersection of F1, F5, and F6, both C = {F1, F5, F6} and C’ = {¬F1, ¬F5, ¬F6} are satisfied
  • At least one of F1, F5, F6 is true and at least one is false
  • F1, F5, F6 are split into different sets, thus this intersection doesn’t need to be present in TCAM

At least 50% of the intersections are removed!
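The counting step behind the 50% figure can be spelled out as follows. For a given intersection, C is unsatisfied only if all of its filters are false and C’ only if all are true, so at most one clause of each pair can be unsatisfied. The symbols s, u, and b are introduced here for illustration.

```latex
\begin{aligned}
s &\ge \tfrac{3}{4}\cdot 2M = 1.5M && \text{(satisfied clauses, Johnson's theorem with } k \ge 2\text{)}\\
u &= 2M - s \le 0.5M && \text{(unsatisfied clauses, at most one per intersection)}\\
b &= M - u \ge 0.5M && \text{(intersections with both clauses satisfied)}
\end{aligned}
```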


Review of the SSA Scheme

• High speed
  – Deterministic lookup rate. E.g., if filters are split into two sets, only 2 TCAM lookups per packet are needed.
  – Sets are logically independent, so lookups can be parallelized

• Memory efficient
  – Guarantees the removal of at least 50% of the intersections each time the filter set is split into two sets

• Energy efficient
  – Low memory requirement
  – Accesses each filter only once per packet

• Easily updatable
  – A new filter can be inserted into the set in which it creates the fewest intersections
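The update rule in the last bullet might look like the sketch below. It is a simplification under stated assumptions: filters are short ternary bit strings, and the number of pairwise overlaps with existing filters is used as a stand-in for the number of intersections created (higher-order intersections are ignored). The names `overlaps` and `insert_filter` are hypothetical.

```python
def overlaps(a: str, b: str) -> bool:
    """Two ternary filters intersect if no position has conflicting fixed bits."""
    return all(x == "?" or y == "?" or x == y for x, y in zip(a, b))

def insert_filter(sets, new_filter):
    """Place new_filter in the set where it overlaps the fewest existing filters."""
    costs = [sum(overlaps(new_filter, f) for f in s) for s in sets]
    target = costs.index(min(costs))
    sets[target].append(new_filter)
    return target, costs

# Two hypothetical filter sets, as produced by an earlier split.
sets = [["10??", "1?1?"], ["0???", "??00"]]
print(insert_filter(sets, "101?"))  # (1, [2, 0])
```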


Simulation Setup

• Tests on the Snort rule header sets
  – Compare SSA with two TCAM-based solutions:
    • MUD
    • Geometric Intersection-based solution
  – Compare SSA with two representative software-based solutions:
    • HiCuts
    • EGT-PC
  – Evaluation metrics
    • Memory consumption
    • Lookup rate
    • Power consumption
    • Update cost


Memory Usage

Total number of extra intersection filters in TCAMs.

Version  Geometric Intersection-based  SSA-2 Extra Intersections  SSA-2 Saving  SSA-4 Extra Intersections  SSA-4 Saving
2.0.0    3453                          46                         98.67%        1                          99.97%
2.0.1    3754                          47                         98.75%        1                          99.97%
2.1.0    3758                          47                         98.75%        0                          100%
2.1.1    4067                          55                         98.65%        0                          100%

Total number of TCAM entries used.

Version  MUD  Geometric Intersection-based  SSA-2  SSA-4
2.0.0    240  3693                          286    241
2.0.1    255  4009                          302    256
2.1.0    257  4015                          304    257
2.1.1    263  4330                          318    263
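The two tables are consistent with each other if the MUD column is read as the number of original filters (one TCAM entry per filter) and each scheme's total as that number plus its extra intersection filters. That reading is an inference from the figures, not something the slides state. A short check, with the numbers copied from the tables:

```python
# version -> (filters, geometric extra, SSA-2 extra, SSA-4 extra)
extra = {
    "2.0.0": (240, 3453, 46, 1),
    "2.0.1": (255, 3754, 47, 1),
    "2.1.0": (257, 3758, 47, 0),
    "2.1.1": (263, 4067, 55, 0),
}

def saving(geometric: int, ssa: int) -> float:
    """Percentage of intersection filters avoided relative to the geometric scheme."""
    return 100.0 * (1 - ssa / geometric)

for version, (filters, geo, ssa2, ssa4) in extra.items():
    print(version,
          f"SSA-2 saving {saving(geo, ssa2):.2f}%, entries {filters + ssa2};",
          f"SSA-4 saving {saving(geo, ssa4):.2f}%, entries {filters + ssa4}")
```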


Classification Speed

• MUD
  – One packet may match up to 12 unique filters, and requires a maximum of 20 TCAM lookups
  – Common packets like HTTP packets match 4 unique filters and may require 5-9 TCAM lookups. A Napster packet requires 9 to 15 TCAM lookups

• Geometric Intersection-based solution
  – 1 TCAM lookup per packet

• SSA-2
  – 2 TCAM lookups per packet

• SSA-4
  – 4 TCAM lookups per packet
  – If the average packet size is 402.7 bytes, SSA-4 operates at a 201.35 Gbps classification rate
  – Worst case, if every packet is 40 bytes, SSA-4 achieves a 20 Gbps rate
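The SSA-4 rates can be reproduced with simple arithmetic if one assumes a TCAM running at 250 million lookups per second (4 ns per lookup). That rate appears elsewhere in the deck, but its use for these two figures is an assumption here. The function name is hypothetical.

```python
def classification_rate_gbps(lookups_per_sec: float, lookups_per_packet: int,
                             packet_bytes: float) -> float:
    """Bits classified per second, in Gbps, for a given TCAM lookup budget."""
    packets_per_sec = lookups_per_sec / lookups_per_packet
    return packets_per_sec * packet_bytes * 8 / 1e9

TCAM_RATE = 250e6  # assumed: 250 million lookups per second (4 ns per lookup)

print(classification_rate_gbps(TCAM_RATE, 4, 402.7))  # about 201.35
print(classification_rate_gbps(TCAM_RATE, 4, 40))     # 20.0
```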


Update Cost

• Update cost in terms of newly inserted filters

Version  MUD  Geometric Intersection-based (Avg / Max)  SSA-2 (Avg / Max)  SSA-4 (Avg / Max)
2.0.0    1    31.73 / 157                               1.33 / 17          1.002 / 2
2.0.1    1    35.24 / 135                               1.34 / 19          1 / 1
2.1.0    1    34.71 / 135                               1.36 / 20          1.002 / 2
2.1.1    1    36.00 / 172                               1.41 / 26          1.006 / 2


Power Consumption

[Figure: number of TCAM entries accessed per packet (axis 0 to 8000) for Snort versions 2.0.0, 2.0.1, 2.1.0, and 2.1.1; series: MUD (HTTP packets), MUD (Napster packets), MUD (worst case), Geometric Intersection-based, SSA-2, SSA-4.]

• Energy used by a TCAM is linear in
  – The number of entries searched in parallel
  – The number of TCAM accesses per packet

• Metric: total TCAM entries accessed per packet
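As a rough illustration of the metric, the sketch below multiplies entries searched per lookup by lookups per packet, using the Snort 2.1.1 entry counts and the lookup counts quoted on the earlier slides. The specific combinations chosen are illustrative and may not be the exact data points behind the plot.

```python
def entries_accessed(entries_per_lookup: int, lookups_per_packet: int) -> int:
    """Power proxy: TCAM entries searched per lookup times lookups per packet."""
    return entries_per_lookup * lookups_per_packet

# Snort 2.1.1 figures from the earlier slides.
mud_http_best = entries_accessed(263, 5)    # MUD, HTTP packet, 5 lookups
mud_worst = entries_accessed(263, 20)       # MUD, worst case, 20 lookups
geometric = entries_accessed(4330, 1)       # one lookup over all entries
ssa2 = 318                                  # each entry searched once across the 2 sets
ssa4 = 263                                  # each entry searched once across the 4 sets

print(mud_http_best, mud_worst, geometric, ssa2, ssa4)  # 1315 5260 4330 318 263
print(f"{1 - ssa2 / mud_http_best:.0%}")                # 76%
print(f"{1 - ssa4 / mud_worst:.0%}")                    # 95%
```

For SSA the per-set entry counts add up to the total, because each set is searched exactly once per packet.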


Conclusions

• SSA is a memory and power efficient solution to the multi-match classification problem
  – O(NM) complexity
  – Guaranteed to remove at least 50% of the intersections each time the filter set splits
  – Compared to MUD
    • Uses a similar amount of TCAM memory
    • Yields a 75% to 95% reduction in power consumption
  – Compared to the Geometric Intersection-based Solution
    • Uses 90% less TCAM memory and power
    • Requires one additional TCAM lookup per packet