packet level algorithms michael mitzenmacher. goals of the talk consider algorithms/data structures...

Packet Level Algorithms Michael Mitzenmacher

Post on 15-Jan-2016


TRANSCRIPT

Page 1: Packet Level Algorithms Michael Mitzenmacher. Goals of the Talk Consider algorithms/data structures for measurement/monitoring schemes at the router level

Packet Level Algorithms

Michael Mitzenmacher


Goals of the Talk

• Consider algorithms/data structures for measurement/monitoring schemes at the router level.
  – Focus on packets, flows.

• Emphasis on my recent work, future plans.
  – “Applied theory”.

• Less on experiments, more on design/analysis of data structures for applications.
  – Hash-based schemes
    • Bloom filters and variants.


Vision

• Three-pronged research data.

• Low: Efficient hardware implementations of relevant algorithms and data structures.

• Medium: New, improved data structures and algorithms for old and new applications.

• High: Distributed infrastructure supporting monitoring and measurement schemes.


Background / Building Blocks

• Multiple-choice hashing

• Bloom filters


Multiple Choices: d-left Hashing

• Split hash table into d equal subtables.
• To insert, choose a bucket uniformly for each subtable.
• Place item in a cell in the least loaded bucket, breaking ties to the left.
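The three steps above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the class name `DLeftTable`, the helper `bucket_index`, and the use of salted SHA-256 as the per-subtable hash are all assumptions made for the example.

```python
import hashlib


def bucket_index(item: str, subtable: int, buckets_per_subtable: int) -> int:
    """Hypothetical hash: one independent-looking bucket choice per subtable."""
    digest = hashlib.sha256(f"{subtable}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets_per_subtable


class DLeftTable:
    def __init__(self, d: int, buckets_per_subtable: int):
        self.d = d
        self.size = buckets_per_subtable
        # tables[i][j] holds the items stored in bucket j of subtable i.
        self.tables = [[[] for _ in range(buckets_per_subtable)] for _ in range(d)]

    def insert(self, item: str) -> int:
        """Insert item; return the index of the subtable it was placed in."""
        choices = [bucket_index(item, i, self.size) for i in range(self.d)]
        loads = [len(self.tables[i][choices[i]]) for i in range(self.d)]
        # min() returns the first minimum, so ties are broken to the left.
        best = min(range(self.d), key=lambda i: loads[i])
        self.tables[best][choices[best]].append(item)
        return best

    def contains(self, item: str) -> bool:
        # A lookup checks one bucket in each of the d subtables.
        return any(
            item in self.tables[i][bucket_index(item, i, self.size)]
            for i in range(self.d)
        )

    def max_load(self) -> int:
        return max(len(bucket) for table in self.tables for bucket in table)
```

A lookup touches exactly d buckets, which is why the maximum bucket load, rather than the average, determines how wide each bucket has to be in hardware.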


Properties of d-left Hashing

• Analyzable using both combinatorial methods and differential equations.
  – Maximum load very small: O(log log n).
  – Differential equations give very, very accurate performance estimates.

• Maximum load is extremely close to average load for small values of d.


Example of d-left hashing

• Consider 3-left performance (fraction of buckets with each load).

Load   Average load 4   Average load 6.4
0      2.3e-05          1.7e-08
1      6.0e-04          5.6e-07
2      1.1e-02          1.2e-05
3      1.5e-01          2.1e-04
4      6.6e-01          3.5e-03
5      1.8e-01          5.6e-02
6      2.3e-05          4.8e-01
7      5.6e-31          4.5e-01
8      -                6.2e-03
9      -                4.8e-15


Example of d-left hashing

• Consider 4-left performance with average load of 6, using differential equations.

Fraction of buckets   Insertions only   Alternating insertions/deletions (steady state)
Load ≥ 1              1.0000            1.0000
Load ≥ 2              1.0000            0.9999
Load ≥ 3              1.0000            0.9990
Load ≥ 4              0.9999            0.9920
Load ≥ 5              0.9971            0.9505
Load ≥ 6              0.8747            0.7669
Load ≥ 7              0.1283            0.2894
Load ≥ 8              1.273e-10         0.0023
Load ≥ 9              2.460e-138        1.681e-27


Review: Bloom Filters

• Given a set S = {x1, x2, x3, …, xn} on a universe U, want to answer queries of the form: Is y ∈ S?

• Bloom filter provides an answer in
  – “Constant” time (time to hash).
  – Small amount of space.
  – But with some probability of being wrong.

• Alternative to hashing with interesting tradeoffs.


Bloom Filters

Start with an m bit array, filled with 0s.

B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.

B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

To check if y is in S, check B at Hi(y). All k values must be 1.

B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

Possible to have a false positive; all k values are 1, but y is not in S.

B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

n items, m = cn bits, k hash functions
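The insert and query procedures on this slide can be sketched in a few lines. The class below is illustrative only; in particular, deriving H1..Hk by salting SHA-256 with the index i is an assumption of the example, not something the talk specifies.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m  # B: an m bit array, filled with 0s

    def _positions(self, item: str):
        # Hypothetical H_1..H_k: SHA-256 salted with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for a in self._positions(item):
            self.bits[a] = 1  # set B[a] = 1

    def __contains__(self, item: str) -> bool:
        # All k probed bits must be 1. A "yes" may be a false positive;
        # a "no" is always correct.
        return all(self.bits[a] for a in self._positions(item))
```

With m = 8n bits and k = 5, roughly 2% of non-members should be reported as present, in line with the false positive analysis that follows.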


False Positive Probability

• Pr(specific bit of filter is 0) is $p' = (1 - 1/m)^{kn} \approx e^{-kn/m} = p$.

• If $\rho$ is fraction of 0 bits in the filter then false positive probability is $(1 - \rho)^k \approx (1 - p')^k \approx (1 - p)^k = (1 - e^{-k/c})^k$.

• Approximations valid as $\rho$ is concentrated around $E[\rho]$.
  – Martingale argument suffices.

• Find optimal at k = (ln 2)m/n by calculus.
  – So optimal fpp is about $(0.6185)^{m/n}$.

n items, m = cn bits, k hash functions
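As a quick numerical check of the approximation $(1 - e^{-k/c})^k$ and of the optimum k = (ln 2)m/n, the sketch below tabulates the false positive rate for c = m/n = 8. The function names are made up for this example.

```python
import math


def false_positive_rate(c: float, k: int) -> float:
    """Approximate fpp (1 - e^{-k/c})^k for m = c*n bits and k hash functions."""
    return (1.0 - math.exp(-k / c)) ** k


def optimal_k(c: float) -> float:
    """Real-valued optimum k = (ln 2) * m/n."""
    return math.log(2) * c


c = 8
for k in range(1, 11):
    print(k, f"{false_positive_rate(c, k):.5f}")
print("optimal k (real-valued):", optimal_k(c))
print("fpp at the real-valued optimum:", 0.6185 ** c)
```

Since k must be an integer in practice, the two neighbours of the real-valued optimum (here 5 and 6) are the candidates; the curve is quite flat around the minimum.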


Example

[Figure: false positive rate (vertical axis, 0 to 0.1) versus number of hash functions (horizontal axis, 0 to 10), for m/n = 8. Opt k = 8 ln 2 = 5.545...]

n items, m = cn bits, k hash functions


Handling Deletions

• Bloom filters can handle insertions, but not deletions.

• If deleting xi means resetting 1s to 0s, then deleting xi will “delete” xj.

B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0

xi xj


Counting Bloom Filters

Start with an array of m counters, filled with 0s.

B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Hash each item xj in S k times. If Hi(xj) = a, add 1 to B[a].

B: 0 3 0 0 1 0 2 0 0 3 2 1 0 2 1 0

To delete xj decrement the corresponding counters.

B: 0 2 0 0 0 0 2 0 0 3 2 1 0 1 1 0

Can obtain a corresponding Bloom filter by reducing to 0/1.

B: 0 1 0 0 0 0 1 0 0 1 1 1 0 1 1 0
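A counting Bloom filter differs from the plain filter only in that cells hold counts. The sketch below is a minimal illustration under the same assumption as before (salted SHA-256 stands in for the k hash functions); it does not model counter overflow, and `remove` assumes the item was previously added.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Hypothetical H_1..H_k: SHA-256 salted with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item: str) -> None:
        # Only valid for items that are currently in the set.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[a] > 0 for a in self._positions(item))

    def to_bloom_bits(self):
        """Reduce the counters to the corresponding 0/1 Bloom filter."""
        return [1 if count else 0 for count in self.counters]
```

Deleting one item now leaves the counters touched by other items positive, so the problem described on the "Handling Deletions" slide does not arise.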


Counting Bloom Filters: Overflow

• Must choose counters large enough to avoid overflow.

• Poisson approximation suggests 4 bits/counter.
  – Average load using k = (ln 2)m/n counters is ln 2.
  – Probability a counter has load at least 16: $\approx e^{-\ln 2} (\ln 2)^{16} / 16! \approx 6.78\mathrm{E}{-17}$

• Failsafes possible.
• We assume 4 bits/counter for comparisons.
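The overflow estimate can be reproduced directly from the Poisson approximation. The short sketch below evaluates the Poisson probability of load exactly 16 with mean ln 2, and also the full tail, to show that the first term dominates; the helper name is illustrative.

```python
import math

lam = math.log(2)  # average counter load when k = (ln 2) m/n


def poisson_pmf(j: int, mean: float) -> float:
    return math.exp(-mean) * mean ** j / math.factorial(j)


p16 = poisson_pmf(16, lam)
tail = sum(poisson_pmf(j, lam) for j in range(16, 60))

print(f"Pr(load = 16)  ~ {p16:.3e}")
print(f"Pr(load >= 16) ~ {tail:.3e}")
```

A 4-bit counter holds values 0 to 15, so load 16 is the first value that overflows.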


Bloomier Filters

• Instead of set membership, keep an r-bit function value for each set element.
  – Correct value should be given for each set element.
  – Non-set elements should return NULL with high probability.

• Mutable version: function values can change.
  – But underlying set cannot.

• First suggested in paper by Chazelle, Kilian, Rubinfeld, Tal.


From Low to High

• Low
  – Hash Tables for Hardware
  – New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)

• Medium
  – Approximate Concurrent State Machines
  – Distance-Sensitive Bloom Filters

• High
  – A Distributed Hashing Infrastructure


Low Level: Better Hash Tables for Hardware

• Joint work with Adam Kirsch.
  – Simple Summaries for Hashing with Choices.
  – The Power of One Move: Hashing Schemes for Hardware.


Perfect Hashing Approach

Element 1 Element 2 Element 3 Element 4 Element 5

Fingerprint(4) Fingerprint(5) Fingerprint(2) Fingerprint(1) Fingerprint(3)


Near-Perfect Hash Functions

• Perfect hash functions are challenging.
  – Require all the data up front: no insertions or deletions.
  – Hard to find efficiently in hardware.

• In [BM96], we note that d-left hashing can give near-perfect hash functions.
  – Useful even with insertions, deletions.
  – Some loss in space efficiency.


Near-Perfect Hash Functions via d-left Hashing

• Maximum load equals 1
  – Requires significant space to avoid all collisions, or some small fraction of spillovers.

• Maximum load greater than 1
  – Multiple buckets must be checked, and multiple cells in a bucket must be checked.
  – Not perfect in space usage.

• In practice, 75% space usage is very easy.
• In theory, can do even better.


Hash Table Design : Example

• Desired goals:
  – At most 1 item per bucket.
  – Minimize space.
    • And minimize number of hash functions.
  – Small amount of spillover possible.
    • We model as a constant fraction, e.g. 0.2%.
    • Can be placed in a content-addressable memory (CAM) if small enough.


Basic d-left Scheme

• For hash table holding up to n elements, with max load 1 per bucket, use 4 choices and 2n cells.
  – Spillover of approximately 0.002n elements into CAM.


Improvements from Skew

• For hash table holding up to n elements, with max load 1 per bucket, use 4 choices and 1.8n cells.
  – Subtable sizes 0.79n, 0.51n, 0.32n, 0.18n.
  – Spillover still approximately 0.002n elements into CAM.
  – Subtable sizes optimized using differential equations, black-box optimization.
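The two spillover figures can be checked with a simple large-n approximation. This is a simplification for illustration, not the talk's differential equation analysis: with one cell per bucket and insertions only, an item goes to the leftmost subtable whose chosen bucket is empty, so a subtable of size s·n that receives a·n items ends up with a fraction 1 - e^{-a/s} of its buckets filled. The function name is an assumption of the example.

```python
import math


def spill_fraction(subtable_sizes):
    """Fraction of n inserted items that overflow every subtable.

    subtable_sizes are given as multiples of n, leftmost subtable first.
    """
    arriving = 1.0  # items (as a multiple of n) that reach the current subtable
    for size in subtable_sizes:
        # Buckets hit by at least one arriving item end up holding one item.
        placed = size * (1.0 - math.exp(-arriving / size))
        arriving -= placed
    return arriving


print("balanced, 2n cells:", spill_fraction([0.5, 0.5, 0.5, 0.5]))
print("skewed, 1.8n cells:", spill_fraction([0.79, 0.51, 0.32, 0.18]))
```

Both calls come out near 0.002, which is the point of the slide: skewing the subtable sizes saves about 10% of the cells at the same spillover.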


Summaries to Avoid Lookups

• In hardware, d choices of location can be done by parallelization.
  – Look at d memory banks in parallel.

• But there’s still a cost: pin count.
• Can we keep track of which hash function to use for each item, using a small summary?
  – Yes: use a Bloom-filter like structure to track.
    • Skew impacts summary performance; more skew better.
  – Uses small amount of on-chip memory.
  – Avoids multiple look-ups.
  – Special case of a Bloomier filter.


Hash Tables with Moves

• Cuckoo Hashing (Pagh, Rodler)
  – Hashed items need not stay in their initial place.
  – With multiple choices, can move item to another choice, without affecting lookups.
    • As long as hash values can be recomputed.
  – When inserting, if all spots are filled, new item kicks out an old item, which looks for another spot, and might kick out another item, and so on.
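The kick-out process can be sketched with two subtables and one item per cell. This is a minimal illustration of the idea, with assumptions labelled: the class name, the salted SHA-256 hash, and the `max_kicks` cutoff are choices made for the example, and a failed insert simply reports failure instead of rehashing or spilling to a CAM.

```python
import hashlib


class CuckooTable:
    def __init__(self, size: int, max_kicks: int = 50):
        self.size = size
        self.max_kicks = max_kicks  # hypothetical bound on moves per insert
        self.tables = [[None] * size, [None] * size]

    def _slot(self, item: str, which: int) -> int:
        # Hash values are recomputed from the item, so items can be moved.
        digest = hashlib.sha256(f"{which}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def contains(self, item: str) -> bool:
        return any(self.tables[w][self._slot(item, w)] == item for w in (0, 1))

    def insert(self, item: str) -> bool:
        if self.contains(item):
            return True
        which = 0
        for _ in range(self.max_kicks):
            slot = self._slot(item, which)
            # Place the item in hand and pick up whatever was there.
            item, self.tables[which][slot] = self.tables[which][slot], item
            if item is None:
                return True
            which = 1 - which  # the evicted item tries its other choice
        # Gave up: the item still in hand is not stored. A real design
        # would rehash or move it to an overflow structure.
        return False
```

Lookups always check exactly two cells, no matter how many moves earlier insertions needed.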


Benefits and Problems of Moves

• Benefit: much better space utilization.
  – Multiple choices, multiple items per bucket, can achieve 90+% with no spillover.

• Drawback: complexity.
  – Moves required can grow like log n.
    • Constant on average.
  – Bounded maximum time per operation important in many settings.
  – Moves expensive.
    • Table usually in slow memory.


Question : Power of One Move

• How much leverage do we get by just allowing one move?
  – One move likely to be possible in practice.
  – Simple for hardware.
  – Analysis possible via differential equations.
    • Cuckoo hard to analyze.
  – Downside: some spillover into CAM.


Comparison, Insertions Only

• 4 schemes
  – No moves
  – Conservative: Place item if possible. If not, try to move earliest item that has not already replaced another item to make room. Otherwise spill over.
  – Second chance: Read all possible locations, and for each location with an item, check if it can be placed in the next subtable. Place new item as early as possible, moving up to 1 item left 1 level.
  – Second chance, with 2 per bucket.

• Target of 0.2% spillover.
• Balanced (all subtables the same) and skewed compared.
• All done by differential equation analysis (and simulations match).


Results of Moves : Insertions Only

Scheme         Space overhead, balanced   Space overhead, skewed   Fraction moved, skewed
No moves       2.00                       1.79                     0%
Conservative   1.46                       1.39                     1.6%
Standard       1.41                       1.29                     12.0%
Standard, 2    1.14                       1.06                     14.9%


Conclusions, Moves

• Even one move saves significant space.
  – More aggressive schemes, considering all possible single moves, save even more. (Harder to analyze, more hardware resources.)

• Importance of allowing small amounts of spillover in practical settings.


From Low to High

• Low
  – Hash Tables for Hardware
  – New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)

• Medium
  – Approximate Concurrent State Machines
  – Distance-Sensitive Bloom Filters

• High
  – A Distributed Hashing Infrastructure


Low-Medium: New Bloom Filters / Counting Bloom Filters

• Joint work with Flavio Bonomi, Rina Panigrahy, Sushil Singh, George Varghese.


A New Approach to Bloom Filters

• Folklore Bloom filter construction.
  – Recall: Given a set S = {x1, x2, x3, …, xn} on a universe U, want to answer membership queries.
  – Method: Find an n-cell perfect hash function for S.
    • Maps set of n elements to n cells in a 1-1 manner.
  – Then keep $\lceil \log_2(1/\epsilon) \rceil$ bit fingerprint of item in each cell. Lookups have false positive < $\epsilon$.
  – Advantage: each bit/item reduces false positives by a factor of 1/2, vs. $(1/2)^{\ln 2} \approx 0.6185$ for a standard Bloom filter.

• Negatives:
  – Perfect hash functions non-trivial to find.
  – Cannot handle on-line insertions.


Near-Perfect Hash Functions

• In [BM96], we note that d-left hashing can give near-perfect hash functions.
  – Useful even with deletions.

• Main differences
  – Multiple buckets must be checked, and multiple cells in a bucket must be checked.
  – Not perfect in space usage.

• In practice, 75% space usage is very easy.
• In theory, can do even better.


First Design : Just d-left Hashing

• For a Bloom filter with n elements, use a 3-left hash table with average load 4, 60 bits per bucket divided into 6 fixed-size fingerprints of 10 bits.
  – Overflow rare, can be ignored.

• False positive rate of $12 \times 2^{-10} = 0.01171875$ (3 buckets checked, average load 4, 10-bit fingerprints).
  – Vs. 0.000744 for a standard Bloom filter.

• Problem: Too much empty, wasted space.
  – Other parametrizations similarly impractical.
  – Need to avoid wasting space.


Just Hashing : Picture

Bucket

1011011100 0001110101 1010101000 0000111111 Empty Empty


Key: Dynamic Bit Reassignment

• Use 64-bit buckets: 4 bit counter, 60 bits divided equally among actual fingerprints.
  – Fingerprint size depends on bucket load.

• False positive rate of 0.0008937
  – Vs. 0.0004587 for a standard Bloom filter.

• DBR: Within a factor of 2.
  – And would be better for larger buckets.
  – But 64 bits is a nice bucket size for hardware.

• Can we remove the cost of the counter?
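The DBR figure can be approximately reproduced from the 3-left load fractions at average load 4 listed earlier in the talk. The sketch below rests on assumptions that are mine, not the talk's: a query checks d = 3 buckets, a bucket with load L stores fingerprints of floor(60 / L) bits, fingerprints are uniformly random, and the probabilities are summed as a union bound. Because the load fractions are rounded to two digits, the result only matches the slide to about two significant figures.

```python
# Fraction of buckets with each load: 3-left hashing, average load 4.
LOAD_FRACTION = {
    0: 2.3e-05, 1: 6.0e-04, 2: 1.1e-02, 3: 1.5e-01,
    4: 6.6e-01, 5: 1.8e-01, 6: 2.3e-05, 7: 5.6e-31,
}


def dbr_false_positive_rate(load_fraction, d=3, fingerprint_bits=60):
    """Union-bound estimate of the false positive rate with dynamic bit reassignment."""
    per_bucket = 0.0
    for load, fraction in load_fraction.items():
        if load == 0:
            continue  # an empty bucket cannot produce a match
        bits = fingerprint_bits // load  # fingerprint size depends on load
        per_bucket += fraction * load * 2.0 ** (-bits)
    return d * per_bucket


fixed_size = 3 * 4 * 2.0 ** (-10)  # fixed 10-bit fingerprints, average load 4
print("fixed 10-bit fingerprints:", fixed_size)
print("dynamic bit reassignment :", dbr_false_positive_rate(LOAD_FRACTION))
```

Almost all of the remaining error comes from buckets with load 5, where each fingerprint has only 12 bits.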


DBR : Picture

Bucket

Count : 4

000110110101111 010100001101010 101000101010110 101010101101011


Semi-Sorting

• Fingerprints in bucket can be in any order.– Semi-sorting: keep sorted by first bit.

• Use counter to track #fingerprints and #fingerprints starting with 0.

• First bit can then be erased, implicitly given by counter info.

• Can extend to first two bits (or more) but added complexity.


DBR + Semi-sorting : Picture

Bucket

Count : 4,2

000110110101111 010100001101010 101000101010110 101010101101011


DBR + Semi-Sorting Results

• Using 64-bit buckets, 4 bit counter.
  – Semi-sorting on loads 4 and 5.
  – Counter only handles up to load 6.
  – False positive rate of 0.0004477
    • Vs. 0.0004587 for a standard Bloom filter.
  – This is the tradeoff point.

• Using 128-bit buckets, 8 bit counter, 3-left hash table with average load 6.4.
  – Semi-sorting all loads: fpr of 0.00004529
  – 2 bit semi-sorting for loads 6/7: fpr of 0.00002425
    • Vs. 0.00006713 for a standard Bloom filter.


Additional Issues

• Further possible improvements
  – Group buckets to form super-buckets that share bits.
  – Conjecture: Most further improvements are not worth it in terms of implementation cost.

• Moving items for better balance?
• Underloaded case.
  – New structure maintains good performance.


Improvements to Counting Bloom Filter

• Similar ideas can be used to develop an improved Counting Bloom Filter structure.
  – Same idea: use fingerprints and a d-left hash table.

• Counting Bloom Filters waste lots of space.
  – Lots of bits to record counts of 0.

• Our structure beats standard CBFs easily, by factors of 2 or more in space.
  – Even without dynamic bit reassignment.


Deletion Problem

Suppose x and y have the same fingerprint z.

[Figure: three steps, “Insert x”, “Insert y”, and “Delete x?”, showing the candidate locations of x and y and the cells holding fingerprint z.]


Deletion Problem

• When you delete, if you see the same fingerprint at two of the location choices, you don’t know which is the right one.
  – Take both out: false negatives.
  – Take neither out: false positives/eventual overflow.


Handling the Deletion Problem

• Want to make sure the fingerprint for an element cannot appear in two locations.

• Solution: make sure it can’t happen.
  – Trick: use (pseudo)random permutations instead of hashing.


Two Stages

• Suppose we have d subtables, each with $2^b$ buckets, and want f bit fingerprints.
• Stage 1: Hash element x into b+f bits using a “strong” hash function H(x).
• Stage 2: Apply d permutations taking $\{0, \ldots, 2^{b+f}-1\} \to \{0, \ldots, 2^{b+f}-1\}$: $\pi_i(H(x)) = (B_i, F_i)$
  – Bucket $B_i$ and fingerprint $F_i$ for ith subtable given by ith permutation.
  – Also, $B_i$ and $F_i$ completely determine H(x).


Handling the Deletion Problem

• Lemma: if x and y yield the same fingerprint in the same bucket, then H(x) = H(y).
  – Proof: because of permutation setup, fingerprint and bucket determine H(x).

• Each cell has a small counter.
  – In case two elements have same hash, H(x) = H(y).
  – Note they would match for all buckets/fingerprints.
  – 2 bit counters generally suffice.

• Deletion problem avoided.
  – Can’t have two fingerprints for x in the table at the same time; handled by the counter.


A Problem for Analysis

• Permutations imply no longer “pure” d-left hashing.
  – Dependence.
  – Analysis no longer applies.

• Some justification:
  – Balanced Allocation on Graphs (Kenthapadi and Panigrahy, SODA 2006).
  – Differential equations.

• Justified experimentally.


Other Practical Issues

• Simple, linear permutations: $\pi_i(H(x)) = a H(x) \bmod 2^{b+f}$, a odd.
  – High order bits for bucket, low order for fingerprint.
  – Not analyzed, works fine in practice.

• Invertible permutations allow moving elements if hash table overflows.
  – Move element from overflow bucket to another choice.
  – Powerful paradigm…
    • Cuckoo hashing and related schemes.
  – But more expensive in implementation terms.
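The two-stage scheme with linear permutations can be sketched as follows. The bit widths, the odd multipliers, and the use of truncated SHA-256 as the "strong" hash H are all hypothetical choices for the example. Multiplication by an odd constant modulo a power of two is a bijection, which is what makes the bucket and fingerprint determine H(x). The three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
import hashlib

B_BITS, F_BITS = 10, 12          # hypothetical b and f
MOD = 1 << (B_BITS + F_BITS)     # values live in {0, ..., 2^(b+f) - 1}
# Hypothetical odd multipliers, one per subtable (d = 4 here).
MULTIPLIERS = [0x2F0B3, 0x1A2C5, 0x3D4E7, 0x0B6F9]


def strong_hash(x: str) -> int:
    """Stage 1: hash x into b+f bits (truncated SHA-256 stands in for H)."""
    digest = hashlib.sha256(x.encode()).digest()
    return int.from_bytes(digest[:8], "big") % MOD


def permute(h: int, i: int):
    """Stage 2: return (bucket, fingerprint) for subtable i."""
    v = (MULTIPLIERS[i] * h) % MOD
    # High order bits give the bucket, low order bits the fingerprint.
    return v >> F_BITS, v & ((1 << F_BITS) - 1)


def invert(bucket: int, fingerprint: int, i: int) -> int:
    """Recover H(x) from the bucket and fingerprint stored in subtable i."""
    v = (bucket << F_BITS) | fingerprint
    return (pow(MULTIPLIERS[i], -1, MOD) * v) % MOD


h = strong_hash("flow-123456")
for i in range(len(MULTIPLIERS)):
    bucket, fingerprint = permute(h, i)
    assert invert(bucket, fingerprint, i) == h
```

Because `invert` recovers H(x) from any stored cell, an element can be moved to one of its other choices without access to the original key, which is the property the second bullet relies on.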


Space Comparison : Theory

• Standard counting Bloom filter uses c counters/element = 4c bits/element.

• The d-left CBF using r bit remainders, 4 hash functions, 8 cells/bucket uses 4(r+2)/3 bits/element.

• Space equalized when c = (r+2)/3.

• Can change parameters to get other tradeoffs.

Standard false pos $\approx 2^{-c \ln 2}$; d-left false pos $\approx 24 \cdot 2^{-3c+2}$
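To make the comparison concrete, the sketch below solves each formula for the space needed at a target false positive rate. It assumes the parameters on this slide (4 bits per counter for the standard CBF; 4 hash functions, 8 cells per bucket, 2-bit counters per cell, and hence 4(r+2)/3 bits per element for the d-left CBF) and rounds the remainder size r up to a whole number of bits. The function names are illustrative.

```python
import math


def standard_cbf_bits(fpr: float) -> float:
    """Bits/element for a standard CBF: solve 2^(-c ln 2) = fpr, then 4c bits."""
    c = math.log2(1.0 / fpr) / math.log(2)
    return 4.0 * c


def dleft_cbf_bits(fpr: float) -> float:
    """Bits/element for the d-left CBF: solve 24 * 2^(-r) = fpr, then 4(r+2)/3 bits."""
    r = math.ceil(math.log2(24.0 / fpr))  # whole number of remainder bits
    return 4.0 * (r + 2) / 3.0


for fpr in (0.01, 0.001):
    std, dleft = standard_cbf_bits(fpr), dleft_cbf_bits(fpr)
    print(f"fpr {fpr}: standard {std:.1f} bits, d-left {dleft:.1f} bits, "
          f"ratio {std / dleft:.2f}")
```

Under these assumptions the ratio comes out a little above 2 at a 1% false positive rate and a little above 2.5 at 0.1%.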


Space Comparison : Practice

• Everything behaves essentially according to expectations.
  – Not surprising: everything is a “balls-and-bins” process.

• Using 4-left hashing:
  – Save over a factor of 2 in space with 1% false positive rate.
  – Save over a factor of 2.5 in space with 0.1% false positive rate.


From Low to High

• Low
  – Hash Tables for Hardware
  – New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)

• Medium
  – Approximate Concurrent State Machines
  – Distance-Sensitive Bloom Filters

• High
  – A Distributed Hashing Infrastructure


Approximate Concurrent State Machines

• Joint work with Flavio Bonomi, Rina Panigrahy, Sushil Singh, George Varghese.

• Extending the Bloomier filter idea to handle dynamic sets, dynamic function values, in a practical setting.


Approximate Concurrent State Machines

• Model for ACSMs
  – We have underlying state machine, states 1…X.
  – Lots of concurrent flows.
  – Want to track state per flow.
  – Dynamic: Need to insert new flows and delete terminating flows.
  – Can allow some errors.
  – Space, hardware-level simplicity are key.


Motivation: Router State Problem

• Suppose each flow has a state to be tracked. Applications:
  – Intrusion detection
  – Quality of service
  – Distinguishing P2P traffic
  – Video congestion control
  – Potentially, lots of others!

• Want to track state for each flow.
  – But compactly; routers have small space.
  – Flow IDs can be ~100 bits. Can’t keep a big lookup table for hundreds of thousands or millions of flows!


Problems to Be Dealt With

• Keeping state values with small space, small probability of errors.
• Handling deletions.
• Graceful reaction to adversarial/erroneous behavior.
  – Invalid transitions.
  – Non-terminating flows.
    • Could fill structure if not eventually removed.
  – Useful to consider data structures in well-behaved systems and ill-behaved systems.


ACSM Basics

• Operations
  – Insert new flow, state
  – Modify flow state
  – Delete a flow
  – Lookup flow state

• Errors
  – False positive: return state for non-extant flow
  – False negative: no state for an extant flow
  – False return: return wrong state for an extant flow
  – Don’t know: return don’t know
    • Don’t know may be better than other types of errors for many applications, e.g., slow path vs. fast path.


ACSM via Counting Bloom Filters

• Dynamically track a set of current (FlowID,FlowState) pairs using a CBF.

• Consider first when system is well-behaved.
  – Insertion easy.
  – Lookups, deletions, modifications are easy when current state is given.
    • If not, have to search over all possible states. Slow, and can lead to don’t knows for lookups, other errors for deletions.


Direct Bloom Filter (DBF) Example

Before: 0 0 1 0 2 3 0 0 2 1 0 1 1 2 0 0

(123456,3) → (123456,5)

After: 0 0 0 0 1 3 0 0 3 1 1 1 1 2 0 0


Timing-Based Deletion

• Motivation: Try to turn non-terminating flow problem into an advantage.

• Add a 1-bit flag to each cell, and a timer.
  – If a cell is not “touched” in a phase, 0 it out.

• Non-terminating flows eventually zeroed.
• Counters can be smaller or non-existent, since deletions occur via timing.
• Timing-based deletion required for all of our schemes.


Timer Example

Cells:                  3 0 0 2 1 0 1 1
Timer bits:             1 0 0 0 1 0 1 0

RESET

Cells after reset:      3 0 0 0 1 0 1 0
Timer bits after reset: 0 0 0 0 0 0 0 0
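The end-of-phase reset in this example amounts to one pass over the cells. The function below is a small sketch with an illustrative name; it reproduces the numbers in the timer example.

```python
def end_of_phase(cells, timer_bits):
    """Zero every cell that was not touched this phase, then clear all timer bits."""
    new_cells = [cell if touched else 0 for cell, touched in zip(cells, timer_bits)]
    new_timer_bits = [0] * len(cells)
    return new_cells, new_timer_bits


cells = [3, 0, 0, 2, 1, 0, 1, 1]
timer_bits = [1, 0, 0, 0, 1, 0, 1, 0]
print(end_of_phase(cells, timer_bits))
```

During a phase, any operation that touches a cell would set its timer bit back to 1, so only cells belonging to idle flows are cleared.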


Stateful Bloom Filters

• Each flow hashed to k cells, like a Bloom filter.
• Each cell stores a state.
• If two flows collide at a cell, cell takes on don’t know value.
• On lookup, as long as one cell has a state value, and there are not contradicting state values, return state.
• Deletions handled by timing mechanism (or counters in well-behaved systems).
• Similar in spirit to [KM], Bloom filter summaries for multiple choice hash tables.


Stateful Bloom Filter (SBF) Example

Before: 1 4 3 4 3 3 0 0 2 1 0 1 4 ? 0 2

(123456,3) → (123456,5)

After: 1 4 5 4 5 3 0 0 2 1 0 1 4 ? 0 2
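The stateful Bloom filter rules can be sketched as below. This is an illustration under assumptions, not the talk's design: states are positive integers with 0 meaning empty, "?" marks a don't know cell, salted SHA-256 stands in for the k hash functions, and deletion (by timing or counters) is left out.

```python
import hashlib

EMPTY, DONT_KNOW = 0, "?"


class StatefulBloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.cells = [EMPTY] * m

    def _positions(self, flow_id: str):
        # Hypothetical hashes; a set avoids visiting the same cell twice.
        return {
            int.from_bytes(
                hashlib.sha256(f"{i}:{flow_id}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        }

    def insert(self, flow_id: str, state: int) -> None:
        for a in self._positions(flow_id):
            # An occupied cell means another flow hashed here: a collision.
            self.cells[a] = state if self.cells[a] == EMPTY else DONT_KNOW

    def modify(self, flow_id: str, new_state: int) -> None:
        for a in self._positions(flow_id):
            if self.cells[a] != DONT_KNOW:  # collided cells stay don't know
                self.cells[a] = new_state

    def lookup(self, flow_id: str):
        values = [self.cells[a] for a in self._positions(flow_id)]
        if EMPTY in values:
            return None  # flow is not in the structure
        states = {v for v in values if v != DONT_KNOW}
        # Exactly one state value seen: return it. None or several: don't know.
        return states.pop() if len(states) == 1 else DONT_KNOW
```

For instance, inserting flow 123456 in state 3 and then modifying it to state 5 rewrites only that flow's non-collided cells, as in the example arrays.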


What We Need : A New Design

• These Bloom filter generalizations were not doing the job.
  – Poor performance experimentally.

• Maybe we need a new design for Bloom filters!

• In real life, things went the other way; we designed a new ACSM structure, and found that it led to the new Bloom filter/counting Bloom filter designs.


Fingerprint Compressed Filter

• Each flow hashed to d choices in the table, placed at the least loaded.
  – Fingerprint and state stored.

• Deletions handled by timing mechanism or explicitly.
• False positives/negatives can still occur (especially in ill-behaved systems).
• Lots of parameters: number of hash functions, cells per bucket, fingerprint size, etc.
  – Useful for flexible design.


Fingerprint Compressed Filter (FCF) Example

Fingerprint         State
11111111110000000   4
00011110011101101   1
11110111001001011   2
11110101001000111   2
11100010010111110   1
01110010001011111   3
10101110010101011   2
01110010010101111   6
01110100100010111   1
10001110011111100   3

x : 11110111001001011 : State 2 to State 4

Page 69

Experiment Summary

• FCF-based ACSM is the clear winner.
– Better performance with less space than the others in test situations.

• ACSM performance seems reasonable:
– Sub 1% error rates with reasonable size.

Page 70

Distance-Sensitive Bloom Filters

• Instead of answering questions of the form

Is $y \in S$?

we would like to answer questions of the form

Is $y \approx x \in S$?

• That is, is the query close to some element of the set, under some metric and some notion of close.

• Applications:
– DNA matching
– Virus/worm matching
– Databases

Page 71

Distance-Sensitive Bloom Filters

• Goal: something in same spirit as Bloom filters.
– Don’t exhaustively check set.

• Initial results for Hamming distance show it is possible. [KM]

• Closely related to locality-sensitive hashing.

• Not currently practical.

• New ideas?
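To make the connection to locality-sensitive hashing concrete, here is a toy Python sketch for Hamming distance on fixed-width integer keys. It is not the [KM] construction: the bit-sampling hash functions, the parameter values, the 0.75 threshold, and all names are assumptions chosen only to illustrate the idea of answering "is the query close to some element?" without scanning the set.

```python
import hashlib
import random


class DistanceSensitiveFilter:
    """Toy filter: several bit arrays, each indexed by a bit-sampling LSH."""

    def __init__(self, key_bits=64, num_arrays=32, sample_bits=8,
                 array_size=1024, threshold=0.75, seed=0):
        rng = random.Random(seed)
        # Each array gets its own fixed subset of bit positions to sample.
        self.samples = [rng.sample(range(key_bits), sample_bits)
                        for _ in range(num_arrays)]
        self.arrays = [[False] * array_size for _ in range(num_arrays)]
        self.array_size = array_size
        self.min_hits = int(threshold * num_arrays)

    def _index(self, i, key):
        # Keys that agree on the sampled bits land on the same cell.
        sampled = tuple((key >> pos) & 1 for pos in self.samples[i])
        digest = hashlib.sha256(f"{i}:{sampled}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.array_size

    def insert(self, key):
        for i, arr in enumerate(self.arrays):
            arr[self._index(i, key)] = True

    def query(self, key):
        # Close keys differ in few bits, so most arrays still report a hit.
        hits = sum(arr[self._index(i, key)] for i, arr in enumerate(self.arrays))
        return hits >= self.min_hits


dsf = DistanceSensitiveFilter()
dsf.insert(0x0123456789ABCDEF)
print(dsf.query(0x0123456789ABCDEF))
```

A key that differs from a stored key in a few bits still matches in every array whose sampled positions avoid those bits, so the hit count degrades gradually with distance. How to set the threshold and array sizes to get useful guarantees is exactly the part this sketch does not address.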

Page 72

From Low to High

• Low
– Hash Tables for Hardware
– New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)

• Medium
– Approximate Concurrent State Machines
– Distance-Sensitive Bloom Filters

• High
– A Distributed Hashing Infrastructure

Page 73

A Distributed Router Infrastructure

• Recently funded FIND proposal.

• Looking for ideas/collaborators.

Page 74

The High-Level Pitch

• Lots of hash-based schemes being designed for approximate measurement/monitoring tasks.
– But not built into the system to begin with.

• Want a flexible router architecture that allows:
– New methods to be easily added.
– Distributed cooperation using such schemes.

Page 75

What We Need

[Figure: architecture diagram. Box labels: Memory (On-Chip Memory, Off-Chip Memory, CAM(s)); Computation (Hashing Computation Unit, Unit for Other Computation); Communication + Control (Control System, Communication Architecture); Programming Language.]

Page 76

Lots of Design Questions

• How much space for various memory levels? How can we dynamically divide memory among multiple competing applications?

• What hash functions should be included? How open should system be to new hash functions?

• What programming functionality should be included? What programming language to use?

• What communication is necessary to achieve distributed monitoring tasks given the architecture?

• Should security be a consideration? What security approaches are possible?

• And so on…

Page 77

Related Theory Work

• What hash functions should be included?
– Joint work with Salil Vadhan.
– Using theory of randomness extraction, we show that for d-left hashing, Bloom filters, and other hashing methods, choosing a hash function from a pairwise independent family is enough, if data has sufficient entropy.

• Behavior matches truly random hash function with high probability.

• Randomness of hash function and data “combine”.

• Pairwise independence enough for many applications.
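For readers who have not seen one, a standard pairwise independent family is h(x) = ((a·x + b) mod p) mod m with p prime and a, b chosen at random. The Python sketch below shows it; the class name, the choice of the Mersenne prime 2^61 − 1, and the restriction to integer keys smaller than p are assumptions for illustration. The final reduction mod m makes the family only approximately pairwise independent over the m buckets.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1   # prime modulus; keys are assumed smaller than this


class PairwiseHash:
    """h(x) = ((a*x + b) mod p) mod m, with a and b drawn at random."""

    def __init__(self, num_buckets, a=None, b=None, rng=random):
        self.m = num_buckets
        # Passing a and b explicitly pins down one member of the family.
        self.a = a if a is not None else rng.randrange(1, MERSENNE_PRIME)
        self.b = b if b is not None else rng.randrange(0, MERSENNE_PRIME)

    def __call__(self, key):
        return ((self.a * key + self.b) % MERSENNE_PRIME) % self.m


h = PairwiseHash(num_buckets=10, a=3, b=5)
print(h(7))   # (3*7 + 5) mod p = 26, and 26 mod 10 = 6
```

Only two random words (a and b) define the function, which is why such families are attractive for hardware; the result on this slide says that when the data itself has enough entropy, this limited randomness behaves like a truly random hash function.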

Page 78

Conclusions and Future Work

• Low: Mapping current hashing techniques to hardware is fruitful for practice.

• Medium: Big boom in hashing-based algorithms/data structures. Trend is likely to continue.
– Approximate concurrent state machines: Natural progression from set membership to functions (Bloomier filter) to state machines. What is next?
– Power of d-left hashing variants for near-perfect matchings.

• High: Wide open. Need to systematize our knowledge for next generation systems.
– Measurement and monitoring infrastructure built into the system.