structures from the future: bloom filters, · data structures from the future: bloom filters,...

111
Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC [email protected] 1 Thursday, November 11, 2010

Upload: others

Post on 27-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Data Structures from the Future:Bloom Filters, Distributed Hash Tables, and More!Tom Limoncelli, Google NYC

[email protected]

1Thursday, November 11, 2010

Page 2: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Why am I here?

I have no idea.

2Thursday, November 11, 2010

Page 3: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Why are you here?

I have 3 theories...

3Thursday, November 11, 2010

Page 4: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Why are you here?

1. You thought this was the Dreamworks talk.

4Thursday, November 11, 2010

Page 5: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Why are you here?

2. You’re still drunk from last night.

5Thursday, November 11, 2010

Page 6: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Why are you here?

3. You can’t manage what you don’t understand.

6Thursday, November 11, 2010

Page 7: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Overview

1. Hashes & Caches

2. Bloom Filters

3. Distributed Hash Tables (DHTs)

4. Key/Value Stores (NoSQL)

5. Google Bigtable

7Thursday, November 11, 2010

Page 8: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Disclaimer #1

There will be hand-waving.

The Presence of Slides

!=

“Being Prepared”

8Thursday, November 11, 2010

Page 9: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Disclaimer #2

You could learn most of this from Wikipedia. Really. Did I mention they’re talking about

Shrek in the other room?

9Thursday, November 11, 2010

Page 10: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Disclaimer #3

My LISA 2008 talk also conflicted with a talk from

Dreamworks.

10Thursday, November 11, 2010

Page 11: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

To understand this talk, you must understand:! ! Hashes! ! Caches

11Thursday, November 11, 2010

Page 12: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hashes

12Thursday, November 11, 2010

Page 13: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a Hash?

A fixed-size summary of a large amount of data.

13Thursday, November 11, 2010

Page 14: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Checksum

Simple checksum:

Sum the byte values. Take the last digit of the total.

Pros: Easy. Cons: Change order, same checksum.

Improvement: Cyclic Redundancy Check

Detects change in order.

14Thursday, November 11, 2010

Page 15: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hash

“Cryptographically Unique”

Difficult to generate 2 files with the same MD5 hash

Even more difficult to make a “valid second file”:

The second file is a valid example of the same format. (i.e. both are HTML files)

15Thursday, November 11, 2010

Page 16: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

How do crypto hashes work?

“It works because of math.”Matt Blaze, Ph.D

16Thursday, November 11, 2010

Page 17: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Reversible/Irreversible Functions

[ ] + 105 = 205

[ ] mod 10 = 4

17Thursday, November 11, 2010

Page 18: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Some common hashes

MD4

MD5

SHA1

SHA2

AES-Hash

18Thursday, November 11, 2010

Page 19: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hashes

19Thursday, November 11, 2010

Page 20: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Caches

20Thursday, November 11, 2010

Page 21: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a Cache?

Using a small/expensive/fast thing to make a big/cheap/slow thing faster.

21Thursday, November 11, 2010

Page 22: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Database

Big, Slow, Cheap

UserCache

Fast butexpensive.

22Thursday, November 11, 2010

Page 23: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Metric used to grade?

The “hit rate”: hits / total queries

How to tune?

Add additional storage

Smallest increment: Result size.

23Thursday, November 11, 2010

Page 24: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Suppose cache is X times faster

...but Y times more expensive

Balance cost of cache vs. savings you can get:

Web cache achieves 30% hit rate, costs $/MB

33% of cachable traffic costs $/MB from ISP.

What about non-cachable traffic?

What about query size?

24Thursday, November 11, 2010

Page 25: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Value of next increment is less than the previous:

10 units of cache achieves 30% hit rate

+10 units, hit rate goes to 32%

+10 more units, hit rate goes to 33% 0

25

50

75

100

10 20 30

$/unit # units

25Thursday, November 11, 2010

Page 26: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Data

Big, Slow, Cheap

UserCache

Fast butexpensive.

26Thursday, November 11, 2010

Page 27: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Data

Big, Slow, Cheap

NYC

Cache

Fast butexpensive.

CHI

LAX

Cache

Cache

Cache

27Thursday, November 11, 2010

Page 28: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

SimpleCache

NCACHE Intelligent

Add new data?

Ok Not found Ok

Delete data? Stale Stale Ok

Modify data? Stale Stale Ok

28Thursday, November 11, 2010

Page 29: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Caches

29Thursday, November 11, 2010

Page 30: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bloom Filters

30Thursday, November 11, 2010

Page 31: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a Bloom Filter?

Knowing when NOT to waste time seeking out data.

Invented in Burton Howard Bloom in 2070

31Thursday, November 11, 2010

Page 32: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a Bloom Filter?

Knowing when NOT to waste time seeking out data.

Invented in Burton Howard Bloom in 1970

32Thursday, November 11, 2010

Page 33: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

I invented Bloom Filters when I was 10 years old.

33Thursday, November 11, 2010

Page 34: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

34Thursday, November 11, 2010

Page 35: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Data

Big, Slow, Cheap

UserBloom

(Or, precocious10 year old)

35Thursday, November 11, 2010

Page 36: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Using the last 3 bits of hash:

000001010011100 101110111

Olson 000100001111Polk 000000000011Smith 001011101110Singh 001000011110

36Thursday, November 11, 2010

Page 37: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Using the last 3 bits of hash:

000001010011100 101110111

Olson 000100001111Polk 000000000011Smith 001011101110Singh 001000011110

Lakey 111110000000Baird 001011011111Camp 001101001010Johns 010100010100Burd 111000001101Bloom 110111000011

37Thursday, November 11, 2010

Page 38: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

10001001101010111100 110111101111

Using the last 4 bits of hash:

00000001001000110100 010101100111

Olson 000100001111Polk 000000000011Smith 001011101110Singh 001000011110

Lakey 111110000000Baird 001011011111Camp 001101001010Johns 010100010100Burd 111000001101Bloom 110111000011 7/16 = 44%

38Thursday, November 11, 2010

Page 39: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

bits of hash

# Entries Bytes <25% 1’s

3 2^3 8 1 2

4 2^4 16 2 4

5 2^5 32 4 8

6 2^6 64 8 16

7 2^7 128 16 32

8 2^8 256 32 64

20 2^8 1048576 131072 262144

24 2^32 16777216 2M 4.1 Million

32 2^64 4294967296 512M 1 Billion

39Thursday, November 11, 2010

Page 40: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

When to use? Sparse Data

When to tune: When more than x% are “1”

Pitfall: To resize, must rescan all keys.

Minimum Increment doubles memory usage:

Each increment is MORE USEFUL than the previous.

But exponentially MORE EXPENSIVE!

40Thursday, November 11, 2010

Page 41: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Databases: Accelerate lookups of indices.

Simulations: Often having, big, sparse databases.

Routers: Speeds up route table lookups.

Bloom Filter sample uses

41Thursday, November 11, 2010

Page 42: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Distributed Bloom Filters?

42Thursday, November 11, 2010

Page 43: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Data

NYC

BFCHI

LAX

BF

BF

BF

43Thursday, November 11, 2010

Page 44: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What if your Bloom Filter is out of date?

New data added: BAD. Clients may not see it.

Data changed: Ok

Data deleted: Ok, but not as efficient.

44Thursday, November 11, 2010

Page 45: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

How to perform updates?

Master calculates bitmap once.

Sends it to all clients

For a 20-bit table, that’s 130K. Smaller than most GIFs!

Reasonable for daily, hourly, updates.

45Thursday, November 11, 2010

Page 46: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

46Thursday, November 11, 2010

Page 47: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Big Bloom Filters often use 96, 120 or 160 bits!

47Thursday, November 11, 2010

Page 48: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bloom Filters

48Thursday, November 11, 2010

Page 49: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hash Tables

49Thursday, November 11, 2010

Page 50: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a Hash Table?

It’s like an array.

But the index can be anything “hashable”.

50Thursday, November 11, 2010

Page 51: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hash tablesPerl hash:

$thing{’b’} = 123;

$thing{‘key2’} = ”value2”;

print $thing{‘key2’};

Python Dictionary or “dict”:

thing = {}

thing[‘b’] = 123

thing[‘key2’] = “value2”

print thing[‘key2’]

51Thursday, November 11, 2010

Page 52: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

hash(‘cow’) = 78f825

hash(‘bee’) = 92eb5f

Bucket Data

78f825 (“cow”, “moo”)

92eb5f (“bee”, “buzz”) , (‘sheep’, ‘baah’)

hash(‘sheep’) = 92eb5f

52Thursday, November 11, 2010

Page 53: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Hash Tables

53Thursday, November 11, 2010

Page 54: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Distributed Hash Tables(DHTs)

54Thursday, November 11, 2010

Page 55: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

What is a DHT?

A hash table so big you have to spread it over multiple of machines.

55Thursday, November 11, 2010

Page 56: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Wouldn’t an infinitely large hash table be awesome?

56Thursday, November 11, 2010

Page 57: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Web server

lookup(url) -> page contents

‘index.html’ -> ‘<html><head>...’

‘/images/smile.png’ -> 0x4d4d2a...

57Thursday, November 11, 2010

Page 58: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Virtual Web server

lookup(vhost/url) -> page contents

‘cnn.com/index.html’ -> ‘<html><he...’

‘time.com/images/smile.png’ -> 0x4d...

58Thursday, November 11, 2010

Page 59: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Virtual FTP server

lookup(host:path/file) -> file contents

‘ftp.gnu.org:public/gcc.tgz’

‘ftp.usenix.org:public/usenix.bib’

59Thursday, November 11, 2010

Page 60: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

NFS server

lookup(host:path/file) -> file contents

‘srv1:home/tlim/Documents/foo.txt’ -> file contents

‘srv2:home/tlim/TODO.txt‘ -> file contents

60Thursday, November 11, 2010

Page 61: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Usenet (remember usenet?)

lookup(group:groupname:artnumber) -> article

lookup(‘group:comp.sci.math:987765’)

lookup(id:message-id) -> pointer

lookup(‘id:foo-12345@uunet’) -> ‘group:comp.sci.math:987765’

61Thursday, November 11, 2010

Page 62: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

IMAP

lookup(‘server:user:folder:NNNN’)

-> email message

62Thursday, November 11, 2010

Page 63: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Our DVD Collection

hash(disc image) -> disc image

How do I find a particular disk?

Keep a lookup table of name -> hash

Benefit: Two people with the same DVD? It only gets stored once.

63Thursday, November 11, 2010

Page 64: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

How would this work?

64Thursday, November 11, 2010

Page 65: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

001

Load it up!Root Host

01001001110110010001000101100011

11100010100101101001110100110111

0011000000000100

4

65Thursday, November 11, 2010

Page 66: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

11

0

SplitRoot Host

01001001110110010001000101100011

11100010100101101001110100110111

0011000000000100

3

2

011000011110110001000000011010110010111000000001

3

001100010111100074

66Thursday, November 11, 2010

Page 67: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

11

0

’01...’Root Host

01001001110110010001000101100011

11100010100101101001110100110111

0011000000000100

3

2

011000011110110001000000011010110010111000000001

3

001100010111100074

67Thursday, November 11, 2010

Page 68: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

11

0

‘0...’Root Host

01001001110110010001000101100011

11100010100101101001110100110111

0011000000000100

3

2

011000011110110001000000011010110010111000000001

3

001100010111100074

68Thursday, November 11, 2010

Page 69: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

11

0

‘1...’Root Host

01001001110110010001000101100011

11100010100101101001110100110111

0011000000000100

3

2

011000011110110001000000011010110010111000000001

3

001100010111100074

69Thursday, November 11, 2010

Page 70: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

11

0

SplitRoot Host

0100100111011001

0001000101100011

11100010100101101001110100110111

0011000000000100

3

201100001111011000100000001101011

0010111000000001

3

001100010111100074

70Thursday, November 11, 2010

Page 71: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

011

010Root Host

011

000

010001

000011

001

010001

000

010001

000

011

011

010001

000

Find: 0100100111011001...

Root Host

010

010

Find: 0100100111011001...Find: 0100100111011001...

71Thursday, November 11, 2010

Page 72: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Find: 0100110111011...

72Thursday, November 11, 2010

Page 73: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

011

010Root Host

011

000

010001

000011

001

010001

000

010001

000

011

011

010001

000

Find: 0100110111011...

Root Host

010

011

Find: 0100110111011...Find: 0100110111011...

73Thursday, November 11, 2010

Page 74: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Each host stores:

All the data that “leaf” there.

The list of parent nodes talking to it.

The list of children it knows about.

74Thursday, November 11, 2010

Page 75: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Dynamically Adjusting:

Data hashes in “clumps” making some hosts under-full and some hosts over-full.

Host running out of storage?

Split in two. Give half the data to another node.

Host running out of bandwidth?

Clone data and load-balance.

75Thursday, November 11, 2010

Page 76: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

011

010

011

000

010001

000011

001

010001

000

010001

000

011

011

010001

000

Root HostRoot HostRoot HostRoot Host

000000000

010010

000000

001001001001001001

76Thursday, November 11, 2010

Page 77: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Real DHTs in action

Peer 2 Peer file-sharing networks.

Content Delivery Networks (CDNs like Akamai)

Cooperative Caches

77Thursday, November 11, 2010

Page 78: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Distributed Hash Tables(DHTs)

78Thursday, November 11, 2010

Page 79: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key/Value Stores

79Thursday, November 11, 2010

Page 80: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Some commonKey/Value Stores

“NoSQL”

CouchDB

MongoDB

Apache Cassandra

Terrastore

Google Bigtable

80Thursday, November 11, 2010

Page 81: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Name Email Address

Tom Limoncelli [email protected] Main

Street

Mary Smith [email protected] 111 One Street

Joe Bond [email protected] 7 Seventh St

81Thursday, November 11, 2010

Page 82: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Name Email Address

Tom Limoncelli [email protected] 1515 Main Street

Mary Smith [email protected] 111 One Street

Joe Bond [email protected] 7 Seventh St

User Transaction Amount

Tom Limoncelli Deposit 100

Mary Smith Deposit 200

Tom Limoncelli Withdraw 50

82Thursday, November 11, 2010

Page 83: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Id Name Email Address

1 Tom Limoncelli

[email protected] 1515 Main Street

2 Mary Smith [email protected]

111 One Street

3 Joe Bond [email protected] 7 Seventh St

User Id Transaction Amount

1 Deposit 100

2 Deposit 200

1 Withdraw 50

83Thursday, November 11, 2010

Page 84: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Id Name Email Address

1 Tom Limoncelli

[email protected] 1515 Main Street

2 Mary Bond [email protected]

111 One Street

3 Joe Bond [email protected] 7 Seventh St

User Id Transaction Amount

1 Deposit 100

2 Deposit 200

3 Withdraw 50

84Thursday, November 11, 2010

Page 85: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Relational Databases

1st Normal Form

2nd Normal Form

3rd Normal Form

ACID: Atomicity, Consistency, Isolation, Durability

85Thursday, November 11, 2010

Page 86: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key/Value Stores

Keys

Values

BASE: Basically Available, Soft-state, Eventually consistent

86Thursday, November 11, 2010

Page 87: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Eventually?

Who cares! This is the web, not payroll!

Change the address listed in your profile.

Might not propagate to Europe for 15 minutes.

Can you fly to Europe in less than 15 minutes?

And if you could, would you care?

87Thursday, November 11, 2010

Page 88: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key/Value example:Key Value

[email protected] BLOB OF DATA

[email protected] BLOB OF DATA

[email protected] BLOB OF DATA

88Thursday, November 11, 2010

Page 89: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key/Value example:Key Value

[email protected]{ ‘name’: ‘Tom Limoncelli’, ‘address’: ‘1515 Main Street’}

[email protected]{ ‘name’: ‘Mary Smith’, ‘address’: ‘111 One Street’}

[email protected]{ ‘name’: ‘Joe Bond’, ‘address’: ‘7 Seventh St’}

89Thursday, November 11, 2010

Page 90: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Google Protobuf:http://code.google.com/p/protobuf/

Key Value

[email protected]

message Person {" required string name = 1;" optional string address = 2; repeated string phone = 3;}

[email protected]

{ ‘name’: ‘Mary Smith’, ‘address’: ‘111 One Street’, ‘phone’: [‘201-555-3456’, ‘908-444-1111’]}

[email protected]{ ‘name’: ‘Joe Bond’, ‘phone’: [‘862-555-9876’]}

90Thursday, November 11, 2010

Page 91: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key/Value Stores

91Thursday, November 11, 2010

Page 92: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bigtable

92Thursday, November 11, 2010

Page 93: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bigtable

Google’s very very large database.

OSDI'06

http://labs.google.com/papers/bigtable.html

Petabytes of data across thousands of commodity servers.

Web indexing, Google Earth, and Google Finance

93Thursday, November 11, 2010

Page 94: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bigtable Keys

Can be very huge.

Don’t have to have a value! (i.e the value is “null”)

Query by

Key

Key start/stop range (lexigraphical order)

94Thursday, November 11, 2010

Page 95: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Long keys are cool.

Key Value

Main St/123/Apt1 Jones

Main St/123/Apt2 Smith

Main St/200 Olson

Query range:Start: “Main St/123”End: infinity

95Thursday, November 11, 2010

Page 96: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bigtable Values

Values can be huge. Gigabytes.

Multiple values per key, grouped in “families”:

“key:family:family:family:...”

96Thursday, November 11, 2010

Page 97: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Families

Within a family:

Sub-keys that link to data.

Sub-keys are dynamic: no need to pre-define.

Sub-keys can be repeated.

97Thursday, November 11, 2010

Page 98: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Example: Crawl the web

For every URL:

Store the HTML at that location.

Store a list of which URLs link to that URL.

Store the “anchor text” those sites used.

<a href=”URL”>ANCHOR TEXT</a>

98Thursday, November 11, 2010

Page 99: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

http://www.cnn.com

<html>.........</html>

http://tomontime.com

<html>

<p>As you may have read on <a href=”http://www.cnn.com”>my favorite news site</a> there is...

99Thursday, November 11, 2010

Page 100: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Key contents: anchor:tomontime.com anchor:cnnsi.com

com.cnn.www <html>... my favorite news site CNN

Another familyFamily

Key contents: anchor:everythingsysadmin.com

com.tomontime <html>... videos

100Thursday, November 11, 2010

Page 101: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Each Family has its own...

Permissions (who can read/write/admin)

QoS (optimize for speed, storage diversity, etc.)

101Thursday, November 11, 2010

Page 102: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

All updates are timestamped.

Retains at least n recent updates or “never”.

Expired updates are garbage collected “eventually”.

Plus “time”

102Thursday, November 11, 2010

Page 103: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Bigtable

103Thursday, November 11, 2010

Page 104: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Further Reading:

Bigtable:

http://research.google.com

A visual guide to NoSQL:

http://blog.nahurst.com/visual-guide-to-nosql-systems

HashTables, DHTs, everything else

Wikipedia

104Thursday, November 11, 2010

Page 105: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Other futuristic topics:

Stop using “locks”, eliminate all deadlocks:

STM: Software Transactional Memory

Centralized routing: (you’d be surprised)

2 minute overview: www.openflowswitch.org

(the 4 minute demo video is MUCH BETTER)

“Network Coding”: n^2 more bandwidth?

SciAm.com: “Breaking Network Logjams”

105Thursday, November 11, 2010

Page 106: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

Q&A

106Thursday, November 11, 2010

Page 107: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

How to do a query?

107Thursday, November 11, 2010

Page 108: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

KEY VALUE

bird “{ legs=2, horns=0, covering=‘feathers’ }”

cat “{ legs=4, horns=0, covering=‘fur’ }”

dog “{ legs=4, horns=0, covering=‘fur’ }”

spider “{ legs=8, horns=0, covering=‘hair’ }”

unicorn “{ legs=4, horns=1, covering=‘hair’ }”

108Thursday, November 11, 2010

Page 109: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

“Which animals have 4 legs?”

Iterate over entire list

Open up each blob

Parse data

Accumulate list

SLOW!

109Thursday, November 11, 2010

Page 110: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

KEY VALUE

animal:bird “{ legs=2, horns=0, covering=‘feathers’ }”

animal:cat “{ legs=4, horns=0, covering=‘fur’ }”

animal:dog “{ legs=4, horns=0, covering=‘fur’ }”

animal:spider “{ legs=8, horns=0, covering=‘hair’ }”

animal:unicorn “{ legs=4, horns=1, covering=‘hair’ }”

legs:2:bird

legs:4:cat

legs:4:dog

legs:4:unicorn

legs:8:spider

Iterate:Start: “legs:4”End: “legs:5” Up to, but not

including “end”

110Thursday, November 11, 2010

Page 111: Structures from the Future: Bloom Filters, · Data Structures from the Future: Bloom Filters, Distributed Hash Tables, and More! Tom Limoncelli, Google NYC tlim@google.com Thursday,

legs=4 AND covering=fur

More indexes + the “zig zag” algorithm.

More indexed attributes = the slower insertions

Automatic if you use AppEngine’s storage system

111Thursday, November 11, 2010