

Page 1: Improve sketching of Hamming Distance with Error Correcting

Improve sketching of Hamming Distance with Error Correcting

Ely Porat

Bar-Ilan University

Google Inc

Ohad Lipsky

Bar-Ilan University

Check Point Inc

December 2003

Page 2: Improve sketching of Hamming Distance with Error Correcting

Problem Definition (1)

[Diagram: Alice holds a text T_A of length n; Bob holds a text T_B of length n.]

hamm(T_A, T_B)

Given k, a bound on the number of mismatches.

Page 3: Improve sketching of Hamming Distance with Error Correcting

Problem Definition (2)

[Diagram: T_A and T_B, each of length n, are reduced to sketches S_A and S_B, each of size S.]

Calculate hamm(T_A, T_B) given only S_A, S_B

Finding the mistakes

Given k, a bound on the number of mismatches.

Page 4: Improve sketching of Hamming Distance with Error Correcting

Motivations

• Data Bases

• Internet

• Error Correcting

[Diagram: Router A, Router B, Router C, Router D]

Page 5: Improve sketching of Hamming Distance with Error Correcting

Outline:

• Simple Solution

• Error Correcting

• Improved Solution

• Improve more

• Recursion

• File sharing


Page 6: Improve sketching of Hamming Distance with Error Correcting

Simplest Solution - O(k^2 log(1/δ))

• Binary alphabet.

• Allocate k^2 cells.

• Take the input array and hash each bit to one of the cells.

• In each cell, store the XOR of all the values hashed to it.

0 1 1 0

Page 7: Improve sketching of Hamming Distance with Error Correcting

Simplest Solution - O(k^2 log(1/δ))

1 1 0 0

0 1 0 0

Page 8: Improve sketching of Hamming Distance with Error Correcting

Simplest Solution - O(k^2 log(1/δ))

• By the birthday paradox, the probability that two errors fall into the same cell is < 1/2.

• Repeat log(1/δ) times to get failure probability δ.

0 1 1 0
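The simplest solution can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `xor_sketch` and `estimate_distance`, the use of a seeded `random.Random` as the hash function shared by Alice and Bob, and the choice to combine repetitions by taking the maximum are all assumptions made for this example.

```python
import random

def xor_sketch(bits, k, seed):
    """Hash every position into one of k*k cells and XOR its bit into that cell."""
    rng = random.Random(seed)          # same seed on both sides = same hash function
    cells = [0] * (k * k)
    for bit in bits:
        cells[rng.randrange(k * k)] ^= bit
    return cells

def estimate_distance(bits_a, bits_b, k, repetitions):
    """Largest number of differing cells seen over independent repetitions."""
    best = 0
    for seed in range(repetitions):
        sketch_a = xor_sketch(bits_a, k, seed)   # computed by Alice
        sketch_b = xor_sketch(bits_b, k, seed)   # computed by Bob
        differing = sum(x ^ y for x, y in zip(sketch_a, sketch_b))
        best = max(best, differing)
    return best
```

The number of differing cells can never exceed the true Hamming distance, and it equals the distance whenever no two mismatches collide in a cell, which is why keeping the maximum over repetitions is a natural way to combine them.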

Page 9: Improve sketching of Hamming Distance with Error Correcting

Alphabet

• Denote by S the size of the alphabet.

• We can encode each letter by its unary representation.

• The only effect is that each mistake will be counted twice.

0 - 1000000…0
1 - 0100000…0
…
S-1 - 0000000…1

0 - 1000000…0
5 - 0000010…0
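A small Python check of the "counted twice" remark, as a sketch only; `unary_encode` and `hamming` are hypothetical helper names introduced for this example, and letters are assumed to be integers in the range 0..S-1.

```python
def unary_encode(text, alphabet_size):
    """Replace every letter by a block of alphabet_size bits with a single 1."""
    bits = []
    for letter in text:
        code = [0] * alphabet_size
        code[letter] = 1
        bits.extend(code)
    return bits

def hamming(x, y):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for a, b in zip(x, y) if a != b)
```

For instance, the letters 0 and 5 over an alphabet of size 6 encode to 100000 and 000001, which differ in exactly two bits.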

Page 10: Improve sketching of Hamming Distance with Error Correcting

Error correcting - O(k^2 log NS)

• Here we allocate two kinds of k^2 cells: k^2 cells of log S bits and k^2 cells of log NS bits.

5 8 3 2

15 6 7 8

C1[h(A[i])] += A[i]

C2[h(A[i])] += i*A[i]

Page 11: Improve sketching of Hamming Distance with Error Correcting

Error correcting - O(k^2 log NS)

• As before, with probability > 1/2 no two errors fall into the same cell.

5 8 3 2

15 6 7 8

C1[h(A[i])] += A[i]

C2[h(A[i])] += i*A[i]

Page 12: Improve sketching of Hamming Distance with Error Correcting

Error correcting - O(k^2 log NS)

• We get from the red cells:

5 8 3 2

C1[h(A[i])] += A[i]

5 6 3 2

5

3

8 - 6 = 5 - 3

Page 13: Improve sketching of Hamming Distance with Error Correcting

Error correcting - O(k^2 log NS)

• We get from the blue cells:

15 11 7 5

15 9 7 5

5

3

11 - 9 = 2*(5 - 3) => i = 2

C2[h(A[i])] += i*A[i]

0 1 2

Page 14: Improve sketching of Hamming Distance with Error Correcting

Error correcting - O(k^2 log NS)

• The probability of success is about 1/2.

• To lower the failure probability, we run it 3 times.

• Each run yields a list of possible mistakes.

• Output all the mistakes that appear in at least 2 of the 3 runs.
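The two cell arrays and the 2-of-3 vote can be sketched in Python as below. This is an illustrative reading of the slides, not the authors' implementation: the names `build_cells`, `candidate_mistakes` and `find_mistakes` are hypothetical, the hash is assumed to be applied to the position i (so that Alice and Bob place the same position in the same cell), and a seeded `random.Random` stands in for the shared hash function.

```python
import random
from collections import Counter

def build_cells(text, k, seed):
    """Return (C1, C2): per-cell sums of values and of index * value."""
    rng = random.Random(seed)                # shared seed plays the role of h
    c1 = [0] * (k * k)
    c2 = [0] * (k * k)
    for i, value in enumerate(text):
        cell = rng.randrange(k * k)          # assumption: h is applied to position i
        c1[cell] += value
        c2[cell] += i * value
    return c1, c2

def candidate_mistakes(cells_a, cells_b, n):
    """Decode one (index, difference) candidate from every differing cell."""
    (a1, a2), (b1, b2) = cells_a, cells_b
    found = set()
    for x1, x2, y1, y2 in zip(a1, a2, b1, b2):
        d1, d2 = x1 - y1, x2 - y2            # d1 = a_i - b_i, d2 = i * (a_i - b_i)
        if d1 != 0 and d2 % d1 == 0 and 0 <= d2 // d1 < n:
            found.add((d2 // d1, d1))
    return found

def find_mistakes(text_a, text_b, k, runs=3):
    """Keep the candidates reported by at least 2 of the runs."""
    votes = Counter()
    for seed in range(runs):
        cells_a = build_cells(text_a, k, seed)
        cells_b = build_cells(text_b, k, seed)
        votes.update(candidate_mistakes(cells_a, cells_b, len(text_a)))
    return {mistake for mistake, count in votes.items() if count >= 2}
```

With a single mismatch at position 2 where Alice holds 5 and Bob holds 3, every run sees differences 2 and 4 in one cell and decodes i = 4 / 2 = 2, mirroring the 8 - 6 = 5 - 3 and 11 - 9 = 2*(5 - 3) arithmetic on the slides.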

Page 15: Improve sketching of Hamming Distance with Error Correcting

O(k log^2 k) - Solution

• The idea is a two-stage hash:

First stage: hash into k/log k buckets; w.h.p. each bucket receives O(log k) mismatches.

Bar-Yossef, Jayram, Kumar, Sivakumar 03

Page 16: Improve sketching of Hamming Distance with Error Correcting

O(k log^2 k) - Solution

Second stage, inside one bucket: the O(log k) mismatches are hashed into O(log^2 k) cells.

The probability of failure is less than 1/2.

Run it 2 log k times and take the max.

=> failure probability less than 1/k^2

Space = O(log^3 k)

Keep the accumulated XOR in each cell.

Bar-Yossef, Jayram, Kumar, Sivakumar 03

Page 17: Improve sketching of Hamming Distance with Error Correcting

O(k log^2 k) - Solution

k/log k buckets, each with a sketch of size O(log^3 k)

Total: O(k log^2 k)

P(failure) <= (k/log k) * (1/k^2) < 1/k

Bar-Yossef, Jayram, Kumar, Sivakumar 03
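A rough Python sketch of the two-stage layout follows. It is an assumption-laden illustration rather than the construction from the cited paper: the names `two_stage_sketch` and `two_stage_estimate` are hypothetical, all hidden constants are set to 1, logarithms are base 2, and a seeded `random.Random` stands in for the shared hash functions.

```python
import math
import random

def two_stage_sketch(bits, k, seed):
    """k/log k buckets; each holds 2 log k repetitions of log^2 k XOR cells."""
    log_k = max(1, math.ceil(math.log2(k)))
    buckets = max(1, k // log_k)
    cells = log_k * log_k
    reps = 2 * log_k
    rng = random.Random(seed)                 # shared by Alice and Bob
    sketch = [[[0] * cells for _ in range(reps)] for _ in range(buckets)]
    for bit in bits:
        bucket = rng.randrange(buckets)       # first stage: choose a bucket
        for rep in range(reps):               # second stage: one cell per repetition
            sketch[bucket][rep][rng.randrange(cells)] ^= bit
    return sketch

def two_stage_estimate(sketch_a, sketch_b):
    """Per bucket, take the max differing-cell count over repetitions; then sum."""
    total = 0
    for bucket_a, bucket_b in zip(sketch_a, sketch_b):
        total += max(
            sum(x ^ y for x, y in zip(rep_a, rep_b))
            for rep_a, rep_b in zip(bucket_a, bucket_b)
        )
    return total
```

Each bucket stores 2 log k * log^2 k bits, so the k/log k buckets together take O(k log^2 k) bits, matching the total on the slide.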

Page 18: Improve sketching of Hamming Distance with Error Correcting

O(k 2^(log* k) log k) - Idea (recursion)

k/log k

log k/log log k

Pr(F) < 1/log^c k

log k/log log k runs, take max

Page 19: Improve sketching of Hamming Distance with Error Correcting

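Error Correcting O(k log NS)

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

Random coefficients r_0, r_1, r_2, …, with p = Θ(N^3 S):

$$r_i = \begin{cases} \text{random} & \text{w.p. } 1/k \\ 0 & \text{o.w.} \end{cases}$$

$$\phi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p, \qquad \phi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$$

$$\phi_1(T_A) - \phi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake} \\ \text{random} & \text{more than one} \end{cases}$$

The one-mistake case occurs with constant probability.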

Page 20: Improve sketching of Hamming Distance with Error Correcting

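Error Correcting O(k log NS)

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

$$\phi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p, \qquad \phi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$$

$$\phi_1(T_A) - \phi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake} \\ \text{random} & \text{more than one} \end{cases}$$

$$\phi_1'(T_A) = \sum_{i=0}^{n-1} i\, r_i a_i \bmod p, \qquad \phi_1'(T_B) = \sum_{i=0}^{n-1} i\, r_i b_i \bmod p$$

$$\frac{\phi_1'(T_A) - \phi_1'(T_B)}{\phi_1(T_A) - \phi_1(T_B)} = \frac{j\, r_j (a_j - b_j)}{r_j (a_j - b_j)} = j$$

If we are wrong, then w.h.p. j > n.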

Page 21: Improve sketching of Hamming Distance with Error Correcting

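Error Correcting O(k log NS)

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

$$\frac{\phi_1'(T_A) - \phi_1'(T_B)}{\phi_1(T_A) - \phi_1(T_B)} = j$$

From j we obtain r_j, and hence a_j - b_j.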
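The single-mistake decoding described on these slides can be sketched in Python as follows. This is a hedged illustration only: the names `draw_coefficients`, `fingerprints` and `locate_single_mistake` are hypothetical, the fixed Mersenne prime `P` is a stand-in for the slide's modulus p, and the modular inverse via `pow(x, -1, P)` assumes Python 3.8 or later.

```python
import random

P = 2_147_483_647  # 2**31 - 1, a prime; stand-in for the slide's modulus p

def draw_coefficients(n, k, seed):
    """r_i is a random nonzero residue with probability 1/k, and 0 otherwise."""
    rng = random.Random(seed)                  # seed shared by Alice and Bob
    return [rng.randrange(1, P) if rng.random() < 1.0 / k else 0
            for _ in range(n)]

def fingerprints(text, r):
    """Return the pair (sum of r_i * a_i, sum of i * r_i * a_i), both mod P."""
    plain = sum(ri * a for ri, a in zip(r, text)) % P
    weighted = sum(i * ri * a for i, (ri, a) in enumerate(zip(r, text))) % P
    return plain, weighted

def locate_single_mistake(fp_a, fp_b, r, n):
    """Return (j, a_j - b_j) if the fingerprints look like one sampled mistake."""
    d = (fp_a[0] - fp_b[0]) % P
    d_weighted = (fp_a[1] - fp_b[1]) % P
    if d == 0:
        return None                            # no sampled mistake
    j = d_weighted * pow(d, -1, P) % P         # ratio of the two differences
    if j >= n or r[j] == 0:
        return None                            # several mistakes were sampled
    diff = d * pow(r[j], -1, P) % P            # divide out r_j
    if diff > P // 2:
        diff -= P                              # report a signed difference
    return j, diff
```

The range check on `j` plays the role of the slide's remark that a wrong decoding lands outside the text with high probability; the extra `r[j] == 0` check is an addition made for this example.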

Page 22: Improve sketching of Hamming Distance with Error Correcting

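Error Correcting O(k log NS)

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

$$\phi_1(T_A), \phi_1'(T_A); \quad \phi_2(T_A), \phi_2'(T_A); \quad \ldots; \quad \phi_{ck \ln k}(T_A), \phi_{ck \ln k}'(T_A)$$

O(k ln k)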

Page 23: Improve sketching of Hamming Distance with Error Correcting

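Recursion

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

First level, ck fingerprint pairs:

$$\phi_1(T_A), \phi_1'(T_A); \quad \phi_2(T_A), \phi_2'(T_A); \quad \ldots; \quad \phi_{ck}(T_A), \phi_{ck}'(T_A), \qquad r_i = \begin{cases} \text{random} & \text{w.p. } 1/k \\ 0 & \text{o.w.} \end{cases}$$

Second level, ck/2 fingerprint pairs:

$$\phi_1(T_A), \phi_1'(T_A); \quad \phi_2(T_A), \phi_2'(T_A); \quad \ldots; \quad \phi_{ck/2}(T_A), \phi_{ck/2}'(T_A), \qquad r_i = \begin{cases} \text{random} & \text{w.p. } 2/k \\ 0 & \text{o.w.} \end{cases}$$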

Page 24: Improve sketching of Hamming Distance with Error Correcting

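Recursion

[Diagram: Alice holds T_A and Bob holds T_B, each of length n.]

ck fingerprint pairs with r_i random w.p. 1/k, 0 o.w.

ck/2 fingerprint pairs with r_i random w.p. 2/k, 0 o.w.

ck/4 fingerprint pairs with r_i random w.p. 4/k, 0 o.w.

$$ck + \frac{ck}{2} + \frac{ck}{4} + \cdots \le 2ck$$

O(k log NS)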
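The size accounting of the recursion can be checked with a few lines of Python. This is only a sketch of the bookkeeping, under the assumption (my reading of the slides) that each level halves the bound on the remaining mistakes and doubles the sampling probability; `recursion_levels` is a hypothetical name.

```python
def recursion_levels(k, c):
    """List (number of fingerprint pairs, sampling probability) for each level."""
    levels = []
    remaining = k          # bound on the mistakes still unknown at this level
    rate = 1               # sampling probability is rate / k
    while remaining >= 1:
        levels.append((c * remaining, min(1.0, rate / k)))
        remaining //= 2
        rate *= 2
    return levels
```

For k = 64 and c = 3 the levels hold 192, 96, 48, 24, 12, 6 and 3 pairs, 381 in total, which stays below 2ck = 384, whereas the non-recursive scheme uses on the order of ck ln k pairs.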

Page 25: Improve sketching of Hamming Distance with Error Correcting

Complexity

[Diagram: T_A and T_B, each of length n, are reduced to sketches S_A and S_B, each of size S.]

Size: O(k log NS)
Computing sketch: O(n log k)
Comparing sketches: O(k log k)

Page 26: Improve sketching of Hamming Distance with Error Correcting

O(k log k) - Solution

• We can just encode in unary and hash the input to k^3 cells, and then run the O(k log NS) = O(k log k) algorithm (with N = k^3 cells over a binary alphabet, log NS = O(log k)).

Page 27: Improve sketching of Hamming Distance with Error Correcting

Reed-Solomon Codes

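$$\begin{pmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & 2k \\ 1 & 2^2 & 3^2 & \cdots & (2k)^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2^{n-1} & 3^{n-1} & \cdots & (2k)^{n-1} \end{pmatrix} = \begin{pmatrix} p(1) & p(2) & \cdots & p(2k) \end{pmatrix}$$

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$$

We managed to develop a deterministic algorithm based on that, but the encoding and the decoding are slower.

Amir, Farach 95
Feigenbaum, Ishai, Malkin, Nissim, Strauss, Wright 01
Bar-Yossef, Jayram, Kumar, Sivakumar 03
Efremenko, Porat, Rothschild 06
Efremenko, Porat 07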

Page 28: Improve sketching of Hamming Distance with Error Correcting

File Sharing

[Diagram: a source holding a file of n blocks; Napster]

The source needs to stay until someone has the whole file (and is willing to stay).

There is a bottleneck at the end.

Page 29: Improve sketching of Hamming Distance with Error Correcting

File Sharing

[Diagram: a source holding a file of n blocks; emule/kazaa/torrent]

The source has to send n ln n blocks before disconnecting.

Sometimes there are some bottlenecks.
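One way to read the n ln n figure is as the coupon-collector bound: if every block the source sends is chosen uniformly at random from the n blocks (an assumption of this sketch, not something the slide states), the expected number of sends until every block has left the source is n times the n-th harmonic number. The helper name `expected_sends` is hypothetical.

```python
from fractions import Fraction

def expected_sends(n):
    """Expected uniform random sends until all n blocks have been sent: n * H_n."""
    return n * sum(Fraction(1, i) for i in range(1, n + 1))
```

For n = 100 this gives roughly 519 sends, close to 100 ln 100 (about 461) plus a lower-order term.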

Page 30: Improve sketching of Hamming Distance with Error Correcting

Improved File Sharing - Ver 1

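[Diagram: the source holds a file of n blocks a_0 a_1 a_2 … a_{n-1}]

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}, \qquad a_i \in F_{2^b}$$

The source sends points of the polynomial:

$$(0, p(0)),\ (1, p(1)),\ (2, p(2)),\ \ldots,\ (n^6, p(n^6))$$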

Page 31: Improve sketching of Hamming Distance with Error Correcting

Improved File Sharing - Ver 1

n^6

Each client that gets n points can recreate the file.

There is no more n ln n.

Almost no bottlenecks.
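The claim that any n points suffice is Lagrange interpolation. A minimal Python sketch follows; for brevity it works over a prime field instead of the slide's F_{2^b}, and the modulus `P` and the names `evaluate` and `recover_blocks` are assumptions of this example (the modular inverse via `pow(x, -1, P)` needs Python 3.8 or later).

```python
P = 65_537  # a prime; stand-in for the slide's field F_{2^b}

def evaluate(blocks, x):
    """Evaluate p(x) = a_0 + a_1 x + ... + a_{n-1} x^(n-1) mod P (Horner's rule)."""
    acc = 0
    for coeff in reversed(blocks):
        acc = (acc * x + coeff) % P
    return acc

def recover_blocks(points):
    """Lagrange interpolation: rebuild the n coefficients from n distinct points."""
    n = len(points)
    coeffs = [0] * n
    for j, (xj, yj) in enumerate(points):
        basis = [1]                      # product of (x - xm) for m != j, low degree first
        denom = 1                        # product of (xj - xm) for m != j
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            shifted = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):
                shifted[d] = (shifted[d] - c * xm) % P
                shifted[d + 1] = (shifted[d + 1] + c) % P
            basis = shifted
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + c * scale) % P
    return coeffs
```

A client that collected the points at x = 5, 17, 2 and 99 of a four-block file would call `recover_blocks` on those four pairs and get the four blocks back, regardless of which points it happened to receive.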

Page 32: Improve sketching of Hamming Distance with Error Correcting

Improved File Sharing - Ver 2

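[Diagram: the source holds a file of n blocks a_0 a_1 a_2 … a_{n-1}, with a_i ∈ F_{2^b}]

Send linear equations on the file.

$$\begin{pmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,n-1} \\ r_{1,0} & r_{1,1} & \cdots & r_{1,n-1} \\ \vdots & \vdots & & \vdots \\ r_{n-1,0} & r_{n-1,1} & \cdots & r_{n-1,n-1} \end{pmatrix}$$

$$\Pr(\text{success}) = \left(1 - \frac{2^{b(n-1)}}{2^{bn}}\right) \left(1 - \frac{2^{b(n-2)}}{2^{bn}}\right) \cdots \left(1 - \frac{2^{b(n-i)}}{2^{bn}}\right) \cdots \left(1 - \frac{1}{2^{bn}}\right) \ge 1 - 2^{-(b-1)}$$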
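The success probability is the standard probability that n random vectors over a field with q = 2^b elements are linearly independent. The short Python sketch below evaluates that product exactly with rationals; the function name `full_rank_probability` is hypothetical, and the product form is the textbook formula, assumed here to be what the slide shows.

```python
from fractions import Fraction

def full_rank_probability(n, b):
    """Probability that n random equations over F_{2^b} in n unknowns are independent."""
    q = 2 ** b
    prob = Fraction(1)
    for i in range(1, n + 1):
        prob *= 1 - Fraction(1, q ** i)   # the factor 1 - 2^(b(n-i)) / 2^(bn), reindexed
    return prob
```

With b = 8 (one-byte field elements) and n = 10 the probability is already above 1 - 2^(-7), while for b = 1 it drops to about 0.29, which illustrates why a larger field helps.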

Page 33: Improve sketching of Hamming Distance with Error Correcting

Improved File Sharing - Ver 2

[Diagram: the source holds a file of n blocks a_0 a_1 a_2 … a_{n-1}]

Problems:
1. Heavy to encode: for each packet we need to go over the whole file.
2. Very heavy to decode: O(n^2) block operations + O(n^3) field operations.

Facts:
1. If you get n(1/2-ε) random combinations of two blocks, w.h.p. there are no dependencies among them.
2. If you have d pair combinations, you can easily reduce your system to n-d variables.

Solution: Use sparse functionals

Page 34: Improve sketching of Hamming Distance with Error Correcting

Improved File Sharing - Ver 2

[Diagram: the source holds a file of n blocks a_0 a_1 a_2 … a_{n-1}]

Features:
1. Backward compatibility.
2. Even if you don't have the whole file, you can mix functionals.