New Results and Open Problems for Insertion/Deletion Channels Michael Mitzenmacher Harvard University Much is joint work with Eleni Drinea


Post on 15-Mar-2016

Page 1: New Results and Open Problems for Insertion/Deletion Channels

New Results and Open Problems for Insertion/Deletion Channels

Michael Mitzenmacher
Harvard University

Much is joint work with Eleni Drinea

Page 2: New Results and Open Problems for Insertion/Deletion Channels

The Most Basic Channels
• Binary erasure channel.
– Each bit replaced by a ? with probability p.
• Binary symmetric channel.
– Each bit flipped with probability p.
• Binary deletion channel.
– Each bit deleted with probability p.

Page 3: New Results and Open Problems for Insertion/Deletion Channels

The Most Basic Channels
• Binary erasure channel.
– Each bit replaced by a ? with probability p.
– Example output: 01?01?0111?1011101110010
• Binary symmetric channel.
– Each bit flipped with probability p.
– Example output: 011001111011011101110010
• Binary deletion channel.
– Each bit deleted with probability p.
– Example output: 101110101011101110010
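The three channels can be simulated in a few lines. This is a minimal sketch, not taken from the talk: the function names, the input string, the seed, and p = 0.1 are all assumptions made for illustration.

```python
import random

def erasure_channel(bits, p, rng):
    """Replace each bit by '?' independently with probability p."""
    return "".join("?" if rng.random() < p else b for b in bits)

def symmetric_channel(bits, p, rng):
    """Flip each bit independently with probability p."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p else b for b in bits)

def deletion_channel(bits, p, rng):
    """Delete each bit independently with probability p."""
    return "".join(b for b in bits if rng.random() >= p)

rng = random.Random(0)                 # arbitrary fixed seed
sent = "011001110110111011100101"      # hypothetical input, not from the slides
for channel in (erasure_channel, symmetric_channel, deletion_channel):
    print(channel.__name__, channel(sent, 0.1, rng))
```

Only the deletion channel changes the length of the sequence, so the receiver no longer knows which positions were affected.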

Page 4: New Results and Open Problems for Insertion/Deletion Channels

The Most Basic Channels
• Binary erasure channel.
– Each bit is replaced by a ? with probability p.
– Very well understood.
• Binary symmetric channel.
– Each bit flipped with probability p.
– Very well understood.
• Binary deletion channel.
– Each bit deleted with probability p.
– We don’t even know the capacity!!!

Page 5: New Results and Open Problems for Insertion/Deletion Channels


Motivation

This seems a disturbing, sad state of affairs. It bothers me greatly.

Page 6: New Results and Open Problems for Insertion/Deletion Channels


Motivation

This seems a disturbing, sad state of affairs. It bothers me greatly.

And there may be applications…

Page 7: New Results and Open Problems for Insertion/Deletion Channels


Motivation

This seems a disturbing, sad state of affairs. It bothers me greatly.

And there may be applications… Hard disks, pilot tones, etc.

Page 8: New Results and Open Problems for Insertion/Deletion Channels

What’s the Problem?
• Erasure and error channels have pleasant symmetries; deletion channels do not.
• Example:
– Delete one bit from 1010101010.
– Delete one bit from 0000000000.
• Understanding this asymmetry seems fundamental.
• Requires deep understanding of combinatorics of random sequences and subsequences.
– Not a historical strength of coding theorists.
– But it is for this audience….
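The asymmetry in the example can be made concrete by counting how many distinct strings a single deletion can produce. A small sketch (the helper name `single_deletions` is made up here):

```python
def single_deletions(s):
    """Set of distinct strings obtained by deleting exactly one character of s."""
    return {s[:i] + s[i + 1:] for i in range(len(s))}

for s in ("1010101010", "0000000000"):
    print(s, len(single_deletions(s)))
# 1010101010 10
# 0000000000 1
```

Deleting any bit inside a run gives the same result, so the number of distinct outcomes equals the number of runs: ten for the alternating string, one for the all-zero string.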

Page 9: New Results and Open Problems for Insertion/Deletion Channels

In This Talk
• Main result: capacity of binary deletion channel is at least (1 – p)/9.
– Compare to capacity (1 – p) for erasure channel.
– First within constant factor result.
– Still not tight….
• We describe path to this result.
– Generally, we’ll follow the history chronologically.
• We describe recent advances on related problems.
– Insertion channels, more limited models.
• We describe many related open problems.
– What do random subsequences of random sequences look like?

Page 10: New Results and Open Problems for Insertion/Deletion Channels

Capacity Lower Bounds
Shannon-based approach:

1. Choose a random codebook.

2. Define “typical” received sequences.

3. Construct a decoding algorithm.

Page 11: New Results and Open Problems for Insertion/Deletion Channels

Capacity Lower Bounds: Erasures
1. Choose a random codebook.
– Each bit chosen uniformly at random.
2. Define “typical” received sequences.
– No more than (p + ε) fraction of erasures.
3. Construct a decoding algorithm.
– Find unique matching codeword.

Page 12: New Results and Open Problems for Insertion/Deletion Channels

Capacity Lower Bounds: Errors
1. Choose a random codebook.
– Each bit chosen uniformly at random.
2. Define “typical” received sequences.
– Between (p – ε) and (p + ε) fraction of errors.
3. Construct a decoding algorithm.
– Find unique matching codeword.

Page 13: New Results and Open Problems for Insertion/Deletion Channels

Capacity Lower Bounds: Deletions
1. Choose a random codebook.
– Each bit chosen uniformly at random.
2. Define “typical” received sequences.
– No more than (p + ε) fraction of deletions.
3. Construct a decoding algorithm.
– Find unique matching codeword.
Yields poor bounds, and no bound for p > 0.5.

Page 14: New Results and Open Problems for Insertion/Deletion Channels

GREEDY Subsequence Algorithm
• Is S a subsequence of T?
– Start from leftmost point of S and T.
– Move right on T until match next character of S.
– Move to next character of T.

T 0 0 0 1 1 0 0 1 0

S 0 1 0 1 0
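The GREEDY test can be written as a single left-to-right scan of T. This is a minimal sketch using the T and S shown on the slide; the function name is an assumption.

```python
def greedy_is_subsequence(s, t):
    """Return True if s is a subsequence of t, scanning t once from the left."""
    i = 0  # index of the next character of s still to be matched
    for ch in t:
        if i < len(s) and ch == s[i]:
            i += 1  # matched; the loop then moves on to the next character of t
    return i == len(s)

T = "000110010"
S = "01010"
print(greedy_is_subsequence(S, T))  # True
```

Matching each character of S at its earliest possible position never hurts later matches, which is why the greedy scan finds an embedding whenever one exists.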

Page 15: New Results and Open Problems for Insertion/Deletion Channels

Basic Failure Argument
• When codeword X of length n is sent, and p just greater than 0.5, received sequence R has approx. n/2 bits.
• Is R a subsequence of another codeword Y?
• Consider GREEDY algorithm.
– If Y is chosen u.a.r., on average two bits of Y are needed to cover each bit of R.
– So most other codewords match!
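The "two bits of Y per bit of R" claim is easy to check empirically: each bit of a uniform Y matches the next bit of R with probability 1/2, so the wait per matched bit is geometric with mean 2. A rough simulation sketch, where the sequence lengths and seed are arbitrary choices:

```python
import random

def bits_consumed_by_greedy(r, y):
    """Number of leading bits of y that GREEDY reads to cover all of r, or None if r is not covered."""
    if not r:
        return 0
    i = 0
    for pos, bit in enumerate(y):
        if bit == r[i]:
            i += 1
            if i == len(r):
                return pos + 1
    return None

rng = random.Random(0)                           # arbitrary fixed seed
n = 2000
r = [rng.randrange(2) for _ in range(n)]         # stands in for the received sequence
y = [rng.randrange(2) for _ in range(10 * n)]    # a long, independent uniform sequence
used = bits_consumed_by_greedy(r, y)
print(used / n)                                  # close to 2
```

So a received sequence of about n/2 bits is covered by roughly the first n bits of an unrelated random codeword, and unique decoding fails.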

Page 16: New Results and Open Problems for Insertion/Deletion Channels

Deletions: Diggavi/Grossglauser
1. Choose a random codebook.
– Codeword sequences chosen by a symmetric first order Markov chain.
2. Define “typical” received sequences.
– No more than (p + ε) fraction of deletions.
3. Construct a decoding algorithm.
– Find unique matching codeword.

Page 17: New Results and Open Problems for Insertion/Deletion Channels

Symmetric First Order Markov Chain
[Diagram: two-state chain on states 0 and 1, with edge labels q/0, q/1, 1 – q/1, 1 – q/0.]
0’s tend to be followed by 0’s, 1’s tend to be followed by 1’s.
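A codeword from such a chain can be sampled as below. This is only a sketch: the parameter `stay` (the probability that a bit repeats its predecessor) is a name introduced here and is not mapped onto the q of the diagram, and the uniform first bit is an assumption.

```python
import random

def markov_codeword(n, stay, rng):
    """n >= 1 bits; each bit repeats the previous bit with probability `stay`."""
    bits = [rng.randrange(2)]          # assumed: first bit uniform
    while len(bits) < n:
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else 1 - prev)
    return bits

def run_lengths(bits):
    """Lengths of the maximal blocks of equal bits, in order."""
    runs, count = [], 1
    for a, b in zip(bits, bits[1:]):
        if a == b:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

rng = random.Random(0)
word = markov_codeword(10000, 0.8, rng)
runs = run_lengths(word)
print(sum(runs) / len(runs))           # mean block length, near 1 / (1 - stay) = 5
```

With `stay` above 1/2 the blocks are longer than in a uniform codeword, and block lengths are geometrically distributed.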

Page 18: New Results and Open Problems for Insertion/Deletion Channels

Intuition
• To send a 0 bit, if deletions are likely, send many copies in a block.
– Lowers the rate by a constant factor.
– But makes it more likely that the bit gets through.
• First order Markov chain gives natural blocks.

Page 19: New Results and Open Problems for Insertion/Deletion Channels

Diggavi/Grossglauser Results
• Calculate distribution of number of bits required for GREEDY to cover each bit of received sequence R using “random” codeword Y.
– If R is a subsequence of Y, GREEDY algorithm will show it!
– Received sequence R also behaves like a sym. first order Markov chain, with parameter q’.
• Use Chernoff bounds to determine how many codewords Y of length n are needed before R is covered.
• Get a lower bound on capacity!

Page 20: New Results and Open Problems for Insertion/Deletion Channels

The Block Point of View
• Instead of thinking of codewords being randomly chosen bit by bit: 0, 00, 000, 0001, 00011, 000110, 0001101, ….

• Think of codewords as being a sequence of maximal blocks: 000, 00011, 000110, ….

Page 21: New Results and Open Problems for Insertion/Deletion Channels

Improvements, Random Codebook
1. Choose a random codebook.
– Codeword sequences chosen by laying out blocks according to a given distribution.
2. Define “typical” received sequences.
– No more than (p + ε) fraction of deletions, and number of blocks of each length close to the expectation.
3. Construct a decoding algorithm.
– Find unique matching codeword.

Page 22: New Results and Open Problems for Insertion/Deletion Channels

Changing the Codebook
• Fix a distribution Z on positive integers.
– Probability of j is Z_j.
• Start sequence with 0’s.
• First block of 0’s has length given by Z. Then block of 1’s has length given by Z. And so on.
– Generalizes previous work: first order Markov chains lead to geometric distributions Z.
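Laying out blocks from a length distribution Z might look like the sketch below. The representation of Z as parallel `lengths` and `weights` lists and the truncation to exactly n bits are assumptions made for the example.

```python
import random

def block_codeword(n, lengths, weights, rng):
    """Alternating blocks of 0's and 1's, starting with 0's, with lengths drawn i.i.d. from Z.

    Z is given as `lengths` (positive integers) with matching `weights`.
    The sequence is cut off at n bits (an assumption; the slides do not say how codewords end).
    """
    bits, bit = [], 0
    while len(bits) < n:
        (length,) = rng.choices(lengths, weights=weights)
        bits.extend([bit] * length)
        bit = 1 - bit
    return bits[:n]

rng = random.Random(0)
print("".join(map(str, block_codeword(30, [1, 2, 3], [0.5, 0.3, 0.2], rng))))
```

Choosing geometric weights for Z recovers the first order Markov chain codebook as a special case.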

Page 23: New Results and Open Problems for Insertion/Deletion Channels

Choosing a Distribution
• Intuition: when a mismatch between received sequence and random codeword occurs under GREEDY, want it to be long lasting with significant probability.
• (a,b,q)-distributions:
– A short block a with probability q, long block b with probability 1 – q.
– Like Morse code.
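An (a,b,q) block-length distribution is a two-point distribution, so sampling it and computing its mean take one line each. A small sketch with made-up function names and example values:

```python
import random

def abq_block_length(a, b, q, rng):
    """Short block length a with probability q, long block length b with probability 1 - q."""
    return a if rng.random() < q else b

def abq_mean(a, b, q):
    """Expected block length of the (a, b, q) distribution."""
    return q * a + (1 - q) * b

rng = random.Random(0)
print([abq_block_length(1, 5, 0.5, rng) for _ in range(10)])
print(abq_mean(1, 5, 0.5))  # 3.0
```

The values a = 1, b = 5, q = 0.5 are illustrative only; the talk does not state which parameters were used.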

Page 24: New Results and Open Problems for Insertion/Deletion Channels


Results So Far

Page 25: New Results and Open Problems for Insertion/Deletion Channels

So Far…
• Decoding algorithm has always been GREEDY.
• Can’t we do better?
• For bigger capacity improvements, it seems we need better decoding.
• Best algorithm: maximum likelihood.
– Find the most likely codeword given the received sequence.

Page 26: New Results and Open Problems for Insertion/Deletion Channels

Maximum Likelihood
• Pick the most likely codeword.
• Given codeword X and received sequence R, count the number of ways R is obtained as a subsequence of X. Most likely = biggest count.
– Via dynamic programming.
– Let C(j,k) = number of ways first k characters of R are subsequence of first j characters of X.
$C(j,k) = C(j-1,k) + I[X_j = R_k]\, C(j-1,k-1)$
• Potentially exponential time, but we just want capacity bounds.
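The recurrence for C(j,k) translates directly into a short dynamic program. The sketch below assumes the usual boundary conditions, which the slide does not state: C(j,0) = 1 for all j, and C(0,k) = 0 for k > 0.

```python
def count_embeddings(x, r):
    """Number of ways r occurs as a subsequence of x, i.e. C(len(x), len(r))."""
    # c[k] holds C(j, k) for the current j. Assumed boundary: C(j, 0) = 1, C(0, k) = 0 for k > 0.
    c = [1] + [0] * len(r)
    for xj in x:
        # Update from large k to small k so that c[k - 1] is still C(j-1, k-1).
        for k in range(len(r), 0, -1):
            if xj == r[k - 1]:
                c[k] += c[k - 1]   # C(j,k) = C(j-1,k) + I[X_j = R_k] C(j-1,k-1)
    return c[len(r)]

print(count_embeddings("0000", "00"))  # 6
print(count_embeddings("0101", "01"))  # 3
```

For one codeword this takes time proportional to len(x) times len(r); the cost of maximum likelihood comes from repeating it over the whole codebook.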

Page 27: New Results and Open Problems for Insertion/Deletion Channels

The Recurrence
• I would love to analyze this recurrence when:
– Y is independent of X
– Y is obtained from X by random deletions.
• If the I[Xj = Rk] values were all independent, would be possible.
• But dependence in both cases makes analysis challenging.
• I bet someone here can do it. Let’s talk.
$C(j,k) = C(j-1,k) + I[X_j = R_k]\, C(j-1,k-1)$

Page 28: New Results and Open Problems for Insertion/Deletion Channels

Maximum Likelihood
• Standard union bound argument:
– Let sequence R be obtained from codeword X via a binary deletion channel; let S be a random sequence obtained from another random codeword Y.
– Let C(R) = # of ways R is a subsequence of X.
• Similarly C(S) = # of ways S is a subsequence of X.
– What are the distributions of C(R), C(S)?
• Unknown; guess is a lognormal or power law type distribution. Also C(S) is often 0 for many parameters.
– Want C(R) > C(S) with suitably high probability.

Page 29: New Results and Open Problems for Insertion/Deletion Channels

Conclude: Maximum Likelihood
• This is really the holy grail.
– As far as capacity arguments.
• Questions:
– What is the distribution of the number of times a small “random” sequence appears as a subsequence of a larger “random” sequence?
– Same question, when the smaller “random” sequence is derived from the larger through a deletion process.

Page 30: New Results and Open Problems for Insertion/Deletion Channels

Better Decoding
• Maximum likelihood – haven’t got it yet…
• An “approximation”, intuitively like mutual information:
– Consider a received block of 0’s (or 1’s). What block(s) did it arise from?
– Call that sequence a type.
– For random codewords and deletions, number of (type,block) pairs for each type/block combination is highly concentrated around its expectation.

Page 31: New Results and Open Problems for Insertion/Deletion Channels

Type Examples
Codeword: 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 1 1 0 0 1 1 0 0 0
Received sequence: 1 1 0 0 0 0 0 1 1 1 1 0 0
(type,block) pairs:
(1 1 1 1 , 1 1)
(0 0 0 0 1 1 0 0 0 1 0 0 0 , 0 0 0 0 0)
(1 1 1 0 0 1 1 0 0 1 1 , 1 1 1 1)
(0 0 0 , 0 0)
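The example can be sanity-checked mechanically: the types should concatenate to the codeword, the blocks should concatenate to the received sequence, and each block should be a subsequence of its type. A short sketch with the slide's data typed in as strings (the helper names are assumptions):

```python
codeword = "1111000011000100011100110011000"
received = "1100000111100"
pairs = [
    ("1111", "11"),
    ("0000110001000", "00000"),
    ("11100110011", "1111"),
    ("000", "00"),
]

def is_subsequence(s, t):
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)

def pairs_are_consistent(pairs, codeword, received):
    """Check that the (type, block) pairs tile the codeword and the received sequence."""
    types_ok = "".join(t for t, _ in pairs) == codeword
    blocks_ok = "".join(b for _, b in pairs) == received
    return types_ok and blocks_ok and all(is_subsequence(b, t) for t, b in pairs)

print(pairs_are_consistent(pairs, codeword, received))  # True
```

This only verifies the decomposition shown on the slide; it does not compute types from a deletion pattern.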

Page 32: New Results and Open Problems for Insertion/Deletion Channels

New Decoding Algorithm
1. Choose a random codebook.
– Codeword sequences chosen by laying out blocks according to a given distribution.
2. Define “typical” received sequences.
– No more than (p + ε) fraction of deletions, and has near the expected number of (type,block) occurrences for each (type,block) pair.
3. Construct a decoding algorithm.
– Find unique codeword that could be derived from the “expected” number of (type,block) occurrences given the received sequence.

Page 33: New Results and Open Problems for Insertion/Deletion Channels

Jigsaw Puzzle Decoding
Received sequence: … 0 0 1 1 0 0 1 1 0 1 …
Jigsaw puzzle pieces, written as (type, block):
(0 0 0 0, 0 0), (0 1 1 0, 0 0), (1 0 1 1, 1 1), (1 1, 1 1), (0 0 0 0 0, 0 0), (1 1 1, 1 1)

Page 34: New Results and Open Problems for Insertion/Deletion Channels

Jigsaw Puzzle Decoding: Examples
Received sequence: … 0 0 1 1 0 0 1 1 …
Three coverings by (type, block) pieces, each with the codeword segment it implies:
• (0 0 0 0 0, 0 0) (1 0 1 1, 1 1) (0 1 1 0, 0 0) (1 1, 1 1) gives …000001011011011…
• (0 0 0 0, 0 0) (1 1 1, 1 1) (0 0 0 0 0, 0 0) (1 0 1 1, 1 1) gives …0000111000001011…
• (0 0 0 0 0, 0 0) (1 1 1, 1 1) (0 0 0 0, 0 0) (1 1, 1 1) gives …00000111000011…
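To get a feel for how many coverings a short received sequence admits, the pieces from the slides can be enumerated by brute force. This sketch assumes an inventory of one copy of each piece (types 0000, 0110, 00000 for block 00 and types 1011, 11, 111 for block 11); the real argument instead uses the expected piece counts.

```python
from itertools import permutations

zero_types = ["0000", "0110", "00000"]   # types of pieces whose received block is 00
one_types = ["1011", "11", "111"]        # types of pieces whose received block is 11

def coverings():
    """Codeword segments for every covering of the blocks 00 11 00 11, each piece used at most once."""
    out = []
    for z1, z2 in permutations(zero_types, 2):
        for o1, o2 in permutations(one_types, 2):
            out.append(z1 + o1 + z2 + o2)
    return out

segments = coverings()
print(len(segments))  # 36
for shown in ("000001011011011", "0000111000001011", "00000111000011"):
    print(shown, shown in segments)
```

All three codeword segments shown on the slide appear among the 36 coverings, which illustrates why the decoder must bound the number of coverings rather than expect a unique one.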

Page 35: New Results and Open Problems for Insertion/Deletion Channels

Formal Argument
• Calculate upper bound on number of possible jigsaw puzzle coverings. Get lower bound on capacity.
– Challenge 1: Don’t get exactly the expected number of pieces for each (type,block) pair; just close.
– Challenge 2: For very rare pieces, might not even be close.
• End result: an expression that can be numerically computed to give a lower bound, given input distribution.

Page 36: New Results and Open Problems for Insertion/Deletion Channels

Calculations
• All done by computer.
• Numerical precision – not too challenging for moderate deletion probabilities.
– Terms in sums become small quickly.
– Fairly smooth.
– We guarantee our output is a lower bound.
• Computations become time-consuming for large deletion probabilities.

Page 37: New Results and Open Problems for Insertion/Deletion Channels


Improved Results

Page 38: New Results and Open Problems for Insertion/Deletion Channels

Ullman’s Bound
• Ullman has an upper bound for synchronization channels.
– For insertions of a specific form.
– Zero-error probability.
• Does not apply to this channel – although it has been used as an upper bound!
• We are the first to show Ullman’s bound does not hold for this case.
• What is a (non-trivial) upper bound for this channel?
– We have some initial results.

Page 39: New Results and Open Problems for Insertion/Deletion Channels

Insertion/Deletion Channels
• Our techniques apply for some insertion/deletion channels.
– GREEDY decoding cannot; depends on received sequence being a subsequence of the original codeword.
• Specifically, the case of duplications: 0 becomes 000….
– Maintains block structure.

Page 40: New Results and Open Problems for Insertion/Deletion Channels

Poisson Channels
• Recall discrete Poisson distribution with mean m: $\Pr[X = k] = e^{-m} m^k / k!$
• Consider a channel that replaces each bit with a Poisson number of copies.
• Call this a Poisson channel.
• Poisson channels can be studied using our insertion/deletion analysis.
• Capacity when m = 1 is approx. 0.1171.
– From numerical calculations.
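A Poisson channel is straightforward to simulate. The sketch below uses Knuth's multiplication method for Poisson sampling, since Python's standard library has no Poisson sampler; the function names and the example input are assumptions.

```python
import math
import random

def poisson_sample(m, rng):
    """Draw from Poisson(m) by Knuth's multiplication method (adequate for small m)."""
    limit = math.exp(-m)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_channel(bits, m, rng):
    """Replace each bit by an independent Poisson(m) number of copies (possibly zero)."""
    return "".join(b * poisson_sample(m, rng) for b in bits)

rng = random.Random(0)
print(poisson_channel("0011010", 1.0, rng))
```

With m = 1 each bit vanishes with probability e^{-1}, about 0.37, so the channel both deletes and duplicates.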

Page 41: New Results and Open Problems for Insertion/Deletion Channels

Reduction!
• A code for a Poisson channel gives a code for a deletion channel.
– To send codeword over deletion channel with deletion probability p, use a codeword X for the Poisson channel code, but independently replace each bit by a Poisson distributed number of bits with mean 1/(1 – p).
– At output, each bit of X appears as a Poisson distributed number of copies (with mean 1) – a Poisson channel.
– Decode for the Poisson channel.
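The reduction rests on Poisson thinning: a Poisson(1/(1 – p)) number of copies, each kept independently with probability 1 – p, leaves a Poisson(1) number of survivors. A simulation sketch that checks the mean and the probability of zero survivors (p = 0.6, the seed, and the sample size are arbitrary choices):

```python
import math
import random

def poisson_sample(m, rng):
    """Draw from Poisson(m) by Knuth's multiplication method (adequate for small m)."""
    limit = math.exp(-m)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def copies_received(p, rng):
    """Surviving copies of one code bit: Poisson(1/(1-p)) copies sent, each deleted with probability p."""
    sent = poisson_sample(1.0 / (1.0 - p), rng)
    return sum(1 for _ in range(sent) if rng.random() >= p)

rng = random.Random(0)
p = 0.6
samples = [copies_received(p, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
frac_zero = samples.count(0) / len(samples)
print(mean, frac_zero, math.exp(-1))   # mean near 1, frac_zero near e^-1
```

The empirical check covers only two statistics; the full claim is that the survivor count is exactly Poisson with mean 1.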

Page 42: New Results and Open Problems for Insertion/Deletion Channels

Code Picture
1. Take codeword X for Poisson channel.
2. Randomly expand to X’ for deletion channel using a Poisson number of copies per bit. Expands by 1/(1 – p) factor.
3. Send X’ over deletion channel.
4. Receive R.
5. Decode R using the Poisson channel codebook.

Page 43: New Results and Open Problems for Insertion/Deletion Channels

Capacity Result
• Input to the deletion channel is 1/(1 – p) factor larger than for Poisson channel.
• Implies capacity for the deletion channel is at least 0.1171(1 – p) > (1 – p)/9.
– Deletion channel capacity is within a constant factor of the erasure channel (1 – p).
– First result of this type that we know of.
– Best result (using a different mean) is 0.1185(1 – p).
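The arithmetic behind the (1 – p)/9 statement is that 0.1171 exceeds 1/9 ≈ 0.1111, and the rate is scaled by (1 – p) because the deletion-channel input is 1/(1 – p) times longer. A tiny sketch, with a made-up function name:

```python
def deletion_capacity_lower_bound(p, poisson_rate=0.1171):
    """Lower bound implied by the reduction: Poisson-channel rate scaled by (1 - p)."""
    return poisson_rate * (1 - p)

print(0.1171 > 1 / 9)                       # True
for p in (0.1, 0.5, 0.9):
    print(p, deletion_capacity_lower_bound(p), (1 - p) / 9)
```

Passing `poisson_rate=0.1185` gives the slightly better bound quoted on the slide.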

Page 44: New Results and Open Problems for Insertion/Deletion Channels

More New Directions
• Sticky channels
• Segmented deletion/insertion channels

Page 45: New Results and Open Problems for Insertion/Deletion Channels

Sticky Channels
• Motivation: insertion/deletion channels are hard. So what is the easiest such channel we can study?
• Sticky channels: each symbol duplicated a number of times.
– Like a sticky keyboard! xxxxxxxxxxxxx
– Examples: each bit duplicated with probability p, each bit replaced by a geometrically distributed number of copies.
• Key point: no deletions.
• Intuitively easy: block structure at sender completely preserved at the receiver.

Page 46: New Results and Open Problems for Insertion/Deletion Channels

Sticky Channels: Results
• New work: numerical method that gives near-tight bounds on the capacity of such channels.
– Key idea: symbols are block lengths.
• 000 becomes 3.
– Capacity for original channel becomes capacity per unit cost in this channel.
– Use techniques for capacity per unit cost.
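The "symbols are block lengths" view can be illustrated with a run-length encoding and one simple sticky channel, where each bit is duplicated once with probability p. A sketch with assumed function names and example input:

```python
import random
from itertools import groupby

def run_lengths(bits):
    """Run-length encoding: list of (bit, block length) pairs, so 000 becomes ('0', 3)."""
    return [(b, len(list(g))) for b, g in groupby(bits)]

def sticky_channel(bits, p, rng):
    """Duplicate each bit once with probability p; no bit is ever deleted."""
    return "".join(b * 2 if rng.random() < p else b for b in bits)

rng = random.Random(0)
sent = "0001101"
received = sticky_channel(sent, 0.5, rng)
print(run_lengths(sent))
print(run_lengths(received))
```

The two encodings have the same sequence of bits and differ only in the block lengths, which is why the channel can be analyzed as one acting on integers.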

Page 47: New Results and Open Problems for Insertion/Deletion Channels

Segmented Channels
• Motivation: what about deletions makes them so hard?
– Can we restrict deletions and make them easy?
• Segmented deletion channel: at most one deletion per segment.
– Example: At most 1 deletion per original byte.
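One way to simulate a segmented deletion channel is sketched below. The slide fixes only the constraint of at most one deletion per segment; the particular model here (each segment loses one uniformly chosen bit with probability p) is an assumption for illustration.

```python
import random

def segmented_deletion_channel(bits, segment_len, p, rng):
    """Delete at most one bit from each consecutive segment of length segment_len.

    Assumed model: with probability p a segment loses one uniformly chosen bit.
    """
    out = []
    for start in range(0, len(bits), segment_len):
        segment = bits[start:start + segment_len]
        if segment and rng.random() < p:
            drop = rng.randrange(len(segment))
            segment = segment[:drop] + segment[drop + 1:]
        out.append(segment)
    return "".join(out)

rng = random.Random(0)
sent = "011010011100101110100101"   # three hypothetical bytes
print(segmented_deletion_channel(sent, 8, 0.5, rng))
```

The receiver sees one concatenated string and does not see the segment boundaries, which is what keeps the decoding problem nontrivial.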

Page 48: New Results and Open Problems for Insertion/Deletion Channels

Segmented Channels: Results
• New work: 0-error, deterministic algorithms for segmented deletion/insertion channels.
– With reasonable rates.
– Simple computationally.

Page 49: New Results and Open Problems for Insertion/Deletion Channels

Open Questions
• Capacity lower bounds: Improvements to argument.
– What is the best distribution for codewords?
– Can even more general (type,block) pairs be usefully used?
– Avoid overcounting jigsaw solutions that appear multiple times?
– Specific better lower bounds for Poisson channel, translate immediately into better general bounds!
• Upper bounds:
– Tighter upper bound for capacity of binary deletion channel?
• Maximum likelihood arguments:
– How do random subsequences of random sequences behave?
– Look for threshold behaviors, heavy-tailed distributions.

Page 50: New Results and Open Problems for Insertion/Deletion Channels

Open Questions: Coding
• All this has been on lower bounds – almost nothing about coding!
– Except segmented deletion channel.
• There has been some experimental work done, but very, very limited results so far.
– And little good theory.
• Can we take the insight here and use it to develop good codes?

Page 51: New Results and Open Problems for Insertion/Deletion Channels

Specific Code Challenges
• Find a good code for the Poisson channel.
– Code for the Poisson channel immediately gives codes for the deletion channel!
• Find good codes for basic sticky channels.
– Easiest channels with block structure: no deletions, just duplicates.
– May yield insight for other channels.
– Low-density parity-check coding techniques seem applicable.

Page 52: New Results and Open Problems for Insertion/Deletion Channels


The Goals

• Simple, clear, tight bounds for capacity for binary deletion channels, and practical codes that are close to capacity.

• It’s been done for erasure channels and error-correcting channels, why not deletion channels too?