
Page 1: (source slides: chaos.if.uj.edu.pl/ZOA/files/semianria/chaos/23.11.2015.pdf)

Unusual coding possibilities – information theory in practice
statistical physics on the hypercube of 2^n sequences

1) Entropy coding: remove statistical redundancy in data compression: Huffman, arithmetic coding, ANS

2) (Forward) error correction: add deterministic redundancy to repair a damaged message (erasure, BSC, deletion channels): LDPC, sequential decoding, fountain codes and extensions

3) Rate distortion: shorter approximate description (lossy compression); Kuznetsov-Tsybakov: (statistical) constraints known only to the sender (steganography, QR codes) and their "duality"

4) Distributed source coding – for multiple correlated sources that don't communicate with each other.

Jarosław Duda, Kraków, 23.XI.2015

Page 2:

1. ENTROPY CODING – DATA COMPRESSION

We need n bits of information to choose one of 2^n possibilities.

For length-n 0/1 sequences with pn ones, how many bits do we need to choose one of them?

๐‘›!

(๐‘๐‘›)! ((1โˆ’๐‘)๐‘›)!โ‰ˆ

(๐‘›/๐‘’)๐‘›

(๐‘๐‘›/๐‘’)๐‘๐‘›((1โˆ’๐‘)๐‘›/๐‘’)(1โˆ’๐‘)๐‘› = 2๐‘› lg(๐‘›)โˆ’๐‘๐‘› lg(๐‘๐‘›)โˆ’(1โˆ’๐‘)๐‘› lg((1โˆ’๐‘)๐‘›) = 2๐‘›(โˆ’๐‘ lg(๐‘)โˆ’(1โˆ’๐‘)lg (1โˆ’๐‘))

Entropy coder: encodes a sequence with probability distribution (p_s)_{s=0..m−1} using asymptotically at least H = ∑_s p_s lg(1/p_s) bits/symbol (H ≤ lg(m)).

Seen as a weighted average: a symbol of probability p contains lg(1/p) bits.
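A quick numeric sketch (not from the slides) that the exact count lg(binom(n, pn))/n approaches h(p):

```python
# Shannon entropy of a binary source, and a check that the number of
# bits needed per symbol, lg(binomial(n, pn))/n, approaches h(p).
from math import comb, log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

n, p = 10000, 0.2
print(log2(comb(n, int(p * n))) / n)   # ~0.721, exact count of choices
print(h(p))                            # ~0.722, the asymptotic limit
```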

Page 3:

a symbol of probability p contains lg(1/p) bits

symbol → length-r bit sequence assumes: Pr(symbol) ≈ 2^(−r)

Huffman coding:

- sort symbols by frequency
- replace the two least frequent with a tree of them, and so on

Generally: prefix codes, defined by a tree.

Unique decoding: no codeword is a prefix of another.

http://en.wikipedia.org/wiki/Huffman_coding

A → 0, B → 100, C → 101, D → 110, E → 111
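For illustration, a minimal sketch of the greedy construction (not the talk's implementation), assuming a dict of symbol probabilities:

```python
# Greedy Huffman construction: repeatedly merge the two least probable
# subtrees, prepending one bit to every codeword inside them.
import heapq

def huffman(probs):
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two least probable subtrees
        p2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, i, merged))
    return heap[0][2]

print(huffman({"A": 0.5, "B": 0.125, "C": 0.125, "D": 0.125, "E": 0.125}))
# codeword lengths match lg(1/p): A gets 1 bit, the rest get 3
```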

Page 4:

A symbol of probability p contains lg(1/p) bits.

Encoding a (p_s) distribution with an entropy coder optimal for a (q_s) distribution costs

ΔH = ∑_s p_s lg(1/q_s) − ∑_s p_s lg(1/p_s) = ∑_s p_s lg(p_s/q_s) ≈ (1/ln(4)) ∑_s (p_s − q_s)²/p_s

more bits/symbol – the so-called Kullback-Leibler "distance" (not symmetric).
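A small numeric illustration of this penalty and its quadratic approximation (a sketch with assumed example distributions):

```python
# Kullback-Leibler cost of mismatched coding vs its quadratic
# approximation (assumed example distributions p and q).
from math import log, log2

def kl_bits(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.25, 0.25]       # real distribution of the data
q = [0.5, 0.3, 0.2]         # the coder is optimal for q instead
approx = sum((pi - qi) ** 2 / pi for pi, qi in zip(p, q)) / log(4)
print(kl_bits(p, q), approx)   # ~0.0147 vs ~0.0144 bits/symbol penalty
```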

Huffman/prefix codes encode symbols directly as bit sequences, approximating probabilities by powers of 1/2, e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4.

Example of the Huffman penalty for a ρ(1−ρ)^n distribution on a 256-symbol alphabet: Huffman maps 8 bits to at least 1 bit, so its compression ratio is ≤ 8 here – terrible at encoding very frequent events.

ρ       ideal ratio (8/H)   Huffman   ANS
0.5     4.001               3.99      4.00
0.6     4.935               4.78      4.93
0.7     6.344               5.58      6.33
0.8     8.851               6.38      8.84
0.9     15.31               7.18      15.29
0.95    26.41               7.58      26.38
0.99    96.95               7.90      96.43

[Huffman tree figure for the example: A → 0, B → 10, C → 11]

Page 5:

Huffman coding (HC), prefix codes, needs sorting:
most of everyday compression, e.g. zip, gzip, cab, jpeg, gif, png, tiff, pdf, mp3… (can't use less than 1 bit/symbol)
zlibh (the fastest generally available implementation):
encoding ~320 MB/s (per core, 3 GHz), decoding ~300-350 MB/s

Range coding (RC): large-alphabet arithmetic coding, needs multiplication;
e.g. 7-Zip, Google's VP video codecs (e.g. YouTube, Hangouts).
encoding ~100-150 MB/s, decoding ~80 MB/s

(binary) Arithmetic coding (AC): H.264, H.265 video, ultracompressors e.g. PAQ
encoding/decoding ~20-30 MB/s

Asymmetric Numeral Systems (ANS), tabled variant (tANS): no multiplication or sorting
FSE implementation of tANS: encoding ~350 MB/s, decoding ~500 MB/s

The tradeoffs: RC → ANS gives ~7x decoding speedup with no multiplication (switched e.g. in the LZA compressor); HC → ANS means better compression and ~1.5x decoding speedup (e.g. zhuff, lzturbo).

Example of the Huffman penalty for the truncated ρ(1−ρ)^n distribution (Huffman: 1 byte → at least 1 bit, so ratio ≤ 8 here):

ρ       ideal ratio (8/H)   zlibh   FSE
0.5     4.001               3.99    4.00
0.6     4.935               4.78    4.93
0.7     6.344               5.58    6.33
0.8     8.851               6.38    8.84
0.9     15.31               7.18    15.29
0.95    26.41               7.58    26.38
0.99    96.95               7.90    96.43
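The "8/H" column can be reproduced from the entropy of the truncated geometric distribution (a sketch under that assumption):

```python
# Ideal compression ratio 8/H for the truncated rho*(1-rho)^n
# distribution on a 256-symbol alphabet (the "ideal ratio" column).
from math import log2

def ideal_ratio(rho, m=256):
    w = [rho * (1 - rho) ** k for k in range(m)]
    s = sum(w)
    p = [x / s for x in w]                 # truncate and renormalize
    H = sum(x * log2(1 / x) for x in p)    # entropy, bits/symbol
    return 8 / H                           # vs 8 bits/symbol uncoded

for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(rho, round(ideal_ratio(rho), 3))   # 4.001, 4.935, 6.344, ...
```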

Page 6:

Huffman vs ANS in compressors (LZ + entropy coder), from Matt Mahoney's benchmarks: http://mattmahoney.net/dc/text.html

compressor                enwik8 (100,000,000 B)   encode [ns/byte]   decode [ns/byte]
LZA 0.82b -mx9 -b7 -h7    26,396,613               449                9.7
lzturbo 1.2 -39 -b24      26,915,461               582                2.8
WinRAR 5.00 -ma5 -m5      27,835,431               1004               31
WinRAR 5.00 -ma5 -m2      29,758,785               228                30
lzturbo 1.2 -32           30,979,376               19                 2.7
zhuff 0.97 -c2            34,907,478               24                 3.5
gzip 1.3.5 -9             36,445,248               101                17
pkzip 2.0.4 -ex           36,556,552               171                50
ZSTD 0.0.1                40,024,854               7.7                3.8
WinRAR 5.00 -ma5 -m1      40,565,268               54                 31

zhuff, ZSTD (Yann Collet): LZ4 + tANS (switched from Huffman)
lzturbo (Hamid Bouzidi): LZ + tANS (switched from Huffman)
LZA (Nania Francesco): LZ + rANS (switched from range coding)

e.g. lzturbo vs gzip: better compression, 5x faster encoding, 6x faster decoding – saving time and energy in an extremely frequent task.

Page 7:

Apple LZFSE = Lempel-Ziv + Finite State Entropy

Default in iOS 9 and OS X 10.11: "matching the compression ratio of ZLIB level 5, but with much higher energy efficiency and speed (between 2x and 3x) for both encode and decode operation".

Finite State Entropy is Yann Collet's (known from e.g. LZ4) implementation of tANS.

CRAM 3.0 of the European Bioinformatics Institute for DNA compression:

format                    size            encoding (s)   decoding (s)
BAM                       8,164,228,924   1216           111
CRAM v2 (LZ77 + Huff)     5,716,980,996   1247           182
CRAM v3 (order-1 rANS)    4,922,879,082   574            190

Page 8:

gzip is used everywhere… but it's ancient (1993)… what next?

Google – first Zopfli (2013, gzip-compatible), now Brotli (incompatible, Huffman-based) – Firefox support, all over the news: "Google's new Brotli compression algorithm is 26 percent better than existing solutions" (22.09.2015).

First tests of the new ZSTD_HC (Yann Collet, open source, tANS, 28.10.2015):

compressor                  encode               size [B]     decode
zstd v0.2                   427 ms (239 MB/s)    51,367,682   189 ms (541 MB/s)
zstd_HC v0.2 -1             996 ms (102 MB/s)    48,918,586   244 ms (419 MB/s)
zstd_HC v0.2 -2             1152 ms (88 MB/s)    47,193,677   247 ms (414 MB/s)
zstd_HC v0.2 -3             2810 ms (36 MB/s)    45,401,096   249 ms (411 MB/s)
zstd_HC v0.2 -4             4009 ms (25 MB/s)    44,631,716   243 ms (421 MB/s)
zstd_HC v0.2 -5             4904 ms (20 MB/s)    44,529,596   244 ms (419 MB/s)
zstd_HC v0.2 -6             5785 ms (17 MB/s)    44,281,722   244 ms (419 MB/s)
zstd_HC v0.2 -7             7352 ms (13 MB/s)    44,108,456   241 ms (424 MB/s)
zstd_HC v0.2 -8             9378 ms (10 MB/s)    43,978,979   237 ms (432 MB/s)
zstd_HC v0.2 -9             12306 ms (8 MB/s)    43,901,117   237 ms (432 MB/s)
brotli 2015-10-11 level 0   1202 ms (85 MB/s)    47,882,059   489 ms (209 MB/s)
brotli 2015-10-11 level 3   1739 ms (58 MB/s)    47,451,223   475 ms (215 MB/s)
brotli 2015-10-11 level 4   2379 ms (43 MB/s)    46,943,273   477 ms (214 MB/s)
brotli 2015-10-11 level 5   6175 ms (16 MB/s)    43,363,897   528 ms (193 MB/s)
brotli 2015-10-11 level 6   9474 ms (10 MB/s)    42,877,293   463 ms (221 MB/s)

lzturbo (tANS) seems even better, but is closed-source.

Page 9:

Operating on a fractional number of bits

restricting a size-N range to a length-L subrange contains lg(N/L) bits
a number x contains lg(x) bits

adding a symbol of probability p – containing lg(1/p) bits:

lg(N/L) + lg(1/p) ≈ lg(N/L′) for L′ ≈ p·L (arithmetic coding: narrow the range)
lg(x) + lg(1/p) = lg(x′) for x′ ≈ x/p (ANS: grow the number)

Page 10:

RENORMALIZATION to prevent x → ∞

Enforce x ∈ I = {L, …, 2L−1} by transmitting the lowest bits to the bitstream.

The "buffer" x contains lg(x) bits of information – it produces bits as they accumulate.

A symbol of Pr = p contains lg(1/p) bits: lg(x) → lg(x) + lg(1/p) modulo 1

Page 11: [figure-only slide]

Page 12:

What symbol spread should we choose? (using a PRNG seeded with a cryptographic key here also adds encryption)

For example, rate loss for p = (0.04, 0.16, 0.16, 0.64):
L = 16 states, q = (1, 3, 2, 10), q/L = (0.0625, 0.1875, 0.125, 0.625)

method             symbol spread      ΔH/H      rate loss
(quantization)     -                  ~0.011    penalty of the quantizer itself
Huffman            0011222233333333   ~0.080    would give a Huffman decoder
spread_range_i()   0111223333333333   ~0.059    increasing order
spread_range_d()   3333333333221110   ~0.022    decreasing order
spread_fast()      0233233133133133   ~0.020    fast
spread_prec()      3313233103332133   ~0.015    close to the quantization ΔH/H
spread_tuned()     3233321333313310   ~0.0046   better than quantization ΔH/H due to also using p
spread_tuned_s()   2333312333313310   ~0.0040   L log L complexity (sort)
spread_tuned_p()   2331233330133331   ~0.0058   testing the 1/(p ln(1+1/i)) ~ i/p approximation
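A sketch in the spirit of the fast spread above (the stride heuristic is an assumption borrowed from FSE-style implementations, and its output differs from the table's string):

```python
# Scatter symbol s over q[s] of the L table positions using a fixed
# stride coprime to L, so every position is visited exactly once.
L = 16
q = {0: 1, 1: 3, 2: 2, 3: 10}        # quantized frequencies, sum = L

def spread_fast(q, L):
    table = [None] * L
    step = (L >> 1) + (L >> 3) + 3   # odd, hence coprime to a power of 2
    pos = 0
    for s, count in q.items():
        for _ in range(count):
            table[pos] = s
            pos = (pos + step) % L
    return table

print("".join(map(str, spread_fast(q, L))))   # one valid symbol spread
```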

Page 13:

2. (FORWARD) ERROR CORRECTION

Add deterministic redundancy to repair eventual errors.

Binary Symmetric Channel (BSC) (e.g. 0101 → 0111): every bit has an independent probability p_b of being flipped (0 ↔ 1).

Simplest error correction – Triple Modular Redundancy: use 3 (or more) identical copies, decode by majority voting: 0 → 000, 1 → 111 … 001 → 0, 110 → 1.

Codewords: the coding sequences actually used (000, 111). The decoder chooses the closest codeword (the one in whose ball the received word lies).

Hamming distance: the number of differing bits (e.g. d(001, 000) = 1).
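A sketch simulating TMR over a BSC (assumed toy parameters):

```python
# Triple Modular Redundancy over a simulated BSC: send each bit three
# times, decode by majority vote.
import random

def tmr_encode(bits):
    return [b for b in bits for _ in range(3)]

def bsc(bits, p_b):                   # flip each bit with Pr = p_b
    return [b ^ (random.random() < p_b) for b in bits]

def tmr_decode(received):
    return [int(sum(received[i:i + 3]) >= 2)
            for i in range(0, len(received), 3)]

msg = [random.randint(0, 1) for _ in range(100000)]
out = tmr_decode(bsc(tmr_encode(msg), 0.01))
print(sum(a != b for a, b in zip(msg, out)) / len(msg))   # ~0.0003
```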

Page 14:

Triple Modular Redundancy: the 2 radius-1 balls around 000 and 111 cover all length-3 sequences: 2 codewords · (1 + 3) ball size = 2³ sequences.

(7,4) Hamming code: d(c₁, c₂) ≥ 3 for c₁ ≠ c₂.

7 bits – a ball of radius 1 contains 8 sequences; 2⁴ codewords, union of their balls: 2⁴ · 8 = 2⁷.

Finally: the 2⁷ sequences are divided exactly among the 2⁴ radius-1 balls around codewords (a perfect code).

bit position   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20  ...
encoded bits   p1  p2  d1  p4  d2  d3  d4  p8  d5  d6   d7   d8   d9   d10  d11  p16  d12  d13  d14  d15

parity-bit coverage (each p_k checks the positions whose binary index contains k):
p1  covers 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, …
p2  covers 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, …
p4  covers 4, 5, 6, 7, 12, 13, 14, 15, 20, …
p8  covers 8, 9, 10, 11, 12, 13, 14, 15, …
p16 covers 16, 17, 18, 19, 20, …
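A compact sketch of the (7,4) case, where the syndrome directly spells out the error position:

```python
# (7,4) Hamming code: parity bits at positions 1, 2, 4 (1-indexed) each
# cover the positions whose index contains that power of two.

def hamming74_encode(d):                 # d = 4 data bits
    p1 = d[0] ^ d[1] ^ d[3]              # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]              # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]              # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]

def hamming74_correct(c):                # c = 7 received bits
    s = (c[0] ^ c[2] ^ c[4] ^ c[6]) * 1 \
      + (c[1] ^ c[2] ^ c[5] ^ c[6]) * 2 \
      + (c[3] ^ c[4] ^ c[5] ^ c[6]) * 4  # syndrome = error position
    if s:
        c[s - 1] ^= 1                    # flip the single wrong bit
    return [c[2], c[4], c[5], c[6]]      # extract the data bits

code = hamming74_encode([1, 0, 1, 1])
code[5] ^= 1                             # flip one bit in the channel
print(hamming74_correct(code))           # [1, 0, 1, 1] recovered
```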

Page 15:

Triple Modular Redundancy: rate 1/3 (we need to transmit 3 times more).

Assume ε = 0.01: the probability that 2 of the 3 bits get flipped is ≈ 0.0003 > 0, so full repair is never guaranteed.

Theory: rate 1 − h(0.01) ≈ 0.919 can fully repair the message.

The size of a radius-εn ball in the space of length-n sequences (hypercube):

binom(n, εn) ≈ 2^(n h(ε))

The number of such balls: 2^n / 2^(n h(ε)) = 2^(n(1−h(ε)))

We can send n(1 − h(ε)) bits by choosing one of the balls (codewords).

Alternative explanation of the rate R(ε) = 1 − h(ε) bits/symbol: if we additionally had the bit-flip positions, worth h(ε) bits/symbol, we would get 1 bit/symbol.

The theoretical rate is asymptotic – it requires long bit blocks.
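Both numbers are easy to verify (a quick sketch):

```python
# TMR residual error at rate 1/3 vs the theoretical BSC rate 1 - h(eps).
from math import log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

eps = 0.01
tmr_fail = 3 * eps**2 * (1 - eps) + eps**3   # >= 2 of 3 bits flipped
print(tmr_fail)      # ~0.000298: imperfect, despite tripling the data
print(1 - h(eps))    # ~0.919: full repair possible in theory
```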

Page 16:

Directly long blocks – Low Density Parity Check (LDPC): randomly distributed parity checks.

Correction uses belief propagation – it operates on Pr(c_i = 1):

C_i = Pr(c_i = 1) for the input message (initialization of the process),
q_ij(1) (= C_i initially) – 'belief' sent from variable node c_i to check node f_j,
r_ji(b) – Pr(c_i = b) sent from check node f_j back to c_i.

Page 17:

Sequential decoding – search the tree of possible corrections

state = f(input_block, state)
e.g. output_block = "input_block | (state & 1)"

state depends on the history – it connects the redundancies, guards the proper correction, uniformly spreading the checksums.

Encoding uses only some subset of states (e.g. last bit = 0).
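A toy illustration of the idea (hypothetical state-update function, not the talk's construction): each block gets one history-dependent check bit that a tree-search decoder can test against the received stream:

```python
# Toy systematic encoder: append one state-dependent check bit to each
# block; a sequential decoder walks the tree of candidate corrections
# and prunes branches whose check bits disagree with what was received.
def step(state, block):
    check = state & 1                              # redundancy bit from history
    state = (state * 31 + sum(block) + 1) % 1024   # hypothetical state update
    return state, block + [check]

def encode(blocks):
    state, out = 0, []
    for b in blocks:
        state, ob = step(state, b)
        out.append(ob)
    return out

print(encode([[1, 0], [1, 1], [0, 0]]))   # each block carries 1 extra bit
```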

Page 18:

Channels

Binary Symmetric Channel (difficult to repair, well understood):
bits have independent Pr = p_b of being flipped (e.g. 0101 → 0111); rate = 1 − h(p_b)

Erasure Channel (simple to repair, well understood):
bits have independent Pr = p_e of being erased (e.g. 0101 → 01?1); rate = 1 − p_e

Synchronization errors, for example the Deletion Channel:
bits have independent Pr = p_d of being deleted (e.g. 0101 → 011);
extremely difficult to repair, rate = ???

Page 19:

Rateless erasure codes: produce some large number of packets; any large enough subset (≥ N) is sufficient for reconstruction.

For example Reed-Solomon codes (slow): a degree-(N−1) polynomial is determined by its values at any N points.

Fountain codes (from 2002, Michael Luby):
Y are XORs of (random) subsets of X: (X₁, X₂, …) → (Y₁, Y₂, …),
e.g. if Y₁ = X₁ and Y₂ = X₁ XOR X₂, then X₁ = Y₁ and X₂ = Y₂ XOR X₁.

What densities of XORed packets should we choose? The (ideal soliton) distribution:

Pr(1) = 1/N,   Pr(d) = 1/(d(d−1)) for d = 2, …, N

Quick, but needs (1 + ε)N packets to decode reliably (is the matrix invertible?).
Page 20:

If a packet wasn't lost, it is probably damaged.

Protecting with error correction requires a priori knowledge of the damage level – unavailable in: data storage, watermarking, unknown packet routes.

Joint Reconstruction Codes (JRC): the same, but using a posteriori knowledge of the damage level.

Page 21:

Some packets are lost, some are damaged.

Adding redundancy (sender) requires knowing the damage level. Not available in many situations (JRC…):

- damage of a storage medium depends on its history,
- a broadcaster doesn't know the individual damage levels,
- damage of a watermark depends on the capturing conditions,
- damage of a packet in a network depends on its route,
- rapidly varying conditions can prevent adaptation.

Page 22:

We could add some statistical/deterministic knowledge to the decoder, getting compression/entropy coding without the encoder's knowledge.

Page 23:

JRC: in the undamaged case, reconstruction succeeds once N + 1 packets are received. Large N: Gaussian distribution, on average 1.5 more nodes to test.

Damaged case (N = 3: two packets with ε = 0, one damaged with ε = 0.2): ≈ Pareto distribution.

Page 24:

Sequential decoding: choose and expand the most promising candidate (largest weight).

The rate is R_c(ε) = 1 − h_{1/(1+c)}(ε) for Pareto coefficient c,

where h_u(ε) = lg(ε^u + (1−ε)^u) / (1−u) is the Rényi entropy; h₁ is the Shannon entropy.

R_0(ε) = 1 − h(ε)
R_1(ε) = 1 − lg(1 + 2√(ε(1−ε)))

Pr(#steps > s) ∝ s^(−c)
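Numerically (a sketch of the formulas above):

```python
# Rate R_c(eps) = 1 - h_{1/(1+c)}(eps) from the Renyi entropy; c is the
# Pareto coefficient of the decoding effort Pr(#steps > s) ~ s^(-c).
from math import log2, sqrt

def h_u(eps, u):
    return log2(eps**u + (1 - eps)**u) / (1 - u)

def R(eps, c):
    return 1 - h_u(eps, 1 / (1 + c))

eps = 0.01
print(R(eps, 1e-6))   # c -> 0: approaches capacity 1 - h(eps) ~ 0.919
print(R(eps, 0.5))    # ~0.83: guaranteeing decoding effort costs rate
print(R(eps, 1), 1 - log2(1 + 2 * sqrt(eps * (1 - eps))))   # ~0.74, equal
```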

Page 25:

3. KUZNETSOV-TSYBAKOV PROBLEM, RATE DISTORTION

Imagine we can send n bits, but k = p_f·n of them are damaged (fixed).

For example QR-like codes with some fraction of pixels fixed (e.g. as the contour of a plane):

If the receiver knew the positions of these strong constraints, we could just use the remaining n − k = (1 − p_f)n bits.

Kuznetsov-Tsybakov problem: what if only the sender knows the constraints?

Surprisingly, we can still approach the same capacity. But the cost (paid only by the sender!) grows to infinity there.

Page 26:

Weak/statistical constraints can be applied to all positions, e.g. the standard steganography problem: to avoid detection, we would like to modify as little as possible.

Entropy coder: the difference map carries h(p) ≈ 0.5 bits/symbol … but the original picture is needed for the XOR.

Kuznetsov-Tsybakov … does the receiver have to know the constraints?

Page 27:

Homogeneous contrast case: use the (p, 1−p) or (1−p, p) distributions

Page 28:

General constraints: the grayness of a pixel is the probability of using a 1 there

Page 29:

Approaching the optimum: use the freedom of choosing the exact encoding sequence!

Find the most suitable encoding sequence for our constraints. The (search) cost is paid while encoding; decoding is straightforward. The decoder cannot directly deduce the constraints.

Uniform grayness: the probability that a random length-n bit sequence has a fraction g of "1"s is

(1/2^n) · binom(n, gn) ≈ 2^(−n(1−h(g)))

so if we have essentially more than 2^(n(1−h(g))) different (random) ways to encode a single message, with high probability some of them will use a fraction g of "1"s.

The cost: the number of different messages is reduced to at most 2^n / 2^(n(1−h(g))) = 2^(n h(g)), giving exactly the rate limit as if the receiver knew g.

Kuznetsov-Tsybakov: the probability of accidentally fulfilling all n·p_f constraints is 2^(−n·p_f), so we need more than 2^(n·p_f) different (pseudorandom) ways of encoding – we can encode 2^n / 2^(n·p_f) = 2^(n(1−p_f)) different messages there – the rate limit is 1 − p_f, as if the receiver knew the positions of the fixed bits.
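A Monte Carlo sketch of the counting argument (toy parameters; the encoding family is a hypothetical PRNG construction):

```python
# With ~2^(n*pf) pseudorandom encodings per message, some encoding
# usually satisfies all n*pf fixed-bit constraints, although the
# receiver never learns where they are.
import random

n, pf = 24, 0.25
k = int(n * pf)                              # number of fixed positions
positions = random.sample(range(n), k)
fixed = {i: random.randint(0, 1) for i in positions}

def encoding(message, variant):              # hypothetical PRNG family
    rng = random.Random(message * 100003 + variant)
    return [rng.randint(0, 1) for _ in range(n)]

message = 42
for variant in range(8 << k):                # ~8 * 2^(n*pf) candidates
    seq = encoding(message, variant)
    if all(seq[i] == b for i, b in fixed.items()):
        print("variant", variant, "satisfies all", k, "fixed bits")
        break
```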

Page 30: [figure-only slide]

Page 31:

Constrained Coding (CC): find the closest Y = T(payload_bits, freedom_bits), send Y.

Rate Distortion (RD): find the closest T(0, freedom_bits), send freedom_bits.

Because T is a bijection, for the same constraints/message to fulfill/resemble:

density of freedom bits + density of payload bits = 1
rate of RD + rate of CC = 1

so we can translate the previous calculations by just using 1 − rate.

B&W picture: 1 bit per pixel = visual aspect (RD) + hidden message (CC)

Page 32:

4. DISTRIBUTED SOURCE CODING – separate compression and joint decompression of correlated sources that do not communicate,

e.g. for correlated sensors, stereo recording (sound, video).
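A classic textbook sketch of the idea (the Slepian-Wolf syndrome trick with the (7,4) Hamming code; not from the talk): the sender compresses 7 bits of X to a 3-bit syndrome, and the decoder recovers X from correlated side information Y that differs from X in at most one bit:

```python
# Syndrome-based distributed source coding: X is transmitted only as
# its syndrome; the decoder picks the member of that coset closest to Y.
H = [[1, 0, 1, 0, 1, 0, 1],   # parity-check matrix of Hamming(7,4):
     [0, 1, 1, 0, 0, 1, 1],   # column i is the binary representation
     [0, 0, 0, 1, 1, 1, 1]]   # of position i+1

def syndrome(v):
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def decode(s, y):
    # syndrome(y) XOR s = syndrome of the difference X XOR Y, which for
    # a single differing bit spells out its position (0 means none).
    d = [a ^ b for a, b in zip(syndrome(y), s)]
    pos = d[0] + 2 * d[1] + 4 * d[2]
    x = y[:]
    if pos:
        x[pos - 1] ^= 1
    return x

X = [1, 0, 1, 1, 0, 0, 1]
Y = X[:]; Y[4] ^= 1                  # correlated side information
print(decode(syndrome(X), Y) == X)   # True: 7 bits recovered from 3
```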

Page 33:

Statistical physics on the hypercube of 2^n sequences

binom(n, pn) ≈ 2^(n h(p)),   h(p) = −p lg(p) − (1−p) lg(1−p)

Entropy coding: a symbol of probability p contains lg(1/p) bits of information.

Error correction: a bit with bit-flip probability ε carries 1 − h(ε) bits of information; encode as one of the codewords: the centers of size-2^(n h(ε)) balls.

JRC: reconstruct if the sum of 1 − h(ε) over the received packets is sufficient.

Rate distortion: approximate the sequence by the closest codeword.

Constrained coding: for each message there is a (disjoint) set of codewords; for the given constraints, we find the closest codeword representing our message.

Distributed source coding: e.g. codewords include the correlations.