TRANSCRIPT
[Page 1] Source: chaos.if.uj.edu.pl/ZOA/files/semianria/chaos/23.11.2015.pdf
Unusual coding possibilities – information theory in practice
statistical physics on the hypercube of 2^n sequences
1) Entropy coding: remove statistical redundancy in data compression:
Huffman, arithmetic coding, ANS
2) (Forward) Error correction: add deterministic redundancy to repair a
damaged message (erasure, BSC, deletion channels):
LDPC, sequential decoding, fountain codes and extensions
3) Rate distortion: approximation by a shorter description (lossy
compression); Kuznetsov-Tsybakov: (statistical) constraints known
only to the sender (steganography, QR codes) and their "duality"
4) Distributed source coding – for multiple correlated sources that
don't communicate with each other.
Jarosław Duda, Kraków, 23.XI.2015
[Page 2]
1. ENTROPY CODING – DATA COMPRESSION
We need n bits of information to choose one of 2^n possibilities.
For length-n 0/1 sequences with pn ones, how many bits do we need to choose one?
n! / ((pn)! ((1−p)n)!) ≈ (n/e)^n / ( (pn/e)^(pn) · ((1−p)n/e)^((1−p)n) )
= 2^( n lg(n) − pn lg(pn) − (1−p)n lg((1−p)n) ) = 2^( n(−p lg(p) − (1−p) lg(1−p)) ) = 2^(n h(p))
Entropy coder: encodes a sequence with probability distribution (p_s)_{s=0..m−1}
using asymptotically at least H = Σ_s p_s lg(1/p_s) bits/symbol (H ≤ lg(m))
Seen as a weighted average: a symbol of probability p contains lg(1/p) bits
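These counts can be sanity-checked numerically; a minimal sketch in plain Python (function and variable names are mine):

```python
import math

def h(p):
    """Binary Shannon entropy h(p) = -p lg(p) - (1-p) lg(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# lg(number of length-n 0/1 sequences with pn ones), per bit, vs. h(p)
n, p = 1000, 0.3
bits_per_symbol = math.log2(math.comb(n, int(p * n))) / n
print(bits_per_symbol, h(p))  # both close to 0.88 for p = 0.3
```

For growing n the two numbers converge, which is exactly the 2^(n h(p)) count used above.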
[Page 3]
symbol of probability p contains lg(1/p) bits
symbol → length-k bit sequence
assumes: Pr(symbol) ≈ 2^(−k)
Huffman coding:
- sort symbols by frequencies
- replace the two least frequent
with a tree of them, and so on
generally: prefix codes
defined by a tree
unique decoding:
no codeword is a prefix of another
http://en.wikipedia.org/wiki/Huffman_coding
A → 0, B → 100, C → 101, D → 110, E → 111
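The recipe above ("sort by frequencies, merge the two least frequent") fits in a few lines; a heapq-based sketch (identifiers are mine). For p(A) = 1/2, p(B) = 1/4, p(C) = 1/4 it yields the prefix-free code A → 0, B → 10, C → 11:

```python
import heapq

def huffman_code(freqs):
    """freqs: dict symbol -> weight; returns dict symbol -> bit string."""
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    code = {s: "" for s in freqs}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)   # two least frequent subtrees...
        w2, i2, syms2 = heapq.heappop(heap)  # ...are merged into one
        for s in syms1:
            code[s] = "0" + code[s]
        for s in syms2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (w1 + w2, i2, syms1 + syms2))
    return code

code = huffman_code({"A": 0.5, "B": 0.25, "C": 0.25})
print(code)  # {'A': '0', 'B': '10', 'C': '11'}
```

The integer `i` in each heap entry is only a tie-breaker, so equal weights never force Python to compare the symbol lists.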
[Page 4]
Symbol of probability p contains lg(1/p) bits
Encoding a (p_s) distribution with an entropy coder optimal for a (q_s) distribution costs
ΔH = Σ_s p_s lg(1/q_s) − Σ_s p_s lg(1/p_s) = Σ_s p_s lg(p_s/q_s) ≈ (1/ln 4) Σ_s (p_s − q_s)²/p_s
more bits/symbol – the so-called Kullback-Leibler "distance" (not symmetric).
Huffman/prefix codes – encode symbols directly as bit sequences,
approximating probabilities as powers of 1/2,
e.g. p(A) = 1/2, p(B) = 1/4, p(C) = 1/4
Example of the Huffman penalty for a ρ(1 − ρ)^k distribution on a 256-symbol alphabet:
Huffman maps 8-bit symbols to at least 1 bit each, so its ratio is ≤ 8 here –
terrible at encoding very frequent events.

ρ      Ratio = 8/H   Huffman   ANS
0.5    4.001         3.99      4.00
0.6    4.935         4.78      4.93
0.7    6.344         5.58      6.33
0.8    8.851         6.38      8.84
0.9    15.31         7.18      15.29
0.95   26.41         7.58      26.38
0.99   96.95         7.90      96.43
[Figure: Huffman tree for the example – A → 0, B → 10, C → 11]
[Page 5]
Huffman coding (HC), prefix codes – needs sorting;
most of everyday compression,
e.g. zip, gzip, cab, jpeg, gif, png, tiff, pdf, mp3… (can't use less than 1 bit/symbol)
zlibh (the fastest generally available implementation):
Encoding ~ 320 MB/s (/core, 3 GHz)
Decoding ~ 300-350 MB/s
Range coding (RC): large-alphabet arithmetic coding,
needs multiplication, e.g. 7-Zip, Google VP video codecs
(e.g. YouTube, Hangouts).
Encoding ~ 100-150 MB/s
Decoding ~ 80 MB/s
(binary) Arithmetic coding (AC):
H.264, H.265 video, ultracompressors e.g. PAQ
Encoding/decoding ~ 20-30 MB/s
Asymmetric Numeral Systems (ANS), tabled variant (tANS) – no multiplication or sorting
FSE implementation of tANS: Encoding ~ 350 MB/s, Decoding ~ 500 MB/s
RC → ANS: ~7x decoding speedup, no multiplication (switched e.g. in the LZA compressor)
HC → ANS means better compression and ~1.5x decoding speedup (e.g. zhuff, lzturbo)

Example of the Huffman penalty for a truncated ρ(1 − ρ)^k distribution
(Huffman: 1 byte → at least 1 bit, ratio ≤ 8 here):

ρ      Ratio = 8/H   zlibh   FSE
0.5    4.001         3.99    4.00
0.6    4.935         4.78    4.93
0.7    6.344         5.58    6.33
0.8    8.851         6.38    8.84
0.9    15.31         7.18    15.29
0.95   26.41         7.58    26.38
0.99   96.95         7.90    96.43
[Page 6]
Huffman vs ANS in compressors (LZ + entropy coder): from Matt Mahoney benchmarks http://mattmahoney.net/dc/text.html
enwik8 (100,000,000 B)      compressed size   encode [ns/byte]   decode [ns/byte]
LZA 0.82b -mx9 -b7 -h7      26,396,613        449                9.7
lzturbo 1.2 -39 -b24        26,915,461        582                2.8
WinRAR 5.00 -ma5 -m5        27,835,431        1004               31
WinRAR 5.00 -ma5 -m2        29,758,785        228                30
lzturbo 1.2 -32             30,979,376        19                 2.7
zhuff 0.97 -c2              34,907,478        24                 3.5
gzip 1.3.5 -9               36,445,248        101                17
pkzip 2.0.4 -ex             36,556,552        171                50
ZSTD 0.0.1                  40,024,854        7.7                3.8
WinRAR 5.00 -ma5 -m1        40,565,268        54                 31
zhuff, ZSTD (Yann Collet): LZ4 + tANS (switched from Huffman)
lzturbo (Hamid Bouzidi): LZ + tANS (switched from Huffman)
LZA (Nania Francesco): LZ + rANS (switched from range coding)
e.g. lzturbo vs gzip: better compression, 5x faster encoding, 6x faster decoding
saving time and energy in an extremely frequent task
[Page 7]
Apple LZFSE = Lempel-Ziv + Finite State Entropy
Default in iOS 9 and OS X 10.11: "matching the compression ratio of
ZLIB level 5, but with much higher
energy efficiency and speed
(between 2x and 3x) for both
encode and decode operations"
Finite State Entropy is Yann Collet's (known e.g. from LZ4)
implementation of tANS
CRAM 3.0 of the European Bioinformatics Institute, for DNA compression:

Format                    Size            Encoding (s)   Decoding (s)
BAM                       8,164,228,924   1216           111
CRAM v2 (LZ77 + Huffman)  5,716,980,996   1247           182
CRAM v3 (order-1 rANS)    4,922,879,082   574            190
[Page 8]
gzip is used everywhere… but it's ancient (1993)… what next?
Google – first Zopfli (2013, gzip-compatible), now Brotli (incompatible, Huffman-based) – Firefox support, all over the news:
"Google's new Brotli compression algorithm is 26 percent better than
existing solutions" (22.09.2015)
First tests of the new ZSTD_HC (Yann Collet, open source, tANS, 28.10.2015)
– compression time (speed), compressed size, decompression time (speed):
zstd v0.2 427 ms (239 MB/s), 51367682, 189 ms (541 MB/s)
zstd_HC v0.2 -1 996 ms (102 MB/s), 48918586, 244 ms (419 MB/s)
zstd_HC v0.2 -2 1152 ms (88 MB/s), 47193677, 247 ms (414 MB/s)
zstd_HC v0.2 -3 2810 ms (36 MB/s), 45401096, 249 ms (411 MB/s)
zstd_HC v0.2 -4 4009 ms (25 MB/s), 44631716, 243 ms (421 MB/s)
zstd_HC v0.2 -5 4904 ms (20 MB/s), 44529596, 244 ms (419 MB/s)
zstd_HC v0.2 -6 5785 ms (17 MB/s), 44281722, 244 ms (419 MB/s)
zstd_HC v0.2 -7 7352 ms (13 MB/s), 44108456, 241 ms (424 MB/s)
zstd_HC v0.2 -8 9378 ms (10 MB/s), 43978979, 237 ms (432 MB/s)
zstd_HC v0.2 -9 12306 ms (8 MB/s), 43901117, 237 ms (432 MB/s)
brotli 2015-10-11 level 0 1202 ms (85 MB/s), 47882059, 489 ms (209 MB/s)
brotli 2015-10-11 level 3 1739 ms (58 MB/s), 47451223, 475 ms (215 MB/s)
brotli 2015-10-11 level 4 2379 ms (43 MB/s), 46943273, 477 ms (214 MB/s)
brotli 2015-10-11 level 5 6175 ms (16 MB/s), 43363897, 528 ms (193 MB/s)
brotli 2015-10-11 level 6 9474 ms (10 MB/s), 42877293, 463 ms (221 MB/s)
lzturbo (tANS) seems even better, but it is closed-source.
[Page 9]
Operating on a fractional number of bits
restricting a range of size N to a length-L subrange contains lg(N/L) bits
a number x contains lg(x) bits
adding a symbol of probability p – containing lg(1/p) bits:
lg(N/L) + lg(1/p) ≈ lg(N/L′) for L′ ≈ p·L        lg(x) + lg(1/p) = lg(x′) for x′ ≈ x/p
[Page 10]
RENORMALIZATION to prevent x → ∞
Enforce x ∈ I = {L, …, 2L − 1} by
transmitting the lowest bits to the bitstream
"buffer" x contains lg(x) bits of information
→ produces bits when accumulated
Symbol of p_s = p contains lg(1/p) bits:
lg(x) → lg(x) + lg(1/p) modulo 1
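The x′ ≈ x/p step is exactly what range-ANS (rANS) does with probabilities quantized as f_s/2^n. The toy version below keeps the whole state as one big integer instead of renormalizing to I = {L, …, 2L − 1}, to keep the sketch short; all identifiers are mine:

```python
PROB_BITS = 4                       # probabilities quantized to f[s] / 2^4
FREQ = {"a": 10, "b": 4, "c": 2}    # p ~ (0.625, 0.25, 0.125); sums to 16
CDF = {}
_acc = 0
for _s in FREQ:                     # cumulative start c_s of each symbol's slot range
    CDF[_s] = _acc
    _acc += FREQ[_s]

def rans_encode(symbols):
    x = 1
    for s in symbols:               # each step: x -> ~x/p_s, i.e. +lg(1/p_s) bits
        x = ((x // FREQ[s]) << PROB_BITS) + (x % FREQ[s]) + CDF[s]
    return x

def rans_decode(x, count):
    out = []
    mask = (1 << PROB_BITS) - 1
    for _ in range(count):
        slot = x & mask             # the low bits select the symbol's slot
        s = next(t for t in FREQ if CDF[t] <= slot < CDF[t] + FREQ[t])
        x = FREQ[s] * (x >> PROB_BITS) + slot - CDF[s]
        out.append(s)
    return out[::-1]                # ANS is LIFO: decoding runs backwards

msg = list("abacabaa")
assert rans_decode(rans_encode(msg), len(msg)) == msg
```

The final state holds about Σ lg(1/p_s) bits; the renormalization described above is what turns this unbounded integer into a streamed, fixed-precision coder.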
[Page 11]
[Page 12]
What symbol spread should we choose?
(using a PRNG seeded with a cryptographic key adds encryption)
For example: rate loss for p = (0.04, 0.16, 0.16, 0.64),
L = 16 states, q = (1, 3, 2, 10), q/L = (0.0625, 0.1875, 0.125, 0.625)

method            symbol spread      dH/H      rate-loss notes
-                 -                  ~0.011    penalty of the quantizer itself
Huffman           0011222233333333   ~0.080    would give a Huffman decoder
spread_range_i()  0111223333333333   ~0.059    increasing order
spread_range_d()  3333333333221110   ~0.022    decreasing order
spread_fast()     0233233133133133   ~0.020    fast
spread_prec()     3313233103332133   ~0.015    close to quantization dH/H
spread_tuned()    3233321333313310   ~0.0046   better than quantization dH/H due to also using p
spread_tuned_s()  2333312333313310   ~0.0040   L log L complexity (sorting)
spread_tuned_p()  2331233330133331   ~0.0058   testing the 1/(p ln(1+1/i)) ~ i/p approximation
[Page 13]
2. (FORWARD) ERROR CORRECTION
Add deterministic redundancy to repair eventual errors.
Binary Symmetric Channel (BSC): (e.g. 0101 → 0111)
every bit has an independent probability ε of being flipped (0 ↔ 1)
Simplest error correction – Triple Modular Redundancy:
use 3 (or more) identical copies, decode by majority voting
0 → 000, 1 → 111 … 001 → 0, 110 → 1
Codewords: the coding sequences actually used (000, 111)
The decoder chooses the closest codeword (the received word lies in its ball)
Hamming distance: the number of differing bits (e.g. d(001, 000) = 1)
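Majority-vote decoding of the triple code, as a runnable sketch (names are mine):

```python
def tmr_encode(bits):
    """0 -> 000, 1 -> 111."""
    return [b for b in bits for _ in range(3)]

def tmr_decode(coded):
    """Majority vote inside each triple: 001 -> 0, 110 -> 1, ..."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

sent = tmr_encode([0, 1, 1, 0])
damaged = sent[:]
damaged[1] ^= 1                      # one flipped bit per triple is repaired
assert tmr_decode(damaged) == [0, 1, 1, 0]
```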
[Page 14]
Triple Modular Redundancy:
2 radius-1 balls (around 000 and 111) cover all length-3 bit sequences:
2 codewords × (1 + 3) ball size = 2³ sequences
(7,4) Hamming code: d(c₁, c₂) ≥ 3 between codewords
7 bits → a radius-1 ball contains 1 + 7 = 8 sequences
2⁴ codewords, union of their balls: 2⁴ · 8 = 2⁷
Finally: the 2⁷ sequences are divided among 2⁴ radius-1 balls around codewords
Bit positions 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 … carry the
encoded data bits p1 p2 d1 p4 d2 d3 d4 p8 d5 d6 d7 d8 d9 d10 d11 p16 d12 d13 d14 d15 …
Parity bit coverage (each parity bit p_i covers the positions whose index has bit i set):
p1:  1, 3, 5, 7, 9, 11, 13, 15, 17, 19, …
p2:  2, 3, 6, 7, 10, 11, 14, 15, 18, 19, …
p4:  4, 5, 6, 7, 12, 13, 14, 15, 20, …
p8:  8, 9, 10, 11, 12, 13, 14, 15, …
p16: 16, 17, 18, 19, 20, …
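For the (7,4) case this coverage gives the classic encoder/decoder: p1, p2, p4 sit at positions 1, 2, 4, and the XOR of the 1-indexed positions holding a '1' (the syndrome) directly names any single flipped position. A sketch (my own implementation):

```python
def hamming74_encode(d):
    """d: 4 data bits -> 7-bit codeword, parity at positions 1, 2, 4 (1-indexed)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Corrects any single flipped bit; returns the 4 data bits."""
    s = 0
    for pos in range(1, 8):     # syndrome = XOR of 1-indexed positions holding a 1
        if c[pos - 1]:
            s ^= pos
    if s:                       # nonzero syndrome points at the flipped position
        c = c[:]
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                    # flip one bit
assert hamming74_decode(word) == [1, 0, 1, 1]
```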
[Page 15]
Triple Modular Redundancy: rate 1/3 (we need to transmit 3 times more).
Assume ε = 0.01; the probability that 2 of the 3 bits are flipped is ≈ 0.0003 > 0
Theory: rate 1 − h(0.01) ≈ 0.919 can fully repair the message.
The size of a radius-εn ball in the space of length-n sequences (hypercube):
C(n, εn) ≈ 2^(n h(ε))
The number of such balls: 2^n / 2^(n h(ε)) = 2^(n(1 − h(ε)))
We can send n(1 − h(ε)) bits
by choosing one of the balls (codewords)
Alternative explanation of the rate R(ε) = 1 − h(ε) bits/symbol:
additionally having the bit-flip positions, worth h(ε) bits/symbol,
we would get 1 bit/symbol
The theoretical rate is asymptotic – it requires long bit blocks
[Page 16]
Directly long blocks – Low Density Parity Check (LDPC):
randomly distributed parity bits
Correction uses belief propagation – it operates on Pr(c_i = 1):
C_i = Pr(c_i = 1) for the input message (initialization of the process)
q_ij(1) (= C_i initially) – "belief" from variable c_i to check f_j;  r_ji(b) – Pr(c_i = b) from f_j back to c_i
[Page 17]
Sequential decoding – search the tree of possible corrections
state = f(input_frame, state)
e.g. output_frame =
"input_frame | (state & 1)"
The state depends on the history → it connects redundancies,
guards the proper correction, uniformly spreading checksums
Encoding uses only some subset of states (e.g. last bit = 0)
[Page 18]
Channels
Binary Symmetric Channel (difficult to repair, well understood) –
bits have an independent probability ε of being flipped (e.g. 0101 → 0111):
rate = 1 − h(ε)
Erasure Channel (simple to repair, well understood) –
bits have an independent probability ε of being erased (e.g. 0101 → 01?1):
rate = 1 − ε
Synchronization errors, for example the Deletion Channel –
bits have an independent probability ε of being deleted (e.g. 0101 → 011):
extremely difficult to repair, rate = ???
[Page 19]
Rateless erasure codes: produce some large number of packets;
any large enough subset (≥ K) is sufficient for reconstruction.
For example Reed-Solomon codes (slow):
a degree-(K − 1) polynomial is defined by its values at any K points.
Fountain codes (from 2002, Michael Luby):
the y_i are XORs of some (random) subsets of the x_i: (x_1, x_2, …) → (y_1, y_2, …)
e.g. if y_1 = x_1, y_2 = x_1 XOR x_2, then x_1 = y_1, x_2 = y_2 XOR y_1
What densities of the XORed packets should we choose?
Pr(1) = 1/K,  Pr(i) = 1/(i(i − 1)) for i = 2, …, K
Quick, but needs (1 + ε)K packets to decode reliably (is the matrix invertible?)
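The x_2 = y_2 XOR y_1 step generalizes to the standard "peeling" decoder: repeatedly take a packet that, after XOR-ing out already-recovered symbols, covers a single source symbol. A sketch (packet format and names are mine):

```python
def lt_decode(packets, k):
    """packets: list of (set of source indices, XOR of those sources).
    Peeling decoder: returns dict index -> recovered value."""
    pending = [[set(idx), val] for idx, val in packets]
    known = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for p in pending:
            idx, val = p
            for i in [j for j in idx if j in known]:
                idx.discard(i)        # XOR out already-recovered sources
                val ^= known[i]
            p[1] = val
            if len(idx) == 1:         # degree-1 packet reveals one source symbol
                i = idx.pop()
                if i not in known:
                    known[i] = val
                    progress = True
    return known

# the example above: y1 = x1, y2 = x1 XOR x2  =>  x1 = y1, x2 = y2 XOR y1
x = [5, 9, 12]
packets = [({0}, x[0]), ({0, 1}, x[0] ^ x[1]), ({1, 2}, x[1] ^ x[2])]
assert lt_decode(packets, 3) == {0: 5, 1: 9, 2: 12}
```

When no degree-1 packet remains, peeling stalls; full LT decoding then falls back to Gaussian elimination over GF(2), which is the "(1 + ε)K, matrix invertible?" question.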
[Page 20]
If a packet wasn't lost, it may still be damaged.
Protecting with error correction requires a priori knowledge of the damage level,
unavailable in: data storage, watermarking, unknown packet routes.
Joint Reconstruction Codes: the same, using a posteriori knowledge of the damage level.
[Page 21]
Some packets are lost, some are damaged.
Adding redundancy (sender)
requires knowing the damage level.
Not available in many situations (JRC…):
- damage of a storage medium depends
on its history,
- a broadcaster doesn't know the individual
damage levels,
- damage of a watermark
depends on the capturing conditions,
- damage of a packet in a network
depends on its route,
- rapidly varying conditions
can prevent adaptation.
[Page 22]
We could add some statistical/deterministic knowledge at the decoder,
getting compression/entropy coding without the encoder's knowledge.
[Page 23]
JRC: undamaged case, when m + 1 packets were received:
for large m: Gaussian distribution, on average 1.5 more nodes to test.
Damaged case (m = 3; 2 packets with ε = 0, 1 with ε = 0.2): → Pareto distribution
[Page 24]
Sequential decoding: choose and expand the most promising candidate (largest weight).
The rate is R_c(ε) = 1 − h_{1/(1+c)}(ε) for Pareto coefficient c,
where h_u(ε) = lg(ε^u + (1 − ε)^u)/(1 − u) is the Rényi entropy; h_1 is the Shannon entropy.
R_0(ε) = 1 − h(ε)
R_1(ε) = 1 − lg(1 + 2√(ε(1 − ε)))
Pr(# steps > s) ∝ s^(−c)
[Page 25]
3. KUZNETSOV-TSYBAKOV PROBLEM, RATE DISTORTION
Imagine we can send n bits, but k = pf·n of them are damaged (fixed).
For example QR-like codes with some fraction of the pixels fixed (e.g. as the contour of a plane):
if the receiver knew the positions of these strong constraints,
we could just use the remaining n − k = (1 − pf)·n bits.
Kuznetsov-Tsybakov problem: what if only the sender knows the constraints?
Surprisingly, we can still approach the same capacity,
but the cost (paid only by the sender!) grows to infinity there.
[Page 26]
Weak/statistical constraints can be applied to all positions,
e.g. the standard steganography problem:
to avoid detection, we would like to modify as little as possible.
Entropy coder: the difference map carries h(p) ≈ 0.5 bits/symbol … the picture is needed for the XOR.
Kuznetsov-Tsybakov … does the receiver have to know the constraints?
[Page 27]
Homogeneous contrast case: use (p, 1 − p) or (1 − p, p) distributions
[Page 28]
General constraints: the grayness of a pixel is the probability of using a 1 there
[Page 29]
Approaching the optimum: use the freedom of choosing the exact encoding sequence!
Find the most suitable encoding sequence for our constraints.
The cost (search) is paid while encoding; decoding is straightforward.
The decoder cannot directly deduce the constraints.
Uniform grayness: the probability that a random length-n bit sequence has pn ones is
(1/2^n)·C(n, pn) ≈ 2^(−n(1 − h(p)))
so if we have essentially more than 2^(n(1 − h(p))) different (random) ways to encode a
single message, with high probability some of them will use pn ones.
The cost: the number of different messages reduces to at most
2^n / 2^(n(1 − h(p))) = 2^(n h(p)), giving the rate limit exactly as if the receiver knew p.
Kuznetsov-Tsybakov: the probability of accidentally fulfilling all pf·n constraints is 2^(−pf·n),
so we need to have more than 2^(pf·n) different (pseudorandom) ways of encoding –
we could encode 2^n / 2^(pf·n) = 2^(n(1 − pf)) different messages there –
the rate limit is 1 − pf, as if the receiver knew the positions of the fixed bits.
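The "many encodings per message" idea can be demonstrated with a toy binning scheme: declare every n-bit sequence whose (pseudorandom) hash equals m to be a codeword of message m; the sender searches for a codeword that also satisfies its fixed-bit constraints, while the receiver just hashes, never learning the constraints. All parameters and names below are my own toy choices, not a scheme from the slides:

```python
import hashlib, random

N_BITS, M_BITS = 16, 8            # 16-bit codewords carrying 8-bit messages

def message_of(x):
    """Receiver's map codeword -> message: a pseudorandom 8-bit hash."""
    return hashlib.sha256(x.to_bytes(2, "big")).digest()[0]

def kt_encode(m, fixed, rng):
    """fixed: dict bit position -> forced value, known only to the sender.
    Expected ~2^M_BITS random tries -- the search cost is paid by the sender only."""
    while True:
        x = rng.getrandbits(N_BITS)
        for pos, bit in fixed.items():
            x = (x & ~(1 << pos)) | (bit << pos)
        if message_of(x) == m:
            return x

rng = random.Random(0)
fixed = {0: 1, 5: 0, 11: 1}       # 3 of the 16 bits are stuck
x = kt_encode(77, fixed, rng)
assert message_of(x) == 77        # receiver recovers m without knowing `fixed`
assert all((x >> pos) & 1 == bit for pos, bit in fixed.items())
```

With pf·n = 3 fixed bits out of 16, up to 2^(16−3) sequences satisfy the constraints, so messages of up to 16 − 3 = 13 bits are reachable in principle; the 8-bit message here leaves a comfortable margin.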
[Page 30]
[Page 31]
Constrained Coding: find the closest c = T(payload_bits, freedom_bits), send c
Rate Distortion: find the closest T(payload_bits, freedom_bits), send the freedom_bits
Because T is a bijection, for the same constraints/message to fulfill/resemble:
density of freedom bits + density of payload bits = 1
rate of RD + rate of CC = 1
so we can translate the previous calculations by just using 1 − rate
B&W picture: 1 bit per pixel = visual aspect (RD) + hidden message (CC)
[Page 32]
4) Distributed Source Coding – separate compression and
joint decompression of correlated sources that do not communicate,
e.g. correlated sensors, stereo recording (sound, video)
[Page 33]
Statistical physics on the hypercube of 2^n sequences
C(n, pn) ≈ 2^(n h(p)),   h(p) = −p lg(p) − (1 − p) lg(1 − p)
Entropy coding: a symbol of probability p contains lg(1/p) bits of information
Error correction: a bit with flip probability ε carries 1 − h(ε) bits of information;
encode as one of the codewords: the centers of balls of size 2^(n h(ε))
JRC: reconstruct if the sum of 1 − h(ε) over the received packets is sufficient
Rate distortion: approximate the sequence as the closest codeword
Constrained coding: for each message there is a (disjoint) set of codewords;
for the given constraints, we find the closest codeword representing our message
Distributed source coding: e.g. codewords include the correlations