
Coding Data for Networks

Anna-Lena Horlemann-Trautmann

Algorithmics Laboratory, EPFL, Switzerland

September 27th, 2016, University of St. Gallen

Information theory (100 years Shannon)

2016 marks the 100th anniversary of the birth of the Father of Information Theory,

Claude Shannon (1916-2001) [picture from www.techzibits.com]

1 / 22

Shannon's pioneering works in information theory:

Channel coding (1948):

Noisy-channel coding theorem / Shannon capacity (maximum information transfer rate for a given channel and noise level)

Compression (1948):

Source coding theorem (limits to possible data compression)

Cryptography (1949):

One-time pad is the only theoretically unbreakable cipher

2 / 22
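Shannon's one-time pad result can be made concrete with a few lines of code. This is a minimal sketch, not part of the original slides: encryption and decryption are the same byte-wise XOR, and the security claim rests on the key being uniformly random, as long as the message, and never reused.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR each message byte with one key byte; same operation decrypts."""
    assert len(key) == len(plaintext), "the pad must be as long as the message"
    return bytes(m ^ k for m, k in zip(plaintext, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh random pad, used once

ciphertext = otp_encrypt(message, key)
# decryption is the same XOR, since (m ^ k) ^ k = m
assert otp_encrypt(ciphertext, key) == message
```

Without the key, every plaintext of the same length is equally likely given the ciphertext, which is exactly Shannon's notion of perfect secrecy.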


Shannon provided answers to questions of the type

“What is possible in theory?”

Subsequent research:

how to algorithmically achieve those optimal scenarios

other types of channels

lossy compression

computationally secure cryptography

3 / 22

Channel Coding

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Channel Coding

... deals with noisy transmission of information

over space (communication)

over time (storage)

To deal with the noise,

the data is encoded with added redundancy,

the receiver can "filter out" the noise (decoding)

and then recover the sent data.

4 / 22

Why is the channel noisy?

badly insulated cable

interference of electronic devices

fading in wireless signals

scratches on CD

charge leakage in flash drive

power outage in data center

and many more...

5 / 22

Different channel models:

binary symmetric channel (symbols from F_2, additive errors)

[diagram: each transmitted bit is received correctly with probability 1-p and flipped with probability p]

binary erasure channel (symbols from F_2, erasures)

fading channels (symbols from C, multiplicative fading and additive errors)

network channels (symbols from F_{q^m}, network operations and additive errors)

and many more...

6 / 22
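The binary symmetric channel above is easy to simulate. This is an illustrative sketch (the function name `bsc` and the seed are my own choices, not from the slides): each bit is flipped independently with crossover probability p.

```python
import random

def bsc(bits, p, rng=random.Random(42)):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

sent = [0, 1, 1, 0, 1, 0, 0, 1] * 1000
received = bsc(sent, p=0.1)
flips = sum(s != r for s, r in zip(sent, received))
print(f"empirical flip rate: {flips / len(sent):.3f}")  # close to p = 0.1
```

Over many transmitted bits the empirical flip rate concentrates around p, which is what the channel model asserts.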


Example (Binary symmetric channel, p < 1/2)

We encode 0 as 000 and 1 as 111 (repetition code). If one error happens, we can uniquely decide which codeword was sent. E.g., if we receive r = 110 we decode to 111, since

P(r = 110 | c = 111) = p(1-p)^2 > P(r = 110 | c = 000) = p^2(1-p).

But if two errors happen, we will decode to the wrong codeword. Hence the error-correction capability of the code is 1 error.

With this code we need to send three bits to transmit one bit of information. Thus the transmission rate is 1/3.

7 / 22
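The maximum-likelihood decoder for this repetition code (with p < 1/2) is just a majority vote, which picks the closest codeword. A small sketch checking the claims in the example above:

```python
def encode(bit):
    """Repetition code: 0 -> 000, 1 -> 111."""
    return (bit, bit, bit)

def decode(r):
    """Majority vote = nearest codeword in Hamming distance (for p < 1/2)."""
    return 1 if sum(r) >= 2 else 0

# exhaustive check: every single-bit error is corrected
for bit in (0, 1):
    c = encode(bit)
    for i in range(3):
        r = list(c)
        r[i] ^= 1                    # flip one position
        assert decode(r) == bit
# a double error fools the decoder, as the slide notes:
assert decode([1, 1, 0]) == 1        # sent 000 with two errors -> decoded as 1
```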


General setup:

codewords are vectors of length n over a finite field F_q

received word = codeword + error vector (r = c + e ∈ F_q^n)

most likely sent codeword = closest codeword w.r.t. Hamming distance:

d_H((u_1, ..., u_n), (v_1, ..., v_n)) := |{i | u_i ≠ v_i}|.

Definition

A block code is a subset C ⊆ F_q^n. The minimum (Hamming) distance of the code is defined as

d_H(C) := min{d_H(u, v) | u, v ∈ C, u ≠ v}.

error-correction capability of C is ⌊(d_H(C) - 1)/2⌋

transmission rate of C is log_q(|C|)/n

8 / 22
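The two definitions above translate directly into code. This illustrative sketch (function names are my own) computes the Hamming distance and the minimum distance of a block code by brute force over all pairs:

```python
def hamming_distance(u, v):
    """Number of positions where u and v differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def minimum_distance(code):
    """Smallest pairwise Hamming distance over all distinct codewords."""
    return min(hamming_distance(u, v)
               for i, u in enumerate(code) for v in code[i + 1:])

# the repetition code from the previous slide, over F_2 with n = 3
C = [(0, 0, 0), (1, 1, 1)]
d = minimum_distance(C)
print(d, (d - 1) // 2)   # minimum distance 3, corrects 1 error
```

For the repetition code this recovers the values from the example: d_H(C) = 3, so ⌊(3-1)/2⌋ = 1 error can be corrected.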


Typical questions in channel coding theory:

For a given error-correction capability, what is the best transmission rate?
=⇒ packing problem in the given metric space (F_q^n)

How can one efficiently encode, decode, and recover the messages?
=⇒ algebraic structure in the code

What is the trade-off between the two above?

Typical tools to answer these questions:

linear subspaces of F_q^n

polynomials in F_q[x]

projective geometry

9 / 22


Network Coding: Subspace Codes

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work


Multicast Model

All receivers want to receive the same information.

10 / 22

Example (Butterfly Network)

Linearly combining is better than forwarding:

[figure: a source sends bits a and b through a butterfly network to receivers R1 and R2; the two paths share a single bottleneck edge]

Forwarding: the bottleneck edge can carry only one bit per transmission, so R1 receives only a, while R2 receives a and b.

[figure: the bottleneck node instead forwards the linear combination a+b; R1 receives a and a+b, R2 receives b and a+b]

Linearly combining: R1 and R2 can both recover a and b with one operation.

Forwarding: need 2 transmissions to transmit a, b to both receivers

Linearly combining: need 1 transmission to transmit a, b to both receivers

11 / 22
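Over F_2 the linear combination a+b is just XOR, so the butterfly argument can be checked in a few lines (an illustrative sketch, variable names are my own):

```python
# the two receivers' views in the butterfly network, over F_2 (XOR = addition)
a, b = 1, 0

a_plus_b = a ^ b        # the bottleneck node forwards a+b instead of a or b

# R1 sees a and a+b, R2 sees b and a+b; each XORs to recover the missing bit
b_at_R1 = a ^ a_plus_b  # a + (a+b) = b
a_at_R2 = b ^ a_plus_b  # b + (a+b) = a
assert (a_at_R2, b_at_R1) == (a, b)
```

The same cancellation works for every choice of a and b, which is why one coded transmission on the bottleneck replaces two forwarded ones.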

It turns out that linear combinations at the inner nodes are "sufficient" to reach capacity:

Theorem

One can reach the capacity of a single-source multicast network channel with linear combinations at the inner nodes.

When we consider large or time-varying networks, we allow the inner nodes to transmit random linear combinations of their incoming vectors.

Theorem

One can reach the capacity of a single-source multicast network channel with random linear combinations at the inner nodes, provided that the field size is large.

12 / 22
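Random linear network coding can be sketched in miniature. This is a toy model, not from the slides: the "network" hands the receiver random F_2-linear combinations of the k source packets together with their coefficient vectors, and the receiver recovers the packets by Gaussian elimination once the coefficient matrix has full rank (over F_2 a random row can be dependent, so the receiver simply keeps collecting; over a large field full rank is likely immediately, matching the theorem's field-size condition).

```python
import random

rng = random.Random(1)

k, n = 3, 8                                  # 3 source packets of 8 bits each
packets = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]

def xor_rows(r, s):
    return [a ^ b for a, b in zip(r, s)]

received = []                                 # pairs (coefficients, payload)
while True:
    # the network delivers a random linear combination of the source packets
    coeffs = [rng.randint(0, 1) for _ in range(k)]
    payload = [0] * n
    for c, p in zip(coeffs, packets):
        if c:
            payload = xor_rows(payload, p)
    received.append((coeffs, payload))

    # Gaussian elimination over F_2 on the coefficient part,
    # applying the same row operations to the payloads
    rows = [(c[:], p[:]) for c, p in received]
    rank = 0
    for col in range(k):
        piv = next((i for i in range(rank, len(rows)) if rows[i][0][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][0][col]:
                rows[i] = (xor_rows(rows[i][0], rows[rank][0]),
                           xor_rows(rows[i][1], rows[rank][1]))
        rank += 1
    if rank == k:
        break

# after full reduction the first k coefficient rows are unit vectors,
# so the corresponding payloads are exactly the source packets
recovered = [p for _, p in rows[:k]]
assert recovered == packets
```

Note that the receiver here is told the coefficient vectors; the subspace codes introduced next remove even that assumption.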


Problem 1: errors propagate!

[figure: a single error e injected on one edge spreads through the linear combinations to many downstream edges]

Problem 2: the receiver does not know the random operations

Solution: Use a metric space such that

1 the # of errors is reflected in the distance between points, and

2 the points are invariant under linear combinations.

13 / 22


Definition

Grassmann variety: G_q(k, n) := {U ≤ F_q^n | dim(U) = k}

subspace distance: d_S(U, V) := 2k - 2 dim(U ∩ V)

G_q(k, n) equipped with d_S is a metric space.

Definition

A (constant dimension) subspace code is a subset of G_q(k, n). The minimum distance of the code C ⊆ G_q(k, n) is defined as

d_S(C) := min{d_S(U, V) | U, V ∈ C, U ≠ V}.

The error-correction capability of a subspace code C in the network coding setting is ⌊(d_S(C) - 1)/2⌋.

14 / 22
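The subspace distance is computable from ranks alone, since dim(U + V) = dim U + dim V - dim(U ∩ V) gives d_S(U, V) = 2 dim(U + V) - dim U - dim V. An illustrative sketch over F_2 (function names are my own; subspaces are given by spanning rows encoded as bit masks), checked against the example on the next slide:

```python
def rank_f2(rows, n):
    """Rank over F_2 of a list of length-n bit vectors (given as ints)."""
    rank = 0
    for col in range(n - 1, -1, -1):
        piv = next((i for i in range(rank, len(rows)) if rows[i] >> col & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def subspace_distance(U, V, n):
    """d_S(U, V) = dim U + dim V - 2 dim(U ∩ V) = 2 dim(U + V) - dim U - dim V."""
    dU, dV = rank_f2(U[:], n), rank_f2(V[:], n)
    dUV = rank_f2(U + V, n)          # stacking the rows spans U + V
    return 2 * dUV - dU - dV

# the two codewords of the example in G_2(2, 4), as row spaces of F_2^4
U = [0b1000, 0b0100]     # rs [1 0 0 0; 0 1 0 0]
V = [0b1010, 0b0101]     # rs [1 0 1 0; 0 1 0 1]
print(subspace_distance(U, V, 4))   # 4, matching d_S(C) = 4

# one network error on the first codeword, as on the next slide
R = [0b1001, 0b0100]     # rs [1 0 0 1; 0 1 0 0]
print(subspace_distance(R, U, 4), subspace_distance(R, V, 4))   # 2 4
```

Because the row space is unchanged by invertible row operations, this distance is invariant under the random linear combinations the network applies, which is exactly the property the slide asks for.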

Example (in G_2(2, 4))

C = { rs [1 0 0 0; 0 1 0 0], rs [1 0 1 0; 0 1 0 1] },  d_S(C) = 4.

[figure: the basis vectors (1 0 0 0), (0 1 0 0) of the first codeword are sent through the network; the inner nodes forward linear combinations such as (1 1 0 0)]

No errors: we receive a (different) basis of the same vector space

15 / 22

Example (in G_2(2, 4)), continued

[figure: the same basis is sent, but an error +(0 0 0 1) corrupts one edge; the receiver collects vectors such as (1 0 0 1), (0 1 0 0) and (1 1 0 1)]

One error: d_S(received, sent) = 2, d_S(received, other codeword) = 4

15 / 22

Research goals

Find good packings in G_q(k, n) with a given maximal intersection of the subspaces.
=⇒ best transmission rate for a given error-correction capability

Find good packings in G_q(k, n) with algebraic structure.
=⇒ good encoding/decoding algorithms

Typical tools

linearized polynomials in F_q[x]

Singer cycles, difference sets

(partial) spreads

16 / 22


Network Coding: The Schubert Calculus Point of View

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Schubert Calculus

A part of algebraic geometry introduced by Hermann Schubert in the 19th century, dealing with various counting problems in projective geometry. In particular, the intersection numbers of Schubert varieties are studied.

Hermann Schubert (1848-1911) [picture from Wikipedia: Hermann Schubert]

17 / 22

Hilbert's Problem #15

"To establish rigorously and with an exact determination of the limits of their validity those geometrical numbers which Schubert especially has determined on the basis of the so-called principle of special position, or conservation of number, by means of the enumerative calculus developed by him."

David Hilbert (1862-1943) [picture from Wikipedia: David Hilbert]

18 / 22

Classical Schubert calculus is defined over C
=⇒ many problems are solved

Schubert calculus over R has been studied since the 1990s
=⇒ some problems are solved

Schubert calculus over F_q is mostly unknown

Theorem

The balls w.r.t. the subspace metric in the Grassmannian G_q(k, n) are Schubert varieties.

=⇒ Packing and decoding problems in G_q(k, n) are instances of Schubert calculus over finite fields.

19 / 22


Overview of some results [4]

The balls w.r.t. the subspace metric in G_q(k, n) are linear varieties in the Plücker embedding of G_q(k, n).

We can describe a known family of constant dimension codes as linear varieties in the Plücker embedding.

Decoding can be done by solving a system of linear and bilinear equations over F_q.
=⇒ polynomial-time algorithm for certain parameters

[4] Rosenthal and Trautmann, Decoding of Subspace Codes, a Problem of Schubert Calculus over Finite Fields, CreateSpace, 2012.
Rosenthal, Silberstein and Trautmann, On the geometry of balls in the Grassmannian and list decoding of lifted Gabidulin codes, Springer, 2013.
Chumbalov (supervised by Horlemann-Trautmann), Improving the Performance of Subspace Decoding in the Plücker Embedding, Master thesis, EPFL, 2016.

20 / 22

Conclusions and Related Work

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Conclusions

Information theory deals with various aspects of information, such as channel coding, compression and security.

In channel coding we handle noisy data.
=⇒ Difference to statistical noise filtering: we design codes beforehand for better decoding performance.

Different channel models require different metric spaces for code design.

In random network coding this space is the Grassmann variety G_q(k, n) with the subspace distance.

Many coding-theoretic problems in G_q(k, n) are instances of Schubert calculus over finite fields.
=⇒ Use algebraic tools to tackle these problems.

21 / 22


Related Work

Network coding is related to

distributed storage (big data centers, cloud storage)

index coding (communication with side information)

Random linear network coding is useful for

robustness w.r.t. changes and failures in the network

resilience against eavesdropping or data corruption

Subspace codes can be used for secure storage of biometrics

Thank you for your attention!

22 / 22
