
Coding Data for Networks

Anna-Lena Horlemann-Trautmann

Algorithmics Laboratory, EPFL, Switzerland

September 27th, 2016, University of St. Gallen

Information theory (100 years Shannon)

2016 marks the 100th anniversary of the birth of the Father of Information Theory,

Claude Shannon (1916-2001) [picture from www.techzibits.com]

1 / 22

Shannon's pioneering works in information theory:

Channel coding (1948):

Noisy-channel coding theorem / Shannon capacity (maximum information transfer rate for a given channel and noise level)

Compression (1948):

Source coding theorem (limits to possible data compression)

Cryptography (1949):

One-time pad is the only theoretically unbreakable cipher

2 / 22
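Shannon's one-time pad result can be made concrete with a few lines of code. This is a minimal sketch, not part of the original slides: encryption and decryption are the same byte-wise XOR, and the security claim rests on the key being uniformly random, as long as the message, and never reused.

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR each message byte with one key byte; same operation decrypts."""
    assert len(key) == len(plaintext), "the pad must be as long as the message"
    return bytes(m ^ k for m, k in zip(plaintext, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # fresh random pad, used once

ciphertext = otp_encrypt(message, key)
# decryption is the same XOR, since (m ^ k) ^ k = m
assert otp_encrypt(ciphertext, key) == message
```

Without the key, every plaintext of the same length is equally likely given the ciphertext, which is exactly Shannon's notion of perfect secrecy.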


Shannon provided answers to questions of the type

“What is possible in theory?”

Subsequent research:

how to algorithmically achieve those optimal scenarios

other types of channels

lossy compression

computationally secure cryptography

3 / 22

Channel Coding

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Channel Coding

... deals with noisy transmission of information

over space (communication)

over time (storage)

To deal with the noise,

the data is encoded with added redundancy,

the receiver can "filter out" the noise (decoding)

and then recover the sent data.

4 / 22

Why is the channel noisy?

badly insulated cable

interference of electronic devices

fading in wireless signals

scratches on CD

charge leakage in flash drive

power outage in data center

and many more...

5 / 22

Different channel models:

binary symmetric channel (symbols from F_2, additive errors)

[diagram: each transmitted bit is received correctly with probability 1-p and flipped with probability p]

binary erasure channel (symbols from F_2, erasures)

fading channels (symbols from C, multiplicative fading and additive errors)

network channels (symbols from F_{q^m}, network operations and additive errors)

and many more...

6 / 22
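The binary symmetric channel above is easy to simulate. This is an illustrative sketch (the function name `bsc` and the seed are my own choices, not from the slides): each bit is flipped independently with crossover probability p.

```python
import random

def bsc(bits, p, rng=random.Random(42)):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

sent = [0, 1, 1, 0, 1, 0, 0, 1] * 1000
received = bsc(sent, p=0.1)
flips = sum(s != r for s, r in zip(sent, received))
print(f"empirical flip rate: {flips / len(sent):.3f}")  # close to p = 0.1
```

Over many transmitted bits the empirical flip rate concentrates around p, which is what the channel model asserts.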


Example (Binary symmetric channel, p < 1/2)

We encode 0 as 000 and 1 as 111 (repetition code). If one error happens, we can uniquely decide which codeword was sent. E.g., if we receive r = 110 we decode to 111, since

P(r = 110 | c = 111) = p(1-p)^2 > P(r = 110 | c = 000) = p^2(1-p).

But if two errors happen, we will decode to the wrong codeword. Hence the error-correction capability of the code is 1 error.

With this code we need to send three bits to transmit one bit of information. Thus the transmission rate is 1/3.

7 / 22
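The maximum-likelihood decoder for this repetition code (with p < 1/2) is just a majority vote, which picks the closest codeword. A small sketch checking the claims in the example above:

```python
def encode(bit):
    """Repetition code: 0 -> 000, 1 -> 111."""
    return (bit, bit, bit)

def decode(r):
    """Majority vote = nearest codeword in Hamming distance (for p < 1/2)."""
    return 1 if sum(r) >= 2 else 0

# exhaustive check: every single-bit error is corrected
for bit in (0, 1):
    c = encode(bit)
    for i in range(3):
        r = list(c)
        r[i] ^= 1                    # flip one position
        assert decode(r) == bit
# a double error fools the decoder, as the slide notes:
assert decode([1, 1, 0]) == 1        # sent 000 with two errors -> decoded as 1
```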


General setup:

codewords are vectors of length n over a finite field F_q

received word = codeword + error vector (r = c + e ∈ F_q^n)

most likely sent codeword = closest codeword w.r.t. Hamming distance:

d_H((u_1, ..., u_n), (v_1, ..., v_n)) := |{i | u_i ≠ v_i}|.

Definition

A block code is a subset C ⊆ F_q^n. The minimum (Hamming) distance of the code is defined as

d_H(C) := min{d_H(u, v) | u, v ∈ C, u ≠ v}.

error-correction capability of C is ⌊(d_H(C) - 1)/2⌋

transmission rate of C is log_q(|C|)/n

8 / 22
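The two definitions above translate directly into code. This illustrative sketch (function names are my own) computes the Hamming distance and the minimum distance of a block code by brute force over all pairs:

```python
def hamming_distance(u, v):
    """Number of positions where u and v differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def minimum_distance(code):
    """Smallest pairwise Hamming distance over all distinct codewords."""
    return min(hamming_distance(u, v)
               for i, u in enumerate(code) for v in code[i + 1:])

# the repetition code from the previous slide, over F_2 with n = 3
C = [(0, 0, 0), (1, 1, 1)]
d = minimum_distance(C)
print(d, (d - 1) // 2)   # minimum distance 3, corrects 1 error
```

For the repetition code this recovers the values from the example: d_H(C) = 3, so ⌊(3-1)/2⌋ = 1 error can be corrected.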


Typical questions in channel coding theory:

For a given error-correction capability, what is the best transmission rate?
=⇒ packing problem in the given metric space (F_q^n)

How can one efficiently encode, decode, and recover the messages?
=⇒ algebraic structure in the code

What is the trade-off between the two above?

Typical tools to answer these questions:

linear subspaces of F_q^n

polynomials in F_q[x]

projective geometry

9 / 22


Network Coding: Subspace Codes

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work


Multicast Model

All receivers want to receive the same information.

10 / 22

Example (Butterfly Network)

Linearly combining is better than forwarding:

[figure: a source sends bits a and b through a butterfly network to receivers R1 and R2; the two paths share a single bottleneck edge]

Forwarding: the bottleneck edge can carry only one bit per transmission, so R1 receives only a, while R2 receives a and b.

[figure: the bottleneck node instead forwards the linear combination a+b; R1 receives a and a+b, R2 receives b and a+b]

Linearly combining: R1 and R2 can both recover a and b with one operation.

Forwarding: need 2 transmissions to transmit a, b to both receivers

Linearly combining: need 1 transmission to transmit a, b to both receivers

11 / 22
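Over F_2 the linear combination a+b is just XOR, so the butterfly argument can be checked in a few lines (an illustrative sketch, variable names are my own):

```python
# the two receivers' views in the butterfly network, over F_2 (XOR = addition)
a, b = 1, 0

a_plus_b = a ^ b        # the bottleneck node forwards a+b instead of a or b

# R1 sees a and a+b, R2 sees b and a+b; each XORs to recover the missing bit
b_at_R1 = a ^ a_plus_b  # a + (a+b) = b
a_at_R2 = b ^ a_plus_b  # b + (a+b) = a
assert (a_at_R2, b_at_R1) == (a, b)
```

The same cancellation works for every choice of a and b, which is why one coded transmission on the bottleneck replaces two forwarded ones.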

It turns out that linear combinations at the inner nodes are "sufficient" to reach capacity:

Theorem

One can reach the capacity of a single-source multicast network channel with linear combinations at the inner nodes.

When we consider large or time-varying networks, we allow the inner nodes to transmit random linear combinations of their incoming vectors.

Theorem

One can reach the capacity of a single-source multicast network channel with random linear combinations at the inner nodes, provided that the field size is large.

12 / 22
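Random linear network coding can be sketched in miniature. This is a toy model, not from the slides: the "network" hands the receiver random F_2-linear combinations of the k source packets together with their coefficient vectors, and the receiver recovers the packets by Gaussian elimination once the coefficient matrix has full rank (over F_2 a random row can be dependent, so the receiver simply keeps collecting; over a large field full rank is likely immediately, matching the theorem's field-size condition).

```python
import random

rng = random.Random(1)

k, n = 3, 8                                  # 3 source packets of 8 bits each
packets = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]

def xor_rows(r, s):
    return [a ^ b for a, b in zip(r, s)]

received = []                                 # pairs (coefficients, payload)
while True:
    # the network delivers a random linear combination of the source packets
    coeffs = [rng.randint(0, 1) for _ in range(k)]
    payload = [0] * n
    for c, p in zip(coeffs, packets):
        if c:
            payload = xor_rows(payload, p)
    received.append((coeffs, payload))

    # Gaussian elimination over F_2 on the coefficient part,
    # applying the same row operations to the payloads
    rows = [(c[:], p[:]) for c, p in received]
    rank = 0
    for col in range(k):
        piv = next((i for i in range(rank, len(rows)) if rows[i][0][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][0][col]:
                rows[i] = (xor_rows(rows[i][0], rows[rank][0]),
                           xor_rows(rows[i][1], rows[rank][1]))
        rank += 1
    if rank == k:
        break

# after full reduction the first k coefficient rows are unit vectors,
# so the corresponding payloads are exactly the source packets
recovered = [p for _, p in rows[:k]]
assert recovered == packets
```

Note that the receiver here is told the coefficient vectors; the subspace codes introduced next remove even that assumption.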


Problem 1: errors propagate!

[figure: a single error e injected on one edge spreads through the linear combinations to many downstream edges]

Problem 2: the receiver does not know the random operations

Solution: Use a metric space such that

1 the # of errors is reflected in the distance between points, and

2 the points are invariant under linear combinations.

13 / 22


Definition

Grassmann variety: G_q(k, n) := {U ≤ F_q^n | dim(U) = k}

subspace distance: d_S(U, V) := 2k - 2 dim(U ∩ V)

G_q(k, n) equipped with d_S is a metric space.

Definition

A (constant dimension) subspace code is a subset of G_q(k, n). The minimum distance of the code C ⊆ G_q(k, n) is defined as

d_S(C) := min{d_S(U, V) | U, V ∈ C, U ≠ V}.

The error-correction capability of a subspace code C in the network coding setting is ⌊(d_S(C) - 1)/2⌋.

14 / 22
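The subspace distance is computable from ranks alone, since dim(U + V) = dim U + dim V - dim(U ∩ V) gives d_S(U, V) = 2 dim(U + V) - dim U - dim V. An illustrative sketch over F_2 (function names are my own; subspaces are given by spanning rows encoded as bit masks), checked against the example on the next slide:

```python
def rank_f2(rows, n):
    """Rank over F_2 of a list of length-n bit vectors (given as ints)."""
    rank = 0
    for col in range(n - 1, -1, -1):
        piv = next((i for i in range(rank, len(rows)) if rows[i] >> col & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def subspace_distance(U, V, n):
    """d_S(U, V) = dim U + dim V - 2 dim(U ∩ V) = 2 dim(U + V) - dim U - dim V."""
    dU, dV = rank_f2(U[:], n), rank_f2(V[:], n)
    dUV = rank_f2(U + V, n)          # stacking the rows spans U + V
    return 2 * dUV - dU - dV

# the two codewords of the example in G_2(2, 4), as row spaces of F_2^4
U = [0b1000, 0b0100]     # rs [1 0 0 0; 0 1 0 0]
V = [0b1010, 0b0101]     # rs [1 0 1 0; 0 1 0 1]
print(subspace_distance(U, V, 4))   # 4, matching d_S(C) = 4

# one network error on the first codeword, as on the next slide
R = [0b1001, 0b0100]     # rs [1 0 0 1; 0 1 0 0]
print(subspace_distance(R, U, 4), subspace_distance(R, V, 4))   # 2 4
```

Because the row space is unchanged by invertible row operations, this distance is invariant under the random linear combinations the network applies, which is exactly the property the slide asks for.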

Example (in G_2(2, 4))

C = { rs [1 0 0 0; 0 1 0 0], rs [1 0 1 0; 0 1 0 1] },  d_S(C) = 4.

[figure: the basis vectors (1 0 0 0), (0 1 0 0) of the first codeword are sent through the network; the inner nodes forward linear combinations such as (1 1 0 0)]

No errors: we receive a (different) basis of the same vector space

15 / 22

Example (in G_2(2, 4)), continued

[figure: the same basis is sent, but an error +(0 0 0 1) corrupts one edge; the receiver collects vectors such as (1 0 0 1), (0 1 0 0) and (1 1 0 1)]

One error: d_S(received, sent) = 2, d_S(received, other codeword) = 4

15 / 22

Research goals

Find good packings in G_q(k, n) with a given maximal intersection of the subspaces.
=⇒ best transmission rate for a given error-correction capability

Find good packings in G_q(k, n) with algebraic structure.
=⇒ good encoding/decoding algorithms

Typical tools

linearized polynomials in F_q[x]

Singer cycles, difference sets

(partial) spreads

16 / 22


Network Coding: The Schubert Calculus Point of View

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Schubert Calculus

A part of algebraic geometry introduced by Hermann Schubert in the 19th century, dealing with various counting problems in projective geometry. In particular, the intersection numbers of Schubert varieties are studied.

Hermann Schubert (1848-1911) [picture from Wikipedia: Hermann Schubert]

17 / 22

Hilbert's Problem #15

"To establish rigorously and with an exact determination of the limits of their validity those geometrical numbers which Schubert especially has determined on the basis of the so-called principle of special position, or conservation of number, by means of the enumerative calculus developed by him."

David Hilbert (1862-1943) [picture from Wikipedia: David Hilbert]

18 / 22

Classical Schubert calculus is defined over C
=⇒ many problems are solved

Schubert calculus over R has been studied since the 1990s
=⇒ some problems are solved

Schubert calculus over F_q is mostly unknown

Theorem

The balls w.r.t. the subspace metric in the Grassmannian G_q(k, n) are Schubert varieties.

=⇒ Packing and decoding problems in G_q(k, n) are instances of Schubert calculus over finite fields.

19 / 22


Overview of some results [4]

The balls w.r.t. the subspace metric in G_q(k, n) are linear varieties in the Plücker embedding of G_q(k, n).

We can describe a known family of constant dimension codes as linear varieties in the Plücker embedding.

Decoding can be done by solving a system of linear and bilinear equations over F_q.
=⇒ polynomial-time algorithm for certain parameters

[4] Rosenthal and Trautmann, Decoding of Subspace Codes, a Problem of Schubert Calculus over Finite Fields, CreateSpace, 2012.
Rosenthal, Silberstein and Trautmann, On the geometry of balls in the Grassmannian and list decoding of lifted Gabidulin codes, Springer, 2013.
Chumbalov (supervised by Horlemann-Trautmann), Improving the Performance of Subspace Decoding in the Plücker Embedding, Master thesis, EPFL, 2016.

20 / 22

Conclusions and Related Work

1 Information theory (100 years Shannon)

2 Channel Coding

3 Network Coding
  Subspace Codes
  The Schubert Calculus Point of View

4 Conclusions and Related Work

Conclusions

Information theory deals with various aspects of information, such as channel coding, compression and security.

In channel coding we handle noisy data.
=⇒ Difference to statistical noise filtering: we design codes beforehand for better decoding performance.

Different channel models require different metric spaces for code design.

In random network coding this space is the Grassmann variety G_q(k, n) with the subspace distance.

Many coding-theoretic problems in G_q(k, n) are instances of Schubert calculus over finite fields.
=⇒ Use algebraic tools to tackle these problems.

21 / 22


Related Work

Network coding is related to

distributed storage (big data centers, cloud storage)

index coding (communication with side information)

Random linear network coding is useful for

robustness w.r.t. changes and failures in the network

resilience against eavesdropping or data corruption

Subspace codes can be used for secure storage of biometrics

Thank you for your attention!

22 / 22
