Theory and Practice of Codes with Locality


Theory and Practice of Codes with Locality: Introduction

July 10, 2016


Information Era

• We live in the Information Era

• Big Data players: Facebook, Google, MSFT, Amazon, Alibaba, Dropbox, etc.

• Node failures are the norm

(Figure: a cluster of machines running Hadoop at Yahoo!)


State-of-the-Art Coding technique

RAID: Redundant Array of Independent Disks

RAID 1 – Replication (typically 3x)

• Simple implementation!
• High availability of the information
• Can tolerate any 2 disk failures
• Widely used in Hadoop and many other systems
• Storage overhead of 200%!

RAID 6

• MDS codes with two parities

[n, k] MDS codes

• Can tolerate any n − k disk failures
• FB uses a (14, 10) RS code
• Poor handling of single disk failures (the Repair Problem)
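To put these options side by side, here is a small illustrative Python sketch (the helper functions are ad hoc, not tied to any particular system) comparing the storage overhead and worst-case erasure tolerance of 3x replication against an [n, k] MDS code such as the (14, 10) RS code mentioned above.

```python
# Storage overhead and erasure tolerance: 3x replication vs. an [n, k] MDS code.
# Illustrative sketch only; the numbers follow the definitions on this slide.

def replication(copies: int):
    overhead = (copies - 1) * 100      # extra storage as % of the data size
    tolerance = copies - 1             # survives any (copies - 1) disk failures
    return overhead, tolerance

def mds(n: int, k: int):
    overhead = (n - k) / k * 100       # extra storage as % of the data size
    tolerance = n - k                  # survives any (n - k) disk failures
    return overhead, tolerance

print("3x replication: %d%% overhead, tolerates any %d failures" % replication(3))
print("(14,10) MDS/RS: %.0f%% overhead, tolerates any %d failures" % mds(14, 10))
# 3x replication: 200% overhead, tolerates any 2 failures
# (14,10) MDS/RS: 40% overhead, tolerates any 4 failures
```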


Limitations of Reed-Solomon codes

Example: [14, 10] RS code

• A disk is lost; a repair job starts to rebuild it

• Transmit information from 10 disks to recover the one lost disk

• Generates 10x more traffic than replication for the recovery of one disk

• If a large portion of the data is RS-coded ⇒ saturation of the network

• Goal: Construct efficient codes with a “good” repair process

(Figure: the (14, 10) RS code running at FB, with data blocks 1–10 and parity blocks P1–P4.)
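To see why repairing a single symbol is expensive, the sketch below (an illustrative toy, not Facebook's implementation) encodes k = 10 symbols with a Reed-Solomon code over a small prime field and repairs one erased symbol by Lagrange interpolation, which requires reading k = 10 surviving symbols, versus the single read a replica would need.

```python
# Repairing one lost symbol of an [n, k] Reed-Solomon code (illustrative sketch).
# Work over the prime field GF(p); p = 17 gives more than n = 14 evaluation points.

p, n, k = 17, 14, 10

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients, lowest degree first) at x mod p."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % p
    return y

def interpolate_at(points, x0):
    """Lagrange-interpolate the degree < k polynomial through `points`, evaluated at x0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x0 - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p   # inverse via Fermat
    return total

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]              # k = 10 message symbols
xs = list(range(1, n + 1))                          # n = 14 distinct evaluation points
codeword = [poly_eval(data, x) for x in xs]         # one symbol per disk

lost = 6                                            # say disk 7 (index 6) fails
helpers = [(xs[i], codeword[i]) for i in range(n) if i != lost][:k]   # read k = 10 symbols
assert interpolate_at(helpers, xs[lost]) == codeword[lost]
print("repaired one symbol by reading", len(helpers), "surviving symbols")
```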

How “good” is the repair process?

• Total bandwidth during the repair
  • Regenerating Codes - seminal paper by Dimakis, Godfrey, Wu, Wainwright and Ramchandran
  • Model allows connecting “many” nodes

• Total number of disks participating in the repair
  • The straggler problem
  • Locally Recoverable (LRC) Codes


Codes with locality: Plan

• LRC codes - Basics (Tamo)

• LRC codes - Constructions (Barg)

• LRC codes in practice (Barg)

• Break

• LRC codes - Bounds (Barg)

• Maximally Recoverable Codes (Tamo)

• LRC codes on graphs (Tamo)


Locally Recoverable Codes - Definition

(n, k, r) LRC Code

• Takes k blocks (symbols) → produces n blocks

• An erasure has occurred

• Any symbol i has a recovery set Ri of r other symbols, r ≪ k

• Clearly 1 ≤ r ≤ k

(Figure: a codeword of n blocks labeled 1, ..., k − 1, k, k + 1, k + 2, ..., n; braces mark the k information blocks and a recovery set of size r.)


Early references on LRC codes

• J. Han and L. Lastras-Montano, ISIT 2007;

• C. Huang, M. Chen, and J. Li, Symp. Networks App. 2007;

• F. Oggier and A. Datta 2013 - Survey on codes for distributed storage systems;

• P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, IEEE Trans. Inf. Theory, Nov. 2012.


Parameters of LRC codes

Let C be an (n, k, r) LRC code

• Assume r|k and (r + 1)|n

• What are the best possible rate and minimum distance?

• The rate is bounded by k/n ≤ r/(r + 1)

Proof:

• There exist at most nr/(r + 1) coordinates that determine the codeword completely

• This follows by iterating the following steps:

1. Cost: expose the values of the coordinates in a recovery set Ri, |Ri| ≤ r

2. Free: the value of the i-th coordinate

3. After exposing at most nr/(r + 1) coordinates, the entire codeword is determined

• Hence k ≤ nr/(r + 1), i.e., k/n ≤ r/(r + 1)


• The bound is tight (even over F2): partition the k information bits into k/r sets of size r and add a parity-check bit to each set (see the sketch below)


• The minimum distance is bounded by

d ≤ n − k − ⌈k/r⌉ + 2

[Gopalan, Huang, Simitci, Yekhanin 12] [Papailiopoulos, Dimakis 12]

Remarks:

• Smaller locality ⇒ lower failure resilience

• This generalizes the Singleton bound (recovered at r = k)

• An optimal (n, k, r) LRC code achieves the bound with equality

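As promised, here is a minimal Python sketch of the binary code that meets the rate bound: k bits split into k/r groups of size r, with one parity bit per group. The parameters k and r below are arbitrary illustrative choices; the sketch checks the rate r/(r + 1), repairs every single erasure locally, and confirms that its minimum distance 2 also meets the bound d ≤ n − k − ⌈k/r⌉ + 2.

```python
# Rate-optimal binary LRC from the slide: k/r groups of r data bits, one parity per group.
# Illustrative sketch only; k and r are arbitrary choices with r | k.
from itertools import product

def encode(bits, r):
    """Append one parity bit (XOR) to every group of r data bits."""
    assert len(bits) % r == 0
    out = []
    for g in range(0, len(bits), r):
        group = bits[g:g + r]
        out.extend(group + [sum(group) % 2])
    return out

def repair(codeword, pos, r):
    """Recover the erased symbol at `pos` from the r other symbols of its group."""
    g = (pos // (r + 1)) * (r + 1)
    group = [codeword[i] for i in range(g, g + r + 1) if i != pos]
    return sum(group) % 2                   # each group symbol is the XOR of the others

k, r = 4, 2
n = k + k // r
print("rate =", k / n, "  bound r/(r+1) =", r / (r + 1))

# every single erasure is locally repairable
for bits in product([0, 1], repeat=k):
    cw = encode(list(bits), r)
    assert all(repair(cw, i, r) == cw[i] for i in range(n))

# minimum distance matches d <= n - k - ceil(k/r) + 2 (= 2 here)
weights = [sum(encode(list(b), r)) for b in product([0, 1], repeat=k) if any(b)]
print("min distance =", min(weights), "  bound =", n - k - (k + r - 1) // r + 2)
```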

Constructing optimal LRC codes

• A carefully constructed random generator matrix gives an optimal LRC code

1. Not explicit

2. Field size is superpolynomial in the length (is it truly necessary?)

• Optimal ((r + 1)⌈k/r⌉, k, r) LRC codes [Prakash, Kamath, Lalitha, and Kumar 12]

• Explicit constructions [Rawat, Koyluoglu, Silberstein, Vishwanath 14; Gopalan, Huang, Jenkins, Yekhanin 14; Tamo, Papailiopoulos, Dimakis 14]

1. Any n, k, r

2. Field size is superpolynomial


Optimal LRC codes - Easy cases

• r = k

1. d ≤ n − k + 1

2. An (n, k) RS code is an optimal (n, k, k) LRC code

3. |F| = O(n)

• r = 1

1. d ≤ 2(n/2 − k + 1) (checked against the general bound below)

2. Duplicating an (n/2, k) RS code gives an optimal (n, k, 1) LRC code

3. |F| = O(n)

• Q: What happens for 1 < r < k?

• Q: Can the optimal codes for r = 1 and r = k be generalized to arbitrary r?
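For the r = 1 case, the distance bound from the previous slide specializes as follows (a one-line check; the duplicated code attains it because every nonzero codeword of the (n/2, k) RS code has weight at least n/2 − k + 1 and each symbol appears twice):

```latex
% r = 1: the bound d <= n - k - ceil(k/r) + 2 becomes
\[
  d \;\le\; n - k - \left\lceil \frac{k}{1} \right\rceil + 2
    \;=\; n - 2k + 2
    \;=\; 2\left(\frac{n}{2} - k + 1\right),
\]
% which is exactly twice the minimum distance of the underlying (n/2, k) RS code.
```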


Intuition behind the LRC construction

(a0, a1, a2, a3) → f(x) = a0 + a1 x + a2 x^2 + a3 x^3 → (f(x1), f(x2), ..., f(xn))

Only two points suffice to recover the lost point. (A cubic is only determined by four points in general; the construction on the next slides arranges the evaluation points in small groups on which f restricts to a polynomial of degree at most r − 1, a line when r = 2, so two points within the group suffice.)

(Figure: the evaluations f(x1), f(x2), ..., f(xn) plotted above the points x1, x2, ..., xn.)


Optimal (n, k, r) LRC code construction

Ingredients:

1. R1, ..., R_{n/(r+1)} are disjoint subsets of the field F, s.t. |Ri| = r + 1

2. g(x) ∈ F[x] is a polynomial s.t.

2.1 deg(g(x)) = r + 1

2.2 g(x) is constant on each subset Ri: g(α) = g(β) for α, β ∈ Ri

Encoding: Given k information symbols a_{i,j}, i = 0, ..., r − 1, j = 0, ..., k/r − 1

1. Define the polynomial

f(x) = ∑_{i=0}^{r−1} x^i ∑_{j=0}^{k/r−1} a_{i,j} g(x)^j

2. Store the length-n vector (f(α) : α ∈ ∪i Ri)

Theorem: This is an optimal (n, k, r) LRC code over F

I. Tamo and A. Barg, "A Family of Optimal Locally Recoverable Codes," IEEE Trans. Inf. Theory, 2014
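To make the recipe concrete, here is a minimal Python sketch of the construction for one small toy instance of my own choosing (illustrative parameters, not prescribed by the slide): F = GF(13), r = 2, k = 4, g(x) = x^3, with the recovery sets Ri taken as the multiplicative cosets of {1, 3, 9}, on which x^3 is constant.

```python
# Toy instance of the LRC construction on this slide (illustrative parameters only):
# F = GF(13), r = 2, k = 4, g(x) = x^3.
p, r, k = 13, 2, 4

# Recovery sets: multiplicative cosets of {1, 3, 9} (the cube roots of unity mod 13).
# g(x) = x^3 is constant on each coset, and |Ri| = r + 1 = 3.
recovery_sets = [[1, 3, 9], [2, 6, 5], [4, 12, 10], [7, 8, 11]]
n = sum(len(R) for R in recovery_sets)          # n = 12

def g(x):
    return pow(x, r + 1, p)                     # deg g = r + 1, constant on each Ri

def f(a, x):
    """Encoding polynomial f(x) = sum_i x^i sum_j a[i][j] g(x)^j (mod p)."""
    return sum(pow(x, i, p) * a[i][j] * pow(g(x), j, p)
               for i in range(r) for j in range(k // r)) % p

# k = 4 information symbols arranged as a[i][j], i = 0..r-1, j = 0..k/r-1
a = [[5, 7],
     [1, 11]]

codeword = [f(a, alpha) for R in recovery_sets for alpha in R]
print("codeword of length", len(codeword), ":", codeword)

# sanity check: g really is constant on every recovery set
assert all(len({g(alpha) for alpha in R}) == 1 for R in recovery_sets)
```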


Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1

• f(x) = ∑_{i=0}^{r−1} x^i ∑_{j=0}^{k/r−1} a_{i,j} g(x)^j = ∑_{i=0}^{r−1} x^i fi(x)

Locality: Recover f(α) = ? for α ∈ R1

1. Define fi(x) = ∑_{j=0}^{k/r−1} a_{i,j} g(x)^j

2. fi(x) is constant on the sets Rj

3. Define δ(x) = ∑_{i=0}^{r−1} x^i fi(R1)

Claim: f(β) = δ(β) for all β ∈ R1 (in particular f(α) = δ(α))

Proof: f(β) = ∑_{i=0}^{r−1} β^i fi(β) = ∑_{i=0}^{r−1} β^i fi(R1) = δ(β)

1. deg(δ(x)) ≤ r − 1

2. r points on δ(x) suffice to recover δ(x)

3. Read the r values {δ(β) = f(β) : β ∈ R1 \ {α}}
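Continuing the toy instance from the previous sketch (F = GF(13), r = 2, k = 4, g(x) = x^3, R1 = {1, 3, 9}; all parameters are my illustrative choices), the following self-contained Python sketch erases one evaluation f(α) with α ∈ R1 and recovers it by interpolating the degree ≤ r − 1 polynomial δ through the r surviving points of R1.

```python
# Local repair for the toy (12, 4, 2) LRC from the previous sketch: recover f(alpha),
# alpha in R1, from the r = 2 other evaluations in R1 via the degree-(r-1) polynomial
# delta(x). Illustrative only.
p, r, k = 13, 2, 4
R1 = [1, 3, 9]                                   # one recovery set (|R1| = r + 1)

def f(x):                                        # same toy encoding polynomial as before,
    a = [[5, 7], [1, 11]]                        # with g(x) = x^3
    return sum(pow(x, i, p) * a[i][j] * pow(x, 3 * j, p)
               for i in range(r) for j in range(k // r)) % p

def delta_at(points, x0):
    """Lagrange interpolation of the degree <= r-1 polynomial through `points`, at x0."""
    total = 0
    for xi, yi in points:
        term = yi
        for xj, _ in points:
            if xj != xi:
                term = term * (x0 - xj) * pow(xi - xj, p - 2, p) % p
        total = (total + term) % p
    return total

alpha = 3                                        # the erased coordinate, alpha in R1
survivors = [(b, f(b)) for b in R1 if b != alpha]    # read only r = 2 local symbols
assert delta_at(survivors, alpha) == f(alpha)
print("recovered f(%d) = %d from %d local symbols" % (alpha, f(alpha), len(survivors)))
```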

Theory and Practice of Codes with Locality July 10, 2016 14 / 22

Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1

• f (x) =∑r−1

i=0 xi∑ kr−1j=0 ai,jg(x)j

=∑r−1

i=0 xifi(x)

Locality: Recover f (α) =? for α ∈ R1

1. Define fi(x) =∑ k

r−1j=0 ai,jg(x)j

2. fi(x) is constant on the sets Rj

3. Define δ(x) =∑r−1

i=0 xifi(R1)

Claim:

f (β) = δ(β) for all β ∈ R1 (In particular f (α) = δ(α))

Proof:

f (β) =∑r−1

i=0 βifi(β) =

∑r−1i=0 β

ifi(R1) = δ(β)

1. deg(δ(x)) ≤ r − 1

2. r points on δ(x) will suffice to recover δ(x)

3. Read the r values {δ(β) = f (β) : β ∈ R1\α}

Theory and Practice of Codes with Locality July 10, 2016 14 / 22

Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1

• f (x) =∑r−1

i=0 xi∑ kr−1j=0 ai,jg(x)j

=∑r−1

i=0 xifi(x)

Locality: Recover f (α) =? for α ∈ R1

1. Define fi(x) =∑ k

r−1j=0 ai,jg(x)j

2. fi(x) is constant on the sets Rj

3. Define δ(x) =∑r−1

i=0 xifi(R1)

Claim:

f (β) = δ(β) for all β ∈ R1 (In particular f (α) = δ(α))

Proof:

f (β) =∑r−1

i=0 βifi(β) =

∑r−1i=0 β

ifi(R1) = δ(β)

1. deg(δ(x)) ≤ r − 1

2. r points on δ(x) will suffice to recover δ(x)

3. Read the r values {δ(β) = f (β) : β ∈ R1\α}

Theory and Practice of Codes with Locality July 10, 2016 14 / 22

Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1

• f (x) =∑r−1

i=0 xi∑ kr−1j=0 ai,jg(x)j

=∑r−1

i=0 xifi(x)

Locality: Recover f (α) =? for α ∈ R1

1. Define fi(x) =∑ k

r−1j=0 ai,jg(x)j

2. fi(x) is constant on the sets Rj

3. Define δ(x) =∑r−1

i=0 xifi(R1)

Claim: f (β) = δ(β) for all β ∈ R1 (In particular f (α) = δ(α))

Proof:

f (β) =∑r−1

i=0 βifi(β) =

∑r−1i=0 β

ifi(R1) = δ(β)

1. deg(δ(x)) ≤ r − 1

2. r points on δ(x) will suffice to recover δ(x)

3. Read the r values {δ(β) = f (β) : β ∈ R1\α}

Theory and Practice of Codes with Locality July 10, 2016 14 / 22

Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1

• f (x) =∑r−1

i=0 xi∑ kr−1j=0 ai,jg(x)j

=∑r−1

i=0 xifi(x)

Locality: Recover f (α) =? for α ∈ R1

1. Define fi(x) =∑ k

r−1j=0 ai,jg(x)j

2. fi(x) is constant on the sets Rj

3. Define δ(x) =∑r−1

i=0 xifi(R1)

Claim: f (β) = δ(β) for all β ∈ R1 (In particular f (α) = δ(α))

Proof:

f (β) =∑r−1

i=0 βifi(β) =

∑r−1i=0 β

ifi(R1) = δ(β)

1. deg(δ(x)) ≤ r − 1

2. r points on δ(x) will suffice to recover δ(x)

3. Read the r values {δ(β) = f (β) : β ∈ R1\α}

Theory and Practice of Codes with Locality July 10, 2016 14 / 22


Optimal (n, k, r) LRC code construction - Cont’d

• g(x) is constant on the sets Ri, |Ri| = r + 1
• f(x) = ∑_{i=0}^{r−1} x^i ∑_{j=0}^{k/r−1} a_{i,j} g(x)^j

Minimum Distance:

1. Minimum distance = minimum weight
2. deg(f(x)) ≤ r − 1 + (r + 1)(k/r − 1) = k + k/r − 2
3. ⟹ d ≥ n − (k + k/r − 2)
4. Equality follows since the code is an (n, k, r) LRC code

Theory and Practice of Codes with Locality July 10, 2016 15 / 22



Example: (9, 4, 2) LRC over F13

• R1 = {1, 3, 9}, R2 = {2, 6, 5}, R3 = {4, 12, 10}
• The polynomial g(x) = x^3 is constant on each set Ri
• (a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}) = (1, 1, 1, 1) → f(x) = ∑_{i=0}^{1} x^i ∑_{j=0}^{1} a_{i,j}(x^3)^j = 1 + x + x^3 + x^4
• Store the evaluation of f(x) at ∪_i Ri:
  (f(1), f(3), f(9), f(2), f(6), f(5), f(4), f(12), f(10)) = (4, 8, 7, 1, 11, 2, 0, 0, 0)
• Local recovery of f(1):
  • deg δ(x) ≤ 1
  • Read f(3) = δ(3) = 8 and f(9) = δ(9) = 7
  • δ(x) = 2x + 2, so f(1) = δ(1) = 4

Theory and Practice of Codes with Locality July 10, 2016 16 / 22
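To make the repair step concrete, here is a minimal Python sketch (mine, not part of the slides; it only re-derives the numbers above): it encodes the message (1, 1, 1, 1) into f(x) = 1 + x + x^3 + x^4, stores the nine evaluations, and recovers the erased value f(1) from the other two symbols of R1 by fitting the degree-1 polynomial δ(x).

```python
# Sketch of the (9, 4, 2) LRC over F13 from the example above.
p = 13
R = [[1, 3, 9], [2, 6, 5], [4, 12, 10]]           # recovery sets
a = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}  # message a_{i,j}

def f(x):
    # f(x) = sum_i x^i * sum_j a_{i,j} * (x^3)^j  =  1 + x + x^3 + x^4
    return sum(a[(i, j)] * pow(x, i + 3 * j, p) for i in range(2) for j in range(2)) % p

codeword = {x: f(x) for Ri in R for x in Ri}
print(codeword)          # the values (4, 8, 7, 1, 11, 2, 0, 0, 0), as on the slide

# Local recovery of f(1): read f(3), f(9), interpolate the line delta(x)
# through (3, f(3)) and (9, f(9)), and evaluate it at x = 1.
x1, x2 = 3, 9
y1, y2 = codeword[x1], codeword[x2]
slope = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p      # modular inverse needs Python >= 3.8
delta = lambda x: (y1 + slope * (x - x1)) % p
print(delta(1), f(1))    # 4 4 -> the erased symbol is recovered
```

The same two reads repair any symbol of R1; symbols of R2 and R3 are repaired within their own sets.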

Optimal (n, k, r) LRC code construction

Ingredients:

1. R1, . . . , R_{n/(r+1)} are disjoint subsets of the field F, s.t. |Ri| = r + 1
2. A polynomial g(x) of degree r + 1
3. g(x) is constant on each subset Ri: g(α) = g(β) for α, β ∈ Ri

Claim: Let H be a subgroup of F* or F+; then the annihilator polynomial of H

g(x) = ∏_{h∈H} (x − h)

is constant on each coset of H

Theory and Practice of Codes with Locality July 10, 2016 17 / 22
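A quick numerical illustration of the claim (a sketch of mine, not from the slides): take the order-3 subgroup H = {1, 3, 9} of F13*, form its annihilator g(x) = (x − 1)(x − 3)(x − 9) = x^3 − 1, and check that g takes a single value on each coset of H.

```python
p = 13
H = [1, 3, 9]                          # multiplicative subgroup of order 3 in F13*

def g(x):
    # annihilator polynomial of H, which equals x^3 - 1 over F13
    out = 1
    for h in H:
        out = out * (x - h) % p
    return out

cosets = [[c * h % p for h in H] for c in (1, 2, 4)]   # {1,3,9}, {2,6,5}, {4,12,10}
for coset in cosets:
    print(sorted(coset), {g(x) for x in coset})        # one value per coset: {0}, {7}, {11}
```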


Availability problem

“Hot data” accessed simultaneously by a very large number of users

Theory and Practice of Codes with Locality July 10, 2016 18 / 22


Availability problem - Cont’d

• Main advantage of replication - High availability for hot data

• Goal: A code with high availability and small overhead

• Solution: LRC code with multiple disjoint recovery sets (codes with availability)

Theory and Practice of Codes with Locality July 10, 2016 19 / 22



Idea behind constructing codes with multiple recovery sets

[Figure: a subspace of polynomials that provide locality r1 and a subspace of polynomials that provide locality r2; codes with two recovery sets come from polynomials lying in both. Ex: the space of polynomials {a0 + a1x + a3x^3 + a4x^4 : ai ∈ F13}]

Theory and Practice of Codes with Locality July 10, 2016 20 / 22


LRC code with 2 recovery sets - Example

• Example: (12, 6, {2, 3}) over F13

1. Encoding polynomial: (a0, a1, a4, a6, a9, a10) ↦ f(x) = a0 + a1x + a4x^4 + a6x^6 + a9x^9 + a10x^10
2. Evaluated points: F13*

• (18, 6, {1, 1}) 3x Replication: overhead of 200%; can tolerate any 2 disk failures; recovery sets are the two other copies of each symbol (locality {1, 1})
• (12, 6, {2, 3}) LRC: overhead of 100%; can tolerate any 3 disk failures; each symbol has two disjoint recovery sets, one of size 2 and one of size 3

Theory and Practice of Codes with Locality July 10, 2016 21 / 22
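Here is a sketch of how both repairs could go (my own illustration; the slides do not spell out the recovery sets, so I take them to be the cosets through the erased point of the order-3 subgroup {1, 3, 9} and the order-4 subgroup {1, 5, 12, 8} of F13*, which is consistent with the exponent set {0, 1, 4, 6, 9, 10} avoiding 2 mod 3 and 3 mod 4):

```python
p = 13
exps = [0, 1, 4, 6, 9, 10]                   # exponents of the encoding polynomial
msg = dict(zip(exps, [3, 1, 4, 1, 5, 9]))    # an arbitrary message (a0, a1, a4, a6, a9, a10)

def f(x):
    return sum(c * pow(x, e, p) for e, c in msg.items()) % p

code = {x: f(x) for x in range(1, 13)}       # evaluations at F13*

def lagrange(points, x0):
    # Lagrange interpolation over F13, evaluated at x0
    total = 0
    for xi, yi in points:
        li = 1
        for xj, _ in points:
            if xj != xi:
                li = li * (x0 - xj) % p * pow((xi - xj) % p, -1, p) % p
        total = (total + yi * li) % p
    return total

erased = 2                                   # pretend the symbol f(2) is lost
set1 = [erased * h % p for h in (3, 9)]      # coset of {1,3,9} minus the point: size 2
set2 = [erased * h % p for h in (5, 12, 8)]  # coset of {1,5,12,8} minus the point: size 3
r1 = lagrange([(x, code[x]) for x in set1], erased)   # degree-1 repair
r2 = lagrange([(x, code[x]) for x in set2], erased)   # degree-2 repair
print(code[erased], r1, r2)                  # the two disjoint repairs agree with the truth
```

Restricted to a coset of {1, 3, 9}, f has degree ≤ 1 in x (x^3 is constant there and no exponent is ≡ 2 mod 3), and on a coset of {1, 5, 12, 8} it has degree ≤ 2, which is exactly what the two interpolations use.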

Codes with locality: Plan

• LRC codes - Basics (Tamo)

• LRC codes - Constructions (Barg)

• LRC codes in practice (Barg)

• Break

• LRC codes - Bounds (Barg)

• Maximally Recoverable Codes (Tamo)

• LRC codes on graphs (Tamo)

Theory and Practice of Codes with Locality July 10, 2016 22 / 22

Constructions of LRC codes

Constructions of LRC codes 1 / 25

From RS codes to other related families

In this part our goal is to construct new families of LRC codes

Advantages of the LRC RS construction
• small alphabet q ≈ n
• large distance
• easy processing

Limitations
• n ≤ q

Fixed q, large n? Possibilities: Reduce q or Increase n
• Take a subfield subcode of an LRC RS code (e.g., a binary code)
• The analysis is simplified if we take a cyclic LRC RS code
• Take a large subset of sampling points (points on an algebraic curve)

Constructions of LRC codes 2 / 25


Cyclic LRC codes

Cyclic codes form a classic topic in coding theory: BCH codes, RM codes, many other well-studied code families are cyclic.

Cyclic q-ary LRC codes

Recall the LRC RS construction.
Data: a subset of points P = (P1, . . . , Pn) ⊂ Fq; a linear space of polynomials V = ⟨x^i b(x)^j⟩, dim V = k, with b(x) constant on the (r + 1)-subblocks of the set P;

V → C
f_a ↦ ev_P(f_a) = (f_a(P1), . . . , f_a(Pn))

Additional assumptions: n | (q − 1); (r + 1) | n; r | k

P = {1, α, . . . , α^{n−1}};  V = ⟨ ∑_{i=0, i ≢ r mod (r+1)}^{(k/r)(r+1)−2} a_i x^i : a_i ∈ Fq ⟩   (1)

(α an nth root of unity)

We obtain a cyclic code C:

(c0, c1, c2, . . . , c_{n−1}) ∈ C ⇒ (c_{n−1}, c0, c1, . . . , c_{n−2}) ∈ C

Constructions of LRC codes 3 / 25
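A sketch (mine) over F13 that checks the two properties numerically for n = 12, r = 2, k = 6, α = 2: a cyclic shift of a codeword is again a codeword, and every symbol can be repaired from the other two symbols in its coset of the subgroup {1, 3, 9}.

```python
p, n, r = 13, 12, 2
alpha = 2                                         # a primitive 12th root of unity in F13
exps = [i for i in range(8) if i % (r + 1) != r]  # exponents {0, 1, 3, 4, 6, 7}
a = dict(zip(exps, [7, 2, 0, 5, 1, 3]))           # an arbitrary message

def ev(coeffs, x):
    return sum(c * pow(x, e, p) for e, c in coeffs.items()) % p

P = [pow(alpha, j, p) for j in range(n)]          # evaluation points 1, alpha, alpha^2, ...
c = [ev(a, x) for x in P]

# Cyclic shift: (c_{n-1}, c_0, ..., c_{n-2}) is the evaluation of f(x/alpha),
# whose coefficient on x^i is a_i * alpha^{-i}, so the exponent set is unchanged.
shifted = [c[-1]] + c[:-1]
a_shift = {e: ai * pow(alpha, -e, p) % p for e, ai in a.items()}
assert shifted == [ev(a_shift, x) for x in P]

# Locality r = 2: on a coset x0*{1, 3, 9} the function f is a line in x,
# so the symbol at x0 is a degree-1 interpolation of the other two symbols.
x0 = P[5]
xs = [x0 * 3 % p, x0 * 9 % p]
ys = [c[P.index(x)] for x in xs]
slope = (ys[1] - ys[0]) * pow((xs[1] - xs[0]) % p, -1, p) % p
assert (ys[0] + slope * (x0 - xs[0])) % p == c[5]
print("cyclic shift and local repair both check out")
```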


Zeros of a cyclic LRC code

Let C be a q-ary cyclic code

(c0, c1, . . . , c_{n−1}) ∈ C → c(x) = ∑_{i=0}^{n−1} c_i x^i

C is an ideal in the ring Fq[x]/(x^n − 1); C = ⟨g(x)⟩, where g(x) is the generator polynomial of C

Definition: Zeros of the code C = zeros of g(x)

Generator matrix: G has rows (1, α^i, α^{2i}, . . . , α^{(n−1)i}), i = 0, 1, . . . , k − 1

Parity-check matrix: H has rows (1, α^m, α^{2m}, . . . , α^{(n−1)m}), m = 1, 2, . . . , n − k

C is a cyclic code with zeros α, α^2, . . . , α^{n−k}

Constructions of LRC codes 4 / 25
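As a small sanity check (a sketch, not from the slides), over F13 with n = 12, k = 6 and α = 2 the evaluation-style generator matrix and the parity-check matrix built from the zeros α, . . . , α^{n−k} are indeed orthogonal:

```python
p, n, k, alpha = 13, 12, 6, 2

G = [[pow(alpha, i * j, p) for j in range(n)] for i in range(k)]             # rows i = 0..k-1
H = [[pow(alpha, m * j, p) for j in range(n)] for m in range(1, n - k + 1)]  # zeros alpha^1..alpha^{n-k}

for grow in G:
    for hrow in H:
        assert sum(gi * hi for gi, hi in zip(grow, hrow)) % p == 0
print("G * H^T = 0 over F13: each alpha^m, m = 1..n-k, annihilates every codeword")
```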


Cyclic codes: Example

• RS code C of length n = 15, k = 8, d = 8, q = 2^4

Zeros of C: α, α^2, α^3, α^4, α^5, α^6, α^7

Generator polynomial g(x) = ∏_{i=1}^{7} (x − α^i), dim(C) = n − deg(g) = 8

BCH bound: d(C) ≥ number of consecutive zeros + 1

• Now suppose that C has zeros

{α, α^2, α^3, α^4, α^5, α^6, α^7} ∪ {α, α^4, α^7, α^10, α^13}.

The distance is still d = 8, and we get locality r = 2. Indeed,

f_a(x) = a1 + a2x + a3x^3 + a4x^4 + a5x^6 + a6x^7

The rows of G are those for the exponents 0, 1, 3, 4, 6, 7 (the monomials 1, x, x^3, x^4, x^6, x^7)

k = 6; d = 8 = n − k − k/r + 2

Constructions of LRC codes 5 / 25


Cyclic LRC codes

• Consider a cyclic code of length n | (q − 1) given by the evaluations ev(f_a(x)) of the polynomials of the form f_a(x) = ∑_{i=0, i ≢ r mod (r+1)}^{(k/r)(r+1)−2} a_i x^i, a_i ∈ Fq

• The zeros of the code are a union of two (overlapping) subsets:

  subset D gives the distance, |D| = n − (k/r)(r + 1) + 1
  subset L supports locality, |L \ D| = k/r − 1

• The code is Singleton-optimal by the BCH bound

The zeros of C are of the form α^j,
j ∈ {1, 2, . . . , n − (k/r)(r + 1) + 1} ∪ {n − (k/r − 1)(r + 1) + 1, n − (k/r − 2)(r + 1) + 1, . . . , n − r}

Constructions of LRC codes 6 / 25


Main ideas I

Let 0 ≤ l ≤ r, ν = n/(r + 1). Consider the ν × n matrix (the zeros from L)

H1 with rows (1, α^{i(r+1)+l}, α^{2(i(r+1)+l)}, . . . , α^{(n−1)(i(r+1)+l)}), i = 0, 1, . . . , ν − 1

The row space of H1 contains all the cyclic shifts of the n-dimensional vector of weight r + 1

v = (1, 0, . . . , 0, α^{lν}, 0, . . . , 0, α^{2lν}, 0, . . . , 0, . . . , α^{rlν}, 0, . . . , 0),  ν = n/(r + 1)

(each run of zeros has length ν − 1)

The code C has parity-check matrix H ∪ H1, where

• H is formed by the rows in D
• H1 is formed by the rows in L \ D

Constructions of LRC codes 7 / 25


Main ideas II

To obtain a code over a small field (e.g., Fp), take a subfield subcode of the code C, i.e.,

D = C ∩ (Fp)^n

Analysis is based on Delsarte’s theorem: C ⊆ Fq^n, q = p^m

C ⟷ C⊥; intersecting with (Fp)^n on the left and applying the trace Tm on the right gives D ⟷ D⊥ = Tm(C⊥)

(Delsarte ’75; Sidelnikov ’72)

Trace Tm : F_{p^m} → Fp: Tm(a) = a + a^p + · · · + a^{p^{m−1}}

• Since D is cyclic, r = d⊥ − 1
• We would like small r, i.e., upper estimates of d⊥
• This is in contrast to classical coding, where one seeks large d⊥

Further ideas involve

• Theory of irreducible (minimal) cyclic codes
• Results on upper bounds on the distance of a cyclic code in terms of its zeros

Constructions of LRC codes 8 / 25


A sampling of results

In a number of cases it is possible to estimate the locality and the number of recovery sets of codes.

Let α be an nth root of unity, and let (2^t − 1) | n for some t.
Let D be an [n, k] binary linear cyclic code whose defining set of zeros contains the group ⟨α^{2^t−1}⟩. Then the locality of D satisfies

r ≤ 2^{t−1} − 1,

and each symbol has ≥ 2^{t−1} recovery sets.

For instance, we have the following binary LRC codes:

 n   k   d   Z(D)                 t   r       Z(D⊥)                          d⊥
 35  20  3   {1, 15}              3   r ≤ 3   {0, 1, 7, 15}                  4
 45  33  3   {1}                  4   r ≤ 7   {0, 1, 3, 5, 9, 15, 21}        8
 27  7   6   {1, 9}               2   r = 1   {0, 3}                         2
 63  36  3   {1, 9, 11, 15, 23}   3   r ≤ 3   {0, 1, 7, 9, 11, 15, 21, 23}   4

Constructions of LRC codes 9 / 25

Some open questions

• In several examples the bounds on locality (dual distance) give tight results. Is it possible to characterize the cases in which the bounds are tight?

• Find the number of recovery sets per symbol for families of cyclic codes.

Constructions of LRC codes 10 / 25

Codes on curves

In this part we discuss another approach to constructing long codes over small alphabets.

RS codes can be viewed as codes on the (affine) line, and can be extended to codes on algebraic curves. The construction becomes more technical, and we proceed by example.

Constructions of LRC codes 11 / 25

LRC codes on curves

Consider the set of pairs (x, y), with x, y ∈ F9, that satisfy the equation x^3 + x = y^4

[Figure: the 27 affine points of the Hermitian curve X over F9, plotted on the (y, x) grid; each column y = const contains the three points of the fiber over y]

Affine points of the Hermitian curve X over F9; α^2 = α + 1

Constructions of LRC codes 12 / 25


Hermitian codes

g : X → P^1
(x, y) ↦ y

Space of functions V := ⟨1, y, y^2, x, xy, xy^2⟩

A = {affine points of the Hermitian curve over F9}; n = 27, k = 6

C : V → F9^n

E.g., message (1, α, α^2, α^3, α^4, α^5):

F(x, y) = 1 + αy + α^2 y^2 + α^3 x + α^4 xy + α^5 xy^2

F(0, 0) = 1, etc.

Constructions of LRC codes 13 / 25


LRC codes on curves

[Figure: the codeword, i.e., the values of F at the 27 affine points of the curve, displayed on the (y, x) grid]

Constructions of LRC codes 14 / 25

Hermitian LRC codes

[Figure: the same codeword grid with the position P = (α, 1) marked]

Let P = (α, 1) be the erased location.

Constructions of LRC codes 15 / 25

Local recovery with Hermitian codes

[Figure: the codeword grid with the value at P = (α, 1) shown as erased]

Let P = (α, 1) be the erased location. Recovery set I_P = {(α^4, 1), (α^3, 1)}

Find f(x) : f(α^4) = α^7, f(α^3) = α^3

⇒ f(x) = αx − α^2

f(α) = 0 = F(P)

Computations can be done using GAP or Magma

Constructions of LRC codes 16 / 25

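The arithmetic also fits in a few lines of Python; here is a sketch of the worked example (the helpers are mine: F9 is represented as pairs (a, b) meaning a + bα with α^2 = α + 1, while the function F and the recovery set are taken from the slides above).

```python
def add(u, v):                      # addition in F9 = F3[alpha], alpha^2 = alpha + 1
    return ((u[0] + v[0]) % 3, (u[1] + v[1]) % 3)

def mul(u, v):                      # (a + b*alpha)(c + d*alpha) = ac + bd + (ad + bc + bd)*alpha
    a, b = u
    c, d = v
    return ((a * c + b * d) % 3, (a * d + b * c + b * d) % 3)

ONE, ALPHA = (1, 0), (0, 1)

def apow(k):                        # alpha^k (alpha has order 8)
    out = ONE
    for _ in range(k % 8):
        out = mul(out, ALPHA)
    return out

def inv(u):                         # u^{-1} = u^7 for u in F9*
    out = ONE
    for _ in range(7):
        out = mul(out, u)
    return out

def F(x, y):                        # F = 1 + a*y + a^2*y^2 + a^3*x + a^4*x*y + a^5*x*y^2
    out = (0, 0)
    for t in (ONE, mul(ALPHA, y), mul(apow(2), mul(y, y)), mul(apow(3), x),
              mul(apow(4), mul(x, y)), mul(apow(5), mul(x, mul(y, y)))):
        out = add(out, t)
    return out

# Erased point P = (alpha, 1); the rest of the fiber over y = 1 is {(alpha^4, 1), (alpha^3, 1)}.
x1, x2 = apow(4), apow(3)
y1, y2 = F(x1, ONE), F(x2, ONE)                  # alpha^7 and alpha^3, as on the slide
neg = lambda u: mul((2, 0), u)                   # -u = 2u in characteristic 3
slope = mul(add(y2, neg(y1)), inv(add(x2, neg(x1))))       # (y2 - y1)/(x2 - x1) = alpha
recovered = add(y1, mul(slope, add(ALPHA, neg(x1))))       # value of the line at x = alpha
print(recovered, F(ALPHA, ONE))                  # both print (0, 0): F(P) = 0 is recovered
```

In power notation the interpolated line is f(x) = αx − α^2, matching the slide.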

Hermitian codes

q = q0^2, q0 a prime power

X : x^{q0} + x = y^{q0+1}

X has q0^3 = q^{3/2} points in Fq

We obtain a family of q-ary codes of length n = q0^3,

k = (t + 1)(q0 − 1),  d ≥ n − tq0 − (q0 − 2)(q0 + 1)

with locality r = q0 − 1. (For q0 = 3, t = 2 this is the n = 27, k = 6, r = 2 code above.)

Constructions of LRC codes 17 / 25
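A tiny helper (the function name and output format are mine) that plugs q0 and t into the parameter formulas above:

```python
def hermitian_lrc_params(q0, t):
    n = q0 ** 3                                  # number of affine points of the curve
    k = (t + 1) * (q0 - 1)
    d_lower = n - t * q0 - (q0 - 2) * (q0 + 1)   # lower bound on the distance
    r = q0 - 1
    return n, k, d_lower, r

print(hermitian_lrc_params(3, 2))   # (27, 6, 17, 2): the n = 27, k = 6, r = 2 code above
```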


Geometric view of the construction

We take g : X → Y = P^1, g(P) = g(x, y) := y

[Figure: the 27 affine points of the curve on the (y, x) grid, with the projection on y collapsing each column (fiber) to a single point]

Space of functions V := ⟨1, y, y^2, x, xy, xy^2⟩

Since y is constant on the fibers (recovery sets), we get back to the univariate polynomial interpolation

It is also possible to take g(P) = x (projection on x); we obtain LRC codes with locality q0

Constructions of LRC codes 18 / 25


General construction: Covering maps of curves

In the RS-like construction, X = Y = P^1

Constructions of LRC codes 19 / 25

General construction: Technical details

Map of curves: X, Y smooth projective absolutely irreducible curves over k

g : X → Y

a rational separable map of degree r + 1

Lift the points of Y: S = {P1, . . . , Ps} ⊂ Y(k). Partition of points:

A := g^{−1}(S) = {P_{ij}, i = 0, . . . , r, j = 1, . . . , s} ⊆ X(k)

such that g(P_{ij}) = P_j for all i, j

Let x ∈ k(X) be such that k(X) = k(Y)(x), and let deg x = h as a projection x : X → P^1_k

Constructions of LRC codes 20 / 25


General construction, II

Let Q∞ ⊂ π^{−1}(∞), deg Q∞ = ℓ ≥ 1

Let L(Q∞) = ⟨f1, . . . , fm⟩, m ≥ ℓ − g_Y + 1

Function space V := ⟨ f_j x^i, i = 0, . . . , r − 1; j = 1, . . . , m ⟩

The code C is the image of the map

e := ev_A : V → k^{(r+1)s}
F ↦ (F(P_{ij}), i = 0, . . . , r, j = 1, . . . , s)

Theorem: The subspace C(D, g) ⊂ Fq^n forms an (n, k, r) linear LRC code with the parameters

n = (r + 1)s
k = rm ≥ r(ℓ − g_Y + 1)
d ≥ n − ℓ(r + 1) − (r − 1)h

provided that the right-hand side of the inequality for d is a positive integer.

Constructions of LRC codes 21 / 25


Asymptotically good sequences of codes

In the classic coding problem, codes on algebraic curves give rise to some excellent code families, in particular improving the asymptotic Gilbert-Varshamov bound on the parameters (Tsfasman-Vladuts-Zink ’81)

Let q = q0^2, where q0 is a prime power. Take Garcia-Stichtenoth towers of curves:

x0 := 1;  X1 := P^1, k(X1) = k(x1);
X_l : z_l^{q0} + z_l = x_{l−1}^{q0+1},  x_{l−1} := z_{l−1}/x_{l−2} ∈ k(X_{l−1})  (if l ≥ 3)

There exist families of q-ary LRC codes with locality r whose rate and relative distance satisfy

R ≥ (r/(r + 1)) (1 − δ − 3/(√q + 1)),  r = √q − 1
R ≥ (r/(r + 1)) (1 − δ − 2√q/(q − 1)),  r = √q

(*) Recall the TVZ ’81 bound without locality: R ≥ 1 − δ − 1/(√q − 1)

Constructions of LRC codes 22 / 25


LRC codes on curves better than the GV bound

The asymptotic GV bound can be improved for any given (constant) r for all q greater than some value.

Constructions of LRC codes 23 / 25


Extensions and open questions

Extensions

• LRC codes that correct locally multiple erasures

• LRC codes on curves with availability

• Adjusting the locality value r

Open questions

• Constructions of LRC codes from known families of good curves

• Parameters of subfield subcodes of AG LRC codes

• Automorphism groups of curves and codes with availability

Constructions of LRC codes 24 / 25

References

S. Goparaju and R. Calderbank, Binary cyclic codes that are locally repairable, Proc. 2014 IEEE Int. Sympos. Inform. Theory, Honolulu, HI, pp. 676–680.

A. Barg, I. Tamo, and S. Vladut, Locally recoverable codes on algebraic curves, March 2016 (Preprint: arXiv:1603.08876).

I. Tamo, A. Barg, S. Goparaju, and R. Calderbank, Cyclic LRC codes, binary LRC codes, and upper bounds on the distance of cyclic codes, International Journal of Information and Coding Theory, to appear (Preprint: arXiv:1603.08878).

A. Barg and I. Tamo, On codes with the locality constraint, IEEE IT Society Newsletter 65, no. 4 (2015), pp. 7–15.

Constructions of LRC codes 25 / 25

Erasure (and LRC) codes in practice: A brief overview

LRC codes in practice 1 / 14

Before LRC codes

Replication: Google (x3, x27), FB (x3), HDFS, others

RS codes, e.g., [6,4] RS codes: RAID; used by FB, others

Array-type codes: RAID 6 (2 parities per block); IBM (EVENODD, X-Code)

“There are two kinds of erasure codes implemented in Raid: XOR code and Reed-Solomon code. [...] The replication on the source file can be reduced to 1 when using Reed-Solomon without losing data safety. The downside of having only one replica of a block is that reads of a block have to go to a single machine, reducing parallelism. Thus Reed-Solomon should be used on data that is not supposed to be used frequently.”
http://wiki.apache.org/hadoop/HDFS-RAID

LRC codes in practice 2 / 14


Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on the market
• Dropbox (moved its storage from AWS to own system)
• Google Colossus distributed FS ([9,6] RS codes; OSDI 2010)
• Facebook F4 ([14,10] RS code; OSDI 2014)
• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)
• Microsoft Azure storage (VLDB Endowment, 2013)
• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)
• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14), CORE (MSST 2013), ...
• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14


Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on themarket

• Dropbox (moved its storage from AWS to own system)

• Google Colossus distributed FS ([9,6] RS codes; OSDI2010)

• Facebook F4 ([14,10] RS code; OSDI 2014)

• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)

• Microsoft Azure storage (VLDB Endowment, 2013)

• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)

• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14),CORE (MSST 2013), ...

• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14

Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on themarket

• Dropbox (moved its storage from AWS to own system)

• Google Colossus distributed FS ([9,6] RS codes; OSDI2010)

• Facebook F4 ([14,10] RS code; OSDI 2014)

• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)

• Microsoft Azure storage (VLDB Endowment, 2013)

• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)

• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14),CORE (MSST 2013), ...

• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14

Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on themarket

• Dropbox (moved its storage from AWS to own system)

• Google Colossus distributed FS ([9,6] RS codes; OSDI2010)

• Facebook F4 ([14,10] RS code; OSDI 2014)

• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)

• Microsoft Azure storage (VLDB Endowment, 2013)

• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)

• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14),CORE (MSST 2013), ...

• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14

Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on themarket

• Dropbox (moved its storage from AWS to own system)

• Google Colossus distributed FS ([9,6] RS codes; OSDI2010)

• Facebook F4 ([14,10] RS code; OSDI 2014)

• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)

• Microsoft Azure storage (VLDB Endowment, 2013)

• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)

• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14),CORE (MSST 2013), ...

• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14

Applications of erasure codes

Systems using existing solutions:

• Amazon Web Services (AWS) + Glacier archiving service (previously S3); on themarket

• Dropbox (moved its storage from AWS to own system)

• Google Colossus distributed FS ([9,6] RS codes; OSDI2010)

• Facebook F4 ([14,10] RS code; OSDI 2014)

• Ceph storage platform (ceph.com)

Innovative proposals, erasure codes with some form of locality:

• HDFS-RAID, including HDFS Xorbas (FB)

• Microsoft Azure storage (VLDB Endowment, 2013)

• HACFS, adaptive coding for HDFS (IBM) (FAST ’15)

• Glacier (NSDI ’05), Petal (ASPLOS ’96), Weaver (FAST ’05), Stair codes (FAST ’14),CORE (MSST 2013), ...

• Proposals based on regenerating codes

(Many more)

LRC codes in practice 3 / 14

Optimization of coding for storage

[Xia et al., HACFS, USENIX FAST’15]

LRC codes in practice 4 / 14

Erasure coding in Ceph

Ceph: object-storage-based free-software storage platform for storing data on a single cluster

Erasure coding: uses a [5,3] MDS code

http://docs.ceph.com

LRC codes in practice 5 / 14

Facebook’s storage system

The FB system stores files as redundant pieces of data distributed across different drives and data centers. The system distinguishes between cold, warm, and hot data.

Hot data (Haystack): The data is replicated 3 times, two copies in one center on different racks, and the third copy in a different center.

LRC codes in practice 6 / 14

Facebook’s storage system

Warm data are written once, accessed many times, and can be deleted but not modified (e.g., 400B photos).

Facebook’s F4 “Warm” BLOB storage system relies on [14, 10] RS codes. The stripe of 14 blocks is placed on 14 different nodes, on different racks.

(S. Muralidhar et al., OSDI ’14)

Main issues: Reliability and load analysis based on the models of failures in the system, and storage/system architecture

Optimizing storage, input/output, and bandwidth (the amount of data exchanged in the system).

LRC codes in practice 7 / 14

Data placement

http://www.slideshare.net/ydn/hdfs-raid-facebook

LRC codes in practice 8 / 14

HDFS RAID + LRC (HDFS-XORbas)

LRC codes for the Hadoop file system (M. Sathiamoorthy et al., VLDB Endowment, 2013)

[14,10] RS code with two additional local parities:

X1, X2, X3, X4, X5 | S1 | X6, X7, X8, X9, X10 | S2 | P1, P2, P3, P4

(data)  (local parity)  (data)  (local parity)  (RS parities)

where S1 = Σ_{i=1}^{5} a_i X_i and S2 = Σ_{i=1}^{5} b_i X_{i+5} are the local parities

Reliability analysis in terms of mean time to data loss using a Markov failure model
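As a concrete illustration of the local repair enabled by S1 and S2, here is a minimal Python sketch. It assumes plain XOR (GF(2)) local parities; the actual Xorbas parities use coefficients a_i, b_i over a larger field, and the block contents below are random stand-ins for file chunks.

```python
# Minimal sketch of local-parity repair in the HDFS-Xorbas spirit, assuming
# XOR local parities over GF(2); block names and sizes are illustrative only.
import os

BLOCK = 64  # bytes per block in this toy example

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Ten data blocks X1..X10 (random payloads stand in for file chunks).
X = [os.urandom(BLOCK) for _ in range(10)]

# Local parities over the two halves of the stripe.
S1 = xor_blocks(X[0:5])    # covers X1..X5
S2 = xor_blocks(X[5:10])   # covers X6..X10

# Single-block repair: rebuild X3 from the 5 other symbols of its local group,
# instead of reading 10 blocks as a [14,10] RS repair would.
lost = 2  # index of X3
repaired = xor_blocks([X[i] for i in range(5) if i != lost] + [S1])
assert repaired == X[lost]
print("X3 repaired locally from 5 blocks")
```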

LRC codes in practice 9 / 14

Microsoft Azure storage

(previously Windows Azure storage)

[Figure: codeword layout over F16 with data symbols X1, …, X6, local parities S1, S2, and global parities P1, P2]

S1 = α_{1,1} X1 + α_{1,2} X2 + α_{1,3} X3,   S2 = α_{2,1} X4 + α_{2,2} X5 + α_{2,3} X6

Choose a basis of C so that every 4 erasures other than

{X1, X2, X3, S1}, {X4, X5, X6, S2}

are correctable (Maximally Recoverable Codes; refer to Part 5, “MR codes”).

• C. Huang et al., USENIX 2012: (6,2,2) code, (12,2,2) code (1.33x overhead)

• Installed in Windows 8.1 and Windows Server 2012
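As a sanity check on the correctability condition above, the following counting sketch (an illustration of ours, not taken from the Azure paper) enumerates all 4-erasure patterns of the 10 codeword symbols and flags the only two that are information-theoretically unrecoverable, namely the two local groups.

```python
# Counting sketch for the (6,2,2) LRC above: 6 data symbols, 2 local parities,
# 2 global parities. A 4-erasure pattern is hopeless only if it wipes out an
# entire local group: then a dependent symbol is erased "for free" and only
# 5 independent symbols survive, too few to recover k = 6 information symbols.
from itertools import combinations

symbols = ["X1", "X2", "X3", "X4", "X5", "X6", "S1", "S2", "P1", "P2"]
local_groups = [{"X1", "X2", "X3", "S1"}, {"X4", "X5", "X6", "S2"}]

patterns = list(combinations(symbols, 4))
bad = [p for p in patterns if set(p) in local_groups]

print(len(patterns), "four-erasure patterns in total")   # 210
print(len(bad), "patterns that cannot be corrected")      # the two local groups
```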

LRC codes in practice 10 / 14

Adaptive coding for HDFS

Implements 2 code families:

• 6 x 5 Product codes

• LRC (12,2,2) codes: 12 data blocks, 2 local parities, 2 global parities

Optimizing overhead or recovery time

[Xia et al., HACFS, USENIX FAST’15]

LRC codes in practice 11 / 14

Maximally recoverable codes for SSD arrays

RAID 5 or RAID 6 type architecture (r = 1 to 3 parities per row), with additional global parities

LRC codes in practice 12 / 14

Maximally recoverable codes for SSD arrays

RAID 5 or RAID 6 type architecture (r = 1 or 2 parities per row)

Mixed failure model: device failure plus several sector failures

Algebraic construction of MR codes similar to rank error codes (Blaum et al., 2013)

LRC codes in practice 13 / 14

A selection of references

C. Huang et al., Erasure coding in Windows Azure storage, USENIX ATC, 2012

M. Blaum et al., Partial MDS codes and applications in RAID, IEEE Trans. Inform. Theory, 2013

* S. Muralidhar et al., f4: Facebook’s Warm BLOB storage system, 11th USENIX OSDI, 2014, pp. 383–398

* M. Xia et al., A tale of two erasure codes in HDFS, 13th USENIX FAST, 2015, pp. 213–226

A. Dimakis, LRC codes and index coding, talk at DIMACS workshop, Dec. 2015

J. Plank and C. Huang, Tutorial on erasure codes, USENIX FAST, 2013

M. Sathiamoorthy et al., XORing elephants: Erasure codes for Big Data, VLDB Endowment, 2013, vol. 6, pp. 325–336

K. V. Rashmi et al., Erasure-coded distributed storage systems (FB study), USENIX, 2013

* K. V. Rashmi et al., Jointly optimal erasure codes for I/O, storage and bandwidth, FAST, 2015

* Includes an overview of other industry-related solutions or proposals

LRC codes in practice 14 / 14

Bounds on LRC codes

Bounds on LRC codes 1 / 22

Problem

C ⊂ F_q^n – LRC code of length n, cardinality q^k, distance d, locality r

Def: For every i ∈ [n] there exist a recovery (helper) set R_i ⊂ [n] \ {i}, |R_i| ≤ r, and a function φ_i such that for any x ∈ C

x_i = φ_i(x_j, j ∈ R_i)

(We do not assume that R_i ∩ R_j = ∅ for i ≠ j)

Given k, define d(n, k, r, q) to be the largest possible distance of C

Given d, define k(n, d, r, q) to be the largest possible dimension (log-cardinality) of C

Bounds on d, k ?
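The definition above can be checked mechanically for small codes. The sketch below is a brute-force illustration of ours; the [6,3] binary generator matrix is an arbitrary example chosen only to exercise the checker.

```python
# Brute-force locality check, directly implementing the definition above.
# Feasible only for tiny codes: it enumerates all codewords and candidate sets.
from itertools import combinations, product

def codewords(G):
    """All codewords of the binary linear code with generator matrix G."""
    k, n = len(G), len(G[0])
    return [tuple(sum(m * row[j] for m, row in zip(msg, G)) % 2 for j in range(n))
            for msg in product([0, 1], repeat=k)]

def locality(C):
    """Smallest r such that every coordinate has a recovery set of size <= r
    (assumes distance >= 2, so the full complement always works)."""
    n = len(C[0])
    worst = 0
    for i in range(n):
        found = None
        for r in range(1, n):
            for R in combinations([j for j in range(n) if j != i], r):
                seen = {}
                # R is a recovery set for i if x_R determines x_i on every codeword
                if all(seen.setdefault(tuple(x[j] for j in R), x[i]) == x[i] for x in C):
                    found = r
                    break
            if found is not None:
                break
        worst = max(worst, found if found is not None else n)
    return worst

# Arbitrary [6,3] binary example in which every symbol is the XOR of two others.
G = [[1, 0, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 1, 0]]
print("locality r =", locality(codewords(G)))   # 2
```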

Bounds on LRC codes 2 / 22

First results

k/n ≤ r/(r + 1)

d ≤ n − k − ⌈k/r⌉ + 2

(Gopalan et al., IEEE Trans. IT, 2013)

The bound on the rate was proved in the first part of the tutorial. Let us prove the bound on the distance.
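Before turning to the proof, a quick numeric sketch (ours) that plugs parameters into the two bounds above; the n = 16, k = 10, r = 5 parameters mirror the HDFS-Xorbas layout discussed earlier ([14,10] RS plus two local parities).

```python
# Plug parameters into the two bounds above.
from math import ceil

def lrc_singleton_distance(n, k, r):
    """d <= n - k - ceil(k/r) + 2"""
    return n - k - ceil(k / r) + 2

def lrc_rate_bound(r):
    """k/n <= r/(r+1)"""
    return r / (r + 1)

# Xorbas-style parameters: n = 16, k = 10, locality r = 5.
print(lrc_singleton_distance(16, 10, 5))   # 6
print(10 / 16 <= lrc_rate_bound(5))        # True: 0.625 <= 0.833...
```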

Bounds on LRC codes 3 / 22

The Generalized Singleton bound

Main idea. Let C be a q-ary code of length n, size q^k. The distance d(C) satisfies

for every S ⊂ [n] with |C_S| < q^k:   d(C) ≤ n − |S|    (A)

Construct a set S:

Let m = ⌊(k − 1)/r⌋. Put T := ∅, L := ∅.

For j = 1, 2, …, m: take i_j ∈ (T ∪ L)^c, put T ← T ∪ R_{i_j}, L ← L ∪ {i_j} (R_i is the recovery set for i).

Clearly |T| ≤ k − 1, so |C_T| ≤ q^{k−1}, and |C_{T∪L}| ≤ q^{k−1}

Suppose that |T| = k − 1 (if needed, add some coordinates from [n] \ (T ∪ L)) and put S = T ∪ L.

We have |S| = k − 1 + m = k + ⌈k/r⌉ − 2 and |C_S| < q^k. Conclude by (A).

Bounds on LRC codes 4 / 22

Shortening (alphabet-dependent) bound

Theorem (Cadambe-Mazumdar ’13)

Let k_q(m, d) be the maximum dimension of a q-ary code of length m and distance d. Then

k(n, d, r, q) ≤ min_{s ≥ 1} { sr + k_q(n − s(r + 1), d) }.

Proof. Let C be a linear LRC code with distance d (the linearity assumption is easily lifted).

• Following the proof of the LRC Singleton bound, construct a subset I ⊂ [n] such that |I| = s(r + 1) and dim proj_I(C) ≤ sr.

• Consider the shortening C_I of C by the coordinates in I. We obtain

length(C_I) = n − s(r + 1),   d(C_I) = d

dim(C_I) = k − dim proj_I(C) ≥ k − sr

k_q(n − s(r + 1), d) ≥ k − sr
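A numeric sketch of the shortening bound, with the plain Singleton bound k_q(m, d) ≤ m − d + 1 plugged in for the inner term (our choice for illustration; any valid upper bound on k_q can be substituted).

```python
# Evaluate k(n, d, r, q) <= min_{s>=1} { s*r + k_q(n - s(r+1), d) } using the
# classical Singleton bound as the (locality-free) inner bound k_q.
def kq_singleton(m, d):
    return max(m - d + 1, 0)

def shortening_bound(n, d, r, kq=kq_singleton):
    best = None
    s = 1
    while n - s * (r + 1) >= 0:
        val = s * r + kq(n - s * (r + 1), d)
        best = val if best is None else min(best, val)
        s += 1
    return best

# Illustrative parameters: for n = 16, d = 6, r = 5 the bound gives k <= 10.
print(shortening_bound(16, 6, 5))
```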

Bounds on LRC codes 5 / 22

Asymptotic problem

Bounds on codes are often considered in the asymptotics of n → ∞

Let M(n, r, δn) be the maximum size of a code of length n, distance δn, and locality r

R(r, δ) := lim sup_{n→∞} (1/n) log M(n, r, δn)

Bounds on LRC codes 6 / 22

Gilbert-Varshamov bound

For binary codes,

R(r, δ) ≥ 1 − min_{0 < s ≤ 1} { (1/(r + 1)) log_2((1 + s)^{r+1} + (1 − s)^{r+1}) − δ log_2 s }.

Proof by random coding: estimate the average weight enumerator for the ensemble given by the parity-check matrix

H = [ diag(1 1 … 1, 1 1 … 1, …, 1 1 … 1) ; H_L ],

with one all-ones row of length r + 1 per local group stacked on top of a random matrix H_L. Then

Pr{ d(C) < δn } ≤ δn · q^{−n(r/(r+1) − R)} · min_{0 < s ≤ 1} b(s)^{n/(r+1)} / s^{δn},

b(s) = (1/q) [ (1 + (q − 1)s)^{r+1} + (q − 1)(1 − s)^{r+1} ]

(Cadambe-Mazumdar ’15; Tamo-B.-Frolov ’15)
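The GV-type bound above is easy to evaluate numerically; the grid search below over s ∈ (0, 1] is a simple illustration (the step count is arbitrary).

```python
# Binary GV-type lower bound for LRC codes:
# R(r, delta) >= 1 - min_{0 < s <= 1} { log2((1+s)^(r+1) + (1-s)^(r+1))/(r+1)
#                                       - delta * log2(s) }
from math import log2

def gv_lrc_binary(r, delta, steps=100000):
    best = float("inf")
    for t in range(1, steps + 1):
        s = t / steps  # s ranges over (0, 1]
        val = log2((1 + s) ** (r + 1) + (1 - s) ** (r + 1)) / (r + 1) - delta * log2(s)
        best = min(best, val)
    return 1 - best

print(round(gv_lrc_binary(r=3, delta=0.1), 4))   # lower bound on the rate
```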

Bounds on LRC codes 7 / 22

Asymptotic upper bounds

Let

R_q(r, δ) = lim sup_{n→∞} (1/n) k(n, r, δn)

Using the Cadambe-Mazumdar bound together with classic bounds on k_q(m, d) (with no locality):

R_q(r, δ) ≤ (r/(r + 1)) (1 − δ),   0 ≤ δ ≤ 1

R_q(r, δ) ≤ (r/(r + 1)) (1 − qδ/(q − 1)),   0 ≤ δ ≤ (q − 1)/q

R_q(r, δ) ≤ min_{0 ≤ τ ≤ 1/(r+1)} { τr + (1 − τ(r + 1)) f_q( δ / (1 − τ(r + 1)) ) }

where
f_q(x) := h_q( (1/q)(q − 1 − x(q − 2) − 2√((q − 1)x(1 − x))) ),
h_q(x) := −x log_q(x/(q − 1)) − (1 − x) log_q(1 − x).
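For concreteness, here is a sketch (ours) that evaluates the three upper bounds above, as reconstructed here, for q = 2 and r = 3, the setting of the figure that follows; restricting the argument of f_q to [0, (q − 1)/q] in the τ-minimization and the grid granularity are simplifications of ours.

```python
# Evaluate the Singleton-, Plotkin-, and LP-type upper bounds above for q = 2,
# r = 3. The f_q-argument restriction and the tau grid are simplifications.
from math import log, sqrt

def h_q(x, q):
    """q-ary entropy, with the limiting values at the endpoints."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return log(q - 1, q)
    return -x * log(x / (q - 1), q) - (1 - x) * log(1 - x, q)

def f_q(x, q):
    return h_q((q - 1 - x * (q - 2) - 2 * sqrt((q - 1) * x * (1 - x))) / q, q)

def singleton_type(r, d):
    return r / (r + 1) * (1 - d)

def plotkin_type(r, d, q):
    return max(r / (r + 1) * (1 - q * d / (q - 1)), 0.0)

def lp_type(r, d, q, steps=2000):
    best = float("inf")
    for t in range(steps + 1):
        tau = t / steps / (r + 1)          # tau in [0, 1/(r+1)]
        rem = 1 - tau * (r + 1)
        if rem > 0 and d / rem <= (q - 1) / q:
            best = min(best, tau * r + rem * f_q(d / rem, q))
    return best

q, r, delta = 2, 3, 0.4
print("Singleton-type:", round(singleton_type(r, delta), 4))
print("Plotkin-type:  ", round(plotkin_type(r, delta, q), 4))
print("LP-type:       ", round(lp_type(r, delta, q), 4))
```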

Bounds on LRC codes 8 / 22

Asymptotic bounds

[Figure: asymptotic bounds for binary codes with r = 3 (Singleton, Plotkin, and LP upper bounds and the GV lower bound); rate R vs. relative distance δ]

Binary codes; r = 3

Bounds on LRC codes 9 / 22

Improving GV bound using LRC codes on curves

(B-Tamo-Vladut, ’15)

(details in part on algebraic constructions)

Bounds on LRC codes 10 / 22

Correcting more than one erasure

Our goal here is to correct locally ≥ 2 erasures.

Def. (correcting ρ − 1 erasures): C is an (n, k, r, ρ) LRC code if each i ∈ [n] is contained in a subset I_i, |I_i| ≤ r + ρ − 1, such that the projection of C on I_i forms a code with distance ≥ ρ.

Singleton-type bound:

d ≤ n − k + 1 − (⌈k/r⌉ − 1)(ρ − 1)    (Prakash et al., ISIT ’12)

GV-type bound:

• Proof by random coding, similar to the LRC GV bound

• For large q the GV bound can be improved using codes on curves (refer to the part “Algebraic constructions”)

(A.B., I. Tamo, and S. Vladut, ’16).

Bounds on LRC codes 11 / 22

Correcting more than one erasure - recursive bounds

Theorem (Hu, Tamo, B., ISIT’16)

Let B(n, ρ) be an upper bound on the size of a code of length n and distance ρ. If the function B is log-convex in n, then for any (n, k, r, ρ) LRC code,

k ≤ ⌈ (n − (d − 1)) / (r + ρ − 1) ⌉ · log_q B(r + ρ − 1, ρ)

E.g., locality-dependent Plotkin bound: let ρ > (r + ρ − 1)(q − 1)/q; then

k ≤ ⌈ (n − (d − 1)) / (r + ρ − 1) ⌉ · log_q [ ρ / (ρ − ((q − 1)/q)(r + ρ − 1)) ]

It is also possible to derive a locality-dependent Hamming bound.
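A small numeric sketch of the recursive bound with the Plotkin bound plugged in, as stated above; the parameters below are arbitrary illustrations of ours.

```python
# Evaluate the locality-dependent Plotkin bound above. Parameters are
# illustrative; the bound applies only when rho > (r + rho - 1)(q - 1)/q.
from math import ceil, log

def plotkin_lrc_k_bound(n, d, r, rho, q):
    m = r + rho - 1                       # size of a local group
    theta = (q - 1) / q
    if rho <= m * theta:
        raise ValueError("Plotkin condition rho > (r+rho-1)(q-1)/q not met")
    groups = ceil((n - (d - 1)) / m)
    return groups * log(rho / (rho - theta * m), q)

# Binary example: local groups of size r + rho - 1 = 5 correcting rho - 1 = 3
# erasures locally (rho = 4 > 5/2, so the condition holds).
print(plotkin_lrc_k_bound(n=20, d=8, r=2, rho=4, q=2))
```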

Bounds on LRC codes 12 / 22

Linear programming bounds

In classical coding theory, all (or most) known upper bounds on the size of codes with a given distance can be derived via linear programming (Delsarte ’72). In this part we extend this approach to codes with locality.

Steps:

• Construct an association scheme for codes with locality

• Derive positivity conditions for the distance distribution (Delsarte inequalities)

• Find closed-form expressions for upper bounds

Bounds on LRC codes 13 / 22

Linear programming bounds

C an (n, k, r, ρ) LRC code, distance d.
Assume disjoint recovery groups of size r + ρ − 1.
Product association scheme H(st, q) = (H(t, q))^{⊗s}, s ≥ 2

Let C be a q-ary (n, k, r, ρ) LRC code with distance d. Define

T := { i = (i_1, …, i_s) | i_1 + … + i_s ≥ d, i_p ∈ {0, ρ, ρ + 1, …, ρ + r − 1} for all p = 1, …, s }.

Then |C| ≤ 1 + Σ_{i∈T} a_i, where (a_i, i ∈ T) is given by the following LP problem

maximize Σ_{i∈T} a_i subject to

a_i ≥ 0, i ∈ T;   Σ_{i∈T} a_i Q_{ij} ≥ −K_j^{(r+ρ−1)}(0), j ∈ [r + ρ − 1]^s \ {0}

(Hu et al., Proc. ISIT 2016)

Bounds on LRC codes 14 / 22

Linear programming bounds

Results: It is possible to

• Derive the Singleton, Hamming, and Plotkin bounds on (n, k, r, ρ) LRC codes

• Improve the shortening bound in examples for short lengths (e.g., n = 8–20) by solving the LP problem

Bounds on LRC codes 15 / 22

Availability (Multiple recovery sets)

Def: C ⊂ F^n is an LRC code with availability if every coordinate i ∈ [n] has t pairwise disjoint recovery sets R_i^{(j)}, with |R_i^{(j)}| ≤ r_j, j = 1, …, t.

For simplicity let r_j = r, j = 1, …, t, and use the notation (n, k, r, t) LRC code

Example: (6, 3, 2, 2) LRC code with distance 3, parity-check matrix

H = [ 0 0 0 1 1 1
      0 1 1 0 0 1
      1 0 1 0 1 0 ]
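The example above can be verified directly. The brute-force sketch below (an illustration of ours) recovers the distance and, for each coordinate, two disjoint recovery sets of size 2 read off the low-weight codewords of the dual code.

```python
# Brute-force verification (illustrative) of the (6,3,2,2) example above:
# the code has distance 3, and every coordinate has two disjoint recovery
# sets of size <= 2 coming from weight-3 codewords of the dual code.
from itertools import combinations, product

H = [[0, 0, 0, 1, 1, 1],
     [0, 1, 1, 0, 0, 1],
     [1, 0, 1, 0, 1, 0]]

def span(rows):
    """All GF(2) linear combinations of the given rows."""
    out = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        out.add(tuple(sum(c * row[j] for c, row in zip(coeffs, rows)) % 2
                      for j in range(6)))
    return out

code = [x for x in product([0, 1], repeat=6)
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]
print("distance:", min(sum(x) for x in code if any(x)))      # 3

dual = span(H)
for i in range(6):
    # each dual word of weight <= 3 through i yields a recovery set of size <= 2
    rec = [frozenset(j for j in range(6) if v[j] and j != i)
           for v in dual if v[i] == 1 and sum(v) <= 3]
    disjoint = any(a.isdisjoint(b) for a, b in combinations(rec, 2))
    print(f"coordinate {i}: recovery sets {sorted(map(sorted, rec))}, two disjoint: {disjoint}")
```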

Constructions:

• Case t = 2: Two-level codes give a natural way of constructing LRC codes with t = 2. Examples include product codes, codes on bipartite graphs.

• Algebraic constructions of codes with t ≥ 2 recovery sets are covered in another part of the tutorial (refer to “Introduction”)

Bounds on LRC codes 16 / 22

Upper bounds on codes with availability

Extension of the Singleton bound:

d ≤ n − k + 2 − ⌈ (t(k − 1) + 1) / (t(r − 1) + 1) ⌉    (Wang-Zhang ’14; Rawat et al. ’14)

Bounds on LRC codes 17 / 22

Upper bounds on codes with availability

Theorem (Tamo-B-Frolov ’15)

The parameters of a t-LRC code are bounded as follows:

kn

ď1

śtj“1p1 ` 1

jr q

d ď n ´

tÿ

i“0

Yk ´ 1ri

]

Proof idea:• construct a directed graph GpV,Eq, V “ rns, pi, jq P E iff j is a part of some recovery set for i

• study colorings and expansion of the graph, which enables us to trace propagation of dependent verticesin G

• The bounds are tight in some examples, but generally are likely to be loose.

• The above bound and the bound on the previous page are incomparable (i.e., thereare examples where each of them is better than the other one).

Bounds on LRC codes 18 / 22

Existence of codes with availability; t “ 2

GV-type bound for codes with two recovery sets (TBF ’16)

• Let G be a regular graph of degree r + 1. Each edge is adjacent to two disjoint sets of r edges, which form 2 recovery sets (bipartite graphs support recovery sets of different sizes)

• Consider the ensemble of linear codes with parity-check matrices

H = [ diag(A_{r+2}, A_{r+2}, …, A_{r+2}) ; H_L ]

where A_{r+2} is the edge-vertex incidence matrix of the complete graph K_{r+2} (or of K_{r1,r2}) and H_L is a random matrix with independent entries

• Analyze the ensemble-average distance of codes
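A small sketch (ours) of the diagonal blocks in the ensemble above, written with rows indexed by vertices and columns by edges; that orientation is our assumption, chosen so that coordinates correspond to edges and each vertex contributes one local parity check.

```python
# Sketch (illustration): incidence matrix of K_{r+2} forming the diagonal
# blocks above. Coordinates correspond to edges; each vertex gives a parity
# check over the r+1 edges at that vertex, so every coordinate lies in two
# local checks (t = 2 recovery sets).
from itertools import combinations

def incidence_K(m):
    """Vertex-by-edge incidence matrix of the complete graph K_m over GF(2)."""
    edges = list(combinations(range(m), 2))
    return [[1 if v in e else 0 for e in edges] for v in range(m)]

r = 3
A = incidence_K(r + 2)   # (r+2) rows, (r+2)(r+1)/2 columns
for row in A:
    print(row)
```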

Bounds on LRC codes 19 / 22

Many recovery sets

Theorem. There exist asymptotically good sequences of q-ary LRC codes (fixed q, n → ∞) with t disjoint recovery sets of size r for any constant t, 1 ≤ t ≤ r

Ideas: Existence of bipartite expanding graphs; large finite field alphabets

[Figure: lower bound vs. Singleton bound for t = 3, r = 6; at δ = 0 the lower bound gives R = 1 − t/(r + 1)]

Singleton bound:

R^{(t)}(r, δ) ≤ (r^t (r − 1)) / (r^{t+1} − 1) · (1 − δ)

Bounds on LRC codes 20 / 22

Open questions

• Alphabet-dependent bounds make little use of locality

• Find a good way to derive lower bounds for t ≥ 3, upper bounds for t ≥ 2

• Settle the corner point for δ = 0 of the asymptotic bound for t ≥ 2

• Remove the structure assumption in the LP bound

• Derive an LP (closed-form asymptotic) bound for LRC codes that correct locally one erasure

Bounds on LRC codes 21 / 22

Selected references

P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, On the locality of codeword symbols, IEEE Trans. Inform. Theory, vol. 58, no. 11, pp. 6925–6934, 2011.

N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, Optimal linear codes with a local-error-correction property, Proc. 2012 IEEE Internat. Sympos. Inform. Theory, IEEE, 2012, pp. 2776–2780.

V. Cadambe and A. Mazumdar, Bounds on the size of locally recoverable codes, IEEE Trans. Inform. Theory, vol. 61, no. 11, pp. 5787–5794, 2015.

A. Wang and Z. Zhang, Repair locality with multiple erasure tolerance, IEEE Trans. Inform. Theory, vol. 60, no. 11, pp. 6979–6987, 2014.

A. Rawat, D. Papailiopoulos, A. Dimakis, and S. Vishwanath, Locality and availability in distributed storage, Proc. ISIT 2014.

I. Tamo, A. Barg, and A. Frolov, Bounds on the parameters of locally recoverable codes, IEEE Trans. Inform. Theory, vol. 62, no. 6, pp. 3070–3083, 2016.

S. Hu, I. Tamo, and A. Barg, Combinatorial bounds for LRC codes, Proc. ISIT 2016.

A. Barg, I. Tamo, and S. Vladut, Locally recoverable codes on algebraic curves, arXiv:1603.08876; also Proc. ISIT 2015.

Bounds on LRC codes 22 / 22

Maximally Recoverable Codes
Combining locality and storage efficiency

Maximally Recoverable Codes 1 / 17

Maximally Recoverable (MR) codes - Motivation

• An (n, k) code is MDS ⇐⇒ Any k symbols of the codeword suffice for decoding

• MDS property is very important in storage

• Recovery set = a set of r + 1 symbols s.t. any symbol of the set can be recovered by the remaining r symbols

• A set of k symbols of an (n, k, r) LRC code that contains a recovery set does not suffice for decoding

• =⇒ LRC codes are not MDS

• Goal: LRC codes that are “close” to being MDS

Maximally Recoverable Codes 2 / 17

(n, k, r) MR codes - Definition

An (n, k, r) MR code has the following properties:

• It is an (n, k, r) LRC code

• Disjoint recovery sets R_i, |R_i| = r + 1, for i = 1, …, n/(r+1)

• Any set S of k symbols s.t. R_i ⊈ S for all i suffices for decoding

Remarks:

• If ∃ i with R_i ⊆ S and |S| = k, then decoding of the codeword is impossible

• Terminology: MR codes = Partially MDS codes

• Equivalent definition using Matroid Theory language
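A counting sketch (our own illustration) of the remark above: for an MDS code every k-subset of coordinates is an information set, while for an (n, k, r) MR code exactly the k-subsets containing a full recovery set fail. The parameters below are arbitrary.

```python
# Counting sketch (illustration): how many k-subsets of coordinates are
# useless for decoding an (n, k, r) MR code, namely those containing one of
# the n/(r+1) disjoint recovery sets of size r+1. An MDS code has none.
from itertools import combinations

def bad_k_sets(n, k, r):
    groups = [set(range(g * (r + 1), (g + 1) * (r + 1))) for g in range(n // (r + 1))]
    return sum(1 for S in combinations(range(n), k)
               if any(g <= set(S) for g in groups))

n, k, r = 12, 8, 3          # illustrative parameters: 3 recovery groups of size 4
total = len(list(combinations(range(n), k)))
print(bad_k_sets(n, k, r), "of", total, "k-subsets contain a recovery set")
```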

Maximally Recoverable Codes 3 / 17

MR codes - Basics

Lemma. MR codes are optimal LRC codes

Proof:

• Assume r divides k

• MR code is an optimal LRC code

⇔ d = n − k − k/r + 2

⇔ any d − 1 = n − k − k/r + 1 erasures are recoverable

⇔ any n − (d − 1) = k + k/r − 1 coordinates suffice for decoding

• Any k + k/r − 1 coordinates contain a subset S s.t.

1. |S| = k

2. ∀ i, R_i ⊈ S

• By the MR property, S suffices for decoding

Maximally Recoverable Codes 4 / 17

MR codes - Basics

Lemma. MR codes are optimal LRC codes

[Venn diagram: MR Codes ⊊ Optimal LRC Codes ⊂ LRC Codes]

Q: MR codes = Optimal LRC codes?  Ans: No

Maximally Recoverable Codes 5 / 17

Constructing MR codes

• A probabilistic construction of an optimal LRC code results in an MR code

1. Non-explicit

2. Field size is superpolynomial in the length (is this truly necessary?)

• Explicit constructions [Rawat, Koyluoglu, Silberstein, Vishwanath 14; Gopalan, Huang, Jenkins, Yekhanin 14; Tamo, Papailiopoulos, Dimakis 14]

1. Any n, k, r

2. Field size is superpolynomial

• Construction of Partially MDS codes [Blaum, Lee Hafner, Hetzler 13; Blaum, Plank, Schwartz, Yaakobi 14]

1. Restricted parameters

2. Field size |F| = O(n)

Maximally Recoverable Codes 6 / 17

Constructing MR codes - Easy cases

• r = k

1. An (n, k) RS code is an (n, k, k) MR code

2. |F| = O(n)

• r = 1

1. Duplicating an (n/2, k) RS code gives an (n, k, 1) MR code

2. |F| = O(n)

• Q: Is the RS-like construction an MR code?

1. No, in general

2. Yes, in some cases (other cases?)

Maximally Recoverable Codes 7 / 17

MR codes through linearized polynomials

Basic facts:

• F_{q^m} is a vector space of dimension m over F_q

• A linearized polynomial has the form f(x) = \sum_{i=0}^{m-1} a_i x^{q^i}, where a_i ∈ F_{q^m}

• For x, y ∈ F_{q^m} and α_1, α_2 ∈ F_q: f(α_1 x + α_2 y) = α_1 f(x) + α_2 f(y)

Maximally Recoverable Codes 8 / 17
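
The following is a small Python sketch (mine, not from the talk) of these facts: it implements F_{2^4} with carry-less multiplication reduced modulo the irreducible polynomial x^4 + x + 1 (a choice made only for illustration), evaluates a linearized polynomial f(x) = a_0 x + a_1 x^2 + a_2 x^4, and checks the F_2-linearity f(x + y) = f(x) + f(y) on random inputs; addition in F_{2^m} is bitwise XOR.

import random

M = 4
IRRED = 0b10011  # x^4 + x + 1, irreducible over F_2 (illustrative choice)

def gf_mul(a, b):
    # Carry-less multiplication in F_{2^M}, reducing modulo IRRED after each shift.
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= IRRED
    return res

def gf_pow(a, e):
    # Square-and-multiply exponentiation in F_{2^M}.
    res = 1
    while e:
        if e & 1:
            res = gf_mul(res, a)
        a = gf_mul(a, a)
        e >>= 1
    return res

def f(coeffs, x):
    # Linearized polynomial f(x) = sum_i a_i * x^(2^i); addition in F_{2^M} is XOR.
    out = 0
    for i, a in enumerate(coeffs):
        out ^= gf_mul(a, gf_pow(x, 2 ** i))
    return out

random.seed(0)
coeffs = [random.randrange(1, 1 << M) for _ in range(3)]
for _ in range(200):
    x, y = random.randrange(1 << M), random.randrange(1 << M)
    # Over F_q with q = 2 the scalars are 0/1, so linearity amounts to additivity.
    assert f(coeffs, x ^ y) == f(coeffs, x) ^ f(coeffs, y)
print("additivity f(x + y) = f(x) + f(y) verified on random samples")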

Constructing an (n, k, r) MR code [Rawat, Koyluoglu, Silberstein, Vishwanath 14]

• Set m = nr/(r+1)

• α_1, ..., α_m ∈ F_{2^m} linearly independent elements over F_2

• R_1 = {α_1, ..., α_r, \sum_{i=1}^{r} α_i}, R_2 = {α_{r+1}, ..., α_{2r}, \sum_{i=r+1}^{2r} α_i}, ..., R_{n/(r+1)}

• Observations:

1. |∪_i R_i| = n

2. \sum_{α ∈ R_i} α = 0

Encoding: (v_0, ..., v_{k−1}) ↦ f(x) = \sum_{i=0}^{k−1} v_i x^{2^i} ↦ (f(α) : α ∈ ∪_i R_i)

1. Given k information symbols of F_{2^m}

2. Form the linearized polynomial

3. Output the length-n codeword

Maximally Recoverable Codes 9 / 17
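
A toy Python check (my own illustration, not the authors' code) of the two observations, with r = 2, n = 9, so m = nr/(r+1) = 6: the α_i are taken as the standard basis of F_2^m written as bitmasks, which are certainly linearly independent over F_2, and addition in F_{2^m} is XOR. It also verifies, exhaustively for an arbitrary choice k = 4 ≤ m, the independence property used on the next slide.

from itertools import combinations
from functools import reduce
from operator import xor

r, n = 2, 9                                   # illustrative parameters
m = n * r // (r + 1)                          # m = nr/(r+1) = 6
alphas = [1 << i for i in range(m)]           # standard basis of F_2^m as bitmasks

# R_i = { alpha_{(i-1)r+1}, ..., alpha_{ir}, sum of those r alphas }  (sum = XOR)
groups = []
for g in range(n // (r + 1)):
    block = alphas[g * r:(g + 1) * r]
    groups.append(block + [reduce(xor, block)])

union = set().union(*map(set, groups))
assert len(union) == n                                  # observation 1: |U_i R_i| = n
assert all(reduce(xor, R) == 0 for R in groups)         # observation 2: sum over R_i is 0

def f2_rank(vectors):
    # Gaussian elimination over F_2 on bitmask vectors.
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

# Any k elements of the union that contain no whole R_i are linearly independent
# over F_2 (checked exhaustively here for the arbitrary choice k = 4).
k = 4
for S in combinations(sorted(union), k):
    if any(set(R) <= set(S) for R in groups):
        continue
    assert f2_rank(list(S)) == k
print("observations 1 and 2 hold, and every admissible k-subset is independent")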

Constructing an (n, k, r) MR code - Cont’d

Locality: recover f(β) = ? for β ∈ R_i

\sum_{α ∈ R_i} α = 0 ⟹ β = \sum_{α ∈ R_i \ {β}} α ⟹ f(β) = f(\sum_{α ∈ R_i \ {β}} α) = \sum_{α ∈ R_i \ {β}} f(α)

so every code symbol is the F_2-sum of the other r symbols in its recovery group.

Maximally Recoverable Codes 10 / 17
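
The repair identity uses nothing beyond the additivity of f over F_2 and the fact that each group XORs to zero. A minimal sketch (illustrative, with a random F_2-linear map on bit-vectors standing in for the linearized polynomial):

import random
from functools import reduce
from operator import xor

random.seed(1)
m, r = 6, 2
cols = [random.randrange(1 << m) for _ in range(m)]   # random F_2-linear map L: F_2^m -> F_2^m

def L(v):
    # Apply the linear map: XOR the column images of the set bits of v.
    return reduce(xor, (cols[i] for i in range(m) if (v >> i) & 1), 0)

alphas = [1 << i for i in range(m)]
R1 = alphas[:r] + [reduce(xor, alphas[:r])]           # one local group; its XOR-sum is zero

for beta in R1:
    others = [a for a in R1 if a != beta]
    # local repair: the lost value equals the XOR of the other values in the group
    assert L(beta) == reduce(xor, (L(a) for a in others))
print("each symbol in the group is the XOR of the other", r, "symbols")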

Constructing an (n, k, r) MR code - Cont’d

MR Property:

• α_1, ..., α_m are linearly independent over F_2

• R_1 = {α_1, ..., α_r, \sum_{i=1}^{r} α_i}, R_2 = {α_{r+1}, ..., α_{2r}, \sum_{i=r+1}^{2r} α_i}, ..., R_{n/(r+1)}

• ⟹ the only F_2-linear dependencies inside ∪_i R_i are the R_i's themselves

• ⟹ if S ⊆ ∪_i R_i with |S| = k and R_i ⊄ S for every i, then the elements of S are linearly independent over F_2

• ⟹ all 2^k sums \sum_{s ∈ S} ε_s s, with ε_s ∈ {0,1}, are distinct

• ⟹ this gives 2^k evaluations of f(x): f(\sum_{s ∈ S} ε_s s) = \sum_{s ∈ S} ε_s f(s)

• deg(f) ≤ 2^{k−1}, since f(x) = \sum_{i=0}^{k−1} v_i x^{2^i}

• f(x) can be interpolated, so the k data symbols are recoverable from S

Maximally Recoverable Codes 10 / 17
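
In LaTeX, the counting behind the last two bullets, spelled out (a restatement of the slide's argument, not a new claim):

\[
  \deg f \;\le\; 2^{\,k-1} \;<\; 2^{k}
  \;=\; \bigl|\{\textstyle\sum_{s\in S}\varepsilon_s s \;:\; \varepsilon_s\in\{0,1\}\}\bigr|,
\]
so the $2^k$ distinct points $\sum_{s\in S}\varepsilon_s s$, together with the values
$f\bigl(\sum_{s\in S}\varepsilon_s s\bigr)=\sum_{s\in S}\varepsilon_s f(s)$ computed from the
$k$ known symbols $\{f(s): s\in S\}$, determine $f$ by ordinary polynomial interpolation;
reading off its coefficients recovers $(v_0,\dots,v_{k-1})$, since
$f(x)=\sum_{i=0}^{k-1} v_i x^{2^i}$.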

Properties of the MR code construction

• Flexible set of parameters n, k, r ✓

• Need m = nr/(r+1) linearly independent elements over F_2 ⟹ |F| = 2^{nr/(r+1)}

• Field size is exponential in n ✗

• Can we do better?

Maximally Recoverable Codes 11 / 17
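
For a sense of scale, a worked example with parameters chosen only for illustration:

\[
  n = 15,\ r = 4 \;\Rightarrow\; m = \frac{nr}{r+1} = 12,\qquad
  |\mathbb{F}| = 2^{m} = 2^{12} = 4096,
\]
already far larger than the code length $n = 15$, and the gap grows exponentially with $n$.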

High-rate MR codes [Gopalan, Huang, Jenkins, Yekhanin 14]

• In any (n, k, r) LRC code, k ≤ nr/(r+1)

• High rate means nr/(r+1) − k = O(n/log n)

• F(n, k, r) = minimum field size needed for an (n, k, r) MR code

• Known bounds on F(n, k, r):

nr/(r+1) − k:   0    1      2      3           4           ≥ 5
F(n, k, r):     2    r+1    Θ(n)   O(n^{3/2})  O(n^{7/3})  O(n^{nr/(r+1)−k−1})

(For nr/(r+1) − k = 0 the code is just n/(r+1) independent parity-check codes of length r + 1, so |F| = 2 suffices.)

Maximally Recoverable Codes 12 / 17

Summary of best known constructions of MR codes

              High-Rate                     Constant-Rate      Vanishing Rate
Dimension     nr/(r+1) − k = O(n/log n)     k/n < r/(r+1)      k = O(n/log n)
Field size    n^{nr/(r+1)−k−1}              2^{nr/(r+1)}       n^k

• High-Rate Regime: Gopalan, Huang, Jenkins, Yekhanin 14

• Constant-Rate Regime: Rawat, Koyluoglu, Silberstein, Vishwanath 14

• Vanishing Rate Regime: T., Papailiopoulos, Dimakis 14

Maximally Recoverable Codes 13 / 17

Lower bounds on the field size

F(n, k, r) is upper bounded by a superpolynomial function of n

Theorem [Gopalan, Huang, Jenkins, Yekhanin 14]

F(n, k, r) ≥ k + 1

Proof:

• Puncturing the code at one coordinate from each recovery set results in an (nr/(r+1), k) MDS code

• By the bound on the field size of an MDS code, nr/(r+1) ≤ |F| + (nr/(r+1) − k) − 1, i.e. |F| ≥ k + 1

Questions:

1. What is the smallest field size needed for an (n, k, r) MR code?

- For an optimal LRC, order n is sufficient

- Contains the MDS conjecture as a subproblem

2. Are MR codes inherently different from optimal LRC codes?

Maximally Recoverable Codes 14 / 17
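
Spelled out, with n' = nr/(r+1) denoting the punctured length, the two proof bullets combine to give the claimed bound:

\[
  n' \;=\; \frac{nr}{r+1} \;\le\; |\mathbb{F}| + (n' - k) - 1
  \quad\Longrightarrow\quad
  |\mathbb{F}| \;\ge\; n' - (n' - k) + 1 \;=\; k + 1 .
\]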

MR codes under arbitrary topologies

• R = {R_i ⊆ [n] : i = 1, ..., m} is a set of constraints on the code's coordinates

1. E.g. R_1 = {1, ..., r+1}, R_2 = {r+2, ..., 2(r+1)}, ..., R_{n/(r+1)}

2. Can be arbitrary code constraints

• C_R = a code with the set of coordinate constraints R

• Definition: given a code C_R, its information sets are I = {S ⊆ [n] : C_R is decodable from S, |S| = k}

• Goal: construct a code C_R with local constraints R that is “highly” decodable (large set I)

• Definition: C_R with information sets I is MR if for any C′_R with information sets I′, I′ ⊆ I (i.e. I is a maximum)

Maximally Recoverable Codes 15 / 17

MR codes under arbitrary topologies

• Definition: C_R with information sets I is MR if for any C′_R with information sets I′, I′ ⊆ I

• Example: R_1 = {1, ..., r+1}, ..., R_{n/(r+1)} = {n−r, ..., n}

C_R is MR if I = {S ⊂ [n] : R_i ⊄ S for all i, |S| = k}

Lemma: For any set R there exists an MR code C_R over a large enough field

Maximally Recoverable Codes 16 / 17
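
A short Python sketch (illustrative parameters of my choosing) that enumerates I for this example topology with n = 6, r = 2, hence two local groups and k = n − n/(r+1) = 4:

from itertools import combinations

n, r = 6, 2
k = n - n // (r + 1)                                   # 4
groups = [set(range(i, i + r + 1)) for i in range(1, n + 1, r + 1)]   # {1,2,3}, {4,5,6}

# Information sets: size-k subsets that do not contain any whole group R_i.
info_sets = [S for S in combinations(range(1, n + 1), k)
             if not any(R <= set(S) for R in groups)]

total = len(list(combinations(range(1, n + 1), k)))
print(len(info_sets), "information sets out of", total)
# Prints: 9 out of 15 -- every size-4 set except the 6 that contain a whole group.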

MR codes under arbitrary topologies - cont’d

“Maximally Recoverable Codes for Grid-like Topologies” [Gopalan, Hu, Saraf, Wang, Yekhanin 16]

1. Good introduction to the problem

2. The first non-trivial lower bound on |F| (for grid-like topologies)

3. More results...

Questions:

1. Given an MR code with a set of constraints R, characterize its information sets I

2. MR code constructions for other topologies (e.g. grid-like topologies)

3. Are the known constructions extendable to other topologies?

4. Field size?

5. Is it possible to construct MR codes using the RS-like construction?

Maximally Recoverable Codes 17 / 17

LRC codes on graphs

In this part we used materials provided to us by Fatemeh Arbabjolfaei (University of California, San Diego), Søren Riis (Queen Mary, University of London) and Arya Mazumdar (University of Massachusetts at Amherst), with their kind permission.

LRC codes 1 / 15

LRC codes on graphs [Mazumdar 2014, Shanmugam and Dimakis 2014]

[Figure: a directed storage-recovery graph G on nodes 1, 2, 3, storing x1, x2, x3]

• Storage recovery graph G

• Each node can recover its content from its incoming neighbors

• Recovery sets: A1 = {2}, A2 = {1, 3}, A3 = {1}

• How much data can be stored?

• Which coding scheme achieves the limit?

LRC codes 2 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V, E), |V| = n

• Each node (vertex) stores a symbol from F_q

• Storage code:

1. A set of vectors C ⊆ F_q^n

2. n recovery functions f_i, s.t. for any (x_1, ..., x_n) ∈ C, f_i(x_j : j ∈ N(i)) = x_i

• The storage capacity of G over F_q: Cap_q(G) = max { log_q |C| : C ⊆ F_q^n is a storage code for G } ≤ n

• The storage capacity of G is Cap(G) = sup_q Cap_q(G) = lim_{q→∞} Cap_q(G) (by Fekete’s lemma)

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C

fi(xj : j ∈ N(i)) = xi

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C

fi(xj : j ∈ N(i)) = xi

• The storage capacity of G over Fq

Capq(G) = maxC⊆F

nq is a

storage code for G

logq |C|

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C

fi(xj : j ∈ N(i)) = xi

• The storage capacity of G over Fq

Capq(G) = maxC⊆F

nq is a

storage code for G

logq |C| ≤ n

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V,E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:1. A set of vectors C ⊆ F

nq

2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C

fi(xj : j ∈ N(i)) = xi

• The storage capacity of G over Fq

Capq(G) = maxC⊆F

nq is a

storage code for G

logq |C| ≤ n

• The storage capacity of G is

Cap(G) = supq

Capq(G)

LRC codes 3 / 15

Storage Capacity

• The network is modeled by a (directed) graph G = (V, E), |V| = n

• Each node (vertex) stores a symbol from Fq

• Storage code:
  1. A set of vectors C ⊆ Fq^n
  2. n recovery functions fi, s.t. for any (x1, ..., xn) ∈ C
     fi(xj : j ∈ N(i)) = xi

• The storage capacity of G over Fq

  Capq(G) = max { logq |C| : C ⊆ Fq^n is a storage code for G } ≤ n

• The storage capacity of G is

  Cap(G) = supq Capq(G) = limq→∞ Capq(G) (Fekete's lemma)

LRC codes 3 / 15
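A useful reformulation of the definition: recovery functions f1, ..., fn exist for a set C exactly when no two vectors of C agree on N(i) yet differ in coordinate i (otherwise fi would have to take two different values on the same input). The Python sketch below checks this condition by brute force; the function name and the dict-of-in-neighbors graph encoding are illustrative choices, not notation from the slides.

from itertools import combinations

def is_storage_code(in_nbrs, codewords):
    """Check that every coordinate i is determined by the coordinates in in_nbrs[i].

    in_nbrs: dict mapping node i -> list of in-neighbors N(i)
    codewords: iterable of equal-length tuples over any alphabet
    """
    C = list(codewords)
    for u, v in combinations(C, 2):
        for i, nbrs in in_nbrs.items():
            # If u and v agree on N(i) but differ at i, no function f_i can recover x_i.
            if u[i] != v[i] and all(u[j] == v[j] for j in nbrs):
                return False
    return True

# The 3-node graph G1 with recovery sets A1 = {2}, A2 = {1, 3}, A3 = {1} (0-indexed):
G1 = {0: [1], 1: [0, 2], 2: [0]}
print(is_storage_code(G1, [(0, 0, 0), (1, 1, 1)]))  # True: the repetition code is a storage code
print(is_storage_code(G1, [(0, 0, 0), (0, 0, 1)]))  # False: node 3 cannot distinguish these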

LRC codes on graphs - Example

[Figure: triangle on nodes N1, N2, N3, each pair of nodes connected]

• Each node stores a single bit

• Two bits of information b1, b2 can be stored (Cap2(G) = 2)

• Store:
  • N1 = b1
  • N2 = b2
  • N3 = b1 + b2

• Cap(G) = supq Capq(G) = 2

LRC codes 4 / 15
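As a quick check of the scheme above (assuming, as the claim Cap2(G) = 2 requires, that each node can see the other two): with (N1, N2, N3) = (b1, b2, b1 + b2) over F2 there are four codewords, and every node equals the XOR of its two neighbors, so any single node can be repaired locally. A minimal sketch:

from itertools import product

# Codewords of the triangle scheme: (N1, N2, N3) = (b1, b2, b1 + b2) over F2.
code = [(b1, b2, (b1 + b2) % 2) for b1, b2 in product(range(2), repeat=2)]
assert len(code) == 4  # 2 information bits -> Cap_2(G) >= log2(4) = 2

# Every coordinate is the XOR of the other two, so each node is recoverable from its neighbors.
for x in code:
    for i in range(3):
        j, k = [t for t in range(3) if t != i]
        assert x[i] == (x[j] + x[k]) % 2
print("triangle scheme verified:", code)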

LRC codes on graphs - Example

[Figure: pentagon (5-cycle) on nodes N1, N2, N3, N4, N5]

• C = {(0, 0, 0, 0, 0), (0, 1, 1, 0, 0), (0, 0, 0, 1, 1), (1, 1, 0, 1, 1), (1, 1, 1, 0, 1)}

• N1 = N2 ∧ N5, N2 = N1 ∨ N3, N3 = N2 ∧ ¬N4, N4 = ¬N3 ∧ N5, N5 = N1 ∨ N4

• Cap2(G) ≥ log2(5) = 2.32...

• In fact Cap2(G) = log2(5) = 2.32...

• However Cap(G) = Cap4(G) = 2.5 [Blasiak, Kleinberg, Lubetzky 13, Christofides, Markström 11]

LRC codes 4 / 15
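The recovery rules above can be checked by brute force: their fixed points in {0, 1}^5 are exactly the five codewords of C, so C is a storage code on the pentagon and Cap2(G) ≥ log2 5. The sketch below hard-codes the rules (including the two complemented inputs, which the codeword list forces); names are illustrative.

from itertools import product
from math import log2

C = {(0, 0, 0, 0, 0), (0, 1, 1, 0, 0), (0, 0, 0, 1, 1), (1, 1, 0, 1, 1), (1, 1, 1, 0, 1)}

def f(x):
    """Recovery rules on the 5-cycle: node i reads only its two cycle neighbors."""
    n1, n2, n3, n4, n5 = x
    return (n2 & n5,        # N1 = N2 AND N5
            n1 | n3,        # N2 = N1 OR  N3
            n2 & (1 - n4),  # N3 = N2 AND (NOT N4)
            (1 - n3) & n5,  # N4 = (NOT N3) AND N5
            n1 | n4)        # N5 = N1 OR  N4

fixed_points = {x for x in product(range(2), repeat=5) if f(x) == x}
assert fixed_points == C
print("Cap_2(pentagon) >= log2(5) =", log2(len(fixed_points)))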

LRC codes on graphs - Example

[Figure: pentagon whose five nodes store the bit pairs {1, 5}, {1, 2}, {2, 3}, {3, 4}, {4, 5}]

• Cap4(G) = 2.5 → 5 bits can be stored in the system {1, 2, 3, 4, 5}

• Cap(G) ≤ 2.5?

LRC codes 4 / 15
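One natural reading of the pair labels in the figure (an assumption here, since only the labels survive extraction) is that node i of the pentagon stores the F4 symbol formed by the two information bits (b_{i-1}, b_i), indices mod 5. Its two cycle neighbors then jointly hold exactly those two bits, so 5 bits fit into five F4 symbols and Cap4(G) ≥ log4(2^5) = 2.5. A sketch of that check:

from itertools import product

def node_content(b, i):
    """Node i (0-indexed) stores the bit pair (b[i-1], b[i]), indices mod 5."""
    return (b[(i - 1) % 5], b[i])

# Node i's pair is exactly (second bit of its left neighbor, first bit of its right neighbor),
# so its F4 symbol can be rebuilt from the two adjacent nodes alone.
for b in product(range(2), repeat=5):
    for i in range(5):
        left = node_content(b, (i - 1) % 5)
        right = node_content(b, (i + 1) % 5)
        assert node_content(b, i) == (left[1], right[0])
print("2^5 = 32 codewords on 5 nodes over F4: Cap_4(pentagon) >= log4(32) = 2.5")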

LRC codes on graphs - Example

• Submodularity: H(X, Y, Z) + H(Z) ≤ H(X, Z) + H(Y, Z)

  H(N1, N2, N3, N4, N5) = H(N1, N2, N3, N4, N5) + H(N4, N5) − H(N4, N5)
                        = H(N1, N3, N4, N5) + H(N4, N5) − H(N4, N5)
                        ≤ H(N1, N4, N5) + H(N3, N4, N5) − H(N4, N5)
                        = H(N1, N4) + H(N3, N5) − H(N4, N5)
                        ≤ H(N1) + H(N3) + H(N4) + H(N5) − H(N4, N5)

  H(N1, N2, N3, N4, N5) = H(N2, N4, N5)
                        = H(N4, N5) + H(N2 | N4, N5)
                        ≤ H(N4, N5) + H(N2)

• Summing the two bounds:

  ⇒ 2 H(N1, N2, N3, N4, N5) ≤ H(N1) + H(N2) + H(N3) + H(N4) + H(N5) ≤ 5

  ⇒ Cap(G) ≤ 2.5

• Question: Given an arbitrary graph G, how to calculate Cap(G)?

LRC codes 4 / 15

The Confusion Graph [Bar-Yossef, Birk, Jayram, and Kol 2006]

Confusion Graph

• Given a graph G on n vertices and q ∈ N

• The confusion graph Conf(G) has

  • Vertex set: Fq^n

  • Edge set: v = (v1, ..., vn) ∼ u = (u1, ..., un) if
    1. Exists i s.t. vi ≠ ui and
    2. v|N(i) = u|N(i)

• v, u ∈ Fq^n can both be contained in a storage code ⇔ v ≁ u in Conf(G)

• C ⊆ Fq^n is a storage code ⇔ C is an independent set of Conf(G)

• Corollary: Capq(G) = logq α(Conf(G)), where α(·) is the independence number

LRC codes 5 / 15
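For very small graphs the corollary can be applied literally: build Conf(G) over F2 and find a largest independent set by exhaustive search. The sketch below (helper names illustrative; the search is exponential, so toy sizes only) recovers Cap2 = 2 for the complete bidirected triangle of the earlier example, whose confusion graph is the 3-dimensional hypercube.

from itertools import combinations, product
from math import log2

def confusable(u, v, in_nbrs):
    """u ~ v in Conf(G): they differ at some i while agreeing on N(i)."""
    return any(u[i] != v[i] and all(u[j] == v[j] for j in nbrs)
               for i, nbrs in in_nbrs.items())

def cap2(in_nbrs, n):
    """log2 of the largest independent set of Conf(G) over F_2 (exhaustive)."""
    vertices = list(product(range(2), repeat=n))
    for size in range(len(vertices), 0, -1):
        for S in combinations(vertices, size):
            if not any(confusable(u, v, in_nbrs) for u, v in combinations(S, 2)):
                return log2(size)
    return 0.0

K3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # complete bidirected triangle
print(cap2(K3, 3))                       # 2.0, matching Cap_2 = 2 from the example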

Computing Cap(G)

• Capq(G) = logq α(Conf(G))

• Computing the independence number is hard

• Need to compute it for arbitrarily large confusion graphs (q → ∞)

• Bounds on Cap(G)?

LRC codes 6 / 15

Index Coding [Birk, Kol 1998]

[Figure: a sender broadcasts to three receivers; receiver i wants the unknown xi and already holds some of the other messages as side information]

• Side information graph G1 on nodes 1, 2, 3

• Side information A1 = {2}, A2 = {1, 3}, A3 = {1}

• Send x1 + x2 and x3

• What is the fundamental limit on the number of transmissions?

• Which coding scheme achieves the limit?

• Many generalizations...

LRC codes 7 / 15
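The two-transmission scheme on this slide is easy to make concrete over F2: broadcast (x1 + x2, x3) and let each receiver combine it with its side information A1 = {2}, A2 = {1, 3}, A3 = {1}. A sketch (function names are illustrative):

from itertools import product

def encode(x):                      # broadcast of length L = 2 over F_2
    return ((x[0] + x[1]) % 2, x[2])

def decode(i, bcast, side):         # side = {j: x_j} known to receiver i (0-indexed)
    if i == 0:                      # knows x2, wants x1 = (x1 + x2) - x2
        return (bcast[0] + side[1]) % 2
    if i == 1:                      # knows x1 and x3, wants x2 = (x1 + x2) - x1
        return (bcast[0] + side[0]) % 2
    return bcast[1]                 # receiver 3 reads x3 directly

A = {0: [1], 1: [0, 2], 2: [0]}     # side-information sets, 0-indexed
for x in product(range(2), repeat=3):
    bcast = encode(x)
    assert all(decode(i, bcast, {j: x[j] for j in A[i]}) == x[i] for i in range(3))
print("2 transmissions suffice for all 8 message triples")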

Index Coding [Birk, Kol 1998]

[Figure: broadcast setting with side information graph G1 on nodes 1, 2, 3]

• G = (V, E), |V| = n receivers

• Receiver i requests message xi ∈ Fq

• An incoming edge represents the side information

• An index code of length L is
  1. An encoding function E : Fq^n → Fq^L
  2. n decoding functions Di s.t. for any x ∈ Fq^n
     Di(E(x), x|N(i)) = xi

LRC codes 7 / 15

Index Coding [Birk, Kol 1998]

[Figure: broadcast setting with side information graph G1 on nodes 1, 2, 3]

• Indexq(G) = the shortest length of an index code over Fq

• Index(G) = infq Indexq(G) = limq→∞ Indexq(G)

LRC codes 7 / 15

Index Coding - Continued

• Index code: an encoding function E : Fq^n → Fq^L

• The confusion graph of G, Conf(G), has

  • Vertex set: Fq^n

  • Edge set: v = (v1, ..., vn) ∼ u = (u1, ..., un) if exists i s.t. vi ≠ ui and v|N(i) = u|N(i)

• Fact: E(·) is an index code iff it is a proper coloring of Conf(G)

Corollary

Given a graph G, Indexq(G) = ⌈logq χ(Conf(G))⌉, where χ(·) is the chromatic number

LRC codes 8 / 15

Duality between Storage Capacity and Index Coding

Theorem [Mazumdar 14, Shanmugam and Dimakis 14]

Let G = (V,E), |V| = n, then

Cap(G) + Index(G) = n

Observations:

• Storage capacity and index coding are dual problems

• An upper bound on Index(G) ⇒ a lower bound on Cap(G)

• A lower bound on Index(G) ⇒ an upper bound on Cap(G)

LRC codes 9 / 15
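The theorem is about the limit q → ∞, but the three-node side-information graph G1 already illustrates the relation numerically at q = 2: exhaustive search over Conf2(G1) gives α = 2 and χ = 4, i.e. Cap2(G1) = 1 and Index2(G1) = 2, and 1 + 2 = 3 = n. A brute-force sketch (only sensible for tiny graphs):

from itertools import combinations, product
from math import ceil, log2

G1 = {0: [1], 1: [0, 2], 2: [0]}              # N(1) = {2}, N(2) = {1, 3}, N(3) = {1}
V = list(product(range(2), repeat=3))         # vertices of Conf_2(G1)
E = {(u, v) for u, v in combinations(V, 2)
     if any(u[i] != v[i] and all(u[j] == v[j] for j in G1[i]) for i in G1)}

def independent(S):
    return not any((u, v) in E or (v, u) in E for u, v in combinations(S, 2))

alpha = max(size for size in range(1, len(V) + 1)
            for S in combinations(V, size) if independent(S))

idx = {v: i for i, v in enumerate(V)}

def colorable(k):
    # Does some assignment of k colors to the 8 vertices avoid every edge of Conf?
    return any(all(c[idx[u]] != c[idx[v]] for u, v in E)
               for c in product(range(k), repeat=len(V)))

chi = next(k for k in range(1, len(V) + 1) if colorable(k))

print(log2(alpha), ceil(log2(chi)), log2(alpha) + ceil(log2(chi)))  # 1.0 2 3.0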

Duality between Storage Capacity and Index Coding

Theorem [Mazumdar 14, Shanmugam and Dimakis 14]

Let G = (V, E), |V| = n, then

Cap(G) + Index(G) = n

Proof:

• χf = fractional chromatic number

• Hq = Confq(G) is a Cayley graph of (Fq^n, +) → is vertex transitive

  ⇒ α(Hq) · χf(Hq) = |V(Hq)| = q^n

  ⇒ limq→∞ logq α(Hq) + limq→∞ logq χf(Hq) = n

  ⇒ Cap(G) + limq→∞ logq χf(Hq) = n

LRC codes 9 / 15

Duality between Storage Capacity and Index Coding

Theorem [Mazumdar 14, Shanmugam and Dimakis 14]

Let G = (V, E), |V| = n, then

Cap(G) + Index(G) = n

Proof (continued):

• Cap(G) + limq→∞ logq χf(Hq) = n

• For any graph, and in particular for Hq,

  χf(Hq) ≤ χ(Hq) ≤ χf(Hq) · ln |V(Hq)|

  ⇒ limq→∞ logq χf(Hq) = limq→∞ logq χ(Hq) = Index(G)

LRC codes 9 / 15

Bounds on Cap(G)

• Any bound on Index(G) ⇒ a bound on Cap(G)

Upper Bounds:

• Cap(G) ≤ n − α(G) = VertexCover(G)

  • First proof: Index(G) ≥ α(G)

  • Second proof:
    1. S ⊆ V(G), a maximum independent set
    2. The information in S can be recovered by the nodes V(G)\S

• Shannon type inequalities (recall the proof for the pentagon)

• Non-Shannon type inequalities

  • Theorem [Baber, Christofides, Dang, Riis, Vaughan]:
    There exists a graph G on 10 vertices s.t.
    • Shannon type inequalities give: Cap(G) ≤ 114/17 = 6.705...
    • Non-Shannon type inequalities give: Cap(G) ≤ 20/3 = 6.666...

Lower bounds:

• Index(G) ≤ min{χf(G), Minrank(G)}

• Cap(G) ≥ n − min{χf(G), Minrank(G)}

LRC codes 10 / 15

Guessing game on graphs [Riis2005]

• n players

• A red/blue hat is assigned independently to each player: xj = 0 if blue, xj = 1 if red

• Each player:
  1. observes only the hats of his mates (complete graph)
  2. makes a guess on the color of his hat

• What is the probability that all players guessed correctly?

• Random guess → Psuccess = (1/2)^n

• Strategy: assume Σ_{j=1}^n xj = 0 (mod 2) → Psuccess = 1/2

LRC codes 11 / 15
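The jump from (1/2)^n to 1/2 is easy to verify by counting: under the strategy "guess as if the XOR of all hats were 0", either the true XOR is 0 and every player is right, or it is 1 and every player is wrong, so exactly 2^(n-1) of the 2^n assignments succeed. A sketch:

from itertools import product

def all_correct(x):
    """Complete graph: player j sees every hat except x[j] and assumes the total XOR is 0."""
    n = len(x)
    guesses = [sum(x[k] for k in range(n) if k != j) % 2 for j in range(n)]
    return guesses == list(x)

n = 5
wins = sum(all_correct(x) for x in product(range(2), repeat=n))
print(wins / 2 ** n)  # 0.5 for every n: a factor 2^(n-1) better than random guessing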

Guessing game on graphs [Riis2005]

[Figure: directed graph on three players x1, x2, x3]

• Arbitrary graphs?

• Neighbors A1 = {2}, A2 = {1, 3}, A3 = {1}

• Fundamental limit on the amount of improvement over random guesses?

• What is the optimal strategy?

• Connection to index coding [Yi–Sun–Jafar–Gesbert 2015]

LRC codes 11 / 15

Guessing Number [Riis2005]

• Given a (directed) graph G = (V, E), |V| = n

• Each player has a hat of q possible colors

• Let X = (x1, ..., xn), xi ∈ [q] be a hat assignment

• A strategy f = (f1, ..., fn) composed of n functions fi : (xj : j ∈ N(i)) → [q]

• The strategy is successful iff

  fi(xj : j ∈ N(i)) = xi for all i

• Equivalently: the hat assignment X is a fixed point of the mapping

  X = (x1, ..., xn) ↦ f(X) = (f1(X), ..., fn(X)),

  where fi(X) is a function of the hat colors of the neighbors of i

LRC codes 12 / 15

Guessing Number [Riis2005]

q-Guessing Number - Definition

The q-guessing number of G is

  Guessq(G) = max_{f a strategy} logq [ P(f is successful) / Prand ],
  where Prand = (1/q)^n is the success probability of independent random guessing

            = max_{f a strategy} logq |fixed points of f|

• For a fixed strategy f:

  logq [ P(f is successful) / Prand ] = logq P(f is successful) + n
                                      = logq ( |fixed points of f| / q^n ) + n
                                      = logq |fixed points of f|

Guessing Number - Definition

The guessing number of G is

  Guess(G) = supq Guessq(G)

LRC codes 13 / 15
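For toy graphs the definition can be evaluated directly: enumerate every strategy f = (f1, ..., fn) as a tuple of truth tables, count its fixed points, and take the maximum. On the complete bidirected triangle over F2 this gives 4 fixed points, i.e. Guess2(K3) = 2 = n − 1, matching the parity strategy of the complete-graph game above. A sketch (exhaustive over all 16^3 strategies, so toy cases only):

from itertools import product
from math import log2

n = 3
nbrs = {i: [j for j in range(n) if j != i] for i in range(n)}  # complete graph K3

# A player's local strategy is a truth table over the 2^(n-1) possible views of its neighbors.
views = list(product(range(2), repeat=n - 1))
tables = list(product(range(2), repeat=len(views)))            # 16 local strategies per player

def fixed_points(strategy):
    count = 0
    for x in product(range(2), repeat=n):
        count += all(strategy[i][views.index(tuple(x[j] for j in nbrs[i]))] == x[i]
                     for i in range(n))
    return count

best = max(fixed_points(s) for s in product(tables, repeat=n))
print(best, log2(best))  # 4 2.0  ->  Guess_2(K3) = 2 = n - 1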

Guessing games and LRC codes on graphs

• Given a graph G

  Guessq(G) = max_{f a strategy} logq |fixed points of f|

  Capq(G) = max { logq |C| : C ⊆ Fq^n is a storage code for G }

• The set of fixed points of a strategy forms a storage code C for G

  ⇒ Guessq(G) ≤ Capq(G)

• The codewords of a storage code form the set of fixed points of some strategy for the hat game

  ⇒ Capq(G) ≤ Guessq(G)

• Guessq(G) = Capq(G)

• Guess(G) = Cap(G)

LRC codes 14 / 15

Open questions

• Derive improved bounds on Cap(G)

• How do expansion properties of the graph (expander graphs) affect the storage capacity?

• Approximate the storage capacity using spectral properties of the graph

• Given a graph G, characterize the tradeoff between the distance and the rate of the code

• Networks with time-varying topologies: develop algorithms that efficiently shift between storage codes for different topologies

LRC codes 15 / 15
