TRANSCRIPT
Least Squares Superposition Codes of Moderate Dictionary Size,
Reliable at Rates up to Capacity
Antony Joseph, Department of Statistics
Yale University
Joint work with Andrew Barron
June 13-18, 2010. 2010 IEEE International Symposium on Information Theory
Barron, Joseph (Yale) Least Squares, Superposition codes 1 / 22
Outline
1 The Gaussian channel
2 Sparse Superposition Codes
3 Partitioned Superposition Codes
Encoding
Performance of Least Squares Decoding
4 Analysis of Reliability at Rates up to Capacity
Gaussian Channel. (Power P , Noise variance σ2)
Message u (length K bits) → Encoder → Codeword x (length n) → Channel, adds noise ε ∼ Normal(0, σ²Iₙ) → y → Decoder → û
Characteristics
Power constraint: Ave ‖x‖² ≤ nP
Signal-to-noise ratio: v = P/σ²
Capacity: C = (1/2) log(1 + P/σ²)
Interested in
Rate: R = K/n
Prob{û ≠ u}
Low fraction of mistakes
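The quantities on this slide can be checked with a short calculation. A minimal sketch, with the base-2 logarithm so that capacity comes out in bits per channel use (matching R = K/n in bits); the SNR value 15 is taken from the numerical example later in the talk:

```python
import math

def gaussian_capacity(P, sigma2):
    """AWGN channel capacity C = (1/2) log2(1 + P/sigma^2), in bits per channel use."""
    v = P / sigma2                 # signal-to-noise ratio v = P / sigma^2
    return 0.5 * math.log2(1 + v)

# SNR v = 15 (the value used in the talk's numerical example)
C = gaussian_capacity(15.0, 1.0)
print(C)  # 2.0
```

The base-2 log is consistent with the later slides, where v = 15 gives C = 2.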
Gaussian Channel
Challenges in Code construction
Achieve Rate arbitrarily close to Capacity
Good error exponents
Manageable codebook
Fast Encoding
Fast Decoding
Sparse Superposition Codes
Model : Y = Xβ + ε
Message u (length K) → β → Encoder → Codeword Xβ (length n) → Channel, adds noise ε ∼ N(0, σ²Iₙ) → Y → Decoder → û
Code design
n × N design matrix X
Received string: Y = Xβ + ε
Coefficient vector β is sparse, with few non-zero values
Not a linear code in the algebraic coding sense
Sparse Superposition Codes
[Schematic: n × N matrix X; β = (…, 1, …, 1, …, 1, …) with L entries equal to 1]
Code design
n × N matrix X with entries independent Normal(0,P/L)
β has exactly L non-zero elements, with L ≪ N, all equal to 1
Average of ‖Xβ‖² ≤ nP
Codeword is the sum of the selected columns
Partitioned Superposition Codes
[Schematic: X = [Section 1 | Section 2 | … | Section L], B columns per section;
β = (…, 1, … | …, 1, … | … | …, 1, …), entries within each section indexed 0, 1, …, B−1]
Code design
n × N matrix X with entries independent Normal(0,P/L)
Columns of X divided into L sections of size B. So N = LB
β has exactly one element non-zero in each section
Relationship between L, B, n and R
Number of codewords: B^L
Length of message string: K = L log2 B bits
Blocklength: n = K/R = (1/R) L log2 B
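As a concrete sketch of the construction above; the specific values of L, B, R and P below are illustrative choices, not parameters from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

L, B = 4, 8                 # L sections of B columns each (illustrative)
N = L * B                   # dictionary size N = L B
R = 1.0                     # rate in bits per channel use (illustrative)
K = L * int(np.log2(B))     # message length K = L log2(B) bits
n = int(K / R)              # blocklength n = K / R
P = 15.0                    # power

# n x N design matrix with i.i.d. Normal(0, P/L) entries, so that
# E ||X beta||^2 = n * L * (P/L) = n * P, meeting the power constraint on average
X = rng.normal(0.0, np.sqrt(P / L), size=(n, N))

# beta: exactly one non-zero entry, equal to 1, in each section
beta = np.zeros(N)
for sec in range(L):
    beta[sec * B + rng.integers(B)] = 1.0

codeword = X @ beta         # the codeword is the sum of the L selected columns
print(K, n, codeword.shape)  # 12 12 (12,)
```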
Encoding
u = 0 1 1 1 | 0 0 0 0 | 1 1 0 1
Substrings: 0111, 0000, 1101, giving indices 7, 0, 13
Non-zero elements of β:
7th element from Section 1
0th element from Section 2
13th element from Section 3
Map from Message u to β
Assume the length of u is K = L log2 B. For example, take L = 3 and log2 B = 4.
Split u into L substrings of log2 B bits
The substrings give the addresses of the chosen columns
For the map u → β, choose the non-zero element of β in each section as indexed by these substrings
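The message-to-β map described above can be sketched directly; the slide's example has L = 3 and log2 B = 4, i.e. B = 16:

```python
import numpy as np

def message_to_beta(u_bits, L, B):
    """Map message bits u to the coefficient vector beta: split the
    K = L*log2(B) bits into L substrings; each substring, read as a binary
    number, indexes the non-zero entry within its section."""
    b = int(np.log2(B))
    beta = np.zeros(L * B)
    for sec in range(L):
        chunk = u_bits[sec * b:(sec + 1) * b]
        idx = int("".join(map(str, chunk)), 2)  # binary substring -> column index
        beta[sec * B + idx] = 1.0
    return beta

# Slide example: L = 3, log2(B) = 4 (so B = 16), u = 0111 0000 1101
u = [0, 1, 1, 1,  0, 0, 0, 0,  1, 1, 0, 1]   # substrings -> 7, 0, 13
beta = message_to_beta(u, L=3, B=16)
print(np.flatnonzero(beta))  # positions 7, 16+0, 32+13 -> [ 7 16 45]
```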
Decoding, Least Squares
Least Square Decoder
Find the β̂ that minimizes ‖Y − Xβ‖² over all β of the assumed form
Reliability
Want the error event β̂ ≠ β∗ to have small probability when β∗ is sent
Small probability of block error
Less stringent: want β̂ to differ from β∗ in at most an α fraction of sections
Small probability that the section error rate exceeds α
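A brute-force version of the least squares decoding rule, searching all B^L admissible β (one 1 per section). This is exponential in L, so it is shown only to make the rule concrete on tiny, hypothetical parameters:

```python
import itertools
import numpy as np

def least_squares_decode(Y, X, L, B):
    """Exhaustive least squares decoder: over all B**L betas of the assumed
    form (exactly one 1 per section), return the index tuple minimizing
    ||Y - X beta||^2. For illustration only -- exponential in L."""
    best_dist, best_idx = np.inf, None
    for idx in itertools.product(range(B), repeat=L):
        beta = np.zeros(L * B)
        for sec, j in enumerate(idx):
            beta[sec * B + j] = 1.0
        dist = float(np.sum((Y - X @ beta) ** 2))
        if dist < best_dist:
            best_dist, best_idx = dist, idx
    return best_idx

# Tiny example with hypothetical parameters
rng = np.random.default_rng(1)
L, B, n = 2, 4, 12
X = rng.normal(0.0, 1.0, size=(n, L * B))
true_idx = (3, 1)                    # non-zero position in each section
beta_star = np.zeros(L * B)
beta_star[3] = 1.0
beta_star[B + 1] = 1.0
Y = X @ beta_star + 0.1 * rng.normal(size=n)   # low-noise channel output
print(least_squares_decode(Y, X, L, B))
```

At this low noise level the decoder recovers the sent index pair.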
Decoding, Least Squares
Least squares is the optimal decoder: it minimizes the probability of error under a uniform distribution on input strings
Analysis here is of the performance of least squares, without concern for computational feasibility
A computationally feasible algorithm is discussed
Today, Session S-Fr-3, 2:40-4:00 p.m.
Performance of Least Squares
Result 1: Section size
To achieve rates up to capacity, the section size B need only be a polynomial in L.
In particular, B = L^(a_v), where a_v is a function of v only.
a_v is a decreasing function of v
a_v is near 1 for large v
Dictionary size N = L B = L^(1 + a_v)
Plot of a_v
[Figure: a_v (section size rate) plotted against v, for v from 5 to 95]
Performance of Least Squares
Result 2: Error Exponent
Let ε = Prob{# section mistakes > αL}
be the probability that more than an α fraction of sections are wrong. Then
ε ≤ exp{−n c_v min(α, (C − R)²)}
where c_v is a constant that depends only on v.
Performance of Least Squares
From Partially Correct to Completely Correct Decoding
Let R be a rate for which the partitioned superposition code has
Prob{# section mistakes > αL} = ε.
Then, through composition with an outer Reed-Solomon code, one obtains a code with rate R_tot = (1 − 2α)R and
Prob{block error} = ε
Corollary 3: Block error probability
Taking α of order (C − R)², we have
Prob{block error} ≤ exp{−n c_v (C − R_tot)²}
Optimal form of exponent, as in Shannon and Gallager, or Polyanskiy et al.
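The rate bookkeeping of the composition can be made concrete. The particular constant in the choice α ∝ (C − R)² below is an illustrative assumption, not a value from the talk:

```python
# Composing with an outer Reed-Solomon code that corrects up to an alpha
# fraction of section errors costs rate: R_tot = (1 - 2*alpha) * R
def total_rate(R, alpha):
    return (1 - 2 * alpha) * R

C, R = 2.0, 1.4              # example with C = 2 (SNR 15) and R = 0.7 C
alpha = (C - R) ** 2 / 4     # "alpha of order (C - R)^2"; the 1/4 is illustrative
print(alpha, total_rate(R, alpha))
```

Here α = 0.09, so the total rate is slightly below the inner rate R.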
Performance of Least Squares
Comparison with PPV curve
Polyanskiy, Poor and Verdu demonstrate that for the Gaussian channel the following approximate relation holds for an optimal code:
R ≈ C − √(V/n) Q⁻¹(ε) + (1/2)(log n)/n
V = (v/2)(v + 2) log² e / (v + 1)² is the channel dispersion
Q = 1 − Φ, where Φ is the Gaussian distribution function
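The PPV approximation can be evaluated numerically. A sketch, assuming all logarithms are base 2 so that rates are in bits (hence the (log2 e)² factor in the dispersion), with the SNR of 100 from the figure on the next slide:

```python
import math
from statistics import NormalDist

def ppv_rate(C, v, n, eps):
    """Normal approximation of Polyanskiy, Poor and Verdu:
    R ~ C - sqrt(V/n) * Qinv(eps) + (1/2) * log2(n) / n, in bits,
    with dispersion V = (v/2)(v + 2) (log2 e)^2 / (v + 1)^2."""
    V = (v / 2) * (v + 2) * math.log2(math.e) ** 2 / (v + 1) ** 2
    q_inv = NormalDist().inv_cdf(1 - eps)   # Qinv(eps), where Q = 1 - Phi
    return C - math.sqrt(V / n) * q_inv + 0.5 * math.log2(n) / n

v = 100.0                       # SNR from the figure
C = 0.5 * math.log2(1 + v)      # about 3.33 bits per channel use
print(ppv_rate(C, v, n=500, eps=1e-4))   # about 3.17
```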
Comparison with PPV curve
[Figure: Rate R (bits/ch. use) versus blocklength n, from 0 to 500, showing the capacity line, the PPV curve, and the partitioned superposition code; SNR = 100, block error prob. = 10⁻⁴]
Superpositions for Multiple Users vs our Single-User Channel
Cover introduced superposition codes for multi-user Gaussian channels
The codeword sent is the sum of the codewords for the respective users
Here we use superpositions to simplify coding for the single-user channel
Relationship to Sparse Signal Recovery
Wainwright and others: necessary and sufficient conditions on n for sparsity recovery. Applied to our setting, where N ≫ L, these give that the required n is constant × L log(N/L)
It is natural to call (the reciprocal of) the best constant, for a given set of allowed signals and a given noise distribution, the compressed sensing capacity or signal recovery capacity.
For our setting n = (1/R) L log B, so the best constant is 1/C and thus the signal recovery capacity equals the Shannon capacity.
Error Probability Bound
Analysis
The least squares estimate β̂ satisfies ‖Y − Xβ̂‖² ≤ ‖Y − Xβ∗‖²
The error event of a fraction α = ℓ/L of section mistakes is contained in
E_α = {‖Y − Xβ‖² ≤ ‖Y − Xβ∗‖² for some β ∈ Wrong_α}
where Wrong_α is the set of β differing from β∗ in an α fraction of sections.
Our bound:
Prob[E_α] ≤ (L choose αL) exp{−n D_α}
The exponent D_α is sufficiently large to cancel the combinatorial coefficient provided B ≥ L^(a_v) and R < C.
Some ingredients of the error exponent
Analysis
Ingredients in D_α = D(∆_α, ρ²_α)
∆_α = α(C − R) + (C_α − αC)
C_α = (1/2) log(1 + αv)
1 − ρ²_α = α(1 − α)v / (1 + α²v)
D(∆, ρ²) is the cumulant generating function for (1/2)(Z₁² − Z₂²), with Z₁, Z₂ bivariate normal, mean zero, unit variance and correlation ρ.
Near (1/2)∆² / (1 − ρ²) for small ∆
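The ingredients above can be evaluated numerically. A sketch, assuming base-2 logs (consistent with C = 2 at v = 15 in the numerical example on the next slide), here evaluated at α = 0.5 for illustration:

```python
import math

def exponent_ingredients(alpha, v, C, R):
    """The quantities entering D_alpha = D(Delta_alpha, rho_alpha^2)."""
    C_alpha = 0.5 * math.log2(1 + alpha * v)          # C_alpha = (1/2) log(1 + alpha v)
    delta = alpha * (C - R) + (C_alpha - alpha * C)   # Delta_alpha
    rho2 = 1 - alpha * (1 - alpha) * v / (1 + alpha ** 2 * v)  # rho_alpha^2
    return delta, rho2

# Parameters of the next slide's example: v = 15, C = 2, R = 0.7 C
v = 15.0
C = 0.5 * math.log2(1 + v)       # = 2 bits
delta, rho2 = exponent_ingredients(0.5, v, C, 0.7 * C)
# For small Delta, D(Delta, rho^2) is near (1/2) Delta^2 / (1 - rho^2)
print(delta, rho2)
```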
Contributions to Error Exponent
[Figure: error exponents of Prob(E_α) versus α, for L = 100, B = 8192, P/σ² = 15, C = 2, R = 0.7C, n = 929, N = 819200; P(# mistakes > ℓ₀) = 1.8 × 10⁻¹²]
Summary
Sparse superposition coding is reliable at rates up to channel capacity
Error probability of the optimum form for R < C
Analysis blends modern statistical regression and information theory
Thank You