2014 21st International Conference on Telecommunications (ICT), Lisbon, Portugal
978-1-4799-5141-3/14/$31.00 ©2014 IEEE
The average equivocation of random linear binary
codes in syndrome coding
Ke Zhang
Instituto de Telecomunicacoes
Dep. de Ciencia de Computadores
Universidade do Porto
{kezhang}@dcc.fc.up.pt
Martin Tomlinson
University of Plymouth
United Kingdom
{M.Tomlinson}@plymouth.ac.uk
Mohammed Zaki Ahmed
University of Plymouth
United Kingdom
{M.Ahmed}@plymouth.ac.uk
Abstract—This paper studies the security performance of random codes in the syndrome coding scheme. We propose a theoretical analysis that uses a putative (n, k) code, having the same distribution of syndrome probabilities as the ensemble of all (n, k) random codes, to calculate the average equivocation of random codes, and we compare the theoretical results with simulation results generated by Monte Carlo analysis, which shows that the theoretical method is precise. Moreover, the analysis works well for long codes having large values of n − k, which are not amenable to Monte Carlo analysis. We present results showing that the longer the code, the higher the average equivocation and the lower the standard deviation of the equivocation. For code lengths in excess of 150 bits or so, almost any random code will produce high levels of secrecy.
I. INTRODUCTION
The performance of random codes is one of the important
topics in information theory following Shannon’s random code
ensemble analysis in 1948. He showed that in an ensemble of
random codes there must be at least one code that approaches
channel capacity as the code length approaches infinity [1].
Since then, most of the research
on random codes has focused on error correction coding.
Besides transmission reliability, physical layer security is
also very important in modern communications. The wiretap
model was first introduced by Wyner in 1975 [2], in which a
transmitter, Alice, wishes to send confidential information to
a legitimate receiver, Bob, in the presence of an eavesdropper,
Eve. There has been considerable research aimed at the design
of codes which achieve secrecy capacity for several specific
wiretap channels based on the structure of Wyner’s syndrome
coding scheme [3], [4]. As in the original paper [5] the main
channel is error-free and the eavesdropper channel is a binary
symmetric channel.
The security performance of random codes in the syndrome
coding scheme was studied by Csiszar and Korner for the
broadcast channel [6]. Cohen and Zemor showed that a random
choice of code in the syndrome coding scheme suffices to
guarantee secure communications by showing that the most
likely syndrome does not have too high a probability of
occurrence [7]. To study the security performance of random codes
more completely, it is important to determine the secrecy level
achieved by random codes for specific values of code length
and information rate. This can be measured by the information
theoretic secrecy, the average equivocation achieved by the
random code ensemble. To date this has not been done and
giving results for this case is the main aim of this paper.
II. SYNDROME CODING SCHEME
Wyner introduced the wiretap channel in [2], where the main
channel is error-free and the eavesdropper channel is binary
symmetric with an error probability α. The secrecy capacity
of this wiretap channel [2] is:
Cs = 1 − CBSC(α) = −α · log2 α − (1 − α) · log2(1 − α)    (1)
which is the highest transmission rate that can be achieved
whilst keeping communications secure from eavesdropping for
this channel.
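For concreteness, equation (1) can be evaluated numerically. The sketch below (the function name is ours, not the paper's) computes Cs, the binary entropy of α, for the BSC crossover probability used later in the paper's example:

```python
import math

def secrecy_capacity(alpha):
    """Secrecy capacity of the wiretap channel of eq. (1): an error-free
    main channel and a BSC(alpha) eavesdropper channel give
    Cs = 1 - C_BSC(alpha) = h(alpha), the binary entropy of alpha."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

print(secrecy_capacity(0.05))  # alpha used in the paper's later example
```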
In this model, Alice wishes to convey a sequence of
independent and uniformly distributed m-bit messages to the
legitimate receiver, M(1), . . . , M(p), where p is the number
of blocks; this sequence is encoded into a sequence of n-bit codewords
VT (1), . . . ,VT (p). Bob observes the sequence of n-bit words
VM (1), . . . ,VM (p) where
VM (i) = VT (i), i = 1, . . . , p (2)
and Eve observes the sequence of n-bit words
VE(1), . . . ,VE(p) where
VE(i) = VT (i) + e(i), i = 1, . . . , p (3)
where e(i) represents an n-bit error vector arising from the
binary symmetric channel.
The syndrome coding scheme uses an (n, k, 2t + 1) linear
block code capable of correcting t errors where m = n−k. It is
a property of any linear block code that there exist 2^m distinct
n-bit minimum weight error patterns, each of which
produces a distinct syndrome out of the total 2^m syndromes. These
minimum weight error patterns are the coset leaders of the
code and may be represented in a table of 2^m syndromes, and
associated minimum weight error patterns. For perfect codes,
the error patterns in the syndrome table are all of the n-bit
binary error vectors with a weight w ≤ t. For non-perfect
codes, some of the error patterns in the table are n-bit binary
vectors with a weight w ≤ t and the remaining error patterns
are n-bit binary vectors with a weight w > t. Regardless of
whether or not the code is perfect, in syndrome coding all
2^m syndromes are used to send messages. In the traditional
syndrome coding, it is necessary to store an error-pattern-to-syndrome
look-up table, which can be accessed by Alice, Bob
and Eve. However, it is shown below that such a look up
table is unnecessary and the parity check matrix of the code
is sufficient.
The encoding and decoding processes are as follows.

1) Encoder: Alice equates the m-bit messages to m-bit syndromes, ST(i) = M(i), and produces n-bit vectors VT(i) from each m-bit message M(i) at time i such that VT(i) × H^T = M(i), in three steps:
• Alice generates a random n-bit codeword CT(i) from a random, uniformly distributed k-bit vector DR(i):

CT(i) = DR(i) × G    (4)

• Alice forms an n-bit zero-padded message vector, V(i), consisting of the m-bit message M(i) followed by k 0's.
• Alice then generates the transmitted n-bit vector VT(i) by adding the n-bit codeword CT(i) to the n-bit zero-padded message vector:

VT(i) = CT(i) + V(i)    (5)

In this way Alice produces n-bit vectors with the property that VT(i) × H^T = M(i).
Proof: The syndrome of VT(i), ST(i), satisfies

ST(i) = VT(i) × H^T
      = CT(i) × H^T + V(i) × H^T
      = 0 + M(i)    (6)

V(i) × H^T produces M(i), which is due to the structure of V(i) and H.
The information rate of the syndrome coding scheme is R = m/n.

2) Legitimate decoder: Bob uses the parity check matrix of the code to determine the original messages, by computing the syndrome associated with each received vector:

SM(i) = VM(i) × H^T = VT(i) × H^T = ST(i)    (7)

The original message is then immediately found as:

M(i) = SM(i) = ST(i)    (8)
3) Eavesdropper decoder: Eve – like Bob – uses the parity check matrix of the code to estimate each original message by computing the syndrome associated with each of her received vectors:

SE(i) = VE(i) × H^T = (VT(i) + e(i)) × H^T = ST(i) + Se(i)    (9)

Her estimate of the message is then:

M̂(i) = SE(i) = M(i) + Se(i)    (10)
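The three encoder steps and both decoders can be exercised end to end. The sketch below is our own illustration, using the standard (7,4) Hamming code (n = 7, k = 4, m = 3) rather than a random code; variable names follow the paper's notation:

```python
import numpy as np

n, k = 7, 4
m = n - k

# Systematic parity check matrix H = [I_m | A] and matching generator
# G = [A^T | I_k], so that G @ H.T = 0 (mod 2).
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]])
H = np.hstack([np.eye(m, dtype=int), A])
G = np.hstack([A.T, np.eye(k, dtype=int)])
assert not (G @ H.T % 2).any()

rng = np.random.default_rng(0)
M = rng.integers(0, 2, m)                  # m-bit message = intended syndrome
D = rng.integers(0, 2, k)                  # random k-bit vector DR(i)
C = D @ G % 2                              # random codeword CT(i), eq. (4)
V = np.concatenate([M, np.zeros(k, int)])  # zero-padded message vector V(i)
VT = (C + V) % 2                           # transmitted vector, eq. (5)

# Bob (error-free channel): the syndrome recovers the message, eqs (7)-(8)
assert np.array_equal(VT @ H.T % 2, M)

# Eve (BSC): her syndrome is the message offset by Se(i), eqs (9)-(10)
e = rng.binomial(1, 0.05, n)
assert np.array_equal((VT + e) @ H.T % 2, (M + e @ H.T) % 2)
print("Bob recovers M; Eve observes M + Se")
```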
A. Random linear codes
We use (n, k) random linear codes in the syndrome coding
scheme. Each code may be defined by its m×n parity check
matrix H. Any column of a parity check matrix, H, having
m rows may be represented by an integer in the range 0 to
2^(n−k) − 1. For the systematic format of the m × n parity check
matrix H, the first m columns of H form an m × m identity matrix:
H =
⎛ 1 0 · · · 0  h_m,0     · · ·  h_(n−1),0     ⎞
⎜ 0 1 · · · 0  h_m,1     · · ·  h_(n−1),1     ⎟
⎜ · · ·        h_i,j            · · ·         ⎟
⎝ 0 0 · · · 1  h_m,(m−1) · · ·  h_(n−1),(m−1) ⎠

in which 0 ≤ j ≤ m − 1, m ≤ i ≤ n − 1, and h_i,j takes a value
of 0 or 1. Each column may be represented as a packed integer
defined as b_i = Σ_{j=0}^{m−1} h_i,j · 2^j. Then the systematic format
of H for the (n, k) random linear code can be completely
represented by the following integer sequence:

[1, 2, 4, · · · , 2^(m−1), b_m, · · · , b_(n−1)]    (11)
in which the first m integers denote the identity matrix and the
integers bm, · · · , bn−1 are generated randomly between 3 and
2m − 1 with the constraint that there are no repeated integers
in the sequence 1, 2, 4, · · · , 2m−1, bm, · · · , bn−1. This ensures
that the minimum Hamming distance of each randomly gen-
erated code is at least 3.
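A minimal sketch of this construction (the function name is ours) draws the k random columns as packed integers, exactly as in sequence (11):

```python
import random

def random_parity_check(n, k, seed=None):
    """Draw a random (n, k) parity check matrix in the packed-integer
    form of sequence (11): the first m = n - k columns are the identity
    columns 1, 2, 4, ..., 2^(m-1); the remaining k columns are distinct
    integers from [3, 2^m - 1] that are not powers of two, so all
    columns are distinct and non-zero and the minimum Hamming distance
    of the code is at least 3."""
    m = n - k
    rng = random.Random(seed)
    identity = [1 << j for j in range(m)]
    pool = [b for b in range(3, 1 << m) if b & (b - 1)]  # skip powers of 2
    return identity + rng.sample(pool, k)

cols = random_parity_check(100, 85, seed=1)   # the paper's example size
print(len(cols), len(set(cols)))              # 100 columns, all distinct
```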
III. EQUIVOCATION ANALYSIS
In the syndrome coding scheme, the secrecy of a specific code
is measured by the eavesdropper decoder output equivocation:

H(M(i)|M̂(i)) = H(M(i)) − H(M̂(i)) + H(M̂(i)|M(i))
             = H(M(i)) − H(M(i) + Se(i)) + H(M(i) + Se(i)|M(i))
             = H(Se(i))    (12)
             = −Σ_{Se(i)} p(Se(i)) · log2 p(Se(i))    (13)
in which there are 2^m different syndromes, Se(i), in total, and
p(Se(i)) is the probability mass function of the syndromes.
The simplifications in (12) are due to M(i) being uniformly
distributed and independent of Se(i). The equivocation is
calculated after deriving the probability mass function of the
syndromes due to errors from the BSC, p(Se(i)), and is a
function of the parity check matrix of the code.
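For small n, the equivocation (13) of one specific code can be computed exactly by enumerating all 2^n error patterns. The sketch below is our own helper, illustrated on the (7,4) Hamming code in packed-integer column form:

```python
import itertools
import math

def equivocation(cols, n, m, alpha):
    """Exact eavesdropper equivocation H(Se) in bits, eqs (12)-(13),
    for a code given by its parity check columns as packed integers,
    enumerating all 2^n BSC(alpha) error patterns."""
    pmf = [0.0] * (1 << m)              # p(Se) over the 2^m syndromes
    for e in itertools.product([0, 1], repeat=n):
        s = 0
        for bit, col in zip(e, cols):   # syndrome Se = e x H^T
            if bit:
                s ^= col
        w = sum(e)                      # weight of the error pattern
        pmf[s] += alpha**w * (1 - alpha)**(n - w)
    assert abs(sum(pmf) - 1.0) < 1e-9
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# (7,4) Hamming code, columns as packed integers: identity 1,2,4 then 3,5,6,7
print(equivocation([1, 2, 4, 3, 5, 6, 7], 7, 3, 0.05))
```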
To evaluate the average equivocation of the random linear
(n, k) codes, a direct method is to use Monte Carlo analysis
to generate a large number of (n, k) codes randomly, calculate
the equivocation for each code and average the evaluated
equivocation values. Monte Carlo analysis can also provide
statistics of random codes to help in the representation of the
random codes ensemble by a putative (n, k) code model, which
can be used to evaluate the mean of the equivocation for (n, k) random codes.
A. Using Monte Carlo analysis to derive statistics of random
codes
Monte Carlo analysis measures the average equivocation
of a large number, |C|, of randomly chosen (n, k) codes as
follows:
E(H) = (1/|C|) Σ_j Hj(M(i)|M̂(i))    (14)

     = (1/|C|) Σ_j ( −Σ_{Se(i)} pj(Se(i)) · log2 pj(Se(i)) )    (15)

in which |C| is the number of codes and pj(Se(i)) is the
probability of syndrome Se(i) for the j-th code. The
Monte Carlo analysis also provides statistics of the distribution
of syndrome probabilities. Expanding (15):

E(H) = −|Se(i)| × (1/(|Se(i)| × |C|)) Σ_{Se(i)} Σ_j pj(Se(i)) · log2 pj(Se(i))    (16)

     = −|Se(i)| × Σ_l p(p(Se(i))(l)) × p(Se(i))(l) · log2 p(Se(i))(l)    (17)

where l indexes the distinct values of p(Se(i)), with
Σ_l p(p(Se(i))(l)) ≈ 1. To go from (16) to (17), we collect all
|Se(i)| × |C| = 2^m × |C| values of pj(Se(i)), over all 2^m syndromes
of all |C| random codes, to obtain the probability mass function
(pmf), p(p(Se(i))), of the syndrome probability p(Se(i)).
The Monte Carlo analysis works well for short random
codes, but for random codes with a large value of m (m ≥ 30)
this method becomes impractical because of the high computational
complexity. However, (17) gives us an insight into how to evaluate
the average equivocation of (n, k) random codes based on the
probability mass function of p(Se(i)).
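The Monte Carlo procedure of (14)-(15) can be sketched directly. The helper names below are our own, and the parameters are deliberately tiny (far smaller than the paper's examples) so that the 2^n enumeration stays cheap:

```python
import itertools
import math
import random

def syndrome_entropy(cols, n, m, alpha):
    """Equivocation H(Se) in bits for one code, specified by its
    parity check columns as packed integers, by enumerating all
    2^n BSC error patterns (feasible only for small n)."""
    pmf = [0.0] * (1 << m)
    for e in itertools.product([0, 1], repeat=n):
        s = 0
        for bit, col in zip(e, cols):
            if bit:
                s ^= col                # syndrome Se = e x H^T
        w = sum(e)
        pmf[s] += alpha**w * (1 - alpha)**(n - w)
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def monte_carlo_avg(n, k, alpha, trials, seed=0):
    """Average equivocation over `trials` random (n, k) codes, eq. (14)."""
    m = n - k
    rng = random.Random(seed)
    # distinct non-zero, non-power-of-two columns give minimum distance >= 3
    pool = [b for b in range(3, 1 << m) if b & (b - 1)]
    total = 0.0
    for _ in range(trials):
        cols = [1 << j for j in range(m)] + rng.sample(pool, k)
        total += syndrome_entropy(cols, n, m, alpha)
    return total / trials

# Toy parameters keep the 2^n enumeration cheap: n = 10, k = 5, m = 5.
print(monte_carlo_avg(10, 5, 0.05, trials=50))
```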
B. Theoretical analysis
The approach is to derive a putative (n, k) code which
has the same distribution of syndrome probabilities as the
ensemble of all (n, k) random codes from the Monte Carlo
analysis. It is shown below that the measured distribution
of syndrome probabilities may be modelled as a number of
Poisson distributions which may be used in turn to derive the
putative code. On this basis the putative code for other code
lengths n, and values of k, may be derived and used to predict
the average equivocation of randomly chosen (n, k) codes. The
equivocation of the (n, k) putative code is:

E(H) = −Σ_{Se(i)} p(Se(i)) · log2 p(Se(i))    (18)

     = −|Se(i)| × Σ_l p(p(Se(i))(l)) × p(Se(i))(l) · log2 p(Se(i))(l)    (19)

in which l indexes the distinct values of p(Se(i)), with Σ_l p(p(Se(i))(l)) ≈ 1.
For a binary symmetric channel with an error probability α,
the 2^m syndromes, Se(i), are contributed by the 2^n error patterns,
e(i), each occurring with probability

p(e(i)) = α^w(e(i)) × (1 − α)^(n−w(e(i)))    (20)

in which w(e(i)) is the weight of e(i), and mapping to syndromes
according to:

Se(i) = e(i) × H^T    (21)
We divide those 2^n error patterns into error events by weight:
the 0-error event, · · · , i-error events, · · · , n-error events,
with 0 ≤ i = w(e(i)) ≤ n. For each weight, all the error patterns
of that weight have the same probability, given by (20).
Each of the 2^n error patterns has a syndrome value determined
by the parity check matrix of the code. As there
are only 2^m distinct syndromes and the code is linear, each
specific syndrome is produced by 2^(n−m) distinct error patterns.
Accordingly, the probability of each syndrome is the combined
probability of those 2^(n−m) distinct error patterns.
The probability of an error pattern decreases geometrically with its
weight (for α < 0.5). The zero syndrome, which is contributed by the 0-error
event and 2-to-n error events, and the single-error syndromes,
which are contributed by 1-to-n error events, usually have
much higher probabilities than the other syndromes, which are
only contributed by 2-to-n error events. For most choices
of code parameters n and k, the 2^m syndromes can be divided
into 3 groups:
1) The syndromes contributed only by 2-to-n error events are defined as normal syndromes, Se. There are 2^m − C(n, 1) − 1 normal syndromes, and we suppose all of them have the same probability mass function (pmf) of p(Se), p(p(Se)).
2) The syndromes contributed by 1-to-n error events are denoted by Se^1. There are C(n, 1) such syndromes, with the same pmf of p(Se^1), p(p(Se^1)). Since the minimum Hamming distance of the random code is greater than or equal to 3, the C(n, 1) 1-error events map into distinct syndromes, Se^1, and contribute an additional α × (1 − α)^(n−1) probability to each such syndrome. Therefore,

p(Se^1)(l) = p(Se)(l) + α × (1 − α)^(n−1)    (22)
p(p(Se^1)(l)) = p(p(Se)(l))    (23)

3) The zero syndrome, Se^0, is contributed by the 0-error event and 2-to-n error events. The 0-error event contributes an additional (1 − α)^n to p(Se^0), so

p(Se^0)(l) = p(Se)(l) + (1 − α)^n    (24)
p(p(Se^0)(l)) = p(p(Se)(l))    (25)
Now let us analyse the pmf of p(Se) for the normal syndromes.
Since each code we consider has minimum Hamming distance
of at least 3, and there are a large number of error patterns
of the same weight, greater than or equal to 3, in comparison with
the number of syndromes, the probability of an error pattern
mapping into any one of the 2^m syndromes is uniform.
The probability of a normal syndrome is contributed by
all i-error events and is expressed as follows:

p(Se) = Σ_i p(Se)_i    (26)

in which 2 ≤ i ≤ n, so the pmf of p(Se) can be calculated as
follows:

p(p(Se)) = p(Σ_i p(Se)_i)
         = p(p(Se)_2) ∗ · · · ∗ p(p(Se)_i) ∗ · · · ∗ p(p(Se)_n)    (27)

in which p(Se)_i denotes the additional probability contributed
by the corresponding i-error events, x ∗ y denotes the convolution
of x and y, and 2 ≤ i ≤ n. Next we analyse how to
calculate p(Se)_i.

There are C(n, i) error patterns that each constitute an i-error event.
Each of these maps into one syndrome out of the 2^m syndromes.
The average number of occurrences of i-error events
mapping into each syndrome is λ = C(n, i)/2^m.
As the code is linear, there are 2^k error patterns that produce
the same syndrome. All error events are binomially distributed,
and there are so many events that the Poisson distribution is
applicable to describe the number of times, l, that a particular
syndrome arises from an i-error event:

p(l) = (λ^l · e^(−λ)) / l!    (28)

in which e is the base of the natural logarithm (e ≈ 2.71828)
and l! is the factorial of l.

Each time an i-error event maps into a syndrome, it adds a
probability of α^i × (1 − α)^(n−i). Therefore
the probability mass function (pmf), p(p(Se)_i), of the additional
probability, p(Se)_i, of a syndrome from i-error events is as
follows:

p(p(Se)_i(l)) = p(l) = (λ^l · e^(−λ)) / l!    (29)

in which

p(Se)_i(l) = l · α^i × (1 − α)^(n−i)    (30)
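The occurrence model of (28)-(29) is straightforward to evaluate. The sketch below (the helper name is ours) returns λ = C(n, i)/2^m and the truncated Poisson pmf of the occurrence count l:

```python
import math

def occurrence_pmf(n, m, i, lmax):
    """Poisson model of eqs (28)-(29): the number of i-error events
    mapping into a given syndrome has mean lam = C(n, i) / 2^m; returns
    lam and the pmf p(l) for l = 0..lmax (truncated)."""
    lam = math.comb(n, i) / 2**m
    pmf = [lam**l * math.exp(-lam) / math.factorial(l)
           for l in range(lmax + 1)]
    return lam, pmf

# For the paper's (100, 85) example with i = 2: lam = C(100, 2) / 2^15
lam, pmf = occurrence_pmf(100, 15, 2, lmax=10)
print(lam, sum(pmf))
```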
When we have the pmf of p(Se)_i for all i-error events,
these pmfs, being independent, can be convolved together to
obtain the pmf of p(Se) using (27). Applying (26) and (27)
in (22), (23), (24) and (25), and then using the results in (19):

E(H) = −Σ_l p(p(Se^0)(l)) × p(Se^0)(l) · log2 p(Se^0)(l)
       − C(n, 1) · Σ_l p(p(Se^1)(l)) × p(Se^1)(l) · log2 p(Se^1)(l)
       − (2^m − C(n, 1) − 1) · Σ_l p(p(Se)(l)) × p(Se)(l) · log2 p(Se)(l)
C. Example
Considering random codes with parameters (100, 85) and
α = 0.05, we can compare the theoretical pmf of p(Se(i))
with the results obtained from Monte Carlo analysis. For the
(100, 85) random codes, the 2^15 syndromes are divided into
3 groups as follows:
1) There are 2^15 − 101 normal syndromes, with the same pmf of p(Se), p(p(Se)).
2) There are 100 syndromes Se^1, with the same pmf given by p(Se^1)(l) = p(Se)(l) + 0.00031161 and p(p(Se^1)(l)) = p(p(Se)(l)).
3) The zero syndrome has p(Se^0)(l) = p(Se)(l) + 0.0059205 and p(p(Se^0)(l)) = p(p(Se)(l)).
Table I presents statistics of the different error events
for (100, 85) random codes when α = 0.05. It shows that
the error events of weight 0 to 14 have a cumulative probability
of 0.999864, so error events of weight more than 14 make
a negligible contribution to the pmf.
Table I. ERROR EVENT STATISTICS (α = 0.05)

 i   C(n,i)        p(e)          C(n,i)·p(e)/2^m   C(n,i)×p(e)   cumulative
 0   1             5.92053E-3    1.8068E-7         0.005921      0.005921
 1   100           3.11607E-4    9.5095E-7         0.031161      0.037081
 2   4950          1.64004E-5    2.4775E-6         0.081182      0.118263
 3   161700        8.63177E-7    4.2595E-6         0.139576      0.257839
 4   3921225       4.54304E-8    5.4365E-6         0.178143      0.435981
 5   75287520      2.39107E-9    5.4937E-6         0.180018      0.615999
 6   1192052400    1.25846E-10   4.5781E-6         0.150015      0.766014
 7   16007560800   6.62300E-12   3.2356E-6         0.106026      0.872040
 8   1.86088E+11   3.49000E-13   1.9797E-6         0.064871      0.936910
 9   1.90223E+12   1.80000E-14   1.0651E-6         0.034901      0.971812
10   1.73103E+13   9.65661E-16   5.1013E-7         0.016716      0.988528
11   1.41629E+14   5.08242E-17   2.1967E-7         0.007198      0.995726
12   1.05042E+15   2.67496E-18   8.5749E-8         0.002810      0.998536
13   7.11054E+15   1.40788E-19   3.0550E-8         0.001001      0.999537
14   4.41869E+16   7.40986E-21   9.9920E-9         0.000327      0.999864
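The cumulative column of Table I can be cross-checked directly, since the i-error events follow a binomial distribution:

```python
import math

# Cross-check of Table I: probability that a BSC(0.05) error pattern on
# n = 100 bits has weight at most 14 (the table's cumulative column).
n, alpha = 100, 0.05
cum = sum(math.comb(n, i) * alpha**i * (1 - alpha)**(n - i)
          for i in range(15))
print(round(cum, 6))  # close to the table's final cumulative value 0.999864
```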
For i-error events, the Poisson distribution has a mean of
λ = C(100, i)/2^15. Based on the above analysis, the pmfs of
p(Se) for the normal syndromes and of p(Se^1) are obtained.
We use the Kolmogorov-Smirnov test [10] to measure the
difference between the analysis and the Monte Carlo results
for both the normal syndromes and Se^1. For the cumulative
distribution of p(Se^1), Dn = 0.023036, and for the cumulative
distribution of p(Se), Dn = 0.016887, where Dn is the maximum
difference between the theoretical result and the Monte Carlo
result; these small values vindicate the theoretical approach.
Calculating the mean of the equivocation rate based on the
theoretical pmf of p(Se(i)) produces E(Re) = E(H)/m = 0.98862.
The average equivocation rate from Monte Carlo
analysis is 0.99048. The difference between the theoretical re-
sult and the simulation result is 0.00186, caused by neglecting
the error events with i > 14.
IV. SIMULATION RESULTS
Figure 1 shows the average equivocation rate and Reave ± 3σ
vs. code length for random codes with m = 20 (99% of
the codes have an equivocation rate between Reave − 3σ
and Reave + 3σ), in which σ is the standard deviation of
the equivocation rate. It is shown that the longer the code
the higher the average equivocation. Figure 2 shows that
as the code length increases, the standard deviation of the
equivocation rate decreases. For codes longer than 150 bits or
so any randomly chosen code achieves almost perfect secrecy.
Figure 1. The average equivocation rate vs. the code length for random codes (m = 20, sample number = 10000, α = 0.05)
Figure 2. The standard deviation of the average equivocation rate vs. the code length for random codes (m = 20)
Figure 3 plots the average equivocation vs. the information
rate for random codes with different values of m. The differ-
ences between theoretical analysis and Monte Carlo analysis
for random codes with m = 20 are all less than 0.3%. For
random codes with m ≥ 30, Monte Carlo analysis requires
powerful computer resources and takes a long time, but the
codes are still amenable to theoretical analysis. It may be
observed that good results may be obtained with randomly
chosen codes and that it is not necessary to use a large number
of parity bits or excessive code length to attain almost perfect
secrecy.
Figure 3. The average equivocation rate vs. the information rate for random codes with different values of m
V. CONCLUSIONS
In this paper, we analysed the security performance of (n, k) random linear binary codes in Wyner's syndrome coding
scheme from the viewpoint of information theoretical security
as a function of code length and code rate. An important
contribution of this paper is the proposal of a theoretical analysis
based on a putative (n, k) code, having the same distribution of
syndrome probabilities as the ensemble of all (n, k) random
codes with the same parameters, to calculate the average
equivocation of (n, k) random codes, and the comparison with
results obtained by Monte Carlo analysis. Specific examples
were given of results to show the accuracy of the theoretical
analysis. It was shown that random codes with large values of
m, the number of parity bits, may be analysed theoretically
even though Monte Carlo analysis is impractical. Possibly the
most important conclusion is that almost perfect secrecy may
be obtained for syndrome coding using a randomly chosen
code of length greater than 150 bits and reasonable code rate.
REFERENCES
[1] C. E. Shannon, 'A mathematical theory of communication', Bell Syst. Tech. J., 1948, Vol. 27, pp. 379-423, 623-656.
[2] A. D. Wyner, 'The wire-tap channel', Bell Syst. Tech. J., 1975, Vol. 54, pp. 1355-1367.
[3] A. Thangaraj, S. Dihidar, A. Calderbank, S. McLaughlin, and J. M. Merolla, 'Applications of LDPC codes to the wiretap channel', IEEE Trans. Inform. Theory, August 2007, Vol. 53, pp. 2933-2945.
[4] D. Liu, S. Lin, H. V. Poor, and P. Spasojevic, 'Secure nested codes for type II wire-tap channels', IEEE Information Theory Workshop, September 2007, Lake Tahoe, CA, USA, pp. 337-342.
[5] L. H. Ozarow and A. D. Wyner, 'Wire-tap channel II', Bell Syst. Tech. J., 1984, Vol. 63, No. 10, pp. 2135-2157.
[6] I. Csiszar and J. Korner, 'Broadcast channels with confidential messages', IEEE Trans. Inform. Theory, May 1978, Vol. 24, pp. 339-348.
[7] G. Cohen and G. Zemor, 'Syndrome-coding for the wiretap channel revisited', IEEE Information Theory Workshop (ITW 2006), 2006, Chengdu, China, pp. 33-36.
[8] S. Lin and D. J. Costello, 'Error control coding (second edition)', Pearson Education, 2004, pp. 100-231.
[9] F. J. MacWilliams and N. J. A. Sloane, 'The theory of error-correcting codes', Bell Laboratories, 1981.
[10] A. Justel, D. Pena and R. Zamar, 'A multivariate Kolmogorov-Smirnov test of goodness of fit', Statistics and Probability Letters, 1997, Vol. 35, pp. 251-259.