2014 21st International Conference on Telecommunications (ICT), Lisbon, Portugal
978-1-4799-5141-3/14/$31.00 ©2014 IEEE
The average equivocation of random linear binary
codes in syndrome coding
Ke Zhang
Instituto de Telecomunicacoes
Dep. de Ciencia de Computadores
Universidade do Porto
{kezhang}@dcc.fc.up.pt
Martin Tomlinson
University of Plymouth
United Kingdom
{M.Tomlinson}@plymouth.ac.uk
Mohammed Zaki Ahmed
University of Plymouth
United Kingdom
{M.Ahmed}@plymouth.ac.uk
Abstract—This paper studies the security performance of random codes in the syndrome coding scheme. We propose a theoretical analysis that uses a putative (n, k) code, having the same distribution of syndrome probabilities as the ensemble of all (n, k) random codes, to calculate the average equivocation of random codes, and we compare the theoretical results with simulation results generated by Monte Carlo analysis, which shows that the theoretical method is precise. Moreover, the analysis works well for long codes having large values of n − k, which are not amenable to Monte Carlo analysis. We present results showing that the longer the code, the higher the average equivocation and the lower the standard deviation of the equivocation. For code lengths in excess of 150 bits or so, almost any random code will produce high levels of secrecy.
I. INTRODUCTION
The performance of random codes is one of the important
topics in information theory following Shannon’s random code
ensemble analysis in 1948. He showed that in an ensemble of
random codes there must be at least one code that approaches
channel capacity as the code length approaches infinity [1].
Since then, most of the research
on random codes has focused on error correction coding.
Besides transmission reliability, physical layer security is
also very important in modern communications. The wiretap
model was first introduced by Wyner in 1975 [2], in which a
transmitter, Alice, wishes to send confidential information to
a legitimate receiver, Bob, in the presence of an eavesdropper,
Eve. There has been considerable research aimed at the design
of codes which achieve secrecy capacity for several specific
wiretap channels based on the structure of Wyner’s syndrome
coding scheme [3], [4]. As in the original paper [5] the main
channel is error-free and the eavesdropper channel is a binary
symmetric channel.
The security performance of random codes in the syndrome
coding scheme was studied by Csiszar and Korner for the
broadcast channel [6]. Cohen and Zemor showed that a random
choice of code in the syndrome coding scheme suffices to
guarantee secure communications by showing that the most
likely syndrome does not have too high a probability of
occurrence [7]. To study the security performance of random codes
more completely, it is important to determine the secrecy level
achieved by random codes for specific values of code length
and information rate. This can be measured by the information
theoretic secrecy, the average equivocation achieved by the
random code ensemble. To date this has not been done and
giving results for this case is the main aim of this paper.
II. SYNDROME CODING SCHEME
Wyner introduced the wiretap channel in [2], where the main
channel is error-free and the eavesdropper channel is binary
symmetric with an error probability α. The secrecy capacity
of this wiretap channel [2] is:
Cs = 1 − CBSC(α) = −α · log2 α − (1 − α) · log2(1 − α)    (1)
which is the highest transmission rate that can be achieved
whilst keeping communications secure from eavesdropping for
this channel.
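For concreteness, equation (1) can be evaluated numerically. The sketch below (the function name is ours, not the paper's) computes Cs, the binary entropy of α, for the BSC crossover probability used later in the paper's example:

```python
import math

def secrecy_capacity(alpha):
    """Secrecy capacity of the wiretap channel of eq. (1): an error-free
    main channel and a BSC(alpha) eavesdropper channel give
    Cs = 1 - C_BSC(alpha) = h(alpha), the binary entropy of alpha."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

print(secrecy_capacity(0.05))  # alpha used in the paper's later example
```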
In this model, Alice wishes to convey a sequence of
independent and uniformly distributed m-bit messages to the
legitimate receiver, M(1), . . . , M(p), where p is the number
of blocks; this sequence is encoded into a sequence of n-bit codewords
VT (1), . . . ,VT (p). Bob observes the sequence of n-bit words
VM (1), . . . ,VM (p) where
VM (i) = VT (i), i = 1, . . . , p (2)
and Eve observes the sequence of n-bit words
VE(1), . . . ,VE(p) where
VE(i) = VT (i) + e(i), i = 1, . . . , p (3)
where e(i) represents an n-bit error vector arising from the
binary symmetric channel.
The syndrome coding scheme uses an (n, k, 2t + 1) linear
block code capable of correcting t errors where m = n−k. It is
a property of any linear block code that there exist 2^m distinct
n-bit minimum weight error patterns, each of which
produces a distinct syndrome out of the total 2^m syndromes. These
minimum weight error patterns are the coset leaders of the
code and may be represented in a table of 2^m syndromes, and
associated minimum weight error patterns. For perfect codes,
the error patterns in the syndrome table are all of the n-bit
binary error vectors with a weight w ≤ t. For non-perfect
codes, some of the error patterns in the table are n-bit binary
vectors with a weight w ≤ t and the remaining error patterns
are n-bit binary vectors with a weight w > t. Regardless of
whether or not the code is perfect, in syndrome coding all
2^m syndromes are used to send messages. In the traditional
syndrome coding, it is necessary to store an error-pattern-to-syndrome
look-up table, which can be accessed by Alice, Bob
and Eve. However, it is shown below that such a look up
table is unnecessary and the parity check matrix of the code
is sufficient.
The encoding and decoding processes are as follows.

1) Encoder: Alice equates the m-bit messages to m-bit syndromes, ST(i) = M(i), and produces n-bit vectors VT(i) from each m-bit message M(i) at time i such that VT(i) × H^T = M(i), in three steps:
• Alice generates a random n-bit codeword CT(i) from a random, uniformly distributed k-bit vector DR(i):

CT(i) = DR(i) × G    (4)

• Alice forms an n-bit zero-padded message vector, V(i), consisting of the m-bit message M(i) followed by k 0's.
• Alice then generates the transmitted n-bit vector VT(i) by adding the n-bit codeword CT(i) to the n-bit zero-padded message vector:

VT(i) = CT(i) + V(i)    (5)

In this way Alice produces n-bit vectors with the property that VT(i) × H^T = M(i).
Proof: The syndrome of VT(i), ST(i), satisfies

ST(i) = VT(i) × H^T
      = CT(i) × H^T + V(i) × H^T
      = 0 + M(i)    (6)

V(i) × H^T produces M(i), which is due to the structure of V(i) and H.
The information rate of the syndrome coding scheme is R = m/n.

2) Legitimate decoder: Bob uses the parity check matrix of the code to determine the original messages, by computing the syndrome associated with each received vector:

SM(i) = VM(i) × H^T = VT(i) × H^T = ST(i)    (7)

The original message is then immediately found as:

M(i) = SM(i) = ST(i)    (8)
3) Eavesdropper decoder: Eve – like Bob – uses the parity check matrix of the code to estimate each original message by computing the syndrome associated with each of her received vectors:

SE(i) = VE(i) × H^T = (VT(i) + e(i)) × H^T = ST(i) + Se(i)    (9)

Her estimate of the message is then:

M̂(i) = SE(i) = M(i) + Se(i)    (10)
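The three encoder steps and both decoders can be exercised end to end. The sketch below is our own illustration, using the standard (7,4) Hamming code (n = 7, k = 4, m = 3) rather than a random code; variable names follow the paper's notation:

```python
import numpy as np

n, k = 7, 4
m = n - k

# Systematic parity check matrix H = [I_m | A] and matching generator
# G = [A^T | I_k], so that G @ H.T = 0 (mod 2).
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]])
H = np.hstack([np.eye(m, dtype=int), A])
G = np.hstack([A.T, np.eye(k, dtype=int)])
assert not (G @ H.T % 2).any()

rng = np.random.default_rng(0)
M = rng.integers(0, 2, m)                  # m-bit message = intended syndrome
D = rng.integers(0, 2, k)                  # random k-bit vector DR(i)
C = D @ G % 2                              # random codeword CT(i), eq. (4)
V = np.concatenate([M, np.zeros(k, int)])  # zero-padded message vector V(i)
VT = (C + V) % 2                           # transmitted vector, eq. (5)

# Bob (error-free channel): the syndrome recovers the message, eqs (7)-(8)
assert np.array_equal(VT @ H.T % 2, M)

# Eve (BSC): her syndrome is the message offset by Se(i), eqs (9)-(10)
e = rng.binomial(1, 0.05, n)
assert np.array_equal((VT + e) @ H.T % 2, (M + e @ H.T) % 2)
print("Bob recovers M; Eve observes M + Se")
```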
A. Random linear codes
We use (n, k) random linear codes in the syndrome coding
scheme. Each code may be defined by its m×n parity check
matrix H. Any column of a parity check matrix, H, having
m rows may be represented by an integer in the range 0 to
2^(n−k) − 1. For the systematic format of the m × n parity check
matrix H, the first m columns of H form an m × m identity matrix:
H =
⎛ 1 0 · · · 0  h_m,0     · · ·  h_(n−1),0     ⎞
⎜ 0 1 · · · 0  h_m,1     · · ·  h_(n−1),1     ⎟
⎜ · · ·        h_i,j            · · ·         ⎟
⎝ 0 0 · · · 1  h_m,(m−1) · · ·  h_(n−1),(m−1) ⎠

in which 0 ≤ j ≤ m − 1, m ≤ i ≤ n − 1, and h_i,j takes a value
of 0 or 1. Each column may be represented as a packed integer
defined as b_i = Σ_{j=0}^{m−1} h_i,j · 2^j. Then the systematic format
of H for the (n, k) random linear code can be completely
represented by the following integer sequence:

[1, 2, 4, · · · , 2^(m−1), b_m, · · · , b_(n−1)]    (11)
in which the first m integers denote the identity matrix and the
integers bm, · · · , bn−1 are generated randomly between 3 and
2m − 1 with the constraint that there are no repeated integers
in the sequence 1, 2, 4, · · · , 2m−1, bm, · · · , bn−1. This ensures
that the minimum Hamming distance of each randomly gen-
erated code is at least 3.
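A minimal sketch of this construction (the function name is ours) draws the k random columns as packed integers, exactly as in sequence (11):

```python
import random

def random_parity_check(n, k, seed=None):
    """Draw a random (n, k) parity check matrix in the packed-integer
    form of sequence (11): the first m = n - k columns are the identity
    columns 1, 2, 4, ..., 2^(m-1); the remaining k columns are distinct
    integers from [3, 2^m - 1] that are not powers of two, so all
    columns are distinct and non-zero and the minimum Hamming distance
    of the code is at least 3."""
    m = n - k
    rng = random.Random(seed)
    identity = [1 << j for j in range(m)]
    pool = [b for b in range(3, 1 << m) if b & (b - 1)]  # skip powers of 2
    return identity + rng.sample(pool, k)

cols = random_parity_check(100, 85, seed=1)   # the paper's example size
print(len(cols), len(set(cols)))              # 100 columns, all distinct
```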
III. EQUIVOCATION ANALYSIS
In the syndrome coding scheme, the secrecy of a specific code
is measured by the eavesdropper decoder output equivocation:

H(M(i)|M̂(i)) = H(M(i)) − H(M̂(i)) + H(M̂(i)|M(i))
             = H(M(i)) − H(M(i) + Se(i)) + H(M(i) + Se(i)|M(i))
             = H(Se(i))    (12)
             = −Σ_{Se(i)} p(Se(i)) · log2 p(Se(i))    (13)
in which there are 2^m different syndromes, Se(i), in total, and
p(Se(i)) is the probability mass function of the syndromes.
The simplifications in (12) are due to M(i) being uniformly
distributed and independent of Se(i). The equivocation is
calculated after deriving the probability mass function of the
syndromes due to errors from the BSC, p(Se(i)), and is a
function of the parity check matrix of the code.
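For small n, the equivocation (13) of one specific code can be computed exactly by enumerating all 2^n error patterns. The sketch below is our own helper, illustrated on the (7,4) Hamming code in packed-integer column form:

```python
import itertools
import math

def equivocation(cols, n, m, alpha):
    """Exact eavesdropper equivocation H(Se) in bits, eqs (12)-(13),
    for a code given by its parity check columns as packed integers,
    enumerating all 2^n BSC(alpha) error patterns."""
    pmf = [0.0] * (1 << m)              # p(Se) over the 2^m syndromes
    for e in itertools.product([0, 1], repeat=n):
        s = 0
        for bit, col in zip(e, cols):   # syndrome Se = e x H^T
            if bit:
                s ^= col
        w = sum(e)                      # weight of the error pattern
        pmf[s] += alpha**w * (1 - alpha)**(n - w)
    assert abs(sum(pmf) - 1.0) < 1e-9
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# (7,4) Hamming code, columns as packed integers: identity 1,2,4 then 3,5,6,7
print(equivocation([1, 2, 4, 3, 5, 6, 7], 7, 3, 0.05))
```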
To evaluate the average equivocation of the random linear
(n, k) codes, a direct method is to use Monte Carlo analysis
to generate a large number of (n, k) codes randomly, calculate
the equivocation for each code and average the evaluated
equivocation values. Monte Carlo analysis can also provide
statistics of random codes to help in the representation of the
random codes ensemble by a putative (n, k) code model, which
can be used to evaluate the mean of the equivocation for (n, k) random codes.
A. Using Monte Carlo analysis to derive statistics of random
codes
Monte Carlo analysis measures the average equivocation
of a large number, |C|, of randomly chosen (n, k) codes as
follows:
E(H) = (1/|C|) Σ_j Hj(M(i)|M̂(i))    (14)

     = (1/|C|) Σ_j ( −Σ_{Se(i)} pj(Se(i)) · log2 pj(Se(i)) )    (15)

in which |C| is the number of codes and pj(Se(i)) is the
probability of syndrome Se(i) for the j-th code. The
Monte Carlo analysis also provides statistics of the distribution
of syndrome probabilities. Expanding (15):

E(H) = −|Se(i)| × (1/(|Se(i)| × |C|)) Σ_{Se(i)} Σ_j pj(Se(i)) · log2 pj(Se(i))    (16)

     = −|Se(i)| × Σ_l p(p(Se(i))(l)) × p(Se(i))(l) · log2 p(Se(i))(l)    (17)

where l indexes the distinct values of p(Se(i)), with
Σ_l p(p(Se(i))(l)) ≈ 1. To go from (16) to (17), we collect all
|Se(i)| × |C| = 2^m × |C| values of pj(Se(i)), over all 2^m syndromes
of all |C| random codes, to obtain the probability mass function
(pmf), p(p(Se(i))), of the syndrome probability p(Se(i)).
The Monte Carlo analysis works well for short random
codes, but for random codes with a large value of m (m ≥ 30)
this method becomes impractical because of the high computational
complexity. However, (17) gives us an insight into how to evaluate
the average equivocation of (n, k) random codes based on the
probability mass function of p(Se(i)).
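The Monte Carlo procedure of (14)-(15) can be sketched directly. The helper names below are our own, and the parameters are deliberately tiny (far smaller than the paper's examples) so that the 2^n enumeration stays cheap:

```python
import itertools
import math
import random

def syndrome_entropy(cols, n, m, alpha):
    """Equivocation H(Se) in bits for one code, specified by its
    parity check columns as packed integers, by enumerating all
    2^n BSC error patterns (feasible only for small n)."""
    pmf = [0.0] * (1 << m)
    for e in itertools.product([0, 1], repeat=n):
        s = 0
        for bit, col in zip(e, cols):
            if bit:
                s ^= col                # syndrome Se = e x H^T
        w = sum(e)
        pmf[s] += alpha**w * (1 - alpha)**(n - w)
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def monte_carlo_avg(n, k, alpha, trials, seed=0):
    """Average equivocation over `trials` random (n, k) codes, eq. (14)."""
    m = n - k
    rng = random.Random(seed)
    # distinct non-zero, non-power-of-two columns give minimum distance >= 3
    pool = [b for b in range(3, 1 << m) if b & (b - 1)]
    total = 0.0
    for _ in range(trials):
        cols = [1 << j for j in range(m)] + rng.sample(pool, k)
        total += syndrome_entropy(cols, n, m, alpha)
    return total / trials

# Toy parameters keep the 2^n enumeration cheap: n = 10, k = 5, m = 5.
print(monte_carlo_avg(10, 5, 0.05, trials=50))
```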
B. Theoretical analysis
The approach is to derive a putative (n, k) code which
has the same distribution of syndrome probabilities as the
ensemble of all (n, k) random codes from the Monte Carlo
analysis. It is shown below that the measured distribution
of syndrome probabilities may be modelled as a number of
Poisson distributions which may be used in turn to derive the
putative code. On this basis the putative code for other code
lengths n, and values of k, may be derived and used to predict
the average equivocation of randomly chosen (n, k) codes. The
equivocation of the (n, k) putative code is:

E(H) = −Σ_{Se(i)} p(Se(i)) · log2 p(Se(i))    (18)

     = −|Se(i)| × Σ_l p(p(Se(i))(l)) × p(Se(i))(l) · log2 p(Se(i))(l)    (19)

in which l indexes the distinct values of p(Se(i)), with Σ_l p(p(Se(i))(l)) ≈ 1.
For a binary symmetric channel with an error probability α,
the 2^m syndromes, Se(i), are contributed by the 2^n error patterns,
e(i), each occurring with probability

p(e(i)) = α^w(e(i)) × (1 − α)^(n−w(e(i)))    (20)

in which w(e(i)) is the weight of e(i), and mapping to syndromes
according to:

Se(i) = e(i) × H^T    (21)
We divide those 2^n error patterns into error events by weight:
the 0-error event, · · · , i-error events, · · · , n-error events,
with 0 ≤ i = w(e(i)) ≤ n. For each weight, all the error patterns
of that weight have the same probability, given by (20).
Each of the 2^n error patterns has a syndrome value determined
by the parity check matrix of the code. As there
are only 2^m distinct syndromes and the code is linear, each
specific syndrome is produced by 2^(n−m) distinct error patterns.
Accordingly, the probability of each syndrome is the combined
probability of those 2^(n−m) distinct error patterns.
The probability of an error pattern decreases geometrically with its
weight (for α < 0.5). The zero syndrome, which is contributed by the 0-error
event and 2-to-n error events, and the single-error syndromes,
which are contributed by 1-to-n error events, usually have
much higher probabilities than the other syndromes, which are
only contributed by 2-to-n error events. For most choices
of code parameters n and k, the 2^m syndromes can be divided
into 3 groups:
1) The syndromes contributed only by 2-to-n error events are defined as normal syndromes, Se. There are 2^m − C(n, 1) − 1 normal syndromes, and we suppose all of them have the same probability mass function (pmf) of p(Se), p(p(Se)).
2) The syndromes contributed by 1-to-n error events are denoted by Se^1. There are C(n, 1) such syndromes, with the same pmf of p(Se^1), p(p(Se^1)). Since the minimum Hamming distance of the random code is greater than or equal to 3, the C(n, 1) 1-error events map into distinct syndromes, Se^1, and contribute an additional α × (1 − α)^(n−1) probability to each such syndrome. Therefore,

p(Se^1)(l) = p(Se)(l) + α × (1 − α)^(n−1)    (22)
p(p(Se^1)(l)) = p(p(Se)(l))    (23)

3) The zero syndrome, Se^0, is contributed by the 0-error event and 2-to-n error events. The 0-error event contributes an additional (1 − α)^n to p(Se^0), so

p(Se^0)(l) = p(Se)(l) + (1 − α)^n    (24)
p(p(Se^0)(l)) = p(p(Se)(l))    (25)
Now let us analyse the pmf of p(Se) for the normal syndromes.
Since each code we consider has minimum Hamming distance
of at least 3, and there are a large number of error patterns
of the same weight, greater than or equal to 3, in comparison with
the number of syndromes, the probability of an error pattern
mapping into any one of the 2^m syndromes is uniform.
The probability of a normal syndrome is contributed by
all i-error events and is expressed as follows:

p(Se) = Σ_i p(Se)_i    (26)

in which 2 ≤ i ≤ n, so the pmf of p(Se) can be calculated as
follows:

p(p(Se)) = p(Σ_i p(Se)_i)
         = p(p(Se)_2) ∗ · · · ∗ p(p(Se)_i) ∗ · · · ∗ p(p(Se)_n)    (27)

in which p(Se)_i denotes the additional probability contributed
by the corresponding i-error events, x ∗ y denotes the convolution
of x and y, and 2 ≤ i ≤ n. Next we analyse how to
calculate p(Se)_i.

There are C(n, i) error patterns that each constitute an i-error event.
Each of these maps into one syndrome out of the 2^m syndromes.
The average number of occurrences of i-error events
mapping into each syndrome is λ = C(n, i)/2^m.
As the code is linear, there are 2^k error patterns that produce
the same syndrome. All error events are binomially distributed,
and there are so many events that the Poisson distribution is
applicable to describe the number of times, l, that a particular
syndrome arises from an i-error event:

p(l) = (λ^l · e^(−λ)) / l!    (28)

in which e is the base of the natural logarithm (e ≈ 2.71828)
and l! is the factorial of l.

Each time an i-error event maps into a syndrome, it adds a
probability of α^i × (1 − α)^(n−i). Therefore
the probability mass function (pmf), p(p(Se)_i), of the additional
probability, p(Se)_i, of a syndrome from i-error events is as
follows:

p(p(Se)_i(l)) = p(l) = (λ^l · e^(−λ)) / l!    (29)

in which

p(Se)_i(l) = l · α^i × (1 − α)^(n−i)    (30)
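The occurrence model of (28)-(29) is straightforward to evaluate. The sketch below (the helper name is ours) returns λ = C(n, i)/2^m and the truncated Poisson pmf of the occurrence count l:

```python
import math

def occurrence_pmf(n, m, i, lmax):
    """Poisson model of eqs (28)-(29): the number of i-error events
    mapping into a given syndrome has mean lam = C(n, i) / 2^m; returns
    lam and the pmf p(l) for l = 0..lmax (truncated)."""
    lam = math.comb(n, i) / 2**m
    pmf = [lam**l * math.exp(-lam) / math.factorial(l)
           for l in range(lmax + 1)]
    return lam, pmf

# For the paper's (100, 85) example with i = 2: lam = C(100, 2) / 2^15
lam, pmf = occurrence_pmf(100, 15, 2, lmax=10)
print(lam, sum(pmf))
```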
When we have the pmf of p(Se)_i for all i-error events,
these pmfs, being independent, can be convolved together to
obtain the pmf of p(Se) using (27). Applying (26) and (27)
in (22), (23), (24) and (25), and then using the results in (19):

E(H) = −Σ_l p(p(Se^0)(l)) × p(Se^0)(l) · log2 p(Se^0)(l)
       − C(n, 1) · Σ_l p(p(Se^1)(l)) × p(Se^1)(l) · log2 p(Se^1)(l)
       − (2^m − C(n, 1) − 1) · Σ_l p(p(Se)(l)) × p(Se)(l) · log2 p(Se)(l)
C. Example
Considering random codes with parameters (100, 85) and
α = 0.05, we can compare the theoretical pmf of p(Se(i))
with the results obtained from Monte Carlo analysis. For the
(100, 85) random codes, the 2^15 syndromes are divided into
3 groups as follows:
1) There are 2^15 − 101 normal syndromes, with the same pmf of p(Se), p(p(Se)).
2) There are 100 syndromes Se^1, with the same pmf given by p(Se^1)(l) = p(Se)(l) + 0.00031161 and p(p(Se^1)(l)) = p(p(Se)(l)).
3) The zero syndrome has p(Se^0)(l) = p(Se)(l) + 0.0059205 and p(p(Se^0)(l)) = p(p(Se)(l)).
Table I presents statistics of the different error events
for (100, 85) random codes when α = 0.05. It shows that
the error events of weight 0 to 14 have a cumulative probability
of 0.999864, so error events of weight more than 14 make
a negligible contribution to the pmf.
Table I. ERROR EVENT STATISTICS (α = 0.05)

 i   C(n,i)        p(e)          C(n,i)·p(e)/2^m   C(n,i)×p(e)   cumulative
 0   1             5.92053E-3    1.8068E-7         0.005921      0.005921
 1   100           3.11607E-4    9.5095E-7         0.031161      0.037081
 2   4950          1.64004E-5    2.4775E-6         0.081182      0.118263
 3   161700        8.63177E-7    4.2595E-6         0.139576      0.257839
 4   3921225       4.54304E-8    5.4365E-6         0.178143      0.435981
 5   75287520      2.39107E-9    5.4937E-6         0.180018      0.615999
 6   1192052400    1.25846E-10   4.5781E-6         0.150015      0.766014
 7   16007560800   6.62300E-12   3.2356E-6         0.106026      0.872040
 8   1.86088E+11   3.49000E-13   1.9797E-6         0.064871      0.936910
 9   1.90223E+12   1.80000E-14   1.0651E-6         0.034901      0.971812
10   1.73103E+13   9.65661E-16   5.1013E-7         0.016716      0.988528
11   1.41629E+14   5.08242E-17   2.1967E-7         0.007198      0.995726
12   1.05042E+15   2.67496E-18   8.5749E-8         0.002810      0.998536
13   7.11054E+15   1.40788E-19   3.0550E-8         0.001001      0.999537
14   4.41869E+16   7.40986E-21   9.9920E-9         0.000327      0.999864
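The cumulative column of Table I can be cross-checked directly, since the i-error events follow a binomial distribution:

```python
import math

# Cross-check of Table I: probability that a BSC(0.05) error pattern on
# n = 100 bits has weight at most 14 (the table's cumulative column).
n, alpha = 100, 0.05
cum = sum(math.comb(n, i) * alpha**i * (1 - alpha)**(n - i)
          for i in range(15))
print(round(cum, 6))  # close to the table's final cumulative value 0.999864
```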
For i-error events, the Poisson distribution has a mean of
λ = C(100, i)/2^15. Based on the above analysis, the pmfs of
p(Se) for the normal syndromes and of p(Se^1) are obtained.
We use the Kolmogorov-Smirnov test [10] to measure the
difference between the analysis and the Monte Carlo results
for both the normal syndromes and Se^1. For the cumulative
distribution of p(Se^1), Dn = 0.023036, and for the cumulative
distribution of p(Se), Dn = 0.016887, where Dn is the maximum
difference between the theoretical result and the Monte Carlo
result; these small values vindicate the theoretical approach.
Calculating the mean of the equivocation rate based on the
theoretical pmf of p(Se(i)) produces E(Re) = E(H)/m = 0.98862.
The average equivocation rate from Monte Carlo
analysis is 0.99048. The difference between the theoretical re-
sult and the simulation result is 0.00186, caused by neglecting
the error events with i > 14.
IV. SIMULATION RESULTS
Figure 1 shows the average equivocation rate and Reave ± 3σ
vs. code length for random codes with m = 20 (99% of
the codes have an equivocation rate between Reave − 3σ
and Reave + 3σ), in which σ is the standard deviation of
the equivocation rate. It is shown that the longer the code
the higher the average equivocation. Figure 2 shows that
as the code length increases, the standard deviation of the
equivocation rate decreases. For codes longer than 150 bits or
so any randomly chosen code achieves almost perfect secrecy.
Figure 1. The average equivocation rate vs. the code length for random codes (m = 20, sample number = 10000, α = 0.05)
Figure 2. The standard deviation of the average equivocation rate vs. the code length for random codes (m = 20)
Figure 3 plots the average equivocation vs. the information
rate for random codes with different values of m. The differ-
ences between theoretical analysis and Monte Carlo analysis
for random codes with m = 20 are all less than 0.3%. For
random codes with m ≥ 30, Monte Carlo analysis requires
powerful computer resources and takes a long time, but the
codes are still amenable to theoretical analysis. It may be
observed that good results may be obtained with randomly
chosen codes and that it is not necessary to use a large number
of parity bits or excessive code length to attain almost perfect
secrecy.
Figure 3. The average equivocation rate vs. the information rate for random codes with different values of m
V. CONCLUSIONS
In this paper, we analysed the security performance of (n, k) random linear binary codes in Wyner's syndrome coding
scheme from the viewpoint of information theoretical security
as a function of code length and code rate. An important
contribution of this paper is the proposal of a theoretical analysis
based on a putative (n, k) code, having the same distribution of
syndrome probabilities as the ensemble of all (n, k) random
codes with the same parameters, to calculate the average
equivocation of (n, k) random codes, and the comparison with
results obtained by Monte Carlo analysis. Specific examples
were given of results to show the accuracy of the theoretical
analysis. It was shown that random codes with large values of
m, the number of parity bits, may be analysed theoretically
even though Monte Carlo analysis is impractical. Possibly the
most important conclusion is that almost perfect secrecy may
be obtained for syndrome coding using a randomly chosen
code of length greater than 150 bits and reasonable code rate.
REFERENCES
[1] C. E. Shannon, 'A mathematical theory of communication', Bell Syst. Tech. J., 1948, Vol. 27, pp. 379-423, 623-656.
[2] A. D. Wyner, 'The wire-tap channel', Bell Syst. Tech. J., 1975, Vol. 54, pp. 1355-1367.
[3] A. Thangaraj, S. Dihidar, A. Calderbank, S. McLaughlin, and J. M. Merolla, 'Applications of LDPC codes to the wiretap channel', IEEE Trans. Inform. Theory, August 2007, Vol. 53, pp. 2933-2945.
[4] D. Liu, S. Lin, H. V. Poor, and P. Spasojevic, 'Secure nested codes for type II wire-tap channels', IEEE Information Theory Workshop, September 2007, Lake Tahoe, CA, USA, pp. 337-342.
[5] L. H. Ozarow and A. D. Wyner, 'Wire-tap channel II', Bell Syst. Tech. J., 1984, Vol. 63, No. 10, pp. 2135-2157.
[6] I. Csiszar and J. Korner, 'Broadcast channels with confidential messages', IEEE Trans. Inform. Theory, May 1978, Vol. 24, pp. 339-348.
[7] G. Cohen and G. Zemor, 'Syndrome-coding for the wiretap channel revisited', IEEE Information Theory Workshop (ITW 2006), 2006, Chengdu, China, pp. 33-36.
[8] S. Lin and D. J. Costello, 'Error control coding (second edition)', Pearson Education, 2004, pp. 100-231.
[9] F. J. MacWilliams and N. J. A. Sloane, 'The theory of error-correcting codes', Bell Laboratories, 1981.
[10] A. Justel, D. Pena and R. Zamar, 'A multivariate Kolmogorov-Smirnov test of goodness of fit', Statistics and Probability Letters, 1997, Vol. 35, pp. 251-259.