
Information Theory Project FINAL

(Including the Optional task)

STUDENT ID

• 280668 • 336347 • 339043 • 335869

Contents

1. Overview

2. Methodology

3. Implementation and Evaluation

3.1 Case 1: Main Task
3.2 Case 2: Optional Task

4. Conclusion

5. Reference


1. Overview

This project concerns free information transmission. A coding method has been developed to maximise the rate at which information is transmitted. Part A calls part B; part B is passive. Part A can place a call and interrupt it. Signalling a call takes 1 second.

2. Methodology

The letters of the English alphabet will be coded and transmitted. Letters alone are sufficient to form a message, so only lower-case letters are transmitted; no other characters are used. The relative frequencies with which the letters appear in English are given in Table 1 [1] and plotted in Figure 1. The average word length in English is 5.1 letters [2]. Huffman coding is known to be optimal, so a ternary Huffman coding scheme has been used to determine the codeword for each letter. The transmission durations (Ti) of the three code symbols are:

• 0 – 1 second
• 1 – 2 seconds
• 2 – 3 seconds

Two slightly different schemes are used to calculate the throughput. One considers the interruption of the call to last a time uniformly distributed between n and n + 0.1 s, while the other models the interruption with a normal distribution (see Section 3.2). The next bit starts transmitting immediately after the previous one has been interrupted. A pause therefore means the message has been transmitted completely; this convention is intuitive, and an example is shown later in the report.
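As an illustration of this transmission scheme, the sketch below (written in Python rather than the project's MATLAB) simulates the time needed to send one word under the uniform-interruption case; the codeword table is assumed to be available as a dict, with values taken from Table 1 below.

```python
import random

SYMBOL_TIME = {'0': 1, '1': 2, '2': 3}   # durations of the three code symbols (seconds)
SIGNALLING_TIME = 1.0                    # signalling before every transmitted bit

def simulate_word_time(word, codewords, n=0.0):
    """Simulated time to send `word`: per bit, 1 s signalling, the bit's own
    duration, and an interruption drawn uniformly from [n, n + 0.1] s."""
    total = 0.0
    for letter in word:
        for symbol in codewords[letter]:
            total += SIGNALLING_TIME + SYMBOL_TIME[symbol] + random.uniform(n, n + 0.1)
    return total

# Hypothetical usage with a few codewords from Table 1 below:
codewords = {'y': '1001', 'o': '002', 'u': '122'}
print(simulate_word_time('you', codewords))
```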

Letter  Relative Frequency (%)  Codeword      Letter  Relative Frequency (%)  Codeword
A       8.167                   001           N       6.749                   012
B       1.492                   1200          O       7.507                   002
C       2.782                   121           P       1.929                   1002
D       4.253                   101           Q       0.095                   120111
E       12.702                  11            R       5.987                   022
F       2.228                   0102          S       6.327                   020
G       2.015                   1000          T       9.056                   000
H       6.094                   021           U       2.758                   122
I       6.966                   011           V       0.978                   1202
J       0.153                   12012         W       2.360                   0101
K       0.772                   12010         X       0.150                   120110
L       4.025                   102           Y       1.974                   1001
M       2.406                   0100          Z       0.074                   120112

Table 1: Code-words for the letters, generated from the MATLAB script


In general it can be seen from Table 1 that the higher the relative frequency, the shorter the code-word.

Figure 1: Letter probabilities (x-axis: letters indexed 1-a, 2-b, ..., 26-z; y-axis: probability)

3. Implementation and Evaluation

A MATLAB script was written to perform the Huffman coding and determine the code-word for each letter from the relative frequencies in Table 1.

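As an illustration of how such a script can work, the sketch below builds a ternary Huffman code in Python (the project's actual script was in MATLAB; the function and variable names here are assumptions, not taken from that script).

```python
import heapq
import itertools

def ternary_huffman(freqs):
    """Build a ternary (symbols 0/1/2) Huffman code for a {letter: weight} dict."""
    counter = itertools.count()  # tie-breaker so the heap never has to compare dicts
    heap = [(weight, next(counter), {letter: ''}) for letter, weight in freqs.items()]
    # Pad with zero-weight dummy nodes until the count is odd, so every merge
    # combines exactly three nodes and the ternary tree stays full.
    while len(heap) % 2 == 0:
        heap.append((0.0, next(counter), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, weight = {}, 0.0
        for digit in '012':  # merge the three lightest nodes
            w, _, codes = heapq.heappop(heap)
            weight += w
            for letter, code in codes.items():
                merged[letter] = digit + code
        heapq.heappush(heap, (weight, next(counter), merged))
    return heap[0][2]

# Hypothetical usage with a handful of the Table 1 frequencies:
freqs = {'e': 12.702, 't': 9.056, 'a': 8.167, 'o': 7.507, 'z': 0.074}
print(ternary_huffman(freqs))
```

The exact codewords depend on how ties between equal weights are broken, so a run of this sketch need not reproduce Table 1 symbol for symbol.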


Based on the MATLAB calculations, we have the following:

• Average code length = 3.0549 ternary symbols
• Entropy = 2.6346 (base 3)
• Efficiency = Entropy / Average code length = 86.24%
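The entropy here is measured in base-3 units, so that it is directly comparable with the average code length counted in ternary symbols. A minimal sketch of this calculation, assuming the Table 1 frequencies and codewords are stored in dicts named `freqs` and `codes`:

```python
import math

def code_statistics(freqs, codes):
    """Average code length, base-3 entropy and efficiency of the code."""
    total = sum(freqs.values())
    probs = {letter: f / total for letter, f in freqs.items()}  # normalise the percentages
    avg_len = sum(p * len(codes[letter]) for letter, p in probs.items())
    entropy = -sum(p * math.log(p, 3) for p in probs.values() if p > 0)
    return avg_len, entropy, entropy / avg_len

# With the full 26-letter table this reproduces roughly
# avg_len = 3.05, entropy = 2.63 and efficiency = 86 %.
```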

The weighted bit probabilities are calculated from the values in Table 1 and shown in Table 2.

Bit          0        1        2
Probability  0.4636   0.3398   0.1966

Table 2: Weighted bit probabilities
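These values follow from Table 1 by counting how often each code symbol occurs across all codewords, weighting every occurrence by the letter's relative frequency. A sketch of that calculation (same assumed `freqs` and `codes` dicts as above):

```python
from collections import Counter

def weighted_bit_probs(freqs, codes):
    """Probability of each code symbol, weighted by the letters' frequencies."""
    weights = Counter()
    for letter, freq in freqs.items():
        for symbol in codes[letter]:
            weights[symbol] += freq  # every occurrence counts with the letter's frequency
    total = sum(weights.values())
    return {symbol: w / total for symbol, w in sorted(weights.items())}

# With the full Table 1 this gives {'0': 0.4636, '1': 0.3398, '2': 0.1966},
# i.e. the values of Table 2.
```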

• Time to transmit – Tword
• Word length – W
• Code length – C
• Average word length – Wavg
• Average code length – Cavg
• Weighted probabilities – Pi (i = 0, 1 or 2)
• E[h] = 0.05 (expectation of the uniform interruption over an interval of length 0.1 s)

3.1 Case 1 (Main Task):

(Uniform distribution for the interruption of call)

Tword = (Wavg x Cavg x 1) + Wavg x Cavg (∑PiTi) + (Wavg x Cavg x E[h])

Tword = (5.1 x 3.0549 x 1) + (5.1 x 3.0549)(0.4636 x 1+ 0.3398 x 2 + 0.1966 x 3) + (5.1 x 3.0549)(0.05)

= 43.36 seconds

For example, if there are 10 bits to be transmitted, the signalling time is 10 seconds in total, since there is a hang-up after every bit. The first parenthesis gives the total number of bits transmitted, and the multiplication by 1 accounts for the 1-second signalling per bit. The last parenthesis gives the total time taken by the interruptions of the call; the expectation E[h] is used in the calculation.

If, for example, the word “you” is transmitted, the word length and code lengths are known, but the duration of each interruption is still unknown, so the expectation is used again.

Tword = ((∑C) x 1) +(∑C) (∑PiTi) + (∑C)(E[h])

Tword = (4 +3 + 3) + (4 +3 + 3)(0.4636 x 1+ 0.3398 x 2 + 0.1966 x 3) + (4 +3 + 3)(0.05)

= 27.83 seconds


The word “you” has only 3 letters, which is shorter than the average English word, and its transmission time is correspondingly shorter.
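The Case 1 calculation can be written as a small helper (a sketch; the per-bit cost collects the 1 s signalling, the weighted symbol duration ∑PiTi and the expected interruption E[h] from above):

```python
SIGNALLING = 1.0                                           # seconds per transmitted bit
AVG_BIT_DURATION = 0.4636 * 1 + 0.3398 * 2 + 0.1966 * 3   # sum(Pi * Ti) = 1.733 s
E_H = 0.05                                                 # expected uniform interruption

def t_word(total_bits):
    """Expected Case 1 transmission time for a word of `total_bits` code symbols."""
    return total_bits * (SIGNALLING + AVG_BIT_DURATION + E_H)

print(t_word(4 + 3 + 3))        # "you": about 27.8 s
print(t_word(5.1 * 3.0549))     # average English word: about 43.4 s
```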

3.2 Case 2 (Optional Task):

(Normal distribution for the interruption of call)

Variance = 0.1, standard deviation σ = 0.316, mean μ = E[Normal] = 0.05, xi = delay time of the i-th interruption

When the interruption of the call is normally distributed, the equation is as follows:

Tword for the average case is the same as above, because E[Normal] is also 0.05 seconds.

Using the example of transmitting “he”, there are 5 bits and i = 1 to 5. Assume the delays to be 0.01, 0.02, 0.05, 0.08 and 0.09 seconds.

Tword = ((∑C) x 1) + (∑C)(∑PiTi) + ∑i (1/(σ√2)) e^(−0.5((xi − μ)/2)²)

= (5 x 1) + (5 x 1.733) + 11.18706

= 24.85 seconds
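For reference, the delay term can be evaluated directly. The sketch below follows the expression exactly as written above, with σ = 0.316, μ = 0.05 and the five assumed delays:

```python
import math

SIGMA, MU = 0.316, 0.05
delays = [0.01, 0.02, 0.05, 0.08, 0.09]        # assumed per-bit delays for "he"

delay_term = sum(1 / (SIGMA * math.sqrt(2)) * math.exp(-0.5 * ((x - MU) / 2) ** 2)
                 for x in delays)
print(delay_term)                               # about 11.19
print(5 * 1 + 5 * 1.733 + delay_term)           # about 24.85 seconds
```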

A MATLAB script was also written for the Huffman coding and decoding.
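Since the Huffman code is a prefix code, decoding only requires looking up each accumulated prefix of received symbols. A minimal sketch (again in Python for illustration, with the Table 1 codewords assumed to be available as a dict):

```python
def huffman_decode(stream, codes):
    """Decode a ternary symbol stream using the prefix property of the code."""
    reverse = {codeword: letter for letter, codeword in codes.items()}
    decoded, prefix = [], ''
    for symbol in stream:
        prefix += symbol
        if prefix in reverse:          # a complete codeword has accumulated
            decoded.append(reverse[prefix])
            prefix = ''
    return ''.join(decoded)

# Hypothetical usage with codewords taken from Table 1:
codes = {'y': '1001', 'o': '002', 'u': '122'}
print(huffman_decode('1001002122', codes))      # -> 'you'
```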


Bit rate (average) = (Word length x Code length) / Time to transmit

Bit rate =(∑C) / Time to transmit

Considering the average case of 5.1 letters (case 1),

Bit rate = 0.36 bits/s

Considering the case of transmitting “you” (case 1),

Bit rate = 0.36 bits/s

Considering the case of transmitting “he” (case 2),

Bit rate = 0.2 bits/s
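A short check of these rates, using the transmission times computed above (values rounded):

```python
# bit rate = total code symbols / transmission time (values from the cases above)
cases = {
    'average word (case 1)': (5.1 * 3.0549, 43.36),
    '"you" (case 1)':        (10, 27.83),
    '"he" (case 2)':         (5, 24.85),
}
for name, (bits, seconds) in cases.items():
    print(f'{name}: {bits / seconds:.2f} bits/s')
```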

4. Conclusion

Shannon's information theory tells us that there are two ways to increase the information rate. One is data compression by source coding, decreasing the code length until it approaches the entropy, so that less time is needed to transmit the same information. The other is channel coding, increasing the transmission rate towards the Shannon capacity. Since this scheme concerns free transmission, we did not consider channel coding to correct errors, and we assume there is no noise for simplicity: if A transmits a 0 to B, we assume B receives the 0 without error.

Source coding methods can be used to solve this problem. We first considered the ternary scheme and calculated the probability of appearance of each symbol, i.e. the probabilities of 0, 1 and 2. In this scheme, different symbols can only be distinguished by different durations, so the more frequent symbols are given the shorter durations. For example, 1 second represents 0 because its probability is the largest, 0.4636 (Table 2); 2 seconds represent 1 (0.3398) and 3 seconds represent 2 (0.1966).

Then, according to the probabilities with which the different English letters appear in text, we used Huffman coding and decoding to encode the English alphabet; this was implemented in MATLAB. Another important point is the delay time of the interruption. We used its expected value to represent the delay for both the uniform and the normal distribution, and we have to assume the receiver can recognise the delay time.

At receiver B we assumed that the minimum time the human ear can recognise is 1 second, so a 0 takes 2.05 seconds (1 s signalling + 1 s to represent the 0 + 0.05 s expected interruption delay). If the receiver could instead recognise 0.1 s, the 0 could be represented by the signalling time plus the interruption delay alone, 1.05 seconds, reducing the transmission time; however, we assume this is not possible in our scheme.


Finally, we calculated the average time to transmit a word and the corresponding transmission rate for our scheme.

5. Reference

[1] http://en.wikipedia.org/wiki/Letter_frequency

[2] http://www.puchu.net/doc/Average_Word_Length