
EC6402

COMMUNICATION THEORY

FRANCIS XAVIER ENGINEERING COLLEGE

www.francisxavier.ac.in

Department of ECE-FXEC


UNIT 1

AMPLITUDE MODULATION

Review of spectral characteristics of periodic and non-periodic signals.

Generation and demodulation of AM signal.

Generation and demodulation of DSBSC signal.

Generation and demodulation of SSB signal.

Generation and demodulation of VSB signal.

Comparison of amplitude modulation systems.

Frequency translation.

FDM.

Non-linear distortion.


Introduction:

In electronics, a signal is an electric current or electromagnetic field used to convey data from one

place to another. The simplest form of signal is a direct current (DC) that is switched on and off; this is the

principle by which the early telegraph worked. More complex signals consist of an alternating-current (AC)

or electromagnetic carrier that contains one or more data streams.

Modulation:

Modulation is the addition of information (or the signal) to an electronic or

optical signal carrier. Modulation can be applied to direct current (mainly by turning it on

and off), to alternating current, and to optical signals. One can think of blanket waving as a

form of modulation used in smoke signal transmission (the carrier being a steady stream of

smoke). Morse code, invented for telegraphy and still used in amateur radio, uses

a binary (two-state) digital code similar to the code used by modern computers. For most of

radio and telecommunication today, the carrier is alternating current (AC) in a given range

of frequencies. Common modulation methods include:

Amplitude modulation (AM), in which the voltage applied to the carrier is varied

over time

Frequency modulation (FM), in which the frequency of the carrier waveform is

varied in small but meaningful amounts

Phase modulation (PM), in which the natural flow of the alternating current

waveform is delayed temporarily

Classification of Signals:

Some important classifications of signals

Analog vs. Digital signals: as stated in the previous lecture, a signal with a

magnitude that may take any real value in a specific range is called an analog signal

while a signal with amplitude that takes only a finite number of values is called a

digital signal.

Continuous-time vs. discrete-time signals: continuous-time signals may be analog

or digital signals such that their magnitudes are defined for all values of t, while

discrete-time signal are analog or digital signals with magnitudes that are defined at

specific instants of time only and are undefined for other time instants.



Periodic vs. aperiodic signals: periodic signals are those that are constructed from a

specific shape that repeats regularly after a specific amount of time T0, [i.e., a

periodic signal f(t) with period T0 satisfies f(t) = f(t+nT0) for all integer values of

n], while aperiodic signals do not repeat regularly.

Deterministic vs. probabilistic signals: deterministic signals are those that can be

computed beforehand at any instant of time while a probabilistic signal is one that

is random and cannot be determined beforehand.

Energy vs. Power signals: as described below.

Energy and Power Signals

The total energy contained in and average power provided by a signal f(t) (which is a function of time) are defined as

E_f = ∫_{−∞}^{∞} |f(t)|² dt ,

and

P_f = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} |f(t)|² dt ,

respectively.

For periodic signals, the power P can be computed using a simpler form based on the periodicity of the signal as

P_periodic = (1/T) ∫_{t0}^{t0+T} |f(t)|² dt ,

where T is the period of the signal and t0 is an arbitrary time instant chosen to simplify the computation of the integral (so that the integration covers exactly one period).
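As a rough numerical check of these definitions, the following Python sketch (a minimal illustration assuming NumPy is available; the window length and the two test signals are arbitrary choices that anticipate the examples worked below) approximates E and P by integrating |f(t)|² over a large window.

import numpy as np

def energy(f, t_max=60.0, n=1_200_001):
    # E ~ integral of |f(t)|^2 dt over [-t_max, t_max]
    t = np.linspace(-t_max, t_max, n)
    return np.trapz(np.abs(f(t))**2, t)

def power(f, T=60.0, n=1_200_001):
    # P ~ (1/T) * integral of |f(t)|^2 dt over [-T/2, T/2]
    t = np.linspace(-T/2, T/2, n)
    return np.trapz(np.abs(f(t))**2, t) / T

a = lambda t: 3*np.sin(2*np.pi*t)       # periodic signal -> power signal
b = lambda t: 5*np.exp(-2*np.abs(t))    # decaying signal -> energy signal

print(power(a))    # ~4.5  (= 9/2 W): finite power, infinite energy
print(energy(b))   # ~12.5 (= 50/4 J): finite energy, zero power

Increasing t_max (or T) leaves power(a) fixed near 9/2 while the energy of a(t) keeps growing, which is exactly the power-signal behaviour described above.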


Classification of Signals into Power and Energy Signals

Most signals can be classified into energy signals or power signals. A signal is classified as an energy or a power signal according to the following criteria:

a) Energy signals: an energy signal is a signal with finite energy and zero average power (0 ≤ E < ∞, P = 0).

b) Power signals: a power signal is a signal with infinite energy but finite average power (0 < P < ∞, E → ∞).

Comments:

1. The square root of the average power P of a power signal is what is usually defined as the RMS value of that signal.

2. Your book says that if a signal approaches zero as t approaches ∞ then the signal is an energy signal. This is true in most cases, but not always, as you can verify in part (d) of the following example.

3. All periodic signals are power signals (but not all non-periodic signals are energy signals).

4. Any signal f that has limited amplitude (| f | < ∞) and is time limited (f = 0 for | t | > t0 for some t0 > 0) is an energy signal, as in part (g) of the following example.

Exercise 1: determine whether the following signals are energy signals, power signals, or neither, and evaluate E and P for each signal (see examples 2.1 and 2.2 on pages 17 and 18 of your textbook for help).

a) a(t) = 3 sin(2πt), −∞ < t < ∞

This is a periodic signal, so it must be a power signal. Let us prove it.


E_a = ∫_{−∞}^{∞} |a(t)|² dt = ∫_{−∞}^{∞} |3 sin(2πt)|² dt = 9 ∫_{−∞}^{∞} ½ [1 − cos(4πt)] dt

= (9/2) ∫_{−∞}^{∞} dt − (9/2) ∫_{−∞}^{∞} cos(4πt) dt = ∞ J

Notice that the evaluation of the last line in the above equation is infinite because of the first term. The second term is bounded, so it has no effect on the overall value of the energy.

Since a(t) is periodic with period T = 2π/2π = 1 second, we get

P_a = (1/1) ∫_{0}^{1} |3 sin(2πt)|² dt = 9 ∫_{0}^{1} ½ [1 − cos(4πt)] dt = 9/2 − (9/(8π)) sin(4πt) |_{0}^{1} = 9/2 W

So, the energy of this signal is infinite and its average power is finite (9/2 W). This means that it is a power signal, as expected. Notice that the average power of this signal is as expected (the square of the amplitude divided by 2).

b) b(t) = 5 e^(−2|t|), −∞ < t < ∞

Let us first find the total energy of the signal.


E_b = ∫_{−∞}^{∞} |b(t)|² dt = ∫_{−∞}^{∞} (5 e^(−2|t|))² dt = 25 ∫_{−∞}^{0} e^(4t) dt + 25 ∫_{0}^{∞} e^(−4t) dt = 25/4 + 25/4 = 50/4 J

The average power of the signal is

P_b = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} (5 e^(−2|t|))² dt = lim_{T→∞} (25/T) [ ∫_{−T/2}^{0} e^(4t) dt + ∫_{0}^{T/2} e^(−4t) dt ]

= lim_{T→∞} (25/(2T)) (1 − e^(−2T)) = 0 W

So, the signal b(t) is definitely an energy signal.


c) c(t),

d) d(t) = 1/√t for t ≥ 1, and 0 for t < 1,

Let us first find the total energy of the signal.


E_d = ∫_{−∞}^{∞} |d(t)|² dt = ∫_{1}^{∞} (1/t) dt = ln t |_{1}^{∞} = ∞ J

So, this signal is NOT an energy signal. However, it is also NOT a power signal, since its average power, as shown below, is zero.

The average power of the signal is

P_d = lim_{T→∞} (1/T) ∫_{1}^{T/2} (1/t) dt = lim_{T→∞} (1/T) [ln(T/2) − ln 1] = lim_{T→∞} ln(T/2) / T

Using L'Hôpital's rule, we see that the power of the signal is zero. That is, P_d = 0 W.

So, not all signals that approach zero as time approaches positive and negative infinity are energy signals. They may not be power signals either.

e) e(t),

f) f(t),

g) g(t), which is time limited (g(t) = 0 elsewhere).


AMPLITUDE MODULATION:

In amplitude modulation, the instantaneous amplitude of a carrier wave is varied in accordance with the instantaneous value of the modulating signal. The main advantages of AM are its small bandwidth and simple transmitter and receiver designs. AM is implemented by mixing the carrier wave in a nonlinear device with the modulating signal. This produces upper and lower sidebands, which are the sum and difference frequencies of the carrier wave and modulating signal.

The carrier signal is represented by

c(t) = A cos(ωc t)

The modulating signal is represented by

m(t) = B sin(ωm t)

Then the final modulated signal is

[1 + m(t)] c(t) = A [1 + m(t)] cos(ωc t) = A [1 + B sin(ωm t)] cos(ωc t)

= A cos(ωc t) + (AB/2) sin((ωc + ωm)t) − (AB/2) sin((ωc − ωm)t)

For demodulation reasons, the magnitude of m(t) is always kept less than 1 and its frequency much smaller than that of the carrier signal.

The modulated signal has frequency components at the frequencies ωc, ωc + ωm and ωc − ωm.
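The following Python sketch (NumPy assumed; the 10 kHz carrier, 500 Hz tone and modulation index of 0.5 are arbitrary illustrative values, not values taken from these notes) builds s(t) = A[1 + B sin(ωm t)] cos(ωc t) and confirms that its spectrum contains components only at fc and fc ± fm.

import numpy as np

fs = 100_000                    # sample rate (Hz)
fc, fm = 10_000, 500            # carrier and message frequencies (Hz)
A, B = 1.0, 0.5                 # carrier amplitude, modulation index (kept below 1)

t = np.arange(0, 0.02, 1/fs)
m = B*np.sin(2*np.pi*fm*t)                  # message m(t)
s = A*(1 + m)*np.cos(2*np.pi*fc*t)          # AM wave [1 + m(t)] c(t)

S = np.abs(np.fft.rfft(s))/len(s)           # one-sided spectrum magnitude
freqs = np.fft.rfftfreq(len(s), 1/fs)
print([int(f) for f in freqs[S > 0.05]])    # -> [9500, 10000, 10500]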


DSBSC:

Double Sideband Suppressed Carrier Modulation

In amplitude modulation, the amplitude of a high-frequency carrier is varied in direct proportion to the low-frequency

(baseband) message signal. The carrier is usually a sinusoidal waveform, that is,

c(t)=Ac cos(ωct+θc)

Or

c(t)=Ac sin(ωct+θc)

Where:

Ac is the unmodulated carrier amplitude

ωc is the unmodulated carrier angular frequency in radians/s;

ωc = 2πfc; θc is the unmodulated carrier phase, which we shall assume is zero.

The amplitude modulated carrier has the mathematical form

ΦDSB-SC(t)= A(t) cos(ωct)

Where:

A(t) is the instantaneous amplitude of the modulated carrier, and is a linear function

of the message signal m(t). A(t) is also known as the envelope of the modulated signal For

double-sideband suppressed carrier (DSB-SC) modulation the amplitude is

related to the message as follows:

A(t) = Ac m(t)

Consider a message signal with spectrum (Fourier transform) M(ω) which is band limited

to 2πB as shown in Figure 1(b). The bandwidth of this signal is B Hz and ωc is chosen

such that ωc >> 2πB. Applying the modulation theorem, the modulated Fourier transform

is

A(t) cos(ωct)= m(t) cos(ωct) ⇔ ½( M(ω - ωc)+ M(ω + ωc))

GENERATION OF DSBSC:

The DSB-SC can be generated using either the balanced modulator or the 'ring modulator'.

The balanced modulator uses two identical AM generators along with an adder. The two

amplitude modulators have a common carrier with one of them modulating the input

message, and the other modulating the inverted message. Generation of AM is not simple,

and to have two AM generators with identical operating conditions is extremely difficult.


Hence, laboratory implementation of the DSB-SC usually uses the 'ring modulator', shown in figure 1.

Figure 1: The ring modulator used for the generation of the double-sideband suppressed-carrier (DSB-SC) signal

This standard form of DSB-SC generation is the most preferred method of laboratory

implementation. However, it cannot be used for the generation of the AM waveform.

The DSB-SC and the DSB forms of AM are closely related as; the DSB-SC with the

addition of the carrier becomes the DSB, while the DSB with the carrier removed results in

the DSB-SC form of modulation. Yet, existing methods of DSB cannot be used for the

generation of the DSB-SC. Similarly the ring modulator cannot be used for the generation

of the DSB. These two forms of modulation are generated using different methods. Our

attempt in this work is to propose a single circuit capable of generating both the DSB-SC

and the DSB forms of AM.

THE MODIFIED SWITCHING MODULATOR:

The block diagram of the 'modified switching modulator' given in figure 1 has all the blocks of the switching modulator, but with an additional active device. In this case, the active device has to have three terminals so that it can be used as a 'controlled switch'. Another significant change is that the 'adder' is shifted to after the active device. These


changes in the 'switching modulator' enable the carrier to independently control the switching action of the active device, and thus eliminate the restriction existing in the usual 'switching modulator' (equation (2)). In addition, the same circuit can generate the DSB-

SC waveform. Thus the task of modulators given in figures 1 and 2 is accomplished by the

single modulator of figure 3.

Figure 2: The modified 'switching modulator'

It is possible to obtain the AM or the DSB-SC waveform from the 'modified switching modulator' of figure 3 by just varying the amplitude of the square-wave carrier. It may be noted that the carrier performs two tasks: (i) it controls the switching action of the active devices, and (ii) it controls the depth of modulation of the generated AM waveform. Thus, the proposed modification in the switching modulator enables the generation of both the AM and the DSB-SC from a single circuit. Also, it may be noted that the method is devoid of any assumptions or stringent, difficult-to-maintain operating conditions, as in existing low-power generation of AM. We now implement the 'modified switching modulator' and record the observed output in the next section.

record the observed output in the next Section.

Experimental results

The circuit implemented for testing the proposed method is given in figure 4, which

uses transistors CL-100 and CK-100 for controlled switches, two transformers for the

adder, followed by a passive BPF. The square-wave carrier and the sinusoidal message are


given from a function generator (6 MHz Aplab FG6M). The waveforms are observed on a mixed signal oscilloscope (100 MHz Agilent 54622D, capable of recording the output in '.tif' format).

Figure 3: The implementation of the modified 'switching modulator' to generate the AM and the DSB-SC waveforms

The modified switching modulator is tested using a single tone message of 706 Hz,

with a square-wave carrier of frequency 7.78 kHz. The depth of modulation of the

generated waveform can be varied either by varying the amplitude of the carrier or by

varying the amplitude of the signal. Figure 5 has the results of the modulated waveforms

obtained using the 'modified switching modulator'. It can be seen that the same circuit is

able to generate AM for varying depths of modulation, including the over-modulation and

the DSB-SC. The quality of the modulated waveforms is comparable to that obtained using

industry standard communication modules (like the LabVolt for example).


Properties of DSB-SC Modulation:

(a) There is a 180° phase reversal at the point where A(t) = m(t) goes negative.

This is typical of DSB-SC modulation.

(b) The bandwidth of the DSB-SC signal is double that of the message signal, that is,

BW_DSB-SC = 2B (Hz).

(c) The modulated signal is centered at the carrier frequency ωc with two identical

sidebands (double-sideband) – the lower sideband (LSB) and the upper sideband (USB).

Being identical, they both convey the same message component.

(d) The spectrum contains no isolated carrier. Thus the name suppressed carrier.

(e) The 180° phase reversal causes the positive (or negative) side of the envelope to

have a shape different from that of the message signal. This is known as envelope

distortion, which is typical of DSBSC modulation.

(f) The power in the modulated signal is contained entirely in the two sidebands.

Generation of DSB-SC Signals

The circuits for generating modulated signals are known as modulators. The basic

modulators are Nonlinear, Switching and Ring modulators. Conceptually, the simplest

modulator is the product or multiplier modulator which is shown in figure 1-a. However, it

is very difficult (and expensive) in practice to design a product modulator that maintains

amplitude linearity at high carrier frequencies. One way of replacing the modulator stage

is by using a non-linear device. We use the non-linearity to generate a harmonic that

contains the product term then use a BPF to separate the term of interest. Figure 3 shows a

block diagram of a nonlinear DSBSC modulator. Figure 4 shows a double balanced modulator that uses diodes as the non-linear devices, followed by a BPF to separate the product term.


The received DSB-SC signal is

Sm(t) = ΦDSB-SC(t) = Ac m(t) cos(ωc t)

The receiver first generates an exact (coherent) replica (same phase and frequency) of the unmodulated carrier

Sc(t) = cos(ωc t)

The coherent carrier is then multiplied with the received signal to give

Sm(t) · Sc(t) = Ac m(t) cos(ωc t) · cos(ωc t) = ½ Ac m(t) + ½ Ac m(t) cos(2ωc t)

The first term is the desired baseband signal while the second is a band-pass signal

centered at 2ωc. A low-pass filter with bandwidth equal to that of the m(t) will pass the

first term and reject the band-pass component.
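A minimal Python sketch of this coherent (synchronous) detector is given below (NumPy assumed; the 1 kHz tone, 20 kHz carrier and the crude FFT-based low-pass filter are illustrative choices, not a prescribed receiver design).

import numpy as np

fs, fc, fm = 200_000, 20_000, 1_000      # sample rate, carrier, message tone (Hz)
t = np.arange(0, 0.01, 1/fs)

m = np.cos(2*np.pi*fm*t)                 # message m(t)
s = m*np.cos(2*np.pi*fc*t)               # DSB-SC: m(t) cos(wc t)
v = s*np.cos(2*np.pi*fc*t)               # coherent product: 1/2 m(t) + 1/2 m(t) cos(2 wc t)

V = np.fft.rfft(v)                       # crude low-pass: discard everything above fc
freqs = np.fft.rfftfreq(len(v), 1/fs)
V[freqs > fc] = 0
m_hat = 2*np.fft.irfft(V, n=len(v))      # scale by 2 to undo the 1/2 factor

print(np.max(np.abs(m_hat - m)) < 0.01)  # True: the baseband message is recovered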

Single Side Band (SSB) Modulation:

In DSB-SC it is observed that there is symmetry in the band structure. So, even if only one half is transmitted, the other half can be recovered at the receiver. By doing so, the bandwidth and power of transmission are reduced by half.

Depending on which half of the DSB-SC signal is transmitted, there are two types of SSB modulation (a sketch of the phasing method of SSB generation is given after this list):

1. Lower Side Band (LSB) Modulation

2. Upper Side Band (USB) Modulation
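One common way to realise SSB is the phasing method, which uses a 90° phase shifter (a Hilbert transform). The sketch below (Python with NumPy and SciPy assumed; the tone and carrier frequencies are arbitrary illustrative values) keeps only one sideband of a single-tone message.

import numpy as np
from scipy.signal import hilbert

fs, fc, fm = 100_000, 10_000, 500
t = np.arange(0, 0.02, 1/fs)
m = np.cos(2*np.pi*fm*t)                                   # message

m_h = np.imag(hilbert(m))                                  # 90-degree phase-shifted message
usb = m*np.cos(2*np.pi*fc*t) - m_h*np.sin(2*np.pi*fc*t)    # upper sideband only
lsb = m*np.cos(2*np.pi*fc*t) + m_h*np.sin(2*np.pi*fc*t)    # lower sideband only

S = np.abs(np.fft.rfft(usb))/len(usb)
freqs = np.fft.rfftfreq(len(usb), 1/fs)
print(int(freqs[np.argmax(S)]))                            # -> 10500, i.e. only fc + fm remains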

Vestigial Side Band (VSB) Modulation:

The following are the drawbacks of SSB signal generation:

1. Generation of an SSB signal is difficult.

2. Selective filtering is to be done to get the original signal back.

3. The phase shifter should be exactly tuned to 90°.

To overcome these drawbacks, VSB modulation is used. It can be viewed as a compromise between SSB and DSB-SC.


In VSB

1. One sideband is not rejected fully.

2. One sideband is transmitted fully and a small part (vestige) of the other sideband is transmitted.

The transmission bandwidth is BWv = B + v, where v is the width of the vestigial frequency band.

FREQUENCY TRANSLATION:

The transfer of signals occupying a specified frequency band, such

as a channel or group of channels, from one portion of the frequency spectrum to another,

in such a way that the arithmetic frequency difference of signals within the band is

unaltered.

FREQUENCY-DIVISION MULTIPLEXING (FDM):

It is a form of signal multiplexing which involves assigning non-overlapping

frequency ranges to different signals or to each "user" of a medium.

FDM can also be used to combine signals before final modulation onto a carrier

wave. In this case the carrier signals are referred to as subcarriers: an example is stereo

FM transmission, where a 38 kHz subcarrier is used to separate the left-right difference

signal from the central left-right sum channel, prior to the frequency modulation of the

composite signal. A television channel is divided into subcarrier frequencies for video,

color, and audio. DSL uses different frequencies for voice and

for upstream and downstream data transmission on the same conductors, which is also an

example of frequency duplex. Where frequency-division multiplexing is used to allow

multiple users to share a physical communications channel, it is called frequency-division

multiple access (FDMA).

NONLINEAR DISTORTION:

It is a term used (in fields such as electronics, audio and telecommunications) to

describe the phenomenon of a non-linear relationship between the "input" and "output"

signals of - for example - an electronic device.

EFFECTS OF NONLINEARITY:


Nonlinearity can have several effects, which are unwanted in typical situations. Suppose the output of a device is related to its input by a power series y(t) = a1 x(t) + a2 x(t)² + a3 x(t)³ + … The a3 term, for example, would, when the input is a sine wave with frequency ω, result in an extra sine wave at 3ω (a third harmonic).

In certain situations, this spurious signal can be filtered away because the "harmonic" 3ω

lies far outside the frequency range used, but in cable television, for example, third order

distortion could cause a 200 MHz signal to interfere with the regular channel at 600

MHz.

Nonlinear distortion applied to a superposition of two signals at different frequencies

causes the circuit to act as a frequency mixer, creating intermodulation distortion.


PART A (2 MARK) QUESTIONS.

1. As related to AM, what is over modulation, under modulation and 100% modulation?

2. Draw the frequency spectrum of VSB, where it is used

3. Define modulation index of an AM signal

4. Draw the circuit diagram of an envelope detector

5. What is the mid frequency of the IF section of AM receivers and what is its bandwidth?

6. A transmitter radiates 9 kW without modulation and 10.125 kW after modulation.

Determine depth of modulation.

7. Draw the spectrum of DSB.

8. Define the transmission efficiency of AM signal.

9. Draw the phasor diagram of AM signal.

10. Advantages of SSB.

11. Disadvantages of DSB-FC.

12. What are the advantages of a superheterodyne receiver?

13. Advantages of VSB.

14. Distinguish between low level and high level modulator.

15. Define FDM & frequency translation.

16. Give the parameters of receiver.

17. Define sensitivity and selectivity.

18. Define fidelity.

19. What is meant by image frequency?

20. Define multitone modulation.


PART B (16 MARK) QUESTIONS

1. Explain the generation of AM signals using square law modulator. (16)

2. Explain the detection of AM signals using envelope detector. (16)

3. Explain about the Balanced modulator to generate a DSB-SC signal. (16)

4. Explain about coherent detector to detect SSB-SC signal. (16)

5. Explain the generation of SSB using balanced modulator. (16)

6. Draw the circuit diagram of Ring modulator and explain with its operation? (16)

7. Discuss the coherent detection of a DSB-SC modulated wave with a block diagram of the detector and explain. (16)

8. Explain the working of Superheterodyne receiver with its parameters. (16)

9. Draw the block diagram for the generation and demodulation of a VSB signal and

explain the principle of operation. (16)

10. Write short notes on frequency translation and FDM? (16)


UNIT II

ANGLE MODULATION

Phase and frequency modulation

Single tone

Narrow band FM

Wideband FM

Transmission bandwidth

Generation of FM signal.

Demodulation of FM signal


PHASE MODULATION:

Phase modulation (PM) is a form of modulation that represents information as

variations in the instantaneous phase of a carrier wave.

Unlike its more popular counterpart, frequency modulation (FM), PM is not very

widely used for radio transmissions. This is because it tends to require more complex

receiving hardware and there can be ambiguity problems in determining whether, for

example, the signal has changed phase by +180° or -180°. PM is used, however, in digital

music synthesizers such as the Yamaha DX7, even though these instruments are usually

referred to as "FM" synthesizers (both modulation types sound very similar, but PM is

usually easier to implement in this area).

An example of phase modulation. The top diagram shows the modulating signal

superimposed on the carrier wave. The bottom diagram shows the resulting phase-


modulated signal. PM changes the phase angle of the complex envelope in direct

proportion to the message signal.

Suppose that the signal to be sent (called the modulating or message signal) is m(t) and the

carrier onto which the signal is to be modulated is

c(t) = Ac sin(ωc t + φc),

that is, carrier(time) = (carrier amplitude) · sin(carrier frequency · time + phase shift).

This makes the modulated signal

y(t) = Ac sin(ωc t + m(t) + φc)

This shows how m(t) modulates the phase: the greater m(t) is at a point in time, the

greater the phase shift of the modulated signal at that point. It can also be viewed as a

change of the frequency of the carrier signal, and phase modulation can thus be considered

a special case of FM in which the carrier frequency modulation is given by the time

derivative of the phase modulation.
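A short Python sketch of this relationship (NumPy assumed; carrier, tone and sensitivity values are arbitrary illustrative choices): a PM wave is formed by adding kp·m(t) to the carrier phase, while FM of the same message is obtained by phase-modulating with the running integral of m(t).

import numpy as np

fs, fc, fm = 100_000, 5_000, 200        # sample rate, carrier, message tone (Hz)
kp, f_delta = 1.0, 1_000                # phase sensitivity (rad), peak frequency deviation (Hz)

t = np.arange(0, 0.05, 1/fs)
m = np.sin(2*np.pi*fm*t)                # message m(t)

pm_sig = np.cos(2*np.pi*fc*t + kp*m)    # phase modulation: phase follows m(t) directly
# FM = PM of the time integral of m(t); cumsum/fs approximates that integral
fm_sig = np.cos(2*np.pi*fc*t + 2*np.pi*f_delta*np.cumsum(m)/fs)

print(kp*np.max(np.abs(m)))             # peak phase deviation of the PM wave (rad), ~1.0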

The spectral behavior of phase modulation is difficult to derive, but the

mathematics reveals that there are two regions of particular interest:

For small amplitude signals, PM is similar to amplitude

modulation (AM) and exhibits its unfortunate doubling of

baseband bandwidth and poor efficiency.

For a single large sinusoidal signal, PM is similar to FM, and its

bandwidth is approximately 2(h + 1) fM,

where fM = ωm / 2π and h is the modulation index defined below. This is

also known as Carson's Rule for PM.


MODULATION INDEX:

As with other modulation indices, this quantity indicates by how much the

modulated variable varies around its unmodulated level. It relates to the variations in the

phase of the carrier signal:

h = Δθ,

where Δθ is the peak phase deviation. Compare to the modulation index for frequency

modulation.

Variable-capacitance diode phase modulator:

This circuit varies the phase between two square waves through at least 180°. This

capability finds application in fixed-frequency, phase shift, resonant-mode converters. ICs

such as the UC3875 usually only work up to about 500 kHz, whereas this circuit can be

extended up to tens of megahertz. In addition, the circuit shown uses low-cost components.

This example was used for a high-efficiency 2-MHz RF power supply.

The signal is delayed at each gate by the RC network formed by the 4.7k input

resistor and capacitance of the 1N4003 diode. The capacitance of the diode, and hence

delay, can be varied by controlling the reverse dc bias applied across the diode. The 100k

resistor to ground at the input to the second stage corrects a slight loss of 1:1 symmetry.

The fixed delay for output A adjusts the phase to be approximately in phase at a 5-V bias.


Note that the control voltage should not drop below approximately 3 V, because the diodes

will start to be forward-biased and the signal will be lost.

FREQUENCY MODULATION:

Frequency modulation (FM) conveys information over a carrier wave by varying

its instantaneous frequency. This is in contrast with amplitude modulation, in which

the amplitude of the carrier is varied while its frequency remains constant.

In analog applications, the difference between the instantaneous and the base frequency of

the carrier is directly proportional to the instantaneous value of the input signal

amplitude. Digital data can be sent by shifting the carrier's frequency among a set of

discrete values, a technique known as frequency-shift keying.

Frequency modulation can be regarded as phase modulation where the carrier phase

modulation is the time integral of the FM modulating signal.

FM is widely used for broadcasting of music and speech, and in two-way

radio systems, in magnetic tape recording systems, and certain video transmission systems.

In radio systems, frequency modulation with sufficient bandwidth provides an advantage in

cancelling naturally-occurring noise. Frequency-shift keying (digital FM) is widely used in

data and fax modems.

THEORY:

Suppose the baseband data signal (the message) to be transmitted is xm(t) and

the sinusoidal carrier is xc(t) = Ac cos(2π fc t), where fc is the carrier's base frequency and Ac is the carrier's amplitude. The modulator combines the carrier with the baseband data signal to get the transmitted signal:

y(t) = Ac cos(2π fc t + 2π fΔ ∫_{0}^{t} xm(τ) dτ)     (1)


In this equation, f(τ) = fc + fΔ xm(τ) is the instantaneous frequency of the oscillator and fΔ is the frequency deviation, which represents the maximum shift away from fc in one direction, assuming xm(t) is limited to the range ±1.

Although it may seem that this limits the frequencies in use to fc ± f∆, this neglects the

distinction between instantaneous frequency and spectral frequency. The frequency

spectrum of an actual FM signal has components extending out to infinite frequency,

although they become negligibly small beyond a point.

SINUSOIDAL BASEBAND SIGNAL:

While it is an over-simplification, a baseband modulated signal may be approximated by a sinusoidal continuous wave signal xm(t) = Am cos(2π fm t) with a frequency fm. The integral of such a signal is

∫_{0}^{t} xm(τ) dτ = Am sin(2π fm t) / (2π fm)

Thus, in this specific case, equation (1) above simplifies to:

y(t) = Ac cos(2π fc t + (Am fΔ / fm) sin(2π fm t)),

where the amplitude Am of the modulating sinusoid is represented by the peak deviation fΔ (see frequency deviation).

The harmonic distribution of a sine wave carrier modulated by such

a sinusoidal signal can be represented with Bessel functions - this provides a basis for a

mathematical understanding of frequency modulation in the frequency domain.
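The sketch below (Python with NumPy and SciPy assumed; fc = 20 kHz, fm = 1 kHz and h = 2 are arbitrary illustrative values) measures the amplitudes of the carrier and the first few sideband pairs of a single-tone FM wave and compares them with the Bessel values |Jn(h)|.

import numpy as np
from scipy.special import jv                  # Bessel functions of the first kind

fs, fc, fm, h = 200_000, 20_000, 1_000, 2.0   # h is the modulation index
t = np.arange(0, 1.0, 1/fs)
s = np.cos(2*np.pi*fc*t + h*np.sin(2*np.pi*fm*t))   # single-tone FM wave

S = 2*np.abs(np.fft.rfft(s))/len(s)           # amplitude of each spectral component
freqs = np.fft.rfftfreq(len(s), 1/fs)

for n in range(4):                            # carrier (n = 0) and first three sidebands
    measured = S[np.argmin(np.abs(freqs - (fc + n*fm)))]
    print(n, round(measured, 3), round(abs(jv(n, h)), 3))   # the two columns agree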

MODULATION INDEX:

As with other modulation indices, this quantity indicates by how much the

modulated variable varies around its unmodulated level. It relates to the variations in the frequency of the carrier signal:

h = fΔ / fm,

where fm is the highest frequency component present in the modulating signal xm(t), and fΔ is the peak frequency deviation, i.e. the maximum deviation of the instantaneous frequency from the carrier frequency. If h << 1, the modulation is


called narrowband FM, and its bandwidth is approximately 2 fm. If h >> 1, the modulation is called wideband FM and its bandwidth is approximately 2 fΔ. While

wideband FM uses more bandwidth, it can improve signal-to-noise ratio significantly.

With a tone-modulated FM wave, if the modulation frequency is held constant and

the modulation index is increased, the (non-negligible) bandwidth of the FM signal

increases, but the spacing between spectra stays the same; some spectral components

decrease in strength as others increase. If the frequency deviation is held constant and the

modulation frequency increased, the spacing between spectra increases.

Frequency modulation can be classified as narrow band if the change in the carrier

frequency is about the same as the signal frequency, or as wide-band if the change in the

carrier frequency is much higher (modulation index >1) than the signal frequency. [1]

For

example, narrowband FM is used for two way radio systems such as Family Radio

Service where the carrier is allowed to deviate only 2.5 kHz above and below the center

frequency, carrying speech signals of no more than 3.5 kHz bandwidth. Wide-band FM is

used for FM broadcasting where music and speech is transmitted with up to 75 kHz

deviation from the center frequency, carrying audio with up to 20 kHz bandwidth.

CARSON'S RULE:

A rule of thumb, Carson's rule states that nearly all (~98%) of the power of a

frequency-modulated signal lies within a bandwidth of

BT = 2 (fΔ + fm),

where fΔ, as defined above, is the peak deviation of the instantaneous frequency from the center carrier frequency fc, and fm is the highest frequency in the modulating signal.
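As a worked instance of this rule (a hypothetical Python helper; the numbers are taken from question 16 of Part A below and from the FM-broadcast figures quoted above):

def carson_bandwidth(f_delta, f_m):
    # Carson's rule: ~98% of the FM power lies within 2*(f_delta + f_m)
    return 2*(f_delta + f_m)

print(carson_bandwidth(12e3, 2e3))    # 28000.0 Hz  (deviation 12 kHz, modulating tone 2 kHz)
print(carson_bandwidth(75e3, 20e3))   # 190000.0 Hz (FM broadcasting: 75 kHz deviation, 20 kHz audio)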

NOISE QUIETING:

The noise power decreases as the signal power increases, therefore the SNR goes

up significantly.

MODULATION:

FM signals can be generated using either direct or indirect frequency modulation.

Direct FM modulation can be achieved by directly feeding the message into the

input of a VCO.


For indirect FM modulation, the message signal is integrated to generate a phase

modulated signal. This is used to modulate a crystal controlled oscillator, and the

result is passed through a frequency multiplier to give an FM signal.

DEMODULATION:

Many FM detector circuits exist. One common method for recovering the

information signal is through a Foster-Seeley discriminator. A phase-lock loop can be used

as an FM demodulator.

Slope detection demodulates an FM signal by using a tuned circuit, which has its

resonant frequency slightly offset from the carrier frequency. As the frequency rises and

falls, the tuned circuit provides a changing amplitude of response, converting FM to AM.

AM receivers may detect some FM transmissions by this means, though it does not provide

an efficient method of detection for FM broadcasts.

APPLICATIONS: MAGNETIC TAPE STORAGE:

FM is also used at intermediate frequencies by all analog VCR systems, including

VHS, to record both the luminance (black and white) and the chrominance portions of the

video signal. FM is the only feasible method of recording video to and retrieving video

from Magnetic tape without extreme distortion, as video signals have a very large range

of frequency components — from a few hertz to several megahertz, too wide for

equalizers to work with due to electronic noise below −60 dB. FM also keeps the tape at

saturation level, and therefore acts as a form of noise reduction, and a

simple limiter can mask variations in the playback output, and the FM capture effect

removes print-through and pre-echo. A continuous pilot-tone, if added to the signal — as

was done on V2000 and many Hi-band formats — can keep mechanical jitter under control

and assist time base correction.

These FM systems are unusual in that they have a ratio of carrier to maximum

modulation frequency of less than two; contrast this with FM audio broadcasting where the

ratio is around 10,000. Consider for example a 6 MHz carrier modulated at a 3.5 MHz rate;

by Bessel analysis the first sidebands are on 9.5 and 2.5 MHz, while the second sidebands are on 13 MHz and −1 MHz. The result is a sideband of reversed phase on +1 MHz; on demodulation, this results in an unwanted output at 6 − 1 = 5 MHz. The system must be designed so that this is at an acceptable level.


SOUND:

FM is also used at audio frequencies to synthesize sound. This technique, known

as FM synthesis, was popularized by early digital synthesizers and became a standard

feature for several generations of personal computer sound cards.

RADIO:

The wideband FM (WFM) requires a wider signal bandwidth than amplitude

modulation by an equivalent modulating signal, but this also makes the signal more robust

against noise and interference. Frequency modulation is also more robust against simple

signal amplitude fading phenomena. As a result, FM was chosen as the

modulation standard for high frequency, high fidelity radio transmission: hence the term

"FM radio" (although for many years the BBC called it "VHF radio", because commercial

FM broadcasting uses a well-known part of the VHF band—the FM broadcast band).

FM receivers employ a special detector for FM signals and exhibit

a phenomenon called capture effect, where the tuner is able to clearly receive the stronger

of two stations being broadcast on the same frequency. Problematically

however, frequency drift or lack of selectivity may cause one station or signal to be

suddenly overtaken by another on an adjacent channel. Frequency drift typically

constituted a problem on very old or inexpensive receivers, while inadequate selectivity

may plague any tuner.

An FM signal can also be used to carry a stereo signal: see FM stereo. However,

this is done by using multiplexing and demultiplexing before and after the FM process. The

rest of this article ignores the stereo multiplexing and demultiplexing process used in

"stereo FM", and concentrates on the FM modulation and demodulation process, which is

identical in stereo and mono processes.

A high-efficiency radio-frequency switching amplifier can be used to transmit FM

signals (and other constant-amplitude signals). For a given signal strength (measured at the

receiver antenna), switching amplifiers use less battery power and typically cost less than

a linear amplifier. This gives FM another advantage over other modulation schemes that

require linear amplifiers, such as AM and QAM.

FM is commonly used at VHF radio frequencies for high-

fidelity broadcasts of music and speech (see FM broadcasting). Normal (analog) TV sound

is also broadcast using FM. A narrow band form is used for voice communications in


commercial and amateur radio settings. In broadcast services, where audio fidelity is

important, wideband FM is generally used. In two-way radio, narrowband FM (NBFM) is

used to conserve bandwidth for land mobile radio stations, marine mobile, and many other

radio services.

VARACTOR FM MODULATOR:

Another FM modulator which is widely used in transistorized circuitry uses a

voltage-variable capacitor (VARACTOR). The varactor is simply a diode, or pn junction,

that is designed to have a certain amount of capacitance between junctions. View (A) of

figure 2 shows the varactor schematic symbol. A diagram of a varactor in a simple

oscillator circuit is shown in view (B). This is not a working circuit, but merely a simplified

illustration. The capacitance of a varactor, as with regular capacitors, is determined by the

area of the capacitor plates and the distance between the plates. The depletion region in the

varactor is the dielectric and is located between the p and n elements, which serve as the

plates. Capacitance is varied in the varactor by varying the reverse bias which controls the

thickness of the depletion region. The varactor is so designed that the change in


capacitance is linear with the change in the applied voltage. This is a special design

characteristic of the varactor diode. The varactor must not be forward biased because it

cannot tolerate much current flow. Proper circuit design prevents the application of

forward bias.

IMPORTANT QUESTION

PART A

All questions – Two Marks:

1. What do you mean by narrowband and wideband FM?

2. Give the frequency spectrum of narrowband FM?

3. Why is the Armstrong method superior to the reactance modulator?

4. Define frequency deviation in FM.

5. State Carson's rule for FM bandwidth.

6. Differentiate between narrowband and wideband FM.

7. What are the advantages of FM?

8. Define PM.

9. What is meant by indirect FM generation?

10. Draw the phasor diagram of narrow band FM.

11. Write the expression for the spectrum of a single tone FM signal.

12. What are the applications of phase locked loop?

13. Define modulation index of FM and PM.

14. Differentiate between phase and frequency modulation.

15. A carrier of frequency 100 MHz is frequency modulated by a signal x(t) = 20 sin(200π × 10³ t). What is the bandwidth of the FM signal if the frequency sensitivity of the modulator is 25 kHz per volt?

16. What is the bandwidth required for an FM wave in which the modulating signal frequency is 2 kHz and the maximum frequency deviation is 12 kHz?

17. Determine and draw the instantaneous frequency of a wave having a total phase angle

given by ø(t)= 2000t +sin10t.

18. Draw the block diagram of PLL.


PART B

1. Explain the indirect method of generation of FM wave and any one method of

demodulating an FM wave. (16)

2. Derive the expression for the frequency modulated signal. Explain what is meant by

narrowband FM and wideband FM using the expression. (16)

3. Explain any two techniques of demodulation of FM. (16)

4. Explain the working of the reactance tube modulator and derive an expression to show

how the variation of the amplitude of the input signal changes the frequency of the output

signal of the modulator. (16)

5. Discuss the effects of nonlinearities in FM. (8)

6. Discuss in detail FM stereo multiplexing. (8)

7. Draw the frequency spectrum of FM and explain. Explain how Varactor diode can be

used for frequency modulation. (16)

8. Discuss the indirect method of generating a wide-band FM signal. (8)

9. Draw the circuit diagram of Foster-Seelay discriminator and explain its working. (16)

10. Explain the principle of indirect method of generating a wide-band FM signal with a

neat block diagram. (8)


UNIT III

RANDOM PROCESS

Review of probability.

Random variables and random process.

Gaussian process.

Noise.

Shot noise.

Thermal noise.

White noise.

Narrow band noise.

Noise temperature.

Noise figure.


INTRODUCTION OF PROBABILITY:

Probability theory is the study of uncertainty. Through this class, we will be relying on concepts

from probability theory. These notes attempt to cover the basics of probability theory at an appropriate level. The mathematical theory of probability is very sophisticated, and delves into a branch of analysis known as measure theory. In these notes, we provide a basic treatment of probability that does not address these finer details.

1 Elements of probability

In order to define a probability on a set we need a few basic elements,

• Sample space Ω: The set of all the outcomes of a random experiment. Here, each

outcome ω ∈ Ω can be thought of as a complete description of the state of the real world at the end of the experiment.

• Set of events (or event space) F: A set whose elements A ∈ F (called events) are subsets of Ω (i.e., A ⊆ Ω is a collection of possible outcomes of an experiment).

• Probability measure: A function P : F → R that satisfies the following properties,

- P(A) ≥ 0, for all A ∈ F

- P(Ω) = 1

- If A1, A2, . . . are disjoint events (i.e., Ai ∩ Aj = ∅ whenever i ≠ j), then P(∪i Ai) = Σi P(Ai)

These three properties are called the Axioms of Probability.

Example: Consider the event of tossing a six-sided die. The sample space is Ω = {1, 2, 3, 4, 5, 6}. We can define different event spaces on this sample space. For example, the simplest event space is the trivial event space F = {∅, Ω}. Another event space is the set of all subsets of Ω. For the first event space, the unique probability measure satisfying the requirements above is given by P(∅) = 0, P(Ω) = 1. For the second event space, one valid probability measure is to assign the probability of each set in the event space to be i/6, where i is the number of elements of that set.

Properties:

- If A ⊆ B, then P(A) ≤ P(B).

- P(A ∩ B) ≤ min(P(A), P(B)).

- (Union Bound) P(A ∪ B) ≤ P(A) + P(B).

- P(Ω \ A) = 1 − P(A).

- (Law of Total Probability) If A1, . . . , Ak are a set of disjoint events such that ∪_{i=1}^{k} Ai = Ω, then Σ_{i=1}^{k} P(Ai) = 1.

2 Random variables

Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space Ω are 10-length sequences of heads and tails. For example, we might have ω0 = (H, H, T, H, T, H, H, T, T, T) ∈ Ω. However, in practice, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails. These functions, under some technical conditions, are known as random variables.

More formally, a random variable X is a function X : Ω → R. Typically, we will denote random variables using upper case letters X(ω) or more simply X (where the dependence on the random outcome ω is implied). We will denote the value that a random variable may take on using lower case letters x.

Example: In our experiment above, suppose that X (ω) is the number of heads which occur in the sequence of tosses ω. Given that only 10 coins are tossed, X (ω) can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with a random variable X taking on some specific value k is


P (X = k) := P (ω : X (ω) = k).

Example: Suppose that X(ω) is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, X(ω) takes on an infinite number of possible values, so it is called a continuous random variable. We denote the probability that X takes on a value between two real constants a and b (where a < b) as

P(a ≤ X ≤ b) := P(ω : a ≤ X(ω) ≤ b).

2.1 Cumulative distribution functions

In order to specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs) from which the probability measure governing an experiment immediately follows. In this section and the next two sections, we describe each of these types of functions in turn.

A cumulative distribution function (CDF) is a function FX : R → [0, 1] which specifies a probability measure as

FX(x) ≜ P(X ≤ x).     (1)

By using this function one can calculate the probability of any event in F. Figure 1 shows a sample CDF function.

2.2 Probability mass functions

When a random variable X takes on a finite set of possible values (i.e., X is a discrete random variable), a simpler way to represent the probability measure associated with a random variable is to directly specify the probability of each value that the random variable can assume. In particular,

a probability mass function (PMF) is a function pX : Ω → R such that

pX(x) ≜ P(X = x).

In the case of a discrete random variable, we use the notation Val(X) for the set of possible values that the random variable X may assume. For example, if X(ω) is a random variable indicating the number of heads out of ten tosses of a coin, then Val(X) = {0, 1, 2, . . . , 10}.

Properties:

- 0 ≤ pX(x) ≤ 1.

- Σ_{x∈Val(X)} pX(x) = 1.

- Σ_{x∈A} pX(x) = P(X ∈ A).

2.3 Probability density functions

For some continuous random variables, the cumulative distribution function FX (x) is differentiable everywhere. In these cases, we define the Probability Density Function or PDF as the derivative of the CDF, i.e.,

fX(x) ≜ dFX(x)/dx.     (2)


Note here, that the PDF for a continuous random variable may not always exist (i.e., if FX (x) is not differentiable everywhere).

According to the properties of differentiation, for very small ∆x,

P (x ≤ X ≤ x + ∆x) ≈ fX (x)∆x. (3)

Both CDFs and PDFs (when they exist!) can be used for calculating the probabilities of different events. But it should be emphasized that the value of the PDF at any given point x is not the probability of that event, i.e., fX(x) ≠ P(X = x). For example, fX(x) can take on values larger than one (but the integral of fX(x) over any subset of R will be at most one).

Properties:

- fX(x) ≥ 0.

- ∫_{−∞}^{∞} fX(x) dx = 1.

- ∫_{x∈A} fX(x) dx = P(X ∈ A).

2.4 Expectation

Suppose that X is a discrete random variable with PMF pX (x) and g : R -→ R is an arbitrary function. In this case, g(X ) can be considered a random variable, and we define the expectation or expected value of g(X ) as

E[g(X)] ≜ Σ_{x∈Val(X)} g(x) pX(x).

If X is a continuous random variable with PDF fX(x), then the expected value of g(X) is defined as

E[g(X)] ≜ ∫_{−∞}^{∞} g(x) fX(x) dx.


Intuitively, the expectation of g(X) can be thought of as a "weighted average" of the values that g(x) can take on for different values of x, where the weights are given by pX(x) or fX(x). As a special case of the above, note that the expectation E[X] of a random variable itself is found by letting g(x) = x; this is also known as the mean of the random variable X.

Properties:

- E[a] = a for any constant a ∈ R.

- E[a f(X)] = a E[f(X)] for any constant a ∈ R.

- (Linearity of Expectation) E[f(X) + g(X)] = E[f(X)] + E[g(X)].

- For a discrete random variable X, E[1{X = k}] = P(X = k), where 1{·} is the indicator function.

2.5 Variance

The variance of a random variable X is a measure of how concentrated the distribution of a random variable X is around its mean. Formally, the variance of a random variable X is defined as

Var[X] ≜ E[(X − E[X])²]

Using the properties in the previous section, we can derive an alternate expression for the variance:

E[(X − E[X])²] = E[X² − 2E[X] X + E[X]²]

= E[X²] − 2E[X] E[X] + E[X]²

= E[X²] − E[X]²,

where the second equality follows from linearity of expectations and the fact that E[X ] is actually a constant with respect to the outer expectation.

Properties:

- Var[a] = 0 for any constant a ∈ R.

- Var[a f(X)] = a² Var[f(X)] for any constant a ∈ R.
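A small numerical check (Python with NumPy; the fair six-sided die is just an example distribution) that the two expressions above, Var[X] = E[(X − E[X])²] and E[X²] − E[X]², agree:

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1_000_000)     # samples of a fair six-sided die

mean = x.mean()
var_direct = ((x - mean)**2).mean()        # E[(X - E[X])^2]
var_moment = (x**2).mean() - mean**2       # E[X^2] - E[X]^2

print(var_direct, var_moment)              # both ~35/12 = 2.9167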

2.6 Some common random variables

Discrete random variables

• X ∼ Bernoulli(p) (where 0 ≤ p ≤ 1): one if a coin with heads probability p comes up heads, zero otherwise.

p(x) = p if x = 1;  1 − p if x = 0

• X ∼ Binomial(n, p) (where 0 ≤ p ≤ 1): the number of heads in n independent flips of a coin with heads probability p.

p(x) = (n choose x) p^x (1 − p)^(n−x)

• X ∼ Geometric(p) (where p > 0): the number of flips of a coin with heads probability p until the first heads.

p(x) = p (1 − p)^(x−1)

• X ∼ Poisson(λ) (where λ > 0): a probability distribution over the nonnegative integers used for modeling the frequency of rare events.

p(x) = e^(−λ) λ^x / x!

Continuous random variables

• X ∼ Uniform(a, b) (where a < b): equal probability density to every value between a and b on the real line.

f(x) = 1/(b − a) if a ≤ x ≤ b;  0 otherwise


• X ∼ Exponential(λ) (where λ > 0): decaying probability density over the nonnegative reals.

f(x) = λ e^(−λx) if x > 0;  0 otherwise

• X ∼ Normal(µ, σ²): also known as the Gaussian distribution,

f(x) = (1 / (√(2π) σ)) e^(−(x − µ)² / (2σ²))
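A quick numerical check (Python with NumPy; µ = 1 and σ = 2 are arbitrary values) that this Gaussian density integrates to 1 and has mean µ and variance σ²:

import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10*sigma, mu + 10*sigma, 200_001)
f = np.exp(-(x - mu)**2/(2*sigma**2))/(np.sqrt(2*np.pi)*sigma)

print(np.trapz(f, x))               # ~1.0 (total probability)
print(np.trapz(x*f, x))             # ~1.0 (mean = mu)
print(np.trapz((x - mu)**2*f, x))   # ~4.0 (variance = sigma^2)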


Figure 2: PDF and CDF of a couple of random variables.

3 Two random variables

Thus far, we have considered single random variables. In many situations, however, there may be more than one quantity that we are interested in knowing during a random experiment. For instance, in an experiment where we flip a coin ten times, we may care about both X(ω) = the number of heads that come up as well as Y(ω) = the length of the longest run of consecutive heads. In this section, we consider the setting of two random variables.

3.1 Joint and marginal distributions

Suppose that we have two random variables X and Y . One way to work with these two random variables is to consider each of them separately. If we do that we will only need FX (x) and FY (y).

But if we want to know about the values that X and Y assume simultaneously during outcomes of a random experiment, we require a more complicated structure known as the joint cumulative distribution function of X and Y , defined by

FXY(x, y) = P(X ≤ x, Y ≤ y)

It can be shown that by knowing the joint cumulative distribution function, the probability of any event involving X and Y can be calculated.


The joint CDF FXY(x, y) and the distribution functions FX(x) and FY(y) of each variable separately are related by

FX(x) = lim_{y→∞} FXY(x, y),

FY(y) = lim_{x→∞} FXY(x, y).

Here, we call FX(x) and FY(y) the marginal cumulative distribution functions of FXY(x, y).

Properties:

- 0 ≤ FXY(x, y) ≤ 1.

- lim_{x,y→∞} FXY(x, y) = 1.

- lim_{x,y→−∞} FXY(x, y) = 0.

- FX(x) = lim_{y→∞} FXY(x, y).

3.2 Joint and marginal probability mass functions

If X and Y are discrete random variables, then the joint probability mass function pXY : R × R → [0, 1] is defined by

pXY(x, y) = P(X = x, Y = y).

Here, 0 ≤ pXY(x, y) ≤ 1 for all x, y, and Σ_{x∈Val(X)} Σ_{y∈Val(Y)} pXY(x, y) = 1.

How does the joint PMF over two variables relate to the probability mass function for each variable separately? It turns out that

pX(x) = Σ_{y} pXY(x, y),

and similarly for pY(y). In this case, we refer to pX(x) as the marginal probability mass function of X. In statistics, the process of forming the marginal distribution with respect to one variable by summing out the other variable is often known as "marginalization."

3.3 Joint and marginal probability density functions

Let X and Y be two continuous random variables with joint distribution function FXY. In the case that FXY(x, y) is everywhere differentiable in both x and y, then we can define the joint probability density function,

fXY(x, y) = ∂²FXY(x, y) / (∂x ∂y).

Like in the single-dimensional case, fXY(x, y) ≠ P(X = x, Y = y); rather,

∫∫_{(x,y)∈A} fXY(x, y) dx dy = P((X, Y) ∈ A).

Note that the values of the probability density function fXY(x, y) are always nonnegative, but they may be greater than 1. Nonetheless, it must be the case that ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dx dy = 1.

Analogous to the discrete case, we define

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,

as the marginal probability density function (or marginal density) of X, and similarly for fY(y).


3.4 Conditional distributions

Conditional distributions seek to answer the question: what is the probability distribution over Y, when we know that X must take on a certain value x? In the discrete case, the conditional probability mass function of Y given X is simply

pY|X(y|x) = pXY(x, y) / pX(x),

assuming that pX(x) ≠ 0.

In the continuous case, the situation is technically a little more complicated because the probability that a continuous random variable X takes on a specific value x is equal to zero. Ignoring this technical point, we simply define, by analogy to the discrete case, the conditional probability density of Y given X = x to be

fY|X(y|x) = fXY(x, y) / fX(x),

provided fX(x) ≠ 0.

3.5 Bayes's rule

A useful formula that often arises when trying to derive an expression for the conditional probability of one variable given another is Bayes's rule.

In the case of discrete random variables X and Y,

pY|X(y|x) = pXY(x, y) / pX(x) = pX|Y(x|y) pY(y) / Σ_{y'∈Val(Y)} pX|Y(x|y') pY(y').

If the random variables X and Y are continuous,

fY|X(y|x) = fXY(x, y) / fX(x) = fX|Y(x|y) fY(y) / ∫_{−∞}^{∞} fX|Y(x|y') fY(y') dy'.


3.6 Independence

Two random variables X and Y are independent if FXY(x, y) = FX(x) FY(y) for all values of x and y. Equivalently,

• For discrete random variables, pXY(x, y) = pX(x) pY(y) for all x ∈ Val(X), y ∈ Val(Y).

• For discrete random variables, pY|X(y|x) = pY(y) whenever pX(x) ≠ 0, for all y ∈ Val(Y).

• For continuous random variables, fXY(x, y) = fX(x) fY(y) for all x, y ∈ R.

• For continuous random variables, fY|X(y|x) = fY(y) whenever fX(x) ≠ 0, for all y ∈ R.

To get around the technical point above (that P(X = x) = 0 for a continuous random variable), a more reasonable way to calculate the conditional CDF is

FY|X(y, x) = lim_{∆x→0} P(Y ≤ y | x ≤ X ≤ x + ∆x).

It can easily be seen that if F(x, y) is differentiable in both x and y, then

FY|X(y, x) = ∫_{−∞}^{y} fX,Y(x, α) dα / fX(x),

and therefore we define the conditional PDF of Y given X = x in the following way:

fY|X(y|x) = fXY(x, y) / fX(x).


Informally, two random variables X and Y are independent if "knowing" the value of one variable will never have any effect on the conditional probability distribution of the other variable; that is, you know all the information about the pair (X, Y) by just knowing fX(x) and fY(y). The following lemma formalizes this observation:

Lemma 3.1. If X and Y are independent, then for any subsets A, B ⊆ R, we have

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

By using the above lemma one can prove that if X is independent of Y then any function of X is independent of any function of Y .

3.7 Expectation and covariance

Suppose that we have two discrete random variables X, Y and g : R² → R is a function of these two random variables. Then the expected value of g is defined in the following way,

E[g(X, Y)] ≜ Σ_{x∈Val(X)} Σ_{y∈Val(Y)} g(x, y) pXY(x, y).

For continuous random variables X, Y, the analogous expression is

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY(x, y) dx dy.


We can use the concept of expectation to study the relationship of two random variables with each other. In particular, the covariance of two random variables X and Y is defined as

Cov[X, Y] ≜ E[(X − E[X])(Y − E[Y])].

Using an argument similar to that for variance, we can rewrite this as

Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
          = E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
          = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
          = E[XY] − E[X]E[Y].

Here, the key step in showing the equality of the two forms of covariance is in the third equality, where we use the fact that E[X] and E[Y] are constants which can be pulled out of the expectation. When Cov[X, Y] = 0, we say that X and Y are uncorrelated.

Properties:

- (Linearity of expectation) E[f(X, Y) + g(X, Y)] = E[f(X, Y)] + E[g(X, Y)].

- Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].

- If X and Y are independent, then Cov[X, Y] = 0.

- If X and Y are independent, then E[f(X) g(Y)] = E[f(X)] E[g(Y)].
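These identities are easy to sanity-check numerically; a minimal Monte Carlo sketch (NumPy assumed, with an illustrative pair of correlated variables) is:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Build correlated X and Y from independent standard normals U, V.
u, v = rng.standard_normal(n), rng.standard_normal(n)
x = u
y = 0.6 * u + 0.8 * v                           # Cov[X, Y] should be about 0.6

cov_xy = np.mean(x * y) - x.mean() * y.mean()   # E[XY] - E[X]E[Y]
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy

print(cov_xy)          # ~ 0.6
print(lhs, rhs)        # the two sides agree up to sampling error
```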

4 Multiple random variables

The notions and ideas introduced in the previous section can be generalized to more than two random variables. In particular, suppose that we have n continuous random variables, X1 (ω), X2 (ω), . . . Xn (ω). In this section, for simplicity of presentation, we focus only on the

continuous case, but the generalization to discrete random variables works similarly.

4.1 Basic properties

We can define the joint distribution function of X1, X2, . . . , Xn, the joint probability density function of X1, X2, . . . , Xn, the marginal probability density function of X1, and the conditional probability density function of X1 given X2, . . . , Xn, as

FX1,X2,...,Xn(x1, x2, . . . , xn) = P(X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)

fX1,X2,...,Xn(x1, x2, . . . , xn) = ∂ⁿ FX1,X2,...,Xn(x1, x2, . . . , xn) / ∂x1 · · · ∂xn

fX1(x1) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX1,X2,...,Xn(x1, x2, . . . , xn) dx2 . . . dxn

fX1|X2,...,Xn(x1 | x2, . . . , xn) = fX1,X2,...,Xn(x1, x2, . . . , xn) / fX2,...,Xn(x2, . . . , xn)

To calculate the probability of an event A ⊆ Rⁿ we have

P((x1, x2, . . . , xn) ∈ A) = ∫_{(x1,x2,...,xn)∈A} fX1,X2,...,Xn(x1, x2, . . . , xn) dx1 dx2 . . . dxn    (4)

Chain rule: From the definition of conditional probabilities for multiple random variables, one can show that

f(x1, x2, . . . , xn) = f(xn | x1, x2, . . . , xn−1) f(x1, x2, . . . , xn−1)
                      = f(xn | x1, x2, . . . , xn−1) f(xn−1 | x1, x2, . . . , xn−2) f(x1, x2, . . . , xn−2)
                      = . . . = f(x1) ∏_{i=2}^{n} f(xi | x1, . . . , xi−1).


Independence: For multiple events A1, . . . , Ak, we say that A1, . . . , Ak are mutually independent if for any subset S ⊆ {1, 2, . . . , k}, we have

P(∩_{i∈S} Ai) = ∏_{i∈S} P(Ai).

Likewise, we say that random variables X1 , . . . , Xn are independent if

f (x1 , . . . , xn ) = f (x1 )f (x2 ) · · · f (xn ).

Here, the definition of mutual independence is simply the natural generalization of independence of two random variables to multiple random variables.

Independent random variables arise often in machine learning algorithms where we assume that the training examples belonging to the training set represent independent samples from some unknown probability distribution. To make the significance of independence clear, consider a "bad" training set in which we first sample a single training example (x(1), y(1)) from some unknown distribution, and then add m − 1 copies of the exact same training example to the training set. In this case, we have (with some abuse of notation)

P((x(1), y(1)), . . . , (x(m), y(m))) ≠ ∏_{i=1}^{m} P(x(i), y(i)).

Despite the fact that the training set has size m, the examples are not independent! While clearly the procedure described here is not a sensible method for building a training set for a machine learning algorithm, it turns out that in practice, non-independence of samples does come up often, and it has the effect of reducing the "effective size" of the training set.


4.2 Random vectors

Suppose that we have n random variables. When working with all these random variables together, we will often find it convenient to put them in a vector X = [X1 X2 . . . Xn]^T. We call the resulting vector a random vector (more formally, a random vector is a mapping from the sample space Ω to Rⁿ). It should be clear that random vectors are simply an alternative notation for dealing with n random variables, so the notions of joint PDF and CDF will apply to random vectors as well.

Expectation: Consider an arbitrary function g : Rⁿ → R. The expected value of this function is defined as

E[g(X)] = ∫_{Rⁿ} g(x1, x2, . . . , xn) fX1,X2,...,Xn(x1, x2, . . . , xn) dx1 dx2 . . . dxn,    (5)

where ∫_{Rⁿ} denotes n consecutive integrations from −∞ to ∞. If g is a function from Rⁿ to Rᵐ, then the expected value of g is the element-wise expected value of the output vector, i.e., if

g(x) = [g1(x), g2(x), . . . , gm(x)]^T,

then

E[g(X)] = [E[g1(X)], E[g2(X)], . . . , E[gm(X)]]^T.

Covariance matrix: For a given random vector X : Ω → Rⁿ, its covariance matrix Σ is the n × n square matrix whose entries are given by Σij = Cov[Xi, Xj].

From the definition of covariance, we have

Σ = [ Cov[X1, X1]   · · ·   Cov[X1, Xn]
         . . .
      Cov[Xn, X1]   · · ·   Cov[Xn, Xn] ]

  = [ E[X1²] − E[X1]E[X1]    · · ·   E[X1Xn] − E[X1]E[Xn]
         . . .
      E[XnX1] − E[Xn]E[X1]   · · ·   E[Xn²] − E[Xn]E[Xn] ]

  = E[X X^T] − E[X]E[X]^T = . . . = E[(X − E[X])(X − E[X])^T],

where the matrix expectation is defined in the obvious way.

The covariance matrix has a number of useful properties:

- Σ ⪰ 0; that is, Σ is positive semidefinite.

- Σ = Σ^T; that is, Σ is symmetric.
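A short NumPy sketch (hypothetical data, not from the notes) that estimates a covariance matrix from samples and checks both properties:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100000 samples of a 3-dimensional random vector.
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.2, 0.3, 1.0]])
samples = rng.standard_normal((100_000, 3)) @ mixing.T

# Sample covariance matrix: E[(X - E[X])(X - E[X])^T], estimated from data.
mu = samples.mean(axis=0)
centered = samples - mu
sigma = centered.T @ centered / len(samples)        # same as np.cov(samples.T, bias=True)

print(np.allclose(sigma, sigma.T))                  # symmetric
print(np.all(np.linalg.eigvalsh(sigma) >= -1e-12))  # eigenvalues nonnegative: positive semidefinite
```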

4.3 The multivariate Gaussian distribution

One particularly important example of a probability distribution over random vectors X is called

the multivariate Gaussian or multivariate normal distribution. A random vector X ∈ Rⁿ is said to have a multivariate normal (or Gaussian) distribution with mean µ ∈ Rⁿ and covariance matrix Σ ∈ S++ⁿ (where S++ⁿ refers to the space of symmetric positive definite n × n matrices) if

fX1,X2,...,Xn(x1, x2, . . . , xn; µ, Σ) = 1 / ((2π)^(n/2) |Σ|^(1/2)) · exp( −(1/2) (x − µ)^T Σ⁻¹ (x − µ) ).


We write this as X ∼ N(µ, Σ). Notice that in the case n = 1, this reduces to the regular definition of a normal distribution with mean parameter µ1 and variance Σ11.
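As a concrete sketch (hypothetical µ and Σ; NumPy and SciPy assumed available), one can draw samples from N(µ, Σ) and evaluate the density formula above directly:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters of a 2-D Gaussian.
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])          # symmetric, positive definite

rv = multivariate_normal(mean=mu, cov=sigma)

x = np.array([0.5, -1.5])
print(rv.pdf(x))                        # density f(x; mu, sigma) from SciPy

# Direct evaluation of the same formula, for comparison.
d = x - mu
quad = d @ np.linalg.inv(sigma) @ d
norm_const = (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(sigma))
print(np.exp(-0.5 * quad) / norm_const)

samples = rv.rvs(size=5, random_state=0)   # a few random draws from N(mu, sigma)
print(samples)
```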

Generally speaking, Gaussian random variables are extremely useful in machine learning and statistics for two main reasons. First, they are extremely common when modeling "noise" in statistical algorithms. Quite often, noise can be considered to be the accumulation of a large number of small independent random perturbations affecting the measurement process; by the Central Limit Theorem, summations of independent random variables will tend to "look Gaussian." Second, Gaussian random variables are convenient for many analytical manipulations, because many of the integrals involving Gaussian distributions that arise in practice have simple closed-form solutions.

GAUSSIAN PROCESS:

In probability theory and statistics, a Gaussian process is a stochastic process whose realizations

consist of random values associated with every point in a range of times (or of space) such that each

such random variable has a normal distribution. Moreover, every finite collection of those random variables

has a multivariate normal distribution.

Gaussian processes are important in statistical modeling because of properties inherited from the normal

distribution. For example, if a random process is modeled as a Gaussian process, the distributions of various

derived quantities can be obtained explicitly. Such quantities include: the average value of the process over a

range of times; the error in estimating the average using sample values at a small set of times.

A process is Gaussian if and only if for every finite set of indices t1, ..., tk in the index set T, (Xt1, ..., Xtk) is a vector-valued Gaussian random variable. Using characteristic functions of random variables, the Gaussian property can be formulated as follows: {Xt ; t ∈ T} is Gaussian if and only if, for every finite set of indices t1, ..., tk, there are reals σlj with σll > 0 and reals µj such that

E[ exp( i Σ_l tl Xtl ) ] = exp( −(1/2) Σ_{l,j} σlj tl tj + i Σ_l µl tl ).

The numbers σlj and µj can be shown to be the covariances and means of the variables in the process.
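Because every finite collection of samples of a Gaussian process is jointly Gaussian, a realization at a finite set of times can be drawn by sampling a multivariate normal built from a covariance function. The sketch below uses a squared-exponential covariance chosen purely for illustration (it is not specified in these notes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Times at which we observe the process.
t = np.linspace(0.0, 5.0, 200)

# Illustrative covariance function K(t1, t2) = exp(-(t1 - t2)^2 / (2 * l^2)).
length_scale = 0.5
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / length_scale) ** 2)

# Add a tiny diagonal term for numerical stability, then draw one realization
# of the zero-mean Gaussian process at these times.
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(t)))
x = L @ rng.standard_normal(len(t))    # x ~ N(0, K)

print(x[:5])
```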

NOISE:

In common use, the word noise means any unwanted sound. In both analog and digital

electronics, noise is an unwanted perturbation to a wanted signal; it is called noise as a generalization of the

audible noise heard when listening to a weak radio transmission. Signal noise is heard as acoustic noise if

played through a loudspeaker; it manifests as 'snow' on a television or video image. Noise can block, distort,

change or interfere with the meaning of a message in human, animal and electronic communication.

In signal processing or computing it can be considered unwanted data without meaning; that is, data

that is not being used to transmit a signal, but is simply produced as an unwanted by-product of other activities.

"Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or

irrelevant data in a conversation or exchange, such as off-topic posts and spam in online discussion forums and

other online communities. In information theory, however, noise is still considered to be information. In a

broader sense, film grain or even advertisements encountered while looking for something else can be


considered noise. In biology, noise can describe the variability of a measurement around the mean, for

example transcriptional noise describes the variability in gene activity between cells in a population.

In many of these areas, the special case of thermal noise arises, which sets a fundamental lower limit

to what can be measured or signaled and is related to basic physical processes at the molecular level described

by well-established thermodynamics considerations, some of which are expressible by simple formulae.

SHOT NOISE:

Shot noise consists of random fluctuations of the electric current in an electrical conductor, which are

caused by the fact that the current is carried by discrete charges (electrons). The strength of this noise increases

for growing magnitude of the average current flowing through the conductor. Shot noise is to be distinguished

from current fluctuations in equilibrium, which happen without any applied voltage and without any average

current flowing. These equilibrium current fluctuations are known as Johnson-Nyquist noise.

Shot noise is important in electronics, telecommunication, and fundamental physics.

The strength of the current fluctuations can be expressed by giving the variance of the current about its average value <I>, where <I> is the average ("macroscopic") current. However, the value measured in this way depends on the frequency range of fluctuations which is measured (the "bandwidth" of the measurement): the measured variance of the current grows linearly with bandwidth. Therefore, a more fundamental quantity is the noise power spectral density, which is essentially obtained by dividing the variance by the bandwidth (and therefore has the dimension ampere squared per hertz). It may be defined as the zero-frequency Fourier transform of the current-current correlation function.
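As a numerical illustration, the sketch below uses the standard Schottky spectral density S_I = 2qI for shot noise; this formula is not stated in these notes, so treat it as an assumption of the example:

```python
# RMS shot-noise current over a measurement bandwidth, assuming the standard
# Schottky spectral density S_I = 2 * q * I (not derived in these notes).
q = 1.602e-19          # electron charge, C
I_avg = 1e-3           # average (macroscopic) current, A
bandwidth = 1e6        # measurement bandwidth, Hz

S_I = 2 * q * I_avg                     # A^2 / Hz (independent of frequency)
i_rms = (S_I * bandwidth) ** 0.5        # variance grows linearly with bandwidth

print(f"S_I = {S_I:.3e} A^2/Hz, i_rms = {i_rms:.3e} A")
```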

THERMAL NOISE:

Thermal noise (Johnson–Nyquist noise, Johnson noise, or Nyquist noise) is the electronic noise

generated by the thermal agitation of the charge carriers (usually the electrons) inside an electrical conductor at

equilibrium, which happens regardless of any applied voltage.

Thermal noise is approximately white, meaning that the power spectral density is nearly equal

throughout the frequency spectrum (however see the section below on extremely high frequencies).

Additionally, the amplitude of the signal has very nearly a Gaussian probability density function.

This type of noise was first measured by John B. Johnson at Bell Labs in 1928. He described his

findings to Harry Nyquist, also at Bell Labs, who was able to explain the results.

Noise voltage and power

Thermal noise is distinct from shot noise, which consists of additional current fluctuations that occur

when a voltage is applied and a macroscopic current starts to flow. For the general case, the above definition

applies to charge carriers in any type of conducting medium (e.g. ions in an electrolyte), not just resistors. It

can be modeled by a voltage source representing the noise of the non-ideal resistor in series with an ideal noise

free resistor.

The power spectral density, or voltage variance (mean square) per hertz of bandwidth, is given by

vn²/∆f = 4 kB T R,


where kB is Boltzmann's constant in joules per kelvin, T is the resistor's absolute temperature in kelvins, and R is the resistor value in ohms (Ω). This equation can be used for quick calculations; for example, a 1 kΩ resistor at a temperature of 300 K has a noise voltage density of about 4.07 nV/√Hz.

For a given bandwidth, the root mean square (RMS) of the voltage, vn, is given by

vn = sqrt(4 kB T R ∆f),

where ∆f is the bandwidth in hertz over which the noise is measured. For a 1 kΩ resistor at room temperature

and a 10 kHz bandwidth, the RMS noise voltage is 400 nV. A useful rule of thumb to remember is that 50 Ω at 1 Hz bandwidth corresponds to 1 nV of noise at room temperature.

A resistor in a short circuit dissipates a noise power of

P = vn²/R = 4 kB T ∆f.

The noise generated at the resistor can transfer to the remaining circuit; the maximum noise power

transfer happens with impedance matching when the Thevenin equivalent resistance of the remaining circuit is

equal to the noise generating resistance. In this case each one of the two participating resistors dissipates noise

in both itself and in the other resistor. Since only half of the source voltage drops across any one of these

resistors, the resulting noise power is given by

P = kB T ∆f,

where P is the thermal noise power in watts. Notice that this is independent of the noise-generating resistance.
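A small sketch tying these formulas together (plain Python; the resistor value, temperature and bandwidth are chosen to match the examples above):

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K

def thermal_noise_vrms(R, T, bandwidth):
    """RMS open-circuit noise voltage of a resistor: sqrt(4 k_B T R * bandwidth)."""
    return math.sqrt(4 * k_B * T * R * bandwidth)

def available_noise_power(T, bandwidth):
    """Noise power delivered to a matched load: k_B * T * bandwidth (independent of R)."""
    return k_B * T * bandwidth

# 1 kohm resistor, room temperature (~290 K), 10 kHz bandwidth.
print(thermal_noise_vrms(1e3, 290, 10e3))      # ~4.0e-7 V, i.e. about 400 nV
print(available_noise_power(290, 10e3))        # ~4.0e-17 W
```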

Noise current:

The noise source can also be modeled by a current source in parallel with the resistor by taking the Norton equivalent, which corresponds simply to dividing by R. This gives the root mean square value of the current source as

in = sqrt(4 kB T ∆f / R).

Thermal noise is intrinsic to all resistors and is not a sign of poor design or manufacture, although resistors

may also have excess noise.


Noise power in decibels:

Signal power is often measured in dBm (decibels relative to 1 milliwatt, assuming a 50 ohm load).

From the equation above, noise power in a resistor at room temperature, in dBm, is then:

P(dBm) = 10 log10(kB T ∆f × 1000),

where the factor of 1000 is present because the power is given in milliwatts, rather than watts. This equation can be simplified by separating the constant parts from the bandwidth:

P(dBm) = 10 log10(kB T × 1000) + 10 log10(∆f),

which, at T = 290 K, is more commonly seen approximated as:

P(dBm) ≈ −174 + 10 log10(∆f / 1 Hz).

Noise power at different bandwidths is then simple to calculate:

Bandwidth (∆f) Thermal noise power Notes

1 Hz −174 dBm

10 Hz −164 dBm

100 Hz −154 dBm

1 kHz −144 dBm

10 kHz −134 dBm FM channel of 2-way radio

100 kHz −124 dBm

180 kHz −121.45 dBm One LTE resource block

200 kHz −120.98 dBm One GSM channel (ARFCN)

1 MHz −114 dBm

2 MHz −111 dBm Commercial GPS channel

6 MHz −106 dBm Analog television channel

20 MHz −101 dBm WLAN 802.11 channel
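The table can be reproduced with a few lines of Python (T = 290 K assumed, matching the −174 dBm/Hz approximation):

```python
import math

k_B = 1.380649e-23
T = 290.0   # standard noise temperature, K

def thermal_noise_dbm(bandwidth_hz):
    """Thermal noise power in dBm over the given bandwidth at T = 290 K."""
    return 10 * math.log10(k_B * T * bandwidth_hz * 1000)

for bw in (1, 10e3, 200e3, 1e6, 20e6):
    print(f"{bw:>10.0f} Hz : {thermal_noise_dbm(bw):7.2f} dBm")
# ~ -173.98, -133.98, -120.97, -113.98, -100.97 dBm, matching the table rows
```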

Thermal noise on capacitors:

Thermal noise on capacitors is referred to as kTC noise. Thermal noise in an RC circuit has an unusually simple expression, as the value of the resistance (R) drops out of the equation. This is because higher R contributes to more filtering as well as to more noise. The noise bandwidth of the RC circuit is 1/(4RC), which can be substituted into the above formula to eliminate R. The mean-square and RMS noise voltage generated in such a filter are:

vn² = kB T / C,    vn = sqrt(kB T / C).


Thermal noise accounts for 100% of kTC noise, whether it is attributed to the resistance or to the

capacitance.

In the extreme case of the reset noise left on a capacitor by opening an ideal switch, the resistance is

infinite, yet the formula still applies; however, now the RMS must be interpreted not as a time average, but as

an average over many such reset events, since the voltage is constant when the bandwidth is zero. In this sense,

the Johnson noise of an RC circuit can be seen to be inherent, an effect of the thermodynamic distribution of

the number of electrons on the capacitor, even without the involvement of a resistor.

The noise is not caused by the capacitor itself, but by the thermodynamic equilibrium of the amount

of charge on the capacitor. Once the capacitor is disconnected from a conducting circuit, the thermodynamic

fluctuation is frozen at a random value with standard deviation as given above.

The reset noise of capacitive sensors is often a limiting noise source, for example in image sensors.

As an alternative to the voltage noise, the reset noise on the capacitor can also be quantified as the electrical charge standard deviation,

Qn = sqrt(kB T C).

Since the charge variance is kB T C, this noise is often called kTC noise.

Any system in thermal equilibrium has state variables with a mean energy of kT/2 per degree of

freedom. Using the formula for energy on a capacitor (E = ½CV2), mean noise energy on a capacitor can be

seen to also be ½C(kT/C), or also kT/2. Thermal noise on a capacitor can be derived from this relationship,

without consideration of resistance.

The kTC noise is the dominant noise source at small capacitors.

Noise of capacitors at 300 K

Capacitance    Noise voltage sqrt(kB T / C)    Electrons

1 fF 2 mV 12.5 e–

10 fF 640 µV 40 e–

100 fF 200 µV 125 e–

1 pF 64 µV 400 e–

10 pF 20 µV 1250 e–

100 pF 6.4 µV 4000 e–

1 nF 2 µV 12500 e–
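The rows of this table follow directly from vn = sqrt(kB T / C) and Qn = sqrt(kB T C); a quick check in Python:

```python
import math

k_B = 1.380649e-23
T = 300.0
q_e = 1.602e-19     # electron charge, C

for C in (1e-15, 1e-13, 1e-12, 1e-9):             # 1 fF, 100 fF, 1 pF, 1 nF
    v_rms = math.sqrt(k_B * T / C)                # reset (kTC) noise voltage
    electrons = math.sqrt(k_B * T * C) / q_e      # charge noise in electrons
    print(f"C = {C:.0e} F : {v_rms * 1e6:8.1f} uV, {electrons:8.1f} e-")
# e.g. 1 fF -> ~2035 uV (~2 mV) and ~12.7 e-, matching the table
```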

Noise at very high frequencies:

The above equations are good approximations at any practical radio frequency in use (i.e. frequencies

below about 80 gigahertz). In the most general case, which includes up to optical frequencies, the power

spectral density of the voltage across the resistor R, in V²/Hz, is given by:

vn²/∆f = 4 R h f / (exp(h f / (kB T)) − 1),


where f is the frequency, h Planck's constant, kB Boltzmann constant and T the temperature in kelvins. If the

frequency is low enough, that means

h f ≪ kB T

(this assumption is valid up to a few terahertz at room temperature), then the exponential can be expanded in terms of its Taylor series. The relationship then becomes:

vn²/∆f ≈ 4 kB T R.

In general, both R and T depend on frequency. In order to know the total noise it is enough to

integrate over all the bandwidth. Since the signal is real, it is possible to integrate over only the positive

frequencies, then multiply by 2. Assuming that R and T are constants over all the bandwidth ∆f, then the root

mean square (RMS) value of the voltage across a resistor due to thermal noise is given by

vn = sqrt(4 kB T R ∆f),

that is, the same formula as above.

WHITE NOISE:

White noise is a random signal (or process) with a flat power spectral density. In other words, the

signal contains equal power within a fixed bandwidth at any center frequency. White noise draws its name

from white light in which the power spectral density of the light is distributed over the visible band in such a

way that the eye's three color receptors (cones) are approximately equally stimulated. In a statistical sense, a time series rt is called white noise if rt is a sequence of independent and identically distributed (iid) random variables with finite mean and variance. In particular, if rt is normally distributed with mean zero and variance σ², the series is called Gaussian white noise.

An infinite-bandwidth white noise signal is a purely theoretical construction. The bandwidth of white noise is

limited in practice by the mechanism of noise generation, by the transmission medium and by finite

observation capabilities. A random signal is considered "white noise" if it is observed to have a flat spectrum

over a medium's widest possible bandwidth.

WHITE NOISE IN A SPATIAL CONTEXT:

While it is usually applied in the context of frequency domain signals, the term white noise is also

commonly applied to a noise signal in the spatial domain. In this case, it has an auto correlation which can be

represented by a delta function over the relevant space dimensions. The signal is then "white" in the spatial

frequency domain (this is equally true for signals in the angular frequency domain, e.g., the distribution of a

signal across all angles in the night sky).


STATISTICAL PROPERTIES:

(Figure: a finite-length, discrete-time realization of a white noise process generated on a computer.)

Being uncorrelated in time does not restrict the values a signal can take. Any distribution of values is

possible (although it must have zero DC components). Even a binary signal which can only take on the values

1 or -1 will be white if the sequence is statistically uncorrelated. Noise having a continuous distribution, such

as a normal distribution, can of course be white.

It is often incorrectly assumed that Gaussian noise (i.e., noise with a Gaussian amplitude distribution — see

normal distribution) is necessarily white noise, yet neither property implies the other. Gaussianity refers to the

probability distribution with respect to the value i.e. the probability that the signal has a certain given value,

while the term 'white' refers to the way the signal power is distributed over time or among frequencies.

We can therefore find Gaussian white noise, but also Poisson, Cauchy, etc. white noises. Thus, the

two words "Gaussian" and "white" are often both specified in mathematical models of systems. Gaussian white

noise is a good approximation of many real-world situations and generates mathematically tractable models.

These models are used so frequently that the term additive white Gaussian noise has a standard abbreviation:

AWGN. Gaussian white noise has the useful statistical property that its values are independent (see Statistical

independence).

White noise is the generalized mean-square derivative of the Wiener process or Brownian motion.
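A brief sketch (NumPy and SciPy assumed available) that generates discrete-time Gaussian white noise and verifies that its estimated power spectral density is approximately flat:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(3)
fs = 1000.0                       # sampling rate, Hz
sigma = 2.0                       # noise standard deviation

w = sigma * rng.standard_normal(200_000)   # iid N(0, sigma^2): Gaussian white noise

f, psd = welch(w, fs=fs, nperseg=1024)     # estimate the power spectral density
print(psd.mean(), sigma**2 / (fs / 2))     # flat one-sided level ~ sigma^2 / (fs/2)
```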

APPLICATIONS:

It is used by some emergency vehicle sirens due to its ability to cut through background noise, which

makes it easier to locate.

White noise is commonly used in the production of electronic music, usually either directly or as an

input for a filter to create other types of noise signal. It is used extensively in audio synthesis, typically to

recreate percussive instruments such as cymbals which have high noise content in their frequency domain.

It is also used to generate impulse responses. To set up the equalization (EQ) for a concert or other

performance in a venue, a short burst of white or pink noise is sent through the PA system and monitored from

various points in the venue so that the engineer can tell if the acoustics of the building naturally boost or cut

any frequencies. The engineer can then adjust the overall equalization to ensure a balanced mix.

White noise can be used for frequency response testing of amplifiers and electronic filters. It is not

used for testing loudspeakers as its spectrum contains too great an amount of high frequency content. Pink

noise is used for testing transducers such as loudspeakers and microphones.

White noise is a common synthetic noise source used for sound masking by a tinnitus masker.[1] White noise is a particularly good source signal for masking devices as it contains higher frequencies in equal

volumes to lower ones, and so is capable of more effective masking for high pitched ringing tones most

commonly perceived by tinnitus sufferers.


White noise is used as the basis of some random number generators. For example, Random.org uses a

system of atmospheric antennae to generate random digit patterns from white noise.

White noise machines and other white noise sources are sold as privacy enhancers and sleep aids and

to mask tinnitus. Some people claim white noise, when used with headphones, can aid concentration by

masking irritating or distracting noises in a person's environment.

MATHEMATICAL DEFINITION:

White random vector:

A random vector w is a white random vector if and only if its mean vector and autocorrelation matrix are the following:

µw = E[w] = 0,    Rww = E[w w^T] = σ² I.

That is, it is a zero-mean random vector, and its autocorrelation matrix is a multiple of the identity matrix. When the autocorrelation matrix is a multiple of the identity, we say that it has spherical correlation.

White random process (white noise)

A continuous-time random process w(t), t ∈ R, is a white noise process if and only if its mean function and autocorrelation function satisfy the following:

µw(t) = E[w(t)] = 0,    Rww(t1, t2) = E[w(t1) w(t2)] = (N0/2) δ(t1 − t2),

i.e. it is a zero-mean process for all time and has infinite power at zero time shift since its autocorrelation function is the Dirac delta function. The above autocorrelation function implies the following power spectral density:

Sww(f) = N0/2,

since the Fourier transform of the delta function is equal to 1. Since this power spectral density is the same at all frequencies, we call it white as an analogy to the frequency spectrum of white light.

A generalization to random elements on infinite dimensional spaces, such as random fields, is the

white noise measure.

Random vector transformations

Two theoretical applications using a white random vector are the simulation and whitening of another

arbitrary random vector. To simulate an arbitrary random vector, we transform a white random vector with a

carefully chosen matrix. We choose the transformation matrix so that the mean and covariance matrix of the


transformed white random vector matches the mean and covariance matrix of the arbitrary random vector that

we are simulating. To whiten an arbitrary random vector, we transform it by a different carefully chosen matrix

so that the output random vector is a white random vector.

These two ideas are crucial in applications such as channel estimation and channel equalization in

communications and audio. These concepts are also used in data compression.

Simulating a random vector

Suppose that a random vector x has mean µ and covariance matrix Kxx. Since this matrix is Hermitian symmetric and positive semidefinite, by the spectral theorem from linear algebra, we can diagonalize or factor the matrix in the following way:

Kxx = E Λ E^T,

where E is the orthogonal matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues.

We can simulate the 1st and 2nd moment properties of this random vector with mean µ and covariance matrix Kxx via the following transformation of a white vector w of unit variance:

x = H w + µ,  where  H = E Λ^(1/2).

Thus, the output of this transformation has expectation E[x] = µ and covariance matrix E[(x − µ)(x − µ)^T] = H H^T = E Λ E^T = Kxx.
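A NumPy sketch of this construction, with a hypothetical target mean and covariance:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical target mean and covariance of the vector we want to simulate.
mu = np.array([1.0, -1.0, 0.5])
K_xx = np.array([[2.0, 0.8, 0.2],
                 [0.8, 1.5, 0.4],
                 [0.2, 0.4, 1.0]])

# Factor K_xx = E diag(lam) E^T and form the coloring transform H = E diag(sqrt(lam)).
lam, E = np.linalg.eigh(K_xx)
H = E @ np.diag(np.sqrt(lam))

# Transform unit-variance white vectors: x = H w + mu.
w = rng.standard_normal((3, 100_000))
x = H @ w + mu[:, None]

print(x.mean(axis=1))   # ~ mu
print(np.cov(x))        # ~ K_xx
```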

Whitening a random vector

The method for whitening a vector x with mean µ and covariance matrix Kxx is to perform the following calculation:

w = Λ^(−1/2) E^T (x − µ).

Thus, the output of this transformation has expectation E[w] = 0 and covariance matrix E[w w^T] = Λ^(−1/2) E^T Kxx E Λ^(−1/2).


By diagonalizing Kxx, we get the following:

Λ^(−1/2) E^T (E Λ E^T) E Λ^(−1/2) = Λ^(−1/2) Λ Λ^(−1/2) = I.

Thus, with the above transformation, we can whiten the random vector to have zero mean and the identity covariance matrix.
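Continuing the previous sketch (reusing x, mu, lam and E defined there), the inverse transform whitens the simulated vectors:

```python
# Whitening transform: w = diag(1/sqrt(lam)) E^T (x - mu).
W = np.diag(1.0 / np.sqrt(lam)) @ E.T
w_hat = W @ (x - mu[:, None])

print(w_hat.mean(axis=1))   # ~ 0
print(np.cov(w_hat))        # ~ identity matrix
```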

Random signal transformations

The matrix transformations above do not carry over directly to continuous-time random signals or processes; instead, filters play the same role. For simulating, we create a filter into which we feed a white noise signal. We choose the filter so that the output signal simulates the 1st and 2nd moments of any arbitrary random process. For whitening, we feed any arbitrary random signal into a specially chosen filter so that the output of the filter is a white noise signal.

Simulating a continuous-time random signal

White noise fed into a linear, time-invariant filter to simulate the 1st and 2nd moments of an arbitrary

random process.

We can simulate any wide-sense stationary, continuous-time random process x(t) with constant mean µ, covariance function

Kx(τ) = E[(x(t) − µ)(x(t + τ) − µ)],

and power spectral density

Sx(ω) = ∫_{−∞}^{∞} Kx(τ) e^{−jωτ} dτ.

We can simulate this signal using frequency-domain techniques. Because Kx(τ) is Hermitian symmetric and positive semi-definite, it follows that Sx(ω) is real and can be factored as

Sx(ω) = H(ω) H*(ω) = |H(ω)|²,


if and only if Sx(ω) satisfies the Paley-Wiener criterion.

If Sx(ω) is a rational function, we can then factor it into pole-zero form.

Choosing a minimum phase H(ω) so that its poles and zeros lie inside the left half s-plane, we can

then simulate x(t) with H(ω) as the transfer function of the filter.

We can simulate x(t) by constructing the following linear, time-invariant filter:

x(t) = µ + h(t) * w(t),

where h(t) is the impulse response corresponding to the transfer function H(ω) and w(t) is a continuous-time, white-noise signal with the following 1st and 2nd moment properties:

E[w(t)] = 0,    Rw(τ) = E[w(t) w(t + τ)] = δ(τ).

Thus, the resultant signal has the same 2nd moment properties as the desired signal x(t).

Whitening a continuous-time random signal

An arbitrary random process x(t) fed into a linear, time-invariant filter that whitens x(t) to create

white noise at the output.

Suppose we have a wide-sense stationary, continuous-time random process

defined with the same mean µ, covariance function Kx(τ), and power spectral density Sx(ω) as above.

We can whiten this signal using frequency domain techniques. We factor the power spectral density

Sx(ω) as described above.


Choosing the minimum phase H(ω) so that its poles and zeros lie inside the left half s-plane, we can then whiten x(t) with the following inverse filter:

Hinv(ω) = 1 / H(ω).

We choose the minimum phase filter so that the resulting inverse filter is stable. Additionally, we must be sure that H(ω) is strictly positive for all ω so that Hinv(ω) does not have any singularities.

The final form of the whitening procedure is as follows:

so that w(t) is a white noise random process with zero mean and constant, unit power spectral density

Note that this power spectral density corresponds to a delta function for the covariance function of w(t).

Narrowband Noise Representation

In most communication systems, we are often dealing with band-pass filtering of

signals. Wideband noise will be shaped into band limited noise. If the bandwidth of the

band limited noise is relatively small compared to the carrier frequency, we refer to this as

narrowband noise.

We can derive the power spectral density Gn(f) and the autocorrelation function Rnn(τ) of the narrowband noise and use them to analyze the performance

of linear systems. In practice, we often deal with mixing (multiplication), which is a

non-linear operation, and the system analysis becomes difficult. In such a case, it is

useful to express the narrowband noise as

n(t) = x(t) cos 2πfct − y(t) sin 2πfct    (1)

where fc is the carrier frequency within the band occupied by the noise. x(t) and y(t)

are known as the quadrature components of the noise n(t). The Hilbert transform of

n(t) is


n̂(t) = H[n(t)] = x(t) sin 2πfct + y(t) cos 2πfct    (2)

Proof:

The Fourier transform of n(t) is

N(f) = (1/2) X(f − fc) + (1/2) X(f + fc) + (j/2) Y(f − fc) − (j/2) Y(f + fc).

Let N̂(f) be the Fourier transform of n̂(t). In the frequency domain, N̂(f) = N(f)[−j sgn(f)]; that is, we simply multiply all positive-frequency components of N(f) by −j and all negative-frequency components of N(f) by +j. Thus,

N̂(f) = −j (1/2) X(f − fc) + j (1/2) X(f + fc) + (−j)(j/2) Y(f − fc) + (j)(−j/2) Y(f + fc)

     = −(j/2) X(f − fc) + (j/2) X(f + fc) + (1/2) Y(f − fc) + (1/2) Y(f + fc),

and the inverse Fourier transform of N̂(f) is

n̂(t) = x(t) sin 2πfct + y(t) cos 2πfct.


The quadrature components x(t) and y(t) can now be derived from equations (1)

and (2).

x(t) = n(t) cos 2πfct + n̂(t) sin 2πfct    (3)

and

y(t) = n̂(t) cos 2πfct − n(t) sin 2πfct.    (4)

Given n(t), the quadrature components x(t) and y(t) can be obtained by using the

arrangement.

x(t) and y(t) have the following properties:

1. E[x(t) y(t)] = 0; x(t) and y(t) are uncorrelated with each other.

2. x(t) and y(t) have the same means and variances as n(t).

3. If n(t) is Gaussian, then x(t) and y(t) are also Gaussian.

4. x(t) and y(t) have identical power spectral densities, related to the power spectral density of n(t) by

Gx(f) = Gy(f) = Gn(f − fc) + Gn(f + fc)    (5)

for | f | ≤ 0.5B (and zero elsewhere), where B is the bandwidth of n(t) centred on fc.

Proof:

Equation (5) is the key that will enable us to calculate the effect of noise on AM and FM

systems. It implies that the power spectral density of x(t) and y(t) can be found by

shifting the positive portion and negative portion of Gn(f) to zero frequency and adding to

give Gx(f) and Gy(f).

In the special case where G n(f) is symmetrical about the carrier frequency fc,

the positive- and negative-frequency contributions are shifted to zero frequency and added to

give

Gx(f) = Gy(f) = 2Gn(f- fc) = 2Gn(f+ fc) (6)
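A quick numerical sketch of equations (1)–(4); SciPy's hilbert (which returns the analytic signal) is assumed available, and the carrier and band edges are illustrative values, not from the notes:

```python
import numpy as np
from scipy.signal import butter, lfilter, hilbert

rng = np.random.default_rng(5)
fs, fc = 100e3, 20e3                      # sample rate and carrier (illustrative)
t = np.arange(200_000) / fs

# Narrowband noise: white Gaussian noise passed through a band-pass filter around fc.
b, a = butter(4, [18e3, 22e3], btype="band", fs=fs)
n = lfilter(b, a, rng.standard_normal(len(t)))

# Hilbert transform, then equations (3) and (4).
n_hat = np.imag(hilbert(n))               # hilbert() returns n + j*n_hat
x = n * np.cos(2 * np.pi * fc * t) + n_hat * np.sin(2 * np.pi * fc * t)
y = n_hat * np.cos(2 * np.pi * fc * t) - n * np.sin(2 * np.pi * fc * t)

# Properties: x and y are (nearly) uncorrelated and have the same variance as n.
print(np.mean(x * y), np.var(n), np.var(x), np.var(y))
```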


Performance of Binary FSK:

Consider the synchronous detector of binary FSK signals. In the presence of additive white Gaussian

noise (AWGN), w(t), the received signal is

r(t) = A cos 2πfc1t + w(t)

where A is a constant and fc1 is the carrier frequency employed if a 1 has been sent. The signals at the output of

the band-pass filters of centre frequencies fc1 and fc2

are

r1(t) = A cos 2πfc1t + n1(t)

and

r2(t) = n2(t),

where

n1(t) = x1(t) cos 2πfc1t − y1(t) sin 2πfc1t

and

n2(t) = x2(t) cos 2πfc2t − y2(t) sin 2πfc2t

are the narrowband noise components. With appropriate design of the low-pass filter and sampling period, the sampled output signals are

vo1 = A + x1,

vo2 = x2, and

v = vo1 − vo2 = A + [x1 − x2].

x1 and x2 are statistically independent Gaussian random variables with zero mean and fixed variance σ² = N, where N is the noise power. It can be seen that one of the detectors has signal plus noise, while the other detector has noise only.

When fc2 is the carrier frequency employed for sending a 0, the received signal is r(t) = A cos 2πfc2t + w(t). It can be shown that


v = -A + [x1 - x2]

Since E[(x1 − x2)²] = E[x1²] − 2E[x1x2] + E[x2²] = E[x1²] + E[x2²] = σ² + σ², the total variance is σt² = 2σ².
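A tiny Monte Carlo check of this variance result; the final two lines go one step beyond the notes and assume a sign-based decision rule v > 0 for a transmitted 1, giving an error probability Q(A/σt):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
A, sigma = 1.0, 0.5
n_trials = 1_000_000

x1 = rng.normal(0.0, sigma, n_trials)
x2 = rng.normal(0.0, sigma, n_trials)

print(np.var(x1 - x2), 2 * sigma**2)               # total variance ~ 2 * sigma^2

# Assumed decision rule: decide "1" if v = A + (x1 - x2) > 0; an error occurs if v < 0.
v = A + (x1 - x2)
p_err_sim = np.mean(v < 0)
p_err_theory = norm.sf(A / (np.sqrt(2) * sigma))   # Q(A / sigma_t)
print(p_err_sim, p_err_theory)
```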

NOISE TEMPERATURE:

In electronics, noise temperature is a temperature (in kelvins) assigned to a component such that the noise power delivered by the noisy component to a noiseless matched resistor is given by

PRL = kB Ts Bn

in watts, where:

kB is the Boltzmann constant (1.381×10⁻²³ J/K, joules per kelvin)

Ts is the noise temperature (K)

Bn is the noise bandwidth (Hz)

Engineers often model noisy components as an ideal component in series with a noisy resistor. The source resistor is often assumed to be at room temperature, conventionally taken as 290 K (17 °C, 62 °F).

APPLICATIONS:

A communications system is typically made up of a transmitter, a communications channel, and a

receiver. The communications channel may consist of any one or a combination of many different physical media

(air, coaxial cable, printed wiring board traces…). The important thing to note is that no matter what physical

media the channel consists of, the transmitted signal will be randomly corrupted by a number of different

processes. The most common form of signal degradation is called additive noise.

The additive noise in a receiving system can be of thermal origin (thermal noise) or can be from other noise-

generating processes. Most of these other processes generate noise whose spectrum and probability distributions

are similar to thermal noise. Because of these similarities, the contributions of all noise sources can be lumped

together and regarded as thermal noise. The noise power generated by all these sources (Pn) can be described by assigning to the noise a noise temperature (Tn) defined as:

Tn = Pn / (kB Bn)

In a wireless communications receiver, Tn would equal the sum of two noise temperatures:

Tn = Tant + Tsys

Tant is the antenna noise temperature and determines the noise power seen at the output of the antenna. The physical temperature of the antenna has no effect on Tant. Tsys is the noise temperature of the receiver circuitry and is representative of the noise generated by the non-ideal components inside the receiver.


NOISE FACTOR AND NOISE FIGURE:

An important application of noise temperature is its use in the determination of a component's noise factor. The noise factor quantifies the noise power that the component adds to the system when its input noise temperature is T0:

F = 1 + Te / T0.

The noise factor (a linear term) can be converted to noise figure (in decibels) using:

NF = 10 log10(F).

NOISE TEMPERATURE OF A CASCADE:

If there are multiple noisy components in cascade, the noise temperature of the cascade can be calculated using the Friis equation:

Tcas = T1 + T2/G1 + T3/(G1 G2) + · · · + Tn/(G1 G2 · · · Gn−1)

where

Tcas = cascade noise temperature

T1 = noise temperature of the first component in the cascade

T2 = noise temperature of the second component in the cascade

T3 = noise temperature of the third component in the cascade

Tn = noise temperature of the nth component in the cascade

G1 = linear gain of the first component in the cascade

G2 = linear gain of the second component in the cascade

G3 = linear gain of the third component in the cascade

Gn−1 = linear gain of the (n−1)th component in the cascade

Components early in the cascade have a much larger influence on the overall noise temperature than those

later in the chain. This is because noise introduced by the early stages is, along with the signal, amplified by the

later stages. The Friis equation shows why a good quality preamplifier is important in a receive chain.
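A short helper implementing the Friis cascade formula (the stage values are illustrative, not from the notes):

```python
def cascade_noise_temperature(stages):
    """stages: list of (noise_temperature_K, linear_gain) tuples, first stage first."""
    total, gain_product = 0.0, 1.0
    for temperature, gain in stages:
        total += temperature / gain_product   # each stage's noise is divided by the preceding gain
        gain_product *= gain
    return total

# Example: LNA (35 K, gain 100) followed by a mixer (1000 K, gain 0.5) and an IF amp (500 K, gain 1000).
stages = [(35.0, 100.0), (1000.0, 0.5), (500.0, 1000.0)]
print(cascade_noise_temperature(stages))   # 35 + 10 + 10 = 55 K; the first stage dominates
```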

MEASURING NOISE TEMPERATURE:

The direct measurement of a component‘s noise temperature is a difficult process. Suppose that the noise

temperature of a low noise amplifier (LNA) is measured by connecting a noise source to the LNA with a piece of

transmission line. From the cascade noise temperature it can be seen that the noise temperature of the transmission line has the potential of being the largest contributor to the output measurement (especially when you consider that LNAs can have noise temperatures of only a few kelvin). To accurately measure the noise temperature of the


LNA the noise from the input coaxial cable needs to be accurately known. This is difficult because poor surface

finishes and reflections in the transmission line make actual noise temperature values higher than those predicted

by theoretical analysis.

Similar problems arise when trying to measure the noise temperature of an antenna. Since the noise

temperature is heavily dependent on the orientation of the antenna, the direction that the antenna was pointed

during the test needs to be specified. In receiving systems, the system noise temperature will have three main contributors: the antenna, the transmission line, and the receiver circuitry. The antenna noise

temperature is considered to be the most difficult to measure because the measurement must be made in the field

on an open system. One technique for measuring antenna noise temperature involves using cryogenically cooled

loads to calibrate a noise figure meter before measuring the antenna. This provides a direct reference comparison at

a noise temperature in the range of very low antenna noise temperatures, so that little extrapolation of the collected

data is required.

NOISE FIGURE:

Noise figure (NF) is a measure of degradation of the signal-to-noise ratio (SNR), caused by components

in a radio frequency (RF) signal chain. The noise figure is defined as the ratio of the output noise power of a device

to the portion thereof attributable to thermal noise in the input termination at standard noise temperature T0 (usually

290 K). The noise figure is thus the ratio of actual output noise to that which would remain if the device itself did

not introduce noise. It is a number by which the performance of a radio receiver can be specified.

The noise figure is the difference in decibels (dB) between the noise output of the actual receiver and the noise output of an "ideal" receiver with the same overall gain and bandwidth when the receivers are connected to

sources at the standard noise temperature T0 (usually 290 K). The noise power from a simple load is equal to kTB,

where k is Boltzmann's constant, T is the absolute temperature of the load (for example a resistor), and B is the

measurement bandwidth.

This makes the noise figure a useful figure of merit for terrestrial systems where the antenna effective

temperature is usually near the standard 290 K. In this case, one receiver with a noise figure say 2 dB better than

another, will have an output signal to noise ratio that is about 2 dB better than the other. However, in the case of

satellite communications systems, where the antenna is pointed out into cold space, the antenna effective

temperature is often colder than 290 K. In these cases a 2 dB improvement in receiver noise figure will result in

more than a 2 dB improvement in the output signal to noise ratio. For this reason, the related figure of effective

noise temperature is therefore often used instead of the noise figure for characterizing satellite-communication

receivers and low noise amplifiers.

In heterodyne systems, output noise power includes spurious contributions from image-frequency

transformation, but the portion attributable to thermal noise in the input termination at standard noise temperature

includes only that which appears in the output via the principal frequency transformation of the system and

excludes that which appears via the image frequency transformation.

DEFINITION:

The noise factor of a system is defined as:

F = SNRin / SNRout,


where SNRin and SNRout are the input and output power signal-to-noise ratios, respectively. The noise figure is defined as:

NF = SNRin,dB − SNRout,dB,

where SNRin,dB and SNRout,dB are in decibels (dB). The noise figure is the noise factor, given in dB:

NF = 10 log10(F).

These formulae are only valid when the input termination is at standard noise temperature T0, although in practice small differences in temperature do not significantly affect the values.

The noise factor of a device is related to its noise temperature Te:

F = 1 + Te / T0.

Devices with no gain (e.g., attenuators) have a noise factor F equal to their attenuation L (absolute value, not in dB) when their physical temperature equals T0. More generally, for an attenuator at a physical temperature T, the noise temperature is Te = (L − 1)T, giving a noise factor of:

F = 1 + (L − 1)T / T0.

If several devices are cascaded, the total noise factor can be found with Friis' formula:

F = F1 + (F2 − 1)/G1 + (F3 − 1)/(G1 G2) + · · · + (Fn − 1)/(G1 G2 · · · Gn−1),

where Fn is the noise factor for the n-th device and Gn is the power gain (linear, not in dB) of the n-th device. In a well designed receive chain, only the noise factor of the first amplifier should be significant.
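A compact sketch of these conversions and of Friis' formula for cascaded noise factor (component values are illustrative):

```python
import math

T0 = 290.0   # standard noise temperature, K

def nf_db(noise_factor):
    """Noise figure in dB from a linear noise factor."""
    return 10 * math.log10(noise_factor)

def noise_factor_from_te(te):
    """Noise factor from equivalent noise temperature Te."""
    return 1 + te / T0

def cascade_noise_factor(stages):
    """stages: list of (noise_factor, linear_gain); Friis' formula for noise factor."""
    total, gain_product = 0.0, 1.0
    for i, (f, g) in enumerate(stages):
        total += f if i == 0 else (f - 1) / gain_product
        gain_product *= g
    return total

# Example: LNA with Te = 35 K and 20 dB gain, followed by a stage with NF = 10 dB and 10 dB gain.
f1 = noise_factor_from_te(35.0)                 # ~1.12 (NF ~0.5 dB)
f2 = 10 ** (10.0 / 10)                          # NF 10 dB -> factor 10
total_f = cascade_noise_factor([(f1, 100.0), (f2, 10.0)])
print(nf_db(total_f))                           # ~0.83 dB: the first stage dominates
```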


IMPORTANT QUESTIONS

PART A

1. Define noise figure.

2. What is white noise?

3. What is thermal noise? Give the expression for the thermal noise voltage across a resistor.

4. What is shot noise?

5. Define noise temperature.

6. Find the thermal noise voltage developed across a resistor of 700 Ω. The bandwidth of the measuring instrument is 7 MHz and the ambient temperature is 27 °C.

7. Define a random variable?

8. What is a random process?

9. What is Gaussian process?

10. What is a stationary random process?

PART B

1. Derive the effective noise temperature of a cascade amplifier. Explain how the various

noises are generated in the method of representing them. (16)

2. Explain how the various noises are generated and the method of representing them. (16)

3. Write notes on noise temperature and noise figure. (8)

4. Derive the noise figure for cascade stages. (8)

5. What is narrowband noise discuss the properties of the quadrature components of a

narrow band noise. (8)

6. What is meant by noise equivalent bandwidth? Illustrate it with a diagram. (8)

7. Derive the expression for output signal to noise for a DSB-SC receiver using coherent

detection. (16)

8. Write short notes on noise in SSB. (16)

9. Discuss the following: . (16)

i) Noise equivalent bandwidth (4)

ii) Narrow band noise (4)

iii) Noise temperature (4)

iv) Noise spectral density (4)

12. How sine wave plus noise is represented? Obtain the joint PDF of such noise

Component. (16)


UNIT IV NOISE CHARACTERIZATION

Superheterodyne radio receiver and its characteristic.

SNR.

Noise in DSBSC systems using coherent detection.

Noise in AM systems using envelope detection and in FM systems.

FM threshold effect.

Pre-emphasis and de-emphasis in FM.

Comparison of performances.


SUPERHETERODYNE RADIO RECEIVER:

In electronics, a superheterodyne receiver uses frequency mixing or heterodyning to convert a received

signal to a fixed intermediate frequency, which can be more conveniently processed than the original radio carrier

frequency. Virtually all modern radio and television receivers use the superheterodyne principle.

DESIGN AND EVOLUTION:

Schematic of a typical superheterodyne receiver.

The diagram shows the minimum requirements for a single-conversion superheterodyne receiver design. The essential elements, common to all superheterodyne circuits, are: a signal-receiving antenna, a broadband R.F. amplifier, a variable-frequency local oscillator, a frequency mixer, a band-pass filter to remove unwanted mixer product signals, and a demodulator to recover the original audio signal. Cost-optimized designs use one active device for both local oscillator and mixer, called a "converter" stage. One example is the pentagrid converter.

Circuit description:

A suitable antenna is required to receive the chosen range of broadcast signals. The signal received is very small, sometimes only a few microvolts. Reception starts with the antenna signal fed to the R.F. stage. The R.F.

amplifier stage must be selectively tuned to pass only the desired range of channels required. To allow the receiver

to be tuned to a particular broadcast channel a method of changing the frequency of the local oscillator is needed.

The tuning circuit in a simple design may use a variable capacitor, or varicap diode. Only one or two tuned stages

need to be adjusted to track over the tuning range of the receiver.

Mixer stage

The signal is then fed into the mixer stage circuit. The mixer is also fed with a signal from the variable frequency local oscillator (VFO) circuit. The mixer produces both sum and difference beat frequency signals, each one containing a copy of the desired signal. The four frequencies at the output include the wanted signal fd, the original fLO, and the two new frequencies fd+fLO and fd−fLO. The output signal also contains a number of undesirable frequencies, including 3rd- and higher-order intermodulation products. These unwanted signals are removed by the I.F. bandpass filter, leaving only the desired offset I.F. signal fIF, which contains the original broadcast information fd.


Intermediate frequency stage:

All the intermediate-frequency stages operate at a fixed frequency which need not be adjusted. [6]

The I.F.

amplifier section at fIF is tuned to be highly selective. By changing fLO, the resulting fd−fLO (or fd+fLO) signal can be tuned to the amplifier's fIF, and the suitably amplified signal includes the frequency the user wishes to tune, fd. The local oscillator is tuned to produce a frequency fLO close to fd. In typical amplitude modulation ("AM radio" in the U.S., or MW) receivers, the intermediate frequency is 455 kHz;[10] for FM receivers, it is usually 10.7 MHz; for television, 33.4 to 45.75 MHz.

Other signals from the mixed output of the heterodyne are filtered out by this stage. This depends on the

intermediate frequency chosen in the design process. Typically it is 455 kHz for a single stage conversion receiver.

A higher chosen I.F. offset reduces the effect that interference from powerful radio transmissions in adjacent broadcast bands has on the required signal.

Usually the intermediate frequency is lower than either the carrier or oscillator frequencies, but with some

types of receiver (e.g. scanners and spectrum analyzers) it is more convenient to use a higher intermediate

frequency. In order to avoid interference to and from signal frequencies close to the intermediate frequency, in

many countries IF frequencies are controlled by regulatory authorities. Examples of common IFs are 455 kHz for

medium-wave AM radio, 10.7 MHz for FM, 38.9 MHz (Europe) or 45 MHz (US) for television, and 70 MHz for

satellite and terrestrial microwave equipment.

Bandpass filter:

The filter must have a passband equal to or less than the frequency spacing between adjacent broadcast channels. An ideal filter would strongly attenuate adjacent channels while having a flat response across the desired channel, giving a better quality of received signal. This may be achieved with a dual-frequency tuned-coil filter design, or a multi-pole ceramic or crystal filter.

Demodulation:

The received signal is now processed by the demodulator stage where the broadcast, (usually audio, but

may be data) signal is recovered and amplified. A.M. demodulation requires only simple rectification of the I.F. signal followed by a resistor-capacitor (RC) low-pass filter to remove the high-frequency carrier component. Other modes of transmission will require more specialized circuits to recover the

broadcast signal. The remaining audio signal is then amplified and fed to a suitable transducer, such as a

loudspeaker or headphones.

Advanced designs:

To overcome obstacles such as image response, multiple IF stages are used, and in some cases multiple

stages with two IFs of different values are used. For example, the front end might be sensitive to 1–30 MHz, the

first half of the radio to 5 MHz, and the last half to 50 kHz. Two frequency converters would be used, and the radio

would be a double conversion superheterodyne; a common example is a television receiver where the audio

information is obtained from a second stage of intermediate-frequency conversion. Receivers which are tunable


over a wide bandwidth (e.g. scanners) may use an intermediate frequency higher than the signal, in order to

improve image rejection.

Other uses:

In the case of modern television receivers, no other technique was able to produce the precise bandpass

characteristic needed for vestigial sideband reception, first used with the original NTSC system introduced in 1941.

This originally involved a complex collection of tunable inductors which needed careful adjustment, but since the

1970s or early 1980s these have been replaced with precision electromechanical surface acoustic wave (SAW)

filters. Fabricated by precision laser milling techniques, SAW filters are cheaper to produce, can be made to

extremely close tolerances, and are stable in operation. To avoid tooling costs associated with these components

most manufacturers then tended to design their receivers around the fixed range of frequencies offered which

resulted in de-facto standardization of intermediate frequencies.

Modern designs:

Microprocessor technology allows replacing the superheterodyne receiver design by a software defined

radio architecture, where the IF processing after the initial IF filter is implemented in software. This technique is

already in use in certain designs, such as very low-cost FM radios incorporated into mobile phones, since the

system already has the necessary microprocessor.

Radio transmitters may also use a mixer stage to produce an output frequency, working more or less as the

reverse of a superheterodyne receiver.

Technical advantages:

Superheterodyne receivers have superior characteristics to simpler receiver types in frequency stability

and selectivity. They offer better stability than Tuned radio frequency receivers (TRF) because a tunable oscillator

is more easily stabilized than a tunable amplifier, especially with modern frequency synthesizer technology. IF

filters can give narrower pass bands at the same Q factor than an equivalent RF filter. A fixed IF also allows the

use of a crystal filter when exceptionally high selectivity is necessary. Regenerative and super-regenerative

receivers offer better sensitivity than a TRF receiver, but suffer from stability and selectivity problems.

Drawbacks of this design:

High-side and low-side injection:

The amount that a signal is down-shifted by the local oscillator depends on whether its frequency f is

higher or lower than fLO. That is because its new frequency is |f − fLO| in either case. Therefore, there are potentially

two signals that could both shift to the same fIF; one at f = fLO + fIF and another at f = fLO − fIF. One of those signals,

called the image frequency, has to be filtered out prior to the mixer to avoid aliasing. When the upper one is filtered

out, it is called high-side injection, because fLO is above the frequency of the received signal. The other case is

called low-side injection. High-side injection also reverses the order of a signal's frequency components. Whether

that actually changes the signal depends on whether it has spectral symmetry. The reversal can be undone later in

the receiver, if necessary.


Image Frequency (fimage):

One major disadvantage to the superheterodyne receiver is the problem of image frequency. In heterodyne

receivers, an image frequency is an undesired input frequency equal to the station frequency plus (or, with low-side injection, minus) twice the

intermediate frequency. The image frequency results in two stations being received at the same time, thus

producing interference. Image frequencies can be eliminated by sufficient attenuation on the incoming signal by

the RF amplifier filter of the superheterodyne receiver.

Early Autodyne receivers typically used IFs of only 150 kHz or so, as it was difficult to maintain reliable

oscillation if higher frequencies were used. As a consequence, most Autodyne receivers needed quite elaborate

antenna tuning networks, often involving double-tuned coils, to avoid image interference. Later superheterodynes

used tubes especially designed for oscillator/mixer use, which were able to work reliably with much higher IFs,

reducing the problem of image interference and so allowing simpler and cheaper aerial tuning circuitry.

For medium-wave AM radio, a variety of IFs have been used, but usually 455 kHz is used.
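As a quick numeric check (a minimal sketch added here, not part of the original notes; the station and IF values are simply the ones quoted above), the image frequency for high-side injection is the wanted frequency plus twice the IF:

    # Image frequency of a superheterodyne receiver, assuming high-side injection
    # (f_LO = f_signal + f_IF), so that f_image = f_signal + 2 * f_IF.
    def image_frequency(f_signal_hz, f_if_hz):
        return f_signal_hz + 2 * f_if_hz

    print(image_frequency(1_000_000, 455_000))  # 1,910,000 Hz: a 1000 kHz station images at 1910 kHz

The front-end filter therefore only has to reject a signal 910 kHz away, rather than separate adjacent channels only a few kilohertz away.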

Local oscillator radiation:

It is difficult to keep stray radiation from the local oscillator below the level that a nearby receiver can

detect. The receiver's local oscillator can act like a miniature CW transmitter. This means that there can be mutual

interference in the operation of two or more superheterodyne receivers in close proximity. In espionage, oscillator

radiation gives a means to detect a covert receiver and its operating frequency. One effective way of preventing the

local oscillator signal from radiating out from the receiver's antenna is by adding a shielded and power supply

decoupled stage of RF amplification between the receiver's antenna and its mixer stage.

Local oscillator sideband noise:

Local oscillators typically generate a single frequency signal that has negligible amplitude modulation but

some random phase modulation. Either of these impurities spreads some of the signal's energy into sideband

frequencies. That causes a corresponding widening of the receiver's frequency response, which would defeat the

aim to make a very narrow bandwidth receiver such as to receive low-rate digital signals. Care needs to be taken to

minimize oscillator phase noise, usually by ensuring that the oscillator never enters a non-linear mode.

SIGNAL-TO-NOISE RATIO:

Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to

quantify how much a signal has been corrupted by noise. It is defined as the ratio of signal power to the noise

power corrupting the signal. A ratio higher than 1:1 indicates more signal than noise. While SNR is commonly

quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice core or

biochemical signaling between cells).


In less technical terms, signal-to-noise ratio compares the level of a desired signal (such as music) to the

level of background noise. The higher the ratio, the less obtrusive the background noise is.

"Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or

irrelevant data in a conversation or exchange. For example, in online discussion forums and other online

communities, off-topic posts and spam are regarded as "noise" that interferes with the "signal" of appropriate

discussion.

FM DEMODULATORS AND THRESHOLD EFFECT:

An important aspect of analogue FM satellite systems is the FM threshold effect. In FM systems where the signal level is well above the noise, the received carrier-to-noise ratio and the demodulated signal-to-noise ratio are related (for sinusoidal modulation with modulation index β) by:

S/N = 3 * β^2 * (β + 1) * C/N

This expression, however, does not apply when the carrier-to-noise ratio decreases below a certain point.

Below this critical point the signal-to-noise ratio decreases significantly. This is known as the FM threshold effect

(FM threshold is usually defined as the carrier-to-noise ratio at which the demodulated signal-to-noise ratio falls 1

dB below the linear relationship. It generally is considered to occur at about 10 dB).

Below the FM threshold point the noise signal (whose amplitude and phase are randomly varying), may

instantaneously have an amplitude greater than that of the wanted signal. When this happens the noise will produce

a sudden change in the phase of the FM demodulator output. In an audio system this sudden phase change makes a

"click". In video applications the term "click noise" is used to describe short horizontal black and white lines that

appear randomly over a picture.

Because satellite communications systems are power limited they usually operate with only a small design

margin above the FM threshold point (perhaps a few dB). Because of this circuit designers have tried to devise

techniques to delay the onset of the FM threshold effect. These devices are generally known as FM threshold

extension demodulators. Techniques such as FM feedback, phase locked loops and frequency locked loops are used

to achieve this effect. By such techniques the onset of FM threshold effects can be delayed till the C/N ratio is

around 7 dB.

Pre-emphasis and de-emphasis:

Random noise has a 'triangular' spectral distribution in an FM system, with the effect that noise occurs

predominantly at the highest frequencies within the baseband. This can be offset, to a limited extent, by boosting

the high frequencies before transmission and reducing them by a corresponding amount in the receiver. Reducing


the high frequencies in the receiver also reduces the high-frequency noise. These processes of boosting and then

reducing certain frequencies are known as pre-emphasis and de-emphasis, respectively.

The amount of pre-emphasis and de-emphasis used is defined by the time constant of a simple RC

filter circuit. In most of the world a 50 µs time constant is used. In North America, 75 µs is used. This applies to

both mono and stereo transmissions and to baseband audio (not the subcarriers).
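As a small illustration (an added sketch, not from the original text), the corner frequency corresponding to a pre-emphasis time constant is f = 1 / (2 * pi * tau):

    import math

    # Corner (turnover) frequency of a simple RC pre-emphasis/de-emphasis network.
    def corner_frequency(tau_seconds):
        return 1.0 / (2.0 * math.pi * tau_seconds)

    print(round(corner_frequency(50e-6)))  # ~3183 Hz for the 50 us time constant
    print(round(corner_frequency(75e-6)))  # ~2122 Hz for the 75 us (North American) time constant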

The amount of pre-emphasis that can be applied is limited by the fact that many forms of contemporary

music contain more high-frequency energy than the musical styles which prevailed at the birth of FM broadcasting.

They cannot be pre-emphasized as much because it would cause excessive deviation of the FM carrier. (Systems

more modern than FM broadcasting tend to use either programme-dependent variable pre-emphasis—e.g. dbx in

the BTSC TV sound system—or none at all.)

FM stereo:

In the late 1950s, several systems to add stereo to FM radio were considered by the FCC. Included were

systems from 14 proponents including Crosley, Halstead, Electrical and Musical Industries, Ltd (EMI), Zenith

Electronics Corporation and General Electric. The individual systems were evaluated for their strengths and

weaknesses during field tests in Uniontown, Pennsylvania using KDKA-FM in Pittsburgh as the originating

station. The Crosley system was rejected by the FCC because it degraded the signal-to-noise ratio of the main

channel and did not perform well under multipath RF conditions. In addition, it did not allow for SCA services

because of its wide FM sub-carrier bandwidth. The Halstead system was rejected due to lack of high frequency

stereo separation and reduction in the main channel signal-to-noise ratio. The GE and Zenith systems, so similar

that they were considered theoretically identical, were formally approved by the FCC in April 1961 as the standard

stereo FM broadcasting method in the USA and later adopted by most other countries.

It is important that stereo broadcasts should be compatible with mono receivers. For this reason, the left

(L) and right (R) channels are algebraically encoded into sum (L+R) and difference (L−R) signals. A mono

receiver will use just the L+R signal so the listener will hear both channels in the single loudspeaker. A stereo

receiver will add the difference signal to the sum signal to recover the left channel, and subtract the difference

signal from the sum to recover the right channel.

The (L+R) Main channel signal is transmitted as baseband audio in the range of 30 Hz to 15 kHz. The

(L−R) Sub-channel signal is modulated onto a 38 kHz double-sideband suppressed carrier (DSBSC) signal

occupying the baseband range of 23 to 53 kHz.

A 19 kHz pilot tone, at exactly half the 38 kHz sub-carrier frequency and with a precise phase relationship

to it, as defined by the formula below, is also generated. This is transmitted at 8–10% of overall modulation level

and used by the receiver to regenerate the 38 kHz sub-carrier with the correct phase.

The final multiplex signal from the stereo generator contains the Main Channel (L+R), the pilot tone, and

the sub-channel (L−R). This composite signal, along with any other sub-carriers, modulates the FM transmitter.

The instantaneous deviation of the transmitter carrier frequency due to the stereo audio and pilot tone (at 10% pilot modulation, assuming the standard 75 kHz peak deviation) is:

75 kHz * [ 0.9 * ( (A+B)/2 + ((A−B)/2) * sin(4*pi*fp*t) ) + 0.1 * sin(2*pi*fp*t) ]


Where A and B are the pre-emphasized Left and Right audio signals and fp is the frequency of the pilot tone. Slight

variations in the peak deviation may occur in the presence of other subcarriers or because of local regulations.
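The following sketch (added for illustration; the sample rate, test tones and exact scaling are assumptions, not part of the original notes) builds the composite signal described above: the (L+R)/2 main channel, the (L−R)/2 sub-channel as DSB-SC on a 38 kHz subcarrier, and a 19 kHz pilot at roughly 10% level:

    import numpy as np

    fs = 192_000                       # sample rate in Hz (assumed, high enough for the 53 kHz baseband)
    t = np.arange(0, 0.01, 1 / fs)     # 10 ms of signal
    L = np.sin(2 * np.pi * 1000 * t)   # example left-channel tone
    R = np.sin(2 * np.pi * 400 * t)    # example right-channel tone
    fp = 19_000                        # pilot frequency in Hz

    mpx = (0.9 * ((L + R) / 2 + ((L - R) / 2) * np.sin(2 * np.pi * 2 * fp * t))
           + 0.1 * np.sin(2 * np.pi * fp * t))   # pilot at ~10% (the text quotes 8-10%)

This composite (multiplex) signal is what frequency-modulates the carrier; a mono receiver simply low-pass filters it and keeps the (L+R)/2 part.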

Converting the multiplex signal back into left and right audio signals is performed by a stereo decoder,

which is built into stereo receivers.

In order to preserve stereo separation and signal-to-noise parameters, it is normal practice to apply pre-

emphasis to the left and right channels before encoding, and to apply de-emphasis at the receiver after decoding.

Stereo FM signals are more susceptible to noise and multipath distortion than are mono FM signals.

In addition, for a given RF level at the receiver, the signal-to-noise ratio for the stereo signal will be worse

than for the mono receiver. For this reason many FM stereo receivers include a stereo/mono switch to allow

listening in mono when reception conditions are less than ideal, and most car radios are arranged to reduce the

separation as the signal-to-noise ratio worsens, eventually going to mono while still indicating a stereo signal is

being received.


PART B

1. Define Hilbert Transform with a suitable example. Give the method of generation and

detection of an SSB wave. (16)

2. Discuss the noise performance of AM system using envelope detection. (16)

3. Compare the noise performance of AM and FM systems. (16)

4. Explain the significance of pre-emphasis and de-emphasis in an FM system. (8)

5. Derive the noise power spectral density of the FM demodulation and explain its

performance with diagram. (16)

6. Draw the block diagram of FM demodulator and explain the effect of noise in detail.

Explain the FM threshold effect and capture effect in FM. (16)

7. Explain the FM receiver with block diagram. (8)


UNIT V

INFORMATION THEORY

Discrete messages and information content.

Concept of amount of information.

Average information.

Entropy.

Information rate.

Source coding to increase average information per bit.

Shannon-fano coding.

Huffman coding.

Lempel-Ziv (LZ) coding.

Shannon‘s theorem.

Channel capacity.

Bandwidth.

S/N trade-off.

Mutual information.

Channel capacity.

Rate distortion theory.

Lossy source coding.


INFORMATION THEORY:

Information theory is a branch of applied mathematics and electrical engineering involving the

quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits

on signal processing operations such as compressing data and on reliably storing and communicating data. Since its

inception it has broadened to find applications in many other areas, including statistical inference, natural language

processing, cryptography, networks other than communication networks (as in neurobiology), the

evolution and function of molecular codes, model selection in ecology, thermal physics, quantum computing,

plagiarism detection and other forms of data analysis.

A key measure of information is known as entropy, which is usually expressed by the average number of

bits needed for storage or communication. Entropy quantifies the uncertainty involved in predicting the value of

a random variable. For example, specifying the outcome of a fair coin flip (two equally likely outcomes) provides

less information (lower entropy) than specifying the outcome from a roll of a die (six equally likely outcomes).

Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP

files), lossy data compression (e.g. MP3s), and channel coding (e.g. for DSL lines). The field is at the intersection

of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been

crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of

mobile phones, the development of the Internet, the study of linguistics and of human perception, the

understanding of black holes, and numerous other fields. Important sub-fields of information theory are source

coding, channel coding, algorithmic complexity theory, algorithmic information theory, information-theoretic

security, and measures of information.

OVERVIEW:

The main concepts of information theory can be grasped by considering the most widespread means of

human communication: language. Two important aspects of a concise language are as follows: First, the most

common words (e.g., "a", "the", "I") should be shorter than less common words (e.g., "benefit", "generation",

"mediocre"), so that sentences will not be too long. Such a tradeoff in word length is analogous to data

compression and is the essential aspect of source coding. Second, if part of a sentence is unheard or misheard due

to noise — e.g., a passing car — the listener should still be able to glean the meaning of the underlying message.

Such robustness is as essential for an electronic communication system as it is for a language; properly building

such robustness into communications is done by channel coding. Source coding and channel coding are the

fundamental concerns of information theory.

Note that these concerns have nothing to do with the importance of messages. For example, a platitude

such as "Thank you; come again" takes about as long to say or write as the urgent plea, "Call an ambulance!" while

the latter may be more important and more meaningful in many contexts. Information theory, however, does not

consider message importance or meaning, as these are matters of the quality of data rather than the quantity and

readability of data, the latter of which is determined solely by probabilities.

Information theory is generally considered to have been founded in 1948 by Claude Shannon in his

seminal work, "A Mathematical Theory of Communication". The central paradigm of classical information theory

is the engineering problem of the transmission of information over a noisy channel. The most fundamental results

of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed

to represent the result of an uncertain event is given by its entropy; and Shannon's noisy-channel coding theorem,


which states that reliable communication is possible over noisy channels provided that the rate of communication

is below a certain threshold, called the channel capacity. The channel capacity can be approached in practice by

using appropriate encoding and decoding systems.

Information theory is closely associated with a collection of pure and applied disciplines that have been

investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half

century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity

science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information

theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the

vital field of coding theory.

Coding theory is concerned with finding explicit methods, called codes, of increasing the efficiency and

reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the

maximum possible for that channel. These codes can be roughly subdivided into data compression (source coding)

and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods

Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both

codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used

in cryptography and cryptanalysis. See the article ban (information) for a historical application.

Information theory is also used in information retrieval, intelligence gathering, gambling, statistics, and

even in musical composition.

Quantities of information

Information theory is based on probability theory and statistics. The most important quantities of

information are entropy, the information in a random variable, and mutual information, the amount of

information in common between two random variables. The former quantity indicates how easily message

data can be compressed while the latter can be used to find the communication rate across a channel.

The choice of logarithmic base in the following formulae determines the unit of information entropy that is

used. The most common unit of information is the bit, based on the binary logarithm. Other units include

the nat, which is based on the natural logarithm, and the hartley, which is based on the common logarithm.

In what follows, an expression of the form p log p is considered by convention to be equal to zero whenever p = 0. This is justified because p log p tends to 0 as p tends to 0, for any logarithmic base.

Entropy:


(Figure: entropy of a Bernoulli trial as a function of the success probability, the binary entropy function Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.)

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with

the value of X.

Suppose one transmits 1000 bits (0s and 1s). If these bits are known ahead of transmission (to be a certain

value with absolute probability), logic dictates that no information has been transmitted. If, however, each is

equally and independently likely to be 0 or 1, 1000 bits (in the information theoretic sense) have been

transmitted. Between these two extremes, information can be quantified as follows. If {x1, ..., xn} is the set of all messages that X could be, and p(x) is the probability of message x, then the entropy of X is defined:

H(X) = E[I(X)] = − Σ p(x) log p(x)   (summed over all messages x)

(Here, I(x) = −log p(x) is the self-information, which is the entropy contribution of an individual message, and E[.] is the expected value.) An important property of entropy is that it is maximized when all the messages in the message space are equiprobable, p(x) = 1/n (i.e., most unpredictable), in which case H(X) = log n.

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2:

Hb(p) = − p log2(p) − (1 − p) log2(1 − p)
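A short numerical sketch of these definitions (added for illustration; the helper name is ours):

    import math

    def entropy(probs, base=2):
        # H(X) = -sum p(x) log p(x); terms with p = 0 contribute 0 by convention.
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin flip
    print(entropy([1/6] * 6))    # ~2.585 bits: a fair die roll
    print(entropy([1.0, 0.0]))   # 0.0 bits: the outcome is known in advance

This matches the earlier comparison: six equally likely outcomes carry more entropy than two.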

Joint entropy:

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing: (X,Y).

This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

For example, if (X,Y) represents the position of a chess piece — X the row and Y the column, then the joint entropy

of the row of the piece and the column of the piece will be the entropy of the position of the piece.

Despite similar notation, joint entropy should not be confused with cross entropy.

Conditional entropy (equivocation):

The conditional entropy or conditional uncertainty of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:

H(X|Y) = Σ p(y) H(X | Y = y) = − Σ p(x, y) log p(x | y)

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

H(X|Y) = H(X, Y) − H(Y)


Mutual information (transinformation):

Mutual information measures the amount of information that can be obtained about one random variable

by observing another. It is important in communication where it can be used to maximize the amount of

information shared between sent and received signals. The mutual information of X relative to Y is given by:

I(X;Y) = E[SI(x,y)] = Σ p(x,y) log[ p(x,y) / ( p(x) p(y) ) ]   (summed over all x and y)

where SI (specific mutual information) is the pointwise mutual information.

A basic property of the mutual information is that:

I(X;Y) = H(X) − H(X|Y)

That is, knowing Y, we can save an average of I(X;Y) bits in encoding X compared to not knowing Y.

Mutual information is symmetric:

I(X;Y) = I(Y;X) = H(X) + H(Y) − H(X,Y)

Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) of the posterior probability distribution of X given the value of Y from the prior distribution on X:

I(X;Y) = E_Y[ D_KL( p(X|Y=y) || p(X) ) ]

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

I(X;Y) = D_KL( p(X,Y) || p(X) p(Y) )

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution, and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.
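A small sketch (added; names illustrative) that evaluates I(X;Y) directly from a joint probability table:

    import math

    def mutual_information(joint, base=2):
        # I(X;Y) = sum over x,y of p(x,y) * log[ p(x,y) / (p(x) p(y)) ]
        px = [sum(row) for row in joint]
        py = [sum(col) for col in zip(*joint)]
        return sum(pxy * math.log(pxy / (px[i] * py[j]), base)
                   for i, row in enumerate(joint)
                   for j, pxy in enumerate(row) if pxy > 0)

    print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent variables share no information
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: perfectly correlated bits share one full bit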

Kullback–Leibler divergence (information gain):

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a

way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability

distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data,

when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum necessary for compression. It is thus defined as:

D_KL( p(X) || q(X) ) = Σ p(x) log[ p(x) / q(x) ]   (summed over all x)

Although it is sometimes used as a 'distance metric', it is not a true metric since it is not symmetric and does not

satisfy the triangle inequality (making it a semi-quasimetric).
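A corresponding sketch for the divergence itself (added; names illustrative):

    import math

    def kl_divergence(p, q, base=2):
        # D(p || q) = sum p(x) log[ p(x) / q(x) ]: extra bits per datum when coding with q instead of p.
        return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ~0.74 bits of overhead
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # ~0.53 bits: note the asymmetry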


Coding theory:

Coding theory is one of the most important and direct applications of information theory. It can be

subdivided into source coding theory and channel coding theory. Using a statistical description for data,

information theory quantifies the number of bits needed to describe the data, which is the information entropy of

the source.

Data compression (source coding): There are two formulations for the compression problem:

1. lossless data compression: the data must be reconstructed exactly;

2. lossy data compression: allocates bits needed to reconstruct the data, within a specified fidelity level

measured by a distortion function. This subset of Information theory is called rate–distortion theory.

Error-correcting codes (channel coding): While data compression removes as much redundancy as

possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to

transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission

theorems, or source–channel separation theorems that justify the use of bits as the universal currency for

information in many contexts. However, these theorems only hold in the situation where one transmitting user

wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access

channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more

general networks, compression followed by transmission may no longer be optimal. Network information

theory refers to these multi-agent communication models.

SOURCE THEORY:

Any process that generates successive messages can be considered a source of information. A memoryless

source is one in which each message is an independent identically-distributed random variable, whereas the

properties of ergodicity and stationarity impose more general constraints. All such sources are stochastic. These

terms are well studied in their own right outside information theory.

Rate:

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

r = lim (n→∞) H( Xn | Xn−1, Xn−2, ..., X1 )

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

r = lim (n→∞) (1/n) H( X1, X2, ..., Xn )

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.


It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate,

for example, when the source of information is English prose. The rate of a source of information is related to

its redundancy and how well it can be compressed, the subject of source coding.

Channel capacity:

Communications over a channel—such as an ethernet cable—is the primary motivation of information

theory. As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to

produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often

degrade quality. How much information can one hope to communicate over a noisy (or otherwise imperfect)

channel?

Consider the communications process over a discrete channel. A simple model of the process is shown below:

(Channel model: transmitted message X → channel with transition probabilities p(y|x) → received message Y)

Here X represents the space of messages transmitted, and Y the space of messages received during a unit

time over our channel. Let p(y | x) be the conditional probability distribution function of Y given X. We will

consider p(y | x) to be an inherent fixed property of our communications channel (representing the nature of

the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by

our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these

constraints, we would like to maximize the rate of information, or the signal, we can communicate over the

channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

C = max I(X; Y), where the maximum is taken over all input distributions f(x).

This capacity has the following property related to communicating at information rate R (where R is

usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a

code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε;

that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is

impossible to transmit with arbitrarily small block error.

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over

a noisy channel with a small coding error at a rate near the channel capacity.
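As a concrete illustration of a channel capacity (an added example, not taken from these notes), the binary symmetric channel with crossover probability p has the well-known capacity C = 1 − Hb(p):

    import math

    def bsc_capacity(p):
        # Capacity of a binary symmetric channel: C = 1 - Hb(p) bits per channel use.
        if p in (0.0, 1.0):
            return 1.0
        hb = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return 1.0 - hb

    print(bsc_capacity(0.0))   # 1.0: noiseless binary channel
    print(bsc_capacity(0.11))  # ~0.5: half a bit per use survives the noise
    print(bsc_capacity(0.5))   # 0.0: the output is independent of the input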

BIT RATE:

In telecommunications and computing, bitrate (sometimes written bit rate, data rate or as a

variable R or fb) is the number of bits that are conveyed or processed per unit of time.

The bit rate is quantified using the bits per second (bit/s or bps) unit, often in conjunction with an SI

prefix such as kilo- (kbit/s or kbps), mega-(Mbit/s or Mbps), giga- (Gbit/s or Gbps) or tera- (Tbit/s or Tbps). Note

that, unlike many other computer-related units, 1 kbit/s is traditionally defined as 1,000 bit/s, not 1,024 bit/s; this was the convention even before 1999, when IEC 60027-2 introduced distinct binary prefixes for units of information.


The formal abbreviation for "bits per second" is "bit/s" (not "bits/s", see writing style for SI units). In less

formal contexts the abbreviations "b/s" or "bps" are often used, though this risks confusion with "bytes per second"

("B/s", "Bps"). 1 Byte/s (Bps or B/s) corresponds to 8 bit/s (bps or b/s).

Shannon–Fano coding

In the field of data compression, Shannon–Fano coding, named after Claude Elwood

Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and

their probabilities (estimated or measured). It is suboptimal in the sense that it does not achieve the

lowest possible expected code word length, as Huffman coding does; however, unlike Huffman coding, it does guarantee that all code word lengths are within one bit of their theoretical ideal, −log2 P(x). The technique

was proposed in Shannon's "A Mathematical Theory of Communication", his 1948 article introducing the

field of information theory. The method was attributed to Fano, who later published it as a technical

report. Shannon–Fano coding should not be confused with Shannon coding, the coding method used to

prove Shannon's noiseless coding theorem, or with Shannon-Fano-Elias coding (also known as Elias

coding), the precursor to arithmetic coding.

In Shannon–Fano coding, the symbols are arranged in order from most probable to least

probable, and then divided into two sets whose total probabilities are as close as possible to being

equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0"

and symbols in the second set receive "1". As long as any sets with more than one member remain, the

same process is repeated on those sets, to determine successive digits of their codes. When a set has

been reduced to one symbol, of course, this means the symbol's code is complete and will not form the

prefix of any other symbol's code.

The algorithm works, and it produces fairly efficient variable-length encodings; when the two

smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to

distinguish them is used most efficiently. Unfortunately, Shannon–Fano does not always produce

optimal prefix codes; the set of probabilities 0.35, 0.17, 0.17, 0.16, 0.15 is an example of one that will

be assigned non-optimal codes by Shannon–Fano coding.

For this reason, Shannon–Fano is almost never used; Huffman coding is almost as

computationally simple and produces prefix codes that always achieve the lowest expected code word

length, under the constraints that each symbol is represented by a code formed of an integral number of

bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long

sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are independent and each is some power of one half, i.e., of the form 1/2^n. In most

situations, arithmetic coding can produce greater overall compression than either Huffman or Shannon–

Fano, since it can encode in fractional numbers of bits which more closely approximate the actual

information content of the symbol. However, arithmetic coding has not superseded Huffman the way that

Huffman supersedes Shannon–Fano, both because arithmetic coding is more computationally

expensive and because it is covered by multiple patents.

Shannon–Fano coding is used in the IMPLODE compression method, which is part of

the ZIP file format.


SHANNON–FANO ALGORITHM:

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The

actual algorithm is simple (a short Python sketch follows the numbered steps below):

1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each

symbol's relative frequency of occurrence is known.

2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left

and the least common at the right.

3. Divide the list into two parts, with the total frequency counts of the left half being as close to the total of

the right as possible.

4. The left half of the list is assigned the binary digit 0, and the right half is assigned the digit 1. This means

that the codes for the symbols in the first half will all start with 0, and the codes in the second half will all

start with 1.

5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the

codes until each symbol has become a corresponding code leaf on the tree.
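The short Python sketch promised above (added for illustration; the function and variable names are ours, not part of any standard library):

    def shannon_fano(freqs):
        # freqs: dict of symbol -> frequency count. Returns dict of symbol -> code string.
        items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)   # step 2: sort by frequency
        codes = {sym: "" for sym in freqs}

        def split(group):
            if len(group) <= 1:                                             # a single symbol: its code is complete
                return
            total, running, cut, best = sum(f for _, f in group), 0, 1, None
            for i in range(1, len(group)):
                running += group[i - 1][1]
                diff = abs(total - 2 * running)                             # step 3: make the two halves as equal as possible
                if best is None or diff < best:
                    best, cut = diff, i
            for sym, _ in group[:cut]:
                codes[sym] += "0"                                           # step 4: left half gets 0 ...
            for sym, _ in group[cut:]:
                codes[sym] += "1"                                           # ... right half gets 1
            split(group[:cut])                                              # step 5: recurse on both halves
            split(group[cut:])

        split(items)
        return codes

    print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
    # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}, matching the worked example below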

Example


Shannon–Fano Algorithm

The example shows the construction of the Shannon code for a small alphabet. The five symbols which can be

coded have the following frequency:

Symbol         A             B             C             D             E
Count          15            7             6             6             5
Probability    0.38461538    0.17948718    0.15384615    0.15384615    0.12820513

All symbols are sorted by frequency, from left to right (shown in Figure a). Putting the dividing line

between symbols B and C results in a total of 22 in the left group and a total of 17 in the right group. This

minimizes the difference in totals between the two groups.

With this division, A and B will each have a code that starts with a 0 bit, and the C, D, and E codes will all

start with a 1, as shown in Figure b. Subsequently, the left half of the tree gets a new division between A and

B, which puts A on a leaf with code 00 and B on a leaf with code 01.

After four division procedures, a tree of codes results. In the final tree, the three symbols with the highest

frequencies have all been assigned 2-bit codes, and two symbols with lower counts have 3-bit codes as

shown in the table below:

Symbol    A     B     C     D      E
Code      00    01    10    110    111

This results in 2 bits each for A, B and C and 3 bits each for D and E, giving an average code length of (2*15 + 2*7 + 2*6 + 3*6 + 3*5) / 39 = 89/39 ≈ 2.28 bits per symbol.

HUFFMAN CODING:

The Shannon-Fano algorithm doesn't always generate an optimal code. In 1952, David A. Huffman gave a

different algorithm that always produces an optimal tree for any given probabilities. While the Shannon–Fano tree is created from the root down to the leaves, the Huffman algorithm works in the opposite direction, from the leaves up to the root (a short sketch using a priority queue follows the steps below).

1. Create a leaf node for each symbol and add it to a priority queue, ordered by frequency of occurrence.

2. While there is more than one node in the queue:

1. Remove the two nodes of lowest probability or frequency from the queue

2. Prepend 0 and 1 respectively to any code already assigned to these nodes


3. Create a new internal node with these two nodes as children and with probability equal to the

sum of the two nodes' probabilities.

4. Add the new node to the queue.

3. The remaining node is the root node and the tree is complete.
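The sketch referred to above (added for illustration; Python's heapq module stands in for "the queue" in the steps, and the names are ours):

    import heapq

    def huffman(freqs):
        # freqs: dict of symbol -> frequency count. Returns dict of symbol -> code string.
        # Each heap entry is (weight, tie_breaker, {symbol: code_so_far}).
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)                           # the two lowest-weight nodes
            w2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in c1.items()}        # prepend 0 to one subtree ...
            merged.update({s: "1" + code for s, code in c2.items()})  # ... and 1 to the other
            heapq.heappush(heap, (w1 + w2, count, merged))            # new internal node with summed weight
            count += 1
        return heap[0][2]                                             # the remaining node is the root

    print(huffman({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
    # A gets a 1-bit code and the rest get 3-bit codes; the exact bit patterns may differ from
    # the worked example below, but the code lengths (and the 2.23-bit average) are the same.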

Example

Huffman Algorithm

Using the same frequencies as for the Shannon-Fano example above, viz:

Symbol         A             B             C             D             E
Count          15            7             6             6             5
Probability    0.38461538    0.17948718    0.15384615    0.15384615    0.12820513

In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped

together with a combined probability of 0.28205128. The lowest pair now are B and C so they're allocated 0

and 1 and grouped together with a combined probability of 0.33333333. This leaves BC and DE now with

the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just

A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a

single node and our algorithm is complete.

The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.


Symbol    A    B      C      D      E
Code      0    100    101    110    111

This results in 1 bit for A and 3 bits each for B, C, D and E, giving an average code length of (1*15 + 3*7 + 3*6 + 3*6 + 3*5) / 39 = 87/39 ≈ 2.23 bits per symbol.

Lempel–Ziv–Welch:

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham

Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of

the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has very high

throughput.

ALGORITHM:

Idea:

The scenario described in Welch's 1984 paper[1] encodes sequences of 8-bit data as fixed-length 12-bit

codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character,

and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded.

At each stage in compression, input bytes are gathered into a sequence until the next character would make a

sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is

emitted, and a new code (for the sequence with that character) is added to the dictionary.

The idea was quickly adapted to other situations. In an image based on a color table, for example, the

natural character alphabet is the set of color table indexes, and in the 1980s, many images had small color tables

(on the order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the

image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the

symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed

maximum (typically 12 bits).

Further refinements include reserving a code to indicate that the code table should be cleared (a "clear

code", typically the first value immediately after the values for the individual alphabet characters), and a code to

indicate the end of data (a "stop code", typically one greater than the clear code). The clear code allows the table to

be reinitialized after it fills up, which lets the encoding adapt to changing patterns in the input data. Smart encoders

can monitor the compression efficiency and clear the table whenever the existing table no longer matches the input

well.

Since the codes are added in a manner determined by the data, the decoder mimics building the table as it

sees the resulting codes. It is critical that the encoder and decoder agree on which variety of LZW is being used:

the size of the alphabet, the maximum code width, whether variable-width encoding is being used, the initial code

size, whether to use the clear and stop codes (and what values they have). Most formats that employ LZW build


this information into the format specification or provide explicit fields for them in a compression header for the

data.

Encoding:

A dictionary is initialized to contain the single-character strings corresponding to all the possible input

characters (and nothing else except the clear and stop codes if they're being used). The algorithm works by

scanning through the input string for successively longer substrings until it finds one that is not in the dictionary.

When such a string is found, the index for the string less the last character (i.e., the longest substring that is in the

dictionary) is retrieved from the dictionary and sent to output, and the new string (including the last character) is

added to the dictionary with the next available code. The last input character is then used as the next starting point

to scan for substrings.

In this way, successively longer strings are registered in the dictionary and made available for subsequent

encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a

message will see little compression. As the message grows, however, the compression ratio tends asymptotically to

the maximum.
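A minimal encoder sketch (added for illustration; it emits plain integer codes, ignores variable-width bit packing, treats '#' as an ordinary symbol, and uses the same initial dictionary as the worked example later in these notes):

    def lzw_encode(text):
        dictionary = {"#": 0}                                  # initial dictionary: '#' = 0, 'A'..'Z' = 1..26
        dictionary.update({chr(ord("A") + i): i + 1 for i in range(26)})
        next_code = len(dictionary)                            # 27 = first available code
        w, out = "", []
        for ch in text:
            if w + ch in dictionary:
                w += ch                                        # keep extending the buffered sequence
            else:
                out.append(dictionary[w])                      # emit the code for the longest known prefix
                dictionary[w + ch] = next_code                 # register the new, longer string
                next_code += 1
                w = ch                                         # restart buffering from this character
        if w:
            out.append(dictionary[w])                          # flush the final buffered sequence
        return out

    print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#"))
    # [20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]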

Decoding:

The decoding algorithm works by reading a value from the encoded input and outputting the

corresponding string from the initialized dictionary. At the same time it obtains the next value from the input, and

adds to the dictionary the concatenation of the string just output and the first character of the string obtained by

decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the

"next value" in the previous pass) and repeats the process until there is no more input, at which point the final input

value is decoded without any more additions to the dictionary.

In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it to

decode subsequent input values. Thus the full dictionary does not need be sent with the encoded data; just the

initial dictionary containing the single-character strings is sufficient (and is typically defined beforehand within the

encoder and decoder rather than being explicitly sent with the encoded data.)
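A matching decoder sketch (added; same assumptions and initial dictionary as the encoder above, including the special case in which a received code is not yet in the table):

    def lzw_decode(codes):
        dictionary = {0: "#"}                                  # same initial dictionary as the encoder
        dictionary.update({i + 1: chr(ord("A") + i) for i in range(26)})
        next_code = len(dictionary)
        w = dictionary[codes[0]]
        out = [w]
        for code in codes[1:]:
            if code in dictionary:
                entry = dictionary[code]
            elif code == next_code:
                entry = w + w[0]                               # the cScSc case discussed with the example below
            else:
                raise ValueError("bad LZW code: %d" % code)
            out.append(entry)
            dictionary[next_code] = w + entry[0]               # previous string + first character of the current one
            next_code += 1
            w = entry
        return "".join(out)

    print(lzw_decode([20, 15, 2, 5, 15, 18, 14, 15, 20, 27, 29, 31, 36, 30, 32, 34, 0]))
    # TOBEORNOTTOBEORTOBEORNOT#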

Variable-width codes:

If variable-width codes are being used, the encoder and decoder must be careful to change the width at the

same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in

the stream. In the standard version, the encoder increases the width from p to p + 1 when a sequence ω + s is

encountered that is not in the table (so that a code must be added for it) but the next available code in the table is 2^p (the first code requiring p + 1 bits). The encoder emits the code for ω at width p (since that code does not require

p + 1 bits), and then increases the code width so that the next code emitted will be p + 1 bits wide.

The decoder is always one code behind the encoder in building the table, so when it sees the code for ω, it

will generate an entry for code 2^p − 1. Since this is the point where the encoder will increase the code width, the

decoder must increase the width here as well: at the point where it generates the largest code that will fit in p bits.

Unfortunately some early implementations of the encoding algorithm increase the code width and then emit ω at

the new width instead of the old width, so that to the decoder it looks like the width changes one code too early.

This is called "Early Change"; it caused so much confusion that Adobe now allows both versions in PDF files, but


includes an explicit flag in the header of each LZW-compressed stream to indicate whether Early Change is being

used. Most graphic file formats do not use Early Change.

When the table is cleared in response to a clear code, both encoder and decoder change the code width

after the clear code back to the initial code width, starting with the code immediately following the clear code.

Packing order:

Since the codes emitted typically do not fall on byte boundaries, the encoder and decoder must agree on

how codes are packed into bytes. The two common methods are LSB-First ("Least Significant Bit First")

and MSB-First ("Most Significant Bit First"). In LSB-First packing, the first code is aligned so that the least

significant bit of the code falls in the least significant bit of the first stream byte, and if the code has more than 8

bits, the high order bits left over are aligned with the least significant bit of the next byte; further codes are packed

with LSB going into the least significant bit not yet used in the current stream byte, proceeding into further bytes as

necessary. MSB-first packing aligns the first code so that its most significant bit falls in the MSB of the first stream

byte, with overflow aligned with the MSB of the next byte; further codes are written with MSB going into the most

significant bit not yet used in the current stream byte.

Example:

The following example illustrates the LZW algorithm in action, showing the status of the output and

the dictionary at every stage, both in encoding and decoding the data. This example has been constructed to give

reasonable compression on a very short message. In real text data, repetition is generally less pronounced, so

longer input streams are typically necessary before the compression builds up efficiency.

The plaintext to be encoded (from an alphabet using only the capital letters) is:

TOBEORNOTTOBEORTOBEORNOT#

The # is a marker used to show that the end of the message has been reached. There are thus 26 symbols in

the plaintext alphabet (the 26 capital letters A through Z), plus the stop code #. We arbitrarily assign these the

values 1 through 26 for the letters, and 0 for '#'. (Most flavors of LZW would put the stop code after the data

alphabet, but nothing in the basic algorithm requires that. The encoder and decoder only have to agree what value it

has.)

A computer will render these as strings of bits. Five-bit codes are needed to give sufficient combinations

to encompass this set of 27 values. The dictionary is initialized with these 27 values. As the dictionary grows, the

codes will need to grow in width to accommodate the additional entries. A 5-bit code gives 25

= 32 possible

combinations of bits, so when the 33rd dictionary word is created, the algorithm will have to switch at that point

from 5-bit strings to 6-bit strings (for all code values, including those which were previously output with only five

bits). Note that since the all-zero code 00000 is used, and is labeled "0", the 33rd dictionary entry will be

labeled 32. (Previously generated output is not affected by the code-width change, but once a 6-bit value is

generated in the dictionary, it could conceivably be the next code emitted, so the width for subsequent output shifts

to 6 bits to accommodate that.)


The initial dictionary, then, will consist of the following entries:

Symbol   Binary   Decimal
#        00000    0
A        00001    1
B        00010    2
C        00011    3
D        00100    4
E        00101    5
F        00110    6
G        00111    7
H        01000    8
I        01001    9
J        01010    10
K        01011    11
L        01100    12
M        01101    13
N        01110    14
O        01111    15
P        10000    16
Q        10001    17
R        10010    18
S        10011    19
T        10100    20
U        10101    21
V        10110    22
W        10111    23
X        11000    24
Y        11001    25
Z        11010    26


Encoding:

Buffer input characters in a sequence ω until ω + next character is not in the dictionary. Emit the code for ω, and add ω + next character to the dictionary. Start buffering again with the next character.

Current Sequence   Next Char   Output Code   Output Bits   Extended Dictionary   Comments
NULL               T
T                  O           20            10100         27: TO                27 = first available code after 0 through 26
O                  B           15            01111         28: OB
B                  E           2             00010         29: BE
E                  O           5             00101         30: EO
O                  R           15            01111         31: OR
R                  N           18            10010         32: RN                32 requires 6 bits, so for next output use 6 bits
N                  O           14            001110        33: NO
O                  T           15            001111        34: OT
T                  T           20            010100        35: TT
TO                 B           27            011011        36: TOB
BE                 O           29            011101        37: BEO
OR                 T           31            011111        38: ORT
TOB                E           36            100100        39: TOBE
EO                 R           30            011110        40: EOR
RN                 O           32            100000        41: RNO
OT                 #           34            100010                              # stops the algorithm; send the current sequence
                               0             000000                              and the stop code

Unencoded length = 25 symbols × 5 bits/symbol = 125 bits

Encoded length = (6 codes × 5 bits/code) + (11 codes × 6 bits/code) = 96 bits.

Using LZW has saved 29 bits out of 125, reducing the message by more than 23%. If the message were longer, then

the dictionary words would begin to represent longer and longer sections of text, allowing repeated words to be

sent very compactly.


Decoding:

To decode an LZW-compressed archive, one needs to know in advance the initial dictionary used, but additional entries can be reconstructed as they are always simply concatenations of previous entries.

Input Bits   Code   Output Sequence   New Dictionary Entry (Full)   Conjecture   Comments
10100        20     T                                               27: T?
01111        15     O                 27: TO                        28: O?
00010        2      B                 28: OB                        29: B?
00101        5      E                 29: BE                        30: E?
01111        15     O                 30: EO                        31: O?
10010        18     R                 31: OR                        32: R?       created code 31 (last to fit in 5 bits)
001110       14     N                 32: RN                        33: N?       so start using 6 bits
001111       15     O                 33: NO                        34: O?
010100       20     T                 34: OT                        35: T?
011011       27     TO                35: TT                        36: TO?
011101       29     BE                36: TOB                       37: BE?      36 = TO + 1st symbol (B) of next coded sequence received (BE)
011111       31     OR                37: BEO                       38: OR?
100100       36     TOB               38: ORT                       39: TOB?
011110       30     EO                39: TOBE                      40: EO?
100000       32     RN                40: EOR                       41: RN?
100010       34     OT                41: RNO                       42: OT?
000000       0      #

At each stage, the decoder receives a code X; it looks X up in the table and outputs the sequence χ it

codes, and it conjectures χ + ? as the entry the encoder just added — because the encoder emitted X for χ precisely

because χ + ? was not in the table, and the encoder goes ahead and adds it. But what is the missing letter? It is the

first letter in the sequence coded by the next code Z that the decoder receives. So the decoder looks up Z, decodes

it into the sequence ω and takes the first letter z and tacks it onto the end of χ as the next dictionary entry.

This works as long as the codes received are in the decoder's dictionary, so that they can be decoded into

sequences. What happens if the decoder receives a code Z that is not yet in its dictionary? Since the decoder is

always just one code behind the encoder, Z can be in the encoder's dictionary only if the encoder just generated it,

when emitting the previous code X for χ. Thus Z codes some ω that is χ + ?, and the decoder can determine the

unknown character as follows:

1. The decoder sees X and then Z.

2. It knows X codes the sequence χ and Z codes some unknown sequence ω.

3. It knows the encoder just added Z to code χ + some unknown character,

4. and it knows that the unknown character is the first letter z of ω.

5. But the first letter of ω (= χ + ?) must then also be the first letter of χ.

6. So ω must be χ + x, where x is the first letter of χ.

7. So the decoder figures out what Z codes even though it's not in the table,

8. and upon receiving Z, the decoder decodes it as χ + x, and adds χ + x to the table as the value of Z.

This situation occurs whenever the encoder encounters input of the form cScSc, where c is a single

character, S is a string and cS is already in the dictionary, but cSc is not. The encoder emits the code for cS, putting

a new code for cSc into the dictionary. Next it sees cSc in the input (starting at the second c of cScSc) and emits the

new code it just inserted. The argument above shows that whenever the decoder receives a code not in its

dictionary, the situation must look like this.

Although input of form cScSc might seem unlikely, this pattern is fairly common when the input stream is

characterized by significant repetition. In particular, long strings of a single character (which are common in the

kinds of images LZW is often used to encode) repeatedly generate patterns of this sort.

Further coding:

The simple scheme described above focuses on the LZW algorithm itself. Many applications apply further

encoding to the sequence of output symbols. Some package the coded stream as printable characters using some

form of Binary-to-text encoding; this will increase the encoded length and decrease the compression ratio.

Conversely, increased compression can often be achieved with an adaptive entropy encoder. Such a coder estimates


the probability distribution for the value of the next symbol, based on the observed frequencies of values so far.

Standard entropy encoding such as Huffman coding or arithmetic coding then uses shorter codes for values with

higher probabilities.

Uses:

When it was introduced, LZW compression provided the best compression ratio among all well-known

methods available at that time. It became the first widely used universal data compression method on computers. A

large English text file can typically be compressed via LZW to about half its original size.

LZW was used in the program compress, which became a more or less standard utility in Unix systems

circa 1986. It has since disappeared from many distributions, for both legal and technical reasons, but as of 2008 at

least FreeBSD includes both compress and uncompress as a part of the distribution. Several other popular

compression utilities also used LZW, or closely related methods.

LZW became very widely used when it became part of the GIF image format in 1987. It may also

(optionally) be used in TIFF and PDF files. (Although LZW is available in Adobe Acrobat software, Acrobat by

default uses the DEFLATE algorithm for most text and color-table-based image data in PDF files.)

Shannon's Theorem:

Shannon's Theorem gives an upper bound to the capacity of a link, in bits per second (bps), as a function

of the available bandwidth and the signal-to-noise ratio of the link.

The Theorem can be stated as:

C = B * log2(1+ S/N)

where C is the achievable channel capacity, B is the bandwidth of the line, S is the average signal power

and N is the average noise power.

The signal-to-noise ratio (S/N) is usually expressed in decibels (dB) given by the formula:

10 * log10(S/N)

so for example a signal-to-noise ratio of 1000 is commonly expressed as

10 * log10(1000) = 30 dB.

Here is a graph showing the relationship between C/B and S/N (in dB):


Examples

Here are two examples of the use of Shannon's Theorem.

Modem

For a typical telephone line with a signal-to-noise ratio of 30dB and an audio bandwidth of 3kHz, we get a

maximum data rate of:

C = 3000 * log2(1001)

which is a little less than 30 kbps.

Satellite TV Channel

For a satellite TV channel with a signal-to-noise ratio of 20 dB and a video bandwidth of 10 MHz, we get a maximum data rate of:

C = 10000000 * log2(101)

which is about 66 Mbps.
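Both figures can be checked with a few lines of Python; the helper name shannon_capacity is only for illustration.

import math

def shannon_capacity(bandwidth_hz, snr_db):
    # C = B * log2(1 + S/N), with the signal-to-noise ratio supplied in dB.
    snr_linear = 10 ** (snr_db / 10.0)      # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

print(shannon_capacity(3e3, 30))    # telephone line: about 29.9 kbps
print(shannon_capacity(10e6, 20))   # satellite TV channel: about 66.6 Mbps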


BANDWIDTH:

The term has several related meanings:

Bandwidth (signal processing), also called analog bandwidth, frequency bandwidth or radio bandwidth: a measure of the width of a range of frequencies, measured in hertz.

Bandwidth (computing) or digital bandwidth: a rate of data transfer, bit rate or throughput, measured in bits per second (bps).

Spectral linewidth: the width of an atomic or molecular spectral line, measured in hertz.

Bandwidth can also refer to:

Bandwidth (linear algebra): the width of the band of non-zero terms around the diagonal of a matrix.

In kernel density estimation, "bandwidth" describes the width of the convolution kernel used.

A normative expected range of linguistic behavior in language expectancy theory.

In business jargon, the resources needed to complete a task or project.

Bandwidth (radio program): a Canadian radio program.

Graph bandwidth, in graph theory.


SIGNAL-TO-NOISE RATIO:

Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering to

quantify how much a signal has been corrupted by noise. It is defined as the ratio of signal power to the noise

power corrupting the signal. A ratio higher than 1:1 indicates more signal than noise. While SNR is commonly

quoted for electrical signals, it can be applied to any form of signal (such as isotope levels in an ice

core or biochemical signaling between cells).

In less technical terms, signal-to-noise ratio compares the level of a desired signal (such as music) to the

level of background noise. The higher the ratio, the less obtrusive the background noise is.

"Signal-to-noise ratio" is sometimes used informally to refer to the ratio of useful information to false or

irrelevant data in a conversation or exchange. For example, in online discussion forums and other online

communities, off-topic posts and spam are regarded as "noise" that interferes with the "signal" of appropriate

discussion.

Signal-to-noise ratio is defined as the power ratio between a signal (meaningful information) and the background noise (unwanted signal):

SNR = P_signal / P_noise

where P is average power. Both signal and noise power must be measured at the same or equivalent points in a system, and within the same system bandwidth. If the signal and the noise are measured across the same impedance, then the SNR can be obtained by calculating the square of the amplitude ratio:

SNR = (A_signal / A_noise)^2

where A is root mean square (RMS) amplitude (for example, RMS voltage). Because many signals have a very wide dynamic range, SNRs are often expressed using the logarithmic decibel scale. In decibels, the SNR is defined as

SNR(dB) = 10 * log10(P_signal / P_noise)

which may equivalently be written using amplitude ratios as

SNR(dB) = 20 * log10(A_signal / A_noise)

The concepts of signal-to-noise ratio and dynamic range are closely related. Dynamic range measures the

ratio between the strongest un-distorted signal on a channel and the minimum discernable signal, which for most

purposes is the noise level. SNR measures the ratio between an arbitrary signal level (not necessarily the most

powerful signal possible) and noise. Measuring signal-to-noise ratios requires the selection of a representative

or reference signal. In audio engineering, the reference signal is usually a sine wave at a standardized nominal

or alignment level, such as 1 kHz at +4 dBu (1.228 VRMS).


SNR is usually taken to indicate an average signal-to-noise ratio, as it is possible that (near) instantaneous

signal-to-noise ratios will be considerably different. The concept can be understood as normalizing the noise level

to 1 (0 dB) and measuring how far the signal 'stands out'.
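As a small illustration (the sampled waveforms below are made-up test data, not a measurement), SNR can be computed either from average powers or from RMS amplitudes, and the two decibel forms agree:

import numpy as np

# Hypothetical signal and noise samples taken at the same point and across
# the same impedance: a 1 kHz tone plus low-level Gaussian noise.
t = np.arange(0, 1e-2, 1e-5)
signal = np.sin(2 * np.pi * 1e3 * t)
noise = 0.05 * np.random.randn(t.size)

p_signal = np.mean(signal ** 2)            # average signal power
p_noise = np.mean(noise ** 2)              # average noise power

snr_power_ratio = p_signal / p_noise
snr_db = 10 * np.log10(snr_power_ratio)
snr_db_from_amplitudes = 20 * np.log10(np.sqrt(p_signal) / np.sqrt(p_noise))
# snr_db and snr_db_from_amplitudes are equal: 10*log10 of a power ratio is
# the same as 20*log10 of the corresponding RMS amplitude ratio.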

Mutual information:

In probability theory and information theory, the mutual information (sometimes known by

the archaic term transinformation) of two random variables is a quantity that measures the mutual dependence of

the two variables. The most common unit of measurement of mutual information is the bit, when logarithms to the

base 2 are used.

Definition of mutual information:

Formally, the mutual information of two discrete random variables X and Y can be defined as:

I(X;Y) = sum over y in Y and x in X of p(x,y) * log[ p(x,y) / (p1(x) * p2(y)) ]

where p(x,y) is the joint probability distribution function of X and Y, and p1(x) and p2(y) are the marginal probability distribution functions of X and Y respectively.

In the continuous case, the summation is replaced by a definite double integral:

I(X;Y) = double integral of p(x,y) * log[ p(x,y) / (p1(x) * p2(y)) ] dx dy

where p(x,y) is now the joint probability density function of X and Y, and p1(x) and p2(y) are the marginal

probability density functions of X and Y respectively.

These definitions are ambiguous because the base of the log function is not specified. To disambiguate,

the function I could be parameterized as I(X,Y,b) where b is the base. Alternatively, since the most common unit of

measurement of mutual information is the bit, a base of 2 could be specified.

Intuitively, mutual information measures the information that X and Y share: it measures how much

knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent,

then knowing X does not give any information about Y and vice versa, so their mutual information is zero. At the

other extreme, if X and Y are identical then all information conveyed by X is shared with Y:

knowing X determines the value of Y and vice versa. As a result, in the case of identity the mutual information is

the same as the uncertainty contained in Y (or X) alone, namely the entropy of Y (or X: clearly if X and Y are

identical they have equal entropy).

Mutual information quantifies the dependence between the joint distribution of X and Y and what the joint

distribution would be if X and Y were independent. Mutual information is a measure of dependence in the

following sense: I(X;Y) = 0 if and only if X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x,y) = p1(x) * p2(y), and therefore

log[ p(x,y) / (p1(x) * p2(y)) ] = log(1) = 0

so every term in the sum (or integral) vanishes and I(X;Y) = 0. Moreover, mutual information is nonnegative (i.e. I(X;Y) ≥ 0) and symmetric (i.e. I(X;Y) = I(Y;X)).
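A minimal sketch of this definition in code, using a small illustrative joint distribution table (the numbers are examples, not data from these notes):

import numpy as np

def mutual_information(p_xy):
    # I(X;Y) in bits, computed from a joint probability matrix p_xy[x, y].
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p1(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p2(y)
    mask = p_xy > 0                          # skip zero-probability terms
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Independent X and Y: mutual information is zero.
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))   # 0.0
# Identical X and Y (uniform): mutual information equals the entropy, 1 bit.
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))        # 1.0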


CHANNEL CAPACITY:

In electrical engineering, computer science and information theory, channel capacity is the tightest upper

bound on the amount of information that can be reliably transmitted over a communications channel. By the noisy-

channel coding theorem, the channel capacity of a given channel is the limiting information rate (in units

of information per unit time) that can be achieved with arbitrarily small error probability.

Information theory, developed by Claude E. Shannon during World War II, defines the notion of channel

capacity and provides a mathematical model by which one can compute it. The key result states that the capacity of

the channel, as defined above, is given by the maximum of the mutual information between the input and output of

the channel, where the maximization is with respect to the input distribution.

Formal definition

Let X represent the space of signals that can be transmitted, and Y the space of signals received, during a block of time over the channel. Let pY|X(y | x) be the conditional distribution function of Y given X. Treating the channel as a known statistical system, pY|X(y | x) is an inherent fixed property of the communications channel (representing the nature of the noise in it). Then the joint distribution pX,Y(x, y) of X and Y is completely determined by the channel and by the choice of pX(x), the marginal distribution of signals we choose to send over the channel. The joint distribution can be recovered by using the identity

pX,Y(x, y) = pY|X(y | x) * pX(x)

Under these constraints, we next maximize the amount of information, or the message, that can be communicated over the channel. The appropriate measure for this is the mutual information I(X;Y); the maximum mutual information is called the channel capacity and is given by

C = sup over pX(x) of I(X;Y)
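To make the definition concrete, the following sketch numerically maximizes I(X;Y) over the input distribution for a binary symmetric channel with crossover probability 0.1 (the channel choice is just an example); the result approaches the known closed form C = 1 - H(0.1) ≈ 0.531 bits per channel use.

import numpy as np

def capacity_bsc(crossover, steps=2001):
    # Maximize I(X;Y) over the input distribution p(X=1) = q by a grid search.
    p_y_given_x = np.array([[1 - crossover, crossover],
                            [crossover, 1 - crossover]])
    best = 0.0
    for q in np.linspace(1e-6, 1 - 1e-6, steps):
        p_x = np.array([1 - q, q])
        p_xy = p_x[:, None] * p_y_given_x            # joint distribution p(x, y)
        p_y = p_xy.sum(axis=0)                       # output marginal
        mi = np.sum(p_xy * np.log2(p_xy / (p_x[:, None] * p_y[None, :])))
        best = max(best, mi)
    return best

print(capacity_bsc(0.1))   # about 0.531, matching 1 - H(0.1)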

Noisy-channel coding theorem:

The noisy-channel coding theorem states that for any ε > 0 and for any rate R less than the channel

capacity C, there is an encoding and decoding scheme that can be used to ensure that the probability of block error

is less than ε for a sufficiently long code. Also, for any rate greater than the channel capacity, the probability of

block error at the receiver goes to one as the block length goes to infinity.

Example application:


An application of the channel capacity concept to an additive white Gaussian noise (AWGN) channel with B Hz bandwidth and signal-to-noise ratio S/N is the Shannon–Hartley theorem:

C = B * log2(1 + S/N)

C is measured in bits per second if the logarithm is taken in base 2, or nats per second if the natural logarithm is used, assuming B is in hertz; the signal and noise powers S and N are measured in watts or volts squared, so the signal-to-noise ratio here is expressed as a power ratio, not in decibels (dB). Since figures are often cited in dB, a conversion may be needed. For example, a signal-to-noise ratio of 30 dB corresponds to a power ratio of 10^(30/10) = 10^3 = 1000.

Slow-fading channel:

In a slow-fading channel, where the coherence time is greater than the latency requirement, there is no definite capacity, because the maximum rate of reliable communication supported by the channel, log2(1 + |h|^2 * SNR), depends on the random channel gain |h|^2. If the transmitter encodes data at rate R [bits/s/Hz], there is a non-zero probability that the decoding error probability cannot be made arbitrarily small:

pout = P( log2(1 + |h|^2 * SNR) < R )

in which case the system is said to be in outage. With a non-zero probability that the channel is in deep fade, the capacity of the slow-fading channel in the strict sense is zero. However, it is possible to determine the largest value of R such that the outage probability pout is less than ε. This value is known as the ε-outage capacity.
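A Monte Carlo sketch, assuming Rayleigh fading so that |h|^2 is exponentially distributed with unit mean (an assumption made purely for illustration): the ε-outage capacity is the largest R whose outage probability stays below ε, i.e. the ε-quantile of log2(1 + |h|^2 * SNR).

import numpy as np

def outage_capacity(snr_linear, epsilon, trials=200000, seed=0):
    # Estimate the epsilon-outage capacity of a slow Rayleigh-fading channel.
    rng = np.random.default_rng(seed)
    h_sq = rng.exponential(1.0, trials)        # |h|^2 ~ Exp(1) for unit-mean Rayleigh fading
    rates = np.log2(1 + h_sq * snr_linear)     # rate supported by each fade realization
    # The largest R with P(log2(1 + |h|^2 SNR) < R) <= epsilon is the epsilon-quantile.
    return np.quantile(rates, epsilon)

print(outage_capacity(snr_linear=100.0, epsilon=0.01))   # 1%-outage capacity at 20 dB average SNR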

FAST-FADING CHANNEL:

In a fast-fading channel, where the latency requirement is greater than the coherence time and the codeword length spans many coherence periods, one can average over many independent channel fades by coding over a large number of coherence time intervals. Thus, it is possible to achieve a reliable rate of communication of

E[ log2(1 + |h|^2 * SNR) ] [bits/s/Hz]

and it is meaningful to speak of this value as the capacity of the fast-fading channel.
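Under the same illustrative Rayleigh-fading assumption, this fast-fading (ergodic) capacity can be estimated by averaging over many independent fades:

import numpy as np

rng = np.random.default_rng(1)
snr_linear = 100.0                            # 20 dB average SNR
h_sq = rng.exponential(1.0, 200000)           # independent fades, |h|^2 ~ Exp(1)

ergodic_capacity = np.mean(np.log2(1 + h_sq * snr_linear))
awgn_capacity = np.log2(1 + snr_linear)       # non-fading AWGN channel at the same SNR

# By Jensen's inequality the ergodic capacity lies below the AWGN capacity.
print(ergodic_capacity, awgn_capacity)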

RATE DISTORTION THEORY:

Rate–distortion theory is a major branch of information theory which provides the theoretical

foundations for lossy data compression; it addresses the problem of determining the minimal amount

of entropy (or information) R that should be communicated over a channel, so that the source (input signal)

can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.
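As a standard worked example (not derived in these notes), a memoryless Gaussian source with variance sigma^2 under squared-error distortion has rate-distortion function R(D) = (1/2) * log2(sigma^2 / D) for 0 < D <= sigma^2, and R(D) = 0 otherwise:

import math

def gaussian_rate_distortion(sigma_sq, distortion):
    # R(D) in bits per sample for a Gaussian source with squared-error distortion.
    if distortion >= sigma_sq:
        return 0.0            # reproducing the source mean alone already meets the target D
    return 0.5 * math.log2(sigma_sq / distortion)

# Halving the allowed distortion costs an extra half bit per sample.
print(gaussian_rate_distortion(1.0, 0.25))    # 1.0 bit per sample
print(gaussian_rate_distortion(1.0, 0.125))   # 1.5 bits per sample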
