
Page 1: Linear Systems and Noise with Applications · elec.otago.ac.nz/w/images/1/10/LinearSystemsNoise.pdf · 2016-01-28

Linear Systems and Noise with Applications

[Cover figure: one-dimensional Fresnel diffraction from a slit, showing intensity versus position on the screen (m) for z = 0.01 m, 0.05 m, 0.2 m, 1 m, and 2 m.]

Sze Tan and Colin Fox
2009 Edition


Linear Systems and Noise with Applications

2009 Edition

M. Sze Tan
Colin Fox


Contents

Contents iii

Preface v

1 The Dirac Delta Function, and Distributions 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Delta as the limit of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Systems 29

2.1 The concept of a system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.2 Classes of systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.3 Properties of the Convolution Integral . . . . . . . . . . . . . . . . . . . . . . 36

2.4 Example: Indirect Computation of a Convolution Integral . . . . . . . . . . . 39

2.5 Stability of linear time-invariant systems . . . . . . . . . . . . . . . . . . . . . 40

2.6 Eigenfunctions and eigenvalues of linear time-invariant systems . . . . . . . . 41

2.7 Cascaded systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3 The Fourier Transform 45

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2 The Fourier transform and its inverse . . . . . . . . . . . . . . . . . . . . . . 45

3.3 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.4 Examples of Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.5 Convergence of the Dirichlet integrals . . . . . . . . . . . . . . . . . . . . . . 56

3.6 Fourier transforms of generalized functions . . . . . . . . . . . . . . . . . . . . 58

3.7 Examples of Fourier transforms of generalized functions . . . . . . . . . . . . 60

4 Sampling in Frequency and Time 65

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.2 Fourier transform of a train of delta functions . . . . . . . . . . . . . . . . . . 65

4.3 The Fourier transform of a periodic signal . . . . . . . . . . . . . . . . . . . . 66

4.4 Sampling in time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.5 The sampling theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.6 Bernstein’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.7 The discrete-time Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 72



5 The Discrete Fourier Transform 75

5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.2 Discrete Fourier transform of a sampled complex exponential . . . . . . . . . 75

5.3 Sinusoidal and Cosinusoidal components . . . . . . . . . . . . . . . . . . . . . 77

5.4 Relationship to the continuous time Fourier transform . . . . . . . . . . . . . 78

5.5 The fast Fourier transform (FFT) algorithm . . . . . . . . . . . . . . . . . . . 80

6 The Hilbert Transform and Linear Modulation Theory 83

6.1 The Hilbert transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

6.2 The Fourier transform of a real causal function . . . . . . . . . . . . . . . . . 84

6.3 Transmission and reception of information . . . . . . . . . . . . . . . . . . . . 85

6.4 Double sideband (DSB) modulation . . . . . . . . . . . . . . . . . . . . . . . 86

6.5 Amplitude modulation (AM) . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.6 Single sideband (SSB) modulation . . . . . . . . . . . . . . . . . . . . . . . . 88

6.7 Quadrature modulation/demodulation – The complex envelope . . . . . . . . 91

6.8 The optical homodyne detector . . . . . . . . . . . . . . . . . . . . . . . . . . 91

6.9 The television video signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7 The Laplace transform 99

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

7.2 The one-sided Laplace transform . . . . . . . . . . . . . . . . . . . . . . . . . 99

7.3 Properties of the one-sided Laplace transform . . . . . . . . . . . . . . . . . . 103

7.4 The two-sided Laplace transform . . . . . . . . . . . . . . . . . . . . . . . . . 104

7.5 Example: Solution of the diffusion equation . . . . . . . . . . . . . . . . . . . 105

7.6 The Laplace transform and the matrix exponential . . . . . . . . . . . . . . . 106

8 Wave Propagation and Fourier Optics 109

8.1 Propagation of light in the paraxial approximation . . . . . . . . . . . . . . . 109

8.2 Solving the paraxial wave equation . . . . . . . . . . . . . . . . . . . . . . . . 110

8.3 Fresnel and Fraunhofer Diffraction . . . . . . . . . . . . . . . . . . . . . . . . 112

8.4 The diffraction grating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8.5 Fresnel diffraction – numerical calculation . . . . . . . . . . . . . . . . . . . . 115

8.6 Matlab code for computing Fresnel diffraction patterns . . . . . . . . . . . . . 116

8.7 Fourier transforming and imaging properties of lenses . . . . . . . . . . . . . 117

8.8 Gaussian Beams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

8.9 Tracing Gaussian beams through optical systems . . . . . . . . . . . . . . . . 123

8.10 One-dimensional propagation in a dispersive medium . . . . . . . . . . . . . . 125

9 Energy and Power, Random Signals and Noise 129

9.1 Energy and Power in Deterministic Signals . . . . . . . . . . . . . . . . . . . 129

9.2 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

9.3 Power Spectrum of a Stationary Stochastic Process . . . . . . . . . . . . . . . 136

9.4 Signals in Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138



Preface

These notes accompany the lecture course PHSI 461 in the Physics Department at the University of Otago, and Physics 701 in the Physics Department at The University of Auckland. The notes cover the topics of linear systems, transform methods, and noise.

The ‘systems’ approach regards physical systems somewhat abstractly as devices which transform inputs into outputs. We are concerned with those cases when it is possible to choose variables and find parameter regimes such that the transformations may be treated as linear mappings. The mathematical tools which are used to study linear mappings are useful in a wide variety of applications as well as being interesting in their own right. The key idea is the superposition principle, and so transform methods play a central role.

Besides their theoretical importance, many ideas from linear systems theory have found technological applications in a variety of familiar devices. Concepts such as the frequency and phase response of amplifiers, the representation of continuous audio signals by their sample values, techniques for encoding audio and video information on radio frequency signals, the design of lens systems for cameras and compact disc players, the operation of optical interferometers, etc. can all readily be understood from within the context of linear systems. In this course, we shall try to discuss both the theoretical and the applied aspects of the subject, and stress the rather fascinating connections between them. As a result, the topics in these notes range from mathematical analysis through differential equations to applications. Despite this breadth and the rather cursory treatment that this approach necessarily entails, it is our hope that the central unifying principles will remain evident.

This version has been adapted by Colin Fox from the notes written by Sze Tan for a course on Linear Systems that we taught for many years, with other colleagues, in the Physics Department at The University of Auckland. Much of the material in the early chapters was developed from the lecture notes prepared by Dr Murray Johns for the Transform Methods course 31.410. We owe a great debt to him for his ability to extract the important ideas from the subject and present them in such a clear way that we hardly realised at the time how much we were learning.

Sze Tan and Colin Fox

Sunnyvale and Dunedin
February 2009



1 The Dirac Delta Function, and Distributions

1.1 Motivation

In many applications in physics, we would like to be able to handle quantities such as:

1. The density ρ(r) of a point mass at r0

2. The probability density function of a discrete distribution

3. The force F(t) associated with an “instantaneous” collision

4. The derivative of a function with a step discontinuity.

These idealized situations can be represented in terms of the Dirac delta function, which we can think of intuitively as being non-zero at one point only, but in such a way that its integral equals 1. Unfortunately, there is no ‘function’ of the usual kind that satisfies these properties. Instead we must define the Dirac delta function via its ‘action’. There are two useful ways to do this: one as the limit of standard functions, the other as a ‘distribution’ that defines the action directly.

The following section develops the notion of the delta function as the limit of standard functions. That approach is all that is needed for this course, and is all that we will use. However, there are many situations where the more formal treatment of the delta function as a distribution is needed. For those who are interested, a section on distribution theory is included. The value of mastering distribution theory is that it actually makes manipulations easier, to a point where they look embarrassingly simple.

1.2 Delta as the limit of functions

We may regard each of the motivating examples as the result of a limiting process in which a quantity gets more and more concentrated at a point in space or time. Consider the sequence of functions

$$f_n(t) = \begin{cases} n & \text{for } |t| < \frac{1}{2n} \\ 0 & \text{otherwise.} \end{cases} \tag{1.1}$$

These functions get narrower and taller as n → ∞. The area in any interval enclosing the origin tends to one as n becomes large.



These functions do not tend pointwise to any limit as n → ∞, but physically it is useful to think that for large n, fn represents a “point” source such as a point charge, point mass, etc. We would like to invent a “limit” for this sequence of functions which should have certain properties. It is called the (Dirac) delta function or the unit impulse and is denoted by δ(t). The property of being a localized unit source means that δ(t) needs to satisfy

$$\int_a^b \delta(t)\,dt = \begin{cases} 1 & \text{if } 0 \in (a,b) \\ 0 & \text{if } 0 \notin [a,b]. \end{cases} \tag{1.2}$$

This property implies that, for any given function φ(t) continuous at the origin,

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(t)\varphi(t)\,dt = \varphi(0), \tag{1.3}$$

and so we would like δ(t) to have the property that

$$\int_{-\infty}^{\infty} \delta(t)\varphi(t)\,dt = \varphi(0). \tag{1.4}$$

This is called the substitution, or sifting, property. Many other sequences of functions may be found which also satisfy the limit (1.3) for a suitable class of functions φ. These provide alternative representations of the delta function.
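The sifting limit (1.3) can be checked numerically. The following sketch (illustrative only; the midpoint rule and the test function cos t are arbitrary choices, not from the notes) approximates the action of fn on φ:

```python
import math

def f_n(t, n):
    # Box representation of the delta: height n on |t| < 1/(2n), unit area
    return n if abs(t) < 1.0 / (2 * n) else 0.0

def action(n, phi, a=-1.0, b=1.0, steps=200000):
    # Midpoint-rule approximation of the integral of f_n(t) phi(t)
    h = (b - a) / steps
    return sum(f_n(a + (k + 0.5) * h, n) * phi(a + (k + 0.5) * h)
               for k in range(steps)) * h

for n in (1, 10, 100):
    print(n, action(n, math.cos))  # tends to cos(0) = 1 as n grows
```

As n grows, the box narrows onto the origin and the integral picks out φ(0), exactly as (1.3) asserts.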

1.2.1 Properties of the delta function

Substitution property

A change of variables shows that

$$\int_{-\infty}^{\infty} \delta(t-\tau)\varphi(t)\,dt = \varphi(\tau) \tag{1.5}$$

for any continuous function φ. We can think of this formula as showing how to decompose the function φ into a linear combination of delta functions. This turns out to be a very useful decomposition.

Another way of describing this result is to say that δ(t − τ) is the kernel of the integral-transform representation of the identity operator.

Indefinite integral

Equation (1.2) allows us to evaluate the indefinite integral of δ(t) as

$$\int_{-\infty}^{t} \delta(\tau)\,d\tau = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t > 0 \end{cases} = u(t), \tag{1.6}$$

the unit step. (We refuse to say what the value is for t = 0.)

Hence, by differentiating both sides, we find that the derivative of the unit step is

u′(t) = δ(t). (1.7)
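Integrating the box sequence directly illustrates this: the running integral of fn is a ramp of width 1/n that steepens toward the unit step u(t). A small sketch (illustrative Python, not part of the original notes):

```python
def F_n(t, n):
    # Running integral of the box f_n: a ramp of width 1/n through the origin
    if t < -1.0 / (2 * n):
        return 0.0
    if t > 1.0 / (2 * n):
        return 1.0
    return n * (t + 1.0 / (2 * n))

# As n grows, the ramp steepens toward the unit step u(t)
for t in (-0.5, -0.01, 0.01, 0.5):
    print(t, F_n(t, 1), F_n(t, 100))
```

At t = 0 every ramp passes through 1/2, which is one reason the notes decline to assign u(0) a value.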



Derivative

Now that we can differentiate a step, we can differentiate the functions fn(t). By linearity of differentiation,

$$f_n'(t) = n\left[\delta\!\left(t+\tfrac{1}{2n}\right) - \delta\!\left(t-\tfrac{1}{2n}\right)\right]. \tag{1.8}$$

The action of this sequence of functions on a (continuously differentiable) function φ(t) is

$$\int_{-\infty}^{\infty} f_n'(t)\varphi(t)\,dt = n\left[\varphi\!\left(-\tfrac{1}{2n}\right) - \varphi\!\left(\tfrac{1}{2n}\right)\right] \tag{1.9}$$

$$\to -\varphi'(0). \tag{1.10}$$

Here we used the substitution property and the centered-difference form of differentiation. Since we want f′n → δ′ as n → ∞, this defines the action of the derivative of the Dirac delta function.

So whereas δ(t) acting on a function returns the value of the function at t = 0, the derivative of delta acting on a function returns minus the derivative of the function at t = 0.
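This action can be seen numerically. In the sketch below (illustrative; the test function sin t is an arbitrary choice), the action n[φ(−1/(2n)) − φ(1/(2n))] should tend to −φ′(0) = −1:

```python
import math

def delta_prime_action(phi, n):
    # Action of f_n' on phi: n * (phi(-1/(2n)) - phi(1/(2n)))
    h = 1.0 / (2 * n)
    return n * (phi(-h) - phi(h))

for n in (10, 100, 1000):
    print(n, delta_prime_action(math.sin, n))  # tends to -cos(0) = -1
```

The convergence is just that of the centered difference quotient, with step 1/(2n).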

Scaling

The action of δ(at), for some a ∈ R, is

$$\int_{-\infty}^{\infty} \delta(at)\varphi(t)\,dt = \int_{-\infty}^{\infty} \delta(u)\varphi\!\left(\frac{u}{a}\right)\frac{du}{|a|} \tag{1.11}$$

$$= \frac{1}{|a|}\varphi(0). \tag{1.12}$$

Since this is the same action as produced by scaling the delta function by 1/|a|, we conclude that

$$\delta(at) = \frac{1}{|a|}\,\delta(t).$$

By setting a = −1, this result shows that δ(t) is an even function.

By a similar argument, the product of a delta function and a continuous function f(t) is

δ(t)f(t) = δ(t)f(0).
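The scaling property can be checked numerically as well. The sketch below (illustrative; the quadrature grid and test function are arbitrary choices) integrates the box representation against φ with a scaled argument, and should approach φ(0)/|a|:

```python
def box_action_scaled(a, phi, n, lo=-1.0, hi=1.0, steps=400000):
    # Integral of delta_n(a t) phi(t) dt, with delta_n the box of height n
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        if abs(a * t) < 1.0 / (2 * n):
            total += n * phi(t) * h
    return total

phi = lambda t: 1.0 + t * t  # phi(0) = 1
for a in (2.0, -3.0):
    print(a, box_action_scaled(a, phi, 200))  # tends to phi(0)/|a|
```

Scaling the argument by a squeezes the box's support by |a| without changing its height, which is exactly where the factor 1/|a| comes from.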

1.2.2 Examples of the convergence of action

We saw, above, that defining the delta function as the limit of functions runs into problems because the limit is not defined as a function. However, the delta function is a useful idealization because something physical does converge, namely the action of the delta function, i.e. the output of a physical system. To show that property in general we need to use distribution theory as presented in the next section. But first we establish the convergence of action (output) explicitly in three examples.



Figure 1.1: An RC low-pass filter, or quasi-integrator. (The input vin(t) drives a series resistor R; the output vout(t) is taken across the capacitor C, through which the current iC(t) flows.)

Accelerating a mass from rest

Consider a mass m initially at rest at the origin that is forced in the positive x-direction by a force f(t). If the position of the mass at time t is x(t), then f and x are related by Newton's law

$$f(t) = m\ddot{x}(t),$$

or the velocity $v(t) = \dot{x}(t)$ is related to f by $f(t) = m\dot{v}(t)$. When f = fn in equation (1.1), we may integrate to get

$$v_n(t) = \frac{1}{m}\int_{-\infty}^{t} f_n(\tau)\,d\tau = \frac{1}{m}\begin{cases} 0 & t < -\frac{1}{2n} \\ n\left(t+\frac{1}{2n}\right) & -\frac{1}{2n} \le t \le \frac{1}{2n} \\ 1 & \frac{1}{2n} < t. \end{cases}$$

Notice how the velocity tends to the well-defined limit as n → ∞

$$v(t) = \begin{cases} 0 & t < 0 \\ \frac{1}{m} & 0 < t \end{cases}$$

(at least for t ≠ 0). This is then the velocity for forcing f(t) = δ(t).

Any other property of the position of the mass will also converge to a well-defined limit. For example, consider the position x(t) that results from forcing fn(t). Integrating the velocity gives

$$x_n(t) = \frac{1}{m}\begin{cases} 0 & t < -\frac{1}{2n} \\ \frac{n}{2}\left(t+\frac{1}{2n}\right)^2 & -\frac{1}{2n} \le t \le \frac{1}{2n} \\ t & \frac{1}{2n} < t \end{cases} \tag{1.13}$$

$$\to \begin{cases} 0 & t \le 0 \\ \frac{t}{m} & 0 < t \end{cases} \quad \text{as } n \to \infty, \tag{1.14}$$

which is well defined for all t.
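As a numerical cross-check (an illustrative Python sketch with m = 1, not from the notes), one can integrate the velocity vn and watch the position converge to t/m:

```python
def v_n(t, n, m=1.0):
    # Velocity produced by the box forcing f_n, integrated analytically
    if t < -1.0 / (2 * n):
        return 0.0
    if t > 1.0 / (2 * n):
        return 1.0 / m
    return n * (t + 1.0 / (2 * n)) / m

def x_n(t, n, m=1.0, steps=10000):
    # Position: numerically integrate the velocity from t = -1 up to t
    h = (t + 1.0) / steps
    return sum(v_n(-1.0 + (k + 0.5) * h, n, m) for k in range(steps)) * h

for n in (1, 10, 100):
    print(n, x_n(0.2, n))  # converges to t/m = 0.2 as n grows
```

Once the pulse has finished (t > 1/(2n)), the position agrees exactly with t/m, so the convergence is visible already at modest n.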

Impulse response of an RC low-pass circuit

Consider the low-pass RC circuit in figure 1.1. The impulse response of the circuit is the output voltage vout when the input voltage is a unit impulse, i.e. vin = δ(t), and all voltages



Figure 1.2: Output of the RC low-pass filter using RC = 1 and for n = 0.5, 1, 2, 3.

are zero for t < 0. The output can be calculated as the limit of the outputs when vin = fn(t), in the limit n → ∞. (As usual, we assume the ideal situation where the circuit is driven with zero source impedance and the output is measured with a device with infinite input impedance.)

Since the current through the capacitor is $i_C(t) = (v_{\text{in}} - v_{\text{out}})/R$ and the voltage across the capacitor is $v_{\text{out}} = \frac{1}{C}\int_{-\infty}^{t} i_C(\tau)\,d\tau$, we derive that

$$v_{\text{out}}(t) = \frac{1}{RC}\int_{-\infty}^{t} \left(v_{\text{in}}(\tau) - v_{\text{out}}(\tau)\right) d\tau$$

(note that the circuit integrates the input when vout ≃ 0, which is why the circuit is also called a quasi-integrator). Differentiating and setting vin = fn(t) gives the differential equation

$$v_{\text{out}} + RC\,\dot{v}_{\text{out}} = f_n.$$

This is easy to solve analytically (or by sticking it into Maple), giving

$$v_{\text{out}}(t) = \begin{cases} 0 & t < -\frac{1}{2n} \\ n\left(1 - e^{-(t+\frac{1}{2n})/RC}\right) & -\frac{1}{2n} \le t \le \frac{1}{2n} \\ n\left(1 - e^{-1/(nRC)}\right) e^{-(t-\frac{1}{2n})/RC} & \frac{1}{2n} < t, \end{cases}$$

which is plotted in figure 1.2 for RC = 1 and n = 0.5, 1, 2, 3. As you can see, the output tends to a well-defined limit as n → ∞. The limit function is easily found to be

$$v_{\text{out}}(t) = \begin{cases} 0 & t < 0 \\ \frac{1}{RC}\, e^{-t/RC} & 0 \le t. \end{cases}$$

This is the impulse response of the circuit.
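This limit can be confirmed by simulating the circuit directly. The sketch below (a forward-Euler integration in Python; the step size and time window are arbitrary choices, not from the notes) drives the circuit with fn and compares the output at t = 1 with the impulse response:

```python
import math

def rc_response(n, RC=1.0, dt=1e-4, t_end=3.0):
    # Forward-Euler simulation of RC v' + v = f_n(t), starting before the pulse
    t, v = -1.0, 0.0
    out = []
    while t < t_end:
        f = n if abs(t) < 1.0 / (2 * n) else 0.0
        v += dt * (f - v) / RC
        t += dt
        out.append((t, v))
    return out

def value_at(out, t0):
    # Simulated output at the sample nearest t0
    return min(out, key=lambda p: abs(p[0] - t0))[1]

for n in (1, 10, 100):
    sim = rc_response(n)
    # Compare with the impulse response (1/RC) exp(-t/RC) at t = 1
    print(n, value_at(sim, 1.0), math.exp(-1.0))
```

For small n the pulse is too wide to look like an impulse; as n grows the simulated output approaches (1/RC) e^{−t/RC}.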



1.3 Distributions

This section is included for interest only, and won't be used or examined in the course. All that you will gain is knowledge.

Distribution theory allows us to deal with objects like delta functions in a correct way. These are not ordinary functions but are rather examples of distributions or generalized functions. The extra conceptual power and elegance of distribution theory will (hopefully) become obvious as we proceed. In particular, we shall find that any distribution can be differentiated (to give another distribution) any number of times, and that distributions justify many operations which we often carry out in Physics calculations but which are incorrect for “ordinary” functions.

1.3.1 Some concepts from mathematical analysis

The following is a rather brief summary of a few definitions and results which will be useful later. They should also serve to emphasize that we do need to be rather careful when manipulating ordinary functions.

Suppose that xn for n = 1, 2, . . . is a sequence of real or complex numbers. We say that the sequence converges to x as n → ∞ if |xn − x| can be made as small as we like whenever n is sufficiently large. Formally, this is written ∀ε > 0, ∃N ∈ N such that n ≥ N ⇒ |xn − x| < ε, where the quantifiers ∀ and ∃ represent “for all” and “there exists” respectively.

A function f(t) is said to have limit b as t → a, and we write

$$\lim_{t\to a} f(t) = b, \tag{1.15}$$

provided that |f(t) − b| can be made as small as we like whenever t is sufficiently close (but not equal) to a. Formally, ∀ε > 0, ∃δ > 0 such that 0 < |t − a| < δ ⇒ |f(t) − b| < ε.

A function f(t) is said to be continuous at a if

$$\lim_{t\to a} f(t) = f(a), \tag{1.16}$$

i.e., ∀ε > 0, ∃δ > 0 such that |t − a| < δ ⇒ |f(t) − f(a)| < ε.

Note that continuity is a property that a function may have at a point. There are functions which are continuous only at a single point and are discontinuous everywhere else. Consider for example

$$f(t) = \begin{cases} t & \text{if } t \text{ is a rational number} \\ 0 & \text{if } t \text{ is irrational.} \end{cases} \tag{1.17}$$

You should be able to show (using the definition of continuity) that this function is continuous at t = 0 and nowhere else.

It should be a familiar result from analysis that a function f(t) which is continuous on a closed and bounded set K is bounded, i.e., ∃M ∈ R such that |f(t)| ≤ M for all t ∈ K.



We now wish to consider sequences of functions. Suppose fn is a sequence of functions all defined on some set S. The functions are said to converge pointwise to a function f as n → ∞ if for each t ∈ S,

$$f(t) = \lim_{n\to\infty} f_n(t). \tag{1.18}$$

Notice that for a given t, the limit is simply the limit of a sequence of numbers.

Examples of sequences of functions

1. Define

$$f_n(t) = \begin{cases} 0 & \text{for } t < 0 \\ t^n & \text{for } 0 \le t \le 1 \\ 1 & \text{for } t > 1. \end{cases}$$

This sequence of functions (see figure 1.3) converges pointwise to the limit function

Figure 1.3: An example of a sequence of functions which converges pointwise but not uniformly to a limiting function as n → ∞.

$$f(t) = \begin{cases} 0 & \text{for } t < 1 \\ 1 & \text{for } t \ge 1. \end{cases} \tag{1.19}$$

Note that the limiting function is discontinuous at t = 1, even though every function fn(t) in the sequence is continuous at t = 1.

2. The sequence of functions

$$f_n(t) = \begin{cases} n & |t| < \frac{1}{2n} \\ 0 & \text{otherwise,} \end{cases} \tag{1.20}$$

does not converge pointwise to any function, since the sequence of numbers fn(0) does not converge.



3. The sequence of functions

$$f_n(t) = \begin{cases} n^2 t & \text{for } 0 \le t \le \frac{1}{n} \\ n^2\left(\frac{2}{n} - t\right) & \text{for } \frac{1}{n} < t \le \frac{2}{n} \\ 0 & \text{otherwise} \end{cases} \tag{1.21}$$

converges pointwise to the zero function, i.e., for every t, the sequence of numbers fn(t) converges to zero (see Figure 1.4). This is because given any t > 0, there is an integer N such that for all n > N, fn(t) = 0. For all other values of t, the value of fn(t) is zero for every n.

Figure 1.4: An example of a sequence of functions which converges pointwise but not uniformly to the function which is identically zero.

4. The sequence of functions

$$f_n(t) = \frac{1}{n}\sin(n^2 t) \tag{1.22}$$

converges pointwise to the zero function.

Example 1 shows that even when all the functions in a sequence are continuous, the pointwise limit can fail to be continuous (in this case at t = 1).
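A quick numerical look at example 1 (an illustrative Python sketch, not from the notes) shows each fn continuous while the limit jumps at t = 1:

```python
def f_n(t, n):
    # Example 1: continuous for every n
    if t < 0.0:
        return 0.0
    if t <= 1.0:
        return t ** n
    return 1.0

# Pointwise limit: 0 for t < 1 and 1 for t >= 1, so discontinuous at t = 1
for t in (0.5, 0.9, 1.0, 1.5):
    print(t, [f_n(t, n) for n in (1, 10, 100)])
```

For any fixed t < 1 the values t^n sink to zero, while at t = 1 they stay pinned at 1: the limit function jumps.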

In example 3, each function fn(t) satisfies

$$\int_{-\infty}^{\infty} f_n(t)\,dt = 1. \tag{1.23}$$

Thus the limiting value of the integral as n → ∞ is one. However, the limiting function is zero, and so the integral of the limit function is zero. We see in general that

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(t)\,dt \ne \int_{-\infty}^{\infty} f(t)\,dt \quad \text{even if } f_n(t)\to f(t) \text{ pointwise.} \tag{1.24}$$



In example 4, the derivative sequence

$$f_n'(t) = n\cos(n^2 t) \tag{1.25}$$

does not converge pointwise to any function, and certainly not to f′(t), the derivative of the limiting function. Thus in general

$$\lim_{n\to\infty} \frac{d}{dt} f_n(t) \ne \frac{d}{dt} f(t) \quad \text{even if } f_n(t)\to f(t) \text{ pointwise.} \tag{1.26}$$

One way of trying to improve matters is to strengthen what we mean by convergence of functions. We thus define the concept of uniform convergence. A sequence of functions all defined on some set S is said to converge uniformly on S to a function f as n → ∞ if ∀ε > 0, ∃N ∈ N such that ∀t ∈ S, |fn(t) − f(t)| < ε whenever n ≥ N.

It is important to know how this is different from the concept of pointwise convergence. Translating the definition of pointwise convergence given previously into symbols reads ∀t ∈ S, ∀ε > 0, ∃N ∈ N such that |fn(t) − f(t)| < ε whenever n ≥ N. We see that for pointwise convergence, the value of N depends both on ε and on t, whereas for uniform convergence, the same N must work for all t ∈ S. Notice that if a sequence of functions converges uniformly to some limit, the sequence must also converge pointwise to that limit, but the converse does not necessarily hold.

Exercise: Show that the convergence in examples 1 and 3 is not uniform on R, but that the convergence is uniform in example 4.

Another way of checking whether a pointwise-convergent sequence is uniformly convergent is to look at the difference sequence dn(t) = fn(t) − f(t), which is another sequence of functions. The convergence of fn(t) to f(t) is uniform on S provided that for all t ∈ S, |dn(t)| may be bounded above by a sequence which converges to zero, i.e., if we can find a sequence Mn such that Mn → 0 as n → ∞ and |dn(t)| ≤ Mn for all t ∈ S.
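For example 4 the bound Mn = 1/n works, since |sin(n²t)|/n ≤ 1/n for all t. A numerical sketch (illustrative; the sampling grid is an arbitrary choice):

```python
import math

def d_n(t, n):
    # Difference from the (zero) limit function of example 4
    return math.sin(n * n * t) / n

# |d_n(t)| <= M_n = 1/n for every t, and M_n -> 0, so convergence is uniform
grid = [k * 1e-3 for k in range(-5000, 5001)]
for n in (1, 5, 25):
    sup = max(abs(d_n(t, n)) for t in grid)
    print(n, sup, 1.0 / n)  # the sampled sup never exceeds 1/n
```

The same trick fails for examples 1 and 3, whose suprema do not shrink, which is why their convergence is only pointwise.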

If fn converges to f uniformly on S, then

1. If each fn is continuous at a ∈ S, f is also continuous at a.

2. Provided that fn and f are integrable, the limit of the integral is equal to the integral of the limit, i.e.,

$$\lim_{n\to\infty} \int_a^x f_n(t)\,dt = \int_a^x f(t)\,dt, \tag{1.27}$$

where [a, x] is contained in S. The convergence of the integral is uniform.

Exercise: Prove the above assertions using the definitions given.

However, example 4 shows that we still cannot interchange the limiting process and a derivative, even given uniform convergence. This whole situation is rather unsatisfactory, especially since uniform convergence is quite a strong condition which may be difficult to establish in practical situations. By contrast, by using the theory of distributions, we shall see that it is always possible to differentiate distributions and to interchange the limiting process with derivatives of any order.



1.3.2 Test functions, functionals and distributions

In equation (1.4) we found it convenient to think about the delta function in terms of its action on another function φ, which is called a test function. The delta function may be regarded as a machine which operates on the test function φ to produce the number φ(0) as the result. We shall consider such machines (Figure 1.5), which take test functions as inputs and produce numbers as outputs, in more detail.

Figure 1.5: A Dirac delta function acts on a test function to return its value at zero. More generally, a functional is a machine which acts on a test function to produce a number.

We must first define the class of test functions on which our machines work. This class is denoted by D, which we shall require to be a vector space. For technical reasons, we shall initially assume that test functions are taken from the class of infinitely differentiable functions which vanish outside a finite interval. It is important first to check that test functions exist.

Exercise: Show that if

$$h(t) = \begin{cases} \exp(-1/t) & \text{for } t > 0 \\ 0 & \text{otherwise,} \end{cases} \tag{1.28}$$

then φ(t) = h(t)h(1 − t) is a test function (i.e., it is infinitely differentiable and vanishes outside a finite interval). In Figure 1.6, graphs of h(t) and φ(t) are shown.
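A numerical sketch of this test function (illustrative Python; the sample points are arbitrary choices) confirms that φ vanishes outside [0, 1] and is positive inside:

```python
import math

def h(t):
    # Infinitely differentiable; vanishes identically for t <= 0
    return math.exp(-1.0 / t) if t > 0 else 0.0

def phi(t):
    # Smooth bump supported on [0, 1]: h(t) h(1 - t)
    return h(t) * h(1.0 - t)

print(phi(-0.5), phi(0.5), phi(1.5))  # zero outside [0, 1], positive at 0.5
```

The peak value φ(1/2) = e^{−4} ≈ 0.018 matches the small vertical scale visible in Figure 1.6.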

A functional T is a mapping from test functions to numbers. Given a test function φ ∈ D, we denote the number obtained by applying T to φ by the notation 〈T, φ〉. We shall sometimes write this as 〈T(t), φ(t)〉 to indicate explicitly the independent variable, although T is not to be thought of as a function of t. Similarly we shall often refer to “the functional T(t)”, by which we mean that T acts on a test function which is a function of t. In particular, it is not meaningful to speak of “the value of T at t”.

Two functionals S and T are said to be equal (with respect to the class of test functions D) if for every test function φ ∈ D,

〈S, φ〉 = 〈T, φ〉. (1.29)

A distribution is a functional which is both linear and continuous. A functional T is said to be linear if for any φ1, φ2 ∈ D and for any numbers c1 and c2, we have that

〈T, c1φ1 + c2φ2〉 = c1〈T, φ1〉+ c2〈T, φ2〉. (1.30)



Figure 1.6: The function h(t) is infinitely differentiable everywhere, and in particular at t = 0, although it is constant for t < 0. The function h(t)h(1 − t) is infinitely differentiable everywhere, vanishes identically outside the interval [0, 1], and is an example of a test function.

By analogy with the definition for an ordinary function to be continuous, the functional T is said to be continuous if for any sequence of test functions φn which converges to φ in D,

$$\lim_{n\to\infty} \langle T, \varphi_n \rangle = \langle T, \varphi \rangle. \tag{1.31}$$

In order for this definition to make sense, we need to know what is meant by “a sequence of test functions converging in D”. It turns out to be convenient to define:

A sequence of test functions φn(t) in D converges to zero, and we write φn → 0, if

1. for every k ∈ N ∪ {0}, the sequence of k'th derivatives φ1^(k)(t), φ2^(k)(t), . . . converges uniformly to zero;

2. there is a common interval [a, b], independent of n, such that every φn(t) vanishes outside of [a, b].

We then say that φn converges to φ in D iff the sequence φn − φ converges to zero in D.

Note: For a linear functional T, the condition for continuity of T may be written in terms of sequences of test functions φn(t) which converge to zero in D. The linear functional T is continuous iff for any such sequence of test functions φn → 0,

$$\lim_{n\to\infty} \langle T, \varphi_n \rangle = 0. \tag{1.32}$$

Example: The delta distribution δ is defined by

〈δ, φ〉 = φ(0). (1.33)



Check that this is indeed a distribution by showing that it is both linear and continuous.

Exercise: With the definition of h(t) given by equation (1.28), determine which of the following sequences of test functions converge to zero in D as n → ∞:

1. αn(t) = (1/n) h(t) h(1 − t)

2. βn(t) = (1/n) h(nt) h(1 − nt)

3. γn(t) = (1/n) h(n − t) h(n + 1 − t).

1.3.3 Regular distributions

Suppose that f(t) is a locally integrable function (i.e., the integral of |f(t)| exists and is finite for every finite interval [a, b]). Then associated with the function f we may define a functional Tf via the rule

$$\langle T_f, \varphi \rangle = \int_{-\infty}^{\infty} f(t)\varphi(t)\,dt. \tag{1.34}$$

Since φ(t) is continuous on a closed and bounded interval, it is bounded, so the integral exists and is finite for any test function φ ∈ D. We now wish to prove that the functional is in fact a distribution.

Linearity follows from the linearity of integration. Continuity holds, for if φn(t) → 0 in D, and all the φn vanish outside [a, b],

$$\left|\int_{-\infty}^{\infty} f(t)\varphi_n(t)\,dt\right| \le \int_a^b |f(t)|\,|\varphi_n(t)|\,dt \le \max_{t\in\mathbb{R}} |\varphi_n(t)| \int_a^b |f(t)|\,dt.$$

As n→∞, the uniform convergence of φn (t) to the zero function means that the maximumof |φn (t)| converges to zero. Since f is locally integrable, the integral in the last term is finite,and so the entire right-hand side converges to zero.

Distributions which are induced by some locally integrable function are said to be regular. Other distributions (such as the delta distribution) are said to be singular. (As an exercise, prove that the delta distribution is not induced by any locally integrable function.)

As mentioned above, two distributions S and T are said to be equal iff 〈S, φ〉 = 〈T, φ〉 for all φ ∈ D. Notice that it is possible for two different locally integrable functions f and g to induce the same distribution, i.e., for

∫_{−∞}^{∞} f(t)φ(t) dt = ∫_{−∞}^{∞} g(t)φ(t) dt (1.35)

for all φ ∈ D. This will happen, for example, if f(t) and g(t) differ only on a set of measure zero, such as a finite or countable set of points.


1.3.4 Some notational matters, generalized functions

Every locally integrable function f(t) induces a regular distribution Tf. We deliberately wish to blur the distinction between the function f(t) and the distribution it induces, and so we shall often talk about "the distribution f(t)" to mean Tf and write 〈f(t), φ(t)〉 instead of 〈Tf, φ〉. Thus for locally integrable functions we shall write

〈f(t), φ(t)〉 ≡ 〈Tf, φ〉 = ∫_{−∞}^{∞} f(t)φ(t) dt. (1.36)

Since two different locally integrable functions f(t) and g(t) can induce the same distribution if they differ on a set of measure zero, we shall say that "f(t) = g(t) distributionally" to mean that they induce the same distribution, i.e., that for all φ ∈ D,

〈f(t), φ(t)〉 = 〈g(t), φ(t)〉. (1.37)

This is to be understood as being quite distinct from the usual concept of equality of functions. Normally, we would write f(t) = g(t) only if f and g agree at every point t in their common domain.

For singular distributions, such as the delta distribution δ, there is no locally integrable function δ(t) satisfying

∫_{−∞}^{∞} δ(t)φ(t) dt = 〈δ, φ〉 = φ(0). (1.38)

Nevertheless, we still use the notation above with the understanding that the integral form is simply an alternative way of writing the distribution. Unlike an ordinary function, we cannot evaluate δ(t) at an arbitrary value of t.

We shall always feel free to use the integral notation to denote the action of a distribution on a test function, whether the distribution is regular or singular. We call the entity in the integrand which multiplies the test function a "generalized function".

1.3.5 Operations on distributions

In this section, we shall begin to describe how all this mathematical structure works. We shall use the regular distributions to tell us how to define operations on general distributions. These definitions will be chosen so as to be consistent with the usual operations on ordinary functions. The role of the test functions and the reasons for choosing them from such a restricted class (i.e., the infinitely differentiable functions which vanish outside a bounded set) will also become clearer.

Differentiation

Suppose that f(t) is a function which is both locally integrable and continuously differentiable. It induces a distribution 〈f, φ〉. The derivative f′(t) of the function f(t) is also locally integrable and induces another distribution 〈f′, φ〉. For consistency, we want the distribution f′ to be the derivative of the distribution f whenever the function f′ is the derivative of the function f.


For any test function φ ∈ D,

〈f′(t), φ(t)〉 = ∫_{−∞}^{∞} f′(t)φ(t) dt (1.39)
             = [f(t)φ(t)]_{−∞}^{∞} − ∫_{−∞}^{∞} f(t)φ′(t) dt
             = −〈f(t), φ′(t)〉, (1.40)

where the boundary terms from the integration by parts have been set to zero since test functions vanish outside a bounded set. The fact that φ is infinitely differentiable means that φ′ is also infinitely differentiable and is also in D.

The above shows how we must define the derivative of a regular distribution. The notation suggests that we should extend this to define the derivative of an arbitrary distribution T(t) to be the functional T′ where

〈T′(t), φ(t)〉 ≡ −〈T(t), φ′(t)〉. (1.41)

We need to check that T′ is in fact linear and continuous, so that it is actually a distribution. Both of these facts follow readily from our definitions of D and convergence in this space.

Examples:

1. What is u′(t)? (Note: u(t) is the unit step function defined by

u(t) = { 1 for t > 0; 0 for t < 0 }. (1.42)

The value of u(0) is usually unspecified and can take on any value.)

The function u(t) is locally integrable and induces the regular distribution

〈u(t), φ(t)〉 = ∫_{−∞}^{∞} u(t)φ(t) dt = ∫_0^∞ φ(t) dt. (1.43)

The derivative of u is found by applying the definition:

〈u′(t), φ(t)〉 = −〈u(t), φ′(t)〉
             = −∫_0^∞ φ′(t) dt = −φ(∞) + φ(0)
             = φ(0) = 〈δ(t), φ(t)〉. (1.44)

Since this holds for all φ ∈ D, we see that u′(t) = δ(t) in a distributional sense. This holds despite the fact that u(t) is not differentiable in the ordinary sense at the point t = 0.
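The pairing 〈u′, φ〉 = −〈u, φ′〉 = φ(0) in (1.44) can be checked numerically. The following sketch is not from the text; the particular bump function and grid are illustrative choices. It approximates −∫ u(t)φ′(t) dt for the standard C∞ bump and compares the result with φ(0) = e⁻¹:

```python
import numpy as np

def phi(t):
    # standard C-infinity bump: exp(-1/(1 - t^2)) inside (-1, 1), zero outside
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

t = np.linspace(-2.0, 2.0, 400001)
dt = t[1] - t[0]
dphi = np.gradient(phi(t), dt)          # phi'(t) by central differences
u = (t > 0).astype(float)               # unit step
lhs = -np.sum(u * dphi) * dt            # -<u, phi'> = <u', phi>, eq. (1.44)
print(lhs, np.exp(-1.0))                # both approximately 0.3679 = phi(0)
```

The agreement improves as the grid is refined, in keeping with the distributional identity u′ = δ.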

2. What is the derivative of δ(t)? Again we simply use the definition. Given any test function φ ∈ D, the action of δ′ on φ is given by

〈δ′(t), φ(t)〉 = −〈δ(t), φ′(t)〉 = −φ′(0). (1.45)

Formally, we write

∫_{−∞}^{∞} δ′(t)φ(t) dt = −φ′(0). (1.46)


This is a distribution called the dipole or doublet. Just as the charge density for a point charge can be written as a delta function, the charge density for an electric dipole can be written as a dipole function.

Similarly, it is easy to show that

〈δ^(k)(t), φ(t)〉 = (−1)^k 〈δ(t), φ^(k)(t)〉 = (−1)^k φ^(k)(0). (1.47)

These examples and the definition of the derivative of distributions show that every generalized function can be differentiated an infinite number of times. Thus we have not only extended our class of locally integrable functions (which can be discontinuous) to the generalized functions, but in doing so have also made them all infinitely differentiable (in the distributional sense).

Shifting

Suppose that T(t) is a distribution. How should we define the shifted distribution T(t − a)? Again we proceed via consistency with the regular distributions. Suppose that f(t) is a locally integrable function. We want to see how the distribution induced by f(t − a) is related to that induced by f(t). Using the definition of a regular distribution in terms of the integral,

〈f(t − a), φ(t)〉 = ∫_{−∞}^{∞} f(t − a)φ(t) dt = ∫_{−∞}^{∞} f(u)φ(u + a) du = 〈f(t), φ(t + a)〉. (1.48)

We thus make the definition:

Given any distribution T(t), the functional T(t − a) is defined by

〈T(t − a), φ(t)〉 = 〈T(t), φ(t + a)〉, (1.49)

for any φ ∈ D. It is easy to check that the shifted distribution is also a distribution.

Scaling

Suppose that T(t) is a distribution. We wish to assign a meaning to T(at) for a real number a. Start once again with a locally integrable function f(t). The distribution induced by f(at) is

∫_{−∞}^{∞} f(at)φ(t) dt = (1/a) ∫_{−a∞}^{a∞} f(u)φ(u/a) du. (1.50)

The limits on the right-hand side depend on the sign of a. If a > 0 this is

(1/a) ∫_{−∞}^{∞} f(u)φ(u/a) du = (1/a) 〈f(t), φ(t/a)〉. (1.51)

On the other hand, if a < 0, the lower limit is ∞ and the upper limit is −∞. To swap these limits around, we need to introduce an additional minus sign, and so the right-hand side of (1.50) becomes

(1/a) ∫_{∞}^{−∞} f(u)φ(u/a) du = −(1/a) 〈f(t), φ(t/a)〉. (1.52)


These results may be summarized by the rule

〈f(at), φ(t)〉 = (1/|a|) 〈f(t), φ(t/a)〉. (1.53)

This holds for real, nonzero values of a. We use this to define the scaling law for an arbitrary distribution T(t):

〈T(at), φ(t)〉 = (1/|a|) 〈T(t), φ(t/a)〉. (1.54)
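The scaling law (1.54) can be observed numerically by applying it to a nascent delta. This sketch uses illustrative choices (the Lorentzian hn of (1.72) with n = 2000, and a Gaussian standing in for a test function) to confirm that ∫ hn(at)φ(t) dt approaches φ(0)/|a|:

```python
import numpy as np

def h_n(t, n):
    # Lorentzian nascent delta of eq. (1.72)
    return (1.0 / np.pi) * n / (n ** 2 * t ** 2 + 1.0)

def phi(t):
    return np.exp(-t ** 2)   # smooth stand-in for a test function; phi(0) = 1

t = np.linspace(-50.0, 50.0, 2000001)
dt = t[1] - t[0]
vals = [np.sum(h_n(a * t, 2000) * phi(t)) * dt for a in (2.0, -3.0)]
print(vals)   # close to [phi(0)/|2|, phi(0)/|-3|] = [0.5, 0.333...]
```

The small residual error comes from the finite n (the Lorentzian's heavy tails), not from the scaling law itself.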

Examples:

1. The delta function is even:

〈δ(−t), φ(t)〉 = (1/|−1|) 〈δ(t), φ(−t)〉 = φ(0) = 〈δ(t), φ(t)〉. (1.55)

2. The shifted delta function:

〈δ(t − τ), φ(τ)〉 = 〈δ(τ − t), φ(τ)〉 = 〈δ(τ), φ(τ + t)〉 = φ(t). (1.56)

Thus we may write

∫_{−∞}^{∞} φ(τ)δ(t − τ) dτ = φ(t). (1.57)

This equation holds at each point t for any test function φ ∈ D.
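The sifting property (1.57) can be illustrated numerically by replacing δ with the narrow box fn of (1.60); the integral then reproduces the sampled value of φ. (The particular φ, grid and n below are illustrative choices, not from the text.)

```python
import numpy as np

def f_n(t, n):
    # box of height n and width 1/n, eq. (1.60)
    return np.where(np.abs(t) < 1.0 / (2.0 * n), float(n), 0.0)

def phi(t):
    return np.cos(t) * np.exp(-t ** 2)   # smooth, rapidly decaying stand-in

tau = np.linspace(-5.0, 5.0, 1000001)
dtau = tau[1] - tau[0]
t0 = 0.7
sifted = np.sum(phi(tau) * f_n(t0 - tau, 100)) * dtau   # approximates eq. (1.57)
print(sifted, np.cos(t0) * np.exp(-t0 ** 2))            # the two values agree closely
```

Increasing n narrows the box and sharpens the agreement, up to the resolution of the grid.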

Product of a C∞ function and a distribution

If T(t) is a distribution and g(t) is infinitely differentiable, g(t)T(t) is defined by

〈g(t)T(t), φ(t)〉 = 〈T(t), g(t)φ(t)〉. (1.58)

It is easy to show that this is a distribution and that it is consistent with the result for regular distributions. We need g(t) to be C∞ to ensure that g(t)φ(t) ∈ D whenever φ(t) ∈ D.

In general, it is not possible to define the product of two arbitrary distributions.

Integral of a delta function

Consider

f(t) = ∫_{−∞}^{t} δ(τ) dτ. (1.59)

This simply means that we require that f′(t) = δ(t) and that f(−∞) = 0. Both of these are satisfied by the unit step u(t), which is also called the Heaviside function (sometimes written as H(t)).

Since u(t) induces a regular distribution (as it is locally integrable), the successive integrals of u(t) are ordinary functions, such as the ramp r(t) = t u(t), the parabola p(t) = (1/2)t² u(t), etc. These all induce regular distributions.


1.3.6 Sequences and series of distributions

In the previous section, we have seen how various mathematical operations defined on ordinary functions may be extended to apply to distributions. In physics, it is useful to develop an intuitive, pictorial understanding as well as a mathematical understanding of the theory. In this section we wish to see in what senses the intuitive picture of a delta function as an infinitely high spike of zero width but with unit area is a useful representation. For example, although there is no function which is infinitely tall and of zero width, we feel intuitively that a sequence of functions fn(t), such as

fn(t) = { n for |t| < 1/(2n); 0 otherwise }, (1.60)

in some sense "tends to" a delta function, even though the sequence does not have a pointwise limit. As we shall see, the key idea is not to focus on the above as a sequence of functions, but rather on the sequence of regular distributions that the functions induce. To this end, we need to consider the properties of sequences of distributions and their convergence.

Suppose that Tn(t) is a sequence of distributions and T(t) is a distribution. We say that Tn(t) converges to T(t), and write Tn(t) → T(t) distributionally, if

lim_{n→∞} 〈Tn(t), φ(t)〉 = 〈T(t), φ(t)〉, (1.61)

for all test functions φ ∈ D.

Note: It is in fact unnecessary to assume that the limit T is a distribution. If for all φ ∈ D, 〈Tn, φ〉 is a convergent sequence and we define the action of T in terms of this limit, T can be shown (with difficulty!) to be a distribution.

Theorem: If Sn and Tn are sequences of distributions which converge to S and T respectively, then

1. aSn + bTn → aS + bT for any constants a and b,

2. T′n → T′,

3. Tn(at − b) → T(at − b) for any real a ≠ 0 and b,

4. g(t)Tn(t) → g(t)T(t) for any C∞ function g(t).

Perhaps the most surprising aspect of these relationships is the ease of the proofs. Consider for example the proof of the derivative result. For any test function φ ∈ D,

〈T′n, φ〉 = −〈Tn, φ′〉 → −〈T, φ′〉 = 〈T′, φ〉. (1.62)

These results are nonetheless remarkable, since they tell us that all the common distributional operations commute with the limiting process.

Let us again consider the sequence of functions (1.60), which we saw does not converge pointwise. For each n, fn(t) is locally integrable and so induces the regular distribution

〈fn(t), φ(t)〉 = n ∫_{−1/(2n)}^{1/(2n)} φ(t) dt. (1.63)


We wish to consider the sequence of distributions fn. For any test function φ ∈ D, as n → ∞, the sequence 〈fn(t), φ(t)〉 tends to φ(0) since φ is continuous at t = 0. By definition of the convergence of a sequence of distributions, we see that fn → δ distributionally, which confirms our original intuition. This is an example of a sequence of functions which does not converge pointwise, but which does converge distributionally.

Let us next consider the sequence of functions

gn(t) = { n²t for 0 ≤ t ≤ 1/n; n²(2/n − t) for 1/n < t ≤ 2/n; 0 otherwise } (1.64)

which, as we discovered (see Figure 1.4), converges pointwise to zero. Given a test function φ ∈ D, we see that

∫_{−∞}^{∞} gn(t)φ(t) dt = n² ∫_0^{1/n} tφ(t) dt + n² ∫_{1/n}^{2/n} (2/n − t)φ(t) dt. (1.65)

We now wish to show that each of the terms on the right-hand side tends to (1/2)φ(0). Since

n² ∫_0^{1/n} t dt = 1/2, (1.66)

we see that

|n² ∫_0^{1/n} tφ(t) dt − (1/2)φ(0)| = |n² ∫_0^{1/n} t(φ(t) − φ(0)) dt| ≤ n² ∫_0^{1/n} |t||φ(t) − φ(0)| dt. (1.67)

Since φ is a test function, it is continuous at t = 0. By definition of continuity, given any ε > 0, we can find δ > 0 such that |φ(t) − φ(0)| < ε whenever |t| < δ. So if we choose any n > δ⁻¹,

|n² ∫_0^{1/n} tφ(t) dt − (1/2)φ(0)| ≤ n² ∫_0^{1/n} |t| ε dt = ε/2. (1.68)

This establishes that

lim_{n→∞} n² ∫_0^{1/n} tφ(t) dt = (1/2)φ(0). (1.69)

Similarly, you should check that you can show

lim_{n→∞} n² ∫_{1/n}^{2/n} (2/n − t)φ(t) dt = (1/2)φ(0), (1.70)

so that

lim_{n→∞} ∫_{−∞}^{∞} gn(t)φ(t) dt = φ(0). (1.71)

This establishes that the sequence of distributions gn induced by gn(t) also tends distributionally to the Dirac δ distribution. Notice that the distributional limit of a sequence can be very different from the pointwise limit. It should also be clear that there are an infinite


number of sequences of regular distributions which converge distributionally to the delta distribution. When contemplating the Dirac delta function, one should try to think of it as the common limit of all such sequences of distributions.
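The convergence 〈gn, φ〉 → φ(0) is easy to watch numerically. This sketch evaluates 〈gn, φ〉 for the triangular pulses of (1.64) at increasing n; the particular smooth φ is an illustrative choice:

```python
import numpy as np

def g_n(t, n):
    # triangular pulse of eq. (1.64): rises to height n at t = 1/n, back to 0 at t = 2/n
    rising = (0 <= t) & (t <= 1.0 / n)
    falling = (1.0 / n < t) & (t <= 2.0 / n)
    return np.where(rising, n ** 2 * t, 0.0) + np.where(falling, n ** 2 * (2.0 / n - t), 0.0)

def phi(t):
    return np.exp(-t ** 2) * (1.0 + np.sin(t))   # smooth stand-in; phi(0) = 1

t = np.linspace(-1.0, 1.0, 2000001)
dt = t[1] - t[0]
vals = [np.sum(g_n(t, n) * phi(t)) * dt for n in (10, 100, 1000)]
print(vals)   # tends to phi(0) = 1 as n grows
```

Note that each gn has unit area and support shrinking toward t = 0⁺, which is exactly what the distributional argument above exploits.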

Exercise: Show that the sequence of functions (see Figure 1.7)

hn(t) = (1/π) · n/(n²t² + 1) (1.72)

converges distributionally to δ(t) as n → ∞.

Figure 1.7: A sequence hn(t) which tends distributionally to δ(t).

Let us now see how the theorem helps us to visualize what we mean by the derivative of a delta function, which we denoted by δ′(t). According to the theorem, if we can find any sequence of distributions Tn with the property that Tn → δ distributionally, we are guaranteed that T′n will also converge to the distribution δ′. We have found three sequences fn, gn and hn, all of which tend distributionally to δ. By differentiating each of these, we will get three sequences of distributions which tend to δ′(t). It is straightforward to find h′n(t) and g′n(t), which are

h′n(t) = −(2/π) · n³t/(n²t² + 1)² (1.73)

and

g′n(t) = { n² for 0 ≤ t ≤ 1/n; −n² for 1/n < t ≤ 2/n; 0 otherwise }.

A graph of h′n(t) for n = 1, ..., 4 is shown in Figure 1.8.

The common feature of g′n and h′n is that as n → ∞, they both have a large positive peak followed by a large negative peak. The area of each of the two peaks increases in proportion to n. We can also find the sequence of distributions f′n(t), each of which turns out to be


Figure 1.8: A sequence h′n(t) which tends distributionally to δ′(t), the derivative of the Dirac δ function.

singular, since the functions fn are not differentiable. We may write fn(t) as the difference of two unit step functions,

fn(t) = { n for |t| < 1/(2n); 0 otherwise } = n[u(t + 1/(2n)) − u(t − 1/(2n))]. (1.74)

From our previous result (1.44) that u′(t) = δ(t) distributionally, we see that

f′n(t) = n[δ(t + 1/(2n)) − δ(t − 1/(2n))], (1.75)

which consists of two delta functions of strengths ±n located a distance 1/n apart. As n → ∞, the two spikes get closer to each other and their areas become larger. From this we can see why δ′ is referred to as a dipole.

Given a test function φ(t) ∈ D, it is clear that

〈f′n(t), φ(t)〉 = 〈n[δ(t + 1/(2n)) − δ(t − 1/(2n))], φ(t)〉
             = n〈δ(t + 1/(2n)), φ(t)〉 − n〈δ(t − 1/(2n)), φ(t)〉
             = n[φ(−1/(2n)) − φ(1/(2n))]. (1.76)

Taking the limit as n → ∞, we see that

lim_{n→∞} 〈f′n(t), φ(t)〉 = −φ′(0), (1.77)


which is consistent with what we found in (1.45) from the mathematical definition of the derivative of a distribution.
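The limit (1.77) is just a centred-difference approximation of −φ′(0) with step 1/(2n), so its convergence is easy to watch. The particular φ below is an illustrative choice with φ′(0) = 3:

```python
import numpy as np

def phi(t):
    return np.sin(3.0 * t) * np.exp(-t ** 2)   # smooth stand-in; phi'(0) = 3

for n in (10, 100, 1000):
    # <f'_n, phi> = n*(phi(-1/(2n)) - phi(1/(2n))), eq. (1.76)
    approx = n * (phi(-1.0 / (2 * n)) - phi(1.0 / (2 * n)))
    print(n, approx)   # tends to -phi'(0) = -3
```

The error shrinks like 1/n², as expected for a centred difference.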

These results often allow us to visualize arbitrary (singular) distributions, since we can always express such a distribution as the limit of a sequence of regular distributions associated with a sequence of locally integrable functions. It is possible to calculate with the singular distribution by taking limits of the results obtained with the sequence of regular distributions.

Exercise: Show that for any test function φ(t),

lim_{n→∞} n ∫_{−∞}^{∞} φ(t) cos(n²t) dt = 0. (1.78)

We can similarly treat a series of distributions Σ_{k=1}^{∞} Tk, which is said to converge to the distribution S if the sequence of numbers 〈Σ_{k=1}^{n} Tk(t), φ(t)〉 converges to 〈S, φ〉 as n → ∞ for every φ ∈ D. We may treat a series in terms of the sequence of its partial sums and apply the above results for sequences. In particular, we can differentiate a series of distributions term by term, i.e.,

〈(Σ_{k=1}^{∞} Tk)′, φ〉 = Σ_{k=1}^{∞} 〈T′k, φ〉. (1.79)

Using a Riemann approach to integration as a limit of a summation, we may use linearity to write distributional equations such as

f(t) = ∫_{−∞}^{∞} f(τ)δ(t − τ) dτ, (1.80)

where f is now regarded as a locally integrable function. What this means is that for all φ ∈ D,

〈∫_{−∞}^{∞} f(τ)δ(t − τ) dτ, φ(t)〉 = 〈f(t), φ(t)〉. (1.81)

Notice the difference between this distributional equality and the previous pointwise relationship (1.57).

We may regard (1.80) as expressing f(t) as a linear combination of impulses. It may be helpful to think of the analogy of writing a vector as a linear combination of basis vectors, e.g.,

v = Σᵢ vᵢ eᵢ.

The delta functions are the basis vectors, indexed by their location τ; the integral over the locations τ corresponds to the sum over the index i. The numbers f(τ) dτ are analogous to the coefficients vᵢ of the expansion, and the resulting function f is analogous to the vector v.

1.3.7 Differential equations involving distributions

We wish to consider differential equations in which the solutions and driving terms are generalized functions. We shall mainly be concerned with linear differential equations with homogeneous boundary and/or initial conditions.

As a specific example, consider the problem of a string of length l under tension T, constrained so that its transverse displacement u(x) is zero at the endpoints, i.e., u(0) = u(l) = 0. A


distribution of transverse "pressure" p(x) (actually a force per unit length) is applied to the string and we wish to determine the resulting displacement. The differential equation is

Lu = −T d²u/dx² = p(x),  u(0) = u(l) = 0, (1.82)

where L denotes the differential operator

L = −T d²/dx². (1.83)

For each distribution of pressure p(x) there is a resulting displacement u(x). We can treat the equation like a physical system with p(x) as an input function and u(x) as the output function. We see that the system is linear, i.e., if u1(x) is the displacement for pressure p1(x) and u2(x) is the displacement for pressure p2(x), the displacement if we apply the pressure c1p1(x) + c2p2(x) is c1u1(x) + c2u2(x). This follows from the linearity of L and the homogeneity of the boundary conditions. Linearity allows us to apply the following important idea:

We know that for any locally integrable function p(x), we can write p(x) as a linear combination of impulses, i.e.,

p(x) = ∫_{−∞}^{∞} p(ξ)δ(x − ξ) dξ. (1.84)

So if we know how the physical system responds to the input δ(x − ξ), we can use this to work out how it will respond to p(x), which is simply a linear combination of these delta function inputs.

Let us suppose that we can find the solution h(x|ξ) to the problem

Lh(x|ξ) = δ(x − ξ),  h(0|ξ) = h(l|ξ) = 0, (1.85)

where the input to the system is a delta function. Then by linearity the output for input p(x) must be given by

u(x) = ∫_{−∞}^{∞} p(ξ)h(x|ξ) dξ, (1.86)

which is simply (1.84) with the delta function replaced by the response that it produces when it is "processed" by the system.

The function h(x|ξ) is called the impulse response of the system. It is also known as the Green's function of the boundary value problem.

Example: Calculating the Green’s function for

−T d2h(x|ξ)dx2

= δ(x− ξ), h(0|ξ) = h(l|ξ) = 0 (1.87)

This corresponds to the application of a concentrated force at x = ξ. In this case h(x|ξ) canbe found by direct integration (figure 1.9). We see that

d2h

dx2= − 1

Tδ(x− ξ) (1.88)

dh

dx= c1 −

1

Tu(x− ξ) (1.89)

h(x|ξ) =

c2 + c1x for x < ξc2 + c1x− 1

T (x− ξ) for x > ξ(1.90)


The quantities c1 and c2 are constants of integration. For all values of c1 and c2, the function h(x|ξ) satisfies the differential equation, but the boundary conditions are only satisfied for a special choice of the constants. Applying the boundary conditions, we find that c2 = 0 and c1 = (l − ξ)/(lT). Hence

h(x|ξ) = { x(l − ξ)/(lT) for x < ξ; ξ(l − x)/(lT) for x > ξ }. (1.91)

Notice that

Figure 1.9: Calculation of Green’s function by direct integration. Values of constants arefound by satisfying the boundary conditions.

1. In the regions x < ξ and x > ξ, the Green's function satisfies the homogeneous equation

d²h/dx² = 0. (1.92)

2. At x = ξ, h(x|ξ) is continuous and ∂h/∂x has a jump discontinuity, which gives the delta function in the second derivative:

(dh/dx)(ξ⁺|ξ) − (dh/dx)(ξ⁻|ξ) = −1/T. (1.93)

3. The solution of the boundary value problem for an arbitrary p(x) is given by

u(x) = ∫_0^l p(ξ)h(x|ξ) dξ (1.94)
     = ∫_0^x p(ξ) ξ(l − x)/(lT) dξ + ∫_x^l p(ξ) x(l − ξ)/(lT) dξ. (1.95)
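The superposition integral (1.94) can be evaluated numerically and compared with a load that has a closed-form solution. In this sketch, T, l and the load p are illustrative choices; the exact solution of −T u″ = sin(πx) with u(0) = u(l) = 0 is sin(πx)/(Tπ²):

```python
import numpy as np

T, l = 2.0, 1.0

def h(x, xi):
    # Green's function of eq. (1.91)
    return np.where(x < xi, x * (l - xi), xi * (l - x)) / (l * T)

def p(x):
    return np.sin(np.pi * x)            # a smooth sample load

x = np.linspace(0.0, l, 2001)
dx = x[1] - x[0]
xi = x.reshape(-1, 1)                   # column of source positions
u = np.sum(p(xi) * h(x, xi), axis=0) * dx   # u(x) = integral of p(xi) h(x|xi), eq. (1.94)

exact = np.sin(np.pi * x) / (T * np.pi ** 2)
print(np.max(np.abs(u - exact)))        # small discretization error
```

Each source position ξ contributes a "tent" deflection h(x|ξ), and the load is built up as a superposition of such tents, exactly as (1.84) and (1.86) suggest.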


Exercise: Check that this is a solution of the boundary value problem. You may find the following result (which you should also prove if you have not seen it before) useful:

d/dx ∫_{a(x)}^{b(x)} f(t, x) dt = ∫_{a(x)}^{b(x)} ∂f/∂x (t, x) dt − f(a(x), x) a′(x) + f(b(x), x) b′(x). (1.96)

Green’s function for regular n’th order linear differential equations withhomogeneous initial/boundary conditions

The above example may be generalized to the following. Consider the linear n’th orderdifferential equation on the interval I

Lg = an(x)dng

dxn+ an−1(x)

dn−1g

dxn−1+ ...+ a1(x)

dg

dx+ a0(x)g = f(x) (1.97)

together with n homogeneous boundary or initial conditions (which constrain u and/or itsderivatives at the endpoints of I). We assume that an(x) 6= 0 for all x ∈ I so that theequation is regular.

The Green’s function h(x|ξ) is the solution of

Lh(x|ξ) = δ(x− ξ) (1.98)

where

1. h(x|ξ) satisfies the homogeneous equation (with zero right-hand side) for x < ξ and forx > ξ,

2. h(x|ξ) satisfies all the boundary or initial conditions for each ξ,

3. The functions h, ∂h/∂x,..., ∂n−2h/∂xn−2 are continuous throughout I and in particularat x = ξ.

4. ∂n−1h/∂xn−1 has a jump discontinuity at x = ξ and

dn−1h

dxn−1(ξ+|ξ)− dn−1h

dxn−1(ξ−|ξ) =

1

an(ξ)(1.99)

The solution to the original problem can be found for any f(x) and is

g(x) = ∫_I f(ξ)h(x|ξ) dξ. (1.100)

Example: Calculate the Green’s function for the boundary value problem

d2u

dx2+ k2u = f(x), u(0) = u(l) = 0 (1.101)

and use the Green’s function to find the solution for f(x) = x.


The Green’s function h (x|ξ) is a solution of

d2h(x|ξ)dx2

+ k2h(x|ξ) = δ(x − ξ), h(0|ξ) = h(l|ξ) = 0 (1.102)

Where the “spike” of the delta function is located at x = ξ. In order to find h, we considerthe two regions x < ξ and x > ξ separately. In each of the two regions, h is a solution of thehomogeneous differential equation, but with different constants. Hence

h (x|ξ) =

A cos kx+B sin kx for x < ξC cos kx+D sin kx for x > ξ

(1.103)

We have to choose the constants A, B, C and D (which all depend on ξ) to satisfy the boundary conditions and the "joining conditions" at x = ξ. The boundary condition at zero tells us that

A = 0, (1.104)

since x = 0 < ξ, and we have to use the upper expression for h. The boundary condition at x = l > ξ imposes constraints on the lower expression for h. In particular,

C cos kl + D sin kl = 0. (1.105)

The joining conditions tell us firstly that h(x|ξ) is continuous at x = ξ (only its first derivative jumps there). Approaching x = ξ from below and from above and equating the results shows us that

A cos kξ + B sin kξ = C cos kξ + D sin kξ. (1.106)

The first derivative ∂h/∂x is, however, discontinuous at x = ξ, since the second derivative has a delta function at ξ. The size of the jump discontinuity is equal to the area under the delta function, so that ∂h(ξ⁺|ξ)/∂x − ∂h(ξ⁻|ξ)/∂x = 1. In terms of the expressions above,

(−kC sin kξ + kD cos kξ) − (−kA sin kξ + kB cos kξ) = 1. (1.107)

Solving equations (1.104) through (1.107) simultaneously yields

A = 0,  B = −sin k(l − ξ)/(k sin kl),  C = −sin kξ/k  and  D = sin kξ cos kl/(k sin kl). (1.108)

Substituting back into the expression for h(x|ξ) and simplifying, we find

h(x|ξ) = { −sin k(l − ξ) sin kx/(k sin kl) for x < ξ; −sin kξ sin k(l − x)/(k sin kl) for x > ξ }. (1.109)

In this case, we note that the Green's function is symmetric, i.e., that h(x|ξ) = h(ξ|x). This is the case if the boundary value problem is self-adjoint. Symmetric Green's functions can be written more compactly in terms of the variables x< = min(x, ξ) and x> = max(x, ξ). In the example,

h(x|ξ) = −sin kx< sin k(l − x>)/(k sin kl). (1.110)

This Green’s function is plotted in Figure 1.10.


Figure 1.10: Green's function h(x|ξ) for ξ = 0.6, k = 7.5 and l = 1. Notice that the function is continuous at x = ξ, but that the derivative has a jump of unit size there.

We now wish to solve the boundary value problem Lu = f with f(x) = x. By linearity, we have that

u(x) = ∫_0^l h(x|ξ) f(ξ) dξ = ∫_0^l ξ h(x|ξ) dξ. (1.111)

Since h(x|ξ) has different functional forms depending on whether x < ξ or x > ξ, we split the range of integration into two. Notice carefully that the integration variable is ξ, so that the interval needs to be split into [0, x] and [x, l]. Thus,

u(x) = ∫_0^x ξ h(x|ξ) dξ + ∫_x^l ξ h(x|ξ) dξ. (1.112)

Consider the first integral. In the interval ξ ∈ [0, x], we have that ξ < x and so we must use the lower of the two expressions for h(x|ξ). For the second integral, in the interval ξ ∈ [x, l], we have that ξ > x and so we must use the upper of the two expressions for h(x|ξ). Substituting the appropriate formulae gives

u(x) = −∫_0^x ξ [sin kξ sin k(l − x)/(k sin kl)] dξ − ∫_x^l ξ [sin k(l − ξ) sin kx/(k sin kl)] dξ
     = −[sin k(l − x)/(k sin kl)] ∫_0^x ξ sin kξ dξ − [sin kx/(k sin kl)] ∫_x^l ξ sin k(l − ξ) dξ
     = (x sin kl − l sin kx)/(k² sin kl).

It is easy to check that this satisfies the boundary conditions u(0) = u(l) = 0 and u″(x) + k²u(x) = x, as required.
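This check can also be automated with finite differences. The sketch below verifies the closed form against the ODE and boundary conditions; k = 7.5 and l = 1 are illustrative values:

```python
import numpy as np

k, l = 7.5, 1.0

def u(x):
    # closed-form solution derived above
    return (x * np.sin(k * l) - l * np.sin(k * x)) / (k ** 2 * np.sin(k * l))

x = np.linspace(0.0, l, 10001)
dx = x[1] - x[0]
upp = (u(x[2:]) - 2 * u(x[1:-1]) + u(x[:-2])) / dx ** 2   # u'' by central differences
residual = upp + k ** 2 * u(x[1:-1]) - x[1:-1]            # vanishes if u'' + k^2 u = x
print(np.max(np.abs(residual)), u(0.0), u(l))             # residual ~0, u(0) = u(l) = 0
```
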

Exercise: Calculate the Green’s function for the following problems


1. d²u/dx² + k²u = f(x),  u(0) = u′(l) = 0,

and use the Green's function to find the solution for f(x) = 1.

2. RC dvC/dt + vC = vI,  vC(−∞) = 0.

The Green's function here is the impulse response of an RC quasi-integrator circuit. Calculate the response if vI(t) = cos 2πνt.

3. LC d²vC/dt² + RC dvC/dt + vC = vI,  vC(−∞) = v′C(−∞) = 0.

This is the impulse response of a series LCR circuit when the output is taken across the capacitor. Calculate the response when the circuit is excited with a unit step input voltage.

References:

Distribution theory is covered in many texts with classification number 515.782. Two readable accounts are

Theory of Distributions: A Non-Technical Introduction by J.I. Richards and H.K. Youn (Cambridge University Press, 1990).

Distribution Theory and Transform Analysis by A.H. Zemanian (McGraw-Hill, 1965).

A useful book for mathematical background is

Mathematical Analysis by T.M. Apostol (Addison-Wesley, 1974).


2

Systems

2.1 The concept of a system

A system transforms an input into an output. The input and output may be functions which are usually thought of as depending on time, but they can depend on space coordinates or any other "independent" variable(s). The values that the input and output can take may be real or complex, and they can be scalars or vectors. A vector-valued input or output may sometimes simply be a mathematical representation of several inputs or outputs.

Examples of systems are

1. The input is the voltage applied to an electrical network as a function of time, and the output is the voltage across some component in the network, which is also a function of time.

2. The input is the distribution of charge density in space (a scalar-valued function of the space coordinates) and the output is the distribution of electric field (a vector-valued function of the space coordinates).

3. The input is the position of the steering wheel, brake and accelerator pedals of a car (a three-component vector function of time) and the output is the position of the car (a three-component vector function of time).

4. The input is the wave function ψ(r, 0) of a particle at time t = 0 and the output is the wave function ψ(r, T) at some later time t = T. In this case both input and output are complex-valued functions of space.

5. The input is the distribution of light and shade of a scene and the output is the distribution of silver grains on a photographic negative taken of the scene through some imaging system (such as a camera). If the scene and image are divided into discrete picture elements, the input and output are large vectors which are related by the physics of the imaging process.

We shall mainly be concerned in the first part of this course with systems with scalar inputsand scalar outputs which are functions of a single independent variable which will usually betime. However, many of the methods developed are useful for more general systems as well.


2.2 Classes of systems

Consider a scalar-input scalar-output system with input f(t) and output g(t). In general, the output function g can depend on the entire input function f (i.e., its values at all times).

f(t) ──→ [ System ] ──→ g(t)

1. The system is said to be causal if g(t) does not depend on the values of f(τ) for τ > t.

2. The system is said to be linear if the output due to a linear combination of inputs is the same linear combination of the outputs due to each of the inputs separately. That is, if inputs f1(t) and f2(t) produce outputs g1(t) and g2(t) respectively, the input c1f1(t) + c2f2(t) produces the output c1g1(t) + c2g2(t) for all constants c1 and c2.

Linear systems satisfy the superposition principle, which is simply the above with c1 = c2 = 1. The importance of linearity is that it allows us to write the input as a sum of simpler inputs and to consider the effect of the system on each component separately. In particular, if we can write the input as a sum of the eigenfunctions of the linear system, the output is easily found, since the effect of a system on an eigenfunction is simply to multiply it by the corresponding eigenvalue.

3. The system is said to be memoryless if the output at time t depends only on t and on the value of the input at time t (and not on the value of f at any other time).

4. The system is said to be shift-invariant or time-invariant if the response to a time-shifted version of the original input is the original output shifted by the same time. That is, if f(t) produces g(t), then f(t − τ) produces g(t − τ).

2.2.1 Characterization of linear systems

Even though many practical systems are non-linear, it is often possible to linearize their characteristics for sufficiently small fluctuations about an operating point.

Example: Consider the relationship between pressure and density in a gas. Under adiabatic conditions P ∝ ρ^γ, which is a non-linear relationship. However, when we consider sound propagation, for which the pressure and density fluctuations about the steady-state values are small, we find

P0 + ΔP = k(ρ0 + Δρ)^γ   (2.1)

        ≈ kρ0^γ + (kγρ0^(γ−1))Δρ.   (2.2)

This leads to the following linear (and memoryless) relationship between ΔP and Δρ:

ΔP = (kγρ0^(γ−1))Δρ   (2.3)

We can thus often apply linear systems theory as an approximation to the actual behaviour.


• The output of a linear system for an arbitrary input may be determined if we know how it responds to an impulse at a given time. This is based on the same argument as we used for Green’s functions and differential equations.

– Since any input f(t) can be written as a linear combination of shifted delta functions,

f(t) = ∫_{−∞}^{∞} f(τ) δ(t − τ) dτ,   (2.4)

– and if the input δ(t − τ) produces the output h(t|τ), which is called the impulse response of the system at t to an impulse at τ,

– then the output g(t) must be the same linear combination of the impulse responses, namely

g(t) = ∫_{−∞}^{∞} f(τ) h(t|τ) dτ.   (2.5)

• If the system is linear and causal, then h(t|τ) = 0 for t < τ .

• If the system is linear and memoryless, then h(t|τ) = c(t)δ(t − τ), and so g(t) = c(t)f(t).

• If the system is linear and time-invariant, then the response to a delta function at t = τ is a shifted version of the response to a delta function at t = 0, i.e., h(t|τ) = h(t − τ|0). For time-invariant systems we often write h(t − τ) in place of h(t − τ|0), and so the output is

g(t) = ∫_{−∞}^{∞} f(τ) h(t − τ) dτ.   (2.6)

This is called the convolution of f and h and is denoted by

g = f ∗ h (2.7)

Note that this is a relationship between functions. It is often (loosely) written as g(t) = f(t) ∗ h(t), but this can be confusing since the value of g at time t depends on all the values of f and h, not only their values at time t. In this case h(t) is the response to an impulse at the origin, which by time-invariance allows us to know how the system responds to an impulse located at any other time.

• If the system is linear, memoryless and time-invariant, we find that h(t) = cδ(t) and so g(t) = cf(t) for some constant c.
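The convolution integral (2.6) can be checked numerically by a simple Riemann sum. The sketch below is illustrative only: the top-hat input, the causal impulse response h(t) = e^(−t) for t ≥ 0, and the step size are assumptions made for the example, not taken from the text.

```python
import math

def convolve_num(f, h, t, dt=0.01, lo=-3.0, hi=3.0):
    """Riemann-sum approximation of g(t) = integral of f(tau) h(t - tau) dtau."""
    n = int(round((hi - lo) / dt))
    return sum(f(lo + k * dt) * h(t - (lo + k * dt)) for k in range(n)) * dt

# Illustrative input and impulse response (not from the text):
f = lambda t: 1.0 if abs(t) <= 1 else 0.0       # top-hat input
h = lambda t: math.exp(-t) if t >= 0 else 0.0   # causal impulse response

# For -1 <= t <= 1 the exact output is 1 - exp(-(t + 1))
g0 = convolve_num(f, h, 0.0)   # close to 1 - e^{-1}
```

The same routine can be pointed at any other f and h; only the truncation interval and step size need care.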

2.2.2 Graphical Interpretation

The result for a linear time-invariant system is of particular importance since many systems encountered in physics have these properties. Here we shall illustrate graphically the significance of the derivation given above (see Figure 2.1).

Instead of using delta functions, let us first fix on a (large, positive) integer N and define

δN(t) =  N   if |t| < 1/(2N),
         0   otherwise.   (2.8)


Figure 2.1: Response of a linear time-invariant system to an input signal expressed as a sumof responses to scaled and shifted pulses. In the continuum limit, the pulses turn into Diracdelta functions and responses become shifted impulse responses.

This function is of unit area and is concentrated on the interval [−1/(2N), 1/(2N)] around t = 0. It is easy to check that δN tends distributionally to δ as N → ∞.

Let us now consider a function f(t) which we wish to apply as the input to our system. If we consider the function values at the times {kΔt}_{k=−∞}^{∞}, where Δt ≡ 1/N is the width of the support of δN, and define

fN(t) = Σ_{k=−∞}^{∞} (1/N) f(k/N) δN(t − k/N) ≡ Σ_{k=−∞}^{∞} [Δt f(kΔt)] δN(t − kΔt),   (2.9)

it is not difficult to see that fN(t) is a staircase approximation to the function f(t). You should be able to show that if f(t) is locally integrable in the Riemann sense, the distributional limit of fN as N → ∞ is just f (i.e., show that given any test function φ ∈ D, ⟨fN, φ⟩ → ⟨f, φ⟩ as N → ∞).

We wish to interpret the representation of fN given by the above expression as a sum of the “basis” functions δN(t − kΔt) weighted by the coefficients Δt f(kΔt). If we introduce fN into our linear time-invariant system, then by linearity the output may be found once we know how each component of the sum is transformed by the system.

Suppose that the system responds to an input of δN(t) by producing the output hN(t). By time-invariance, the response to the input δN(t − kΔt) must simply be hN(t − kΔt). We now know how the system responds to all the basis functions. By linearity, when the input is fN(t), the output is the linear combination of the responses hN(t − kΔt) with the same coefficients as in the expansion of fN in terms of the basis functions. The output is thus

Σ_{k=−∞}^{∞} [Δt f(kΔt)] hN(t − kΔt).   (2.10)

If we now consider the distributional limit as N → ∞, we see that this is the convolution integral

g(t) = ∫_{−∞}^{∞} f(τ) h(t − τ) dτ.   (2.11)

We thus regard this as a linear combination of delayed impulse responses h(t − τ) with weights f(τ) dτ, just as f(t) may be regarded as a linear combination of delayed impulses δ(t − τ) with the same weights.

Technical note: If we wish to be more general and consider locally integrable functions f(t) in the Lebesgue sense, we would define instead

fN(t) = Σ_{k=−∞}^{∞} [∫_{(k−1/2)Δt}^{(k+1/2)Δt} f(τ) dτ] δN(t − kΔt)   (2.12)

where the integral is a Lebesgue integral. It is easy to check that fN converges distributionally to f(t) as N → ∞.
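The convergence of the staircase approximation fN to f can also be observed numerically. In this sketch the test function e^(−t²) and the sampling grid are illustrative assumptions; the sup-norm error on a finite interval shrinks roughly like 1/N:

```python
import math

def f(t):
    # a smooth function to approximate (an illustrative choice, not from the text)
    return math.exp(-t * t)

def f_N(t, N):
    # value of the staircase f_N at t: the sample f(k/N) whose pulse
    # delta_N(t - k/N) covers t, i.e. |t - k/N| < 1/(2N)
    k = round(t * N)
    return f(k / N)

def max_err(N):
    # sup-norm error of the staircase on [-2, 2], sampled on a fine grid
    pts = [-2 + 0.001 * i for i in range(4001)]
    return max(abs(f_N(t, N) - f(t)) for t in pts)

e10, e100 = max_err(10), max_err(100)   # the error shrinks roughly like 1/N
```

The distributional statement is of course about ⟨fN, φ⟩, but for a continuous f the pointwise picture above already conveys why the staircase is a faithful stand-in for f.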

Example: Direct Computation of a Convolution Integral

Consider the convolution of the functions

f(t) = 1 if |t| ≤ 1, and 0 otherwise,

and

h(t) = 1 − t if 0 ≤ t ≤ 1, and 0 otherwise,

which are shown in Figure 2.2.

By definition, the convolution of these functions is

(f ∗ h)(t) = ∫_{−∞}^{∞} f(τ) h(t − τ) dτ.


Figure 2.2: Functions to be convolved

Most of the complexity in this problem arises from having to consider the limits in the integral carefully. First we note that the integration is carried out as a function of τ and that it is useful to plot the integrand as a function of τ. The first factor f(τ) is straightforward, but when regarded as a function of τ, h(t − τ) involves reflecting the graph of h about the vertical axis and shifting the result by t. Depending on the value of t, the functions f(τ) and h(t − τ) overlap in different ways. We can identify five separate cases, as shown in Figure 2.3(a)–(e).

Figure 2.3: Computation of the integrand of the convolution integral when (a) t < −1, (b) −1 ≤ t < 0, (c) 0 ≤ t < 1, (d) 1 ≤ t < 2 and (e) t ≥ 2.


Case (a) t < −1

Referring to Figure 2.3(a), we see that when t < −1, the product f(τ)h(t − τ) is identically zero for all τ. The convolution integral (f ∗ h)(t) is thus equal to zero.

Case (b) −1 ≤ t < 0

Referring to Figure 2.3(b), we see that when −1 ≤ t < 0, the product f(τ)h(t − τ) is non-zero only over the interval τ ∈ [−1, t]. Thus the limits of the convolution integral need to be set to these values:

(f ∗ h)(t) = ∫_{−1}^{t} f(τ) h(t − τ) dτ = ∫_{−1}^{t} 1 × [1 − (t − τ)] dτ

           = ∫_{−1}^{t} (1 − t + τ) dτ = (1 − t²)/2

Case (c) 0 ≤ t < 1

Referring to Figure 2.3(c), we see that when 0 ≤ t < 1, the product f(τ)h(t − τ) is non-zero over the interval τ ∈ [t − 1, t]. Thus the limits of the convolution integral need to be set to these values:

(f ∗ h)(t) = ∫_{t−1}^{t} f(τ) h(t − τ) dτ = ∫_{t−1}^{t} 1 × [1 − (t − τ)] dτ

           = ∫_{t−1}^{t} (1 − t + τ) dτ = 1/2

Case (d) 1 ≤ t < 2

Referring to Figure 2.3(d), we see that when 1 ≤ t < 2, the product f(τ)h(t − τ) is non-zero over the interval τ ∈ [t − 1, 1]. Thus the limits of the convolution integral need to be set to these values:

(f ∗ h)(t) = ∫_{t−1}^{1} f(τ) h(t − τ) dτ = ∫_{t−1}^{1} 1 × [1 − (t − τ)] dτ

           = ∫_{t−1}^{1} (1 − t + τ) dτ = (t − 2)²/2

Case (e) t ≥ 2

Referring to Figure 2.3(e), we see that when t ≥ 2, the product f(τ)h(t − τ) is again identically zero for all τ. The convolution integral (f ∗ h)(t) is thus equal to zero.


Summarizing the results of all the above cases, we see that the convolution integral is equal to

(f ∗ h)(t) =  (1 − t²)/2   for −1 ≤ t < 0,
              1/2          for 0 ≤ t < 1,
              (t − 2)²/2   for 1 ≤ t < 2,
              0            otherwise.

A graph of this function is shown in Figure 2.4.

Figure 2.4: Result of convolution
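The case analysis above can be verified by brute force: a Riemann-sum approximation of the convolution integral should reproduce the piecewise formula. A minimal sketch (the integration interval and step size are arbitrary choices):

```python
def f(t):  # top-hat from the example
    return 1.0 if abs(t) <= 1 else 0.0

def h(t):  # triangle pulse from the example
    return 1.0 - t if 0 <= t <= 1 else 0.0

def conv(t, dt=1e-3):
    # Riemann-sum approximation of the convolution integral over [-3, 3]
    n = int(round(6 / dt))
    return sum(f(-3 + k * dt) * h(t - (-3 + k * dt)) for k in range(n)) * dt

def exact(t):
    # piecewise result from the case analysis
    if -1 <= t < 0:
        return (1 - t ** 2) / 2
    if 0 <= t < 1:
        return 0.5
    if 1 <= t < 2:
        return (t - 2) ** 2 / 2
    return 0.0
```

Evaluating `conv` and `exact` at a handful of points in each of the five cases shows agreement to within the quadrature error.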

2.3 Properties of the Convolution Integral

In the following, we regard f, g and h as functions of time. You should be able to fill in the proofs of these from the definition of the convolution.

Technical note: We shall also assume that these results hold for generalized functions as well, although this is not always true, since the convolution of two generalized functions is not always a generalized function (e.g., u(t) ∗ u(−t) is undefined). It can be shown, however, that it is possible to define the convolution of two generalized functions if one of them is of bounded support, or if both of them have supports bounded to the left, or if both of them have supports bounded to the right. A similar problem arises if we try to multiply together arbitrary generalized functions, as the result is not always a generalized function (e.g., δ(t)² is undefined). We shall not consider these issues in further detail, although many of the technical mathematical issues associated with the theory of distributions are precisely in this area.

2.3.1 Commutativity

f ∗ g = g ∗ f (2.13)

Proof: By definition

f ∗ g = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ.

Change variable of integration to u = t− τ and the result follows.

Thus the output of a linear time-invariant system with impulse response h(t) for an input f(t) is the same as that of a linear time-invariant system with impulse response f(t) for an input h(t).


2.3.2 Associativity

(f ∗ g) ∗ h = f ∗ (g ∗ h) (2.14)

provided that the following (triple convolution) integral exists (and is finite)

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(u) g(v − u) h(t − v) du dv   (2.15)

Proof: Do as an exercise, using Fubini’s theorem (i.e., double integrals can be evaluated as iterated integrals in either order).

It is thus possible to write f ∗ g ∗ h unambiguously.

(Technical note: Counterexample showing that associativity is false if the triple convolution integral diverges:

(u ∗ δ′) ∗ 1 = δ ∗ 1 = 1

u ∗ (δ′ ∗ 1) = u ∗ 0 = 0

The triple convolution integral diverges because u ∗ 1 is divergent. Associativity is guaranteed if all but at most one of the terms in the multiple convolution vanish outside some bounded interval.)

2.3.3 Linearity

For any constants c1 and c2,

(c1f1 + c2f2) ∗ g = c1(f1 ∗ g) + c2(f2 ∗ g) (2.16)

f ∗ (c1g1 + c2g2) = c1(f ∗ g1) + c2(f ∗ g2) (2.17)

Proof: This follows from linearity of the integral.

2.3.4 Shifting

Suppose that we have a system with the property that whenever a signal f(t) is fed in, the output g(t) is a delayed version of the input, i.e.,

g (t) = f (t− τ) (2.18)

where τ is the delay time. It is easy to see that such a system is both linear and shift-invariant. By our fundamental result, it must be possible to characterize this system completely by specifying its impulse response h(t). In order to find the impulse response, we need only ask what the output is when an impulse δ(t) is fed into the system. Since the system delays any input by τ, the output for the impulsive input must be h(t) = δ(t − τ), which we may denote by δτ(t).

We now expect to be able to calculate the response to an arbitrary input by means of the convolution. If the input is f(t), the output will be (f ∗ δτ), which should be equal to f(t − τ) by the above argument. Thus,

(f ∗ δτ )(t) = f(t− τ) (2.19)

37

Page 45: Linear Systems and Noise with Applicationselec.otago.ac.nz/w/images/1/10/LinearSystemsNoise.pdf · 2016-01-28 · Linear Systems and Noise with Applications 2009 Edition M. Sze Tan

which is sometimes loosely written as f(t) ∗ δ(t− τ) = f(t− τ).

You should check that you can prove this result formally as well, using the definition of the delta function.

Exercise: Convince yourself that for any function f,

(f ∗ δ) (t) = f (t) , (2.20)

i.e., if one builds a linear, shift-invariant system whose response to an impulse δ(t) is simply to produce the output δ(t), then its output is always equal to its input.

2.3.5 Relationship with differentiation

(f ∗ g)′ = f ′ ∗ g = f ∗ g′ (2.21)

Proof:

(f ∗ g)′ = (d/dt) ∫_{−∞}^{∞} f(τ) g(t − τ) dτ   (2.22)

         = ∫_{−∞}^{∞} f(τ) (∂/∂t) g(t − τ) dτ   (2.23)

         = ∫_{−∞}^{∞} f(τ) g′(t − τ) dτ   (2.24)

         = f ∗ g′   (2.25)

The other equality follows from commutativity.

Repeated application of this result yields

(f ∗ g)^(m+n) = f^(m) ∗ g^(n)   (2.26)

where f^(m) denotes the m’th derivative of f, etc. Note that this result applies for generalized functions, which are always infinitely differentiable.

A somewhat more physical way of proving this result involves considering a system which is a differentiator. Such a system simply produces an output which is the derivative of its input. It is easy to see that the differentiator is a linear and shift-invariant system, so its action on any input f must be the convolution of f with the impulse response of the system. Since the response of a differentiator to δ(t) is simply to produce δ′(t), it follows that δ′(t) is the impulse response, and hence that

f ′ = f ∗ δ′ (2.27)

for any function f. The required result then follows straightforwardly from the associativity of the convolution.
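Property (2.21) lends itself to a quick numerical sanity check using smooth functions. Here two Gaussians are used as an illustrative choice: a finite-difference derivative of f ∗ g is compared with f′ ∗ g:

```python
import math

# Two Gaussians (an illustrative choice) and the derivative of the first
g1  = lambda t: math.exp(-t * t)
g2  = lambda t: math.exp(-2 * t * t)
dg1 = lambda t: -2 * t * math.exp(-t * t)

def conv(a, b, t, dt=0.01, L=6.0):
    # Riemann-sum approximation of (a * b)(t); both factors are negligible beyond |t| = L
    n = int(round(2 * L / dt))
    return sum(a(-L + k * dt) * b(t - (-L + k * dt)) for k in range(n)) * dt

t0 = 0.7
# left side: numerical derivative of (g1 * g2) at t0
lhs = (conv(g1, g2, t0 + 1e-4) - conv(g1, g2, t0 - 1e-4)) / 2e-4
# right side: (g1' * g2) at t0
rhs = conv(dg1, g2, t0)
```

The two values agree to well within the quadrature and finite-difference errors, as (2.21) requires.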


2.3.6 Relationship with integration

If

F(t) = ∫_{−∞}^{t} f(τ) dτ,  and   (2.28)

G(t) = ∫_{−∞}^{t} g(τ) dτ,   (2.29)

then

(F ∗ g)(t) = (f ∗ G)(t) = ∫_{−∞}^{t} (f ∗ g)(τ) dτ.   (2.30)

Proof: Do as an exercise. Start from the definition and be careful about the integration limits.

Alternatively we may regard the definite integral of f from −∞ to t as the convolution of f with a unit step, i.e.,

F(t) = ∫_{−∞}^{t} f(τ) dτ = (f ∗ u)(t)   (2.31)

(to see this, just think about the impulse response of a system which integrates its input). From associativity and commutativity of the convolution, it follows that the integral of the convolution of two functions can be found by integrating either function and convolving with the other, i.e.,

(f ∗ g) ∗ u = (f ∗ u) ∗ g = f ∗ (g ∗ u) (2.32)

Exercise: Show that u ∗ δ′ = δ. Hence show that if F (t) is given by (2.31), F ∗ g′ = f ∗ g.

2.4 Example: Indirect Computation of a Convolution Integral

Here we reconsider the problem of convolving

f(t) = 1 if |t| ≤ 1, and 0 otherwise,

and

h(t) = 1 − t if 0 ≤ t ≤ 1, and 0 otherwise,

using some of the properties of the convolution integral discussed above. In particular, we note that:

• Convolving any function with a delta function is a particularly simple operation due to the shifting property.

• It is possible to reduce a piecewise polynomial function to a collection of delta functions by differentiation.

• Convolution is a linear operation.


Figure 2.5: Indirect convolution of two functions f and h by first computing f′ ∗ h and integrating the result.

• Differentiating f(t) with respect to time reduces it to two delta functions (see Figure 2.5(c)):

f ′ (t) = δ (t+ 1)− δ (t− 1) (2.33)

Convolving f′ with h can be done using the shifting property and linearity (see Figure 2.5(d)):

(f′ ∗ h)(t) = h(t + 1) − h(t − 1)   (2.34)

            =  −t      for −1 ≤ t < 0,
               t − 2   for 1 ≤ t < 2,
               0       otherwise.   (2.35)

This is the derivative of the desired convolution f ∗ h. It thus remains to compute the integral, which may readily be found as follows:

∫_{−∞}^{t} (f′ ∗ h)(τ) dτ =  ∫_{−1}^{t} (−τ) dτ              for −1 ≤ t < 0
                             1/2                             for 0 ≤ t < 1
                             1/2 + ∫_{1}^{t} (τ − 2) dτ      for 1 ≤ t < 2
                             0                               otherwise

                          =  (1 − t²)/2    for −1 ≤ t < 0
                             1/2           for 0 ≤ t < 1
                             (t − 2)²/2    for 1 ≤ t < 2
                             0             otherwise          (2.36)

This is the same result as found earlier by direct evaluation of the convolution integral.
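The indirect route can itself be checked numerically: cumulatively integrating f′ ∗ h = h(t + 1) − h(t − 1) should reproduce the values of f ∗ h obtained earlier. A sketch (the grid parameters are arbitrary choices):

```python
# The triangle pulse h from the example; then f' * h = h(t+1) - h(t-1)
def h(t):
    return 1.0 - t if 0 <= t <= 1 else 0.0

def fprime_conv_h(t):
    return h(t + 1) - h(t - 1)

def F(t, dt=1e-3):
    # cumulative Riemann-sum integral of (f' * h) from -3 up to t
    n = int(round((t + 3) / dt))
    return sum(fprime_conv_h(-3 + k * dt) for k in range(n)) * dt

# These should match the piecewise result: (1 - t^2)/2, 1/2, and 0 respectively
checks = [F(-0.5), F(0.5), F(3.0)]
```

Note that the last value returning to zero reflects the fact that f′ ∗ h has zero net area, as it must for a convolution of compactly supported functions whose product of areas is finite.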

2.5 Stability of linear time-invariant systems

A linear system is said to be stable if the output is bounded for every bounded input.


i.e., if there exists M > 0 such that |f(t)| < M for all time t, then when f(t) is applied to the linear system, we must be able to find K > 0 such that |g(t)| < K for all t.

Theorem: A linear time-invariant system is stable if and only if its impulse response h(t) is absolutely integrable, i.e.,

I = ∫_{−∞}^{∞} |h(t)| dt < ∞.   (2.37)

Proof: (⇐) We assume that the integral I is finite and have to show that the output for every bounded input is bounded.

Let the input f(t) be bounded, so that there is some M > 0 such that |f(t)| < M for all t. Then since

|g(t)| = |∫_{−∞}^{∞} f(t − τ) h(τ) dτ|   (2.38)

       ≤ ∫_{−∞}^{∞} |f(t − τ)| |h(τ)| dτ   (2.39)

       < M ∫_{−∞}^{∞} |h(τ)| dτ = MI,   (2.40)

we see that |g (t)| is bounded above by MI.

(⇒) We now suppose that the integral for I diverges. We shall construct a bounded input function such that the output for this particular input diverges at t = 0 (i.e., we prove the contrapositive statement: I diverges ⇒ the system is not stable).

Define the input function in terms of the impulse response,

f(t) = |h(−t)|/h(−t) if h(−t) ≠ 0, and 0 otherwise.   (2.41)

This is clearly bounded since |f(t)| ≤ 1 for all t. The output for this input is

g(t) = ∫_{−∞}^{∞} f(t − τ) h(τ) dτ   (2.42)

     = ∫_{−∞}^{∞} [|h(τ − t)|/h(τ − t)] h(τ) dτ.   (2.43)

At t = 0, the output is

g(0) = ∫_{−∞}^{∞} |h(τ)| dτ,   (2.44)

which is infinite, by assumption. The system is thus not stable.
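The absolute-integrability criterion (2.37) can be explored numerically by watching ∫_0^T |h(t)| dt as T grows. The two impulse responses below are illustrative choices: a decaying exponential (absolutely integrable, hence stable) and a unit step, which is the impulse response of an ideal integrator and is not absolutely integrable:

```python
import math

def abs_integral(h, T, dt=0.01):
    # Riemann-sum approximation of the integral of |h(t)| from 0 to T
    n = int(round(T / dt))
    return sum(abs(h(k * dt)) for k in range(n)) * dt

h_stable   = lambda t: math.exp(-t)   # absolutely integrable: I = 1
h_unstable = lambda t: 1.0            # ideal integrator: the integral diverges

I_stable   = [abs_integral(h_stable, T) for T in (10.0, 100.0, 1000.0)]
I_unstable = [abs_integral(h_unstable, T) for T in (10.0, 100.0, 1000.0)]
```

The first list settles to a finite value while the second grows without bound, mirroring the dichotomy in the theorem.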

Exercise: Is the system consisting of the series LCR circuit (where the input voltage is applied across all three components and the output voltage is taken across the capacitor) stable in the limit as R → 0?

2.6 Eigenfunctions and eigenvalues of linear time-invariant systems

Linear time-invariant systems have the important property that whenever a complex exponential is applied as the input, the output is always a complex exponential of the same frequency. The amplitude of the output is in general a frequency-dependent complex number. Mathematically, it is no problem to consider feeding a complex input signal into a system, but it is also possible to interpret the operation physically by considering feeding the real and imaginary parts of the signal separately into two identical copies of the system. If the complex signal is written in terms of its real and imaginary parts f(t) = fr(t) + jfi(t), and if the outputs produced by inputs fr(t) and fi(t) are gr(t) and gi(t) respectively, then by linearity the response to f(t) must be gr(t) + jgi(t).

Consider a linear time-invariant system with impulse response h(t). If the input to the system is the complex exponential of frequency ν,

f (t) = exp (j2πνt) , (2.45)

then the output is given by the convolution

g(t) = (f ∗ h)(t) = (h ∗ f)(t)   (2.46)

     = ∫ h(τ) f(t − τ) dτ   (2.47)

     = ∫ h(τ) exp(j2πν[t − τ]) dτ   (2.48)

     = (∫ h(τ) exp(−j2πντ) dτ) exp(j2πνt)   (2.49)

     = H(ν) exp(j2πνt).   (2.50)

We see that g(t) is simply a multiple of the input. H(ν) is called the system function or the transfer function. If we regard the system as a linear transformation which maps f into g, this means that exp(j2πνt) is an eigenfunction of any linear, time-invariant system and the associated eigenvalue is H(ν). As we shall see, H(ν) is the Fourier transform of the impulse response h(t).

This is the basis of the result used in a.c. circuit theory that whenever a linear (time-invariant) electrical network is excited with sinusoidal generators of a single frequency, all the currents and voltages in the network are sinusoidal with the same frequency. They differ only in their amplitudes and phases.

Exercise: Show that if h (t) is real-valued and if f (t) = cos 2πνt, then

g (t) = |H (ν)| cos (2πνt+ arg [H (ν)])

so that the modulus of H (ν) gives the gain and arg [H (ν)] gives the phase shift.

More generally, exp(st) for any complex s is an eigenfunction of a linear time-invariant system (provided that the output remains finite, which will usually be the case for at least some values of s). This can be demonstrated using only the definitions of linearity and time-invariance. Suppose we write the output of the linear time-invariant system when the input is exp(st) as g(s, t).

If the input is changed to exp[s(t+ τ)], we can calculate the new output in one of two ways:

1. By time-invariance, the output must be given by g(s, t+ τ), or


2. if we write exp[s(t + τ)] as exp(sτ) exp(st), then by linearity, the output must be exp(sτ)g(s, t).

These must be the same and so

g (s, t+ τ) = exp(sτ)g(s, t), (2.51)

for every t and τ. If we set t = 0, we find that

g (s, τ) = exp(sτ)g(s, 0), (2.52)

and since τ is a dummy variable, we may relabel it as t to obtain

g (s, t) = g(s, 0) exp(st). (2.53)

Thus the output when the input is exp(st) is the input multiplied by a quantity which is a function of s alone. This includes the previous result as a special case when s = j2πν is purely imaginary.
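The eigenfunction property can be demonstrated numerically. For a sample impulse response h(t) = e^(−t) for t ≥ 0 (an illustrative choice, not from the text), the ratio g(t)/f(t) comes out as the same complex number H(ν) for every t, and for this particular h it matches the closed form H(ν) = 1/(1 + j2πν):

```python
import cmath, math

nu = 2.0
h = lambda t: math.exp(-t) if t >= 0 else 0.0   # illustrative impulse response

def respond(t, dt=1e-3, T=20.0):
    # g(t) = integral of h(tau) exp(j 2 pi nu (t - tau)) d tau, truncated at T
    return sum(h(k * dt) * cmath.exp(2j * math.pi * nu * (t - k * dt))
               for k in range(int(round(T / dt)))) * dt

# g(t)/f(t) should be the same complex number H(nu) at every t
H_candidates = [respond(t) / cmath.exp(2j * math.pi * nu * t) for t in (0.0, 0.3, 1.7)]
H_exact = 1 / (1 + 2j * math.pi * nu)   # closed form for this particular h
```

The t-independence of the ratio is exactly the statement that exp(j2πνt) is an eigenfunction with eigenvalue H(ν).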

2.7 Cascaded systems

Two systems are said to be cascaded when the output of the first is used as the input to the second. If the impulse responses h1(t) and h2(t) of the two systems are known, it is easy to compute the impulse response h(t) of the cascaded system.

Consider an input function f. The output of the first system is f ∗ h1, which is used as the input for the second system. The output of the second system is thus (f ∗ h1) ∗ h2. By the associativity of the convolution, we see that

h (t) = (h1 ∗ h2) (t) . (2.54)

f(t) ──→ [ h1(t), H1(ν) ] ──→ [ h2(t), H2(ν) ] ──→ g(t)

f(t) ──→ [ h(t), H(ν) ] ──→ g(t)

If we now consider the corresponding transfer functions H1(ν) and H2(ν), we see that an input of exp(j2πνt) yields an output from the first system of H1(ν) exp(j2πνt), since complex exponentials are eigenfunctions of the first system. The second system in turn multiplies this complex exponential by H2(ν), so that the output of the cascaded system is H1(ν)H2(ν) exp(j2πνt). We conclude that complex exponentials are also eigenfunctions of the cascaded system and that the eigenvalue at frequency ν is given by

H (ν) = H1 (ν)H2 (ν) . (2.55)

From these results, we see that convolution of the impulse responses corresponds to multiplication of the associated transfer functions. We shall return to this result in connection with Fourier transforms.
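The same bookkeeping is easy to check in discrete time: feeding a sequence through two FIR systems in cascade gives the same output as a single system whose impulse response is the convolution h1 ∗ h2. The sequences below are arbitrary illustrative choices:

```python
def conv(a, b):
    # discrete (full) linear convolution of two finite sequences
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

h1 = [1.0, 0.5]            # illustrative impulse responses and input
h2 = [0.25, 0.0, -1.0]
f  = [1.0, 2.0, 3.0, 4.0]

cascaded = conv(conv(f, h1), h2)   # f through system 1, then system 2
combined = conv(f, conv(h1, h2))   # single system with h = h1 * h2
```

The two outputs are identical, which is just associativity of the convolution in its discrete form.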

Exercises:

1. Calculate the convolution of

   f(t) = 1 − t if 0 ≤ t ≤ 1, and 0 otherwise,

   and

   h(t) = sin(πt) if 0 ≤ t ≤ 2, and 0 otherwise.

2. Show that the convolution of f(t) = exp(−at²) and h(t) = exp(−bt²) is also a Gaussian. In fact

   (f ∗ h)(t) = √(π/(a + b)) exp(−(ab/(a + b)) t²),

   which is wider than either of the original Gaussians.

3. Calculate the convolution of sinc(t) with itself, directly from the definition of the convolution integral, where

   sinc(t) = sin(πt)/(πt) if t ≠ 0, and 1 if t = 0.

4. Calculate the convolution of sinc(at) and sinc(bt) from the definition. The result should be

   sinc[min(a, b) t] / max(a, b).


3

The Fourier Transform

3.1 Introduction

There are two main approaches to Fourier transform theory

1. Define the Fourier transform of a function f(t) as

F(ν) = ∫_{−∞}^{∞} f(t) exp(−j2πνt) dt   (3.1)

and show that under suitable conditions f(t) can be recovered from F(ν) via the inverse transform relationship

f(t) = ∫_{−∞}^{∞} F(ν) exp(+j2πνt) dν   (3.2)

This can be motivated in terms of finding the eigenvalues of a linear time-invariant system, as discussed in the previous chapter. The calculation of F(ν) from f(t) is called Fourier analysis, while the recovery of f(t) from F(ν) is called Fourier synthesis.

2. Start from the Fourier series which expresses a periodic function of t as a sum of cosinusoidal and sinusoidal functions and let the period T become large.

The first approach is more satisfactory mathematically but the second is somewhat more easily motivated physically. We shall adopt the first approach and derive the second via generalized function theory.

3.2 The Fourier transform and its inverse

Historically, people were interested in whether or not f(t) could be successfully recovered from F(ν) on a point-by-point basis. Thus if we define the Fourier transform by (3.1) and then calculate

fM(t) = ∫_{−M}^{M} F(ν) exp(j2πνt) dν   (3.3)

we would like to show that fM(t)→ f(t) as M →∞.


Substituting (3.1) into the equation for fM(t) yields

fM(t) = ∫_{−M}^{M} [∫_{−∞}^{∞} f(τ) exp(−j2πντ) dτ] exp(j2πνt) dν   (3.4)

      = ∫_{−∞}^{∞} f(τ) [∫_{−M}^{M} exp[j2πν(t − τ)] dν] dτ   (3.5)

      = ∫_{−∞}^{∞} f(τ) (sin[2πM(t − τ)] / [π(t − τ)]) dτ   (3.6)

This is called a Dirichlet integral. It is a weighted average of f(τ) with a sin(Mx)/x weighting centred about τ = t. Equivalently, we may regard it as the convolution of f with the function

hM(t) = sin(2πMt)/(πt).

If we can show that as M tends to infinity hM(t) → δ(t), then (f ∗ hM) → (f ∗ δ) = f, so the inverse transform will recover f(t) (see Figure 3.1).

Figure 3.1: The function hM(t) = sin(2πMt)/(πt) with which f(t) is convolved when evaluating the inverse transform integral, using integration limit M = 8.
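That hM behaves like a delta function for large M can be checked by pairing it with a smooth test function: ⟨hM, φ⟩ should be close to φ(0). In this sketch the test function e^(−t²) and the quadrature grid are illustrative assumptions:

```python
import math

def h_M(t, M):
    # sin(2 pi M t)/(pi t), with the limiting value 2M at t = 0
    return 2.0 * M if t == 0 else math.sin(2 * math.pi * M * t) / (math.pi * t)

phi = lambda t: math.exp(-t * t)   # smooth, rapidly decaying test function

def pairing(M, dt=1e-3, L=10.0):
    # <h_M, phi>: the integral of h_M(tau) phi(tau), truncated to |tau| <= L
    n = int(round(2 * L / dt))
    return sum(h_M(-L + k * dt, M) * phi(-L + k * dt) for k in range(n)) * dt

vals = [pairing(M) for M in (1.0, 2.0, 4.0)]   # each should be close to phi(0) = 1
```

For this rapidly decaying test function even modest M gives a pairing very close to φ(0); rougher functions would show the slower, more delicate convergence the technical note warns about.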

(Technical note: It is easy to show that the Dirichlet integral evaluates to f(t) provided f is well behaved, e.g. if f is differentiable at t (as we shall show later). However, we usually want f to be less well behaved than this, and the result becomes harder to prove. In fact, even as strong a condition as the continuity of f at t is not sufficient to make the inverse Fourier transform converge at t.)

Notes:

1. The Fourier transform transforms one complex function of a real variable into another complex function of a (different) real variable.


2. Other definitions of the Fourier transform are also in use which differ in scaling. A common alternative, particularly when associated with the Laplace transform, is

F(ω) = ∫_{−∞}^{∞} f(t) e^{−jωt} dt   (3.7)

f(t) = (1/2π) ∫_{−∞}^{∞} F(ω) e^{jωt} dω   (3.8)

and yet another version is similar but with a 1/√(2π) factor in front of both integrals. Our version keeps most of the 2π’s in the exponential factor where they belong and simplifies and rationalises the scaling in many other places. However, one must be able to cope with the other forms because there is no general agreement as to which definition should be used. Another (less common) variation is to associate the negative sign in the exponent with the inverse rather than the forward transform.

3.2.1 Existence and invertibility of the Fourier transform

Many conditions can be used to describe classes of functions which have classical Fourier transforms (i.e., the Fourier transform should exist everywhere and be finite). It is necessary to further restrict this class if we want to be able to recover the original function by using the inverse transform formula.

One sufficient (though not necessary) condition (due to Jordan) is that if f is in L1 (i.e., f is absolutely integrable) and if it is of bounded variation on every finite interval, then F(ν) exists and f(t) can be recovered from the inverse Fourier transform relationship at each point at which f is continuous.

The first condition (that f is in L1) means that

∫_{−∞}^{∞} |f(t)| dt < ∞   (3.9)

while the second (bounded variation) means that f(t) can be expressed as the difference of two bounded, monotonic increasing functions, and excludes functions like t sin(1/t).

If f(t) is discontinuous at t = t0, the inverse Fourier transform integral still converges at t0 and its value there is

(1/2)[f(t0⁺) + f(t0⁻)].   (3.10)

(Technical note: The right and left-hand limits are guaranteed to exist for functions ofbounded variation.)

Notes:

1. The existence of the Fourier transform is guaranteed if f is just absolutely integrable. Bounded variation in a neighbourhood of a point is needed for the inverse transform to recover the value of the function at that point (or the middle of the jump as in (3.10) if the function is discontinuous there).


2. The above approach to Fourier transforms is asymmetrical in the sense that the Fourier transforms of many absolutely integrable functions are not absolutely integrable and so do not lie in the original space of functions. For example, as we shall see later, the Fourier transform of the absolutely integrable "top-hat" function Π(t) defined by

\Pi(t) = \begin{cases} 1 & \text{if } |t| < \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (3.11)

is sinc(ν). Although sinc(ν) is bounded, it is not absolutely integrable. The inverse transform integral in this case has to be interpreted somewhat differently from that in the Fourier transform. Technically, when the integral in the Fourier transform is taken as a Lebesgue integral, that in the inverse Fourier transform is an improper Riemann integral which may only exist in the sense of the Cauchy principal value.

Of course, if the Fourier transform of the function does happen to be absolutely integrable, the inverse transform integral can be taken as a standard Lebesgue integral as well.

3. An alternative approach is to restrict the class of functions to be the square integrable functions (i.e., the so-called L2 functions) for which

\int_{-\infty}^{\infty} |f(t)|^2\, dt < \infty \qquad (3.12)

It can be shown that the Fourier transform of an L2 function is guaranteed to be another L2 function. This leads to a more symmetrical theory, but in this case the transform and the inverse transform do not necessarily converge pointwise but may do so only in the L2 sense.

4. Both of the above approaches are too restrictive for many practical applications because we want to be able to take Fourier transforms of a wider class of functions and to be sure that the spaces of the original and transformed functions coincide. As we shall see later, considering generalized functions allows us to do just this and to simplify greatly the conditions for existence of the Fourier transform.

3.3 Basic properties

If F(ν) is the Fourier transform of f(t) we will write f(t) ↔ F(ν) etc. It can generally be assumed that a function denoted by a capital (upper case) letter is the Fourier transform of the function denoted by the corresponding small (lower case) letter.

Proofs of the following properties, where omitted, should be done as exercises. A common step is to interchange the order of integration and we will assume that this is allowed, but with the understanding that in each case the functions must be sufficiently well behaved. These properties also apply for Fourier transforms of generalized functions (to be defined later) except where explicitly stated.

3.3.1 Linearity

c1f1(t) + c2f2(t)↔ c1F1(ν) + c2F2(ν) (3.13)

for any real or complex constants c1 and c2.


3.3.2 Behaviour under complex conjugation

f∗(t)↔ F ∗(−ν) (3.14)

Note the reversal of the frequency axis.

3.3.3 Duality

F (t)↔ f(−ν) (3.15)

For every forward Fourier transform there is a corresponding dual inverse transform which is almost the same except for a sign reversal.

Note that if we are not working with L2 functions or generalized functions, we must add the condition that the appropriate transforms exist.

3.3.4 Shifting

f(t− t0)↔ exp(−j2πνt0)F (ν) (3.16)

exp(+j2πν0t)f(t)↔ F (ν − ν0) (3.17)

Shifting a function in one domain has no effect on the magnitude of the corresponding function in the other domain but affects only its phase. The amount of phase shift varies linearly and the slope depends on the amount of shift in the other domain.
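The shifting property (3.16) can be checked numerically. Below is a minimal sketch (not from the text; the names `ft`, `f`, `g` and the choice of a Gaussian pulse are purely illustrative) comparing the transform of a shifted pulse with the phase-rotated transform of the original:

```python
import cmath
import math

def ft(f, nu, a=-6.0, b=6.0, n=12000):
    # Midpoint-rule approximation to the Fourier integral of f at frequency nu
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) *
               cmath.exp(-2j * math.pi * nu * (a + (k + 0.5) * dt))
               for k in range(n)) * dt

f = lambda t: math.exp(-t * t)   # a smooth pulse; it is negligible outside [-6, 6]
t0 = 0.4
g = lambda t: f(t - t0)          # the same pulse shifted by t0

for nu in (0.0, 0.7, 1.3):
    F = ft(f, nu)
    lhs = ft(g, nu)
    assert abs(lhs - cmath.exp(-2j * math.pi * nu * t0) * F) < 1e-6
    assert abs(abs(lhs) - abs(F)) < 1e-6   # shifting leaves the magnitude unchanged
```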

3.3.5 Modulation

f(t)\cos(2\pi\nu_0 t) \leftrightarrow \tfrac{1}{2}\left[F(\nu + \nu_0) + F(\nu - \nu_0)\right] \qquad (3.18)

f(t)\sin(2\pi\nu_0 t) \leftrightarrow \tfrac{j}{2}\left[F(\nu + \nu_0) - F(\nu - \nu_0)\right] \qquad (3.19)

3.3.6 Interference

f(t− t0) + f(t+ t0)↔ 2 cos(2πνt0)F (ν) (3.20)

3.3.7 Scaling

f(at) \leftrightarrow \frac{1}{|a|}\, F\!\left(\frac{\nu}{a}\right) \qquad (3.21)

\frac{1}{|b|}\, f\!\left(\frac{t}{b}\right) \leftrightarrow F(b\nu) \qquad (3.22)

The form is the same in each direction. The factors a and b must be real. Note in particular that

f(−t)↔ F (−ν) (3.23)


3.3.8 Differentiation

\frac{df(t)}{dt} \leftrightarrow j2\pi\nu\, F(\nu) \qquad (3.24)

-j2\pi t\, f(t) \leftrightarrow \frac{dF(\nu)}{d\nu} \qquad (3.25)

Exercise: Show that if t^k f(t) is absolutely integrable for all 0 ≤ k ≤ n then the Fourier transform F(ν) is differentiable n times.

3.3.9 Integration

Provided that

\int_{-\infty}^{\infty} f(t)\, dt = 0 \qquad (3.26)

we find that

\int_{-\infty}^{t} f(\tau)\, d\tau \leftrightarrow \frac{1}{j2\pi\nu}\, F(\nu) \qquad (3.27)

Similarly, provided that

\int_{-\infty}^{\infty} F(\nu)\, d\nu = 0, \qquad (3.28)

-\frac{1}{j2\pi t}\, f(t) \leftrightarrow \int_{-\infty}^{\nu} F(\mu)\, d\mu \qquad (3.29)

If the integrals over the entire range of the variables do not vanish, we must use generalized functions in the transforms, as will be discussed later.

3.3.10 Symmetry

Let us write the real and imaginary parts of the transform pair f(t) and F(ν) explicitly as follows:

f(t) = f_r(t) + j f_i(t)

F(\nu) = F_r(\nu) + j F_i(\nu)

In these, f_r, f_i, F_r and F_i are all real functions.

1. f(t) real =⇒ F(−ν) = F*(ν), i.e. F is hermitian (meaning that F_r is even and F_i is odd).

2. f(t) imaginary =⇒ F(−ν) = −F*(ν), i.e. F is antihermitian (meaning that F_r is odd and F_i is even).

3. f(t) even =⇒ F (−ν) = F (ν) , i.e., F (ν) is even.

4. f(t) odd =⇒ F (−ν) = −F (ν) , i.e., F (ν) is odd.


Some combinations of these cases are also meaningful. For example, the Fourier transform of a real, odd function must be both hermitian and odd, which means that it must be purely imaginary.

Any arbitrary function f(t) can be written as the sum of an even function f_e(t) and an odd function f_o(t).

f(t) = f_e(t) + f_o(t) \qquad (3.30)

f_e(t) = \frac{f(t) + f(-t)}{2} \qquad (3.31)

f_o(t) = \frac{f(t) - f(-t)}{2} \qquad (3.32)

Therefore, if f(t) is real,

fe(t)↔ Fr(ν) (3.33)

fo(t)↔ jFi(ν) (3.34)

Thus for a real function f(t), the real part of the Fourier transform is due to the even part of f(t) and the imaginary part of the transform is due to the odd part of f(t).
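This split can be demonstrated numerically for a causal exponential pulse. In the sketch below (not from the text; the helper names `ft`, `fe`, `fo` are illustrative), the transform of the even part matches the real part of F(ν) and the transform of the odd part matches j times the imaginary part:

```python
import cmath
import math

def ft(f, nu, T=15.0, n=30000):
    # Midpoint-rule Fourier integral over [-T, T]
    dt = 2 * T / n
    return sum(f(-T + (k + 0.5) * dt) *
               cmath.exp(-2j * math.pi * nu * (-T + (k + 0.5) * dt))
               for k in range(n)) * dt

f  = lambda t: math.exp(-t) if t > 0 else 0.0   # real, causal pulse
fe = lambda t: 0.5 * (f(t) + f(-t))             # even part: exp(-|t|)/2
fo = lambda t: 0.5 * (f(t) - f(-t))             # odd part

for nu in (0.1, 0.6):
    F = ft(f, nu)
    assert abs(ft(fe, nu) - F.real) < 1e-3       # even part -> Fr(nu)
    assert abs(ft(fo, nu) - 1j * F.imag) < 1e-3  # odd part  -> j Fi(nu)
```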

For a causal real function (i.e., f(t) = 0 for t < 0), we have in addition that

f_e(t) = f_o(t) = \tfrac{1}{2} f(t) \quad \text{if } t > 0, \qquad f_e(t) + f_o(t) = 0 \quad \text{if } t < 0 \qquad (3.35)

This dependence between f_e and f_o means that there is also a dependence between F_r and F_i.

3.3.11 Convolution

f(t), F(ν)  →  [ h(t), H(ν) ]  →  g(t), G(ν)

Theorem: If g(t) = (f ∗ h)(t) then G(ν) = F (ν)H(ν).

Proof:

G(\nu) = \int_{-\infty}^{\infty} \exp(-j2\pi\nu t)\, g(t)\, dt \qquad (3.36)

= \int_{-\infty}^{\infty} \exp(-j2\pi\nu t) \left[ \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau \right] dt \qquad (3.37)

Assuming that we can interchange the order of the integrations,

G(\nu) = \int_{-\infty}^{\infty} f(\tau) \left[ \int_{-\infty}^{\infty} \exp(-j2\pi\nu t)\, h(t - \tau)\, dt \right] d\tau \qquad (3.38)


The term in the brackets is the Fourier transform of h(t − τ). By the time shifting property,

\int_{-\infty}^{\infty} \exp(-j2\pi\nu t)\, h(t - \tau)\, dt = \exp(-j2\pi\nu\tau)\, H(\nu) \qquad (3.39)

so that

G(\nu) = \int_{-\infty}^{\infty} f(\tau) \exp(-j2\pi\nu\tau)\, H(\nu)\, d\tau = F(\nu)\, H(\nu) \qquad (3.40)

Because of the symmetry in the forward and inverse Fourier transform relationships, we also see that if f(t) = f1(t)f2(t) then F(ν) = (F1 ∗ F2)(ν), i.e.,

f1(t) f2(t)↔ (F1 ∗ F2)(ν) (3.41)
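The discrete analogue of the convolution theorem — circular convolution of sampled sequences corresponds to multiplication of their DFTs — can be checked directly. This sketch is not part of the text; `dft` and `circ_conv` are deliberately naive O(N²) implementations kept simple for clarity:

```python
import cmath
import math

def dft(x):
    # Direct O(N^2) discrete Fourier transform
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def circ_conv(x, h):
    # Circular convolution, the discrete counterpart of (f * h)(t)
    N = len(x)
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

N = 32
x = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
h = [math.exp(-n / 4.0) for n in range(N)]

lhs = dft(circ_conv(x, h))                      # DFT of the convolution
rhs = [X * H for X, H in zip(dft(x), dft(h))]   # product of the DFTs
assert all(abs(a - b) < 1e-6 for a, b in zip(lhs, rhs))
```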

3.3.12 Parseval’s theorem

Theorem:

\int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt = \int_{-\infty}^{\infty} F^*(\nu)\, G(\nu)\, d\nu \qquad (3.42)

Proof:

\int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt = \int_{-\infty}^{\infty} f^*(t) \left[ \int_{-\infty}^{\infty} G(\nu) \exp(j2\pi\nu t)\, d\nu \right] dt \qquad (3.43)

= \int_{-\infty}^{\infty} G(\nu) \left[ \int_{-\infty}^{\infty} f^*(t) \exp(j2\pi\nu t)\, dt \right] d\nu \qquad (3.44)

= \int_{-\infty}^{\infty} G(\nu) \left[ \int_{-\infty}^{\infty} f(t) \exp(-j2\pi\nu t)\, dt \right]^* d\nu \qquad (3.45)

= \int_{-\infty}^{\infty} F^*(\nu)\, G(\nu)\, d\nu \qquad (3.46)

The expression \int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt can be thought of as an inner product of the two functions f(t) and g(t), and is written in Dirac notation as 〈f(t)|g(t)〉. Parseval's theorem thus states that the inner product is invariant under a Fourier transformation.

〈f(t)|g(t)〉 = 〈F (ν)|G(ν)〉 (3.47)

Note however that there are some definitions of the Fourier transform in which different scalings are used and for which the equality of the inner products has to be replaced by the proportionality of the inner products between the two spaces.

Parseval's theorem can also be written in the alternative forms

\int_{-\infty}^{\infty} f(t)\, g(t)\, dt = \int_{-\infty}^{\infty} F(-\nu)\, G(\nu)\, d\nu = \int_{-\infty}^{\infty} F(\nu)\, G(-\nu)\, d\nu \qquad (3.48)

\int_{-\infty}^{\infty} f(-t)\, g(t)\, dt = \int_{-\infty}^{\infty} f(t)\, g(-t)\, dt = \int_{-\infty}^{\infty} F(\nu)\, G(\nu)\, d\nu \qquad (3.49)

both of which follow from the behaviour of the Fourier transform under conjugation. Using the notation

\langle f(t), g(t) \rangle = \int_{-\infty}^{\infty} f(t)\, g(t)\, dt \qquad (3.50)


just as when we defined the functional induced by a locally integrable function, Parseval's theorem may also be written as

〈f(t), g(t)〉 = 〈F (−ν), G(ν)〉 = 〈F (ν), G(−ν)〉 (3.51)

〈f(−t), g(t)〉 = 〈f(t), g(−t)〉 = 〈F (ν), G(ν)〉 (3.52)

As we shall see, these form the basis for defining the Fourier transform of generalized functions.

3.3.13 Energy invariance

Putting f(t) = g(t) in Parseval's theorem gives

\int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_{-\infty}^{\infty} |F(\nu)|^2\, d\nu \qquad (3.53)

Thus the energy (or its equivalent) is the same in each domain. This result is called Rayleigh's theorem.
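Rayleigh's theorem has an exact discrete counterpart: for the DFT defined below, the time-domain energy equals the frequency-domain energy divided by N. A quick sketch (not from the text; the `dft` helper is illustrative):

```python
import cmath
import math

def dft(x):
    # Direct discrete Fourier transform
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [math.exp(-((n - 16) / 5.0) ** 2) for n in range(32)]   # sampled Gaussian pulse
X = dft(x)

energy_t = sum(abs(v) ** 2 for v in x)
energy_f = sum(abs(v) ** 2 for v in X) / len(x)   # 1/N accounts for the DFT scaling
assert abs(energy_t - energy_f) < 1e-9
```

The 1/N factor is the discrete trace of the scaling issue mentioned above for alternative transform definitions.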

3.4 Examples of Fourier transforms

3.4.1 The rectangular pulse

f(t) = A\,\Pi\!\left(\frac{t}{T}\right) = \begin{cases} A & \text{if } |t| < T/2 \\ 0 & \text{otherwise} \end{cases} \qquad (3.54)

Using the definition of the Fourier transform

F(\nu) = \int_{-T/2}^{T/2} A \exp(-j2\pi\nu t)\, dt \qquad (3.55)

= A \left[ \frac{\exp(-j2\pi\nu t)}{-j2\pi\nu} \right]_{-T/2}^{T/2} \qquad (3.56)

= AT \left( \frac{\sin \pi\nu T}{\pi\nu T} \right) \qquad (3.57)

= AT\, \mathrm{sinc}(\nu T) \qquad (3.58)

Notice that

1. The zeros of F(ν) are at nonzero integer multiples of 1/T. Thus the wider the pulse in the t domain, the narrower the transform in the ν domain.

2. The value of F(0) is equal to the area under the graph of f(t). Similarly, the value of f(0) is equal to the area under F(ν). These are general results which follow immediately from the definitions and are useful for checking.

F(0) = \int_{-\infty}^{\infty} f(t)\, dt \quad \text{and} \quad f(0) = \int_{-\infty}^{\infty} F(\nu)\, d\nu \qquad (3.59)


3.4.2 Two pulses

f(t) = \begin{cases} 1 & \text{if } -T < t < 0 \\ -1 & \text{if } 0 < t < T \\ 0 & \text{otherwise} \end{cases} \qquad (3.60)

This can be written as

f(t) = \Pi\!\left(\frac{t + \tfrac{1}{2}T}{T}\right) - \Pi\!\left(\frac{t - \tfrac{1}{2}T}{T}\right) \qquad (3.61)

Using linearity, the shifting theorem and the previous result,

F(\nu) = T \exp\!\left(\frac{j2\pi\nu T}{2}\right) \mathrm{sinc}(\nu T) - T \exp\!\left(-\frac{j2\pi\nu T}{2}\right) \mathrm{sinc}(\nu T) \qquad (3.62)

= \frac{2j}{\pi\nu} \sin^2(\pi\nu T) \qquad (3.63)

3.4.3 Triangular pulse

f(t) = \begin{cases} \dfrac{T + t}{T} & \text{if } -T < t < 0 \\ \dfrac{T - t}{T} & \text{if } 0 < t < T \\ 0 & \text{otherwise} \end{cases} \qquad (3.64)

This is T^{-1} multiplied by the integral of the previous example. Since the area under the function in the previous example was zero, we can make use of (3.27) to conclude that

F(\nu) = \frac{1}{j2\pi\nu T} \cdot \frac{2j}{\pi\nu} \sin^2(\pi\nu T) \qquad (3.65)

= T\, \mathrm{sinc}^2(\nu T) \qquad (3.66)

Alternatively, we see that

f(t) = \frac{1}{T}\, (p * p)(t) \qquad (3.67)

where p(t) = Π(t/T). Using the convolution theorem,

F(\nu) = \frac{1}{T} \left[ T\, \mathrm{sinc}(\nu T) \right]^2 = T\, \mathrm{sinc}^2(\nu T) \qquad (3.68)

3.4.4 The exponential pulse

f(t) = u(t) exp(−αt) (3.69)

Substituting into the definition of the Fourier transform,

F(\nu) = \int_0^{\infty} \exp[(-\alpha - j2\pi\nu)t]\, dt \qquad (3.70)

= \frac{1}{\alpha + j2\pi\nu} \qquad (3.71)

= \frac{\alpha}{\alpha^2 + 4\pi^2\nu^2} - j\, \frac{2\pi\nu}{\alpha^2 + 4\pi^2\nu^2} \qquad (3.72)

= \frac{1}{\sqrt{\alpha^2 + 4\pi^2\nu^2}} \exp\!\left( -j \tan^{-1} \frac{2\pi\nu}{\alpha} \right) \qquad (3.73)
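The closed form (3.71) can be confirmed numerically. The sketch below is not from the text; the value α = 1.5 and the helper name `ft_exp_pulse` are illustrative. It integrates the pulse by the midpoint rule and compares with 1/(α + j2πν):

```python
import cmath
import math

alpha = 1.5   # arbitrary positive decay rate, chosen for illustration

def ft_exp_pulse(nu, T=30.0, n=60000):
    # Midpoint-rule value of the integral from 0 to T of exp(-alpha t) exp(-j 2 pi nu t) dt;
    # the tail beyond T is negligible since exp(-alpha T) is tiny
    dt = T / n
    return sum(math.exp(-alpha * t) * cmath.exp(-2j * math.pi * nu * t)
               for t in ((k + 0.5) * dt for k in range(n))) * dt

for nu in (0.0, 0.5, 2.0):
    exact = 1.0 / (alpha + 2j * math.pi * nu)   # equation (3.71)
    assert abs(ft_exp_pulse(nu) - exact) < 1e-4
```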


3.4.5 The Gaussian

f(t) = exp(−αt2) (3.74)

Substituting into the definition of the Fourier transform,

F(\nu) = \int_{-\infty}^{\infty} \exp(-\alpha t^2 - j2\pi\nu t)\, dt \qquad (3.75)

= \int_{-\infty}^{\infty} \exp\!\left[ -\alpha\left( t^2 + \frac{j2\pi\nu t}{\alpha} \right) \right] dt \qquad (3.76)

Completing the square in the exponential,

F(\nu) = \exp\!\left( -\frac{\pi^2\nu^2}{\alpha} \right) \int_{-\infty}^{\infty} \exp\!\left[ -\alpha\left( t + \frac{j\pi\nu}{\alpha} \right)^2 \right] dt \qquad (3.77)

= \exp\!\left( -\frac{\pi^2\nu^2}{\alpha} \right) \int_{-\infty + j\pi\nu/\alpha}^{\infty + j\pi\nu/\alpha} \exp(-\alpha u^2)\, du \qquad (3.78)

where we have substituted u = t + jπν/α in the last integral, so that du = dt. It remains to compute the integral, which is a contour integral in the complex plane. To do this, consider the integral over the following rectangular contour, where R is later going to be taken to ∞.

\left( \int_{-R+j\pi\nu/\alpha}^{R+j\pi\nu/\alpha} + \int_{R+j\pi\nu/\alpha}^{R} + \int_{R}^{-R} + \int_{-R}^{-R+j\pi\nu/\alpha} \right) \exp(-\alpha u^2)\, du \qquad (3.79)

where in each integral, the straight line path between the limits is taken. Since the integrand is analytic throughout the complex plane, Cauchy's theorem states that the integral over any closed contour is zero. As R becomes large, the integrals along the lines [R + jπν/α, R] and [−R, −R + jπν/α] become small since the integrand falls off rapidly while the length of the contour stays fixed. Hence in the limit as R → ∞, these integrals vanish and we find that

\int_{-\infty + j\pi\nu/\alpha}^{\infty + j\pi\nu/\alpha} \exp(-\alpha u^2)\, du + \int_{\infty}^{-\infty} \exp(-\alpha u^2)\, du = 0 \qquad (3.80)

or

\int_{-\infty + j\pi\nu/\alpha}^{\infty + j\pi\nu/\alpha} \exp(-\alpha u^2)\, du = \int_{-\infty}^{\infty} \exp(-\alpha u^2)\, du = \sqrt{\frac{\pi}{\alpha}} \qquad (3.81)

Thus we see that the desired Fourier transform is

F(\nu) = \sqrt{\frac{\pi}{\alpha}}\, \exp\!\left( -\frac{\pi^2\nu^2}{\alpha} \right) \qquad (3.82)

A convenient way of remembering this result is to apply the scaling property of Fourier transforms to the special case

exp(−πt2)↔ exp(−πν2) (3.83)
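The self-transform property (3.83) is straightforward to verify numerically. A minimal sketch (not from the text; the helper name `ft_gauss` is illustrative) truncates the Fourier integral to [−6, 6], where the Gaussian tails are negligible:

```python
import cmath
import math

def ft_gauss(nu, T=6.0, n=12000):
    # Midpoint-rule Fourier integral of exp(-pi t^2) over [-T, T]
    dt = 2 * T / n
    return sum(math.exp(-math.pi * t * t) * cmath.exp(-2j * math.pi * nu * t)
               for t in (-T + (k + 0.5) * dt for k in range(n))) * dt

for nu in (0.0, 0.5, 1.0, 1.5):
    # compare with the claimed transform exp(-pi nu^2)
    assert abs(ft_gauss(nu) - math.exp(-math.pi * nu * nu)) < 1e-6
```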

Exercise: Find the Fourier transforms of the following functions. Sketch the functions and (the real and imaginary parts of) their Fourier transforms.

1. f(t) = exp(−α|t|)


2. f(t) = Π(t/n) sin(2πt)

3. f(t) = t3 exp(−αt2)

4. f(t) = exp(−α|t|) cos(βt)

5. f(t) = \sum_{k=-N}^{N} \Pi\!\left(\frac{t - k}{w}\right)

3.5 Convergence of the Dirichlet integrals

(See T.M. Apostol, Mathematical Analysis for further details.)

In order to show that the inverse Fourier transform formula allows us to recover f(t), we need to consider the Dirichlet integral (3.6)

\lim_{M\to\infty} \int_{-\infty}^{\infty} f(\tau)\, \frac{\sin[2\pi M (t - \tau)]}{\pi(t - \tau)}\, d\tau \qquad (3.84)

The inverse Fourier transform at t converges to the value of this integral. Without loss of generality, let us just consider the point t = 0. Given δ > 0 we can split the integration into three intervals

\lim_{M\to\infty} \int_{-\infty}^{-\delta} f(\tau)\, \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau + \lim_{M\to\infty} \int_{-\delta}^{\delta} f(\tau)\, \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau + \lim_{M\to\infty} \int_{\delta}^{\infty} f(\tau)\, \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau \qquad (3.85)

In the following, we show that the first and third integrals tend to zero whereas the second integral tends to f(0) if f is sufficiently well behaved.

3.5.1 The Riemann-Lebesgue lemma

Theorem: If f is absolutely integrable on a general interval (a, b), which may be bounded or unbounded,

\lim_{M\to\infty} \int_a^b f(t) \exp(j2\pi M t)\, dt = 0 \qquad (3.86)

Proof: It is a result of integration theory that any absolutely integrable function f can be approximated arbitrarily accurately by a step function, in the sense that for any ε > 0 there is a step function s(t) such that

1. s(t) vanishes outside some bounded interval, and

2. the absolute difference between s(t) and f(t) satisfies

\int_a^b |s(t) - f(t)|\, dt < \varepsilon \qquad (3.87)

(Note: A function on (a, b) is a step function if there is a partition of (a, b) such that the function is constant on the open subintervals.)

We first show that the Riemann-Lebesgue lemma holds for a constant function on an arbitrary interval and hence also for step functions. The relationship (3.87) is then used to extend the result to all absolutely integrable functions.


1. If f(t) = c is a constant on (a, b),

\lim_{M\to\infty} \int_a^b c \exp(j2\pi M t)\, dt = \lim_{M\to\infty} \left[ c\, \frac{\exp(j2\pi M t)}{j2\pi M} \right]_a^b = 0 \qquad (3.88)

since the numerator is bounded while the denominator becomes large. Note that this proof works whether or not the interval is bounded.

2. If f(t) is a step function, it may be expressed as a sum of functions which are constant on disjoint intervals. The above proof may be applied to each of these disjoint intervals. Since the overall integral is the sum of the integrals over the intervals, the result is true for step functions.

3. Now suppose that f(t) is absolutely integrable and that s(t) is a step function such that (3.87) holds

\left| \int_a^b f(t) \exp(j2\pi M t)\, dt \right| \le \left| \int_a^b [f(t) - s(t)] \exp(j2\pi M t)\, dt \right| + \left| \int_a^b s(t) \exp(j2\pi M t)\, dt \right|

\le \int_a^b \left| [f(t) - s(t)] \exp(j2\pi M t) \right| dt + \left| \int_a^b s(t) \exp(j2\pi M t)\, dt \right|

= \int_a^b |f(t) - s(t)|\, dt + \left| \int_a^b s(t) \exp(j2\pi M t)\, dt \right| \qquad (3.89)

By (3.87) the first integral is less than ε and by the result for step functions the second integral tends to zero as M → ∞. Since ε can be chosen to be arbitrarily small, we have established the required result.

From the Riemann-Lebesgue lemma, it is easy to see that for any absolutely integrable function f(t),

\lim_{M\to\infty} \int_a^b f(t) \cos(2\pi M t)\, dt = 0 \qquad (3.90)

and

\lim_{M\to\infty} \int_a^b f(t) \sin(2\pi M t)\, dt = 0 \qquad (3.91)

In equation (3.85), if f(τ) is absolutely integrable, then so is f(τ)/(πτ) on [δ, ∞) and (−∞, −δ]. Thus, the Riemann-Lebesgue lemma may be applied to the first and third integrals to show that they tend to zero as M → ∞.
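The Riemann-Lebesgue behaviour is easy to observe numerically: for a fixed absolutely integrable f, the oscillatory integral shrinks as M grows. A sketch using f(t) = e^{−t} on (0, 1) (not from the text; the helper name `osc_integral` is illustrative):

```python
import math

def osc_integral(M, n=100000):
    # Midpoint-rule value of the integral from 0 to 1 of exp(-t) sin(2 pi M t) dt
    dt = 1.0 / n
    return sum(math.exp(-t) * math.sin(2 * math.pi * M * t)
               for t in ((k + 0.5) * dt for k in range(n))) * dt

vals = [abs(osc_integral(M)) for M in (1, 10, 100)]
assert vals[0] > vals[1] > vals[2]   # the integral shrinks as M grows
assert vals[2] < 2e-3
```

For this particular f the decay is O(1/M), but the lemma itself guarantees only that the limit is zero.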

3.5.2 The integral over (−a, a) in (3.85)

It can be shown that if f(τ) is continuous and of bounded variation on (−a, a), the second integral in (3.85) tends to f(0). However, this is quite difficult (see Apostol for details).

(Technical note: It is a curious and important result from functional analysis that it is not sufficient for f to be continuous at 0, i.e., there exists an absolutely integrable function which is continuous at 0 but whose value at 0 cannot be recovered from an inverse Fourier transform.)


However if f(τ) is differentiable at 0, the result follows quite readily. We may write

\lim_{M\to\infty} \int_{-a}^{a} f(\tau)\, \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau = \lim_{M\to\infty} \int_{-a}^{a} \frac{f(\tau) - f(0)}{\pi\tau}\, \sin(2\pi M\tau)\, d\tau + f(0) \lim_{M\to\infty} \int_{-a}^{a} \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau \qquad (3.92)

If f′(0) exists, the first integral on the right-hand side tends to zero as M → ∞ by the Riemann-Lebesgue lemma, provided a is sufficiently small. For this value of a, the second term is

f(0) \lim_{M\to\infty} \int_{-a}^{a} \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau = \frac{f(0)}{\pi} \int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx = f(0) \qquad (3.93)

This establishes the desired result.

Note:

We have shown that if f is absolutely integrable and differentiable at zero,

\lim_{M\to\infty} \int_{-\infty}^{\infty} f(\tau)\, \frac{\sin(2\pi M\tau)}{\pi\tau}\, d\tau = f(0) \qquad (3.94)

Thus in particular, for any test function φ (which certainly satisfies these conditions),

\lim_{M\to\infty} \left\langle \frac{\sin(2\pi M\tau)}{\pi\tau},\, \phi(\tau) \right\rangle = \phi(0). \qquad (3.95)

We may write this as a distributional limit

\lim_{M\to\infty} \frac{\sin(2\pi M\tau)}{\pi\tau} = \delta(\tau) \qquad (3.96)

3.6 Fourier transforms of generalized functions

We now wish to extend Fourier transform theory to allow us to find the Fourier transforms of generalized functions or distributions. To do this, it will turn out that we shall always want our test functions to have invertible Fourier transforms (in the classical sense) and that the Fourier transform of a test function should always be another test function. Since the Fourier transform of a function that vanishes outside a bounded interval does not vanish outside a bounded interval, we cannot use exactly the same set of test functions D that we introduced in the first chapter. Instead we define a new set of test functions (called open support test functions) D′ as follows:

A function φ is an open support test function if it is infinitely differentiable on the real line (i.e., it is C^∞) and if for all integers k ≥ 0, the kth derivative of φ is rapidly decreasing, i.e., for all integers N ≥ 0,

t^N \phi^{(k)}(t) \qquad (3.97)

is bounded for all t.

This means that all derivatives of φ tend to zero more quickly than |t|^{-N} for any N as |t| becomes large.

Convergence in D′: A sequence of open support test functions φ_n(t) converges to zero in D′ if for each pair of non-negative integers N and k, the sequence of functions t^N φ_n^{(k)}(t) approaches the zero function uniformly as n → ∞.


Theorem: The Fourier transform of an open support test function is an open support test function.

Proof: Suppose that φ(t) is an open support test function.

1. It is easy to show (e.g. by the comparison test with 1/(1 + t²)) that all rapidly decreasing functions are absolutely integrable and so the Fourier transform Φ(ν) exists.

2. In order to show that Φ(ν) is infinitely differentiable, we note that Φ^{(k)}(ν) is the Fourier transform of (−j2πt)^k φ(t). Since φ(t) is rapidly decreasing, (−j2πt)^k φ(t) is also rapidly decreasing and so it has a Fourier transform.

3. In order to show that Φ^{(k)}(ν) is rapidly decreasing, we need to show that for all non-negative natural numbers N, ν^N Φ^{(k)}(ν) is bounded. This is proportional to the Fourier transform of the Nth derivative of (−j2πt)^k φ(t). Since φ is C^∞ and rapidly decreasing, the Nth derivative of (−j2πt)^k φ(t) is also C^∞ and rapidly decreasing, which means that its Fourier transform is bounded for all ν.

An exactly analogous theory of distributions as in the first chapter can be built up using the class of open support test functions, except that the locally integrable functions are replaced by locally integrable functions which are slowly increasing. A function f(t) is said to be slowly increasing if there exists some non-negative integer N such that f(t)/t^N → 0 as |t| → ∞. The distributions induced by this construction are known as tempered distributions and they form a subset of the distributions considered in the first chapter.

From this point onwards, a "test function" refers to an open support test function, and "generalized functions" and "distributions" refer to tempered distributions.

3.6.1 Definition of the Fourier transform and its inverse

Let f(t) be a generalized function. The Fourier transform F(ν) is also a generalized function whose action on a test function Φ(ν) ∈ D′ is

〈F (ν),Φ(ν)〉 = 〈f(t), φ(−t)〉 (3.98)

where φ is the inverse Fourier transform of Φ. By the definition of D′, we know that φ is also a test function and so the action of f(t) on φ(−t) is well-defined.

Similarly, given a generalized function F(ν), the inverse Fourier transform is a generalized function f(t) whose action on a test function φ(t) ∈ D′ is

〈f(t), φ(t)〉 = 〈F (ν),Φ(−ν)〉 (3.99)

Theorem: Using the above definitions, if F(ν) is the Fourier transform of f(t), then f(t) is the inverse Fourier transform of F(ν).

Proof: Suppose that F(ν) is the Fourier transform of f(t) and that g(t) is the inverse Fourier transform of F(ν). Let φ(t) ∈ D′ be a test function and Φ(ν) be its Fourier transform. Using (3.99), the action of g on φ is


〈g(t), φ(t)〉 = 〈F (ν),Φ(−ν)〉 (3.100)

Writing Ψ(ν) = Φ(−ν), and using (3.98),

〈F (ν),Ψ(ν)〉 = 〈f(t), ψ(−t)〉 (3.101)

where ψ(t) is the inverse Fourier transform of Ψ(ν). Using the symmetry properties of the classical Fourier transform on the space of test functions, we see that

ψ(−t)↔ Ψ(−ν) = Φ(ν)↔ φ(t) (3.102)

and so ψ(−t) = φ(t). This shows that 〈g(t), φ(t)〉 = 〈f(t), φ(t)〉 for all test functions φ and so f = g distributionally.

Notes:

1. The above definitions show that every generalized function has a Fourier transform which is another generalized function, and that the inverse Fourier transform always recovers the original distribution.

2. The duality result that f(t) ↔ F(ν) iff F(t) ↔ f(−ν) works without exception if we consider generalized functions.

3. The definitions are consistent with (and motivated by) Parseval's theorem and so the classical Fourier transform of a generalized function which is in fact a well-behaved "ordinary" function is also a distributional Fourier transform in the above sense.

3.7 Examples of Fourier transforms of generalized functions

3.7.1 The delta function

If we substitute f(t) = δ(t) into the definition of the Fourier transform (3.1) and carry out the formal operation of setting t = 0 in the exponential (ignoring the fact that exp(−j2πνt) is not in the class of test functions D′), we might expect that F(ν) = 1. Let us now see how this is made rigorous through the use of (3.98).

According to the definition, F(ν) is a generalized function. It is well-defined provided that we can specify its action on a test function Φ(ν). Using the definition,

\langle F(\nu), \Phi(\nu) \rangle = \langle \delta(t), \phi(-t) \rangle \qquad (3.103)

= \phi(0) = \int_{-\infty}^{\infty} \Phi(\nu)\, d\nu = \langle 1, \Phi(\nu) \rangle \qquad (3.104)

Since this is true for all Φ ∈ D′, F (ν) = 1 distributionally.

Similarly, if f(t) = δ(t− T ),

\langle F(\nu), \Phi(\nu) \rangle = \langle \delta(t - T), \phi(-t) \rangle \qquad (3.105)

= \phi(-T) = \int_{-\infty}^{\infty} \Phi(\nu) \exp[j2\pi\nu(-T)]\, d\nu = \langle \exp(-j2\pi\nu T), \Phi(\nu) \rangle \qquad (3.106)


Hence we may write the Fourier transform pair

δ(t− T )↔ exp(−j2πνT ) (3.107)

Exercise: Show using (3.98) that if g(t) = f(t − T) then G(ν) = F(ν) exp(−j2πνT), so that the above is a special case of this relationship.

3.7.2 The complex exponential

By duality we expect the Fourier transform of f(t) = exp(j2πν0t) to be δ(ν − ν0). We can also see this directly since for any test function Φ,

\langle F(\nu), \Phi(\nu) \rangle = \langle \exp(j2\pi\nu_0 t), \phi(-t) \rangle \qquad (3.108)

= \int_{-\infty}^{\infty} \exp(j2\pi\nu_0 t)\, \phi(-t)\, dt = \Phi(\nu_0) = \langle \delta(\nu - \nu_0), \Phi(\nu) \rangle \qquad (3.109)

In this case, if we had formally substituted f(t) into the original definition of the Fourier transform, we would have obtained

F(\nu) = \int_{-\infty}^{\infty} \exp(j2\pi(\nu_0 - \nu)t)\, dt

This improper integral does not have a limit in the usual sense. However, on the basis of the above, we may formally write

\int_{-\infty}^{\infty} \exp(j2\pi(\nu_0 - \nu)t)\, dt = \delta(\nu - \nu_0)

provided that this is understood in the distributional sense.

3.7.3 Cosine and sine

By writing the cosine and sine functions in terms of complex exponentials we see that

1. \cos(2\pi\nu_0 t) \leftrightarrow \tfrac{1}{2}\left[\delta(\nu - \nu_0) + \delta(\nu + \nu_0)\right]

2. \sin(2\pi\nu_0 t) \leftrightarrow \tfrac{1}{2j}\left[\delta(\nu - \nu_0) - \delta(\nu + \nu_0)\right]

Note that the cosine function is real and even and so its Fourier transform is Hermitian and even (and consequently real). On the other hand, the sine function is real and odd and so its Fourier transform is Hermitian and odd (and consequently purely imaginary).


3.7.4 The signum function

This is defined by

\mathrm{sgn}\, t = \begin{cases} 1 & \text{if } t > 0 \\ -1 & \text{if } t < 0 \end{cases} \qquad (3.110)

One way of calculating the Fourier transform of this function is by considering it as the (distributional) limit of a sequence of functions. This relies on the following theorem:

Theorem: If fk is a sequence of generalized functions which converges to the generalized function f, then the sequence of Fourier transforms Fk converges distributionally to the Fourier transform F of f.

Proof: Do as an exercise using the definitions of Fourier transforms and limits of generalized functions.

Consider the sequence of functions

f_k(t) = \begin{cases} \exp(-t/k) & \text{if } t > 0 \\ -\exp(t/k) & \text{if } t < 0 \end{cases} \qquad (3.111)

As k → ∞ it is easy to see that this tends to sgn t pointwise and distributionally. For each k, f_k(t) is absolutely integrable and so its Fourier transform exists as an ordinary function

F_k(\nu) = \frac{1}{j2\pi\nu + k^{-1}} - \frac{1}{j2\pi(-\nu) + k^{-1}} \qquad (3.112)

= -\frac{j4\pi\nu}{k^{-2} + 4\pi^2\nu^2} \qquad (3.113)

As k →∞ this tends to 1/(jπν). Thus we have the transform pair

\mathrm{sgn}\, t \leftrightarrow \frac{1}{j\pi\nu} \qquad (3.114)
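The limiting argument in (3.112)–(3.113) can be illustrated numerically: F_k(ν) is purely imaginary, and its imaginary part approaches that of 1/(jπν) = −j/(πν) as k grows. A sketch (not from the text; the helper name `Fk_imag` is illustrative):

```python
import math

def Fk_imag(nu, k):
    # Imaginary part of F_k(nu) = -j 4 pi nu / (k^-2 + 4 pi^2 nu^2), equation (3.113)
    return -4 * math.pi * nu / (k ** -2 + 4 * math.pi ** 2 * nu ** 2)

nu = 0.25
limit = -1.0 / (math.pi * nu)   # imaginary part of 1/(j pi nu)
errors = [abs(Fk_imag(nu, k) - limit) for k in (1, 10, 100)]
assert errors[0] > errors[1] > errors[2]   # monotone approach to the limit
assert errors[2] < 1e-4
```

Note that this only checks the pointwise limit at a fixed ν ≠ 0; the distributional limit is the statement proved in the text.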

3.7.5 The unit step – Fourier transform of an integral

Since the unit step u(t) may be written in terms of the signum function

u(t) = \tfrac{1}{2}\left(\mathrm{sgn}(t) + 1\right) \qquad (3.115)

we may use linearity and the previous result to obtain the Fourier transform pair

u(t) \leftrightarrow \frac{1}{j2\pi\nu} + \frac{1}{2}\delta(\nu) \qquad (3.116)

Exercise: Consider the sequence of functions fk(t) = u(t) exp(−t/k) which tends distributionally to u(t) as k → ∞. Calculate Fk(ν) and show that these tend distributionally to the Fourier transform of u(t). Hint: Consider the real and imaginary parts of Fk(ν) separately. (Technical note: This shows that pointwise convergence does not imply distributional convergence.)


Recall that given a function f(t), the integral function

g(t) = \int_{-\infty}^{t} f(\tau)\, d\tau \qquad (3.117)

can be written as the convolution (f ∗ u)(t). By the convolution theorem and the above result, the Fourier transform of g(t) is

G(\nu) = F(\nu) \left( \frac{1}{j2\pi\nu} + \frac{1}{2}\delta(\nu) \right) = \frac{F(\nu)}{j2\pi\nu} + \frac{1}{2} F(0)\, \delta(\nu) \qquad (3.118)

This is the promised generalization of (3.27) which is required when F (0) is non-zero.

Exercise: What are the Fourier transforms of sgn t cos(2πν0t), u(t) sin(2πν0t) and of t^k?


4 Sampling in Frequency and Time

4.1 Introduction

As a major application of using Fourier transforms of generalized functions, we consider the problems of Fourier series and of representing a continuous waveform by uniformly sampling it in time. Fourier series are used in the analysis of periodic waveforms, while sampling theory is important in computer-aided processing of sampled physical data (other applications include compact disc players, digital audio etc.). Rather surprisingly, these topics are related and the theory is based on finding the Fourier transform of a train of delta functions.

4.2 Fourier transform of a train of delta functions

4.2.1 A train of 2N + 1 delta functions

Consider the generalized function consisting of 2N + 1 equally-spaced delta functions separated by time T

f_N(t) = \sum_{k=-N}^{N} \delta(t - kT) \qquad (4.1)

The Fourier transform is found directly via linearity and the result for the Fourier transform of a single delta function

F_N(\nu) = \sum_{k=-N}^{N} \exp(-j2\pi\nu k T) \qquad (4.2)

= \frac{\sin\!\left[2\pi\left(N + \tfrac{1}{2}\right)\nu T\right]}{\sin(\pi\nu T)} \qquad (4.3)

where we have summed the geometric series and written the complex exponential in trigonometric form. We note that

1. This is a “periodic” function (in ν) with period 1/T

2. The major peaks are at k/T for integer k and the heights of the peaks are 2N + 1.

3. The zeros of the function are at ν = k/[(2N + 1)T] where k is any integer not divisible by 2N + 1.
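Both the closed form (4.3) and the peak height 2N + 1 can be checked by summing the exponentials directly. A sketch (not from the text; the helper names `F_N` and `closed_form` are illustrative):

```python
import cmath
import math

def F_N(nu, N, T=1.0):
    # Direct sum of 2N+1 complex exponentials, equation (4.2)
    return sum(cmath.exp(-2j * math.pi * nu * k * T) for k in range(-N, N + 1))

def closed_form(nu, N, T=1.0):
    # The summed geometric series, equation (4.3); valid where sin(pi nu T) != 0
    return math.sin(2 * math.pi * (N + 0.5) * nu * T) / math.sin(math.pi * nu * T)

N = 5
for nu in (0.123, 0.31, 0.47):
    assert abs(F_N(nu, N) - closed_form(nu, N)) < 1e-9

# the peak height as nu -> 0 approaches 2N + 1
assert abs(F_N(1e-9, N) - (2 * N + 1)) < 1e-6
```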


4.2.2 An infinite train of delta functions

Let us now consider what happens as N → ∞. The sequence fN(t) tends in the distributional sense to

f(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT) \qquad (4.4)

You should check that this distributional limit does in fact hold by considering the action ona test function in D′. Note that the definition of D′ ensures that all results are well-definedand never infinite.

The Fourier transform of f(t) is then the distributional limit of FN (ν). Since FN is periodicwith period 1/T for every value of N , this periodicity is shared by the limit F . It thus sufficesto consider a single period from −1/(2T ) to 1/(2T ).

We look at the action of FN(ν) on a test function Φ(ν) in the interval −1/(2T) to 1/(2T), i.e.,

∫_{−1/(2T)}^{1/(2T)} (sin[2π(N + 1/2)νT] / sin(πνT)) Φ(ν) dν   (4.5)

The denominator is nonzero except at ν = 0. Thus as N → ∞, the Riemann-Lebesgue lemma ensures that the contribution to the integral vanishes except within a small interval (−δ, δ) around ν = 0. Since the test function is assumed to be differentiable at zero, there is a neighbourhood in which its value is not appreciably different from Φ(0). The integral may be approximated by

Φ(0) lim_{N→∞} ∫_{−δ}^{δ} sin[2π(N + 1/2)νT] / sin(πνT) dν = Φ(0) lim_{N→∞} (1/[(2N + 1)πT]) ∫_{−(2N+1)πTδ}^{(2N+1)πTδ} sin u / sin[u/(2N + 1)] du   (4.6)

As N becomes large, we may approximate sin[u/(2N + 1)] by u/(2N + 1). Since ∫_{−∞}^{∞} sin(u)/u du = π, the result is Φ(0)/T.

Thus within the interval −1/(2T) to 1/(2T), FN(ν) → δ(ν)/T. Using the periodicity of FN(ν), we obtain the transform pair

∑_{k=−∞}^{∞} δ(t − kT) ↔ (1/T) ∑_{k=−∞}^{∞} δ(ν − k/T)   (4.7)

Thus the Fourier transform of an infinite train of delta functions is also an infinite train of delta functions. The spacing in ν space is the reciprocal of the spacing in t space.
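The finite-N result (4.3) can be checked numerically before passing to the limit. The following sketch (with illustrative choices of N, T and a frequency grid that avoids the indeterminate points ν = k/T) compares the direct sum (4.2) with the closed form:

```python
import numpy as np

# Numerical check of Eqs. (4.2)-(4.3): the direct sum of 2N+1 complex
# exponentials equals sin[2pi(N+1/2)nuT]/sin(pi nuT).
# N, T and the frequency grid are illustrative choices; the grid avoids
# nu = k/T, where the closed form is an indeterminate 0/0.
N, T = 10, 0.5
nu = np.linspace(-2.3, 2.3, 1000)
k = np.arange(-N, N + 1)
direct = np.exp(-2j * np.pi * np.outer(nu, k) * T).sum(axis=1)
closed = np.sin(2 * np.pi * (N + 0.5) * nu * T) / np.sin(np.pi * nu * T)
assert np.allclose(direct, closed)
assert np.allclose(direct.imag, 0, atol=1e-9)   # the symmetric sum is real
```

The vanishing imaginary part reflects the pairing of the k and −k terms into cosines when the geometric series is summed.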

4.3 The Fourier transform of a periodic signal

Consider a function fp(t) which is periodic with period T. Define f(t) to be a single period of fp(t), zero outside the range [−T/2, T/2], i.e.,

f(t) = { fp(t)  if −T/2 < t < T/2
       { 0      otherwise   (4.8)


We see that fp(t) may be regarded as the convolution of f with an infinite train of delta functions:

fp(t) = ∑_{k=−∞}^{∞} f(t − kT) = (f ∗ h)(t)   (4.9)

where

h(t) = ∑_{k=−∞}^{∞} δ(t − kT)   (4.10)

Taking the Fourier transform and using the convolution theorem,

Fp(ν) = F(ν)H(ν) = (1/T) F(ν) ∑_{k=−∞}^{∞} δ(ν − k/T)   (4.11)

      = (1/T) ∑_{k=−∞}^{∞} F(k/T) δ(ν − k/T)   (4.12)

Thus the spectrum of fp(t) consists of discrete components at multiples of 1/T. The component of the periodic signal at frequency 1/T is called the fundamental and that at frequency k/T is called the k'th harmonic. We see that this harmonic relationship is a consequence of the periodicity of the original signal.

Let us now consider the form of the inverse Fourier transform which recovers fp(t) from Fp(ν):

fp(t) = ∫_{−∞}^{∞} Fp(ν) exp(j2πνt) dν   (4.13)

      = ∫_{−∞}^{∞} (1/T) ∑_{k=−∞}^{∞} F(k/T) δ(ν − k/T) exp(j2πνt) dν   (4.14)

      = ∑_{k=−∞}^{∞} (1/T) F(k/T) exp(j2πkt/T)   (4.15)

      = ∑_{k=−∞}^{∞} ck exp(j2πkt/T)   (4.16)

where

ck = (1/T) F(k/T)   (4.17)

   = (1/T) ∫_{−T/2}^{T/2} f(t) exp(−j2πkt/T) dt   (4.18)

where we have used the fact that f(t) vanishes outside [−T/2, T/2]. We recognize (4.16) as a Fourier series and (4.18) as the equations for the coefficients. By using the theory of generalized functions, we see that Fourier series can be recovered naturally from the theory of Fourier transforms.

Example: Find the Fourier series for fp(t) which is periodic of period T and which is defined within [−T/2, T/2] by

fp(t) = 1 − 2|t|/T   if −T/2 < t < T/2   (4.19)


The function f(t) is defined by

f(t) = { 1 − 2|t|/T  if −T/2 < t < T/2
       { 0           otherwise   (4.20)

The Fourier transform of f(t) is

F(ν) = (T/2) sinc²(νT/2)   (4.21)

The coefficients of the Fourier series are thus

ck = (1/T) F(k/T)   (4.22)

   = (1/2) sinc²(k/2)   (4.23)

   = { 1/2        if k = 0
     { 2/(π²k²)   if k is odd
     { 0          otherwise   (4.24)

The Fourier series is thus

fp(t) = 1/2 + ∑_{n=1}^{∞} [4/(π²(2n − 1)²)] cos(2π(2n − 1)t/T)   (4.25)

Note that if we set t = 0, we find that

1 + 1/3² + 1/5² + 1/7² + ... = π²/8   (4.26)
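Both the series (4.25) and the sum (4.26) are easy to verify numerically. A minimal sketch (the period T = 2 and the 2000-term truncation are illustrative choices):

```python
import numpy as np

# Numeric check of Eqs. (4.25)-(4.26): the partial sums of the Fourier
# series converge to the triangular wave, and at t = 0 the odd reciprocal
# squares sum to pi^2/8. T = 2 and 2000 terms are illustrative choices.
T = 2.0
t = np.linspace(-T/2, T/2, 401)
n = np.arange(1, 2001)[:, None]     # harmonic index n; frequency (2n-1)/T
series = 0.5 + (4/(np.pi**2*(2*n - 1)**2)
                * np.cos(2*np.pi*(2*n - 1)*t/T)).sum(axis=0)
assert np.max(np.abs(series - (1 - 2*np.abs(t)/T))) < 1e-3
partial = (1/(2*np.arange(2000) + 1.0)**2).sum()
assert abs(partial - np.pi**2/8) < 1e-3
```

Because the coefficients decay like 1/k², the series converges absolutely and uniformly, which is why a modest truncation already matches the triangular wave to three decimal places.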

Exercises:

1. Use this to show that

1 + 1/2² + 1/3² + 1/4² + ... = π²/6   (4.27)

This is the value of the Riemann zeta function at 2, denoted by ζ(2).

2. Find the Fourier series of the following functions which are periodic of period T:

gp(t) = t/T   if 0 < t < T   (4.28)

hp(t) = { 1  if |t| < T/4
        { 0  if T/4 < |t| < T/2   (4.29)


4.4 Sampling in time

Often a function is known only by its values at particular (usually equally spaced) times. This is the case when we regularly sample a continuous-time waveform (e.g. digitized music). Mathematically we may represent this process of sampling at equally spaced points in time as multiplying the continuous-time function f(t) by a train of delta functions to give a new (generalized) function fs(t):

fs(t) = f(t) ∑_{k=−∞}^{∞} δ(t − kT) = ∑_{k=−∞}^{∞} f(kT) δ(t − kT)   (4.30)

The second form shows explicitly that the values of f between sampling instants do not affect fs(t).

Corresponding to this product in the time domain, the spectrum F(ν) of f(t) is convolved with the spectrum of the train of delta functions. Using the transform pair

∑_{k=−∞}^{∞} δ(t − kT) ↔ (1/T) ∑_{k=−∞}^{∞} δ(ν − k/T)   (4.31)

we see that the Fourier transform Fs(ν) of fs(t) is the convolution

Fs(ν) = F(ν) ∗ (1/T) ∑_{k=−∞}^{∞} δ(ν − k/T)   (4.32)

      = (1/T) ∑_{k=−∞}^{∞} F(ν − k/T)   (4.33)

The resulting spectrum now spans an infinite range of frequencies, with repetitions of the original spectrum spaced νs = 1/T apart.

If the original spectrum of f(t) is band-limited to ±νc and if νc < νs/2, the repetitions of the spectrum will not overlap and we can recover the original spectrum F(ν) from Fs(ν).

On the other hand, if νc > νs/2, the repetitions of the spectrum will overlap and the original spectrum is lost.

This tells us how rapidly we need to sample a waveform in order to allow the continuous-time waveform to be recovered from the samples. These considerations are made precise in the following section.

4.5 The sampling theorem

If a function f(t) is band limited so that its Fourier transform F(ν) vanishes for |ν| ≥ νc, then f(t) can be completely reconstructed from its values sampled at intervals of T provided that νs = T⁻¹ ≥ 2νc.

Proof: The set of samples of f(t) taken with a sampling interval of T can be represented by the function fs(t) as defined by (4.30). The spectrum Fs(ν) is given by (4.33). If T⁻¹ ≥ 2νc, the copies of the spectrum do not overlap and it is possible to recover the original spectrum F(ν) from Fs(ν) via

F(ν) = T Fs(ν) Π(νT)   (4.34)

By calculating the inverse Fourier transform of this relationship, we can recover f(t) from fs(t). Using the transform pair

sinc(t/T) ↔ T Π(νT)   (4.35)

we find that

f(t) = fs(t) ∗ sinc(t/T)   (4.36)

     = ∫_{−∞}^{∞} fs(τ) sinc((t − τ)/T) dτ   (4.37)

Substituting the expression for fs(τ) as a sum of delta functions,

f(t) = ∫_{−∞}^{∞} ( ∑_{k=−∞}^{∞} f(kT) δ(τ − kT) ) sinc((t − τ)/T) dτ   (4.38)

     = ∑_{k=−∞}^{∞} f(kT) sinc((t − kT)/T)   (4.39)

This is the desired formula which expresses f(t) for any time t in terms of the values of f at the sample instants kT.
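The interpolation formula (4.39) can be exercised directly. A minimal sketch, in which the test signal, the sampling rate νs = 1/T = 8 (comfortably above 2νc for νc = 3) and the truncation of the infinite sum are all illustrative choices; note that numpy's sinc is sin(πx)/(πx), the same convention used here:

```python
import numpy as np

# Sketch of the reconstruction formula (4.39): a band-limited test signal
# is recovered from a finite window of its samples by sinc interpolation.
# np.sinc(x) = sin(pi x)/(pi x), matching the convention in the text.
f = lambda t: np.cos(2*np.pi*3*t) + 0.5*np.sin(2*np.pi*1*t)
T = 1/8                                 # sampling interval, nu_s = 8 > 2*nu_c
k = np.arange(-200, 201)                # finitely many of the samples f(kT)
t = np.linspace(-1, 1, 101)
recon = (f(k*T)[None, :] * np.sinc((t[:, None] - k*T)/T)).sum(axis=1)
assert np.max(np.abs(recon - f(t))) < 1e-2
```

The tolerance is loose because the sinc weights decay only like 1/t, so truncating the sum at |k| ≤ 200 leaves a small residual error.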

1. The reconstructed value of f(t) is a weighted mean of all the sample values f(kT). The weight associated with those samples which are far away from t decreases according to the sinc function. If t happens to be a multiple of T, the sinc function has a weight of one at that sample point and zero for all other samples.

2. In the frequency domain, the reconstruction of f(t) from fs(t) involves an ideal low-pass filtering which removes all frequencies above half the sampling frequency. The impulse response of such an ideal low-pass filter is a sinc function, indicating that the filter is non-causal. In practice, we can only do this filtering approximately. Compact disc players use a combination of digital and analogue techniques (oversampling by digital interpolation followed by analogue postfiltering) to do this reconstruction.

3. The highest frequency which can be reconstructed from its samples using (4.39) is νs/2 = 1/(2T). This is called the Nyquist frequency. When digitizing a signal, all frequencies outside the Nyquist frequency band (−νs/2, νs/2) must be removed (using an analogue filter) before the digitization. If this is not done, any frequency component ν outside the Nyquist frequency band (i.e., |ν| > νs/2) will reappear in the reconstruction at a frequency ν + mνs, where m is an integer such that |ν + mνs| ≤ νs/2. This phenomenon is called aliasing.

4. Aliasing can sometimes be usefully employed to make one frequency appear like another. It is the principle used by the stroboscope, in which a system is sampled at such a rate that its motion is apparently slowed down or "frozen" (e.g., the angular frequency of a rotating shaft is aliased to zero frequency) by the sampling.


5. Equation (4.39) can be interpreted as saying that a band limited signal can be written as a linear combination of sinc functions. Thus the (countable) family of functions

{ sinc((t − kT)/T) }_{k=−∞}^{∞}   (4.40)

may be regarded as a set of basis functions for the space of bandlimited functions whose spectra vanish outside (−1/(2T), 1/(2T)).

Exercise:

1. Show that these basis functions are orthogonal. In fact,

∫_{−∞}^{∞} sinc((t − mT)/T) sinc((t − nT)/T) dt = T δmn   (4.41)

Hence show that for bandlimited functions f(t) and g(t) such that F(ν) = 0 and G(ν) = 0 for |ν| ≥ 1/(2T),

∫_{−∞}^{∞} f∗(t) g(t) dt = T ∑_{k=−∞}^{∞} f∗(kT) g(kT)   (4.42)

assuming that the appropriate integrals and sums converge. Thus we can calculate inner products of bandlimited signals in terms of their sampled values.

2. Instead of ideal reconstruction of a sampled signal by means of convolution with a sinc function, reconstruction using a zero-order hold is commonly used. This involves holding the value of the signal constant at the value of the last sample until the next sample arrives, so that if f(kT) are the sample values, the reconstruction using zero-order hold is

fZOH(t) = f(kT) for kT ≤ t < (k + 1)T   (4.43)

Show that fZOH(t) is related to fs(t) via a convolutional relationship and hence calculate the spectrum FZOH(ν) of the reconstruction.
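The orthogonality relation (4.41) of exercise 1 can be checked numerically. This sketch evaluates the integral as a Riemann sum; since the integrand is band-limited and the grid spacing is well below the Nyquist spacing, the sum is essentially exact apart from truncating the infinite range (T = 1, the window and the grid spacing are illustrative choices):

```python
import numpy as np

# Numeric check of the orthogonality relation (4.41). The sinc tails
# decay only like 1/t, so a wide window and loose tolerance are used.
T, dt = 1.0, 0.05
t = np.arange(-2000, 2000, dt)
for m, n, expected in [(0, 0, T), (0, 1, 0.0), (2, 5, 0.0)]:
    integral = (np.sinc((t - m*T)/T) * np.sinc((t - n*T)/T)).sum() * dt
    assert abs(integral - expected) < 1e-2
```

The same device (a Riemann sum of a band-limited integrand) is exactly the content of (4.42), which expresses the inner product in terms of the samples.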

4.6 Bernstein’s Theorem

If f(t) is a bounded, bandlimited function, i.e.,

1. there exists fmax such that |f(t)| ≤ fmax for all t, and

2. there exists νmax such that the Fourier transform F(ν) of f(t) vanishes for |ν| > νmax,

then

|f′(t)| ≤ 2πνmax fmax for all t   (4.44)

This is a quantitative expression of the idea that bandlimited functions cannot change too quickly. The function cos(2πνmax t + φ) shows that the bound is achieved.


Proof:

The sampling theorem shows that for T = 1/(2νmax) and for any τ we can write

f(t) = ∑_{k=−∞}^{∞} f(kT + τ) sinc((t − τ − kT)/T)   (4.45)

(if this is not obvious, consider the bandlimited function g(t) = f(t + τ) and write Eq. (4.39) for this function).

The derivative of this is

f′(t) = (1/T) ∑_{k=−∞}^{∞} f(kT + τ) sinc′((t − τ − kT)/T)   (4.46)

where sinc′(x) = [πx cos(πx) − sin(πx)]/(πx²). Notice how we have expressed the derivative of f in terms of equally-spaced samples of f itself.

We now choose τ = t + T/2. With this choice

f′(t) = (1/T) ∑_{k=−∞}^{∞} f(t + kT + T/2) sinc′(−k − 1/2)   (4.47)

and so

|f′(t)| ≤ (1/T) ∑_{k=−∞}^{∞} |f(t + kT + T/2)| |sinc′(−k − 1/2)|   (4.48)

        ≤ (fmax/T) ∑_{k=−∞}^{∞} |sinc′(−k − 1/2)|   (4.49)

But

∑_{k=−∞}^{∞} |sinc′(−k − 1/2)| = (1/π) ∑_{k=−∞}^{∞} (k + 1/2)⁻² = (4/π) ∑_{k=−∞}^{∞} 1/(2k + 1)² = π   (4.50)

where we have used the sum in Eq. (4.26).

Hence

|f′(t)| ≤ (π/T) fmax = 2πνmax fmax   (4.51)

4.7 The discrete-time Fourier transform

We have seen in (4.33) that the Fourier transform Fs(ν) of a sampled function

fs(t) = ∑_{k=−∞}^{∞} f(kT) δ(t − kT)   (4.52)

can be written as a convolution of F(ν) with a train of delta functions. We may alternatively calculate the Fourier transform of fs(t) directly by substituting it into the definition. This yields

Fs(ν) = ∫_{−∞}^{∞} ∑_{k=−∞}^{∞} f(kT) δ(t − kT) exp(−j2πνt) dt   (4.53)

      = ∑_{k=−∞}^{∞} f(kT) exp(−j2πνkT)   (4.54)

Instead of regarding f(kT) as a sampled version of some continuous-time function f(t), we may think of these values simply as a sequence of numbers. This leads to the following definition.

Definition: The discrete-time Fourier transform (DTFT) maps a sequence of numbers x[k] for integer k into a continuous function X(Ω) defined by

X(Ω) = ∑_{k=−∞}^{∞} x[k] exp(−jΩk)   (4.55)

The function X(Ω) is periodic with period 2π. The inverse discrete-time Fourier transform relationship is

x[k] = (1/2π) ∫_{−π}^{π} X(Ω) exp(jΩk) dΩ   (4.56)

Proof: Substituting the expression for X(Ω) into (4.56) yields

(1/2π) ∫_{−π}^{π} ∑_{n=−∞}^{∞} x[n] exp(−jΩn) exp(jΩk) dΩ = (1/2π) ∑_{n=−∞}^{∞} x[n] ∫_{−π}^{π} exp[jΩ(k − n)] dΩ   (4.57)

  = (1/2π) ∑_{n=−∞}^{∞} 2π sinc(k − n) x[n]   (4.58)

  = x[k]   (4.59)

Exercises:

1. If the sequence x[k] does happen to come from sampling a bandlimited signal, i.e., x[k] = f(kT), show that we can recover the inverse DTFT relationship (4.56) from the usual inverse Fourier transform together with the relationship (4.34).

2. Show that the DTFT of the sequence

x[k] = exp(−α|k|)   (4.60)

is

X(Ω) = (1 − e^{−2α}) / (1 − 2e^{−α} cos Ω + e^{−2α})   (4.61)

3. Use the inverse DTFT relationship (4.56) to recover x[k] from (4.61).
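A numeric check of the pair (4.60)-(4.61) is straightforward, since the terms decay exponentially and the defining sum can be truncated (α = 0.5 and the truncation at |k| ≤ 200 are illustrative choices):

```python
import numpy as np

# Numeric check of the DTFT pair (4.60)-(4.61): the (truncated)
# defining sum (4.55) matches the closed form (4.61).
alpha = 0.5
k = np.arange(-200, 201)
Omega = np.linspace(-np.pi, np.pi, 201)
X_sum = (np.exp(-alpha*np.abs(k))[None, :]
         * np.exp(-1j*np.outer(Omega, k))).sum(axis=1)
X_closed = (1 - np.exp(-2*alpha)) / (1 - 2*np.exp(-alpha)*np.cos(Omega)
                                     + np.exp(-2*alpha))
assert np.allclose(X_sum, X_closed)
```

Note that X(Ω) is real, as it must be for an even real sequence.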


5

The Discrete Fourier Transform

5.1 Definition

When computing spectra on a computer it is not possible to carry out the integrals involved in the continuous time Fourier transform. Instead a related transform called the discrete Fourier transform is used. We shall examine the properties of this transform and its relationship to the continuous time Fourier transform.

The discrete Fourier transform (also known as the finite Fourier transform) relates two finite sequences of length N. Given a sequence with components x[k] for k = 0, 1, ..., N − 1, the discrete Fourier transform of this sequence is a sequence X[r] for r = 0, 1, ..., N − 1 defined by

X[r] = (1/N) ∑_{k=0}^{N−1} x[k] exp(−j2πrk/N)   (5.1)

The corresponding inverse transform is

x[k] = ∑_{r=0}^{N−1} X[r] exp(j2πrk/N)   (5.2)

Notice that we shall adopt the definition in which there is a factor of 1/N in front of the forward transform. Other conventions place this factor in front of the inverse transform (this is used by MATLAB) or a factor of 1/√N in front of both transforms.

Exercise: Show that these relationships are indeed inverses of each other. It is useful first to establish the relationship

∑_{r=0}^{N−1} exp(j2πrk/N) = { N  if k is a multiple of N
                             { 0  otherwise   (5.3)
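The definitions (5.1)-(5.2) translate directly into code. A minimal sketch, which also confirms the convention remark above by comparing against numpy's FFT (which, like MATLAB, puts no factor on the forward transform):

```python
import numpy as np

# Direct sketch of the pair (5.1)-(5.2) with the 1/N factor on the
# forward transform. numpy.fft (like MATLAB) instead puts the factor
# on the inverse, so the two conventions differ by 1/N.
def dft(x):
    N = len(x)
    r = np.arange(N)
    return (x[None, :] * np.exp(-2j*np.pi*np.outer(r, r)/N)).sum(axis=1)/N

def idft(X):
    N = len(X)
    r = np.arange(N)
    return (X[None, :] * np.exp(2j*np.pi*np.outer(r, r)/N)).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j*rng.standard_normal(16)
assert np.allclose(idft(dft(x)), x)            # the transforms are inverses
assert np.allclose(dft(x), np.fft.fft(x)/16)   # agrees with numpy up to 1/N
```

The roundtrip assertion is exactly the content of the exercise, with (5.3) doing the work of collapsing the double sum.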

5.2 Discrete Fourier transform of a sampled complex exponential

A common application of the discrete Fourier transform is to find sinusoidal components within a signal. The continuous Fourier transform of exp(j2πν0t) is simply a delta function δ(ν − ν0) at the frequency ν0. We now calculate the discrete Fourier transform of a sampled complex exponential

x[k] = A exp(j2πν0kτ), for k = 0, 1, . . . , N − 1   (5.4)

The sampling interval is τ and we denote the duration of the entire sampled signal by T = Nτ .

Substituting (5.4) into the definition of the discrete Fourier transform (5.1) yields

X[r] = (1/N) ∑_{k=0}^{N−1} A exp(j2πk(ν0T − r)/N)   (5.5)

If ν0T is an integer, i.e., if there is an integer number of cycles in the frame of duration T, we see that

X[r] = { A  if r − ν0T is a multiple of N
       { 0  otherwise   (5.6)

In this case, there is only a single non-zero term in the discrete Fourier transform, at an index which depends on the frequency ν0. The value of this non-zero term is A, the complex amplitude of the component.

If ν0T is not an integer, however, we find that

X[r] = (A/N) exp(−jπ(N − 1)(r − ν0T)/N) · sin[π(r − ν0T)] / sin[π(r − ν0T)/N]   (5.7)

This is nonzero for all values of r. Thus even though there is only a single frequency component in the signal, it can affect all the terms in the discrete Fourier transform.

A plot of the last factor in equation (5.7) is shown in Figure 5.1. Since r only takes on integer values, the terms in the output sequence are samples from this function taken at unit spacing. Unless the centre of the function at ν0T coincides with an integer, all samples will be non-zero. The sidelobes of the underlying "circular sinc" function cause the single frequency component to be smeared out into adjacent samples. This phenomenon is known as "spectral leakage".

Figure 5.1: Values of the discrete Fourier transform of a complex exponential when ν0T is not an integer. Crosses show values of sin[π(r − ν0T)] / sin[π(r − ν0T)/N] for integer values of r.


Note that if N is large and we are considering those components for which |r − ν0T| ≪ N, we can approximate sin[π(r − ν0T)/N] by π(r − ν0T)/N and (N − 1)/N ≈ 1, so that (5.7) becomes

X[r] ≈ A exp(−jπ(r − ν0T)) sinc(r − ν0T)   (5.8)
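Spectral leakage and the closed form (5.7) can be verified in a few lines (N, A and the off-bin frequency ν0T = 10.3 are illustrative choices; since ν0T is non-integer, r − ν0T is never an integer and the formula has no indeterminate points):

```python
import numpy as np

# Spectral leakage check: the DFT (1/N convention of (5.1)) of a complex
# exponential with a non-integer number of cycles per frame matches
# Eq. (5.7) and is nonzero in every bin.
N, A, nu0T = 64, 2.0, 10.3
k = np.arange(N)
X = np.fft.fft(A*np.exp(2j*np.pi*nu0T*k/N))/N
r = np.arange(N)
pred = (A/N)*np.exp(-1j*np.pi*(N - 1)*(r - nu0T)/N) \
       * np.sin(np.pi*(r - nu0T))/np.sin(np.pi*(r - nu0T)/N)
assert np.allclose(X, pred)
assert np.all(np.abs(X) > 0)        # the single tone leaks into every bin
```

Changing nu0T to an integer collapses the spectrum to the single nonzero term of (5.6).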

5.3 Sinusoidal and Cosinusoidal components

Suppose that we have

x[k] = A cos(2πmk/N) for k = 0, 1, . . . , N − 1   (5.9)

where m is an integer lying between 1 and N/2 − 1 so that there are m cycles within the sequence of length N. Writing this in terms of complex exponentials, we have

x[k] = (1/2) A exp(j2πmk/N) + (1/2) A exp(−j2πmk/N)   (5.10)

Taking the discrete Fourier transform of this leads to two nonzero terms in X[r]. The first occurs at r = m and is of amplitude A/2, corresponding to the first complex exponential, whereas the second is at r = N − m, since

X[N − m] = (1/N) ∑_{k=0}^{N−1} [ (1/2) A exp(j2πmk/N) + (1/2) A exp(−j2πmk/N) ] exp(−j2π(N − m)k/N)

         = (A/2N) ∑_{k=0}^{N−1} [ exp(j2πmk/N) + exp(−j2πmk/N) ] exp(j2πmk/N)

         = (A/2N) ∑_{k=0}^{N−1} [ exp(j4πmk/N) + 1 ]

         = A/2   (5.11)

Similarly if

x[k] = B sin(2πmk/N) for k = 0, 1, . . . , N − 1   (5.12)

it is easy to show that there are again two nonzero elements in X[r], namely

X[r] = { −jB/2  if r = m
       { jB/2   if r = N − m
       { 0      otherwise   (5.13)

We can interpret X[1] through X[N/2] as representing positive frequencies while X[N/2 + 1] through X[N − 1] represent negative frequencies.

So if

f(t) = A0 + A1 cos(ω0t) + A2 cos(2ω0t) + ... + A_{N/2} cos(Nω0t/2) + B1 sin(ω0t) + B2 sin(2ω0t) + ... + B_{N/2−1} sin((N/2 − 1)ω0t)   (5.14)


where ω0 = 2π/T and x[k] = f(kT/N) for k = 0, 1, ..., N − 1, the discrete Fourier transform X[r] satisfies

X0 = A0

X1 = (1/2)(A1 − jB1),   X_{N−1} = (1/2)(A1 + jB1)

...

X_{N/2} = A_{N/2}
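These coefficient relations can be confirmed numerically for the fundamental (N, A1 and B1 are illustrative values):

```python
import numpy as np

# Check of the coefficient relations above: for a signal containing a
# fundamental cosine and sine, X[1] = (A1 - jB1)/2 and
# X[N-1] = (A1 + jB1)/2 under the 1/N forward convention of (5.1).
N, A1, B1 = 32, 1.5, -0.7
k = np.arange(N)
x = A1*np.cos(2*np.pi*k/N) + B1*np.sin(2*np.pi*k/N)
X = np.fft.fft(x)/N
assert np.allclose(X[1], (A1 - 1j*B1)/2)
assert np.allclose(X[N-1], (A1 + 1j*B1)/2)
```

The conjugate symmetry X[N−1] = X[1]∗ is the general Hermitian property of the DFT of a real sequence.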

5.4 Relationship to the continuous time Fourier transform

There are two distinct but related ways of relating the discrete Fourier transform to the continuous time Fourier transform. In the first approach, starting with the sequence with terms x[k] for k = 0, 1, ..., N − 1, we define the function

xc(t) = ∑_{k=0}^{N−1} x[k] δ(t − kτ)   (5.15)

where τ is the sampling interval. The continuous time Fourier transform of xc(t) is given by

Xc(ν) = ∫_{−∞}^{∞} ∑_{k=0}^{N−1} x[k] δ(t − kτ) exp(−j2πνt) dt

      = ∑_{k=0}^{N−1} x[k] exp(−j2πνkτ)   (5.16)

Xc(ν) is periodic with period 1/τ. Comparing this with the definition of the discrete Fourier transform, we see that

X[r] = (1/N) Xc(r/(Nτ))   (5.17)

So in this approach the discrete Fourier transform is interpreted as a sampled version of the continuous Fourier transform of xc(t).

In the alternative view, we consider the continuous time signal which is the infinite periodic extension of xc(t), i.e.,

xp(t) = ∑_{k=−∞}^{∞} x[k mod N] δ(t − kτ)   (5.18)

where k mod N denotes the non-negative remainder when k is divided by N. Since xp(t) is the convolution of xc(t) with ∑_r δ(t − rNτ), the Fourier transform of xp(t) is

Xp(ν) = Xc(ν) × (1/(Nτ)) ∑_{r=−∞}^{∞} δ(ν − r/(Nτ))

      = (1/(Nτ)) ∑_{r=−∞}^{∞} Xc(r/(Nτ)) δ(ν − r/(Nτ))

      = (1/τ) ∑_{r=−∞}^{∞} X[r mod N] δ(ν − r/(Nτ))   (5.19)


In this alternative approach, the terms in the discrete Fourier transform are seen to give the areas of the delta functions in Xp(ν). This picture has the advantage of symmetry between the time and frequency domains, since the functions xp(t) and Xp(ν) are each sampled and periodic. In particular:

In the time domain, the sampling interval is τ and the function repeats every Nτ.

In the frequency domain, the sampling interval is 1/(Nτ) and the function repeats every 1/τ.

This reciprocal relationship between the sampling in the time and frequency spaces of the discrete Fourier transform is very useful and should be memorized.

If x[k] originally comes from sampling a continuous signal x(t), i.e., x[k] = x(kτ), it is easy to relate xc(t) to x(t) since

xc(t) = ∑_{k=0}^{N−1} x(kτ) δ(t − kτ)

      = x(t) × Π((t − (N − 1)τ/2)/(Nτ)) × ∑_{k=−∞}^{∞} δ(t − kτ)

The Fourier transform of this thus involves two convolutions:

Xc(ν) = X(ν) ∗ T sinc(νT) exp(−jπ(N − 1)νT/N) ∗ (1/τ) ∑_{k=−∞}^{∞} δ(ν − k/τ)   (5.20)

where T = Nτ. The first convolution is responsible for the spectral leakage whereas the second is responsible for aliasing.

We see that spectral leakage arises from taking the Fourier transform of a rectangular window Π((t − (N − 1)τ/2)/(Nτ)) which multiplies the original function x(t). In the frequency domain, its effect is to convolve the true spectrum with a sinc function which has relatively large sidelobes. It is possible to reduce the spectral leakage considerably by introducing a non-rectangular window function which has a Fourier transform with small sidelobes. For example, the Hanning window is defined on [0, Nτ] by

h(t) = { 1 − cos(2πt/(Nτ))  if 0 < t < Nτ
       { 0                  otherwise   (5.21)

The Fourier transforms of the Hanning window and the rectangular window are shown in Figure 5.2. For the purposes of drawing the graph, each window is shifted so that it is a real even function, which leads to a real even Fourier transform. The sidelobes of the Hanning window are much smaller than those of the rectangular window, although its central peak is somewhat wider. In order to use a Hanning window, it is only necessary to define the sequence x[k] whose discrete Fourier transform we compute by

x[k] = h(kτ)x(kτ) = (1 − cos(2πk/N)) x(kτ)   (5.22)

Most digital spectrum analyzers have the option of selecting some form of window to reduce spectral leakage.
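The sidelobe reduction of (5.22) is easy to observe numerically. In this sketch the off-bin tone frequency (10.3 cycles per frame) and the bins inspected are illustrative choices:

```python
import numpy as np

# Illustration of windowing (5.22): multiplying by the Hanning window
# 1 - cos(2 pi k/N) before the DFT greatly reduces the sidelobe leakage
# of an off-bin tone.
N = 256
k = np.arange(N)
x = np.cos(2*np.pi*10.3*k/N)            # non-integer cycles: leakage
X_rect = np.abs(np.fft.fft(x))/N
X_hann = np.abs(np.fft.fft((1 - np.cos(2*np.pi*k/N))*x))/N
for r in (60, 100):                     # bins far from the peak near r = 10
    assert X_hann[r] < X_rect[r]
```

The price paid is the wider central peak noted above: the windowed spectrum spreads the tone over a few bins around r = 10 rather than concentrating it in one.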


Figure 5.2: Fourier transform of a rectangular window and a Hanning window, showing the reduction of spectral leakage with the Hanning window

5.5 The fast Fourier transform (FFT) algorithm

The discrete Fourier transform of a sequence of N points requires O(N²) arithmetic operations to compute if a straightforward implementation of its definition is carried out. For large N, this can become prohibitive. In 1965, Cooley and Tukey rediscovered (it was previously used by Gauss and Runge) a very efficient way of calculating the discrete Fourier transform which involves O(N log N) operations. This is known as the fast Fourier transform (FFT) algorithm.

Many variants of the FFT algorithm exist. We shall discuss the simplest form, known as the decimation in time algorithm. The central insight which leads to this algorithm is the realization that a discrete Fourier transform of a sequence of N points can be written in terms of two discrete Fourier transforms of length N/2. Thus if N is a power of two, it is possible to recursively apply this decomposition until we are left with discrete Fourier transforms of single points.

Consider an unnormalized discrete Fourier transform of N points, which we can write as

X[r] = ∑_{k=0}^{N−1} x[k] W_N^{−rk}   (5.23)

where W_N = exp(j2π/N). The normalization factor 1/N can always be applied at the end of the algorithm.

We split the sum into terms with even indices and with odd indices, yielding

X[r] = ∑_{k=0}^{N/2−1} x[2k] W_N^{−2rk} + ∑_{k=0}^{N/2−1} x[2k + 1] W_N^{−r(2k+1)}   (5.24)


Using the fact that W_N² = W_{N/2}, we may write this as

X[r] = ∑_{k=0}^{N/2−1} x[2k] W_{N/2}^{−rk} + W_N^{−r} ∑_{k=0}^{N/2−1} x[2k + 1] W_{N/2}^{−rk}

     = A[r] + W_N^{−r} B[r]   (5.25)

where A[r] is the N/2-point Fourier transform of the even terms of x and B[r] is the N/2-point Fourier transform of the odd terms of x. The above result is valid for r = 0, 1, ..., N/2 − 1. In order to obtain the rest of X we note that

X[r + N/2] = ∑_{k=0}^{N/2−1} x[2k] W_{N/2}^{−(r+N/2)k} + W_N^{−(r+N/2)} ∑_{k=0}^{N/2−1} x[2k + 1] W_{N/2}^{−(r+N/2)k}

           = ∑_{k=0}^{N/2−1} x[2k] W_{N/2}^{−rk} − W_N^{−r} ∑_{k=0}^{N/2−1} x[2k + 1] W_{N/2}^{−rk}

           = A[r] − W_N^{−r} B[r]   (5.26)

Equations (5.25) and (5.26) express the N-point FFT of x[k] in terms of two FFTs of length N/2.

Consider the calculation for the situation in which N = 8. The diagram in Figure 5.3 is called a "butterfly diagram" and it shows the flow of data through the algorithm. Starting at the right, notice how the terms X[0] through X[3] are calculated from A[r] and B[r] via the relationship (5.25). Similarly, X[4] through X[7] are calculated from A[r] and B[r] via the relationship (5.26). A[r] is the sequence which is the fast Fourier transform of the even terms of x[k], namely a[k] = x[0], x[2], x[4], x[6]. Similarly, B[r] is the sequence which is the fast Fourier transform of the odd terms of x[k], namely b[k] = x[1], x[3], x[5], x[7].

We can similarly express A[r] and B[r] in terms of fast Fourier transforms of length 2. Moving to the left in the diagram, we have denoted by C[r] and D[r] the two-point Fourier transforms that contribute to A[r]. We see that C[r] is the Fourier transform of the even points of a[k], which are a[0] = x[0] and a[2] = x[4]. These are the "evens of evens" of the sequence x[k]. Similarly D[r] is the Fourier transform of the odd points of a[k], or the "odds of evens" of the sequence x[k]. These are a[1] = x[2] and a[3] = x[6]. These two-point transforms are finally expressed as sums and differences of the single-point transforms, which are the original data.

All calculations can be done in-place if the original data is shuffled into the correct order initially. If we look at the binary form of the indices of the data points, it is clear that the data must be arranged in bit-reversed order. The complex exponentials required in the calculation may be precomputed and accessed by table lookup in order to speed up the algorithm.

In the straightforward implementation of the discrete Fourier transform, there are N² complex multiplications and N² complex additions. In the fast Fourier transform algorithm, there are log₂ N stages, each of which involves (1/2)N complex multiplications and N complex additions or subtractions. The operation count (excluding computation of the complex exponentials and the bit-reversed indexing) is thus (1/2)N log₂ N complex multiplications and N log₂ N complex additions. If additions and multiplications take about the same time, the speed-up which can be achieved relative to the straightforward implementation is 4N/(3 log₂ N). This is about 137 when N = 1024. Larger transforms show even greater factors of improvement.
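The recursion (5.25)-(5.26) translates into a very short program. This is a minimal recursive sketch (not the in-place, bit-reversed implementation discussed above); N must be a power of two, and the 1/N factor of (5.1) can be applied at the end, as described in the text:

```python
import numpy as np

# Minimal sketch of the radix-2 decimation-in-time recursion of
# Eqs. (5.25)-(5.26); N must be a power of two. The transform is left
# unnormalized, as in (5.23).
def fft_dit(x):
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    A = fft_dit(x[0::2])                        # N/2-point FFT of even terms
    B = fft_dit(x[1::2])                        # N/2-point FFT of odd terms
    W = np.exp(-2j*np.pi*np.arange(N//2)/N)     # twiddle factors W_N^{-r}
    return np.concatenate([A + W*B, A - W*B])   # (5.25) and (5.26)

rng = np.random.default_rng(1)
x = rng.standard_normal(64) + 1j*rng.standard_normal(64)
assert np.allclose(fft_dit(x), np.fft.fft(x))   # unnormalized transforms agree
```

The recursion depth is log₂ N, with O(N) work per level, which is the source of the O(N log N) operation count.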



Figure 5.3: Butterfly diagram for N = 8 fast Fourier transform


6

The Hilbert Transform and Linear Modulation Theory

6.1 The Hilbert transform

In AC circuit theory we use complex exponentials to represent real sinusoidally varying quantities, with the understanding that the real part of the complex exponential gives the physical quantity of interest. The analytic signal of a real function plays the same role for more general waveforms. Given a real function f(t) with Fourier transform F(ν), the analytic signal fa(t) is defined by

fa(t) = 2 ∫_{0}^{∞} F(ν) exp(j2πνt) dν   (6.1)

This is just the inverse Fourier transform of the positive frequency part of F(ν). The Fourier transform Fa(ν) of fa(t) is given by

Fa(ν) = 2u(ν)F(ν)   (6.2)

Lemma: The real part of fa(t) is equal to f(t).

Proof: The real part of fa(t) is

(1/2)[fa(t) + fa∗(t)] = ∫_{0}^{∞} F(ν) exp(j2πνt) dν + ∫_{0}^{∞} F∗(ν) exp(−j2πνt) dν   (6.3)

  = ∫_{0}^{∞} F(ν) exp(j2πνt) dν + ∫_{0}^{∞} F(−ν) exp(−j2πνt) dν   (6.4)

since F(ν) is a Hermitian function. Substituting −ν for ν in the second integral and combining the two integrals yields the inverse Fourier transform of F(ν), which is just f(t).

The Hilbert transform of f(t) is defined to be the imaginary part of fa(t) and is denoted by f̂(t). Thus

fa(t) = f(t) + j f̂(t)   (6.5)

The Fourier transform of this relationship is

2u(ν)F(ν) = F(ν) + j F̂(ν)   (6.6)

where F̂ is the Fourier transform of f̂(t). Solving this yields

F̂(ν) = −j sgn(ν) F(ν)   (6.7)


The envelope of f(t) is |fa(t)| = √(f(t)² + f̂(t)²).

Note: Unlike the Fourier transform, the Hilbert transform of a function of t is also a function of t.

Exercise: Show that the analytic signal of R cos(ωt+φ) is A exp(jωt) where A = R exp(jφ).

Using the transform pair

1/(πt) ↔ −j sgn(ν)   (6.8)

we can take the inverse Fourier transform of (6.7) to obtain

f̂(t) = f(t) ∗ 1/(πt)   (6.9)

     = ∫_{−∞}^{∞} f(τ) / [π(t − τ)] dτ   (6.10)

where the integral is to be interpreted as a Cauchy principal value. This convolution can be interpreted as a filtering operation with a quadrature filter which shifts all sinusoidal components by a phase of −π/2.

Similarly, starting from

F(ν) = j sgn(ν) F̂(ν)    (6.11)

we find that

f(t) = −∫₋∞^∞ f̂(τ)/(π(t − τ)) dτ    (6.12)

Note that the amplitudes of the Fourier transforms of f(t) and of f̂(t) are identical; they differ only in phase. The ear is largely insensitive to the phases of the Fourier components in a signal, and so the Hilbert transform of a speech signal is still understandable.

Exercises:

1. Show from the integral (6.10) that the Hilbert transform of cos(ωt) is sin(ωt).

2. Find the Hilbert transform of Π(t) and sketch the result.

3. If m(t) is a nonnegative real-valued band-limited function and νc is larger than the bandwidth of m(t), show that the envelope of m(t) cos(2πνct + φ) is m(t). Thus the definition of the envelope given above coincides with the intuitive concept.

6.2 The Fourier transform of a real causal function

Given a function h(t) which is causal in the sense that h(t) = 0 for t < 0 and which has no singularities at the origin, we see that

h(t) = u(t)h(t) (6.13)


Taking the Fourier transform of this relationship yields

H(ν) = (½δ(ν) + 1/(j2πν)) ∗ H(ν)    (6.14)

= ½H(ν) − (1/2π) ∫₋∞^∞ jH(ν′)/(ν − ν′) dν′    (6.15)

Writing H(ν) in terms of its real and imaginary parts leads to

ℑH(ν) = −(1/π) ∫₋∞^∞ ℜH(ν′)/(ν − ν′) dν′    (6.16)

ℜH(ν) = (1/π) ∫₋∞^∞ ℑH(ν′)/(ν − ν′) dν′    (6.17)

Thus if h is causal and regular at the origin, given the real part of the Fourier transform of h, it is possible to deduce the imaginary part and vice versa.

If in addition h is real so that H is Hermitian,

ℑH(ν) = −(2ν/π) ∫₀^∞ ℜH(ν′)/(ν² − ν′²) dν′    (6.18)

ℜH(ν) = (2/π) ∫₀^∞ ν′ ℑH(ν′)/(ν² − ν′²) dν′    (6.19)

These relationships may be generalized to include cases in which h has singularities. They are often known as dispersion relations because similar equations were first derived in optics (by Kronig and Kramers) to relate the real and imaginary parts of the frequency-dependent refractive index of a material. The real part of the refractive index determines the speed of propagation of waves and the dispersion, whereas the imaginary part determines the absorption. From general physical constraints such as causality, it is possible to derive relationships between these quantities. For example, a frequency range of anomalous dispersion (near resonances in the atomic levels) coincides with a range of large absorption.

Exercise: Show that if h(t) is a real causal function and ℑH(ν) = −ν/(a² + ν²) where a is real, then ℜH(ν) = a/(a² + ν²).
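The dispersion relation (6.18) can be checked numerically on this pair. In the sketch below (the values of a and ν are arbitrary, the infinite upper limit is truncated where the integrand is negligible, and SciPy's Cauchy-weight quadrature handles the principal value), the right-hand side of (6.18) with ℜH(ν) = a/(a² + ν²) should reproduce ℑH(ν) = −ν/(a² + ν²):

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of (6.18): ReH(v) = a/(a^2 + v^2) should give
# ImH(v) = -v/(a^2 + v^2).
a, nu = 0.7, 1.3                       # illustrative values

def g(nup):
    # ReH(nu') / (nu^2 - nu'^2) = -g(nu') / (nu' - nu) with this g
    return a / ((a**2 + nup**2) * (nup + nu))

# quad's 'cauchy' weight computes the principal value of g(x)/(x - wvar)
I, _ = quad(g, 0.0, 1e3, weight='cauchy', wvar=nu, limit=200)
im_H = (2 * nu / np.pi) * I            # equals -(2 nu / pi) * PV integral

print(im_H, -nu / (a**2 + nu**2))      # the two values should agree closely
```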

6.3 Transmission and reception of information

When sending information from one place to another, a suitable long-range physical process (such as electromagnetic radiation of the appropriate frequency) is used as a carrier. The message is used to modify some characteristic of the carrier in a process called modulation. At the receiver, the process of demodulation is used to recover the message.

Typically, the carrier is a sinusoidally varying signal at some frequency νc which is much higher than the frequencies contained in the message. For example, for medium-wave broadcast radio, νc is in the range 0.5–1.6 MHz. Short-wave radio extends to about 30 MHz and VHF television to about 300 MHz. For a pure sinusoidal signal, the spectrum consists of (arbitrarily narrow) delta functions at ±νc. As we shall see, the process of modulation gives the signal a nonzero bandwidth. However the bandwidth usually is much smaller than νc


and so the signal may be regarded as a narrow band process. This is an important feature of modulation since the dispersion and differential phase-shift introduced by the medium may lead to distortion unless the fractional bandwidth is small.

As a general rule, when independent (non-cooperating) signals are sent together, they must occupy non-overlapping frequency bands in order not to interfere with each other. When considering different modulation techniques, several issues arise:

1. How easy is it to do the modulation and demodulation?

2. What bandwidth is occupied by the transmitted signal?

3. How much power is required to send the transmitted signal?

4. How tolerant is the scheme to added noise which may corrupt the signal on its way tothe receiver?

At this stage, we can consider the first three of these issues. We shall be considering a few examples to illustrate how the transform methods developed so far help us to understand the processes involved. Each of these examples is based on linear amplitude modulation. Other modulation techniques include frequency modulation and many forms of pulse modulation.

We shall use the following notation:

1. c(t) = A cos(2πνct) is the carrier waveform of amplitude A > 0 and frequency νc,

2. m(t) is the modulating waveform which contains the message. The Fourier transform of this is M(ν) and we shall assume that its bandwidth is νm, i.e., M(ν) = 0 for |ν| > νm. Without loss of generality we shall assume that |m(t)| ≤ 1 for all t.

3. f(t) shall denote the transmitted signal and F (ν) is its spectrum.

6.4 Double sideband (DSB) modulation

The transmitted signal is the product of the modulating and carrier functions

f(t) = m(t)c(t) = Am(t) cos(2πνct) (6.20)

Taking the Fourier transform of this yields

F(ν) = AM(ν) ∗ ½[δ(ν − νc) + δ(ν + νc)] = ½A[M(ν − νc) + M(ν + νc)]    (6.21)

The spectrum of DSB consists of two copies of the spectrum of the modulating signal, one shifted to νc and the other to −νc. The frequencies on either side of the carrier are called the sidebands. The bandwidth of a DSB signal is thus twice the bandwidth of the modulating signal. If νc > νm, the analytic signal of a DSB signal is

fa(t) = A exp(j2πνct)m(t) (6.22)


and so the envelope is A|m(t)|. This absolute value means that if m(t) goes negative, the modulating signal cannot be recovered simply using envelope detection.

Exercise: Sketch the DSB signal, its envelope and spectrum when m(t) = cos(2πνmt). Notice how there are phase reversals in f(t) at the zeros of m(t).

Demodulation of DSB involves multiplying f(t) by a reconstructed carrier cos(2πνct) followed by a low-pass filter with cutoff frequency νm. This reconstructed carrier must be provided by the receiver.

f(t) cos(2πνct) = Am(t) cos²(2πνct)    (6.23)

= ½Am(t) + ½Am(t) cos(4πνct)    (6.24)

The low-pass filter removes the second term, leaving a copy of the modulating signal. The demodulation process can also be understood in the frequency domain in terms of an additional convolution.
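The coherent demodulator can be sketched numerically. In this toy version (all frequencies are arbitrary choices, and an FFT brick-wall filter stands in for the analogue low-pass filter), the message is recovered essentially exactly:

```python
import numpy as np

# Coherent DSB demodulation with an ideal FFT brick-wall low-pass filter.
fs, fc, fm = 1000.0, 100.0, 5.0
t = np.arange(0, 1.0, 1 / fs)
m = np.cos(2 * np.pi * fm * t)                 # message
f = m * np.cos(2 * np.pi * fc * t)             # DSB signal (A = 1)

# Multiply by the reconstructed carrier: (1/2)m(t) + (1/2)m(t)cos(4 pi fc t)
g = f * np.cos(2 * np.pi * fc * t)

# Brick-wall low-pass at 2*fm removes the component around 2*fc
G = np.fft.fft(g)
freqs = np.fft.fftfreq(len(g), 1 / fs)
G[np.abs(freqs) > 2 * fm] = 0.0
recovered = 2 * np.real(np.fft.ifft(G))        # the factor 2 undoes the 1/2

print(np.max(np.abs(recovered - m)))           # essentially zero
```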

Exercise: What happens if the reconstructed carrier is not exactly correct, so that it differs from the true carrier in phase or frequency?

The need to reconstruct an exact copy of the carrier is a major problem with DSB. One solution is to add a small amount of the carrier (called a pilot carrier) to the transmitted signal so that it can be recovered at the receiver and used to demodulate the signal. This also alleviates the inconvenience that when m(t) = 0 the transmitted signal f(t) also vanishes, making it impossible to tune to a transmission when there is no modulation.

Modulators and demodulators for DSB are essentially multipliers, which are often called balanced modulators or demodulators for historical reasons. These are often based on the quarter-square multiplier principle in which the signals c(t) + m(t) and c(t) − m(t) are each passed through devices with matched quadratic nonlinearities. The DSB signal is obtained using

¼[(c(t) + m(t))² − (c(t) − m(t))²] = c(t)m(t)    (6.25)

They are moderately difficult to build using analogue electronics.

6.5 Amplitude modulation (AM)

This is the form of modulation most commonly used for commercial broadcasting on the medium-wave and short-wave radio bands. The signal is the same as DSB except that the carrier is injected and transmitted along with the signal

f(t) = (1 +m(t))c(t) = A(1 +m(t)) cos(2πνct) (6.26)

Since |m(t)| ≤ 1 by assumption, the factor A(1 + m(t)) is always non-negative. It is easy to show that if νc > νm, A(1 + m(t)) is the envelope of f(t) and so m(t) can be recovered using a simple envelope detector. This ease of demodulation largely accounts for the popularity of AM for commercial broadcasting since receivers can be made very inexpensively.
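Envelope detection of an AM signal is a one-liner with the analytic signal. In this sketch (the sample rate and frequencies are illustrative, chosen so that the FFT-based Hilbert transform is exact over the record), the envelope A(1 + m(t)) is extracted and the DC due to the carrier removed:

```python
import numpy as np
from scipy.signal import hilbert

# Envelope detection of an AM signal via the analytic signal.
fs, fc, fm = 1000.0, 100.0, 2.0
t = np.arange(0, 1.0, 1 / fs)
m = 0.8 * np.cos(2 * np.pi * fm * t)          # |m(t)| <= 1
f = (1 + m) * np.cos(2 * np.pi * fc * t)      # AM signal with A = 1

envelope = np.abs(hilbert(f))                 # A(1 + m(t)) since 1 + m >= 0
recovered = envelope - 1                      # strip the carrier's DC offset

print(np.max(np.abs(recovered - m)))          # essentially zero
```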

In terms of the spectra, it is easy to show that

F(ν) = ½A[δ(ν − νc) + δ(ν + νc) + M(ν − νc) + M(ν + νc)]    (6.27)

which is just the same as DSB except for the addition of the delta functions representing the carrier. The bandwidth is the same as for DSB but additional power is required to transmit the carrier.

Exercise: Show that if m(t) = B cos(2πνmt) where |B| < 1, the fraction of the total power in f(t) that is contained within the carrier is 2/(2 + B²). Note that this is always at least 2/3.

Amplitude modulators are relatively easy to make and can work at the very high powers used for broadcasting. They are usually based on passing c(t) + m(t) through some nonlinearity and then filtering the result (e.g. with a resonant tuned circuit) around νc.

6.6 Single sideband (SSB) modulation

Amplitude modulation is very inefficient since at least two-thirds of the total signal power is contained in the carrier component, which does not contain any information. Furthermore both sidebands are transmitted, although one is essentially a mirror-image of the other (since M(ν) is Hermitian) and so only one is strictly necessary. Single sideband avoids both of these problems at the cost of a more complicated modulator. Its high efficiency however has made it the most popular technique for voice-based communication (radio-telephones etc.) on the HF bands.

The SSB modulated signal is

f(t) = Aℜ(ma(t) exp(j2πνct))    (6.28)

= A[m(t) cos(2πνct) − m̂(t) sin(2πνct)]    (6.29)

where ma denotes the analytic signal and m̂ the Hilbert transform of m. Taking the Fourier transform of this result we see that

ma(t) exp(j2πνct) ↔ 2u(ν − νc)M(ν − νc)    (6.30)

f(t) = Aℜ[ma(t) exp(j2πνct)] ↔ A[u(ν − νc)M(ν − νc) + u(−ν − νc)M∗(−ν − νc)]    (6.31)

Graphically, the positive frequency components of m are shifted by νc and the negative frequency components are shifted by −νc. Comparing this with the spectrum of AM found earlier, we see that the spectrum of f only contains the upper sideband (frequencies |ν| > νc). Both the lower sideband and the carrier are not transmitted and so all the power in the transmitted signal sends information about m. Note that the components at negative frequencies ν < −νc cannot be suppressed since we want f(t) to be a real signal.

The above signal is more precisely called a USB (upper sideband) modulated signal. Similarly, we can define an LSB (lower sideband) modulated signal by

f(t) = Aℜ(ma(t) exp(−j2πνct))    (6.32)

= A[m(t) cos(2πνct) + m̂(t) sin(2πνct)]    (6.33)

SSB signals can be demodulated by multiplying them with a reconstructed carrier cos(2πνct) and low-pass filtering the result, just as for DSB. Unlike DSB however, the precise phase and frequency of the reconstructed carrier is not so critical for telephony (voice) transmission.


Exercise: Show that if the reconstructed carrier is cos(2πν′ct + φ), the result of demodulating the USB signal (6.29) is

(A/2) ℜ[ma(t) exp(−j[2π∆νt + φ])]    (6.34)

where ∆ν = ν′c − νc is the detuning. In particular, if m(t) = cos(2πνmt), show that the demodulated signal is proportional to cos[2π(νm − ∆ν)t − φ].

From the exercise, we see that a detuning simply shifts the frequency of each sinusoidal component within m(t) by a fixed amount. Speech that is distorted in this way is still readily understandable for detunings of a few Hz. In DSB however, the corresponding detuning causes the demodulated signal to fade in and out at a rate ∆ν.

(Application note: The ability to shift the frequency of a voice signal by a small amount is sometimes used in public address systems to reduce the likelihood of howl-around or oscillation when the volume is turned up.)

6.6.1 The phasing method for generating SSB

This is a direct implementation of (6.29) or (6.33). The modulating signal m(t) is passed into a Hilbert transformer, and m(t) and m̂(t) are fed into two balanced modulators (multipliers) driven with cos(2πνct) and sin(2πνct). By adding or subtracting the outputs of the balanced modulators, either LSB or USB modulation can be performed.

[Block diagram: m(t) drives one balanced modulator with A cos(2πνct); m(t) also passes through a Hilbert transformer whose output m̂(t) drives a second balanced modulator with A sin(2πνct); the two modulator outputs are added or subtracted (±) to give f(t).]

The main technical difficulty with this approach is the implementation of the Hilbert transformer for m(t), which must operate over a wide range of frequencies (about 300 Hz to 3 kHz for telephone-quality voice). One common solution is to build two RC networks with matched amplitude characteristics but such that one shifts all frequencies by 90° more than the other. If m(t) is fed into both networks, an approximate Hilbert transform relationship holds between the outputs of the two networks and these are fed into the balanced modulators. Such networks require high-precision components with non-preferred values to give reasonable results.
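The phasing method is easy to simulate when an ideal FFT-based Hilbert transformer stands in for the RC networks. In this sketch (frequencies are illustrative), a single-tone message produces a single output tone at νc + νm, i.e. only the upper sideband:

```python
import numpy as np
from scipy.signal import hilbert

# Phasing-method USB generation: m(t)cos(2 pi fc t) - mhat(t)sin(2 pi fc t)
fs, fc, fm = 1000.0, 100.0, 10.0
t = np.arange(0, 1.0, 1 / fs)
m = np.cos(2 * np.pi * fm * t)
mhat = np.imag(hilbert(m))                       # Hilbert transform of m

f = m * np.cos(2 * np.pi * fc * t) - mhat * np.sin(2 * np.pi * fc * t)

# For a single tone, the output spectrum should contain one line at fc + fm
F = np.abs(np.fft.rfft(f)) / len(f)
freqs = np.fft.rfftfreq(len(f), 1 / fs)
print(freqs[np.argmax(F)])                       # the upper sideband, 110.0 Hz
```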

6.6.2 Weaver’s method

This requires a more complex modulator but avoids the use of the wide-band phase-shifter.


[Block diagram of Weaver's modulator: m(t) is multiplied by cos(2πνpt) and by −sin(2πνpt) to form ℜa(t) and ℑa(t); each is low-pass filtered to give ℜb(t) and ℑb(t); these are multiplied by cos[2π(νc + νp)t] and by −sin[2π(νc + νp)t] respectively, and the results are summed to give f(t).]

The operation of this modulator is best understood by thinking in terms of complex-valued signals and considering the spectra at different stages. Starting with m(t), we define the complex signal

a(t) = m(t) exp(−j2πνpt)    (6.35)

where νp = ½νm and νm is the bandwidth of m(t). In the frequency domain, A(ν) = M(ν + νp), which is a shifted version of M(ν). The positive frequency part of M(ν) now straddles the origin and falls in the band [−νp, νp]. If we now low-pass filter a(t) using a filter with cutoff frequency νp, this will extract those components which originally formed the positive frequency part of M(ν). Let this filtered version of a(t) be denoted b(t). It is easy to see that

B(ν) = u(ν + νp)M(ν + νp) (6.36)

We now consider b(t) exp(j2π(νc +νp)t). This shifts the spectrum of B by νc +νp resulting in

B(ν − νc − νp) = u(ν − νc)M(ν − νc) (6.37)

If we now consider the real part ℜ[b(t) exp(j2π(νc + νp)t)], the spectrum is

½[B(ν − νc − νp) + B∗(−ν − νc − νp)] = ½[u(ν − νc)M(ν − νc) + u(−ν − νc)M∗(−ν − νc)]    (6.38)

This is the spectrum of a USB signal, showing that the modulator works.

We now need to consider the physical implications of working with complex signals. A complex number may be regarded simply as a pair of real numbers (the real part and the imaginary part) together with appropriate rules for addition and multiplication. In the diagram, the upper half implements the real part of the system and the lower half the imaginary part. Thus for example, the real and imaginary parts of (6.35) are

ℜa(t) = m(t) cos(2πνpt) and ℑa(t) = −m(t) sin(2πνpt) (6.39)

Low-pass filtering of the complex signal a(t) corresponds to filtering the real and imaginary parts separately. In the final stage,

ℜ[b(t) exp(j2π(νc + νp)t)] = ℜ[b(t)] cos(2π(νc + νp)t)−ℑ[b(t)] sin(2π(νc + νp)t) (6.40)

which is implemented using the last two balanced modulators and the summer.
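The whole chain can be sketched directly with complex signals (an FFT brick-wall filter stands in for the two real low-pass filters, and all frequencies are scaled-down illustrative choices). A single tone inside the message band comes out as a single tone at νc + νm', confirming USB operation:

```python
import numpy as np

# Weaver's USB modulator sketched with complex signals.
fs, fc = 4000.0, 500.0
nu_m = 12.0                            # assumed message bandwidth
nu_p = nu_m / 2
t = np.arange(0, 1.0, 1 / fs)
m = np.cos(2 * np.pi * 10 * t)         # a single tone inside the band

a = m * np.exp(-2j * np.pi * nu_p * t)             # first pair of mixers
A = np.fft.fft(a)
freqs = np.fft.fftfreq(len(a), 1 / fs)
A[np.abs(freqs) > nu_p] = 0.0                      # low-pass at nu_p
b = np.fft.ifft(A)

f = np.real(b * np.exp(2j * np.pi * (fc + nu_p) * t))  # second mixers + summer

F = np.abs(np.fft.rfft(f)) / len(f)
rf = np.fft.rfftfreq(len(f), 1 / fs)
print(rf[np.argmax(F)])                # the single output tone: fc + 10 = 510.0
```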


6.7 Quadrature modulation/demodulation – The complex envelope

It is possible to send two real modulating signals using a single carrier by using quadrature modulation. If m1(t) and m2(t) are real-valued modulating signals, we define the complex modulating signal as

m(t) = m1(t) + jm2(t) (6.41)

Quadrature modulation involves the transmission of the signal

f(t) = ℜ[m(t) exp(j2πνct)] (6.42)

The spectrum of the quadrature modulated signal is

F(ν) = ½[M(ν − νc) + M∗(−ν − νc)]    (6.43)

This consists of sidebands on either side of the carrier νc, but unlike DSB, the upper and lower sidebands are not mirror images of each other. The output is a narrowband process around the carrier frequency νc.

The complex envelope of a narrowband process g(t) with respect to frequency ν0 is defined to be the signal

ga(t) exp(−j2πν0t)    (6.44)

where ga(t) is the analytic signal of g(t). It is easy to see that the complex envelope of the quadrature modulated signal f(t) with respect to the carrier frequency is m(t).

Quadrature modulation is demodulated by multiplying the signal with exp(−j2πνct) and low-pass filtering the result. The real and imaginary parts of the low-pass filter output are m1(t) and m2(t) respectively. Just as in DSB demodulation, it is essential that the phase and frequency of this reconstructing carrier be the same as that at the transmitter.

Exercise: Splitting real and imaginary parts as discussed above, sketch a block diagramshowing how to perform quadrature modulation and demodulation.

6.8 The optical homodyne detector

Extracting the complex envelope of an optical signal presents special challenges because the carrier frequency νc is very high. This is of course also the main advantage of using light for communications, since the fractional bandwidth occupied by the modulated signal is very small.

Typically we will wish to determine m(t) from a low-intensity light field whose electric field is given by

f(t) = ℜ[m(t) exp(j2πνct)]    (6.45)

The optical homodyne detector consists of a beam splitter which linearly combines the input signal f(t) and a strong coherent local oscillator for which the field is

c(t) = ℜ[A exp(j[2πνct− φ])] (6.46)


where A is real. If the amplitude transmission coefficient of the beam-splitter is T and the amplitude reflection coefficient is R, the conditions for a lossless beam-splitter are

|T |2 + |R|2 = 1 (6.47)

TR∗ +RT ∗ = 0 (6.48)

[Diagram: the beam splitter combines the input field f(t) and the local oscillator c(t); the two output port fields are g1(t) and g2(t); a photodetector in one output port, followed by a filter, produces the output.]

There are two output ports of the beam splitter. The fields in each are

g1(t) = ℜ[RA exp(−jφ) + T m(t)] exp(j2πνct) (6.49)

g2(t) = ℜ[TA exp(−jφ) +Rm(t)] exp(j2πνct) (6.50)

In an unbalanced homodyne detector, a photodetector which measures the intensity of the light is placed in one of the output ports and its output is low-pass filtered. If this is in the first port, we see that the intensity is proportional to

I1(t) ∝ |RA exp(−jφ) + T m(t)|2 (6.51)

= |R|2A2 + 2Aℜ(R∗Tm(t) exp(jφ)) + |T |2|m(t)|2 (6.52)

The second term is the desired output. By adjusting φ appropriately we can recover ℜ[m(t)], ℑ[m(t)] or any intermediate quadrature. Notice that the homodyne detector has amplified the weak input signal by A|RT|, which in principle can be made as large as desired. The first term is a constant background due to the local oscillator and the third is usually small since |m(t)| ≪ A.

In a balanced homodyne detector, a second photodetector is placed in the other output port of the beam splitter and the intensity difference is computed before filtering. The reflection and transmission coefficients are adjusted so that |R|² = |T|² = 0.5. We find that

I1(t)− I2(t) ∝ 4Aℜ(R∗Tm(t) exp(jφ)) (6.53)

which consists of only the desired term.
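The cancellation in (6.53) can be checked numerically. The sketch below picks one valid lossless 50/50 choice, T = 1/√2 and R = j/√2 (the values of A, φ and the envelope samples are arbitrary), and verifies that the background and |m|² terms drop out of I1 − I2 exactly:

```python
import numpy as np

# Check of the balanced homodyne identity I1 - I2 = 4A Re(R* T m e^{j phi}).
T, R = 1 / np.sqrt(2), 1j / np.sqrt(2)
assert np.isclose(abs(T)**2 + abs(R)**2, 1.0)            # condition (6.47)
assert np.isclose(T * np.conj(R) + R * np.conj(T), 0.0)  # condition (6.48)

A, phi = 100.0, 0.3                    # strong local oscillator, illustrative
rng = np.random.default_rng(0)
m = rng.normal(size=8) + 1j * rng.normal(size=8)  # weak envelope samples

I1 = np.abs(R * A * np.exp(-1j * phi) + T * m) ** 2
I2 = np.abs(T * A * np.exp(-1j * phi) + R * m) ** 2

rhs = 4 * A * np.real(np.conj(R) * T * m * np.exp(1j * phi))
print(np.allclose(I1 - I2, rhs))
```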

The above discussion is a purely classical description of the homodyne detector. In the classical theory, it is possible to produce a light signal in which m(t) is a constant (complex) quantity. This corresponds to a light wave of known amplitude and phase. In quantum mechanics however, it turns out that there is always some residual fluctuation in m(t) no matter how well the light source is stabilized. The Heisenberg uncertainty principle shows that the product of the root-mean-square fluctuations in ℜ[m(t)] and ℑ[m(t)] cannot be less than a certain amount. For a coherent light source, the r.m.s. fluctuations in ℜ[m(t)] and ℑ[m(t)] are equal to each other, whereas for a squeezed light source, these fluctuations are unequal (although they still satisfy the uncertainty principle). Note that since the fluctuations in m(t) have components at all frequencies, it is usually necessary to describe the behaviour of the fluctuations in different frequency bands corresponding to the placement of different filters after the photodetector. This gives rise to the concept of the squeezing spectrum.

6.9 The television video signal

In a television set, an electron beam scans across the screen and its intensity is altered in order to produce a picture (see Figure C.1). In 1/50 s, the spot traces out a field of 312½ lines starting at the top left of the screen and finishing at the bottom centre. On the next field, the spot starts at the top centre and ends at the bottom right, the lines being interlaced with the lines of the previous field. The two fields together make a frame of 625 lines in 1/25 s. A single line is completed in 64 µs, which corresponds to a horizontal scan frequency of fh = 15625 Hz. The vertical field frequency is fv = 50 Hz and the frame rate is ff = 25 Hz. The frame rate was chosen to give the illusion of continuous motion while the field rate makes the flicker imperceptible.

The intensity of the scene is encoded as a luminance signal xY(t). Between adjacent lines and fields, synchronization information is added to allow the scan generators in the receiver to lock onto those in the transmitter. The resulting signal is called the composite video signal, and the waveform of a single line is as shown in figure C.2.

In a monochrome television system, the composite video signal is amplitude modulated onto the VHF or UHF carrier. A vestigial sideband (VSB) system is used in which the carrier is transmitted but the lower sideband is partially suppressed to reduce the overall bandwidth of the signal. Since the composite video signal is purely real, this does not cause any loss of information.

6.9.1 The spectrum of the composite video signal

It is relatively easy to analyze the spectrum of a composite video signal when there is no motion in the scene. Instead of retraced scanning, we consider a periodically repeated image in two dimensions and a single unbroken scanning path as shown in figure C.3. Each new scan line corresponds to the scanning path entering the image in the next column, while each new field corresponds to the scanning path entering the image in the next row. The synchronization information may be thought of as occurring in the borders around each image.

At time t the coordinates of the scanning spot are

x = sxt, y = syt (6.54)

where sx = fhH and sy = fvV so that the spot traverses a horizontal distance of fhH and a vertical distance of fvV each second.


Let I(x, y) denote the distribution of intensity. Since this is a two-dimensional periodic image, it is expressible as a two-dimensional Fourier series

I(x, y) = ∑_{k=−∞}^{∞} ∑_{l=−∞}^{∞} ckl exp[j2π(kx/H + ly/V)]    (6.55)

where the Fourier coefficients ckl are given by

ckl = (1/HV) ∫₀^H dx ∫₀^V dy I(x, y) exp[−j2π(kx/H + ly/V)]    (6.56)

In general, we expect ckl to fall to zero as k and l become large, as these correspond to very rapid variations of intensity across the image (i.e., by the Riemann–Lebesgue lemma). The luminance signal is given by substituting (6.54) into (6.55)

xY(t) = I(sxt, syt) = ∑_{k=−∞}^{∞} ∑_{l=−∞}^{∞} ckl exp[j2π(ksx/H + lsy/V)t]    (6.57)

= ∑_{k=−∞}^{∞} ∑_{l=−∞}^{∞} ckl exp[j2π(kfh + lfv)t]    (6.58)

Thus xY(t) is a sum of complex exponentials at the discrete frequencies kfh + lfv where k and l are integers. This leads to a spectrum consisting of clusters of lines spaced by fv around multiples of fh, since fv ≪ fh (Figure C.4).

When we consider a moving picture, the lines broaden slightly, but not by much since the timescale of changes in the picture is much slower than the line and field rates. Note that if any spectral lines are present which are not of the form kfh + lfv, they will necessarily represent a non-stationary pattern on the screen. The significant spectral lines for a broadcast-quality television picture extend to about 5.5 MHz.

6.9.2 The NTSC colour system

This system is used in the USA, Japan, Canada and Mexico. It was developed in 1954 as a fully compatible colour TV signal which fits into the bandwidth of a monochrome signal. The PAL system used in Germany, Britain, Australia and New Zealand is an improvement of the NTSC system. We shall discuss the NTSC-4.43 system based on a 50 Hz field frequency rather than the NTSC-3.58 system based on the 60 Hz field frequency actually used in the US.

The human visual response to any colour can be simulated using a mixture of red, green and blue lights. A straightforward approach to colour television would be to send three separate video signals xR(t), xG(t) and xB(t), one for each primary colour. (Assume without loss of generality that each of these is of modulus no greater than one.) This would occupy a much greater bandwidth than a monochrome signal and would not be compatible with preexisting monochrome equipment.

To simulate the luminance signal of a monochrome camera, the colour signals are combined according to

xY (t) = 0.30xR(t) + 0.59xG(t) + 0.11xB(t) (6.59)


the coefficients having been chosen to match the different sensitivities of the eye to the primary colours. The spectrum of this luminance signal is given by the analysis above. It consists of clusters of spectral lines around multiples of the line frequency, separated by gaps in which the spectrum is almost zero. The key idea of the compatible colour systems is to fit the colour information in the gaps between the clusters in such a way that this does not cause an objectionable patterning on a monochrome receiver. Subjectively, such additional frequency components do not produce objectionable effects if they are of sufficiently high frequency that the intensity fluctuations introduced are fine-grained and change rapidly between successive lines or fields, so that their effects tend to average out. In the NTSC system, two additional signals xI and xQ are defined as

xI(t) = 0.60xR(t) − 0.28xG(t) − 0.32xB(t)    (6.60)

xQ(t) = 0.21xR(t) − 0.52xG(t) + 0.31xB(t)    (6.61)

These are called the chrominance signals. Given xI, xQ and xY it is easy to regenerate the signals xR, xG and xB.

Another fact about the human visual system which is exploited in colour television is that the eye is much more sensitive to resolution in the luminance information than in the chrominance information. In fact, before colour photography was widely used, pseudo-colour photographs were made by hand-painting black and white photographs. If the original black and white photograph was clear and in-focus, the fact that the colours were not applied very carefully did not matter very much. This means that it is possible to restrict the bandwidth of the chrominance signals xI and xQ quite severely (to about 1–2 MHz) compared with the 5.5 MHz bandwidth of xY without significant perceived loss of picture quality. (Recall that restricting the bandwidth in frequency corresponds to convolving in time with the impulse response of a filter whose temporal width is inversely dependent on the bandwidth. The narrower the bandwidth in frequency, the more a sharp transition is smeared out by this convolution. Similarly for a picture, restricting the bandwidth corresponds to defocusing the camera and smearing out fine detail.)

The signals xI and xQ are band-limited and quadrature modulated onto a colour subcarrier signal whose frequency fs is chosen to lie exactly at 283.5 times the line frequency, so that fs = 283.5 fh = 4.4296875 MHz. The sideband structure of this quadrature modulated signal also consists of clusters spaced by fh around fs (the same analysis as for xY applies except that the modulating signal xI + jxQ is complex and so the sidebands are not Hermitian symmetric about fs). By choosing fs half way between two multiples of fh, the chrominance sidebands interleave between the luminance sidebands with minimal interference.

Since quadrature modulation requires the receiver to have an accurate copy of the carrier in order to demodulate the two channels of chrominance information, a colour burst consisting of about ten cycles of the colour subcarrier is included in the transmitted signal immediately after the line synchronization pulse. The receiver has a phase-locked loop which is locked precisely to the subcarrier frequency and phase, and this is used to reconstitute xI and xQ. The presence of the colour burst signal is used by colour receivers to distinguish between colour transmissions and monochrome transmissions so that compatibility is achieved.

The main problem with the NTSC system is that various propagation conditions can cause the phase of the decoded chrominance information to be shifted relative to the original. The net effect of this is to shift all the hues of the picture to incorrect values. NTSC receivers usually have a hue control to allow manual adjustment of the subcarrier phase so that the colours look natural.

6.9.3 The PAL (Phase Alternate Line) system

In the PAL system, the chrominance signals are defined as slightly different linear combinations of the primary colours (Table 5.1, Figure 5.2)

xV (t) = 0.877(xR(t)− xY (t)) (6.62)

xU (t) = 0.493(xB(t)− xY (t)) (6.63)

These signals are used to form a complex modulating signal and this is quadrature modulated onto a colour subcarrier just as in the NTSC system. Unlike the NTSC system however, the complex modulating signal is

m(t) = xU(t) + jxV(t) if t is within an odd-numbered line,
m(t) = xU(t) − jxV(t) if t is within an even-numbered line    (6.64)

This phase-reversal in xV(t) on alternate lines gives the system its name. If propagation conditions introduce a phase shift, the phase-reversals on alternate lines cause opposite colour shifts which tend to cancel each other out (Figures 5.4, 5.5).

It is easy to determine the effect of the phase alternation on the spectrum of the chrominance signal. Using a similar construction to that used for finding the spectrum of the monochrome signal, we consider a two-dimensional infinite periodic repetition of the image. The phase reversals make adjacent columns of images no longer identical, and so the effective horizontal period is 2H rather than H. Repeating the analysis shows that instead of clusters of spectral lines around multiples of fh, the clusters are separated by fh/2. It is no longer appropriate to place the subcarrier frequency half-way between two clusters of the luminance signal as in the NTSC system, since the closer chrominance cluster frequency spacing would cause major chrominance sidebands to interfere with the luminance. Instead, the subcarrier is shifted to approximately 283.75 fh. A more detailed analysis of the sideband structure indicates that the subcarrier frequency should be of the form (n ± 3/8) fv for some integer n. Applying this small correction leads to the PAL subcarrier frequency of 283.75 fh + 0.5 fv = 4.43361875 MHz.
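The subcarrier arithmetic is easy to check directly. A minimal sketch, assuming the standard 625-line/50 Hz scan frequencies fh = 15625 Hz and fv = 50 Hz (values not stated explicitly above):

```python
# Check the PAL subcarrier frequency 283.75*fh + 0.5*fv.
# Assumed scan frequencies for a 625-line/50 Hz system:
fh = 15625.0   # line frequency (Hz)
fv = 50.0      # field frequency (Hz)

fsc = 283.75 * fh + 0.5 * fv
print(fsc / 1e6)   # 4.43361875 (MHz), matching the value quoted above
```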

As in the NTSC system, a colour burst is included to allow the receiver to reconstruct the subcarrier required for quadrature demodulation of the chrominance information. This colour burst is deliberately shifted by ±45° relative to the true subcarrier phase on alternate lines so that the receiver can distinguish between odd and even lines. The PAL colour burst is thus called a swinging burst.

References

Carlson, A.B., Communication Systems, (second edition) 1975, McGraw-Hill.

King, G.J., Beginner’s Guide to Colour Television, (second edition) 1973, Newnes-Butterworths.


7

The Laplace transform

7.1 Introduction

The Laplace transform is closely related to the Fourier transform, and we shall be considering this relationship in some detail. This will allow many of the properties of the Laplace transform to be deduced from those of the Fourier transform. You should already be familiar with the use of Laplace transforms for solving initial-value differential equations and for carrying out transient analysis of electrical circuits, so these applications will not be covered in detail.

7.2 The one-sided Laplace transform

Given a function f(t) which is assumed to be causal so that f(t) = 0 for t < 0, the (one-sided) Laplace transform of f(t) is

FL(s) = ∫_0^∞ f(t) exp(−st) dt    (7.1)

In this definition s is assumed to be a complex variable s = σ + jω. The set of values of s for which the integral converges absolutely is called the region of absolute convergence of the Laplace transform.

Theorem: If the one-sided Laplace transform of the locally integrable function f(t) converges absolutely at some complex number s0, then it converges at every point s satisfying ℜs ≥ ℜs0.

Proof: We make use of the fact that the improper integral of a locally integrable function φ(t),

∫_0^∞ φ(t) dt    (7.2)

converges if and only if for any given ε > 0 there exists M > 0 with the property that whenever t1 and t2 are chosen so that t2 > t1 > M,

|∫_{t1}^{t2} φ(t) dt| < ε    (7.3)

(This is called the Cauchy condition for the existence of the improper integral.)

Let us now consider the Laplace transform integral at s for arbitrary limits t2 > t1 > 0:

|∫_{t1}^{t2} f(t) exp(−st) dt| ≤ ∫_{t1}^{t2} |f(t) exp(−s0 t) exp[−(s − s0)t]| dt    (7.4)
                              = ∫_{t1}^{t2} |f(t) exp(−s0 t)| exp[−ℜ(s − s0)t] dt    (7.5)
                              ≤ ∫_{t1}^{t2} |f(t) exp(−s0 t)| dt    (7.6)

where the last inequality follows from ℜs ≥ ℜs0 and t2 > t1 > 0. Since the Laplace transform converges absolutely at s0, the integral

∫_0^∞ |f(t) exp(−s0 t)| dt    (7.7)

exists and satisfies the Cauchy condition. Thus for a given ε > 0, there exists M > 0 such that choosing t2 > t1 > M will ensure that (7.6) is less than ε. This shows that the Laplace transform integral at s satisfies the Cauchy condition and thus must exist. Indeed the above also shows that the convergence of the Laplace transform integral at s is absolute.

This result shows that the region of absolute convergence of a one-sided Laplace transform is either

1. an open right half-plane ℜs > σa, or

2. a closed right half-plane ℜs ≥ σa,

where σa is a real number (which may be ±∞) called the abscissa of absolute convergence of the Laplace transform.

Exercise: Fill in the details required to establish the above claim. In particular, consider why the Laplace transform cannot be absolutely convergent at any point to the left of ℜs = σa, and why the line ℜs = σa itself must be either entirely inside or entirely outside the region of absolute convergence.

(Technical note: It is possible to work with the region of simple convergence of Laplace transforms, where we only require convergence rather than absolute convergence of the integral. However, this complicates the theory somewhat. The region of simple convergence need not coincide with the region of absolute convergence. In fact there are functions for which the Laplace transform is convergent everywhere but absolutely convergent nowhere. For more details see, for example, Introduction to the Theory and Application of the Laplace Transformation by Gustav Doetsch, Springer-Verlag 1974.)

If we now write s = σ + jω in the Laplace transform integral, we see that

FL(σ + jω) = ∫_0^∞ f(t) exp(−σt) exp(−jωt) dt    (7.8)

If we compare this with the Fourier transform of a function g(t) (using the ω form rather than the ν form)

G(ω) = ∫_{−∞}^∞ g(t) exp(−jωt) dt    (7.9)


it is clear that we can consider the Laplace transform of f on the vertical line σ + jω as a Fourier transform of the function

g(t) = f(t) exp(−σt) (7.10)

With this definition,

G(ω) = FL(σ + jω) (7.11)

In the region of absolute convergence of the Laplace transform, we see that f(t) exp(−σt) is absolutely integrable, and so its Fourier transform is well-defined in the classical sense. The inverse Fourier transform allows us to recover g(t):

g(t) = (1/2π) ∫_{−∞}^∞ G(ω) exp(jωt) dω    (7.12)
     = (1/2π) ∫_{−∞}^∞ FL(σ + jω) exp(jωt) dω    (7.13)

Using (7.10) this tells us how to recover f(t) from FL, since

f(t) = exp(σt) g(t)    (7.14)
     = (1/2π) ∫_{−∞}^∞ FL(σ + jω) exp[(σ + jω)t] dω    (7.15)

This is the inverse Laplace transform relationship. It is more commonly written as a contour integral over s = σ + jω:

f(t) = (1/j2π) ∫_{σ−j∞}^{σ+j∞} FL(s) exp(st) ds    (7.16)

where this is simply a different notation for exactly the same integral as in (7.15). The value of σ must be chosen to lie within the region of absolute convergence of FL. Notice that the integral is carried out on a line parallel to the imaginary axis in the complex plane. There are usually many such lines within the region of absolute convergence, and performing the integral along any of these lines must give the same result. In this sense, we may regard the Laplace transform as a family of Fourier transforms of f(t) exp(−σt) for many different values of σ. The effect of the damped exponential is to control the function at large values of t so that the result is absolutely integrable.

Example: Calculate the inverse Laplace transform of

FL(s) = exp(−s)/(s² + a²)    (7.17)

whose region of absolute convergence is ℜs > 0.

Substituting into the inverse Laplace transform relationship, we see that

f(t) = (1/j2π) ∫_{σ−j∞}^{σ+j∞} e^{s(t−1)}/(s² + a²) ds    (7.18)

The integral is over a line L in the right-hand half-plane parallel to the imaginary axis, starting at σ − j∞ and ending at σ + j∞. We wish to use complex integration methods to evaluate this integral. The function FL(s), and hence the integrand, is only defined on the region of absolute convergence, but we can easily extend the integrand to the entire complex plane by defining

I(s) = e^{s(t−1)}/(s² + a²)    (7.19)

This function is meromorphic, which means that it is analytic everywhere on the complex plane except at a set of isolated singularities at which there are poles. Since this function agrees with the required integrand on the line L, it will have the same integral over L. By extending the function in this way, however, it is possible to use the techniques of complex variable theory to evaluate the integral by means of residues and contour integration.

(Informal note: If you like, think of the contour integration as a mathematical trick used to evaluate the integral over the line L. As part of this trick, we need to extend the integrand to I(s) over the whole complex plane so that we can draw a contour through a region outside the region of convergence of the Laplace transform. The extended integrand I(s) needs to agree with the true integrand only along the line L so that it will give the correct integral which is required to recover f(t).)

The extended integrand I(s) has simple poles at s = ja and s = −ja. We need to consider two cases:

• If t < 1, I(s) becomes large in the left-hand half-plane (as ℜs → −∞) and small in the right-hand half-plane (as ℜs → ∞). Let us consider completing the contour in the right-hand half-plane with a large semicircle CR whose radius we will ultimately take to infinity. This contour encloses no singularities of I(s), and so the integral vanishes, i.e.,

(∫_L + ∫_{CR}) I(s) ds = 0    (7.20)

It is easy to see that as the radius of CR increases, the second integral tends to zero since the integrand I(s) becomes small more rapidly than the arc length grows. Hence

f(t) = (1/j2π) ∫_L I(s) ds = 0 for t < 1.    (7.21)

• If t > 1, I(s) becomes small in the left-hand half-plane (as ℜs → −∞) and large in the right-hand half-plane (as ℜs → ∞). Thus, let us consider completing the contour in the left-hand half-plane with a large semicircle CL whose radius we will ultimately take to infinity. This contour encloses the two singularities of I(s). By the residue theorem

(∫_L + ∫_{CL}) I(s) ds = j2π [Res(I(s), ja) + Res(I(s), −ja)]    (7.22)

Recall that the residue at a pole s = p is the coefficient of the term 1/(s − p) in the Laurent expansion of the function. Since

I(s) = (1/j2a) [1/(s − ja) − 1/(s + ja)] e^{s(t−1)}    (7.23)

we see that

Res(I(s), ja) = (1/j2a) e^{ja(t−1)},   Res(I(s), −ja) = −(1/j2a) e^{−ja(t−1)}    (7.24)

Hence

(∫_L + ∫_{CL}) I(s) ds = (π/a) [e^{ja(t−1)} − e^{−ja(t−1)}] = (j2π/a) sin[a(t − 1)]    (7.25)

Combining the results yields

f(t) = (1/a) u(t − 1) sin[a(t − 1)]    (7.26)

Exercise: Ensure that you can recover u(t) t^{n−1}/(n − 1)! by explicitly calculating the inverse Laplace transform of s^{−n}.

In practice, we usually do not use the explicit inverse transform relationship. Instead, we reduce the expression into a sum of standard forms (e.g. using partial fractions) and then use a table of Laplace transforms to work backwards. Under some conditions, it may be possible to write an expression as a product of two familiar transforms and to use the convolution theorem to evaluate the desired inverse transform.
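The Bromwich integral (7.16) can also be evaluated by brute-force quadrature, which gives a handy numerical check on results such as (7.26). The sketch below is plain Python; the values a = 2 and σ = 1 and the truncation parameters are illustrative choices, not from the text. It approximates the line integral by a trapezoidal sum over a finite frequency range:

```python
import cmath
import math

def inverse_laplace(FL, t, sigma=1.0, W=1000.0, n=100000):
    # Approximate f(t) = (1/j2pi) * integral of FL(s) exp(st) ds along
    # the line Re s = sigma, truncated to |Im s| <= W (trapezoidal rule).
    dw = 2.0 * W / n
    total = 0.0 + 0.0j
    for m in range(n + 1):
        w = -W + m * dw
        weight = 0.5 if m in (0, n) else 1.0   # trapezoidal end weights
        s = complex(sigma, w)
        total += weight * FL(s) * cmath.exp(s * t)
    # ds = j dw, so the 1/(j2pi) prefactor becomes dw/(2pi)
    return (total * dw / (2.0 * math.pi)).real

a = 2.0
FL = lambda s: cmath.exp(-s) / (s * s + a * a)   # the transform (7.17)

r1 = inverse_laplace(FL, 2.0)   # should be close to sin(a*1)/a = sin(2)/2
r2 = inverse_laplace(FL, 0.5)   # should be close to 0, since t < 1
print(r1, r2)
```

Both values agree with u(t − 1) sin[a(t − 1)]/a to within about 10⁻³; the residual error comes from truncating the integration range at ±W.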

7.3 Properties of the one-sided Laplace transform

The following properties of the one-sided Laplace transform may readily be proven. We use the notation f(t) ↔ FL(s) to indicate a Laplace transform pair.

1. Linearity

f(t) + g(t) ↔ FL(s) + GL(s)    (7.27)

2. Differentiation

f′(t) ↔ s FL(s) − f(0)    (7.28)

f^(n)(t) ↔ s^n FL(s) − s^{n−1} f(0) − s^{n−2} f′(0) − ... − f^{(n−1)}(0)    (7.29)

−t f(t) ↔ FL′(s)    (7.30)

(−1)^n t^n f(t) ↔ FL^{(n)}(s)    (7.31)

3. Integration

∫_0^t f(τ) dτ ↔ (1/s) FL(s)    (7.32)

4. Convolution

∫_0^t f(τ) g(t − τ) dτ ↔ FL(s) GL(s)    (7.33)

5. Scaling and translation

e^{at} f(t) ↔ FL(s − a)    (7.34)

(1/c) f(t/c) ↔ FL(cs)   (c > 0)    (7.35)

f(t − b) u(t − b) ↔ e^{−bs} FL(s)    (7.36)

6. Initial value theorem

lim_{t→0+} f(t) = lim_{s→∞} s FL(s)    (7.37)

7. Final value theorem

lim_{t→∞} f(t) = lim_{s→0} s FL(s)    (7.38)
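Properties such as the convolution rule (7.33) are easy to verify numerically. A small sketch, using the illustrative functions f(t) = e^{−t} and g(t) = e^{−2t}, whose convolution is e^{−t} − e^{−2t}:

```python
import math

def laplace(f, s, T=30.0, n=100000):
    # One-sided Laplace transform by trapezoidal quadrature on [0, T];
    # T is chosen large enough that the integrand has fully decayed.
    dt = T / n
    total = 0.0
    for k in range(n + 1):
        t = k * dt
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(t) * math.exp(-s * t)
    return total * dt

f = lambda t: math.exp(-t)
g = lambda t: math.exp(-2.0 * t)
fg = lambda t: math.exp(-t) - math.exp(-2.0 * t)   # (f * g)(t), done analytically

s = 1.0
lhs = laplace(fg, s)                  # transform of the convolution
rhs = laplace(f, s) * laplace(g, s)   # product of the transforms
print(lhs, rhs)   # both close to 1/((s+1)(s+2)) = 1/6
```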

7.4 The two-sided Laplace transform

This is simply the Laplace transform defined for functions which need not be zero for negative arguments. The transform pair is

FL(s) = ∫_{−∞}^∞ f(t) exp(−st) dt    (7.39)

f(t) = (1/j2π) ∫_{σ−j∞}^{σ+j∞} FL(s) exp(st) ds    (7.40)

The region of absolute convergence of a two-sided Laplace transform consists of the points in a strip bounded by two lines parallel to the imaginary axis. There are thus two abscissae of absolute convergence −∞ ≤ σ1 < σ2 ≤ ∞ such that the integral converges absolutely within σ1 < ℜs < σ2. It is easy to see that this must be the case, since we can write f(t) as a sum of a causal part and an anticausal part. The region of absolute convergence of f is the intersection of the regions of absolute convergence of the two parts.

For the two-sided Laplace transform, it is essential to specify the region of absolute convergence together with the function FL(s), since different f(t) can have the same FL(s) but with different regions of absolute convergence.

Example: What are the two-sided Laplace transforms of

• u(t) exp(−at)

• −u(−t) exp(−at)

Substituting these into the definition, we see that the Laplace transform of u(t) exp(−at) is

∫_0^∞ exp[−(a + s)t] dt = 1/(s + a)   provided that σ > −a    (7.41)

The Laplace transform of −u(−t) exp(−at) is

∫_{−∞}^0 exp[−(a + s)t] dt = 1/(s + a)   provided that σ < −a    (7.42)


These two different functions have the same expression for the Laplace transform, but different regions of absolute convergence.

Exercises:

1. Use the inverse Laplace transform relationship to recover each of the above functions from 1/(s + a) using the appropriate region of absolute convergence. Ensure that you understand how to get both forms in the regions t < 0 and t > 0.

2. Consider the function

FL(s) = exp(2s)/[(s + 2)(s + 3)]    (7.43)

Calculate the inverse Laplace transform given that the region of absolute convergence is

a) ℜs < −3

b) −3 < ℜs < −2

c) ℜs > −2

7.5 Example: Solution of the diffusion equation

Consider the differential equation for heat diffusion in one dimension,

κ ∂²θ/∂x² = ∂θ/∂t

together with the initial and boundary conditions

θ(x, 0) is specified,    (7.44)

θ(−∞, t) = θ(∞, t) = 0 for all t    (7.45)

To solve this problem, we use a Fourier transform in x and a (one-sided) Laplace transform in t. In general, a Fourier transform is appropriate when the boundary conditions at ±∞ are homogeneous, and a Laplace transform is useful for initial-value problems. We introduce

ΘF(u, t) = ∫_{−∞}^∞ θ(x, t) exp(−j2πux) dx    (7.46)

and

ΘFL(u, s) = ∫_0^∞ ΘF(u, t) exp(−st) dt    (7.47)

Taking the Fourier transform (in x) of the differential equation yields

κ (j2πu)² ΘF(u, t) = ∂ΘF/∂t    (7.48)

Taking the Laplace transform (in t) of this yields

−4π²κu² ΘFL(u, s) = s ΘFL(u, s) − ΘF(u, 0)    (7.49)

Solving this yields

ΘFL(u, s) = ΘF(u, 0) / (s + 4π²κu²)    (7.50)

The inverse Laplace transform of this is

ΘF(u, t) = ΘF(u, 0) exp(−4π²κu²t)    (7.51)

The desired solution is found by taking an inverse Fourier transform. The right-hand side becomes a convolution of θ(x, 0) with the inverse Fourier transform of exp(−4π²κu²t), which is

(1/√(4πκt)) exp(−x²/(4κt))    (7.52)

Thus,

θ(x, t) = ∫_{−∞}^∞ [θ(ξ, 0)/√(4πκt)] exp[−(x − ξ)²/(4κt)] dξ    (7.53)

If the initial temperature distribution is a delta function, this spreads out in time into a Gaussian of width proportional to √t. This scaling of the width is characteristic of a diffusion process.
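This √t scaling is easy to check numerically: the kernel in (7.52) should integrate to one and have second moment ∫ x² h dx = 2κt. A small plain-Python sketch (κ and t are arbitrary illustrative values):

```python
import math

def kernel_moments(kappa, t, L=20.0, n=80000):
    # Zeroth and second moments of the heat kernel (7.52),
    # integrated by the trapezoidal rule over [-L, L].
    dx = 2.0 * L / n
    norm = 1.0 / math.sqrt(4.0 * math.pi * kappa * t)
    m0 = m2 = 0.0
    for k in range(n + 1):
        x = -L + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        h = norm * math.exp(-x * x / (4.0 * kappa * t))
        m0 += weight * h * dx
        m2 += weight * x * x * h * dx
    return m0, m2

kappa, t = 0.5, 2.0
m0, m2 = kernel_moments(kappa, t)
print(m0, m2)   # m0 ~ 1 (unit area), m2 ~ 2*kappa*t = 2.0
```

The standard deviation of the kernel is √(2κt), i.e. proportional to √t as claimed.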

Exercises:

1. Instead of a rod of infinite length, consider a rod of length L extending between x = 0 and x = L. The initial temperature θ(x, 0) is again specified, but the boundary conditions are now θ(0, t) = θ(L, t) = 0. Using the fact that θ(x, t) can be expanded as a sine series

θ(x, t) = Σ_{k=1}^∞ ck(t) sin(kπx/L)    (7.54)

calculate an expression for θ(x, t) in terms of θ(x, 0).

2. The density of neutrons n in a cubical block of uranium-235 of side a is governed by the equation

κ ∇²n + γn = ∂n/∂t    (7.55)

where κ and γ are constants. At the surfaces of the block (x = 0, x = a, y = 0, y = a, z = 0, z = a) the neutron density is zero. Find the value of a such that the block becomes critical.

Hint: Write n as a three-dimensional sine series

n(x, y, z, t) = Σ_{k=1}^∞ Σ_{l=1}^∞ Σ_{m=1}^∞ cklm(t) sin(kπx/a) sin(lπy/a) sin(mπz/a)    (7.56)

7.6 The Laplace transform and the matrix exponential

Consider a system of coupled linear differential equations with constant coefficients which can be written in matrix form as

dx/dt = Ax   where x(0) is given.    (7.57)

In this equation, x is a column vector with components x1(t), ..., xn(t) and A is an n × n matrix. By analogy with the case where A is a scalar, the solution of this system is written as

x(t) = exp(At) x(0)    (7.58)

This matrix exponential is interpreted in terms of a power series, namely

exp(At) = I + At + A²t²/2! + A³t³/3! + ...    (7.59)

We can use Laplace transforms to find an explicit closed form for exp(At) when a specific matrix is involved. For example, suppose that

A = [ 0  1 ; −2  −3 ]    (7.60)

Taking the Laplace transform of the differential equation, we see that the Laplace transform X(s) of x(t) is given by X(s) = (sI − A)^{−1} x(0). The Laplace transform of exp(At) is thus

(sI − A)^{−1} = [ s  −1 ; 2  s+3 ]^{−1}    (7.61)
             = 1/(s² + 3s + 2) [ s+3  1 ; −2  s ]    (7.62)

Taking the inverse Laplace transform shows that

exp(At) = [ 2e^{−t} − e^{−2t}   e^{−t} − e^{−2t} ; −2e^{−t} + 2e^{−2t}   −e^{−t} + 2e^{−2t} ]    (7.63)

An alternative way of calculating the matrix exponential involves the use of the eigenvalues and eigenvectors of the matrix A. It is easy to check that the eigenvalues of A are −1 and −2, with corresponding eigenvectors (1, −1)ᵀ and (1, −2)ᵀ. This means that we can write

A = VDV^{−1} = [ 1  1 ; −1  −2 ] [ −1  0 ; 0  −2 ] [ 2  1 ; −1  −1 ]    (7.64)

where V is the matrix whose columns are the eigenvectors of A, and D is the diagonal matrix of eigenvalues. For any power of A it is clear that Aⁿ = (VDV^{−1})ⁿ = VDⁿV^{−1}, and Dⁿ is simply found by raising the diagonal elements to the n'th power. From the power series definition we see that

exp(At) = V exp(Dt) V^{−1} = [ 1  1 ; −1  −2 ] [ e^{−t}  0 ; 0  e^{−2t} ] [ 2  1 ; −1  −1 ]
        = [ 2e^{−t} − e^{−2t}   e^{−t} − e^{−2t} ; −2e^{−t} + 2e^{−2t}   −e^{−t} + 2e^{−2t} ]    (7.65)

This coincides with the form found above. The second method fails if the matrix A is defective, i.e., if the eigenvectors of A do not span the entire space.
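Both routes to exp(At) can be checked against a direct summation of the power series (7.59). A plain-Python sketch for this particular 2 × 2 example (the number of series terms is an illustrative choice, easily sufficient for ‖At‖ of this size):

```python
import math

def mat_mul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, t, terms=40):
    # Sum the series I + At + (At)^2/2! + ... from (7.59)
    At = [[A[i][j] * t for j in range(2)] for i in range(2)]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = mat_mul(term, At)
        term = [[term[i][j] / n for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

A = [[0.0, 1.0], [-2.0, -3.0]]
E = mat_exp(A, 1.0)

# Closed form (7.63) evaluated at t = 1
e1, e2 = math.exp(-1.0), math.exp(-2.0)
closed = [[2*e1 - e2, e1 - e2], [-2*e1 + 2*e2, -e1 + 2*e2]]
print(E)
print(closed)   # the two agree to machine precision
```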


8

Wave Propagation and Fourier Optics

Fourier optics describes the propagation of light in optical systems using Fourier transform techniques. These techniques are useful since many operations are linear and spatially shift-invariant. They form the basis for analyzing and designing optical imaging and computation systems.

8.1 Propagation of light in the paraxial approximation

Although the classical wave description of light is as a transverse electromagnetic wave, many effects can be studied using a scalar rather than the full vector wave equation. In free space, we have

∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = (1/c²) ∂²ψ/∂t²    (8.1)

In this equation ψ represents a component of the electric or magnetic field. For monochromatic, coherent light, we can write

ψ(x, y, z, t) = ψ(x, y, z, 0) e^{−jωt}    (8.2)

Substituting this into the wave equation, we obtain Helmholtz's equation

∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = −k²ψ    (8.3)

where ω/k = c.

Consider propagation which is nearly parallel to the z axis, so that

ψ(x, y, z, 0) = fz(x, y) e^{jkz}    (8.4)

where fz(x, y) varies slowly with z. (Note, for example, that for a plane wave travelling parallel to the z axis, fz(x, y) is constant.)

Substituting (8.4) into Helmholtz's equation (8.3) yields

e^{jkz} [ ∂²fz/∂x² + ∂²fz/∂y² + ∂²fz/∂z² + 2jk ∂fz/∂z − k²fz ] = −k²fz e^{jkz}    (8.5)

The paraxial approximation neglects ∂²fz/∂z², since fz is assumed to vary slowly with z. This yields the paraxial wave equation

∂²fz/∂x² + ∂²fz/∂y² + 2jk ∂fz/∂z = 0    (8.6)

If we are given fz1(x, y), the solution of this partial differential equation will give the amplitude distribution at z = z2.

8.2 Solving the paraxial wave equation

The paraxial wave equation is most easily solved by computing its two-dimensional Fourier transform. The two-dimensional Fourier transform of a function g(x, y) is defined as

G(u, v) = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) exp[−j2π(ux + vy)] dx dy    (8.7)

The corresponding inverse Fourier transform is

g(x, y) = ∫_{−∞}^∞ ∫_{−∞}^∞ G(u, v) exp[+j2π(ux + vy)] du dv    (8.8)

Exercise: Prove that the above transforms are inverses, using the properties of the usual one-dimensional Fourier transform.

We now consider the paraxial wave equation (8.6). Let Fz(u, v) denote the two-dimensional Fourier transform of fz(x, y), so that the Fourier transformed paraxial wave equation is

(j2πu)² Fz + (j2πv)² Fz + 2jk ∂Fz/∂z = 0    (8.9)

or

∂Fz/∂z = (2π²/jk)(u² + v²) Fz(u, v)    (8.10)

This may be integrated directly, yielding

Fz(u, v) = F0(u, v) exp[−(j2π²/k)(u² + v²) z]    (8.11)

Since the (two-dimensional) inverse Fourier transform of exp[−j2π²(u² + v²)z/k] is

h(x, y) = (1/jλz) exp[(jk/2z)(x² + y²)]    (8.12)

where λ = 2π/k (check this!), we can take the inverse Fourier transform of the product in (8.11) to obtain the convolutional relationship

fz(x, y) = (f0 ∗ h)(x, y)    (8.13)

or, writing out the convolution in full,

fz(x, y) = (1/jλz) ∫_{−∞}^∞ ∫_{−∞}^∞ f0(x0, y0) exp[(jk/2z)((x − x0)² + (y − y0)²)] dx0 dy0    (8.14)

This is the paraxial diffraction integral. It states that the field amplitudes at the plane at z are related to those at the plane at z = 0 by a linear, shift-invariant filtering operation with "impulse response" h(x, y).

Let us now consider the physical interpretation of this result. The impulse response gives the amplitude of the light on a plane at distance z away from a point source of light (e.g. a pinhole in an opaque screen) located at the origin. If we put back the full dependence on z and t, we find that in the paraxial approximation a point source gives

ψ(x, y, z, t) = (1/jλz) exp[(jk/2z)(x² + y²)] exp(jkz) exp(−jωt)    (8.15)

The spatial dependence of the phase term is

exp[jkz(1 + (x² + y²)/(2z²))]    (8.16)

On the other hand, by Huygens' construction we expect it to be

exp[jk √(x² + y² + z²)]    (8.17)

However, if z ≫ x, y (which is true in the paraxial approximation),

jk(x² + y² + z²)^{1/2} = jkz [1 + (x² + y²)/z²]^{1/2}    (8.18)
                      ≈ jkz [1 + (1/2)(x² + y²)/z² − (1/8)((x² + y²)/z²)² + ...]    (8.19)

and so the phasefronts calculated by the paraxial approximation are very close to the true spherical wavefronts.

Note that

• The 1/z factor in the amplitude represents spherical spreading (an inverse square law for the intensity).

• The additional phase factor of −j is actually present, although it is not predicted by a simple application of Huygens' construction.
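The closeness of the paraxial phase (8.16) to the spherical phase (8.17) is easy to quantify: by (8.19), the leading error term is −kρ⁴/(8z³) with ρ² = x² + y². A quick numerical sketch (the wavelength and geometry are illustrative values, not taken from the text):

```python
import math

lam = 500e-9              # illustrative wavelength (m)
k = 2.0 * math.pi / lam
z = 1.0                   # propagation distance (m)
x, y = 1e-3, 0.0          # transverse offset (m): z >> x, y holds

rho2 = x * x + y * y
paraxial = k * z * (1.0 + rho2 / (2.0 * z * z))   # phase in (8.16)
exact = k * math.sqrt(rho2 + z * z)               # phase in (8.17)
error = exact - paraxial
leading = -k * rho2**2 / (8.0 * z**3)             # next term in (8.19)
print(error, leading)   # both about -1.6e-6 rad: utterly negligible
```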

Exercises:

1. Show that if f0(x, y) = 1, then the diffraction formula predicts that fz(x, y) = 1 for all z. This means that a plane wave propagating along the z axis is a solution.

2. Show that if f0(x, y) = exp(jky sin θ), the diffraction formula predicts that

fz(x, y) = exp(jky sin θ) exp(−(jkz/2) sin² θ)

Interpret this result physically, and compare it with the exact result (i.e., without the paraxial approximation).

8.3 Fresnel and Fraunhofer Diffraction

Whenever the paraxial diffraction integral is valid, the observer is said to be in the region of Fresnel diffraction. The quadratic terms in the exponential of the diffraction integral can be rewritten as

exp[(jk/2z)((x − x0)² + (y − y0)²)] = exp[(jk/2z)(x² + y²)] exp[−(jk/z)(xx0 + yy0)] exp[(jk/2z)(x0² + y0²)]    (8.20)

Hence we may write

fz(x, y) = (1/jλz) Pz(x, y) ∫_{−∞}^∞ ∫_{−∞}^∞ f0(x0, y0) Pz(x0, y0) exp[−j2π(xx0 + yy0)/(λz)] dx0 dy0    (8.21)

where

Pz(x, y) = exp[(jk/2z)(x² + y²)]    (8.22)

is a phase factor with unit modulus. Thus the paraxial diffraction integral for propagating a field through a distance z in free space may be interpreted as a sequence of three operations:

• multiplication of f0(x0, y0) by the phase factor Pz(x0, y0),

• calculation of a two-dimensional Fourier transform with spatial-frequency variables x/(λz) and y/(λz),

• multiplication of the result by a further phase factor Pz(x, y).

In a diffraction experiment, we usually consider the field at the plane z = 0 to be non-zero only over a relatively small region, specified by the aperture or mask placed in that plane. Suppose that z is chosen to be so large that the phase factor Pz(x0, y0) ≈ 1 over the entire region of the (x0, y0) plane in which f0(x0, y0) is non-zero. Equation (8.21) may then be written as

fz(x, y) = (1/jλz) Pz(x, y) ∫_{−∞}^∞ ∫_{−∞}^∞ f0(x0, y0) exp[−j2π(xx0 + yy0)/(λz)] dx0 dy0    (8.23)

In this regime, fz(x, y) is just the two-dimensional Fourier transform of f0(x0, y0), except for a multiplicative phase factor which does not affect the intensity of the diffracted light. This is called the Fraunhofer approximation. It is valid provided that z ≫ k(x0² + y0²)max/2.
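For typical optical numbers this is a surprisingly large distance. A quick evaluation of the bound, taking a half-width of 0.5 cm for a 1 cm aperture and λ = 600 nm:

```python
import math

lam = 600e-9            # red light (m)
k = 2.0 * math.pi / lam
x0max = 0.005           # half-width of a 1 cm aperture (m)

z_min = k * x0max**2 / 2.0
print(z_min)   # about 131 m; Fraunhofer needs z much greater than this
```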


8.4 The diffraction grating

For simplicity, specialize to one dimension so that

fz(x) = (1/√(jλz)) exp(jkx²/2z) ∫_{−∞}^∞ f0(x0) exp[−j2πxx0/(λz)] dx0    (8.24)

Consider a screen placed at z = 0 illuminated by a plane wave travelling along the z axis, as shown in Figure 8.1.

Figure 8.1: Diffraction configuration

If the transmission function of the screen is t(x0), this is also the amplitude of the field immediately after the screen for an incident plane wave of unit amplitude. For a diffraction grating with 2N + 1 rectangular slits of width w separated by distance d,

f0(x0) = t(x0) = ( Σ_{k=−N}^{N} δ(x0 − kd) ) ∗ Π(x0/w)    (8.25)

where the asterisk denotes convolution and Π(x) denotes the "top hat" function, which is one for |x| ≤ 1/2 and zero otherwise. The Fourier transform of this is

F0(u) = [ (1/d) Σ_{k=−∞}^∞ δ(u − k/d) ∗ (2N + 1) d sinc[(2N + 1) du] ] w sinc(wu)    (8.26)
      = (2N + 1) w sinc(wu) Σ_{k=−∞}^∞ sinc[(2N + 1)(du − k)]    (8.27)

The intensity of the Fraunhofer diffraction pattern is

|fz(x)|² = (1/λz) |F0(x/λz)|²    (8.28)

Figure 8.2 is a plot of F0(u). The diffraction pattern consists of bright lines positioned atxn = nλz/d. Each line is a sinc function whose first zero is at λz/[(2N + 1)d] away from thepeak. Hence if N is large, these lines are very sharp. If the incoming light consists of manywavelengths, these components are separated spatially by the grating.
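The sharpness of the lines can be seen directly from the finite sum behind (8.26): summing the 2N + 1 slit phasors gives a Dirichlet kernel, so the principal maxima at u = n/d have height (2N + 1) w sinc(wu) and fall to zero only 1/[(2N + 1)d] away. A plain-Python sketch evaluating this exact finite-sum form (the slit parameters are arbitrary illustrative values):

```python
import cmath
import math

def sinc(x):
    # normalized sinc: sin(pi x)/(pi x)
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def grating_ft(u, N, d, w):
    # Exact Fourier transform of 2N+1 slits of width w and spacing d:
    # w sinc(wu) times a sum of slit phasors (a Dirichlet kernel).
    phasors = sum(cmath.exp(-2j * math.pi * u * k * d)
                  for k in range(-N, N + 1)).real
    return w * sinc(w * u) * phasors

N, d, w = 10, 4.0, 1.0                       # 21 slits, spacing 4, width 1
peak = grating_ft(1.0 / d, N, d, w)          # first-order principal maximum
null = grating_ft(1.0 / d + 1.0 / ((2 * N + 1) * d), N, d, w)
print(peak, null)   # peak = (2N+1) w sinc(w/d); null is essentially zero
```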


Figure 8.2: Fraunhofer diffraction amplitude

The overall envelope of the diffraction pattern, which determines the amount of light diffracted into each order of the spectrum, is determined by the width of each slit. Gratings can be designed to diffract most of the light into a particular order of the spectrum, or to suppress unwanted orders.

Exercises:

1. Show that for red light (λ ≈ 600 nm) and an aperture width of 1 cm, we require z ≫ 120 m to satisfy the Fraunhofer approximation.

2. Consider a sinusoidal amplitude grating illuminated by a plane wave travelling along the z axis, for which the transmission function of the grating is

t(x, y) = (1/2)[1 + m cos(2πf0x)] Π(x/l) Π(y/l)

where Π(x) is the top-hat function which is unity if |x| < 1/2. Show that the intensity of the Fraunhofer diffraction pattern at z is given by

I(x, y) = [l²/(2λz)]² sinc²(ly/(λz)) × { sinc(lx/(λz)) + (m/2) sinc(l(x + f0λz)/(λz)) + (m/2) sinc(l(x − f0λz)/(λz)) }²

Sketch the form of I(x, y).

3. Compute the Fraunhofer diffraction pattern of a circular aperture of diameter d illuminated by a plane wave along the z axis. Find where the first zero of the diffraction pattern occurs.

You may find the following identities useful:

J0(z) = (1/π) ∫_0^π cos(z cos θ) dθ

∫_0^z t J0(t) dt = z J1(z)


8.5 Fresnel diffraction – numerical calculation

Numerical techniques for calculating Fresnel diffraction patterns are often the only feasible methods for practical problems. The two analytically equivalent methods described below turn out to be useful in different regimes.

8.5.1 The convolutional approach

The paraxial diffraction integral is essentially a convolutional relationship

fz(x, y) = (1/jλz) ∫_{−∞}^∞ ∫_{−∞}^∞ f0(x0, y0) exp[(jk/2z)((x − x0)² + (y − y0)²)] dx0 dy0    (8.29)

This convolution can be calculated by multiplying together the Fourier transforms

Fz(u, v) = F0(u, v) exp[−(j2π²/k)(u² + v²) z]    (8.30)

The following block diagram shows the steps involved in the computation. This is useful for small z, since the variation in the phase term would cause aliasing if the change in 2π²(u² + v²)z/k is too great between adjacent sample points. As z → 0, we see that fz(x, y) → f0(x, y), which corresponds to the "geometrical shadow" of the diffracting obstacle.

f0(x, y) → [Fourier transform] → F0(u, v) → [× exp(−j2π²(u² + v²)z/k)] → Fz(u, v) → [inverse Fourier transform] → fz(x, y)

8.5.2 The Fourier transform approach

By expanding the quadratic phase term, we saw that the paraxial diffraction integral can be written as

fz(x, y) = (1/jλz) Pz(x, y) ∫_{−∞}^∞ ∫_{−∞}^∞ f0(x0, y0) Pz(x0, y0) exp[−j2π(xx0 + yy0)/(λz)] dx0 dy0    (8.31)

where

Pz(x, y) = exp[(jk/2z)(x² + y²)]    (8.32)

The block diagram shows the steps involved in the computation. This is useful for large z, since the phase terms change slowly when z is large. As z → ∞, we see that fz(x, y) tends to the Fourier transform of f0(x, y), which is the usual result for Fraunhofer diffraction.

f0(x, y) → [× exp(jk(x² + y²)/2z)] → [Fourier transform] → [× (1/jλz) exp(jk(x² + y²)/2z)] → fz(x, y)


8.6 Matlab code for computing Fresnel diffraction patterns

The following Matlab function computes a one-dimensional Fresnel diffraction pattern using the algorithms discussed above. A fast Fourier transform is used to compute a discrete approximation to the Fourier transform.

function [f1,dx1,x1]=fresnel(f0,dx0,z,lambda)
%
% [f1,dx1,x1]=fresnel(f0,dx0,z,lambda)
% computes the Fresnel diffraction pattern at distance
% ''z'' for light of wavelength ''lambda'' and an input
% vector ''f0'' (must have a power of two points)
% in the plane z=0, with intersample distance ''dx0''.
% Returns the diffraction pattern in ''f1'', with
% intersample separation in ''dx1'', and a baseline
% against which f1 may be plotted as ''x1''.
%
N = length(f0); k = 2*pi/lambda;
%
% Compute the critical distance which selects between
% the two methods
%
zcrit = N * dx0^2/lambda;
%
if z < zcrit
  %
  % Carry out the convolution with the Fresnel
  % kernel by multiplication in the Fourier domain
  %
  du = 1./(N*dx0);
  u = [0:N/2-1 -N/2:-1]*du;        % Note order of points for FFT
  H = exp(-i*2*pi^2*u.^2*z/k);     % Fourier transform of kernel
  f1 = ifft( fft(f0) .* H );       % Convolution
  dx1 = dx0;
  x1 = [-N/2:N/2-1]*dx1;           % Baseline for output
else
  %
  % Multiply by a phase factor, compute the Fourier
  % transform, and multiply by another phase factor
  %
  x0 = [-N/2:N/2-1] * dx0;         % Input f0 is in natural order
  g = f0 .* exp(i*0.5*k*x0.^2/z);  % First phase factor
  G = fftshift(fft(fftshift(g)));  % Fourier transform
  du = 1./(N*dx0); dx1 = lambda*z*du;
  x1 = [-N/2:N/2-1]*dx1;           % Baseline for output
  f1 = G .* exp(i*0.5*k*x1.^2/z);  % Second phase factor
  f1 = f1 .* dx0 ./ sqrt(i*lambda*z);
end

Figure 8.3 shows the results of applying this program to the calculation of the diffraction pattern of a single slit of width 1 mm illuminated with red light of wavelength 600 nm. A grid of 1024 points is used, in which the slit occupies the central 50 points. The Fresnel diffraction pattern is calculated at distances of 0.01 m, 0.05 m, 0.2 m, 1 m and 2 m.
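For readers working outside Matlab, the routine translates almost line-for-line. The following Python/NumPy sketch is my own rendering of the same algorithm (not code from the text); the example at the end sets up the slit used for Figure 8.3:

```python
import numpy as np

def fresnel(f0, dx0, z, lam):
    """Python/NumPy sketch of the Matlab `fresnel` routine above.

    Returns (f1, dx1, x1): the diffracted field, the output sample
    spacing, and a baseline for plotting."""
    N = len(f0)
    k = 2 * np.pi / lam
    zcrit = N * dx0**2 / lam               # critical distance between methods
    if z < zcrit:
        # Convolution with the Fresnel kernel via the Fourier domain
        u = np.fft.fftfreq(N, d=dx0)       # frequencies in FFT order
        H = np.exp(-1j * 2 * np.pi**2 * u**2 * z / k)
        f1 = np.fft.ifft(np.fft.fft(np.asarray(f0, dtype=complex)) * H)
        dx1 = dx0
    else:
        # Phase factor, Fourier transform, then a second phase factor
        x0 = (np.arange(N) - N // 2) * dx0
        g = f0 * np.exp(1j * 0.5 * k * x0**2 / z)
        G = np.fft.fftshift(np.fft.fft(np.fft.fftshift(g)))
        dx1 = lam * z / (N * dx0)
        xs = (np.arange(N) - N // 2) * dx1
        f1 = G * np.exp(1j * 0.5 * k * xs**2 / z) * dx0 / np.sqrt(1j * lam * z)
    x1 = (np.arange(N) - N // 2) * dx1
    return f1, dx1, x1

# Slit of Figure 8.3: 50 samples of a 1024-point grid, dx0 = 2e-5 m.
f0 = np.zeros(1024)
f0[512 - 25:512 + 25] = 1.0
f1, dx1, x1 = fresnel(f0, 2e-5, 0.05, 600e-9)   # z < zcrit: convolution branch
```

Plotting `np.abs(f1)**2` against `x1` for the five distances should reproduce the curves of Figure 8.3.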

[Figure: plot of intensity versus position on screen (m), showing curves for z = 0.01 m, 0.05 m, 0.2 m, 1 m and 2 m]

Figure 8.3: One-dimensional Fresnel diffraction from a slit

8.7 Fourier transforming and imaging properties of lenses

Consider a thin convex lens of focal length f. Plane waves incident on the lens parallel to the axis are converted into spherical wavefronts which collapse onto the focus. The action of the lens is to introduce a position-dependent phase shift. Consider this phase shift as a function of x and y. The portion of the wavefront at a distance r from the centre must travel an additional distance

d \approx \frac{r^2}{2f}    (8.33)

relative to the centre. This corresponds to a phase shift at (x, y) of

\exp\left[-\frac{jk\left(x^2 + y^2\right)}{2f}\right]    (8.34)


relative to the centre. Thus the effect of a thin lens is to multiply f(x, y) by exp(−jk(x² + y²)/(2f)).

(Note: In fact the phase change introduced is φ₀ − k(x² + y²)/(2f), since we have to retard the centre rather than advance the edges.)

We thus see that:

• Propagation through space by a distance z leads us to convolve the amplitude f_z(x, y) with the propagation Green's function

\frac{1}{j\lambda z} \exp\left[\frac{jk\left(x^2 + y^2\right)}{2z}\right].    (8.35)

• Passing through a thin lens of focal length f leads us to multiply the amplitude f_z(x, y) by the transmission function of the lens.

8.7.1 Transparency placed against lens

Consider a transparency with amplitude transmission function t(x, y) placed against a thin convex lens of focal length f. The transparency is illuminated by plane waves travelling along the z axis and we wish to calculate the image g(x, y) on a screen at distance f from the lens. The transformations undergone by the light may be represented by the block diagram:

t(x, y) → [× exp(−jk(x² + y²)/(2f))] → [∗ (1/(jλf)) exp(jk(x² + y²)/(2f))] → g(x, y)

From the block diagram we immediately write down the expression for g(x, y). This is

g(x_1, y_1) = \frac{1}{j\lambda f} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} t(x_0, y_0) \exp\left[-\frac{jk\left(x_0^2 + y_0^2\right)}{2f}\right] \exp\left[\frac{jk\left((x_1 - x_0)^2 + (y_1 - y_0)^2\right)}{2f}\right] dx_0\, dy_0

= \frac{1}{j\lambda f} \exp\left[\frac{jk\left(x_1^2 + y_1^2\right)}{2f}\right] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} t(x_0, y_0) \exp\left[-j2\pi\left(\frac{x_1 x_0 + y_1 y_0}{\lambda f}\right)\right] dx_0\, dy_0

= \frac{1}{j\lambda f} \exp\left[\frac{jk\left(x_1^2 + y_1^2\right)}{2f}\right] T\left(\frac{x_1}{\lambda f}, \frac{y_1}{\lambda f}\right)

where t ←→ T form a two-dimensional Fourier transform pair. This configuration allows us to achieve Fraunhofer diffraction conditions with relatively small object-to-screen distances.
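The scaled Fourier-transform relation can be checked numerically. The sketch below (my own illustration; the slit half-width, grid and wavelength are arbitrary choices) computes the focal-plane amplitude of a slit via an FFT and compares it with the analytic transform T(u) = 2a sinc(2au), whose first zero lands on the screen at x₁ = λf/(2a):

```python
import numpy as np

# Focal-plane field of a slit of half-width a, placed against a lens of
# focal length f: amplitude proportional to T(x1/(lambda f)).
lam, f, a = 600e-9, 0.5, 0.5e-3
N, dx = 4096, 5e-6
x0 = (np.arange(N) - N // 2) * dx
t = (np.abs(x0) <= a).astype(float)          # slit transmission function

T = np.fft.fftshift(np.fft.fft(np.fft.fftshift(t))) * dx   # approximates T(u)
u = np.fft.fftshift(np.fft.fftfreq(N, d=dx))
x1 = lam * f * u                              # screen coordinate: u = x1/(lambda f)

T_exact = 2 * a * np.sinc(2 * a * u)          # analytic transform of the slit
print(np.max(np.abs(T - T_exact)) < 1e-5)     # True
print(lam * f / (2 * a))                      # first intensity zero at 3e-4 m
```

The `fftshift` calls re-order the samples so that both the aperture and its transform are centred, mirroring the large-z branch of the `fresnel` routine.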


8.7.2 Transparency placed in front of lens

Now consider the transparency with amplitude transmission function t(x, y) at a distance d₁ in front of a thin convex lens of focal length f. If we place the screen at distance d₂ behind the lens, the block diagram for this configuration is

t(x, y) → [∗ (1/(jλd₁)) exp(jk(x² + y²)/(2d₁))] → [× exp(−jk(x² + y²)/(2f))] → [∗ (1/(jλd₂)) exp(jk(x² + y²)/(2d₂))] → g(x, y)

From which we find

g(x_2, y_2) = -\frac{1}{\lambda^2 d_1 d_2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[\frac{jk\left((x_2 - x_1)^2 + (y_2 - y_1)^2\right)}{2d_2}\right] \exp\left[-\frac{jk\left(x_1^2 + y_1^2\right)}{2f}\right]
\times \left(\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} t(x_0, y_0) \exp\left[\frac{jk}{2d_1}\left((x_1 - x_0)^2 + (y_1 - y_0)^2\right)\right] dx_0\, dy_0\right) dx_1\, dy_1

= -\frac{1}{\lambda^2 d_1 d_2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} t(x_0, y_0) \exp\left[\frac{jk}{2}\left(\frac{x_0^2 + y_0^2}{d_1} + \frac{x_2^2 + y_2^2}{d_2} + \left(\frac{1}{d_2} + \frac{1}{d_1} - \frac{1}{f}\right)\left(x_1^2 + y_1^2\right) - 2\left(\frac{x_1 x_2 + y_1 y_2}{d_2}\right) - 2\left(\frac{x_1 x_0 + y_1 y_0}{d_1}\right)\right)\right] dx_0\, dy_0\, dx_1\, dy_1

Two interesting cases arise:

1. d1 = d2 = f .

In this situation we can carry out the integrals over x1 and y1 to obtain

g(x2, y2) =1

jλf

∫ ∞

−∞

∫ ∞

−∞t (x0, y0) exp

[

−j2π

(

x0x2 + y0y2

)]

dx0 dy0 (8.36)

This is an exact two-dimensional Fourier transform.

2. 1/d₁ + 1/d₂ = 1/f.

Again we can carry out the integrations to obtain

g(x_2, y_2) = \frac{1}{\mu}\, t\left(\frac{x_2}{\mu}, \frac{y_2}{\mu}\right) \exp\left[\frac{jk}{2d_2}\left(x_2^2 + y_2^2\right)\left(\frac{\mu - 1}{\mu}\right)\right]    (8.37)

where μ = −d₂/d₁ is the magnification. Hence

|g(x_2, y_2)|^2 = \frac{1}{\mu^2} \left|t\left(\frac{x_2}{\mu}, \frac{y_2}{\mu}\right)\right|^2    (8.38)

The distribution of intensity on the screen is a magnified version of that of the transparency function. Since μ is negative (if d₁ > 0 and d₂ > 0), the image is inverted and the sense of the coordinate system is reversed. The leading term 1/μ² indicates that the intensity of the image is inversely proportional to the square of the magnification.

Exercise: Carry out the manipulations which lead to the above results from the general expression for g(x₂, y₂).


8.7.3 Vignetting

Since a real lens has finite size, instead of multiplying the amplitude by exp[−jk(x² + y²)/(2f)], it multiplies by p(x, y) exp[−jk(x² + y²)/(2f)], where p(x, y) is called the pupil function. This function is unity within the lens and zero outside.

Exercise: In the imaging configuration (1/d₁ + 1/d₂ = 1/f) show that if the pupil function is not too small,

g(x_2, y_2) \approx -\frac{1}{\lambda^2 d_1 d_2} (t * h)\left(\frac{x_2}{\mu}, \frac{y_2}{\mu}\right) \exp\left[\frac{jk}{2d_2}\left(x_2^2 + y_2^2\right)\left(\frac{\mu - 1}{\mu}\right)\right]    (8.39)

where

h(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x_1, y_1) \exp\left[j2\pi\left(\frac{xx_1 + yy_1}{d_1\lambda}\right)\right] dx_1\, dy_1    (8.40)

Thus the lens smears each pixel in the original object into a region (called the point-spread function) whose shape depends on the Fourier transform of the pupil function. For a large pupil, h(x, y) = d₁²λ² δ(x)δ(y) and so this simplifies to the previous result.

8.8 Gaussian Beams

Fresnel diffraction integrals are often difficult to solve analytically. However, an important special case is when f_z(x, y) is Gaussian in x and y – these are called Gaussian beams.

Suppose that at z = 0, we have a Gaussian distribution of amplitude

f_0(x, y) = A_0 \exp[-B_0(x^2 + y^2)]    (8.41)

To find the amplitude at z = z₁, we convolve the amplitude with h(x, y), the Green's function for free propagation, i.e.,

f_1(x, y) = (f_0 * h)(x, y) = f_0(x, y) * \frac{1}{j\lambda z_1} \exp\left[jk\left(\frac{x^2 + y^2}{2z_1}\right)\right]    (8.42)

This may be evaluated using Fourier transforms. We see that

F_0(u, v) = \frac{\pi A_0}{B_0} \exp\left[-\frac{\pi^2\left(u^2 + v^2\right)}{B_0}\right]    (8.43)

H(u, v) = \exp\left[-\frac{j2\pi^2 z_1}{k}\left(u^2 + v^2\right)\right]    (8.44)

Hence multiplying these together,

F_1(u, v) = \frac{\pi A_0}{B_0} \exp\left[-\frac{\pi^2\left(u^2 + v^2\right)}{B_0}\left(1 + \frac{j2B_0 z_1}{k}\right)\right]    (8.45)

This can be written as

F_1(u, v) = \frac{\pi A(z_1)}{B(z_1)} \exp\left[-\frac{\pi^2\left(u^2 + v^2\right)}{B(z_1)}\right]    (8.46)


whose inverse Fourier transform (by analogy with (8.41) and (8.43)) is seen to be

f_1(x, y) = A(z_1) \exp[-B(z_1)(x^2 + y^2)]    (8.47)

where we see that

B(z_1) = B_0 \left(1 + \frac{j2B_0 z_1}{k}\right)^{-1}    (8.48)

A(z_1) = A_0 \left(1 + \frac{j2B_0 z_1}{k}\right)^{-1}

Thus the Gaussian beam retains its Gaussian form as it propagates. However, even if A₀ and B₀ are real, the functions A(z₁) and B(z₁) become complex away from z = 0. We need to investigate the physical significance of this. The total field at z is

\psi(x, y, z, t) = A(z) \exp[-B(z)(x^2 + y^2)] \exp(jkz) \exp(-j\omega t)    (8.49)

Consider the width parameter B(z). We can split it into real and imaginary parts and write

B(z) = B_r(z) - jB_i(z)    (8.50)

where the negative sign is used for convenience later. The exponentials in the expression for the total field may be written

\exp[-(B_r - jB_i)(x^2 + y^2)] \exp(jkz) = \exp[-B_r(x^2 + y^2)] \exp[j(kz + B_i(x^2 + y^2))]    (8.51)

From which it is clear that

1. The 1/e amplitude points of the Gaussian are at x² + y² = B_r(z)⁻¹. We thus define the half-width of the beam to be

w(z) = \frac{1}{\sqrt{B_r(z)}}    (8.52)

2. The surfaces of constant phase are where kz + B_i(x² + y²) is a constant, or z = constant − B_i(x² + y²)/k. These wavefronts are curved, indicating that the beam is converging or diverging. The radius of curvature of the wavefront is

R(z) = \frac{k}{2B_i(z)}    (8.53)

If R > 0, this indicates that the beam is diverging, whereas if R < 0, this indicates that the beam is converging.

8.8.1 Gaussian beams in a homogeneous medium

From the way in which B(z) changes with z as the beam propagates, it is possible to determine how the halfwidth w(z) and radius of curvature of the wavefronts R(z) change. Given that

B(z) = B_0 \left(1 + \frac{j2B_0 z}{k}\right)^{-1} = B_0 \left(1 - \frac{j2B_0 z}{k}\right) \Big/ \left(1 + \frac{4B_0^2 z^2}{k^2}\right)


we see from the real part that

\frac{1}{w(z)^2} = B_r(z) = \frac{B_0}{1 + 4B_0^2 z^2/k^2} = \frac{1/w_0^2}{1 + (z/z_0)^2}    (8.54)

or

w(z)^2 = w_0^2 \left[1 + \left(\frac{z}{z_0}\right)^2\right]    (8.55)

where z₀ = ½kw₀². This gives the evolution of the halfwidth, which is readily seen to form a hyperbola. The waist of the beam is at the position where w(z) is a minimum. This occurs when B(z) is purely real. In the example, this occurs at z = 0 and the halfwidth of the waist is w₀.

Similarly, from the imaginary part,

R(z) = \frac{k}{2B_i(z)} = \frac{k}{2}\left[\frac{1 + 4B_0^2 z^2/k^2}{2B_0^2 z/k}\right] = z_0\left[\left(\frac{z_0}{z}\right) + \left(\frac{z}{z_0}\right)\right]    (8.56)

This is the evolution of the radius of curvature of the wavefronts. At the waist (where B(z) is real), the radius of curvature is infinite, indicating that the wavefronts are planes. Far away from the waist, where z ≫ z₀, we find that R(z) ≈ z. Thus the wavefronts become approximately spherical far away from the waist and are centred about the waist. The asymptotic slope of the w(z) graph is w₀/z₀ or 2/(kw₀). The semi-divergence angle of the beam is θ₀ = tan⁻¹(2/kw₀). The minimum radius of curvature of the wavefronts occurs at z = z₀, at which R(z₀) = 2z₀ and w(z₀) = √2 w₀.
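These relations are easy to verify numerically. The sketch below (my own illustration; the wavelength and waist size are arbitrary choices) evolves the width parameter B(z), splits it as B = B_r − jB_i, and checks the closed forms at z = z₀:

```python
import numpy as np

# Evolve B(z) = B0 (1 + 2j B0 z / k)^(-1) and read off the half-width
# w(z) = 1/sqrt(Br) and wavefront curvature R(z) = k/(2 Bi).
lam = 633e-9
k = 2 * np.pi / lam
w0 = 1e-3
B0 = 1 / w0**2
z0 = 0.5 * k * w0**2            # the distance z0 defined in the text

def beam_params(z):
    B = B0 / (1 + 2j * B0 * z / k)
    Br, Bi = B.real, -B.imag     # B = Br - j*Bi as in Eq. (8.50)
    return 1 / np.sqrt(Br), k / (2 * Bi)

w, R = beam_params(z0)
print(np.isclose(w, np.sqrt(2) * w0), np.isclose(R, 2 * z0))   # True True
```

The same function reproduces the hyperbola (8.55) and curvature law (8.56) at any z > 0.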

The evolution of the factor A(z) gives the amplitude and phase shift as the Gaussian beam propagates.

A(z) = A_0 \left(1 + \frac{j2B_0 z}{k}\right)^{-1} = A_0 \left(1 + \frac{jz}{z_0}\right)^{-1}    (8.57)

= \frac{A_0}{\sqrt{1 + (z/z_0)^2}}\ \angle -\tan^{-1}(z/z_0)    (8.58)

From the amplitude term, we see that there are two regimes. When z ≪ z₀, the amplitude remains fairly constant at A₀. In this region the beam is said to be well-collimated. On the other hand, when z ≫ z₀, the amplitude falls as A₀z₀/z, which is characteristic of spherical spreading (inverse square law in the intensity; the amplitude is inversely proportional to distance). The phase shift varies from zero in the well-collimated region to −π/2 in the region of spherical spreading. As the waist becomes narrower (w₀ → 0), z₀ is reduced and θ₀ increases. We get a rapid transition into the region of spherical spreading and a rapid divergence of the beam. The phase shift of −π/2 is consistent with the leading factor of −j in the paraxial diffraction integral. We now see that it arises from the transition from planar wavefronts to spherical wavefronts.


8.9 Tracing Gaussian beams through optical systems

A Gaussian beam travelling through a homogeneous medium along the z direction is completely characterized by three real numbers:

• The position of the waist, i.e., the value z_w at which B(z_w) is real,

• The half-width of the beam at the waist, w₀ = 1/√(B(z_w)),

• The amplitude of the beam at the waist, A₀.

Given these numbers, we can write down A(z), B(z) and the spatial dependence of the electric field for the beam.

When we trace a Gaussian beam through a system, we are usually more interested in the change in the beam width parameter B(z) rather than in A(z). It is possible to devise a catalogue of transformations for various elements in a system. Two simple examples are:

1. Propagation through a homogeneous medium

If the waist of the beam is at zw,

B(z_1) = B_0 \left[1 + \frac{j2B_0(z_1 - z_w)}{k}\right]^{-1}    (8.59)

B(z_2) = B_0 \left[1 + \frac{j2B_0(z_2 - z_w)}{k}\right]^{-1}    (8.60)

From these, we can eliminate both B0 and zw to obtain

B(z_2)^{-1} = B(z_1)^{-1} + \frac{2j}{k}(z_2 - z_1)    (8.61)

This expresses B(z₂) in terms of B(z₁) and may be regarded as the rule for propagation of a Gaussian beam.

2. Passage through a thin convex lens of focal length f

The effect of the lens is to multiply f_z(x, y) by exp[−jk(x² + y²)/(2f)]. For a Gaussian beam,

f_z(x, y) = A(z) \exp[-B(z)(x^2 + y^2)]

So after the lens, we have

A(z) \exp\left[-\left(B(z) + \frac{jk}{2f}\right)(x^2 + y^2)\right]    (8.62)

In terms of the width parameter, a thin convex lens of focal length f turns

B(z_-) \text{ into } B(z_+) = B(z_-) + \frac{jk}{2f}    (8.63)

If we are interested in tracking Gaussian beams through media of different refractive indices, it is preferable to use a quantity proportional to B(z)/k rather than B(z). In


conventional treatments of Gaussian beams, the parameter q = −jk/(2B) is often used. Using this notation, the above relationships for free propagation and the effect of a lens become

q(z2) = q(z1) + (z2 − z1) (8.64)

and

\frac{1}{q(z_+)} = \frac{1}{q(z_-)} - \frac{1}{f} \quad\text{or}\quad q(z_+) = \frac{q(z_-)}{1 - q(z_-)/f}    (8.65)

These transformations of the q parameter take the form of ratios of two linear terms, namely

q_2 = \frac{A_1 q_1 + B_1}{C_1 q_1 + D_1}    (8.66)

If we consider two such transformations in cascade, namely the above together with

q_3 = \frac{A_2 q_2 + B_2}{C_2 q_2 + D_2}    (8.67)

we find that the combined transformation can be written

q_3 = \frac{(A_2 A_1 + B_2 C_1) q_1 + (A_2 B_1 + B_2 D_1)}{(A_1 C_2 + C_1 D_2) q_1 + (B_1 C_2 + D_1 D_2)}    (8.68)

This is also of the form of the ratio of two linear terms. The coefficients of the ratio are identical to the result of multiplying together the two-by-two matrices

\begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix} \begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix} = \begin{pmatrix} A_2 A_1 + B_2 C_1 & A_2 B_1 + B_2 D_1 \\ A_1 C_2 + C_1 D_2 & B_1 C_2 + D_1 D_2 \end{pmatrix}    (8.69)

It is thus possible to carry out the various substitutions required for tracing Gaussian beams by multiplying together matrices. These matrices are in fact identical to the ABCD matrices used in geometrical optics, as may be confirmed by considering the transformations for free propagation and a thin lens given above. It is thus possible to use the matrix techniques for ray optics and finally to interpret the result as applying to Gaussian beams, which includes the effects of diffraction!

Example: A Gaussian beam with half-width w₀ at its waist is incident at its waist on a thin convex lens with focal length f. Find the position and halfwidth of the waist of the output beam.

We simply trace the Gaussian beam through each element in turn, remembering that the waist always occurs when B is real. Just before the convex lens, we have that B = 1/w₀². As a result of passing through the lens,

B = \frac{1}{w_0^2} + \frac{jk}{2f}    (8.70)

Propagating this through free space for a further distance z yields

[B(z)]^{-1} = \left[\frac{1}{w_0^2} + \frac{jk}{2f}\right]^{-1} + \frac{2jz}{k}    (8.71)


The waist occurs where B(z_w) is real, i.e.,

\frac{-jk/(2f)}{w_0^{-4} + k^2/(4f^2)} + \frac{2jz_w}{k} = 0    (8.72)

or

z_w = \frac{k^2 f}{k^2 + 4f^2/w_0^4} = \frac{f}{1 + (f/z_0)^2}    (8.73)

The half-width of the beam at the waist is

[B(z_w)]^{-1/2} = \frac{2f/(kw_0)}{\sqrt{1 + (f/z_0)^2}}    (8.74)

We see that for a small focused spot, we need to use a small value of f and a large value of w₀.
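The same example can be worked with ABCD matrices. The sketch below (an illustration of mine; the wavelength, waist and focal length are arbitrary) traces the q parameter through a thin lens and a stretch of free space, and confirms that the beam reaches a waist at z_w = f/(1 + (f/z₀)²) with the half-width of (8.74):

```python
import numpy as np

# Free propagation by d is [[1, d], [0, 1]]; a thin lens of focal length f
# is [[1, 0], [-1/f, 1]]; q transforms as q' = (A q + B)/(C q + D).
lam, w0, f = 633e-9, 1e-3, 0.1
k = 2 * np.pi / lam
z0 = 0.5 * k * w0**2

def abcd(q, M):
    (A, B), (C, D) = M
    return (A * q + B) / (C * q + D)

# With the text's convention q = -jk/(2B), a waist of half-width w0 has
# q = -j z0 (B = 1/w0^2 is real there).
q = -1j * z0
q = abcd(q, [[1.0, 0.0], [-1.0 / f, 1.0]])   # thin lens, Eq. (8.65)
zw = f / (1 + (f / z0)**2)                    # predicted waist, Eq. (8.73)
q = abcd(q, [[1.0, zw], [0.0, 1.0]])         # free propagation, Eq. (8.64)

w_waist = np.sqrt(2 * abs(q.imag) / k)        # at a waist, q pure imaginary
w_expected = (2 * f / (k * w0)) / np.sqrt(1 + (f / z0)**2)   # Eq. (8.74)
print(abs(q.real) < 1e-9, np.isclose(w_waist, w_expected))   # True True
```

The real part of q vanishing at z = z_w is exactly the statement that B(z_w) is real.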

8.10 One-dimensional propagation in a dispersive medium

Consider a one-dimensional dispersive medium in which the wave velocity c depends on the wave frequency ν. At the position x = 0, we set up a disturbance f(0, t) within the medium and wish to consider how this disturbance propagates through the medium in the positive x direction. The disturbance in the medium (for x > 0) is then denoted f(x, t).

The function f(0, t) may be written as a superposition of complex exponentials in terms of its Fourier transform F(ν). We suppose that

f(0, t) = \int_{-\infty}^{\infty} F(\nu) \exp(j2\pi\nu t)\, d\nu    (8.75)

Each of the complex exponential components has a well-defined frequency and propagates within the medium at a velocity appropriate to that component. Thus the component which varies as exp(j2πνt) at x = 0 propagates towards positive x as the wave exp[j2π(νt − x/λ)].

By linearity, we may write down an expression for the total disturbance

f(x, t) = \int_{-\infty}^{\infty} F(\nu) \exp\left[j2\pi\left(\nu t - \frac{x}{\lambda}\right)\right] d\nu    (8.76)

= \int_{-\infty}^{\infty} F(\nu) \exp(j2\pi\nu t) \exp\left(-\frac{j2\pi x}{\lambda}\right) d\nu    (8.77)

which is just the inverse Fourier transform relationship with each complex exponential replaced by the contribution that this component would make throughout the medium.

We now consider the function λ⁻¹ = ν/c(ν) which appears in the exponential above. In a non-dispersive medium, we expect this simply to be proportional to ν. In a dispersive medium however, the functional dependence will be more complicated. If the function f(0, t) represents a narrowband process containing only frequencies close to ν₀, we can expand λ⁻¹ = ν/c(ν) in a Taylor series around this centre frequency

\frac{1}{\lambda} = \frac{\nu}{c(\nu)} = a + b(\nu - \nu_0) + d(\nu - \nu_0)^2 + \ldots    (8.78)


In the regime of linear dispersion we consider only the first two terms. Substituting this into (8.77) we find

f(x, t) = \int_{-\infty}^{\infty} F(\nu) \exp(j2\pi\nu t) \exp(-j2\pi[a + b(\nu - \nu_0)]x)\, d\nu    (8.79)

= \exp(-j2\pi[a - b\nu_0]x) \int_{-\infty}^{\infty} F(\nu) \exp(j2\pi\nu[t - bx])\, d\nu    (8.80)

= \exp(-j2\pi[a - b\nu_0]x)\, f(0, t - bx)    (8.81)

where we have used the fact that f(0, t) ↔ F(ν) in the last equality. In order to interpret this result physically, let us consider a prototypical narrow-band process around ν₀. This has the form

f(0, t) = m(t) exp(j2πν0t) (8.82)

where m(t) is the baseband complex envelope which has a small bandwidth. The multiplication by exp(j2πν₀t) shifts the spectrum up to be around ν₀.

(Note: In this analysis we work with a complex valued f(0, t) so that we can deal with a narrowband process around the single frequency ν₀. In practice, for a real signal, we need to consider frequencies around both ν₀ and −ν₀. The Taylor series expansion discussed above would only work around ν₀, making it necessary to write another expansion around −ν₀. By throwing away the negative frequencies and working with the analytic signal, we need only consider a single Taylor expansion.)

Substituting (8.82) into (8.79) we find

f(x, t) = \exp(-j2\pi[a - b\nu_0]x)\, m(t - bx) \exp(j2\pi\nu_0[t - bx]) = m(t - bx) \exp(j2\pi[\nu_0 t - ax])    (8.83)

We see that if we observe the result at position x, the envelope m(t) is delayed (relative to the initial excitation) by bx. The velocity at which the envelope moves through the medium is thus 1/b. This is called the group velocity, denoted c_g.

On the other hand, if we consider the high frequency oscillations beneath the envelope, these propagate at ν₀/a. This is called the phase velocity, denoted c_p. Unless ν₀/a = 1/b, the phase velocity and group velocity will be different. Referring back to (8.78), we can write down expressions for a and b in terms of derivatives of ν/c(ν) at ν₀. In particular,

a = \frac{\nu_0}{c(\nu_0)}    (8.84)

b = \frac{d}{d\nu}\left(\frac{\nu}{c(\nu)}\right)\bigg|_{\nu=\nu_0} = \frac{d(1/\lambda)}{d\nu} = \frac{dk}{d\omega}

where ω = 2πν and k = 2π/λ. The phase and group velocities are thus given by

c_p = c(\nu_0) \quad\text{and}\quad c_g = \frac{d\omega}{dk}    (8.85)
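The distinction between envelope and carrier speeds can be demonstrated numerically. The following sketch (my own illustration; the dispersion model and all parameter values are arbitrary) propagates a narrowband pulse using the exact phase factor of (8.77) with 1/λ = a + b(ν − ν₀), and checks that the result matches (8.83): the envelope arrives with the group delay bx while the carrier picks up the phase −2πax:

```python
import numpy as np

# Narrowband Gaussian pulse on a carrier at nu0
N, dt = 4096, 1e-3
t = (np.arange(N) - N // 2) * dt
nu0 = 50.0
m = np.exp(-t**2 / (2 * 0.05**2))                 # slow Gaussian envelope
f0 = m * np.exp(2j * np.pi * nu0 * t)             # f(0, t) = m(t) exp(j 2 pi nu0 t)

# Linear dispersion 1/lambda = a + b (nu - nu0); envelope delay is b*x = 1 s
a, b, x = 100.0, 5e-3, 200.0
nu = np.fft.fftfreq(N, d=dt)
fx = np.fft.ifft(np.fft.fft(f0) * np.exp(-2j * np.pi * (a + b * (nu - nu0)) * x))

delay = t[np.argmax(np.abs(fx))]                  # position of the envelope peak
expected = np.exp(-(t - b * x)**2 / (2 * 0.05**2)) * np.exp(2j * np.pi * (nu0 * t - a * x))
print(delay)                                       # b*x = 1.0
print(np.allclose(fx, expected, atol=1e-8))        # True: agrees with Eq. (8.83)
```

Because the truncated expansion is exactly linear, the envelope propagates without distortion; the quadratic term d(ν − ν₀)² discussed below is what broadens it.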

Exercise: Show that

c_g = c - \lambda \frac{dc}{d\lambda}    (8.86)

Exercise: Calculate the phase velocity and group velocity (as a function of the particle momentum) for the de Broglie wave of a non-relativistic particle of mass m and momentum p which satisfies the relationships

p = \hbar k, \quad E = \hbar\omega \quad\text{and}\quad E = \frac{p^2}{2m}    (8.87)


How do these relate to the classical particle velocity?

Exercise: If we include the quadratic term in equation (8.78), show that the effect of propagating the narrowband signal m(t) exp(j2πν₀t) through a distance x is to give

f(x, t) = (m * q)(t - bx) \exp(j2\pi[\nu_0 t - ax])    (8.88)

where

q(t) = \frac{1}{\sqrt{2jxd}} \exp\left(\frac{j\pi t^2}{2xd}\right)    (8.89)

This is called quadratic dispersion. Analytical results are available if m(t) is a Gaussian pulse, so that the convolution with q is tractable.


9

Energy and Power, Random Signals and Noise

9.1 Energy and Power in Deterministic Signals

Consider a deterministic signal f(t). If we think of f(t) as a voltage, the instantaneous power that it develops across a 1 Ω resistor is

P_f(t) = |f(t)|^2    (9.1)

We shall adopt this as our definition of the instantaneous power in a signal.

9.1.1 Signals of Finite Energy

The total energy in f(t) is then defined as the integral of Pf (t) over all time, i.e.,

E_f = \int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_{-\infty}^{\infty} |F(\nu)|^2\, d\nu    (9.2)

where F(ν) is the Fourier transform of f(t) and we have used Rayleigh's theorem in the last equality.

Signals for which E_f is finite are said to be L² functions or finite energy signals. All finite energy signals vanish as t → ±∞.

We now wish to define the energy contained within a band of frequencies within a signal. For definiteness, consider the band of frequencies ν₁ to ν₂. If we consider an ideal band-pass filter with system function

H(\nu) = \begin{cases} 1 & \text{if } \nu_1 \le \nu \le \nu_2 \\ 0 & \text{otherwise} \end{cases}    (9.3)

we may think of passing f(t) through such a bandpass filter which only lets through frequencies within this frequency range and measuring the total energy in the output g(t) of such a filter. We find

E_g = \int_{-\infty}^{\infty} |G(\nu)|^2\, d\nu = \int_{-\infty}^{\infty} |F(\nu)H(\nu)|^2\, d\nu = \int_{\nu_1}^{\nu_2} |F(\nu)|^2\, d\nu    (9.4)


From this, we see that it makes sense to call |F(ν)|² the energy spectral density of f(t), since its integral over the range ν₁ to ν₂ gives the energy contained in this frequency range.

The inverse Fourier transform of the energy spectral density is

\int_{-\infty}^{\infty} |F(\nu)|^2 \exp(j2\pi\nu\tau)\, d\nu = \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} f(t) \exp(-j2\pi\nu t)\, dt\right)^{*} F(\nu) \exp(j2\pi\nu\tau)\, d\nu

= \int_{-\infty}^{\infty} f^{*}(t) \left(\int_{-\infty}^{\infty} F(\nu) \exp(j2\pi\nu(t + \tau))\, d\nu\right) dt

= \int_{-\infty}^{\infty} f^{*}(t) f(t + \tau)\, dt    (9.5)

This is called the energy auto-correlation function of f(t) and is denoted \phi^e_{ff}(\tau). In general, we define the energy cross-correlation function of two deterministic finite energy signals f(t) and g(t) to be

\phi^e_{fg}(\tau) = \int_{-\infty}^{\infty} f^{*}(t) g(t + \tau)\, dt    (9.6)

When the two functions are the same, this reduces to the energy auto-correlation function

\phi^e_{ff}(\tau) = \int_{-\infty}^{\infty} f^{*}(t) f(t + \tau)\, dt    (9.7)

We have thus shown that the energy spectral density is the Fourier transform of the energy auto-correlation function, i.e.,

\phi^e_{ff}(\tau) \longleftrightarrow \Phi^e_{ff}(\nu) = |F(\nu)|^2    (9.8)

Exercise:

Show that the total energy is given by \phi^e_{ff}(0) and that for all τ, |\phi^e_{ff}(\tau)| \le \phi^e_{ff}(0). (Hint: Use the Cauchy-Schwarz inequality.)
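For sampled data, relation (9.8) can be checked directly; the sketch below (a discrete, circular analogue of my own construction, with an arbitrary random test signal) compares the inverse FFT of the energy spectral density |F|² with the directly computed auto-correlation:

```python
import numpy as np

# Discrete check of Eq. (9.8): ifft(|F|^2) equals the circular
# energy auto-correlation sum_t f*(t) f(t + tau).
rng = np.random.default_rng(0)
N = 256
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

F = np.fft.fft(f)
acf_spectral = np.fft.ifft(np.abs(F)**2)            # via the spectral density

acf_direct = np.array([np.sum(np.conj(f) * np.roll(f, -tau)) for tau in range(N)])

print(np.allclose(acf_spectral, acf_direct))        # True
print(np.argmax(np.abs(acf_direct)))                 # 0: |phi(tau)| <= phi(0)
```

The second printout illustrates the Cauchy-Schwarz property of the exercise: the auto-correlation magnitude peaks at zero lag.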

9.1.2 Signals of Finite Power

Many signals of practical importance have infinite total energy. Periodic signals are simple examples of such signals. The Fourier transforms of signals with infinite energy are not absolutely square integrable and often contain generalized functions. Such signals do not have energy spectral density functions. We define the average power in a signal to be

P_f = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |f(t)|^2\, dt    (9.9)

Signals for which Pf is finite are said to have finite power.

For signals of finite power, we define power auto-correlation and power cross-correlation functions analogously to the energy correlation functions defined above. The power cross-correlation function of two deterministic finite power signals f(t) and g(t) is defined to be

\phi^p_{fg}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f^{*}(t) g(t + \tau)\, dt    (9.10)


similarly the power auto-correlation function of f(t) is

\phi^p_{ff}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f^{*}(t) f(t + \tau)\, dt    (9.11)

From this definition, it is clear that the average power in f is given by \phi^p_{ff}(0).
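A complex sinusoid is the simplest worked example of a finite-power signal. The sketch below (my own illustration; the amplitude, frequency and averaging window are arbitrary) estimates (9.11) over a finite window:

```python
import numpy as np

# For f(t) = A exp(j 2 pi nu0 t), the integrand f*(t) f(t+tau) is exactly
# A^2 exp(j 2 pi nu0 tau), so phi(tau) = A^2 exp(j 2 pi nu0 tau) and the
# average power phi(0) is A^2, even though the total energy is infinite.
A, nu0 = 2.0, 5.0
T_window, dt = 200.0, 1e-3
t = np.arange(0.0, T_window, dt)
f = A * np.exp(2j * np.pi * nu0 * t)

def phi_p(tau):
    # finite-window estimate of the power auto-correlation
    shifted = A * np.exp(2j * np.pi * nu0 * (t + tau))
    return np.mean(np.conj(f) * shifted)

print(phi_p(0.0))     # ≈ 4.0: the average power A^2
print(phi_p(0.05))    # ≈ 4j: A^2 exp(j pi/2), since nu0*tau = 1/4 cycle
```

For this signal the estimate is exact for any window length, because the integrand is independent of t.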

We next want to show that the Fourier transform of the power auto-correlation function can be interpreted as a power spectral density. In order to establish this result, we again consider passing f(t) through an ideal bandpass filter and look at the average power in the filter output. This can be done once we know how a linear filtering operation affects the correlation functions.

9.1.3 Linear Filters and Correlation Functions

Theorem: If a finite power signal f(t) is passed through a filter with impulse response h(t), yielding an output g(t) = (f ∗ h)(t), then

1. The cross-correlation of f and g and its Fourier transform are given by

\phi^p_{fg}(\tau) = (\phi^p_{ff} * h)(\tau)    (9.12)

\Phi^p_{fg}(\nu) = \Phi^p_{ff}(\nu) H(\nu)    (9.13)

2. The auto-correlation of g and its Fourier transform are given by

\phi^p_{gg}(\tau) = (\phi^p_{ff} * h * \bar{h})(\tau)    (9.14)

\Phi^p_{gg}(\nu) = \Phi^p_{ff}(\nu) |H(\nu)|^2    (9.15)

where \bar{h} is defined by \bar{h}(t) = h^{*}(-t).

Proof:

1. From the definition,

\phi^p_{fg}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f^{*}(t) g(t + \tau)\, dt

= \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f^{*}(t) \left(\int_{-\infty}^{\infty} f(t + \tau - u) h(u)\, du\right) dt

= \int_{-\infty}^{\infty} \left(\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} f^{*}(t) f(t + \tau - u)\, dt\right) h(u)\, du

= \int_{-\infty}^{\infty} \phi^p_{ff}(\tau - u) h(u)\, du = (\phi^p_{ff} * h)(\tau)    (9.16)

The Fourier transform of this relationship yields \Phi^p_{fg}(\nu) = \Phi^p_{ff}(\nu) H(\nu).


2. Again from the definition, it is easy to show that

\phi^p_{gg}(\tau) = (\phi^p_{fg} * \bar{h})(\tau)    (9.17)

Substituting the previous result gives \phi^p_{gg}(\tau) = (\phi^p_{ff} * h * \bar{h})(\tau), and taking its Fourier transform gives \Phi^p_{gg}(\nu) = \Phi^p_{ff}(\nu) |H(\nu)|^2, since \bar{h}(\tau) \leftrightarrow H^{*}(\nu).

Note: This theorem also holds for signals of finite energy if all the power correlations φᵖ are replaced by energy correlations φᵉ.
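The discrete (circular, energy-correlation) analogue of part 2 is easy to verify. The sketch below is my own illustration with an arbitrary random signal and impulse response:

```python
import numpy as np

# Check of Eq. (9.15) in discrete form: the spectrum of the output
# auto-correlation is the input one multiplied by |H|^2.
rng = np.random.default_rng(1)
N = 512
f = rng.standard_normal(N)
h = rng.standard_normal(N)

F, H = np.fft.fft(f), np.fft.fft(h)
g = np.fft.ifft(F * H)                       # g = f circularly convolved with h

def acf_spectrum(x):
    # FFT of the circular auto-correlation of x (cf. Eq. (9.8))
    return np.abs(np.fft.fft(x))**2

print(np.allclose(acf_spectrum(g), acf_spectrum(f) * np.abs(H)**2))  # True
```

Since the FFT of g is F·H, the identity |FFT(g)|² = |F|²|H|² is exact; the numerical check confirms there are no stray normalization factors.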

9.1.4 The Power Spectral Density

Let us now return to the situation of passing a finite power signal f(t) through an ideal bandpass filter with system function H(ν) as defined in (9.3). As usual, we denote the filter impulse response by h(t) and the filter output by g(t) = (f ∗ h)(t). The average power in g(t) is given by \phi^p_{gg}(0). Evaluating this in terms of its Fourier transform,

\phi^p_{gg}(0) = \int_{-\infty}^{\infty} \Phi^p_{gg}(\nu)\, d\nu

= \int_{-\infty}^{\infty} \Phi^p_{ff}(\nu) |H(\nu)|^2\, d\nu \quad \text{(by the above theorem)}

= \int_{\nu_1}^{\nu_2} \Phi^p_{ff}(\nu)\, d\nu    (9.18)

Thus the average power in the output is just the integral of \Phi^p_{ff}(\nu) in the range ν₁ to ν₂. This justifies calling \Phi^p_{ff}(\nu) the power spectral density of f(t).

This leads to the following important general rule

The energy (power) spectral density is the Fourier transform of the energy (power) auto-correlation function.

This result is known as the Wiener-Khinchin theorem. It also applies to signals with a random component, as will be discussed later.

9.2 Stochastic Processes

We now consider signals which are non-deterministic. These are useful in many applications, for example in describing noise in electrical systems, the random motion of a particle undergoing Brownian motion, the concentration of a certain chemical during the course of a reaction, the motion of particles as they diffuse, the intensity of light from a partially coherent source, etc. These are generally called stochastic processes and are scalar or vector valued quantities which evolve in time. Unlike a deterministic process, for which there is a definite value for the quantity at each instant of time, a stochastic process can only be described statistically. This is because we do not want a stochastic process just to describe a particular situation which occurs when an experiment is performed once; rather, it should give information about the possible range of outcomes in many realizations of the same experiment.


9.2.1 The Concept of an Ensemble

Let us consider how we might describe the Johnson noise voltage generated in a resistor which is maintained at some nonzero temperature T. At a given instant of time, say at t = 0, a given resistor will have a certain voltage across it, but this value is not predictable. What we really want to give is a probabilistic description of the possible voltages across the resistor at this time. To do this, it is useful to imagine a whole collection of resistors, all maintained under identical conditions, and to look at the probability density formed by considering all their voltages at the same time t = 0. This collection is called an ensemble. The probability density functions that we talk about in stochastic processes always refer to probabilities defined over such an ensemble. It may be useful to imagine a whole collection of parallel universes in which a physical situation is set up under the same conditions (i.e., identical to within the constraints specified by the problem) and to regard the ensemble as being composed of these “copies” of the system of interest. When we focus on a system in a particular universe, the process that it undergoes is said to be a single realization of the stochastic process.

Once we think about an ensemble of “potential realities”, it makes sense to regard the voltage at a given time, say x(0), as being a random variable. (From now on, we shall discontinue the convention of using capital letters to distinguish random variables from the values that they can take.) We can then consider the probability density function for x(0) and statistics such as the mean and variance of this random variable. The notation p(x, 0) is often used for the probability density of x at time t = 0. The mean and variance of x(0) are then given by

$$\mathrm{E}[x(0)] = \int_{-\infty}^{\infty} x\, p(x,0)\, dx \tag{9.19}$$

$$\mathrm{Var}[x(0)] = \int_{-\infty}^{\infty} \left(x - \mathrm{E}[x(0)]\right)^2 p(x,0)\, dx \tag{9.20}$$

$$= \int_{-\infty}^{\infty} x^2\, p(x,0)\, dx - \left(\int_{-\infty}^{\infty} x\, p(x,0)\, dx\right)^2 \tag{9.21}$$

The expectation value symbol E[·] is usually used to refer to an ensemble average. In a similar way we can define higher order moments of the random variable.
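As a quick numerical sketch (in Python rather than the MATLAB used later in the chapter, with illustrative parameters), the ensemble averages above can be approximated by averaging over a large number of simulated realizations of the voltage at t = 0:

```python
import random

random.seed(0)

# Hypothetical ensemble: 10000 "parallel" resistors, each giving one
# Gaussian voltage sample at t = 0 (mean 0, standard deviation 2 are
# illustrative choices, not values from the text).
ensemble = [random.gauss(0.0, 2.0) for _ in range(10000)]

# Ensemble mean E[x(0)] and variance Var[x(0)], approximating the
# integrals in (9.19)-(9.21) by averages over realizations.
mean = sum(ensemble) / len(ensemble)
var = sum((v - mean) ** 2 for v in ensemble) / len(ensemble)

print(round(mean, 2), round(var, 1))  # close to the true values 0 and 4
```

The averages converge to the true mean and variance as the ensemble size grows, in line with the definitions over an ensemble rather than over time.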

9.2.2 Characterizing a Stochastic Process

In general, characterizing a stochastic process is quite complicated. We have seen that at each instant of time, the value of the stochastic process is a random variable. We thus need an infinite number of probability densities p(x, t) (one for each t) just to describe the process at each instant of time. Further thought shows that even this infinite collection is not enough to fully describe the process. For example, we may be interested in the joint probability density for the voltage at two times t1 and t2. This will be specified by a probability density of the form p(x1, t1; x2, t2). These two-time probability densities tell us whether knowledge about the process at time t1 gives us any knowledge about the process at time t2. They form a doubly infinite set of joint probability densities. Similarly, we may consider the joint probability density for the values of the process at n distinct times. All of these functions for all values of n are usually assumed to be sufficient to characterize the process. Processes for which this is possible are said to be separable.


The need to specify such a large number of probability density functions to characterize a general stochastic process means that such processes are seldom studied. Instead, special cases are considered in which all the joint probability functions can be calculated in terms of simpler quantities. As an example, if the values of the process at different times are known to be independent, all joint probability densities factorize into a product of single-time probability densities.

9.2.3 Mean, auto-correlation and auto-covariance functions of a stochastic process

Corresponding to the moments of a random variable, we define correlation functions of a stochastic process. These functions are all defined in terms of ensemble averages over all possible realizations of the process. The mean of the process is

$$\mu_x(t) = \mathrm{E}[x(t)] = \int_{-\infty}^{\infty} x\, p(x,t)\, dx \tag{9.22}$$

Note that there is a mean for each t since we are not averaging over time but across the ensemble for each time.

The two-time auto-correlation function is

$$\phi_{xx}(t_1,t_2) = \mathrm{E}[x(t_1)\,x(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, p(x_1,t_1;x_2,t_2)\, dx_1\, dx_2 \tag{9.23}$$

and the two-time auto-covariance function is

γxx(t1, t2) = E [(x (t1)− µx (t1)) (x (t2)− µx (t2))] (9.24)

Note: For complex-valued stochastic processes, we take the complex conjugate of the value at t1 in the above definitions.

Similarly, the n-time auto-correlation function is

$$\phi_{xx\ldots x}(t_1,t_2,\ldots,t_n) = \mathrm{E}[x(t_1)\,x(t_2)\cdots x(t_n)] \tag{9.25}$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_1 x_2 \cdots x_n\, p(x_1,t_1;x_2,t_2;\ldots;x_n,t_n)\, dx_1\, dx_2 \ldots dx_n$$

and the n-time auto-covariance function is

γxx...x(t1, t2, ..., tn) = E [(x (t1)− µx (t1)) (x (t2)− µx (t2)) . . . (x (tn)− µx (tn))] (9.26)

The collection of the mean and all the auto-correlation functions is usually assumed to completely characterize the stochastic process. Note that all these functions are deterministic even though they are calculated from the stochastic process.

9.2.4 Markov Processes

In physics, an important class of stochastic processes are the Markov processes. In a Markov process, the conditional probability for any event in the future given a set of values at times


t1 > t2 > ... > tn is the same as the conditional probability of that event given the value at the most recent time t1. For example,

p(x, t|x1, t1;x2, t2; ...;xn, tn) = p(x, t|x1, t1) (9.27)

where t > t1 > ... > tn. In other words, knowledge of the process at the set of times t1, t2, ..., tn gives us no more useful information for predicting the future evolution of the process than knowledge of the process at the most recent time t1.

As a simple example of a Markov process, consider counting the number of particles that a radioactive material has produced from some time origin up to a given time t. Since the probability of decay per unit time is constant, our ability to predict the number of decays up to time t, given that there have been n decays up to time t1 < t, is completely independent of the history of decays up to t1. On the other hand, if the decays occur with a constant delay between successive events, this is not a Markov process. Given that there have been n decays up to time t1, we would predict a different number of decays by time t depending on exactly when the last decay occurred.
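This memoryless behaviour can be checked numerically. The Python sketch below (rate and interval choices are illustrative) simulates constant-rate decays via exponential waiting times and estimates the covariance between the counts in two disjoint intervals, which should be near zero:

```python
import random

random.seed(1)

def decay_counts(rate, t_split, t_end):
    """Count decays in [0, t_split) and [t_split, t_end) for one realization,
    using exponential waiting times (constant decay probability per unit time)."""
    t, first, second = 0.0, 0, 0
    while True:
        t += random.expovariate(rate)
        if t >= t_end:
            return first, second
        if t < t_split:
            first += 1
        else:
            second += 1

runs = [decay_counts(5.0, 1.0, 2.0) for _ in range(5000)]
a = [r[0] for r in runs]
b = [r[1] for r in runs]
ma, mb = sum(a) / len(a), sum(b) / len(b)
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# For this Markov (Poisson) process the past count carries no extra
# information: counts in disjoint intervals are uncorrelated.
print(round(cov, 2))
```

For the delayed-decay counterexample in the text, the corresponding covariance would not vanish, since the timing of the last event carries predictive information.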

For a Markov process, arbitrary time-ordered joint probabilities can be evaluated as a product of transition probabilities. For example, if t1 > t2 > t3,

p(x1, t1;x2, t2;x3, t3) = p(x1, t1|x2, t2;x3, t3) p(x2, t2|x3, t3) p(x3, t3)

= p(x1, t1|x2, t2) p(x2, t2|x3, t3) p(x3, t3) (9.28)

Thus a complete description of a Markov process involves the specification of an initial condition p(x0, t0) together with its transition probability density function p(x, t|x′, t′).

The transition probability density function for a Markov process cannot be specified completely arbitrarily but must satisfy a consistency requirement called the Chapman-Kolmogorov equation

$$p(x_1,t_1|x_3,t_3) = \int_{-\infty}^{\infty} p(x_1,t_1|x_2,t_2)\, p(x_2,t_2|x_3,t_3)\, dx_2 \tag{9.29}$$

for any t2 satisfying t1 > t2 > t3. The Chapman-Kolmogorov equation can be written as a differential equation for p(x, t|x′, t′). Special cases of this differential equation are known as the master equation, the Liouville equation and the Fokker-Planck equation, which are of great importance in the theory of stochastic processes (see for example Handbook of Stochastic Methods by C.W. Gardiner).
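For a Markov chain on a finite state space the Chapman-Kolmogorov integral reduces to a sum over the intermediate state, i.e. to matrix multiplication of transition matrices. A minimal Python sketch with a hypothetical two-state chain:

```python
# Hypothetical one-step transition probabilities P[i][j] = p(j after one
# step | i now); rows sum to 1. Discrete analogue of the Chapman-
# Kolmogorov equation (9.29): the two-step transition matrix is P @ P,
# summing over the intermediate state x2.
P = [[0.9, 0.1],
     [0.3, 0.7]]

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P2 = matmul(P, P)
print(P2)  # two-step transition probabilities; rows still sum to 1
```

The same composition rule iterated n times gives the n-step transition probabilities, which is why the one-step transition density (plus an initial condition) suffices to describe the process.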

9.2.5 Stationary, Wide-sense Stationary and Ergodic Processes

Another way in which a general stochastic process may be simplified is if its statistics are independent of the choice of the time origin. Such a process is said to be stationary. Instead of an infinity of one-time probability densities p(x, t), one for each t, a stationary process has a single probability density p(x). The mean of a stationary random process is a single value rather than a function of time. Similarly, the n-time joint probability density satisfies

p(x1, t1;x2, t2; ...;xn, tn) = p(x1, t1 + u;x2, t2 + u; ...;xn, tn + u) (9.30)


for any u. The two-time auto-correlation and auto-covariance functions now depend on the time difference only, and we may write, for example,

$$\phi_{xx}(\tau) = \mathrm{E}[x^*(t)\,x(t+\tau)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^* x_2\, p(x_1,t;x_2,t+\tau)\, dx_1\, dx_2 \tag{9.31}$$

where the result is independent of t due to the stationarity of the process.

In order for a stochastic process to be stationary, the auto-correlation functions of all orders must be independent of the time origin. If only the mean and the two-time auto-correlation function have this property, the process is said to be wide-sense stationary. In practice, it is usually only feasible to verify wide-sense stationarity.

Exercise: Show that a wide-sense stationary Markov process is in fact stationary.

For a stationary process, it is sometimes possible that ensemble averages can be replaced by time averages over a single realization of the process. For example, calculating the mean of a single realization by averaging over time may give the same answer as an ensemble average. Similarly, we may try to use the definition of the power auto-correlation (9.11) for a single realization as a way of calculating the stationary auto-correlation (9.31). Processes for which this is possible are said to be ergodic. All ergodic processes are stationary but the converse is not always true. For many physical processes, we make the assumption that the process is ergodic despite the fact that it may not be possible to prove that this holds. As a simple example of a non-ergodic process, consider an ensemble consisting of a collection of batteries and the process being the battery voltage. Measurements on a single battery as a function of time give no information about the statistical properties of the ensemble.
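The distinction can be illustrated with a short Python sketch (all parameters are hypothetical): for i.i.d. noise the time average over one long realization recovers the ensemble mean, while for the battery ensemble it only recovers that one battery's voltage.

```python
import random

random.seed(2)

# Ergodic example: i.i.d. zero-mean noise. The time average over one
# long realization approaches the ensemble mean (here 0).
one_path = [random.gauss(0.0, 1.0) for _ in range(50000)]
time_avg = sum(one_path) / len(one_path)

# Non-ergodic example (the "battery" ensemble): each realization is a
# constant voltage drawn once and held forever. A time average over a
# single battery returns that battery's voltage, not the ensemble
# mean of 12 V (numbers are illustrative).
battery = random.gauss(12.0, 0.5)
battery_time_avg = sum(battery for _ in range(1000)) / 1000

print(round(time_avg, 2), round(battery_time_avg, 2))
```

The battery process is stationary (its statistics do not depend on the time origin) yet not ergodic, which is exactly the asymmetry stated in the text.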

9.3 Power Spectrum of a Stationary Stochastic Process

For a stationary stochastic process x(t), we define the average power by

$$P_x = \mathrm{E}\!\left[x(t)^2\right] \tag{9.32}$$

Despite the fact that the right-hand side contains the variable t, it is in fact independent of t by stationarity. We similarly define the (stationary two-time) auto-correlation and cross-correlation functions by

φxx(τ) = E [x∗ (t)x (t+ τ)] (9.33)

φxy(τ) = E [x∗ (t) y (t+ τ)] (9.34)

Both are independent of t. It should be emphasized again that these correlation functions are deterministic and are a property of the whole ensemble; they do not vary from one realization to another.


It is easy to see that if we pass a stationary stochastic process x(t) through a filter with impulse response h(t), yielding y(t) = (x ∗ h)(t), the two-time cross-correlation of x and y is

$$\begin{aligned}
\phi_{xy}(t_1,t_2) &= \mathrm{E}[x^*(t_1)\,y(t_2)] \\
&= \mathrm{E}\!\left[x^*(t_1)\int_{-\infty}^{\infty} h(\tau)\,x(t_2-\tau)\, d\tau\right] \\
&= \int_{-\infty}^{\infty} h(\tau)\,\mathrm{E}[x^*(t_1)\,x(t_2-\tau)]\, d\tau \\
&= \int_{-\infty}^{\infty} h(\tau)\,\phi_{xx}(t_2-t_1-\tau)\, d\tau = (\phi_{xx}*h)(t_2-t_1)
\end{aligned} \tag{9.35}$$

where we adopt the convention that a two-time correlation function with only one argument indicates an explicitly stationary correlation. Since φxy(t1, t2) is a function of t2 − t1 alone, it is in fact a stationary correlation and we can write

$$\phi_{xy}(\tau) = (\phi_{xx}*h)(\tau) \tag{9.36}$$

Similarly

$$\phi_{yy}(\tau) = (\phi_{xy}*\tilde{h})(\tau) = (\phi_{xx}*h*\tilde{h})(\tau) \tag{9.37}$$

where $\tilde{h}(t) = h^*(-t)$. Following the same line of argument as was presented for power correlation functions of deterministic signals, it should be clear that the Fourier transform Φxx(ν) of the stationary auto-correlation function φxx(τ) is the power spectral density of the stochastic process x(t).

It is important to notice that the power spectrum of a stochastic process depends on the correlation between the noise values at different times and not on the probability density function of the noise at a given time. In particular, it is not necessary that noise have a Gaussian probability density at each time, although Gaussian noise is probably the most common. If Gaussian noise is passed through any linear filter, the output is still Gaussian noise, but in general the spectrum of the output will be different from that of the input.

9.3.1 White Noise

A stationary noise process is said to be white if its power spectral density is constant. If the power per unit bandwidth is N0 W Hz−1, this corresponds to having

$$\Phi_{xx}(\nu) = \frac{N_0}{2} \tag{9.38}$$

The factor of two is present because a range of frequencies fa to fb over which the noise power is measured corresponds to the union of the intervals [fa, fb] and [−fb, −fa] in the two-sided power spectral density.

We see that the autocorrelation function of white noise is

$$\phi_{xx}(\tau) = \frac{N_0}{2}\,\delta(\tau) \tag{9.39}$$


Assuming that the process has zero mean, this shows that the values of the process at any two distinct times are uncorrelated. This represents an idealization since it implies that the instantaneous power φxx(0) is infinite, so that the value of the process at each time can be arbitrarily large. This should not be too surprising since a flat power spectral density corresponds to infinite power when integrated over all frequencies.

In practice the power spectral density always tends to zero for sufficiently high frequencies. The model of white noise is useful if we are only interested in a frequency range over which the power spectral density is approximately constant.

Sampling a white noise process directly is not meaningful since the variance of each sample is infinite. For theoretical purposes, if we sample with a time interval of T, it is convenient to think of prefiltering the process using an ideal lowpass filter which cuts off at the Nyquist frequency 1/(2T). It can be shown that if zero-mean white noise with Φxx(ν) = N0/2 is fed into such a filter, the samples taken at these times are uncorrelated random variables with zero mean and variance N0/(2T). In particular, if the input noise is Gaussian, the sample values are independent identically distributed Gaussian random variables.

Exercise: By considering the impulse response of the ideal low-pass filter and the autocorrelation of the output of the filter at lags kT for integer values of k, confirm that the samples are uncorrelated and have the variance stated above.

9.4 Signals in Noise

We are often interested in observing known signal pulses embedded within a stationary noise process. If x(t) is the deterministic signal and n(t) represents a realization of a noise process, the received signal y(t) satisfies

y(t) = x(t) + n(t) (9.40)

There are many ways of defining the signal to noise ratio (SNR) depending on the application, but for the purposes of signal detectability (i.e., seeing the signal within the noise), a convenient definition is

$$\mathrm{SNR} = \frac{\text{Peak signal power}}{\text{Total noise power}} \tag{9.41}$$

For example, if x(t) is a toneburst with amplitude A, the peak signal power is A², and if the noise power spectral density is Φnn(ν), the total noise power is its integral over all frequencies of interest.

9.4.1 Matched filtering

We wish to investigate what is the “best” filter to pass y(t) through if we wish to optimize the detectability of the (known) signal x(t) in the midst of noise n(t) of known power spectral density Φnn(ν). We shall assume that we are not interested in preserving the shape of x(t) but wish to maximize the signal to noise ratio as defined above.

Let us suppose that the filter through which y(t) is passed has impulse response h(t). Using the usual convention that the function denoted by an upper-case letter refers to the Fourier


transform of the function denoted by the lower-case letter, the filter output g(t) is given by

$$g(t) = \int_{-\infty}^{\infty} H(\nu)\,Y(\nu)\exp(j2\pi\nu t)\, d\nu \tag{9.42}$$

We shall design the filter to give a maximum output at some time t = t0. Since y is composed of signal x and noise n, we consider the effects on each of these separately. The peak signal power in the output is given by the peak of the filtered signal, i.e.,

$$\text{Peak signal power} = \left|\int_{-\infty}^{\infty} H(\nu)\,X(\nu)\exp(j2\pi\nu t_0)\, d\nu\right|^2 \tag{9.43}$$

The filtered noise has power spectral density Φnn(ν)|H(ν)|². The total noise power in the output is

$$\text{Total noise power} = \int_{-\infty}^{\infty} \Phi_{nn}(\nu)\,|H(\nu)|^2\, d\nu \tag{9.44}$$

Thus the signal to noise ratio which we wish to maximize is

SNR =

∫∞−∞H (ν)X (ν) exp (j2πνt0) dν

∫∞−∞ Φnn (ν) |H (ν)|2 dν

(9.45)

To carry out this maximization, recall the Cauchy-Schwarz inequality, which states that for any two functions U(ν) and V(ν),

$$\left|\int_{-\infty}^{\infty} U(\nu)V(\nu)\, d\nu\right|^2 \le \int_{-\infty}^{\infty}|U(\nu)|^2\, d\nu \int_{-\infty}^{\infty}|V(\nu)|^2\, d\nu \tag{9.46}$$

where equality holds if and only if U(ν) = kV ∗(ν) where k is a constant. If we identify

$$U(\nu) = \sqrt{\Phi_{nn}(\nu)}\,H(\nu) \tag{9.47}$$

$$V(\nu) = \frac{X(\nu)\exp(j2\pi\nu t_0)}{\sqrt{\Phi_{nn}(\nu)}} \tag{9.48}$$

and substitute into the Cauchy-Schwarz inequality, we find after minor rearranging that

$$\mathrm{SNR} = \frac{\left|\int_{-\infty}^{\infty} H(\nu)X(\nu)\exp(j2\pi\nu t_0)\, d\nu\right|^2}{\int_{-\infty}^{\infty}\Phi_{nn}(\nu)|H(\nu)|^2\, d\nu} \le \int_{-\infty}^{\infty} \frac{|X(\nu)|^2}{\Phi_{nn}(\nu)}\, d\nu \tag{9.49}$$

We notice that the right hand side is independent of H(ν) and gives an upper bound for the attainable signal to noise ratio. The signal to noise ratio is maximized when equality holds. This happens if

$$\sqrt{\Phi_{nn}(\nu)}\,H(\nu) = k\left[\frac{X(\nu)\exp(j2\pi\nu t_0)}{\sqrt{\Phi_{nn}(\nu)}}\right]^* \tag{9.50}$$

or

$$H(\nu) = \frac{k\,X^*(\nu)\exp(-j2\pi\nu t_0)}{\Phi_{nn}(\nu)} \tag{9.51}$$

The maximum signal to noise ratio is then given by

$$(\mathrm{SNR})_{\max} = \int_{-\infty}^{\infty} \frac{|X(\nu)|^2}{\Phi_{nn}(\nu)}\, d\nu \tag{9.52}$$


Equations (9.51) and (9.52) define the Middleton-North matched filter, which is of importance in the design of signal detection systems (e.g., radar and sonar). Notice how the amplitude of the frequency response of the filter |H(ν)| is large where |X(ν)| is large and Φnn(ν) is small. Intuitively, the filter lets through the signal and rejects the noise as well as it can.

Now consider the special case of white stationary noise for which Φnn(ν) = N0/2. In this case we find that

$$H(\nu) = \frac{2k}{N_0}\,X^*(\nu)\exp(-j2\pi\nu t_0) \tag{9.53}$$

and

$$(\mathrm{SNR})_{\max} = \frac{2}{N_0}\int_{-\infty}^{\infty}|X(\nu)|^2\, d\nu \tag{9.54}$$

1. We see that the optimum signal to noise ratio is proportional to the signal energy and inversely proportional to the noise power spectral density.

2. The filter impulse response is the inverse Fourier transform of H(ν). We see that

$$h(t) = \frac{2k}{N_0}\,x^*(t_0 - t) \tag{9.55}$$

This is just the signal x(t) conjugated, reversed in time and delayed by t0. The value of k is arbitrary since it only affects the overall amplitude and not the signal to noise ratio.

3. As a result of passing the signal component through the matched filter, its shape is changed. In fact

$$(x*h)(t) = \frac{2k}{N_0}\left(x(t)*x^*(t_0-t)\right) = \frac{2k}{N_0}\,\phi^{e}_{xx}(t-t_0) \tag{9.56}$$

Thus in the absence of noise, the output of the matched filter is a scaled and delayed version of the energy autocorrelation function of the signal. The peak of this is at t0.

4. The matched filter can be implemented digitally using a shift register as a delay line to carry out the convolution.

When choosing a signal for matched filtering, it is usually best to find one with a highly peaked autocorrelation. If the time of arrival of the signal is not known (e.g., an echo from a target), the peak of the matched filter output may be used to give an estimate. A signal which is often used is a chirp, which has a linearly swept carrier frequency

ν = ν0 + µt (9.57)

The phase is the integral of the frequency with respect to time, and so the form of the chirp could be

$$x(t) = \sin\!\left[2\pi\left(\nu_0 t + \tfrac{1}{2}\mu t^2\right)\right] \tag{9.58}$$

The following MATLAB code illustrates matched filtering with a chirp signal. The function filter carries out the discrete convolution using a digital filter.


%
% MATCHED.M illustrates matched filtering with the Middleton-North filter
%
t = linspace(0,1,256);
s = sin(2*pi*(2*t+5*t.^2));      % Set up chirp
%
tbase = [0:1023]*(t(2)-t(1));
x = zeros(size(tbase));
x(1:length(s)) = s;              % Put chirp at start of frame
h = conj(s(length(s):-1:1));     % Impulse response of matched filter
%
y = x + randn(size(x));          % Corrupt signal with uncorrelated noise
%
% Show result with no noise
%
g = filter(h,1,x);               % Matched filter
subplot(4,1,1); plot(tbase,x);   % Clean signal
title('Noise-free signal');
subplot(4,1,2); plot(tbase,g);   % Matched filter output
title('Matched filtered output');
%
% Matched filtering of noisy signal
%
g = filter(h,1,y);               % Matched filter
subplot(4,1,3); plot(tbase,y);   % Noisy signal
title('Noisy signal');
subplot(4,1,4); plot(tbase,g);   % Matched filter output
title('Matched filtered output');

The results of this program are shown in Figure 9.1. The signal is clearly seen in the midst of the noise after applying the matched filter.

9.4.2 Coherent Averaging

A very simple yet powerful technique for improving the signal-to-noise ratio when investigating the response of a time-invariant system to a signal is the process of coherent averaging. A repetitive signal is sent in time frames and the responses are added together coherently.

Suppose that the peak signal amplitude is A and the noise power is σ². The signal to noise ratio for one frame is then

$$(\mathrm{SNR})_{1\ \mathrm{frame}} = A^2/\sigma^2 \tag{9.59}$$

As a result of adding together L frames, the signal amplitudes add because they are coherent, whereas the noise powers add because they are incoherent. The signal to noise ratio after L frames is thus

$$(\mathrm{SNR})_{L\ \mathrm{frames}} = (LA)^2/(L\sigma^2) = L\times(\mathrm{SNR})_{1\ \mathrm{frame}} \tag{9.60}$$
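The L-fold improvement can be demonstrated numerically. In the following Python sketch (frame length, signal shape and noise level are all illustrative), averaging L noisy frames leaves a residual noise variance of σ²/L:

```python
import math
import random

random.seed(4)

frame_len, L = 64, 100
sigma = 1.0
# Hypothetical repetitive signal in one frame, amplitude A = 1.
signal = [math.sin(2 * math.pi * n / frame_len) for n in range(frame_len)]

# Coherently add L noisy frames: signal amplitudes add (factor L),
# noise powers add (factor L), so the SNR grows by a factor of L.
acc = [0.0] * frame_len
for _ in range(L):
    for n in range(frame_len):
        acc[n] += signal[n] + random.gauss(0.0, sigma)

# After dividing by L, the residual noise has variance sigma^2 / L.
avg = [a / L for a in acc]
resid = [avg[n] - signal[n] for n in range(frame_len)]
noise_var = sum(r * r for r in resid) / frame_len
print(round(noise_var, 3))  # close to sigma^2 / L
```

Equivalently, the averaged frame's SNR is L times that of a single frame, as in (9.60).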


[Four panels, from top to bottom: Noise-free signal; Matched filtered output; Noisy signal; Matched filtered output.]

Figure 9.1: Middleton-North matched filter

If L can be made large (this usually depends on the timescale on which the system may be regarded as being time-independent), we can get substantial enhancement in the signal to noise ratio.

The ratio of signal to noise energies in a time frame can be estimated by a process called double-period averaging. If two frames are of duration T and contain y1(t) and y2(t), we find the energies

$$E_+ = \int_0^T |y_1(t) + y_2(t)|^2\, dt \tag{9.61}$$

$$E_- = \int_0^T |y_1(t) - y_2(t)|^2\, dt \tag{9.62}$$

We assume that the contents of the frames can be written as a deterministic signal together with noise which is different between the frames, i.e.,

y1(t) = x(t) + n1(t) and y2(t) = x(t) + n2(t) (9.63)


Then if Es denotes the signal energy and En the expected noise energy, we see that the expected values of E+ and E− are

E+ = 4Es + 2En and E− = 2En (9.64)

Hence

$$\frac{E_s}{E_n} = \frac{1}{2}\left(\frac{E_+}{E_-} - 1\right) \tag{9.65}$$

This may then be used to estimate the signal to noise ratio.
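A numerical sketch of double-period averaging in Python (signal, noise level and frame length are illustrative) shows the estimate (9.65) recovering the true energy ratio:

```python
import math
import random

random.seed(5)

T = 256
sigma = 0.5
# Hypothetical frame signal: three cycles of a sinusoid per frame.
x = [math.sin(2 * math.pi * 3 * t / T) for t in range(T)]
y1 = [x[t] + random.gauss(0.0, sigma) for t in range(T)]
y2 = [x[t] + random.gauss(0.0, sigma) for t in range(T)]

# Discrete versions of (9.61), (9.62) and the estimate (9.65).
E_plus = sum((a + b) ** 2 for a, b in zip(y1, y2))
E_minus = sum((a - b) ** 2 for a, b in zip(y1, y2))
snr_est = 0.5 * (E_plus / E_minus - 1)

# True ratio of signal to noise energies in one frame, for comparison.
true_ratio = sum(v * v for v in x) / (T * sigma ** 2)
print(round(snr_est, 2), round(true_ratio, 2))
```

With more frame pairs averaged, the estimate fluctuates less about the true ratio, since E+ and E− are themselves noisy single-shot energy measurements.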
