

Discrete sines, cosines, and complex exponentials

Alejandro Ribeiro

January 15, 2015

Sines, cosines, and complex exponentials play a very important role in signal and information processing. The purpose of this lab is to gain some experience and intuition on how these signals look and behave. The signals we consider here are discrete because they are indexed by a finite integer time index n = 0, 1, ..., N − 1. The constant N is referred to as the length of the signal. Start by considering a separate integer number k to define the discrete complex exponential e_kN(n) of discrete frequency k and duration N as

e_{kN}(n) = \frac{1}{\sqrt{N}}\, e^{j2\pi kn/N} = \frac{1}{\sqrt{N}} \exp(j2\pi kn/N).   (1)

The (regular) complex exponential is defined as e^{j2\pi kn/N} = \cos(2\pi kn/N) + j \sin(2\pi kn/N), so that if we compute the real and imaginary parts of e_kN(n) we have that

\mathrm{Re}\big(e_{kN}(n)\big) = \frac{1}{\sqrt{N}} \cos(2\pi kn/N), \qquad \mathrm{Im}\big(e_{kN}(n)\big) = \frac{1}{\sqrt{N}} \sin(2\pi kn/N).   (2)

We say that the real part of the complex exponential is a discrete cosine of discrete frequency k and duration N and that the imaginary part is a discrete sine of discrete frequency k and duration N. The discrete frequency k in (2) determines the number of oscillations that we see in the N elements of the signal. A sine, cosine, or complex exponential of discrete frequency k has a total of k complete oscillations in the N samples.

Mathematically speaking, the complex exponential, the sine, and the cosine are all different signals. Intuitively speaking, all of them are oscillations of the same frequency. Since complex exponentials have imaginary


parts, they don't exist in the real world. Nevertheless, we work with them instead of sines and cosines because they are easier to handle.

1 Signal generation

Let us begin by generating and displaying some complex exponentials, and by using the generated signals to explore some important properties that these signals have.

1.1 Generate complex exponentials. Write a Matlab function that takes as input the frequency k and the signal duration N and returns three vectors with N components containing the elements of the signal e_kN(n) defined in (1), as well as its real and imaginary parts [cf. (2)]. Plot the real and imaginary components for N = 32 and different values of k. Observe that some of these signals don't look much like oscillations. In your report, show the plots for k = 0, k = 2, k = 9, and k = 16.
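The lab asks for a Matlab function; as an illustrative sketch, an equivalent generator in Python/NumPy (the function name `cexp` is my own choice):

```python
import numpy as np

def cexp(k, N):
    """Discrete complex exponential e_kN(n) = (1/sqrt(N)) exp(j 2 pi k n / N),
    n = 0, ..., N-1. Returns the signal and its real and imaginary parts."""
    n = np.arange(N)
    e = np.exp(2j * np.pi * k * n / N) / np.sqrt(N)
    return e, e.real, e.imag

# For k = 0 the "oscillation" is a constant; for k = 16 = N/2 it alternates sign.
e, re, im = cexp(2, 32)
```

Plotting `re` and `im` against n (e.g., with matplotlib's `stem`) reproduces the figures the report asks for.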

1.2 Equivalent complex exponentials. Use the code in Part 1.1 to generate complex exponentials of the same duration and frequencies k and l that are N apart. E.g., make N = 32 and plot the signals for frequencies k = 3, k = 3 + 32 = 35, and k = 3 − 32 = −29. You should observe that these signals are identical.

1.3 Conjugate complex exponentials. Use the code in Part 1.1 to generate complex exponentials of the same duration and opposite frequencies k and −k. E.g., make N = 32 and plot the signals for frequencies k = 3 and k = −3. You should observe that these signals have the same real part and opposite imaginary parts. We say that the signals are conjugates of each other.

1.4 More conjugate complex exponentials. Consider now frequencies k and l in the interval [0, N − 1] such that their sum is k + l = N. To think about this relationship, order the frequencies from k = 0 to k = N and start walking up the chain from k = 0, to k = 1, to k = 2, and so on. Likewise, start walking down the chain from l = N, to l = N − 1, to l = N − 2, and so on. When you have taken the same number of steps in either direction you have that k + l = N. Given your observations in Parts 1.2 and 1.3 you should expect these signals to be conjugates of each other. Verify your expectation with, e.g., k = 3 and l = 32 − 3 = 29.
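The observations in Parts 1.2–1.4 can be spot-checked numerically; a self-contained sketch (not the required Matlab code):

```python
import numpy as np

def cexp(k, N):
    # discrete complex exponential e_kN(n) for n = 0, ..., N-1
    n = np.arange(N)
    return np.exp(2j * np.pi * k * n / N) / np.sqrt(N)

N = 32
# Part 1.2: frequencies N apart yield identical signals
print(np.allclose(cexp(3, N), cexp(35, N)), np.allclose(cexp(3, N), cexp(-29, N)))
# Part 1.3: opposite frequencies yield conjugate signals
print(np.allclose(cexp(-3, N), np.conj(cexp(3, N))))
# Part 1.4: frequencies with k + l = N are conjugates as well
print(np.allclose(cexp(29, N), np.conj(cexp(3, N))))
```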


We consider now the energy of complex exponentials and the inner products between complex exponentials of different frequencies. Given two signals x and y of duration N, their inner product is defined as

\langle x, y \rangle := \sum_{n=0}^{N-1} x(n)\, y^*(n).   (3)

The energy of a signal is defined as the inner product of the signal with itself, ‖x‖² := ⟨x, x⟩. If we write the signals x and y as vectors x = [x(0), ..., x(N − 1)]^T and y = [y(0), ..., y(N − 1)]^T, the inner product is simply written as the product y^H x, where (·)^H denotes conjugate transpose; for real signals this reduces to the familiar x^T y = y^T x. The energy is likewise the product x^H x. We say that a signal is normal if it has unit energy, i.e., if ‖x‖² = 1. We say that two signals are orthogonal if their inner product is null, i.e., if ⟨x, y⟩ = 0. Orthogonality looks like an innocent property, but it is nothing like that. It is one of the most important properties that a group of signals can have.

1.5 Orthonormality. Write a function to compute the inner product ⟨e_kN, e_lN⟩ between all pairs of discrete complex exponentials of length N and frequencies k = 0, 1, ..., N − 1. Run it and report your results for N = 16. You should observe that the complex exponentials have unit energy and are orthogonal to each other. When this happens, we say that the signals form an orthonormal set.
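A sketch of the all-pairs inner-product computation in NumPy (the lab expects Matlab; the matrix formulation below is one possible approach, not the required one):

```python
import numpy as np

N = 16
n = np.arange(N)
# Rows of E are the exponentials e_kN for k = 0, ..., N-1
E = np.exp(2j * np.pi * np.outer(np.arange(N), n) / N) / np.sqrt(N)
# Gram matrix of inner products <e_kN, e_lN> = sum_n e_kN(n) e_lN(n)*
G = E @ E.conj().T
# Orthonormal set: the Gram matrix is the identity up to rounding
print(np.allclose(G, np.eye(N)))
```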

2 Analysis

The numerical experiments of Part 1 pointed out two properties of discrete complex exponentials that are very important for subsequent analyses. In this section we study these properties analytically. We first work on the observation that when we consider frequencies k and l that are N apart, the complex exponentials may have formulas that look different but are actually equivalent.

2.1 Equivalent complex exponentials. Consider two complex exponentials e_kN(n) and e_lN(n) as given by the definition in (1). Prove that if k − l = N the signals are equivalent, i.e., that e_kN(n) = e_lN(n) for all times n.
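As a sketch of the key step (assuming k = l + N, as in the statement), the extra factor is a complex exponential at an integer multiple of 2π:

```latex
e_{kN}(n) = \frac{1}{\sqrt{N}}\, e^{j2\pi (l+N)n/N}
          = \frac{1}{\sqrt{N}}\, e^{j2\pi ln/N}\, e^{j2\pi n}
          = e_{lN}(n),
```

since e^{j2\pi n} = 1 for every integer n.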


2.2 More equivalent complex exponentials. Use the result in Part 2.1 to show that the same is true not only when k − l = N but whenever the difference k − l is a multiple of N.

The second fundamental property that we want to explore is that when we have two complex exponentials that are not equivalent, their inner product is null,

\langle e_{kN}, e_{lN} \rangle := \sum_{n=0}^{N-1} e_{kN}(n)\, e_{lN}^*(n) = 0.   (4)

We observed that this was true in Part 1.5 for some particular examples. We will now prove that it is true in general.

2.3 Orthogonality. Consider two complex exponentials e_kN(n) and e_lN(n) that are not equivalent, i.e., for which the difference k − l is not a multiple of N. Prove that the signals are orthogonal to each other.
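A sketch of the standard argument, via the geometric series (under the stated assumption that k − l is not a multiple of N):

```latex
\langle e_{kN}, e_{lN} \rangle
  = \frac{1}{N} \sum_{n=0}^{N-1} e^{j2\pi (k-l)n/N}
  = \frac{1}{N}\, \frac{1 - e^{j2\pi (k-l)}}{1 - e^{j2\pi (k-l)/N}}
  = 0,
```

because the numerator vanishes (k − l is an integer) while the denominator does not (k − l is not a multiple of N, so e^{j2\pi (k-l)/N} ≠ 1).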

2.4 Orthonormality. Prove that complex exponentials have unit norm, ‖e_kN‖² = ⟨e_kN, e_kN⟩ = 1. The combination of this fact with the orthogonality proven in Part 2.3 means that a set of N consecutive complex exponentials forms an orthonormal set. Explain this statement.

The statements that we derived above are for a specific sort of discrete complex exponential. We can write more generic versions if we do not restrict the discrete frequency k to be integer or if we shift the argument of the complex exponential. When performing these operations it is interesting to ask whether the equivalence properties of Parts 2.1 and 2.2 and the orthogonality properties of Parts 2.3 and 2.4 hold true.

2.5 Phase shifts. Let φ ∈ ℝ be an arbitrary given number that we call a phase shift. We define a shifted complex exponential by subtracting the shift from the exponent in (1),

e_{kN}(n - \phi) = \frac{1}{\sqrt{N}}\, e^{j(2\pi kn/N - \phi)} = \frac{1}{\sqrt{N}} \exp\big(j(2\pi kn/N - \phi)\big).   (5)

The reason why subtracting φ in (5) is called a shift is that the frequency of the oscillation doesn't change; the oscillation just gets shifted to the right. In this problem we consider discrete frequencies k ≠ l and a common shift φ. Is there a condition that makes the complex exponentials e_kN(n − φ) and e_lN(n − φ) of frequencies k and l equivalent? Is there a condition that guarantees that the complex exponentials e_kN(n − φ) and e_lN(n − φ) are orthogonal?


2.6 Fractional frequencies. Lift the assumption that k in (1) is integer and consider arbitrary frequencies k, l ∈ ℝ. Is there a condition that makes the complex exponentials e_kN(n) and e_lN(n) of frequencies k and l equivalent? Is there a condition that guarantees that the complex exponentials e_kN(n) and e_lN(n) are orthogonal?

3 Generating and playing musical tones

Up until now we have considered discrete signals as standalone entities. However, discrete signals are most often used as representations of a continuous signal that exists in the palpable – as opposed to virtual – world. To connect discrete signals to the physical world we define the sampling time Ts as the time elapsed between times n and n + 1. Two ancillary definitions that follow from this one are the sampling frequency fs = 1/Ts and the signal duration T = NTs.

To move from discrete to actual frequencies, say that we are given a discrete cosine of frequency k and duration N with an associated sampling time of Ts seconds. We want to determine the frequency f0 of that cosine. To do so, recall that a discrete cosine of frequency k has a total of k oscillations in the N samples, which is the same as saying that it has a total of k oscillations in T = NTs seconds. The period of the cosine is therefore N/k samples, which, as before, is the same as saying that it has a period of T/k = NTs/k seconds. The frequency of the cosine is the inverse of its period,

f_0 = \frac{k}{T} = \frac{k}{NT_s} = \frac{k}{N}\, f_s.   (6)

Conversely, if we are given a cosine of frequency f0 Hertz that we want to observe with a sampling frequency fs for a total of T = NTs = N/fs seconds, it follows that the corresponding discrete cosine has discrete frequency

k = N \frac{f_0}{f_s}.   (7)

In explicit terms, we can use the definition in (2) with the discrete frequency in (7) to write the discrete cosine as

x(n) = \cos\big( 2\pi kn/N \big) = \cos\big( 2\pi [(f_0/f_s)N]\, n/N \big).   (8)


Simplifying the signal duration N in (8) and recalling that Ts = 1/fs, the cosine x(n) can be rewritten as

x(n) = \cos\big( 2\pi (f_0/f_s)\, n \big) = \cos\big( 2\pi f_0 (nT_s) \big).   (9)

The last expression in (9) is intuitive. It says that the continuous time cosine x(t) = cos(2π f0 t) is being sampled every Ts seconds during a time interval of length T = NTs seconds.

3.1 Discrete cosine generation. Write a function that takes as input the sampling frequency fs, the time duration T, and the frequency f0, and returns the associated discrete cosine x(n) as generated by (9). Your function also has to return the number of samples N. When T is not a multiple of Ts = 1/fs you can reduce T to the largest multiple of Ts smaller than T.
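A sketch of such a generator in Python/NumPy (the lab asks for Matlab; the function name is my own):

```python
import numpy as np

def discrete_cosine(fs, T, f0):
    """Discrete cosine x(n) = cos(2 pi f0 n Ts) with Ts = 1/fs, n = 0, ..., N-1.
    T is reduced to the largest multiple of Ts = 1/fs not exceeding it."""
    N = int(np.floor(T * fs))        # number of samples
    n = np.arange(N)
    x = np.cos(2 * np.pi * f0 * n / fs)
    return x, N

x, N = discrete_cosine(fs=8.0, T=2.0, f0=1.0)  # 16 samples of a 1 Hz cosine
```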

3.2 Generate an A note. The musical A note corresponds to an oscillation at frequency f0 = 440 Hertz. Use the code of Part 3.1 to generate an A note of duration T = 2 seconds sampled at a frequency fs = 44,100 Hertz. Play the note on your computer's speakers.

3.3 Generate musical notes. A piano has 88 keys that can generate 88 different musical notes. The frequencies of these 88 different musical notes can be generated according to the formula

f_i = 2^{(i-49)/12} \cdot 440.   (10)

Modify the code of Part 3.1 so that instead of taking the frequency fi as an argument it receives the piano key number i and generates the corresponding musical tone.
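The modification can be sketched as follows; `key_frequency` implements (10), and both function names are my own choices:

```python
import numpy as np

def key_frequency(i):
    # Equation (10): f_i = 2^((i - 49)/12) * 440, piano keys i = 1, ..., 88
    return 2.0 ** ((i - 49) / 12) * 440.0

def key_tone(i, fs=44100, T=0.5):
    # Discrete cosine at the frequency of piano key i, as in Part 3.1
    n = np.arange(int(T * fs))
    return np.cos(2 * np.pi * key_frequency(i) * n / fs)
```

Key 49 is the A note at 440 Hz; key 61, one octave up, gives 880 Hz.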

3.4 Generate musical notes. To play a song, you just need to play different notes in order. Use the code in Part 3.3 to play a song that has at least as many notes as Happy Birthday.


4 Time management

The formulation of the problems in Part 1 is lengthy, but their solutions are straightforward. The goal is to finish them during the Tuesday lab session. Try to get a head start on solving the problems. You may not succeed, but thinking about them will streamline the Tuesday session. This should require just 1 more hour besides the lab.

The problems in Part 2 will take more time to complete. You should wait until after class on Wednesday morning to solve them. We will do Parts 2.1, 2.2, 2.3, and 2.4 in class. I am asking that you report on them to make sure that you understood them. They are very important properties for understanding Fourier transforms. To solve Parts 2.5 and 2.6 you have to work on your own, but the solutions are simple generalizations of earlier parts. You should be able to wrap this up in 3 hours, about 30 minutes for each of the questions.

Part 3 is the one that will take the most time because you have to put your creativity and problem solving skills to work. If you are familiar with tones and beats and know how to read music, this should take about 6 hours to complete. If you don't, part of being an engineer is being able to do something you don't know how to do. It'll take you a couple more hours to learn how to read Happy Birthday.


Discrete Fourier transform (DFT)

Alejandro Ribeiro

March 4, 2015

Let x : [0, N − 1] → ℂ be a discrete signal of duration N having elements x(n) for n ∈ [0, N − 1]. The discrete Fourier transform (DFT) of x is the signal X : ℤ → ℂ where the elements X(k) for all k ∈ ℤ are defined as

X(k) := \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n) \exp(-j2\pi kn/N).   (1)

The argument k of the signal X(k) is called the frequency of the DFT and the value X(k) the frequency component of the given signal x. When X is the DFT of x we write X = F(x). The DFT X = F(x) is also referred to as the spectrum of x. Recall that for a complex exponential, discrete frequency k is equivalent to (real) frequency fk = (k/N) fs, where N is the total number of samples and fs the sampling frequency. When interpreting DFTs, it is often easier to consider the real frequency values instead of the corresponding discrete frequencies.

An alternative form of the DFT follows from realizing that the sum in (1) defines the inner product between x and the complex exponential e_kN with elements e_kN(n) = (1/√N) e^{j2πkn/N}. We can then write

X(k) := \langle x, e_{kN} \rangle.   (2)

This latter expression emphasizes the fact that X(k) is a measure of how much the signal x resembles an oscillation of frequency k.

Because complex exponentials of frequencies k and k + N are equivalent, it follows that the DFT values X(k) and X(k + N) are equal, i.e.,

X(k + N) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi (k+N)n/N} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N} = X(k).   (3)

The relationship in (3) means that the DFT is periodic with period N and that, while it is defined for all k ∈ ℤ, only N values are different. For computational purposes we work with the canonical set of frequencies in the interval k ∈ [0, N − 1]. For interpretation purposes we work with the canonical set of frequencies k ∈ [−N/2, N/2]. This latter canonical set contains N + 1 frequencies instead of N – frequencies N/2 and −N/2 are equivalent in that X(N/2) = X(−N/2) – but it is used to have a set that is symmetric around k = 0. Going from one canonical set to the other is straightforward. The frequencies in the interval [0, N/2] are present in both sets, and to recover, e.g., the negative frequencies k ∈ [−N/2, −1] from the positive frequencies [N/2, N − 1] we just use the fact that

X(-k) = X(N - k), \quad \text{for all } k \in [1, N/2].   (4)

We say that the operation in (4) is a "chop and shift." To recover the DFT values for the canonical set [−N/2, N/2] from the canonical set [0, N − 1] we chop the frequencies in the interval [N/2, N − 1] and shift them to the front of the set. For the purposes of this homework, when you are asked to report a DFT, you should report it for the canonical set [−N/2, N/2].

1 Spectrum of pulses

In this first part of the lab we will consider pulses and waves of different shapes to understand what information can be gleaned from spectral analysis. A prerequisite for that is to have a function to compute DFTs.

1.1 Computation of the DFT. Write a function that takes as input a signal x of duration N and an associated sampling frequency fs, and returns the values of the DFT X = F(x) for the canonical set k ∈ [−N/2, N/2], as well as a vector with the real frequencies associated with each of the discrete frequencies k. Explain how to use the outcome of this function to recover the DFT values X(k) associated with frequencies in the canonical set k ∈ [0, N − 1].

[Figure 1. Unit energy square pulse of length T0 = MTs and duration T = NTs. The signal is constant for indexes n < M and null for other n. The height of the pulse is set to 1/√M to have unit total energy.]
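A NumPy sketch of this function (here `np.fft.fft` is scaled by 1/√N to match the convention in (1), and the chop and shift follows (4); the function name is my own):

```python
import numpy as np

def dft(x, fs):
    """DFT X(k) for the canonical set k in [-N/2, N/2], with the real
    frequencies f_k = (k/N) fs associated with each discrete frequency."""
    N = len(x)
    X = np.fft.fft(x) / np.sqrt(N)           # values for k = 0, ..., N-1
    k = np.arange(-(N // 2), N // 2 + 1)     # N + 1 symmetric frequencies
    Xc = X[np.mod(k, N)]                     # chop and shift: X(-k) = X(N - k)
    return Xc, k * fs / N

# A discrete cosine of frequency 2 concentrates its spectrum at k = +/- 2
N = 16
x = np.cos(2 * np.pi * 2 * np.arange(N) / N)
X, f = dft(x, fs=1.0)
```

To recover the set k ∈ [0, N − 1], invert the chop and shift using the periodicity X(k) = X(k + N).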

With N samples and a sampling frequency fs, the total signal duration is T = NTs. Given a length T0 = MTs < T, we define the unit energy square pulse of time length T0, or, equivalently, discrete length M, as

u_M(n) = \frac{1}{\sqrt{M}} \;\text{ if } 0 \le n < M, \qquad u_M(n) = 0 \;\text{ if } M \le n.   (5)

Intuitively, pulses of shorter length are faster signals than pulses of longer length. We will see that this rate-of-change information is captured by the DFT.

1.2 DFTs of square pulses. Use the code in Part 1.1 to compute the DFT of square pulses of duration T = 32s sampled at a rate fs = 8Hz and different time lengths. You should observe that the DFT is more concentrated for wider pulses. Make this evaluation more quantitative by computing the DFT energy fraction corresponding to frequencies fk in the interval [−1/T0, 1/T0]. Report your results for pulses of length T0 = 0.5s, T0 = 1s, T0 = 4s, and T0 = 16s.
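The energy-fraction computation can be sketched as follows (a NumPy illustration with names of my own choosing, not the required Matlab code):

```python
import numpy as np

fs, T = 8.0, 32.0
N = int(T * fs)                          # 256 samples

def energy_fraction(T0):
    """Fraction of DFT energy of a unit energy square pulse of length T0
    that falls in the frequency band [-1/T0, 1/T0]."""
    M = int(T0 * fs)
    x = np.zeros(N)
    x[:M] = 1 / np.sqrt(M)               # unit energy square pulse, eq. (5)
    X = np.fft.fft(x) / np.sqrt(N)
    f = np.fft.fftfreq(N, d=1 / fs)      # real frequencies f_k = (k/N) fs
    band = np.abs(f) <= 1 / T0
    return np.sum(np.abs(X[band]) ** 2) / np.sum(np.abs(X) ** 2)

for T0 in [0.5, 1.0, 4.0, 16.0]:
    print(T0, energy_fraction(T0))
```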

While it is true that wider pulses change more slowly, all square pulses have, at some point, a high rate of change when they jump from x(M − 1) = 1/√M to x(M) = 0. We can construct a pulse that changes more slowly by smoothing out the transition. One possibility is to define a triangular pulse by raising and decreasing its height linearly. Specifically, consider an even pulse length M and define the triangular pulse as

\wedge_M(n) = n \;\text{ if } 0 \le n < M/2, \qquad \wedge_M(n) = (M-1) - n \;\text{ if } M/2 \le n < M, \qquad \wedge_M(n) = 0 \;\text{ if } M \le n.   (6)

[Figure 2. Unit energy triangular pulse of length T0 = MTs and duration T = NTs. The signal is smoother, i.e., changes more slowly, than the square pulse of equivalent length.]

Observe that, as defined in (6), the triangular pulse does not have unit energy. In your comparisons below, you may want to scale the pulse numerically to have unit energy. To do so, you just have to divide the pulse by its norm, i.e., use ∧_M(n)/‖∧_M‖ instead of ∧_M(n).

1.3 DFTs of triangular pulses. Consider the same parameters of Part 1.2 and observe that, as in the case of square pulses, the DFT is more concentrated for wider pulses. Make this observation quantitative by looking at the DFT energy fraction corresponding to frequencies fk in the interval [−1/T0, 1/T0]. Report your results for pulses of length T0 = 0.5s, T0 = 1s, T0 = 4s, and T0 = 16s. Compare your results with the results of Part 1.2. Is your observation consistent with the intuitive appreciation that the triangular pulse changes more slowly than the square pulse? A qualitative explanation suffices for most people, but a good engineer would provide a quantitative answer.

1.4 Other pulses. We can define some other pulses with more concentrated spectra. These pulses are also called windows and there is an


extensive literature on windows with appealing spectral properties. Find out about Parzen, raised cosine, Gaussian, and Hamming windows. Compare the spectra of these windows to the spectra of square and triangular pulses.

2 Properties of the DFT

Our interest in the DFT is, mainly, as a computational tool for signal processing and analysis. For that reason, we will rarely be working on computing analytical expressions. There are, however, some DFT properties that it is important to understand analytically. In this part of the assignment we will work on proving three of these properties: conjugate symmetry, energy conservation, and linearity.

2.1 Conjugate symmetry. Consider a real signal x, i.e., a signal with no imaginary part, and let its DFT be X = F(x). Prove that the DFT X is conjugate symmetric,

X(-k) = X^*(k).   (7)

2.2 Energy conservation (Parseval's Theorem). Let X = F(x) be the DFT of signal x and restrict the DFT X to a set of N consecutive frequencies. Prove that the energies of x and the restricted DFT are the same,

\sum_{n=0}^{N-1} |x(n)|^2 = \|x\|^2 = \|X\|^2 = \sum_{k=N_0}^{N_0+N-1} |X(k)|^2.   (8)

The constant N0 in (8) is arbitrary.
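Both conjugate symmetry and energy conservation are easy to spot-check numerically before proving them; a sketch with a random real signal, using the 1/√N normalization of the DFT definition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)              # a real signal
X = np.fft.fft(x) / np.sqrt(N)          # DFT with the 1/sqrt(N) normalization

# Conjugate symmetry for real x: X(-k) = X(N - k) = X*(k)
assert np.allclose(X[N - np.arange(1, N)], np.conj(X[np.arange(1, N)]))
# Parseval: signal energy equals spectrum energy
assert np.isclose(np.sum(np.abs(x) ** 2), np.sum(np.abs(X) ** 2))
```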

2.3 Linearity. Prove that the DFT of a linear combination of signals is the linear combination of the respective DFTs of the individual signals,

F(ax + by) = aF(x) + bF(y).   (9)

In (9), both signals are of the same duration N – otherwise, the sum wouldn't be properly defined.

The properties above are very important in the spectral analysis of signals. We present below a fourth property that is not as important, but is nevertheless worth knowing.


2.4 Conservation of inner products (Plancherel's Theorem). Let X = F(x) be the DFT of signal x and Y = F(y) be the DFT of signal y. Restrict the DFTs X and Y to a set of N consecutive frequencies. Prove that the inner product ⟨x, y⟩ between the signals and the inner product ⟨X, Y⟩ between the restricted DFTs are the same,

\sum_{n=0}^{N-1} x(n)\, y^*(n) = \langle x, y \rangle = \langle X, Y \rangle = \sum_{k=N_0}^{N_0+N-1} X(k)\, Y^*(k).   (10)

The constant N0 in (10) is arbitrary. Recover the result in Part 2.2 as a particular case of Plancherel's Theorem.

3 The spectra of musical tones

In the first lab assignment we studied how to generate pure musical tones. To do so we simply noted that for sampling time Ts a cosine of frequency f0 is generated according to the expression

x(n) = \cos\big( 2\pi (f_0/f_s)\, n \big) = \cos\big( 2\pi f_0 (nT_s) \big),   (11)

where the index n varies from 0 to N − 1, which is equivalent to observing the tone between times 0 and T = NTs. As already noted, the last expression in (11) is intuitive. It says that the continuous time cosine x(t) = cos(2π f0 t) is being sampled every Ts seconds during a time interval of length T = NTs seconds.

Musical tones have specific frequencies. In particular, the A note corresponds to a frequency of 440Hz and to the 49th key of a piano. The 88 basic notes generated by a piano have frequencies that follow the formula

f_i = 2^{(i-49)/12} \cdot 440.   (12)

We have already used this knowledge to play a song using pure musical tones. In this lab assignment, we will compute the DFT of the song we played and interpret the result.

3.1 DFT of an A note. Generate an A note of duration T = 2 seconds sampled at a frequency fs = 44,100 Hertz. Compute the DFT of this signal and verify that: (a) the DFT is conjugate symmetric; (b) Parseval's Theorem holds. We know that the DFT of a discrete cosine is given by a couple of delta functions. The DFT of this A note, however, is close


to that but not exactly equal. Explain why, and find a frequency or frequency range that contains at least 90% of the DFT energy. What can you change to make the spectrum exactly equal to a pair of deltas?

3.2 DFT of a musical piece. Concatenate tones to interpret a musical piece with as many notes as Happy Birthday. Compute the DFT of this piece and identify the different musical tones in your piece.

3.3 Energy of different tones of a musical piece. For each of the tones identified in Part 3.2, compute the total energy that the musical piece contains at that tone. Cross-check that this energy is, indeed, the energy that you know should be there because of the number of times you played the note.

The rich sound of actual musical instruments comes from the fact that they don't play pure tones but multiple harmonics. A generic model for a musical instrument is to say that when a note is played it generates not only a tone at the corresponding frequency but a set of tones at frequencies that are multiples of the base tone. To construct a model, say that we are playing a note that corresponds to base frequency f0. The instrument generates a signal that is given by a sum of multiple harmonics,

x(n) = \sum_{h=1}^{H} a_h \cos\big( 2\pi h f_0 (nT_s) \big).   (13)

In (13), H is the total number of harmonics generated by the instrument and a_h is the relative gain of the hth harmonic. The constants a_h are specific to an instrument. E.g., we can get a sound reminiscent of an oboe with H = 8 harmonics and gains a_h given by the components of the vector

a = [1.386, 1.370, 0.360, 0.116, 0.106, 0.201, 0.037, 0.019]^T.   (14)

Likewise, we can get something not totally unlike a flute with H = 5 harmonics and gains

a = [0.260, 0.118, 0.085, 0.017, 0.014]^T.   (15)

A very quacky trumpet can be simulated with H = 13 harmonics having gains

a = [1.167, 1.178, 0.611, 0.591, 0.344, 0.139, 0.090, 0.057, 0.035, 0.029, 0.022, 0.020, 0.014]^T,   (16)


and an even more quacky clarinet with H = 19 harmonics with gains

a = [0.061, 0.628, 0.231, 1.161, 0.201, 0.328, 0.154, 0.072, 0.186, 0.133, 0.309, 0.071, 0.098, 0.114, 0.027, 0.057, 0.022, 0.042, 0.023]^T.   (17)

We can use these harmonic decompositions to play songs with more realistic sounds.
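The model in (13) can be sketched as follows (a NumPy illustration; the function name and default parameters are my own):

```python
import numpy as np

def instrument_note(f0, gains, fs=44100, T=1.0):
    """Sum of harmonics x(n) = sum_h a_h cos(2 pi h f0 n Ts), as in (13)."""
    n = np.arange(int(T * fs))
    x = np.zeros(len(n))
    for h, a in enumerate(gains, start=1):
        x += a * np.cos(2 * np.pi * h * f0 * n / fs)
    return x

# "Oboe" gains from (14); an A note at 440 Hz
a_oboe = [1.386, 1.370, 0.360, 0.116, 0.106, 0.201, 0.037, 0.019]
x = instrument_note(440.0, a_oboe)
```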

3.4 DFT of an A note of different musical instruments. Repeat Part 3.1 for each of the 4 musical instruments described above.

3.5 DFT of an A note of different musical instruments. Repeat Parts 3.3 and 3.4 for one of the musical instruments described above. If you have no favorite, choose the flute.

4 Time management

The problems in Part 1 are not straightforward but not too difficult. The goal is to finish them during the Tuesday lab session. Try to get a head start on solving the problems. You may not succeed, but thinking about them will streamline the Tuesday session. This should require 2 more hours besides the lab.

The problems in Part 2 will take another couple of hours to complete. You should wait until after class on Wednesday morning to solve them. We will do Parts 2.1, 2.2, and 2.3 in class. I am asking that you report on them to make sure that you understood them. To solve Part 2.4 you have to work on your own, but the solution is a simple generalization of Part 2.3. You should be able to wrap this up in 2 hours, about 30 minutes for each of the questions.

Part 3 is the one that will take the most time because you have to use your problem solving skills. It should take about 6 hours to complete. I would say something like 4 hours for the first three parts and 2 more hours to wrap up the pieces that simulate the wind instruments.


Inverse Discrete Fourier transform (iDFT)

Alejandro Ribeiro

January 30, 2015

Suppose that we are given the discrete Fourier transform (DFT) X : ℤ → ℂ of an unknown signal. The inverse (i)DFT of X is defined as the signal x : [0, N − 1] → ℂ with components x(n) given by the expression

x(n) := \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X(k)\, e^{j2\pi kn/N} = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X(k) \exp(j2\pi kn/N).   (1)

When x is obtained from X through the relationship in (1) we write x = F^{-1}(X). Recall that if X is the DFT of some signal, it must be periodic with period N. That means that in (1) we can replace the sum over the frequencies k ∈ [0, N − 1] by a sum over any other set of N consecutive frequencies. In particular, the iDFT of X can be alternatively written as

x(n) = \frac{1}{\sqrt{N}} \sum_{k=-N/2+1}^{N/2} X(k)\, e^{j2\pi kn/N}.   (2)

To see that (2) is correct, it suffices to note that X(k + N) = X(k) and that e^{j2π(k+N)n/N} = e^{j2πkn/N} to conclude that each of the terms that appear in (1) is equivalent to one, and only one, of the terms that appear in (2).

It is not difficult to see that taking the iDFT of the DFT of a signal x recovers the original signal x. This means that the iDFT is, as its name indicates, the inverse operation to the DFT. This result is of sufficient importance to be highlighted in the form of a theorem, which we state next.

Theorem 1 Given a discrete signal x : [0, N − 1] → ℂ, let X = F(x) : ℤ → ℂ stand for the DFT of x and x̃ = F^{-1}(X) : [0, N − 1] → ℂ be the iDFT of X. We then have that x̃ ≡ x, or, equivalently,

F^{-1}[F(x)] = x.   (3)
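Theorem 1 can be spot-checked numerically before proving it; with NumPy's FFT, the 1/√N convention of (1) is obtained by rescaling:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X = np.fft.fft(x) / np.sqrt(N)         # DFT with the 1/sqrt(N) convention
x_rec = np.fft.ifft(X) * np.sqrt(N)    # iDFT: (1/sqrt(N)) sum_k X(k) e^{j2 pi kn/N}

assert np.allclose(x_rec, x)           # F^{-1}[F(x)] = x
```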


Proof: Write down a proof of Theorem 1. ∎

The result in Theorem 1 is important because it tells us that a signal

x can be recovered from its DFT X by taking the inverse DFT. This implies that x and X are alternative representations of the same information, because we can move from one to the other using the DFT and iDFT operations. If we are given x we can compute X through the DFT, and if we are given X we can compute x through the iDFT.

An important practical consequence of this equivalence is that if we are given one of the representations, say the signal x, and the other one is easier to interpret, say the DFT X, we can compute the respective transform and proceed with the analysis. This analysis will neither introduce spurious effects nor miss important features. Since both representations are equivalent, it is just a matter of which representation makes the identification of patterns easier. There is substantial empirical evidence that it is easier to analyze signals in the frequency domain – i.e., the DFT X – than in the temporal domain – the original signal x.

1 Signal reconstruction and compression

A more mathematical consequence of Theorem 1 is that any signal x can be written as a sum of complex exponentials. To see that this is true we just need to reinterpret the equations for the DFT and iDFT. In this reinterpretation, the components of the signal x can be written as [cf. (1) and (2)]

x(n) =1pN

N�1

Âk=0

X(k)ej2pkn/N =1pN

N/2

Âk=�N/2+1

X(k)ej2pkn/N (4)

with coefficients X(k) that are given by the formula [cf. equation (1) in lab assignment 2]

X(k) := (1/√N) Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N}.   (5)

This is quite a remarkable fact. We may have a signal that doesn't look at all like an oscillation, but it is a consequence of Theorem 1 that such a signal can be written as a sum of oscillations.

It is instructive to rewrite (4) in an expanded form that makes the latter observation clearer. To do so, consider the rightmost expression, write the N summands explicitly, and reorder the terms so that the terms corresponding to a positive frequency k and its opposite frequency −k appear together. Doing so, and noting that the frequencies k = 0 and k = N/2 have no corresponding opposites, it follows that (4) is equivalent to

√N x(n) = X(0) e^{j2π0n/N}
         + X(1) e^{j2π1n/N} + X(−1) e^{−j2π1n/N}
         + X(2) e^{j2π2n/N} + X(−2) e^{−j2π2n/N}
         + ...
         + X(N/2 − 1) e^{j2π(N/2−1)n/N} + X(−N/2 + 1) e^{−j2π(N/2−1)n/N}
         + X(N/2) e^{j2π(N/2)n/N},   (6)

where we have multiplied both sides of the equality by √N to simplify the expression. Observe that the term that corresponds to frequency k = 0 is simply X(0) e^{j2π0n/N} = X(0). We write the exponential part of this factor to avoid breaking the symmetry of the expression.

We can interpret (6) as a set of successive approximations of x(n) that introduce ever finer details in the form of faster signal variations. I.e., we can choose to approximate the signal x by the signal x_K, which we define by truncating the DFT sum to the first K terms in (6),

x_K(n) := (1/√N) Σ_{k=0}^{K} [ X(k) e^{j2πkn/N} + X(−k) e^{−j2πkn/N} ].   (7)

The approximation that uses k = 0 only approximates the signal x with a constant. The approximation that uses k = 0 and k = ±1 approximates x with a constant and a single oscillation. The approximation that adds k = ±2 refines the signal by adding finer details in the form of a (more rapid) double oscillation. In general, when adding the kth frequency and its opposite −k, we add an oscillation of frequency k that makes the approximation closer to the actual signal. If we have a signal that varies slowly, a representation with just a few coefficients is sufficient. For signals that vary faster, we need to add more coefficients to obtain a reasonable approximation.

Alternatively, if only gross details are important, we can eliminate the finer irrelevant features by studying the approximated signal instead of the original signal. This observation is related to our digression on the empirical value of the DFT as a tool for pattern identification. The representation of x as a sum of complex exponentials facilitates the identification of relevant time features, which tend to correspond to variations that are slower than the irrelevant ones. E.g., weather varies from day to day, but there is an underlying slower pattern that we call climate. Weather will manifest in the DFT coefficients for large frequencies and climate in the DFT coefficients associated with lower frequencies. We can study climate by reconstructing a weather signal x with a small number of DFT coefficients.

In this part of the lab we will study the quality of the reconstruction of x with approximating signals x_K as we increase K.


1.1 Computation of the iDFT. Consider a DFT X corresponding to a real signal of even duration N and assume that we are given the N/2 + 1 coefficients corresponding to frequencies k = 0, 1, . . . , N/2. Write down a function that takes these N/2 + 1 coefficients as input, as well as the associated sampling frequency f_s, and returns the iDFT x = F⁻¹(X) of the given X. Return also a vector of real times associated with the signal samples.
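A minimal sketch of such a function in Python, assuming NumPy and the lab's orthonormal DFT convention; the function name and signature are illustrative choices, not prescribed by the lab:

```python
import numpy as np

def idft_from_half(X_half, fs):
    """Recover a real signal of even duration N from the N/2 + 1 DFT
    coefficients X(0), ..., X(N/2), together with the real sample times.

    Uses the conjugate symmetry X(N - k) = X(k)* that holds for real
    signals, with the orthonormal convention
    X(k) = (1/sqrt(N)) sum_n x(n) e^{-j 2 pi k n / N}.
    """
    N = 2 * (len(X_half) - 1)
    # Rebuild the missing coefficients X(N/2 + 1), ..., X(N - 1) by symmetry.
    X_full = np.concatenate([X_half, np.conj(X_half[-2:0:-1])])
    x = np.real(np.fft.ifft(X_full, norm="ortho"))  # drop residual imaginary part
    t = np.arange(N) / fs                           # real times of the samples
    return x, t
```

Equivalently, `np.fft.irfft` performs the same symmetric completion internally; spelling it out makes the conjugate-symmetry argument explicit.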


1.2 Signal reconstruction. Suppose now that we are given the first K + 1 coefficients of the DFT of a signal of duration N. Write down a function that returns the approximated signal x_K with elements x_K(n) as given in (7). The inputs to this function include the K + 1 coefficients, the signal duration N, and the sampling frequency f_s. Return also a vector of real times associated with the signal samples. Given that you already solved Part 1.1, it should take you less than a minute to solve this part.
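A sketch of this reconstruction (NumPy assumed). Here (7) is implemented with X(−k) = X(k)*, which holds for real signals, and with the k = 0 term counted once, matching the constant-approximation reading in the text:

```python
import numpy as np

def reconstruct_first_K(X_first, N, fs):
    """Approximation x_K built from the first K + 1 DFT coefficients
    X(0), ..., X(K) of a real signal, following (7)."""
    K = len(X_first) - 1
    n = np.arange(N)
    xK = np.full(N, np.real(X_first[0]))                # constant k = 0 term
    for k in range(1, K + 1):
        e = np.exp(2j * np.pi * k * n / N)
        # Each pair k, -k contributes a real oscillation of discrete frequency k.
        xK += np.real(X_first[k] * e + np.conj(X_first[k]) * np.conj(e))
    return xK / np.sqrt(N), n / fs
```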


1.3 Reconstruction of a square pulse. Generate a pulse of duration T = 32s sampled at a rate f_s = 8Hz and length T_0 = 4s and compute its DFT. Use the function in Part 1.2 to create successive reconstructions of the pulse. Compute the energy of the difference between the signals x and x_K. This energy should decrease for increasing K. Report your results for K = 2, K = 4, K = 8, K = 16, and K = 32. Repeat for a pulse of length T_0 = 2s. Since this pulse varies faster, the reconstruction should be worse. Is that the case?



1.4 Reconstruction of a triangular pulse. Generate a triangular pulse of duration T = 32s sampled at a rate f_s = 8Hz and length T_0 = 4s and compute its DFT. Use the function in Part 1.2 to create successive reconstructions of the pulse. Compute the energy of the difference between the signals x and x_K. Report your results for K = 2, K = 4, K = 8, K = 16, and K = 32. This pulse should be easier to reconstruct than the square pulse. Is that true?


1.5 The energy of the difference signal. In Parts 1.3 and 1.4 you have computed the energy of the difference between the signals x and x_K. Just to be formal, define the error signal r_K as the one with components r_K(n) = x(n) − x_K(n). The energy you have computed is therefore given by

‖r_K‖² = Σ_{n=0}^{N−1} |r_K(n)|² = Σ_{n=0}^{N−1} |x(n) − x_K(n)|².   (8)

Using Parseval's theorem, this energy can be computed from the values of the DFT coefficients that you are neglecting to include in the signal approximation. Explain how this can be done, and verify that your numerical results coincide.
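This check can be sketched numerically as follows (NumPy assumed; the pulse length and K are illustrative). With the orthonormal DFT, signal energy equals DFT energy, so the error energy in (8) equals the energy of the discarded coefficients:

```python
import numpy as np

N, K = 256, 8
x = np.zeros(N)
x[:32] = 1.0                                      # a square pulse, 32 samples long
X = np.fft.fft(x, norm="ortho")
keep = np.zeros(N, dtype=bool)
keep[:K + 1] = True                               # frequencies k = 0, 1, ..., K
keep[N - K:] = True                               # and their opposites -1, ..., -K
xK = np.real(np.fft.ifft(np.where(keep, X, 0.0), norm="ortho"))
err_time = np.sum((x - xK) ** 2)                  # energy of the difference signal (8)
err_freq = np.sum(np.abs(X[~keep]) ** 2)          # energy of the neglected coefficients
assert np.isclose(err_time, err_freq)             # Parseval's theorem
```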

A square wave can be visualized as a train of square pulses pasted next to each other. Mathematically, it is easier to generate a square wave by simply taking the sign of a discrete cosine. Consider then a given frequency f_0 and a given sampling frequency f_s and define the square wave of frequency f_0 as the signal

x(n) = sign[ cos(2π(f_0/f_s)n) ].   (9)

This signal can be reconstructed with a few DFT coefficients, but not with the first K. To compress this signal well, we pick the K largest DFT coefficients, which are not necessarily the first K. When reconstructing the signal, we use a modified version of (7) in which we sum over the coefficients that were picked during the compression stage.
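The square wave in (9) takes only a couple of lines to generate (a sketch; NumPy assumed, function name illustrative):

```python
import numpy as np

def square_wave(f0, fs, T):
    """Square wave of frequency f0, sampled at rate fs for T seconds,
    generated as the sign of a discrete cosine as in (9)."""
    n = np.arange(int(T * fs))
    return np.sign(np.cos(2 * np.pi * (f0 / fs) * n))
```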


1.6 Signal compression. Write down a function that receives as input a signal x of length N, the sampling frequency f_s, and a compression target K. The function outputs a vector with the K largest DFT coefficients and the corresponding set of frequencies at which these coefficients are observed. Notice that each of the coefficients that is kept requires the storage of two numbers, the coefficient and the frequency. This is disadvantageous with respect to keeping just the first K coefficients. This more sophisticated compression is justified only if keeping these coefficients reduces the total number of DFT coefficients by a factor larger than 2.
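One possible sketch of this function (NumPy assumed; the name and the exact return format are illustrative, not prescribed):

```python
import numpy as np

def compress_K_largest(x, fs, K):
    """Keep the K largest-magnitude DFT coefficients of x.

    Returns the kept coefficients, their DFT indices k, and the real
    frequencies k fs / N at which they are observed.
    """
    N = len(x)
    X = np.fft.fft(x, norm="ortho")
    idx = np.argsort(np.abs(X))[::-1][:K]   # indices of the K largest coefficients
    return X[idx], idx, idx * fs / N
```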


1.7 The why of signal compression. Why do we keep the largest DFT coefficients? This question has a very precise mathematical answer that follows from Parseval's Theorem. Provide that very precise answer. You may want to look at Part 1.5.


1.8 Signal reconstruction. Write down a function that receives as input the output of the function in Part 1.6 and reconstructs the original signal x. Given that you already solved Parts 1.1 and 1.2, it should take you less than a minute to solve this part.
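A matching reconstruction sketch (NumPy assumed; it consumes the coefficient/index format returned by the Part 1.6 sketch above, which is an illustrative choice):

```python
import numpy as np

def reconstruct_from_largest(coeffs, idx, N, fs):
    """Rebuild a signal from its K largest DFT coefficients and their DFT
    indices; the discarded coefficients are set to zero."""
    X = np.zeros(N, dtype=complex)
    X[idx] = coeffs
    x = np.real(np.fft.ifft(X, norm="ortho"))
    return x, np.arange(N) / fs
```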


1.9 Compression and reconstruction of a square wave. Generate a square wave of duration T = 32s sampled at a rate f_s = 8Hz and frequency 4Hz. Compress and reconstruct this wave using the functions in Parts 1.6 and 1.8. Try different compression targets and report the energy of the error signal for K = 2, K = 4, K = 8, and K = 16. This problem should teach you that a square wave can be approximated better than a square pulse if you keep the same number of coefficients. This should be the case because the square wave looks the same at all points, but the square pulse doesn't. Explain this statement.

2 Speech processing

The DFT, in conjunction with the iDFT, can be used to perform some basic speech analysis. In this part of the lab you will record your voice and perform a few interesting spectral transformations.


2.1 Record, graph, and play your voice. Record 5 seconds of your voice sampled at a frequency f_s = 20kHz. Plot your voice. Compute the DFT of your voice and plot its magnitude. Play it back on the speakers.


2.2 Voice compression. The 5 second recording of your voice at sampling frequency f_s = 20kHz is composed of 100,000 samples. Use the DFT and iDFT to compress your voice by a factor of 2, i.e., store K = 50,000 numbers instead of 100,000; by a factor of 4 (store K = 25,000 numbers); by a factor of 8 (store K = 12,500 numbers); and so on. Keep compressing until the sentence you spoke becomes unrecognizable. You can perform this compression by keeping the first K DFT coefficients or the largest K/2 DFT coefficients. Which one works better?



2.3 Voice masking. Say that you and your partner speak the same sentence. The DFTs of the respective recordings will be similar, because it is the same sentence, but also different, because your voices are different. You can use this fact to mask your voice by modifying its spectrum, i.e., by increasing the contribution of some frequencies and decreasing the contributions of others. Design a system to record your voice, make it unrecognizable but intelligible, and play it on the speakers.

As we saw in Part 1.9, it is easier to reconstruct a square wave than it is to reconstruct a square pulse. This happens because the wave looks the same at all points, while the pulse looks different at different points. This suggests a problem with approximating the 5 second recording of your voice, namely, that you are trying to use the same complex exponentials to approximate different parts of your speech. You can overcome this limitation by dividing your signal into pieces and compressing each piece independently.


2.4 Better voice compression. Design a system that divides your speech into chunks of 100ms and compresses each of the chunks by a given factor γ. Design the inverse system that takes the compressed chunks, reconstructs the individual speech pieces, stitches them together, and plays them back on the speakers. You have just designed a rudimentary MP3 compressor and player. Try it out for different values of γ. Push γ to the largest possible compression factor.
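A rudimentary version of this chunked codec can be sketched as follows (NumPy assumed; the keep-the-largest rule and the per-chunk bookkeeping are illustrative choices). At a 20kHz sampling rate, 100ms chunks correspond to `chunk = 2000` samples:

```python
import numpy as np

def chunk_compress(x, chunk, gamma):
    """Compress x by roughly a factor gamma: split it into chunks, keep only
    the ceil(chunk / gamma) largest DFT coefficients of each chunk, and
    stitch the reconstructed chunks back together."""
    keep = max(1, int(np.ceil(chunk / gamma)))
    pieces = []
    for start in range(0, len(x) - chunk + 1, chunk):
        X = np.fft.fft(x[start:start + chunk], norm="ortho")
        X[np.argsort(np.abs(X))[:chunk - keep]] = 0.0   # zero the smallest ones
        pieces.append(np.real(np.fft.ifft(X, norm="ortho")))
    return np.concatenate(pieces)
```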

3 Uncover a secret message

Your teaching assistant will provide you on Tuesday with the Answer to the Ultimate Question of Life, the Universe, and Everything. ENIAC has been working on this answer since the early hours of the evening of Valentine's Day, 1946. Since this information is of a sensitive nature, it will be given in an audio message with a secret code that will make it sound like a fast paced Happy Birthday song. If you are able to decode the message, report it back Jeopardy style.

If you are a nerd, you will think that this is the coolest thing you have done in your life. In that case, you don't get points for this answer. If you are not a nerd, you will get 2 extra points on top of the four you are to get for the rest of the lab.¹

¹My lawyer just informed me that I am not allowed to ask your nerdal orientation, and that in any event, I am not allowed to exhibit any nerder bias. Fine, you get 2 points even if you're a nerd.


4 Time management

The effort for this particular lab is evenly divided between Parts 1 and 2. There is some overlap between the questions. If you do Part 1 properly, then Part 2 will be easier. Thus, the time split can be 4 and 6 hours, or 6 and 4 hours, depending on the sort of person you are.

Do notice that some of the parts are conceptually simple but have finer points that may make them difficult to implement. The teaching assistants will provide substantial help with these fine points.

Part 3 is for the fun of it, although the extra points are for real. If you get how to solve it, it'll take you 5 minutes. If you don't, it will take you 5 years. In any event, don't waste much time. Try something. If you don't crack it, ask around. Some of you will figure it out.


Fourier transform

Alejandro Ribeiro

February 9, 2015

The discrete Fourier transform (DFT) is a computational tool to work with signals that are defined on a discrete time support and contain a finite number of elements. Time in the world is neither discrete nor finite, which motivates the consideration of continuous time signals x : ℝ → ℂ. These signals map a continuous time index t ∈ ℝ to a complex value x(t) ∈ ℂ. The signal values x(t) can be, and often are, real.

Paralleling the development performed for discrete signals, we define the Fourier transform of the continuous time signal x as the signal X : ℝ → ℂ for which the signal values X(f) are given by the integral

X(f) := ∫_{−∞}^{∞} x(t) e^{−j2πft} dt.   (1)

The definition in (1) is different in form from the definition of the DFT, but it is conceptually analogous. Whatever intuition we have gained so far on dealing with the DFT of discrete signals extends more or less unchanged to the Fourier transform of continuous signals.

The statement above has a very deep meaning that will become clear once we develop the theory of sampling. For the time being we can observe that the DFT can be considered as an approximation of the Fourier transform in which we start with N samples of x to obtain N samples of X. To see that this is true, consider N samples of x, separated by a sampling time T_s, and extending between times t = 0 and t = NT_s. The Riemann approximation of the integral in (1) is then given by

X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt ≈ T_s Σ_{n=0}^{N−1} x(nT_s) e^{−j2πfnT_s}.   (2)

The approximation above is true for all frequencies, but if we just consider the frequencies f = (k/N)f_s for k ∈ [−N/2, N/2] we can rewrite (2) as

X((k/N)f_s) ≈ T_s Σ_{n=0}^{N−1} x(nT_s) e^{−j2π(k/N)f_s nT_s} = T_s Σ_{n=0}^{N−1} x(nT_s) e^{−j2πkn/N}.   (3)

Figure 1. The discrete Fourier transform provides a numerical approximation to the Fourier transform.

Except for constants, the rightmost side of (3) is the definition of the DFT of the discrete signal x with components x(n) = x(nT_s). Indeed, the DFT X = F(x) of the discrete signal x has components

X(k) = (1/√N) Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N} = (1/√N) Σ_{n=0}^{N−1} x(nT_s) e^{−j2πkn/N}.   (4)

Upon comparison of (3) and (4) we can conclude that the DFT X of the sampled signal x and the Fourier transform X of the continuous signal x are approximately related by the expression

X(k) ≈ (1/(T_s√N)) X((k/N)f_s).   (5)

The relationship in (5) allows us to approximate the Fourier transform of a signal with numerical operations, or, conversely, to conclude that a property derived for Fourier transforms is approximately valid for DFTs as well. The approximating relationship in (5) is represented schematically in Figure 1. In this lab we will use (5) to verify numerically some formulas that we will derive analytically.
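As a concrete instance of (5), the following sketch compares the DFT of a sampled Gaussian pulse against its closed-form Fourier transform X(f) = √(2π)σ e^{−2π²σ²f²} (a standard result, which Part 1.1 below asks you to derive). NumPy and illustrative sampling parameters are assumed:

```python
import numpy as np

sigma, fs, T = 1.0, 32.0, 64.0            # pulse width, sampling rate, window length
Ts, N = 1.0 / fs, int(64.0 * 32.0)        # N = T * fs samples
t = (np.arange(N) - N // 2) * Ts          # center the pulse inside the window
x = np.exp(-t ** 2 / (2 * sigma ** 2))    # samples of the Gaussian pulse
X_dft = np.fft.fft(x, norm="ortho")
k = 5                                     # compare at the frequency f = k fs / N
f = k * fs / N
X_true = np.sqrt(2 * np.pi) * sigma * np.exp(-2 * np.pi ** 2 * sigma ** 2 * f ** 2)
# By (5), |X(k)| Ts sqrt(N) should approximate |X(f)|; centering the pulse
# in the window only changes the phase of the DFT, not its magnitude.
assert np.isclose(np.abs(X_dft[k]) * Ts * np.sqrt(N), X_true, rtol=1e-3)
```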

1 Computation of Fourier transforms

We define a Gaussian pulse of standard deviation σ and average value µ as the signal x with values x(t) given by the formula

x(t) = e^{−(t−µ)²/(2σ²)}.   (6)


The standard deviation σ controls the width of the pulse. Large σ corresponds to wide pulses and small σ corresponds to narrow pulses. The mean value µ controls the location of the pulse on the real line.


1.1 Fourier transform of a Gaussian pulse. Derive an expression for the Fourier transform of the Gaussian pulse when µ = 0. You will have to make use of the fact that the integral of the pulse is

∫_{−∞}^{∞} x_σ(t) dt = ∫_{−∞}^{∞} e^{−t²/(2σ²)} dt = √(2π) σ.   (7)


1.2 Numerical verification. Verify numerically that your derivation in Part 1.1 is correct. You will have to be careful with the selection of your sampling time and sampling interval. Try the comparison for different values of σ. Report for σ = 1, σ = 2, and σ = 4.


1.3 Fourier transform of a shifted Gaussian pulse. Derive an expression for the Fourier transform of the Gaussian pulse for generic µ. Verify numerically. The solution to this part is very easy once you have solved Part 1.1.

2 Modulation and demodulation

An important property of Fourier transforms is that shifting a signal in the time domain is equivalent to multiplying by a complex exponential in the frequency domain. More specifically, consider a given signal x and shift τ and define the shifted signal x_τ as

x_τ(t) = x(t − τ).   (8)

The Fourier transform of x is denoted as X = F(x) and the Fourier transform of x_τ is denoted as X_τ = F(x_τ). We then have that the following theorem holds true.

Theorem 1 A time shift of τ units in the time domain is equivalent to multiplication by a complex exponential of frequency −τ in the frequency domain,

x_τ(t) = x(t − τ) ⟺ X_τ(f) = e^{−j2πfτ} X(f).   (9)


Figure 2. Modulation of a bandlimited signal. The bandlimited spectrum of signal x is re-centered at frequency γ when the signal is multiplied by a complex exponential of frequency γ.

This result has important applications, the most popular of which is its use in signal detection. This application utilizes the fact that the moduli of X and X_τ are the same, which allows the comparison of signals without worrying about the selection of the time origin.
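The discrete analogue of Theorem 1 is easy to check numerically: circularly delaying a discrete signal by t0 samples multiplies its DFT by e^{−j2πkt0/N}, leaving the modulus untouched (a sketch; NumPy assumed, test signal illustrative):

```python
import numpy as np

N, t0 = 64, 5
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 7 * n / N)
x_shift = np.roll(x, t0)                        # circular delay by t0 samples
X = np.fft.fft(x, norm="ortho")
X_shift = np.fft.fft(x_shift, norm="ortho")
k = np.arange(N)
assert np.allclose(X_shift, np.exp(-2j * np.pi * k * t0 / N) * X)   # shift theorem
assert np.allclose(np.abs(X_shift), np.abs(X))                      # moduli coincide
```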

A property that we can call the dual of the result in Theorem 1 is that multiplying a signal by a complex exponential results in a shift in the frequency domain. Specifically, for a given signal x and frequency γ, we define the modulated signal

x_γ(t) = e^{j2πγt} x(t).   (10)

We write the Fourier transform of x as X = F(x) and the Fourier transform of x_γ as X_γ = F(x_γ). We then have that the following theorem holds true.

Theorem 2 Multiplication by a complex exponential of frequency γ in the time domain is equivalent to a shift of γ units in the frequency domain,

x_γ(t) = e^{j2πγt} x(t) ⟺ X_γ(f) = X(f − γ).   (11)

Proof: Write down a proof of Theorem 2. I.e., prove that if x_γ(t) = e^{j2πγt} x(t) we must have X_γ(f) = X(f − γ). ∎

Despite looking less interesting than the claim in Theorem 1, the result in Theorem 2 is at least of equal importance because of its application in the modulation and demodulation of bandlimited signals. To explain this statement better, we begin with the definition of a bandlimited signal, which we formally introduce next.


Definition 1 The signal x with Fourier transform X = F(x) is said to be bandlimited with bandwidth W if we have X(f) = 0 for all frequencies f ∉ [−W/2, W/2].

An illustration of the spectrum of a bandlimited signal is shown in Figure 2, where we also show the result of multiplying x by a complex exponential of frequency γ. When we do that, the spectrum is re-centered at the modulating frequency γ. Signals that are literally bandlimited are hard to find, but signals that are approximately bandlimited do exist. As an example, we consider voice recordings.


2.1 Voice as a bandlimited signal. Record 3 seconds of your voice at a sampling rate of 40kHz. Take the DFT of your voice and observe that coefficients with frequencies f > 4kHz are close to null. Set these coefficients to zero to create a bandlimited signal. Play your voice back and observe that the removed frequencies don't affect the quality of your voice.


2.2 Voice modulation. Take the bandlimited signal you created in Part 2.1 and modulate it with center frequency γ₁ = 5kHz.


2.3 Modulation with a cosine. The problem with modulating with a complex exponential as we did in Part 2.2 is that complex exponentials are signals with imaginary parts that, therefore, can't be generated in a real system. In a real system we have to modulate using a cosine, or a sine. Redefine then the modulated signal as

x_γ(t) = cos(2πγt) x(t),   (12)

and let X_γ = F(x_γ) be the respective Fourier transform. Write down an expression for X_γ in terms of X. Take the bandlimited signal you created in Part 2.1 and modulate it with a cosine of frequency γ₁ = 5kHz. Verify that the expression you derived is correct.
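A numerical check of the expected answer, X_γ(f) = [X(f − γ) + X(f + γ)]/2, using the DFT as a stand-in for the Fourier transform (a sketch; NumPy assumed, with the modulation frequency expressed in DFT bins and an illustrative smooth test pulse):

```python
import numpy as np

N, kg = 128, 20                                   # kg: modulating frequency in bins
n = np.arange(N)
x = np.exp(-((n - N / 2) ** 2) / 50.0)            # a smooth, nearly bandlimited pulse
xg = np.cos(2 * np.pi * kg * n / N) * x           # cosine modulation as in (12)
X = np.fft.fft(x, norm="ortho")
Xg = np.fft.fft(xg, norm="ortho")
# cos = (e^{+j...} + e^{-j...}) / 2, so the spectrum splits into two
# half-height copies centered at +kg and -kg (circularly, for the DFT).
assert np.allclose(Xg, (np.roll(X, kg) + np.roll(X, -kg)) / 2)
```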


2.4 The voice of your partner. Record the voice of your lab partner and repeat Part 2.1. Repeat Part 2.3 but use a cosine of frequency γ₂ = 15kHz. Sum up the respective modulated signals to create the mixed signal z.


2.5 Recover individual voices. Explain how to recover your voice and the voice of your partner from the mixed signal z. Implement the recovery and play back the individual voice pieces.


3 Time management

This lab is designed to be a respite from the more intensive Lab 3. Part 1 should take 2 hours and Part 2 between 4 and 5. To solve Part 1 you need to make use of a technique called "completing the square." If you have never done that, ask one of your teaching assistants right away. To solve Part 2 do remember to make use of our help. There's no reason to struggle when you can receive help.


Sampling

Alejandro Ribeiro

February 16, 2015

Signals exist in continuous time but it is not unusual for us to process them in discrete time. When we work in discrete time we say that we are doing discrete signal processing, something that is convenient due to the relative ease and lower cost of using computers to manipulate signals. When we use discrete time representations of continuous time signals we need to implement processes to move back and forth between continuous and discrete time. The process of obtaining a discrete time signal from a continuous time signal is called sampling. The process of recovering a continuous time signal from its discrete time samples is called signal reconstruction.

Mathematically, the sampling process has an elementary description. Given a sampling time T_s and a continuous time signal x with values x(t), the sampled signal x_s is the one that takes values

x_s(n) = x(nT_s).   (1)

As per (1), the sampled signal retains values at regular intervals spaced by T_s and discards the remaining values of x(t) – see Figure 1. The process by which this is done in, say, a sound card, is a problem of circuit design. For our purposes, let us just say that (1) is a reasonable model for the transformation of a continuous time signal into a discrete time signal.

A relevant question, perhaps the most relevant question, is what information is lost when discarding all the values of x(t) except for those observed at times nT_s. To answer this question, we compare the spectral representations of x_s and x. In fact, since x_s is a discrete time signal and x is a continuous time signal, it is convenient to introduce a continuous time representation of the sampled signal, as we describe in the following section.


Figure 1. Sampling with sampling time T_s. Sampling the continuous time signal x to create the discrete time signal x_s entails retaining the values x_s(n) = x(nT_s). A relevant question is in what respect the sampled signal x_s(n) differs from the original signal x(t).

Figure 2. A Dirac train with spacing T_s (left). The Fourier transform of the Dirac train is another Dirac train with spacing f_s = 1/T_s.

1 Dirac train representation of sampled signals

A Dirac train, or Dirac comb, with spacing T_s is a signal x_c defined by a succession of delta functions located at positions nT_s – see Figure 2 –,

x_c(t) = T_s Σ_{n=−∞}^{∞} δ(t − nT_s).   (2)

A Dirac train is, in a sense, an artifice to write down a discrete time signal in continuous time. The train is formally defined to be a continuous time signal but it becomes relevant only at the (discrete) set of times nT_s. In our forthcoming discussions of sampling we use the Fourier transform of the Dirac comb. This transform can be seen to be another Dirac comb, but with spacing f_s = 1/T_s. I.e., if we denote the Fourier transform of x_c as X_c = F(x_c) we have that

X_c(f) = Σ_{k=−∞}^{∞} δ(f − kf_s).   (3)

That X_c(f) does represent the values of the Fourier transform of x_c is not difficult to show by identifying x_c with the discrete time constant signal x(t) = 1, but we don't show this derivation in these notes.


Figure 3. Representation of a sampled signal with a modulated Dirac train. The representation is equivalent to the one in Figure 1 but makes comparisons with the original signal x easier.

In the Dirac train representation of sampling we use the samples x_s(n) = x(nT_s) to modulate the deltas of a Dirac train. Specifically, we define the signal x_δ as – see Figure 3 –

x_δ(t) = T_s Σ_{n=−∞}^{∞} x(nT_s) δ(t − nT_s).   (4)

That (1) and (4) are equivalent representations of sampling follows from the simple observation that when given the value x_s(n) we can determine x_δ(nT_s) and vice versa. The representation in (1) is simpler, but the representation in (4) permits comparisons with the original signal x.

Indeed, the sampling representation in (4) allows us to realize that we can write x_δ(t) as the product between x(t) and the Dirac train in (2),

x_δ(t) = x(t) × [ T_s Σ_{n=−∞}^{∞} δ(t − nT_s) ].   (5)

That the expressions in (4) and (5) are equivalent follows from the simple observation that in the multiplication of the function x(t) with the shifted delta function δ(t − nT_s), only the value x(nT_s) is relevant. It is therefore equivalent to simply multiply δ(t − nT_s) by x(nT_s).

Straightforward though it is, rewriting (4) as (5) allows us to rapidly characterize the spectrum of the sampled signal x_δ(t). Since we know that multiplication in time is equivalent to convolution in frequency, we have that the Fourier transform X_δ = F(x_δ) can be written in terms of the Fourier transform X = F(x) of x and that of the Dirac train as

X_δ = X ∗ F[ T_s Σ_{n=−∞}^{∞} δ(t − nT_s) ].   (6)

The Fourier transform of the Dirac train T_s Σ_{n=−∞}^{∞} δ(t − nT_s), we have seen, is given by the Dirac train in (3). Using this result in (6) and the linearity of the convolution operation, we further conclude that

X_δ = Σ_{k=−∞}^{∞} X ∗ δ(f − kf_s).   (7)

Figure 4. Spectrum X_δ = F(x_δ) of the sampled signal x_δ. The spectrum of the original signal is copied and shifted to all the frequencies that are integer multiples of f_s. The spectrum X_δ is the sum of all these shifted copies.

A final simplification comes from observing that the convolution of X with the shifted delta function δ(f − kf_s) is just a shifting of the spectrum of X so that it is re-centered at f = kf_s. We can therefore write

X_δ(f) = Σ_{k=−∞}^{∞} X(f − kf_s).   (8)

The result in (8) is sufficiently important so as to deserve a summary in the form of a theorem that we formally state next.

Theorem 1 Consider a signal x with Fourier transform X = F(x), a sampling time T_s, and the corresponding sampled signal x_δ as defined in (4). The spectrum X_δ = F(x_δ) of the sampled signal x_δ is a sum of shifted versions of the original spectrum,

X_δ(f) = Σ_{k=−∞}^{∞} X(f − kf_s).   (9)

The result in Theorem 1 is explained in terms of what we call spectrum periodization. We start from the spectrum X of the continuous time signal, which we replicate and shift to each of the frequencies that are multiples of the sampling frequency f_s. The spectrum X_δ of the sampled signal is given by the sum of all these shifted copies – see Figure 4.

The result of spectrum periodization provides a very clear answer to the question of what information is lost when we sample a signal at frequency f_s. The answer is that whatever information is contained in frequency components X(f) outside of the set f ∈ [−f_s/2, f_s/2] is completely lost. Information contained at frequencies f close to the borders of this set is not completely lost but rather distorted by mixing with the frequency components outside of the set f ∈ [−f_s/2, f_s/2]. We refer to this distortion phenomenon as aliasing.

The result in (9) points to a particularly interesting consequence for the case of bandlimited signals, which you are asked to analyze next.


1.1 Sampling of bandlimited signals. Suppose the signal x has bandwidth W, i.e., that X(f) = 0 for all f ∉ [−W/2, W/2]. In this case, sampling entails no loss of information, in that it is possible to recover x(t) perfectly if only the samples x_s(n) are given. Explain why this is true and describe a method to recover the continuous time signal x from the modulated Dirac train x_δ.


1.2 Avoiding aliasing. When we sample a signal that is not bandlimited, there is an unavoidable loss of the information contained in frequencies larger than f_s/2 – and the equivalent information contained in frequencies smaller than −f_s/2. However, it is possible to avoid aliasing through judicious use of a low pass filter. Explain how this is done.


1.3 Reconstruction with arbitrary pulse trains. While it is mathematically possible to reconstruct x(t) from x_δ(t), it is physically implausible to generate a Dirac train because delta functions are not physical entities. We can, however, approximate δ(t) by a narrow pulse p(t) and attempt to reconstruct x(t) from the modulated pulse train

x_p(t) = Σ_{n=−∞}^{∞} x(nT_s) p(t − nT_s).   (10)

As long as the pulse p(t − nT_s) is sufficiently tall and narrow, x_p(t) is not too far from x_δ(t) and the reconstruction method described in Part 1.1 should yield acceptable results with x_p(t) used in lieu of x_δ(t). Work in the frequency domain to explain what distortion is introduced by the use of x_p(t) in lieu of x_δ(t). In the course of this analysis you will realize that there is a condition on p(t) that guarantees no distortion, i.e., perfect reconstruction of x(t) without using a Dirac train. Derive this condition and propose a particular pulse with this property. Do notice that the pulse you are proposing is not that narrow after all.




Figure 5. Subsampling (top). When subsampling a discrete time signal we retain a subset of the values of the given discrete time signal. In the figure, the sampling time of the given signal x is T_s and the sampling time of the subsampled signal x_s is τ = 3T_s. We therefore keep one out of every three values of x to form x_s.

2 Subsampling

Most often, sampling is understood as a technique to generate a discrete time signal from a continuous time signal. However, we can also use sampling to generate a smaller number of samples from an already sampled signal. Consider then a discrete time signal x with sampling time T_s and values x(n). We want to generate a (sub)sampled signal x_s with sampling time τ and values x_s(m) given by

x_s(m) = x(m τ/T_s).   (11)

For the expression in (11) to make sense we need the subsampling time τ to be an integer multiple of T_s. Under that assumption, making x_s(m) = x(mτ/T_s) means that we retain one value of x(n) out of every τ/T_s values. E.g., if τ/T_s = 2, we keep every other sample of x to form x_s. If τ/T_s = 3, we make x_s(0) = x(0), x_s(1) = x(3), and, in general, x_s(m) = x(3m), so that we keep all the values of x that correspond to time indexes that are multiples of 3 (see Figure 5).

As in the case of sampling, we want to understand what information, if any, is lost when we subsample x into x_s. And, also as in the case of sampling, the difficulty in answering this question is that the supports of the signals x and x_s are different. In (1), the continuous time signal x is a function of the continuous time parameter t and the sampled signal x_s is a function of the discrete time parameter n. In (11), the original signal x is defined for times nT_s, whereas the subsampled signal is defined for times mτ.




Figure 6. Delta train representation of subsampling. The difference with the subsampled signal in Figure 5 is that here we pad with zeros so that the support of this signal is the same as the support of the original signal.

We can overcome this problem by introducing the analogue of the modulated Dirac train in (4). To do so, consider a train of discrete time delta functions centered at the discrete time indexes mτ/T_s and define the delta train representation of the subsampled signal as

x_δ(n) = \sum_{m=-∞}^{∞} x(mτ/T_s) δ(n - mτ/T_s).   (12)

A schematic representation of (12) is available in Figure 6. The difference between x_δ and x_s is that x_δ is padded with zeros so that its support is the same as the support of the original signal x. Do notice that it is pointless to utilize x_δ for signal processing when we can use the equivalent signal x_s. However, the delta train representation x_δ is more convenient for analysis. In particular, it allows us to repeat the steps in (5)-(8) to conclude that a result equivalent to the periodization statement of Theorem 1 holds.


2.1 Subsampling theorem. Derive the equivalent of Theorem 1 relating the spectra of the discrete time signal x and its subsampled version x_δ. To solve this part you need to compute the DTFT of the delta train \sum_{m=-∞}^{∞} δ(n - mτ/T_s). This DTFT is a Dirac train with spikes that are spaced by the subsampling frequency ν = 1/τ. If you have problems with this derivation, which you will most likely have, talk with one of your teaching assistants. If you don't want to talk with them, ponder the fact that the train \sum_{m=-∞}^{∞} δ(n - mτ/T_s) is akin to a constant function when we use the sampling time τ.




2.2 Subsampling function. Create a function that takes as input a signal x, a sampling time T_s, and a subsampling time τ, and returns the subsampled signal x_s and its delta train representation x_δ. The latter signal would not be returned in practice, but we will use it here to perform some analyses. Test your function with a Gaussian pulse of standard deviation σ = 100ms and mean µ = 1s. Set the original sampling frequency to f_s = 40kHz, the subsampling frequency to ν = 4kHz, and the total observation period to T = 2s.
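The lab is meant to be implemented in MATLAB; as a reference only, the subsampling step can be sketched in Python/NumPy as follows (function and variable names are ours, not the lab's):

```python
import numpy as np

def subsample(x, Ts, tau):
    """Return the subsampled signal x_s and its delta train representation.

    Assumes tau is an integer multiple of Ts, as (11) requires."""
    step = int(round(tau / Ts))
    xs = x[::step]                 # x_s(m) = x(m*tau/Ts), cf. (11)
    xdelta = np.zeros_like(x)      # zero-padded representation, cf. (12)
    xdelta[::step] = xs
    return xs, xdelta

# Gaussian pulse test with the parameters of Part 2.2
fs, nu, T = 40e3, 4e3, 2.0                  # sampling/subsampling frequencies, duration
Ts, tau = 1 / fs, 1 / nu
t = np.arange(0, T, Ts)
x = np.exp(-(t - 1.0)**2 / (2 * 0.1**2))    # sigma = 100 ms, mu = 1 s
xs, xdelta = subsample(x, Ts, tau)
```

With these parameters τ/T_s = 10, so the function keeps one of every ten samples.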


2.3 Spectrum periodization. Take the DFT of the functions x and x_δ of Part 2.2 and check that the periodization result of Part 2.1 holds. Keep all parameters unchanged and vary the standard deviation of the Gaussian pulse to observe cases with and without aliasing.


2.4 Prefiltering. The function you wrote in Part 2.2 results in aliasing when the spectrum of the signal x has a bandwidth W that exceeds ν. We can avoid aliasing by implementing a low pass filter to eliminate frequencies outside [-ν/2, ν/2] before subsampling. Modify the function of Part 2.2 to add this feature.
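A minimal sketch of the prefiltering idea, implemented here by zeroing DFT coefficients outside the retained band (an ideal low pass filter; all names are ours, and the lab's MATLAB implementation may differ):

```python
import numpy as np

def lowpass(x, fs, fc):
    # Ideal low pass filter: zero every DFT coefficient with |f| > fc.
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)
    X[np.abs(f) > fc] = 0.0
    return np.real(np.fft.ifft(X))

def subsample_prefiltered(x, fs, nu):
    # Remove frequencies outside [-nu/2, nu/2], then keep every (fs/nu)-th sample.
    xf = lowpass(x, fs, nu / 2.0)
    return xf[::int(round(fs / nu))]

# A tone above nu/2 is removed before subsampling, so it cannot alias.
fs, nu = 1000.0, 250.0
t = np.arange(1000) / fs
x = np.cos(2*np.pi*50*t) + np.cos(2*np.pi*400*t)
xs = subsample_prefiltered(x, fs, nu)
```

The 400 Hz tone is discarded by the prefilter, while the 50 Hz tone survives the subsampling intact.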


2.5 Spectrum periodization with prefiltering. Repeat Part 2.3 using the function in Part 2.4. For the cases without aliasing the result should be the same. Observe and comment on the differences for the cases in which you had observed aliasing.


2.6 Reconstruction function. Create a function that takes as input a subsampled signal x_s, a sampling time T_s, and a subsampling time τ, and returns the signal x. Depending on context this process may also be called interpolation (because we interpolate the values between subsequent samples in x_s) or upsampling (because we increase the sampling frequency from ν to f_s). In implementing this function you can assume that the signal x_s is bandlimited and was generated without aliasing. Test your function for a Gaussian pulse. Choose parameters of Part 2.3 that did not result in aliasing. Then choose parameters for which you observed aliasing and check that, indeed, the reconstructed pulse is not a faithful representation of the original pulse. For this latter experiment utilize both the subsampling function in Part 2.2 and the subsampling function in Part 2.4.



3 Time management

This lab returns to the mean and is more involved than Lab 4. Part 1 includes results that we will derive in class, so it shouldn't be too onerous to finish. The teaching assistants will work on these problems during Tuesday's meeting. It should take an hour or so more to wrap it up.

Part 2.1 is the odd man out, as it asks you to do a somewhat involved derivation. Work on it during Wednesday and, if you can't solve it before the end of the day, go talk with one of your teaching assistants. A 2-hour investment should do.

We will work on the remaining parts during the Thursday session. Completing the rest should take about 5 hours, 1 hour for each of the parts.



Voice recognition

Alejandro Ribeiro

February 23, 2015

The goal of this lab is to use your accumulated knowledge of signal and information processing to design a system for the recognition of a spoken digit.

Figure 1 shows four different realizations of the DFT of the signal recorded when I spoke the word “one.” These four DFTs are different from each other because there are variations in the sounds that I produce, but they also have discernible patterns. E.g., in all four DFTs you can see two well defined frequency spikes close to frequencies 0.5kHz and 0.7kHz. That these patterns are specific to the word “one” can be verified by the four different realizations of the DFT of the signal recorded when I spoke the word “two” that are shown in Figure 2. The two characteristic spikes of the DFTs in Figure 1 are absent from this second set of DFTs, which, instead, seem to all have a high frequency component a little below 0.4kHz. Another feature that arises upon comparison is that the spectra associated with the word “two” have their energy more evenly spread out than the energy of the word “one.” Regardless of the specific features, the general conclusion is that the four DFTs in Figure 1 are more like each other than they are like the DFTs in Figure 2, which are in turn more like each other than they are like the DFTs in Figure 1.

We could spend days talking about the physical meaning of these differences. Large frequency components are generally associated with vowels that produce high energy sounds at definite frequencies. Consonants generate sounds with less power because the vocal cords are not involved in their generation. Consonant sounds also tend to be more spread out in frequency because they are not associated with a well defined oscillating tone. The differences we see between the DFTs of the spoken words “one” and “two” arise because their vowel sounds are different – thus, frequency spikes are observed at different locations – and because the consonant sound in “two” is longer than the consonant sound in “one” – therefore,




Figure 1. Six different observations of the Fourier transform of the signal recorded when speaking the word “one.” These transforms are different from each other but they are more like each other than they are to the ones in Figure 2. (Frequency axis in kHz, sampling frequency set to f_s = 8 kHz, signal duration T = 2s.)

the energy in “two” is more spread out.

However, our interest today is not in analyzing these differences but in using them to detect a spoken digit. To that end, start by recording N waveforms y_i for the spoken word “one” and K waveforms z_i for the spoken word “two.” The respective DFTs are denoted as Y_i = F(y_i) and Z_i = F(z_i). The sets of all DFTs Y := {Y_i, i = 1, . . . , N} and Z := {Z_i, i = 1, . . . , K} are called training sets. We assume that the signals in the training sets have been normalized to have unit energy.

1 Acquire and process training sets. Acquire and store N = 10 recordings for each of the two digits “one” and “two.” Compute and normalize the respective DFTs.

Having acquired and processed the training sets Y and Z, we acquire a signal x that results from the utterance of either the word “one” or the




Figure 2. Six different observations of the Fourier transform of the signal recorded when speaking the word “two.” These transforms are different from each other but they are more like each other than they are to the ones in Figure 1.

word “two” that we want to (correctly) identify. To do so we compare the DFT X = F(x) (which we also assume has been normalized to have unit energy) with the DFTs Y_i and Z_i that were stored in the training sets. There are different choices to make this comparison. We will try two of them in this lab.

2 Comparison with average spectrum. For each of the training sets define the average spectra

Ȳ = (1/N) \sum_{i=1}^{N} Y_i,    Z̄ = (1/K) \sum_{i=1}^{K} Z_i.   (1)

Further define the inner product p(X, X′) between the spectra X and X′ as the inner product between their absolute values,

p(X, X′) = |X|^T |X′| = \sum_{n=0}^{N-1} |X(n)| |X′(n)|,   (2)

where here N denotes the number of elements of the DFT.



Compare the inner product p(X, Ȳ) between the unknown spectrum X and the average spectrum Ȳ with the inner product p(X, Z̄) between the unknown spectrum X and the average spectrum Z̄. Assign the digit to the spectrum with the largest inner product. Estimate your classification accuracy. Explain why we are using the absolute values of the spectra.

3 Nearest neighbor comparison. Compute the inner product p(X, Y_i) between the unknown spectrum X and each of the spectra Y_i associated with the word “one.” Do the same for the inner product p(X, Z_i) between the unknown spectrum X and each of the spectra Z_i associated with the word “two.” Assign the digit to the spectrum with the largest inner product. Estimate your classification accuracy.

4 Larger number of digits. Try developing a system to identify all 10 digits. We will give you 2 extra points for the effort and 3 more extra points if you succeed.



1 Time management

This lab marks a shift with respect to previous labs. You have acquired, or at least I am assuming that you have acquired, all the fundamental concepts on the spectral analysis of one-dimensional signals. This lab is just a test of the application of the concepts you have learnt. The pieces are not lengthy. You should be able to solve Part 1 in less than 1 hour and spend two or three hours on each of the other two parts. The extra time you can use to start preparing for the midterm.



Voice recognition with a linear time invariant system

Alejandro Ribeiro

March 2, 2015

To recognize spoken words we can compare the spectra of prerecorded signals associated with known words with the spectrum observed at classification time. More specifically, say that we want to discern between the spoken word “one” and the spoken word “two.” We do so by recording N waveforms y_i for the spoken word “one” and N waveforms z_i for the spoken word “two.” The respective signals are normalized to unit energy and the DFTs Y_i = F(y_i/‖y_i‖) and Z_i = F(z_i/‖z_i‖) are computed. The sets of all DFTs are stored to construct the training sets Y := {Y_i, i = 1, . . . , N} and Z := {Z_i, i = 1, . . . , N}.

With the training sets acquired we proceed to observe a signal x and compare the DFT X = F(x) with the DFTs in the training sets Y and Z. To that end we define the energy p(X, X′) of the crossproduct between the spectra X and X′ as

p(X, X′) = [ \sum_{k=0}^{N-1} ( |X_k| · |X′_k| )^2 ]^{1/2}.   (1)

The cross product energy p(X, X′) can be interpreted as the energy of the linear filtering of the signal x through a filter with frequency response |X′|. If we filter the signal x with that filter, the resulting signal y has a spectrum Y = F(y) given by

Y = X |X′|,   (2)

whose norm ‖Y‖ is indeed given by (1). This filtering representation can be used to propose a time domain implementation of voice recognition that doesn't involve computation of DFTs.



1 Comparison with average spectrum. For each of the training sets define the average spectra

Ȳ = (1/N) \sum_{i=1}^{N} Y_i,    Z̄ = (1/N) \sum_{i=1}^{N} Z_i.   (3)

Interpret Ȳ and Z̄ as the frequency responses of respective filters. Determine the individual impulse responses and use them to compute p(X, Ȳ) and p(X, Z̄) without determining the respective DFTs. Assign the spoken waveform to the digit with the largest energy.
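The equivalence between the frequency domain computation of p and time domain filtering can be checked numerically. A Python/NumPy sketch follows, with random vectors as hypothetical stand-ins for the signal and the average magnitude spectrum (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 64
x = rng.standard_normal(L)                             # observed waveform (stand-in)
Ybar_mag = np.abs(np.fft.fft(rng.standard_normal(L)))  # |average spectrum| (stand-in)

# Impulse response of the filter with frequency response |Ybar|; a magnitude
# spectrum of a real signal is symmetric, so the impulse response is real.
h = np.real(np.fft.ifft(Ybar_mag))

# Time implementation: circularly convolve x with h and measure the energy.
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
energy_time = np.sum(y**2)

# Frequency implementation: cross product energy of (1), squared.
p_squared = np.sum((np.abs(np.fft.fft(x)) * Ybar_mag)**2)

# By Parseval, the two agree up to the 1/L factor of the unnormalized FFT.
```

The 1/L discrepancy is only an artifact of NumPy's FFT normalization convention; with unitary DFTs the two energies coincide exactly.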

2 Online operation. An advantage of the implementation in Part 1, as opposed to the computation and comparison of the DFTs, is that it can be run online, i.e., as a system that runs continuously and detects digits as they are spoken. Explain how this can be done.

3 Online operation implementation. Your teaching assistants can explain the use of Matlab's real time toolbox to implement the online classifier of Part 2.

1 Time management

This lab is intended to be very short as you are supposed to be studying for your midterm. One or two hours should be sufficient.



Two-Dimensional Signal Processing and Image De-noising

Alec Koppel, Mark Eisen, Alejandro Ribeiro

March 20, 2015

Before, we considered (one-dimensional) discrete signals of the form x : [0, N-1] → C, where N is the duration, having elements x(n) for n ∈ [0, N-1]. Extend this definition to two dimensions by considering the set of ordered pairs x : [0, M-1] × [0, N-1] → C,

x = {x(m, n) : m ∈ [0, M-1], n ∈ [0, N-1]}.   (1)

We can think of x as the set of values along the integer lattice in the two-dimensional plane, hence as elements of a discrete spatial domain. We associate these signals with the space of complex matrices C^{M×N}. The inner product for two signals x and y, both of duration M × N, in two dimensions is a natural extension of the one-dimensional case, and is defined as

⟨x, y⟩ := \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n) y*(m, n).   (2)

As before, we define the energy of a two-dimensional signal x as ⟨x, x⟩ = ‖x‖^2. We say two signals x and y are orthogonal in two dimensions if ⟨x, y⟩ = 0, and if in addition both have unit energy, i.e., ‖x‖ = ‖y‖ = 1, they are said to be orthonormal. The discrete two-dimensional impulse δ(m, n) is defined as

δ(m, n) = 1 if (m, n) = (0, 0),    δ(m, n) = 0 otherwise.   (3)

The discrete complex exponential e_{kl,MN}(m, n) in two dimensions at frequencies k and l is the signal

e_{kl,MN}(m, n) = (1/√MN) e^{-j2πkm/M} e^{-j2πln/N} = (1/√MN) e^{-j2π(km/M + ln/N)}.   (4)

It is a straightforward computation to check that e_{kl,MN} is orthonormal to e_{pq,MN} whenever (k, l) ≠ (p, q). The two-dimensional discrete Fourier transform (DFT) of x is the signal X : Z^2 → C whose elements X(k, l) for all k, l ∈ Z are defined as

X(k, l) := (1/√MN) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n) e^{-j2πkm/M} e^{-j2πln/N}.   (5)

The arguments k and l of the signal X(k, l) are called the vertical and horizontal frequencies of the DFT, and the value X(k, l) the corresponding frequency component of the given signal x. As in the one-dimensional case, when X is the DFT of x we write X = F(x). Recall that for a complex exponential, discrete frequency k is equivalent to (real) frequency f_k = (k/N) f_s, where N is the total number of samples and f_s the sampling frequency.

An alternative form of the 2d DFT is to realize that the sum in (5) is defining a two-dimensional inner product between x and the complex exponential e_{kl,MN} with elements e_{kl,MN}(m, n) = (1/√MN) e^{-j2π(km/M + ln/N)}. We can then write

X(k, l) := (1/√MN) \sum_{m=0}^{M-1} [ \sum_{n=0}^{N-1} x(m, n) e^{-j2πln/N} ] e^{-j2πkm/M}
         = (1/√M) \sum_{m=0}^{M-1} ⟨x(m, ·), e_{lN}⟩ e^{-j2πkm/M}
         = ⟨⟨x(m, ·), e_{lN}⟩, e_{kM}⟩ = ⟨x, e_{kl,MN}⟩,   (6)

which allows us to view X(k, l) as a measure of how much the signal x resembles an oscillation of frequency k in the vertical direction and l in the horizontal direction. Note that the inner products in the two middle equalities above are one-dimensional, whereas the last one is the two-dimensional inner product of (2).

Because the complex exponential is (M, N)-periodic, the 2d DFT values X(k, l) and X(k + M, l + N) are equal, i.e.,

X(k + M, l + N) = (1/√MN) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n) e^{-j2π(k+M)m/M} e^{-j2π(l+N)n/N}
                = (1/√MN) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n) e^{-j2πkm/M} e^{-j2πln/N}
                = X(k, l).   (7)

The relationship in (7) means that the DFT is periodic in both directions, with period M in k and period N in l, and while it is defined for all (k, l) ∈ Z^2, only M × N values are different. As in the 1d case, we work with the canonical set of frequencies (k, l) ∈ [0, M-1] × [0, N-1] for computational purposes, and the set of frequencies (k, l) ∈ [-M/2, M/2] × [-N/2, N/2] for interpretation purposes. This latter canonical set contains (M+1)(N+1) frequencies instead of M × N. As before, we may shift the values of the 2d DFT to convert from one canonical set to the other using periodicity:

X(-k, -l) = X(M - k, N - l)   (8)

for all (k, l) ∈ [-M/2, M/2] × [-N/2, N/2]. The operation in (8) is a “chop and shift,” with which we may recover the DFT values for the canonical set [-M/2, M/2] × [-N/2, N/2] from the canonical set [0, M-1] × [0, N-1]. Simply chop the frequencies in the “box” [M/2, M-1] × [N/2, N-1] and shift them to the front of the set, as in the one-dimensional case. For the purposes of this homework, when you are asked to report a DFT, you should report the DFT for the canonical set [-M/2, M/2] × [-N/2, N/2]. Further define the 2d inverse Discrete Fourier Transform (2d iDFT) F^{-1}(X) as the two-dimensional signal x(m, n),

x(m, n) := (1/√MN) \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} X(k, l) e^{j2πkm/M} e^{j2πln/N}
         = (1/√MN) \sum_{k=-M/2+1}^{M/2} \sum_{l=-N/2+1}^{N/2} X(k, l) e^{j2πkm/M} e^{j2πln/N}.   (9)

The expression in (9) means that any arbitrary signal x in two dimensions may be represented as a sum of oscillations, as in the one-dimensional case. Moreover, conjugacy means we can represent the sum in (9) with only half as many terms. This means that we can effectively represent signals using only particular DFT coefficients under some conditions.



This observation is the foundation of the image de-noising and compression methods which are the focus of this and next week's lab assignments. For the subsequent questions, you may assume that M = N so that signals are of dimension N^2.

1 Two Dimensional Signal Processing


1.1 Inner products and orthogonality. Write down a function that takes as input two-dimensional signals x and y and outputs their inner product. Each signal is now defined by N^2 complex numbers.
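The lab code is written in MATLAB; for concreteness, here is a Python/NumPy sketch of the 2D inner product of (2), used to check the orthonormality of two-dimensional complex exponentials (function names are ours):

```python
import numpy as np

def inner2d(x, y):
    # <x, y> = sum over m, n of x(m, n) * conj(y(m, n)), cf. (2)
    return np.sum(x * np.conj(y))

def cexp2d(k, l, M, N):
    # two-dimensional discrete complex exponential, cf. (4)
    m = np.arange(M)[:, None]
    n = np.arange(N)[None, :]
    return np.exp(-2j*np.pi*(k*m/M + l*n/N)) / np.sqrt(M*N)

e12 = cexp2d(1, 2, 8, 8)
e35 = cexp2d(3, 5, 8, 8)
```

`inner2d(e12, e12)` evaluates to 1 and `inner2d(e12, e35)` to 0, illustrating the orthonormality claimed after (4).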


1.2 Discrete Complex Exponentials. Write down a function that takes as input the frequencies k, l and the signal duration N and returns three matrices of size N^2, the first of which contains the values of e_{kl,N} and the latter two its real and imaginary parts.


1.3 Unit Energy 2-D Square Pulse. The two-dimensional square pulse is defined as

u_L(m, n) = 1/L if 0 ≤ m, n < L,    u_L(m, n) = 0 otherwise.   (10)

Write down a function that takes as input the signal duration T, the pulse duration T0, and the sampling frequency f_s, and outputs the square pulse as an array of size N^2, along with the number of samples N per dimension. Plot the two-dimensional square pulse for T = 32s, f_s = 8Hz, and T0 = 0.5s. (Use MATLAB's imshow function.)


1.4 Two-Dimensional Gaussian Signals. The Gaussian pulse centered at µ, when t and s are uncorrelated, is defined as

x(t, s) = e^{-[(t-µ)^2 + (s-µ)^2]/(2σ^2)}.   (11)

Write down a function which takes as input µ, σ and T_s, and outputs the Gaussian pulse in two dimensions. Recall that, as in the one-dimensional case, we sample t ∈ [-3σ, 3σ] in increments of size T_s. Plot the two-dimensional Gaussian for µ = 2 and σ = 1. Note how these signals look when projected onto a single dimension.
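A Python/NumPy sketch of the sampled pulse of (11), under the same grid convention (names are ours; the lab's MATLAB version would be analogous):

```python
import numpy as np

def gaussian2d(mu, sigma, Ts):
    # Sample t, s in [-3*sigma, 3*sigma] in increments of Ts and evaluate (11).
    t = np.arange(-3*sigma, 3*sigma + Ts/2, Ts)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    return np.exp(-((tt - mu)**2 + (ss - mu)**2) / (2*sigma**2))

x = gaussian2d(mu=0.0, sigma=1.0, Ts=0.05)
```

Because (11) is a product of two one-dimensional Gaussians, any row or column of the result is itself a scaled one-dimensional Gaussian, which is the projection behavior the question asks you to observe.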




1.5 DFT in two dimensions. Modify your function for the one-dimensional DFT from Lab 2 so that it now computes the DFT of a two-dimensional signal. This computation may be expressed in terms of the inner product (written for question 1.1) of the signal with the two-dimensional discrete complex exponential (question 1.2). Compute the DFT of a 2d Gaussian pulse with µ = 0 and σ = 2. Plot your results in the two-dimensional plane.
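As a correctness check, the 2d DFT of (5) can be sketched by direct evaluation of the inner products and compared against a fast implementation (NumPy here; the MATLAB analogue would use fft2):

```python
import numpy as np

def dft2(x):
    # X(k, l) = <x, e_kl>, cf. (5)-(6); direct O(M^2 N^2) evaluation
    M, N = x.shape
    m = np.arange(M)[:, None]
    n = np.arange(N)[None, :]
    X = np.zeros((M, N), dtype=complex)
    for k in range(M):
        for l in range(N):
            e = np.exp(-2j*np.pi*(k*m/M + l*n/N)) / np.sqrt(M*N)
            X[k, l] = np.sum(x * e)
    return X

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))
X = dft2(x)
```

Up to the 1/√MN normalization of (5), this agrees with the library FFT, which computes the same sums in O(MN log MN).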


1.6 iDFT in Two Dimensions. Write down a function which takes as input a two-dimensional signal of duration N^2 and computes its two-dimensional iDFT [cf. (9)]. Exploit conjugacy in your computation so that you only need to take in (N/2)^2 DFT coefficients. Compute the iDFT associated with the DFT of the Gaussian pulse you computed in the previous question. Plot this signal and compare it with the original Gaussian pulse.

2 Image Filtering and de-noising

We may de-noise corrupted images using spatial information about the signal. To do this, we consider the two-dimensional convolution of signals x and y, denoted by x * y, and defined as

[x * y](m, n) := \sum_{k=-∞}^{∞} \sum_{l=-∞}^{∞} x(k, l) y(m - k, n - l) = \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} x(k, l) y(m - k, n - l).   (12)

The standard way to perform image de-noising is to convolve the image with a Gaussian pulse. To implement this technique, start by defining the Gaussian kernel evaluated at the spatial location x = (t, s) as

G_σ(x) = (1/(4πσ^2)) e^{-‖x‖^2/(4σ^2)},   (13)

where ‖x‖^2 = ⟨x, x⟩. With a two-dimensional signal (image) x, the spatial de-noising technique we consider is

x_de-noised = G_σ * x.   (14)




2.1 Spatial De-noising. Implement the technique in (14) without using the 2-D DFT by convolving the image with the Gaussian pulse directly. Hint: MATLAB has a built-in function to perform the convolution in two dimensions. Try your spatial domain implementation out for σ = 1, 4, 16 on sample images A and B. Do you observe significant de-noising performance differences when varying σ?
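A direct (non-DFT) convolution sketch in Python/NumPy, with a truncated, normalized Gaussian kernel standing in for (13); the truncation radius and the sum-to-one normalization are choices of this sketch, not of the lab:

```python
import numpy as np

def gaussian_kernel(sigma):
    # Truncated kernel following (13); normalized to sum to one so that
    # flat regions of the image are preserved (an assumption of this sketch).
    half = max(1, int(3 * sigma))
    t = np.arange(-half, half + 1)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    g = np.exp(-(tt**2 + ss**2) / (4.0 * sigma**2))
    return g / g.sum()

def conv2_same(x, h):
    # Direct 2D convolution with zero padding, cropped to the size of x.
    M, N = x.shape
    P, Q = h.shape
    y = np.zeros((M + P - 1, N + Q - 1))
    for k in range(P):
        for l in range(Q):
            y[k:k+M, l:l+N] += h[k, l] * x
    return y[(P-1)//2:(P-1)//2+M, (Q-1)//2:(Q-1)//2+N]

rng = np.random.default_rng(0)
img = np.ones((9, 9)) + 0.1 * rng.standard_normal((9, 9))
denoised = conv2_same(img, gaussian_kernel(1.0))   # cf. (14)
```

In MATLAB the same effect is obtained with its built-in two-dimensional convolution; the sketch above just makes the shifted-and-scaled-copies structure of (12) explicit.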



The Discrete Cosine Transform and JPEG

Alec Koppel, Mark Eisen, Alejandro Ribeiro

March 20, 2015

For image processing applications, it is useful to consider the Discrete Cosine Transform (2d DCT) instead of the 2d DFT due to its superior empirical performance for signal compression and reconstruction tasks. We first introduce the two-dimensional discrete cosine C_{kl,MN}(m, n) of frequencies k, l, defined as

C_{kl,MN}(m, n) = cos[ (kπ/(2M))(2m + 1) ] cos[ (lπ/(2N))(2n + 1) ].   (1)

Then the two-dimensional DCT of a signal x is given by substituting C_{kl,MN} into the expression for the two-dimensional DFT, yielding

X_C(k, l) := (1/(MN)) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n) cos[ (kπ/(2M))(2m + 1) ] cos[ (lπ/(2N))(2n + 1) ] = ⟨x, C_{kl,MN}⟩.   (2)

Note that again this may be computed as an inner product in two dimensions, just like the 2d DFT. Crucial to the theory of image reconstruction and compression is the 2d inverse Discrete Cosine Transform (2d iDCT), which is the signal x_C defined as

x_C(m, n) := \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} c_k c_l X_C(k, l) cos[ kπ(2m + 1)/(2M) ] cos[ lπ(2n + 1)/(2N) ],   (3)

where c_0 = 1 and c_k = 2 for k = 1, . . . , M-1 (and similarly for c_l). Analogous to the 2d DFT, we note that the sum in (3) allows us to represent an arbitrary two-dimensional signal as a sum of cosines, and hence we may ask how many cosines are necessary to represent the signal well in terms of reconstruction error. We explore this question in the first part of this lab. Henceforth you may assume that M = N so that signals are of dimension N^2.
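The DCT/iDCT pair of (2)-(3) can be sketched with matrix products. The sketch below follows the text's normalization and checks that the round trip is exact (function names are ours):

```python
import numpy as np

def cosine_basis(M):
    # C[k, m] = cos(k*pi*(2m + 1)/(2M)), cf. (1)
    k = np.arange(M)[:, None]
    m = np.arange(M)[None, :]
    return np.cos(np.pi * k * (2*m + 1) / (2*M))

def dct2(x):
    # X_C(k, l) per (2), with the 1/(MN) normalization used in the text
    M, N = x.shape
    return cosine_basis(M) @ x @ cosine_basis(N).T / (M * N)

def idct2(X):
    # x(m, n) per (3): c_0 = 1 and c_k = 2 for k >= 1, in both directions
    M, N = X.shape
    cM = np.ones(M); cM[1:] = 2.0
    cN = np.ones(N); cN[1:] = 2.0
    Xc = (cM[:, None] * cN[None, :]) * X
    return cosine_basis(M).T @ Xc @ cosine_basis(N)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
```

Separability is what allows the double sums in (2) and (3) to collapse into two matrix products, one per dimension.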



1 Image Compression


1.1 DCT in Two Dimensions. Write down a function which takes as input a two-dimensional signal of duration N^2 and computes its two-dimensional DCT defined in (2).


1.2 Image Compression. When the signal dimension N^2 is very large it is difficult to represent the signal well across its entire domain using the same DFT or DCT coefficients. This is because the computation of the inverse in (3) uses the same DFT or DCT coefficients across the entire domain. In Lab 3 Part 2 we designed an audio compression scheme by partitioning the signal and computing the DFT of each piece, so that our DFT coefficients only need to locally represent the signal over a small domain. We will implement a two-dimensional analogue here.

Hence, write down a function that takes in a signal (image) of size N^2 and partitions it into patches of size 8 × 8, and for each patch stores the K^2 largest DFT coefficients and their associated frequencies. Your partitioning scheme should resemble the depiction below.

Write another function that executes this procedure for the two-dimensional DCT. Try both of these functions out on sample image A for K^2 = 4, 16, 32. Make sure to keep track of each patch's frequencies associated with the dominant DCT coefficients.
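The patch-and-keep-K² idea can be sketched as follows. For brevity this sketch uses NumPy's fft2 for the per-patch transform (the lab asks for both DFT and DCT versions) and represents the compressed image by zeroing the discarded coefficients rather than storing coefficient/frequency lists:

```python
import numpy as np

def compress_patches(img, K, patch=8):
    """Keep the K^2 largest-magnitude 2d DFT coefficients of each patch.

    Sketch only: discarded coefficients are zeroed before inverting,
    instead of storing the kept coefficients and their frequencies."""
    N = img.shape[0]                      # assumes a square N x N image
    out = np.zeros_like(img, dtype=float)
    for i in range(0, N, patch):
        for j in range(0, N, patch):
            P = np.fft.fft2(img[i:i+patch, j:j+patch])
            # threshold at the K^2-th largest magnitude (ties may keep extras)
            thresh = np.sort(np.abs(P).ravel())[-K*K]
            P[np.abs(P) < thresh] = 0.0
            out[i:i+patch, j:j+patch] = np.real(np.fft.ifft2(P))
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
```

Keeping all 64 coefficients of an 8 × 8 patch is lossless, and by Parseval discarding coefficients can only reduce the energy of the reconstruction.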


1.3 Quantization. A rudimentary version of the JPEG compression scheme for images includes partitioning the image into patches, performing the two-dimensional DCT on each patch, and then rounding (or quantizing) the associated DCT coefficients. We describe this procedure below in more detail:

1. Perform the DCT on each 8 × 8 block: X_ij(k, l) = X_C(x_ij)(k, l), where x_ij is the (i, j)th block of the image.

2. We then quantize the DCT coefficients:

X_ij(k, l) = round[ X_ij(k, l) / Q(k, l) ].   (4)

Q(k, l) is a quantization coefficient used to control how much the (k, l)th frequency is quantized. Since human vision is not sensitive to these “rounding” errors, this is where the compression takes place. That is, a smaller set of pixel values requires fewer bits to represent in a computer.

The standard JPEG quantization matrix that you should apply to each patch is based upon the way your eye perceives luminance, and is given as

Q_L = [ 16  11  10  16  24  40  51  61
        12  12  14  19  26  58  60  55
        14  13  16  24  40  57  69  56
        14  17  22  29  51  87  80  62
        18  22  37  56  68 109 103  77
        24  36  55  64  81 104 113  92
        49  64  78  87 103 121 120 101
        72  92  95  98 112 100 103  99 ]   (5)

Write a function that executes the above procedure, making use of your code from questions 1.1 and 1.2.


1.4 Image Reconstruction. Write down a function that takes in the output of the compression scheme of question 1.2, computes the iDCT of each patch, and then stitches these reconstructed patches together to form the global reconstructed signal. Write another function that executes this procedure for the quantized DCT coefficients from question 1.3.

Run these functions on the results of questions 1.2 and 1.3 (associated with sample image A). Plot the reconstruction error r_K versus K for your code from question 1.3. What do you observe? Are you able to discern what is in the original image?

Play around with the quantization matrix. Do higher or lower values of Q(k, l) yield better reconstruction performance? How much can you alter the entries Q(k, l) and still obtain a compression for which the original image is discernible to your eye?

Try out all of these compression schemes on sample image B as well.



Principal Components Analysis

Santiago Paternain, Aryan Mokhtari and Alejandro Ribeiro

March 30, 2015

At this point we have already seen how the Discrete Fourier Transform and the Discrete Cosine Transform can be written in terms of a matrix product. The main idea is to multiply the signal by a specific Hermitian matrix. Let us recall the definition of a Hermitian matrix.

Definition 1 (Hermitian Matrix) Let M ∈ C^{N×N} be a matrix with complex entries. Denote by M* the conjugate of M, i.e., for each entry i, j we have (M*)_{ij} = (M_{ij})*. Then we say M is Hermitian if

(M*)^T M = I,   (1)

where I is the N × N identity matrix and (·)^T denotes the transpose of a matrix. (A matrix satisfying (1) is also commonly called unitary.)

For the Discrete Fourier Transform we can define the following matrix

F = [ e_{0N}^H ; e_{1N}^H ; . . . ; e_{(N-1)N}^H ]

  = (1/√N) [ 1    1                    . . .   1
             1    e^{-j2π(1)(1)/N}     . . .   e^{-j2π(1)(N-1)/N}
             ...  ...                  . . .   ...
             1    e^{-j2π(N-1)(1)/N}   . . .   e^{-j2π(N-1)(N-1)/N} ].   (2)

Then, if we consider a signal x(n) for n = 0, . . . , N-1 as the vector

x = [x(0), x(1), . . . , x(N-1)]^T,   (3)



we can write the DFT as the product between F and the vector x, i.e.,

Fx = (1/√N) [ \sum_{n=0}^{N-1} x(n) e^{-j2π(0/N)n} ;
              \sum_{n=0}^{N-1} x(n) e^{-j2π(1/N)n} ;
              . . . ;
              \sum_{n=0}^{N-1} x(n) e^{-j2π((N-1)/N)n} ]
   = [X(0), X(1), . . . , X(N-1)]^T.   (4)
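A quick numerical confirmation of (2)-(4) (Python/NumPy here; checking F^H F = I also confirms the property of Definition 1):

```python
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # DFT matrix, cf. (2)

x = np.cos(2 * np.pi * 3 * n / N)    # a discrete cosine of frequency 3
X = F @ x                            # DFT as a matrix product, cf. (4)
```

Up to the 1/√N normalization, `X` agrees with the FFT of `x`, and `F.conj().T @ F` is the identity.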

In the PCA decomposition we define a new Hermitian matrix based on the eigenvectors of the covariance matrix of a dataset. Before defining the covariance matrix we need to define the mean signal.

Definition 2 Let x_1, x_2, . . . , x_M be M different points in a dataset. Then we define the mean signal as

µ = (1/M) \sum_{m=1}^{M} x_m.   (5)

We next define the empirical covariance matrix.

Definition 3 (Covariance matrix) Let $x_1, x_2, \ldots, x_M$ be $M$ different points in a dataset; then we can define the covariance matrix as

$$\Sigma = \frac{1}{M} \sum_{m=1}^{M} (x_m - \mu)(x_m - \mu)^T. \qquad (6)$$

Let $v_1, v_2, \ldots, v_N$ be the eigenvectors of the covariance matrix. Then, as in the Discrete Fourier Transform, define the Hermitian matrix

$$P = [v_1, v_2, \cdots, v_N]. \qquad (7)$$

Then the PCA decomposition can be written as a product between the matrix $P$ and the difference between the signal and the mean signal, i.e.,

$$X^{PCA} = P (x - \mu). \qquad (8)$$

Just as with the DFT and the DCT we can define the inverse operation to the PCA decomposition, which recovers the signal from $X^{PCA}$. For this we need to compute $(P^*)^T$. Then the inverse transformation is given by

$$x = (P^*)^T X^{PCA} + \mu. \qquad (9)$$
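The mean, covariance, projection, and inverse steps above can be sketched in NumPy (a sketch, not the required MATLAB deliverable; here the eigenvectors are stored as rows of the transform matrix, the convention used later in the face recognition lab):

```python
import numpy as np

def pca_transform(X):
    """Return the PCA basis P (eigenvectors as rows), the mean signal mu,
    and the PCA coefficients of every point in the dataset X (one row per point)."""
    mu = X.mean(axis=0)                          # mean signal, eq. (5)
    Sigma = (X - mu).T @ (X - mu) / X.shape[0]   # covariance matrix, eq. (6)
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigh: Sigma is symmetric
    P = eigvecs[:, ::-1].T                       # rows = eigenvectors, largest eigenvalue first
    coeffs = (X - mu) @ P.T                      # projection, eq. (8)
    return P, mu, coeffs

def pca_inverse(P, mu, coeffs):
    """Invert the PCA decomposition, eq. (9)."""
    return coeffs @ P + mu

# With all N eigenvectors the reconstruction is exact.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
P, mu, C = pca_transform(X)
X_rec = pca_inverse(P, mu, C)
assert np.allclose(X, X_rec)
```

Because the eigenvector matrix is orthonormal ("Hermitian" in the sense of Definition 1), the inverse transform is simply multiplication by its transpose followed by adding the mean back.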


In this lab we are going to be doing PCA decomposition of faces; therefore we need to deal with two-dimensional signals. A way of doing this is to vectorize the matrices representing the images. Let $x \in \mathbb{R}^{N \times N}$ be an $N$ by $N$ matrix. Let $x_k$ be the $k$-th column of the matrix; then we can represent the matrix $x$ as a concatenation of column vectors

$$x = [x_1, x_2, \cdots, x_N]. \qquad (10)$$

Then the one-dimensional representation of the signal $x$ can be obtained by stacking the columns of $x$, that is,

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \qquad (11)$$

With this idea we can treat two-dimensional signals as if they were one-dimensional, and the explanation of PCA given before carries over. In the next parts we are going to use PCA decomposition for image compression. We can compress an image by keeping the eigenvectors associated with the largest eigenvalues of the covariance matrix. Next week we will use these ideas to build a face recognition system.
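The column-stacking of (11), and its inverse (needed to display reconstructed faces), can be sketched in NumPy; the 112 × 92 image size is the one used by the dataset below:

```python
import numpy as np

rows, cols = 112, 92                    # size of the face images in the dataset

def vectorize(img):
    """Stack the columns of an image into one long vector, as in eq. (11)."""
    return img.reshape(rows * cols, order='F')   # 'F' = column-major stacking

def unvectorize(vec):
    """Undo the stacking to recover the 2-D image."""
    return vec.reshape(rows, cols, order='F')

img = np.arange(rows * cols).reshape(rows, cols)
assert np.array_equal(unvectorize(vectorize(img)), img)
```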

In both labs we will be working with the AT&T face dataset. You can download the dataset at the following link: http://www.cl.cam.ac.uk/Research/DTG/attarchive:pub/data/att_faces.zip. The image set has 10 different images of 40 different persons (cf. Figure 1); each one of the images is a 112 × 92 pixel image. Since we have not yet covered the concept of a covariance matrix, we are providing the covariance matrix for the set, which was computed using (6), and the mean face of the dataset (cf. Figure 2b), which was computed using (5).

1 PCA: Decomposition

1.1 Decomposition in principal components. Build a function that, given the covariance matrix and the number K of desired principal components, returns the K eigenvectors of the covariance matrix with the largest eigenvalues. Check that the matrix containing the eigenvectors is Hermitian.
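A NumPy sketch of such a function (a sketch of the idea, assuming the covariance matrix is supplied as a symmetric array `Sigma`):

```python
import numpy as np

def top_k_eigenvectors(Sigma, K):
    """Return the K eigenvectors of Sigma with the largest eigenvalues,
    one eigenvector per column."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh sorts eigenvalues ascending
    V = eigvecs[:, ::-1][:, :K]                # reorder descending, keep first K
    return V

Sigma = np.diag([1.0, 5.0, 3.0])
V = top_k_eigenvectors(Sigma, 2)
# Columns are orthonormal: V^H V = I, the "Hermitian" check of Definition 1.
assert np.allclose(V.T @ V, np.eye(2))
```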

1.2 Decomposition of a face in the space of eigenfaces. Build a function that receives as inputs an image containing a face, the mean face, and the eigenfaces, and returns the projection onto the space of eigenfaces.


Figure 1: Dataset


Figure 2: (a) An example image from the dataset. (b) The mean face of the dataset.

2 Reconstruction

2.1 Reconstruction of a face using K principal components. Build a function that receives as inputs the mean face, K eigenfaces, and the K principal components, and outputs a reconstructed face. Test this function for any of the faces of the dataset for 5 and 25 principal components.
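A NumPy sketch of the projection and reconstruction pair, with a random orthonormal basis standing in for the provided eigenfaces (eigenfaces as columns of `V`):

```python
import numpy as np

def project_face(face, mean_face, V):
    """Projection of a vectorized face onto the span of the K eigenfaces."""
    return V.T @ (face - mean_face)

def reconstruct_face(mean_face, V, coeffs):
    """Reconstruct a vectorized face from its K principal components:
    x_hat = V @ coeffs + mean_face (inverse PCA with K eigenfaces)."""
    return V @ coeffs + mean_face

# Round trip: with a complete set of eigenfaces the reconstruction is exact.
rng = np.random.default_rng(1)
N = 8
V = np.linalg.qr(rng.standard_normal((N, N)))[0]   # stand-in orthonormal eigenfaces
mean_face = rng.standard_normal(N)
face = rng.standard_normal(N)
coeffs = project_face(face, mean_face, V)
assert np.allclose(reconstruct_face(mean_face, V, coeffs), face)
```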

2.2 Reconstruction error. For any given number of principal components we can compute the reconstruction error as the norm of the difference between the original face and the reconstructed face. Compute this error while varying the number of eigenfaces used from 0 to the size of the training set. How many principal components are needed to obtain a 60% accuracy in the reconstruction?
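The error sweep can be sketched as follows (again with a random orthonormal basis standing in for the eigenfaces; the error must shrink as more eigenfaces are used):

```python
import numpy as np

def reconstruction_errors(face, mean_face, V_full):
    """Norm of (face - reconstruction) as the number K of eigenfaces
    grows from 0 to V_full.shape[1]."""
    errs = []
    for K in range(V_full.shape[1] + 1):
        V = V_full[:, :K]
        coeffs = V.T @ (face - mean_face)          # projection on K eigenfaces
        face_hat = V @ coeffs + mean_face          # reconstruction
        errs.append(np.linalg.norm(face - face_hat))
    return errs

rng = np.random.default_rng(2)
N = 6
V_full = np.linalg.qr(rng.standard_normal((N, N)))[0]
mean_face = rng.standard_normal(N)
face = rng.standard_normal(N)
errs = reconstruction_errors(face, mean_face, V_full)
# Error is non-increasing and hits zero once all N eigenfaces are used.
assert errs[-1] < 1e-10
assert all(errs[i] >= errs[i + 1] - 1e-12 for i in range(len(errs) - 1))
```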


Face Recognition

Aryan Mokhtari, Santiago Paternain, and Alejandro Ribeiro

April 6, 2015

The goal of this lab is to implement face recognition using Principal Component Analysis (PCA). One of the most important applications of PCA is mapping the dataset into a new space of smaller dimension such that the reconstruction error is minimized. Dimension reduction is especially useful for datasets where the dimension of the sample points is large. Consider the dataset D := {x_1, . . . , x_M} where each sample point x_i has dimension N. We map the signals into a space of dimension k where k << N. To do this we first recap the idea of PCA in the following section.

1 Principal Component Analysis

In PCA decomposition we define a new Hermitian matrix based on the eigenvectors of the covariance matrix of a dataset. Before defining the covariance matrix we need to define the mean signal.

Definition 1 Let $x_1, x_2, \ldots, x_M$ be $M$ different points in a dataset; then we can define the mean signal as

$$\mu = \frac{1}{M} \sum_{m=1}^{M} x_m. \qquad (1)$$

We next define the empirical covariance matrix.

Definition 2 (Covariance matrix) Let $x_1, x_2, \ldots, x_M$ be $M$ different points in a dataset; then we can define the covariance matrix as

$$\Sigma = \frac{1}{M} \sum_{m=1}^{M} (x_m - \mu)(x_m - \mu)^T. \qquad (2)$$


Let $v_0, v_1, \ldots, v_{N-1}$ be the eigenvectors of the covariance matrix corresponding to the eigenvalues $\lambda_0, \ldots, \lambda_{N-1}$, respectively. Notice that the eigenvalues are ordered as $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{N-1}$. Then, as in the Discrete Fourier Transform, define the Hermitian matrix

$$T = \left[ v_0^T ;\, v_1^T ;\, \cdots ;\, v_{N-1}^T \right] \in \mathbb{R}^{N \times N}, \qquad (3)$$

where the $i$th row is the $i$th eigenvector of the covariance matrix. Then the PCA decomposition can be written as a product between the matrix $T$ and the difference between the signal and the mean signal, i.e.,

$$X_i^{PCA} = T (x_i - \mu) \quad \text{for } i = 1, \ldots, M. \qquad (4)$$

Just as with the DFT and the DCT we can define the inverse operation to the PCA decomposition, which recovers the signal from $X^{PCA}$. For this we need to compute $T^H$. Then the inverse transformation is given by

$$\tilde{x}_i = T^H X_i^{PCA} + \mu. \qquad (5)$$

As we discussed before, when we use all the eigenvectors $v_0, \ldots, v_{N-1}$ to build the PCA transform matrix $T$, the reconstructed vectors are equal to the original signals, i.e., $\tilde{x}_i = x_i$.

2 Dimension reduction

As we observed before, in the DFT and the DCT we compress a signal by keeping the coefficients with the largest values. In PCA we use a different mechanism for compressing the signal: we keep the coefficients that correspond to the largest eigenvalues. This scheme can be implemented by redefining the PCA transform matrix as

$$T = \left[ v_0^T ;\, v_1^T ;\, \cdots ;\, v_{k-1}^T \right] \in \mathbb{R}^{k \times N}. \qquad (6)$$

As you can see, we only use the eigenvectors that correspond to the $k$ largest eigenvalues for creating the transform matrix $T$. Therefore the dimension of the PCA-ed signals is $k$, which is smaller than $N$. The PCA-ed signals are given by

$$X_i^{PCA} = T (x_i - \mu) \quad \text{for } i = 1, \ldots, M. \qquad (7)$$

To reconstruct the signal we use the inverse transformation, which is given by

$$\tilde{x}_i = T^H X_i^{PCA} + \mu. \qquad (8)$$


Observe that, as with the DFT and the DCT, the reconstruction error is not zero for PCA when we compress the signal. The average reconstruction error can be computed as

$$\frac{1}{M} \sum_{i=1}^{M} \| x_i - \tilde{x}_i \|^2. \qquad (9)$$
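The compression of (6)–(8) and the average error of (9) can be sketched together in NumPy (a sketch on synthetic data; variable names are illustrative):

```python
import numpy as np

def pca_compress(X, k):
    """Compress each row of X to its k leading principal components.
    Returns T (k x N, rows are leading eigenvectors, eq. (6)), mu,
    and the coefficients (M x k, eq. (7))."""
    M, N = X.shape
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) / M          # covariance matrix, eq. (2)
    _, eigvecs = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
    T = eigvecs[:, ::-1][:, :k].T              # k leading eigenvectors as rows
    return T, mu, (X - mu) @ T.T

def avg_error(X, T, mu, X_pca):
    """Average squared reconstruction error, eqs. (8)-(9)."""
    X_rec = X_pca @ T + mu                     # inverse transform, eq. (8)
    return np.mean(np.sum((X - X_rec) ** 2, axis=1))

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 10))
errs = [avg_error(X, *pca_compress(X, k)) for k in (2, 5, 10)]
# Error shrinks as k grows and is (numerically) zero at k = N.
assert errs[0] >= errs[1] >= errs[2]
assert errs[2] < 1e-12
```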

3 Face Recognition

In face recognition we have access to a set of data, called the training set, whose labels we know. The labels can be binary values, integer numbers, etc. In the face recognition application that we consider in this lab, the label of each image is the person who is in the image. The goal of face recognition is to assign labels to a set of unclassified signals (images) called the test set. Different techniques can be used for classification, but in this lab we use the nearest neighbor method.

Consider a training set $D_{training} := \{x_1, \ldots, x_M\}$ with labels $\{y_1, \ldots, y_M\}$. Further, consider a test set $D_{test} := \{\tilde{x}_1, \ldots, \tilde{x}_p\}$ whose sample points are unlabeled. To classify a sample point $\tilde{x}_i$, we compute the distance of $\tilde{x}_i$ from all the points in the training set and find the one with the minimum distance,

$$x_i^* = \operatorname*{argmin}_{x \in D_{training}} \| \tilde{x}_i - x \|_{\ell}. \qquad (10)$$

Observe that $x_i^*$ is the closest training point to the test point $\tilde{x}_i$; therefore, we assign the label of $x_i^*$ to the test point $\tilde{x}_i$.

Notice that each pixel of an image contains noise. If we map the images into the space of principal components we can improve the accuracy of classification, since this maps the images into a space with more information. To do this, we map all the training points and test points into the space of the training points' principal components and redo the nearest neighbor algorithm for classifying the points in the test set. Define $\Sigma_{training}$ as the covariance matrix of the training set,

$$\Sigma_{training} = \frac{1}{M} \sum_{m=1}^{M} (x_m - \mu_{training})(x_m - \mu_{training})^T, \qquad (11)$$

where $\mu_{training}$ is the average image of the training set. Consider the $k$ eigenvectors $v_0, \ldots, v_{k-1}$ of the training covariance matrix $\Sigma_{training}$ to


define the PCA transformation as

$$T = \left[ v_0^T ;\, v_1^T ;\, \cdots ;\, v_{k-1}^T \right] \in \mathbb{R}^{k \times N}. \qquad (12)$$

We map all the training points $x_i \in D_{training}$ into the space of $k$ principal components as

$$X_i^{PCA} = T \left( x_i - \mu_{training} \right) \quad \text{for } x_i \in D_{training}. \qquad (13)$$

The mapped training images $X_i^{PCA}$ form a PCA-ed training set $D^{PCA}_{training}$. We do the same process for the images in the test set $D_{test}$ as

$$\tilde{X}_i^{PCA} = T \left( \tilde{x}_i - \mu_{training} \right) \quad \text{for } \tilde{x}_i \in D_{test}. \qquad (14)$$

Likewise, the mapped images $\tilde{X}_i^{PCA}$ create the PCA-ed test set $D^{PCA}_{test}$. Now we implement the nearest neighbor algorithm as in (10) for the PCA-ed training and test sets, i.e.,

$$X_i^* = \operatorname*{argmin}_{X \in D^{PCA}_{training}} \| \tilde{X}_i^{PCA} - X \|_{\ell} \quad \text{for all } \tilde{X}_i^{PCA} \in D^{PCA}_{test}. \qquad (15)$$

Notice that $X_i^*$ is the closest neighbor of the PCA-ed test point $\tilde{X}_i^{PCA}$. Therefore, we assign the label of the corresponding image $x_i^*$ in the training set to the test image $\tilde{x}_i$.

4 Creating training and test sets

For this lab we use the dataset provided by AT&T Laboratories Cambridge. The images of the dataset are shown in Fig. 1. As shown, the dataset contains 10 different images of each of 40 people. The images are in ".pgm" format and the size of each image is 112 × 92, with 8-bit grey levels. The images are organized in 40 directories (one for each person) named "sX", where X indicates the person number (between 1 and 40). In each directory there are 10 different images of the selected person, named "Y.pgm", where Y indicates which image of the specific person it is (between 1 and 10). We pick some of the images to create a training set and use the rest as a test set for face recognition.


Figure 1. Images of AT&T Laboratories Cambridge dataset.


4.1 Generating training and test sets. Consider the vector "num_sample_training" with size 1 × 40 that contains the number of images from each directory required for creating the training set. E.g., the first component of the vector "num_sample_training" gives the number of images from directory s1 required for the training set. The rest of the sample points in the dataset are considered as test points. Write a MATLAB function that takes as input the vector "num_sample_training" and returns as output a matrix called "faces_training" with dimension 112 × 92 × (size of the training set) and a matrix called "faces_test" with dimension 112 × 92 × (size of the test set). Notice that the function should open each directory and store the specified number of images from each directory in the matrix "faces_training" and the rest in the matrix "faces_test".
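The splitting logic can be sketched in Python with the file reading abstracted away; `load_person_images` is a hypothetical loader (it would read the ten PGM files of directory sX), so the example below exercises only the split itself on dummy arrays:

```python
import numpy as np

def split_dataset(num_sample_training, load_person_images):
    """Split the 40-person dataset into training and test stacks.
    num_sample_training[p-1] images of person p go to training, the rest to test.
    load_person_images(p) is a (hypothetical) loader returning a list of ten
    112 x 92 arrays for person p (1-based)."""
    faces_training, faces_test = [], []
    for p in range(1, 41):
        images = load_person_images(p)
        n_train = num_sample_training[p - 1]
        faces_training.extend(images[:n_train])
        faces_test.extend(images[n_train:])
    return np.stack(faces_training, axis=2), np.stack(faces_test, axis=2)

# Example with dummy images instead of the real PGM files.
dummy = lambda p: [np.full((112, 92), p * 10 + y) for y in range(10)]
train, test = split_dataset([9] * 40, dummy)
assert train.shape == (112, 92, 360) and test.shape == (112, 92, 40)
```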



4.2 Generating training and test sets. Use the function in Part 4.1 to store the first 9 images of each of the 40 different categories in the matrix "faces_training" and the rest in "faces_test". Notice that the sizes of the generated matrices should be 112 × 92 × 360 and 112 × 92 × 40, respectively.


4.3 Generating training and test matrices. Note that the size of the images is 112 × 92; therefore, each image can be represented as a vector of length 10304. By stacking sample vectors of length 10304 in a matrix we can generate a training matrix of size 360 × 10304 and a test matrix of size 40 × 10304. Write a MATLAB function that takes as input the matrices "faces_training" and "faces_test" and returns a training matrix X_train, where each row contains one image of the training set, and a test matrix X_test, where each row contains one image of the test set.


4.4 Generating training matrix. Use the function in Part 4.3 and the matrices "faces_training" and "faces_test" generated in Part 4.2 to create the training and test matrices X_train and X_test.

5 PCA on the training and test sets

In this section we use Principal Component Analysis to reduce the number of features of the samples in the training matrix. We also map the images in the test set into the space of the training set's principal components.


5.1 PCA function. Write a MATLAB function that, given the number of principal components k and the training matrix X_train, returns the PCA-ed training matrix X^PCA_train, the top k eigenvectors V, and the mean vector µ. Notice that the dimension of the eigenfaces matrix V should be 10304 × k.

5.2 PCA for different choices of k. Use the training matrix X_train generated in Part 4.4 as the input of the function in Part 5.1 to generate the PCA-ed training set. For the number of principal components consider k = 1, 5, 10, 20. Notice that since there are 4 different choices for the number of principal components, we expect 4 different PCA-ed training matrices X^PCA_train with sizes 360 × 1, 360 × 5, 360 × 10, and 360 × 20, and their corresponding eigenfaces matrices of sizes 10304 × 1, 10304 × 5, 10304 × 10, and 10304 × 20, respectively.


5.3 PCA-ed test points. Write a MATLAB function that, given the test matrix X_test, the mean vector µ, and the eigenfaces matrix V, returns the PCA-ed test set X^PCA_test.



5.4 More PCA-ed test points. Use the function in Part 5.3 and the outputs of Part 5.2 to apply principal component analysis to the test points for the different numbers of principal components k = 1, 5, 10, 20. The outcome of this section should be 4 different PCA-ed test matrices X^PCA_test with sizes 40 × 1, 40 × 5, 40 × 10, and 40 × 20.

6 Nearest neighbor classification

Consider that we have an image from the test set. Our goal is to find the image in the training set which is most similar to the test image. To find the closest image we use the nearest neighbor algorithm.


6.1 Nearest Neighbor function. Write a MATLAB function that, given the PCA-ed training set X^PCA_train and the PCA-ed test set X^PCA_test, returns, for each PCA-ed test point, the sample of the PCA-ed training set that has the minimum distance to it.


6.2 More Nearest Neighbor. Use the function in Part 6.1 to find the nearest neighbors of the PCA-ed test points in the test matrix X^PCA_test, given the PCA-ed training points in the matrix X^PCA_train. Display some of the images in the test set and their nearest neighbors for different choices of k.


Signal Processing on Graphs

Santiago Segarra, Weiyu Huang, and Alejandro Ribeiro

April 12, 2015

In previous weeks, we have focused our attention on discrete time signal processing, image processing, and principal component analysis (PCA). These three seemingly unrelated areas can be thought of as the study of signals on particular graphs: a directed cycle, a lattice, and a covariance graph. Thus, the theory of signal processing on graphs can be conceived as a unifying theory which develops tools for more general graph domains and, when particularized to the mentioned graphs, recovers some of the existing results.

1 Intro to graph theory

Formally, a graph is a triplet $G = (V, E, W)$ where $V = \{1, 2, \ldots, N\}$ is a finite set of $N$ nodes or vertices, $E \subseteq V \times V$ is a set of edges defined as ordered pairs $(n, m)$, and $W : E \to \mathbb{R}$ is a map from the set of edges to scalar values $w_{nm}$. The weight $w_{nm}$ represents the similarity or level of relationship from $n$ to $m$. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ of a graph is defined as

$$A_{nm} = \begin{cases} w_{nm}, & \text{if } (n, m) \in E; \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$

In unweighted graphs, $A_{nm}$ is either 1 – if nodes $n$ and $m$ are connected – or 0 otherwise. For undirected graphs, the adjacency matrix is symmetric, i.e., $w_{nm} = w_{mn}$ for all nodes $n$ and $m$. When this is not the case, we say that the graph is directed.

Graph signals are mappings $x : V \to \mathbb{R}$ from the vertices of the graph into the real (or complex) numbers. Graph signals can be represented as vectors $x \in \mathbb{R}^N$ where $x_n$ stores the signal value at the $n$th vertex in $V$. Notice that this assumes an indexing of the nodes, which coincides with the indexing used in the adjacency matrix.


The degree of a node is the sum of the weights of the edges incident to this node. Formally, the degree of node $i$, $\deg(i)$, is defined as

$$\deg(i) = \sum_{j \in N(i)} w_{ij}, \qquad (2)$$

where $N(i)$ stands for the neighborhood of node $i$, i.e., all other nodes connected to node $i$. The degree matrix $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix such that $D_{ii} = \deg(i)$. In directed graphs, each node has an out-degree – the sum of the weights of all edges out of the node – and an in-degree – the sum of the weights of all edges into the node.

Given a graph $G$ with adjacency matrix $A$ and degree matrix $D$, we define the Laplacian matrix $L \in \mathbb{R}^{N \times N}$ as

$$L = D - A. \qquad (3)$$

Equivalently, $L$ can be defined elementwise as

$$L_{ij} = \begin{cases} \deg(i), & \text{if } i = j; \\ -w_{ij}, & \text{if } (i, j) \in E; \\ 0, & \text{otherwise.} \end{cases} \qquad (4)$$

The Laplacian acts as a difference operator on graph signals. To see why this is true, consider a graph signal $x$ on a graph $G$ and define the new signal $y = Lx$, where each element $y_i$ is computed as

$$y_i = [Lx]_i = \sum_{j \in N(i)} w_{ij} (x_i - x_j). \qquad (5)$$

Notice that the element $y_i$ measures the difference between the value of the signal $x$ at node $i$ and at its neighborhood. The Laplacian has very specific spectral properties. In particular, the Laplacian of any graph is positive semi-definite (all its eigenvalues are nonnegative) and has an eigenvalue of 0. Moreover, the multiplicity of the 0 eigenvalue is equal to the number of connected components in the graph.

Given an arbitrary graph $G = (V, E, W)$, a graph-shift operator $S \in \mathbb{R}^{N \times N}$ is a matrix satisfying $S_{ij} = 0$ for $i \neq j$ and $(i, j) \notin E$. That is, $S$ can take nonzero values only on the edges of $G$ or on its diagonal. Some common choices for $S$ include the adjacency matrix $A$ and the Laplacian $L$. We consider normal graph-shift operators, i.e., operators that can be written as $S = V \Lambda V^H$, where the columns of $V$ are the eigenvectors of $S$ and $\Lambda$ is a diagonal matrix containing the eigenvalues of $S$.


For a given graph-shift operator $S = V \Lambda V^H$, the Graph Fourier Transform (GFT) of $x$ is defined as

$$\hat{x}(k) = \langle x, v_k \rangle = \sum_{n=1}^{N} x(n)\, v_k^*(n). \qquad (6)$$

Equation (6) can be rewritten in matrix form to obtain

$$\hat{x} = V^H x. \qquad (7)$$

Since the columns of $V$ are the eigenvectors $v_i$ of $S$, $\hat{x}(k) = v_k^H x$ is the inner product between $v_k$ and $x$. We think of the eigenvectors $v_k$ as oscillation modes associated to the eigenvalues, in the same way that, in discrete time signal processing, different complex exponentials are associated to frequency values. In particular, the GFT is equivalent to the DFT when $V^H = F$, i.e., $v_k = e_{kN}$, the complex exponential vector.

In order to measure how much a signal oscillates within a graph, the concept of total variation can be extended from traditional signal processing. Classically, the total variation of a signal is defined as the sum of squared differences of consecutive signal samples, $\sum_n (x_n - x_{n-1})^2$. This concept can be extended to graphs, where the notion of neighborhood replaces that of consecutive nodes, to obtain

$$TV_G(x) = \frac{1}{2} \sum_{n=1}^{N} \sum_{m \in N(n)} (x_n - x_m)^2\, w_{mn} = x^T L x. \qquad (8)$$

As can be seen from (8), the total variation of a signal on a graph can be written as a quadratic form that depends on the Laplacian of that graph. Total variation allows us to interpret the ordering of the eigenvalues of the Laplacian in terms of frequencies, i.e., larger eigenvalues correspond to higher frequencies (larger total variation). The eigenvectors associated with large eigenvalues oscillate rapidly whereas the eigenvectors associated with small eigenvalues vary slowly.

The inverse graph Fourier transform (iGFT) of a graph signal $x \in \mathbb{R}^N$ is given by

$$x(n) = \sum_{k=0}^{N-1} \hat{x}(k)\, v_k(n), \qquad (9)$$

which can be rewritten in matrix form to obtain

$$x = V \hat{x}. \qquad (10)$$


The orthonormality of $V$ ensures that, indeed, the GFT and iGFT are inverse operations. Orthonormality also allows the extension of other classical results to the graph domain, e.g., Parseval's theorem.
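The GFT pair (7) and (10), along with Parseval's theorem, can be sketched in NumPy for a symmetric graph-shift operator (a sketch; for a normal but non-symmetric S a general eigendecomposition would replace `eigh`):

```python
import numpy as np

def gft_basis(S):
    """Eigenvectors of a symmetric graph-shift operator S, ordered by eigenvalue."""
    eigvals, V = np.linalg.eigh(S)
    return eigvals, V

def gft(V, x):
    return V.conj().T @ x       # eq. (7): x_hat = V^H x

def igft(V, x_hat):
    return V @ x_hat            # eq. (10): x = V x_hat

# Toy graph: path on 4 nodes, graph shift = Laplacian.
A = np.diag([1.0, 1.0, 1.0], 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
_, V = gft_basis(L)
x = np.array([3.0, -1.0, 2.0, 0.5])
x_hat = gft(V, x)
assert np.allclose(igft(V, x_hat), x)                       # inverse pair
assert np.isclose(np.sum(x**2), np.sum(np.abs(x_hat)**2))   # Parseval
```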

2 Connection with traditional signal processing

Let us begin by analyzing the connection between graph signal processing and traditional finite discrete time signal processing. The latter is a particular case of the former when the graph considered is a directed cycle, as we will discuss in this section. Recall that the adjacency matrix of a directed cycle is

$$A_{dc} = \begin{bmatrix} & & & 1 \\ 1 & & & \\ & \ddots & & \\ & & 1 & \end{bmatrix}, \qquad (11)$$

where the unspecified entries are zeros.


2.1 Generate a directed cyclic graph. Write a function in MATLAB that takes as input the number N of nodes in a graph and generates the adjacency matrix of a directed cyclic graph as described in (11). Show your output (through imagesc, for example) for N = 20.
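A NumPy equivalent of this construction (a sketch; it places ones on the subdiagonal with a wrap-around entry in the corner, the cyclic-shift structure of (11)):

```python
import numpy as np

def directed_cycle_adjacency(N):
    """Adjacency matrix of a directed cycle: applying it shifts a signal
    cyclically by one sample."""
    A = np.zeros((N, N))
    A[np.arange(1, N), np.arange(N - 1)] = 1   # ones on the subdiagonal
    A[0, N - 1] = 1                            # wrap-around edge
    return A

A = directed_cycle_adjacency(4)
# Applying A to a signal cyclically shifts it by one sample.
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.array_equal(A @ x, np.array([4.0, 1.0, 2.0, 3.0]))
```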


2.2 Compare your graph basis with the Fourier basis. Fix N = 20 and consider the graph-shift operator $S_1 = A_{dc}$ equal to the adjacency matrix of the directed cycle. Consider the discrete Fourier basis $F$ and show the result of the following computation:

$$F S_1 F^H = \Lambda_1. \qquad (12)$$

What is the structure of the matrix $\Lambda_1$? What does this tell you about the columns of $F^H$? Confirm your answer by showing that the first column of $F^H$ indeed satisfies the property stated. What does this tell you about the relation between the DFT and the GFT for this particular graph-shift operator?

Consider now a symmetric graph with adjacency matrix $A_{sc} = A_{dc} + A_{dc}^T$ and pick as graph-shift operator its Laplacian $S_2 = L_{sc}$. Repeat the computation in (12) for $S_2$. What is the relation between the DFT and the GFT for this new operator $S_2$?


3 The graph frequency domain

In this section we are going to analyze the frequency representation of different graph signals defined on a graph.


3.1 Compute the Graph Fourier Transform of a graph signal. Write a function in MATLAB that takes as input a graph shift $S \in \mathbb{R}^{N \times N}$ and a signal $x \in \mathbb{R}^N$ defined on the graph and generates as output $\hat{x} \in \mathbb{R}^N$, the graph frequency representation of $x$. Make sure to order the eigenvectors of $S$ in increasing order of absolute value of the associated eigenvalues.


3.2 Understanding the data. Load the file graph_sp_data.mat. You will see the adjacency matrix of a graph $A \in \mathbb{R}^{50 \times 50}$ and four graph signals, called $x_i \in \mathbb{R}^{50}$ for $i = 1, 2, 3$ and $y \in \mathbb{R}^{50}$. How many connected components does the graph have? Plot signals $x_1$, $x_2$ and $x_3$. Can you tell which one 'varies faster' in the graph domain?


3.3 Finding the frequency representation of signals. For the remainder of the lab practice, we define as graph-shift operator $S = L$, the Laplacian of the loaded graph with adjacency matrix $A$. Using your function from Section 3.1, plot $\hat{x}_1$, $\hat{x}_2$, $\hat{x}_3$, i.e., the frequency representations of $x_1$, $x_2$, and $x_3$, respectively. By looking at the plots, can you tell which signal varies slower and which one varies faster in the graph domain?


3.4 Quantifying the variation of signals. Use total variation through the Laplacian quadratic form in (8) to quantify the variation of a signal on a graph. Do these results confirm your intuition from Section 3.3?


3.5 Compute the inverse Graph Fourier Transform of a graph signal. Write a function in MATLAB that takes as input a graph shift $S \in \mathbb{R}^{N \times N}$ and the frequency coefficients $\hat{x} \in \mathbb{R}^N$ of a graph signal and outputs $x \in \mathbb{R}^N$, the original signal. Make sure to order the eigenvectors of $S$ in increasing order of absolute value of the associated eigenvalues.



3.6 Reconstruction and Parseval's theorem. If you need to compress $x_1$ by keeping only K = 5 frequency coefficients, which ones would you keep? Perform the reconstruction and plot the original signal and the reconstructed one. Also, compute the energy of the error. Can you compute the energy of the reconstruction error without actually performing the reconstruction? What quality of the GFT allows you to do this?


3.7 Denoising a graph signal. Assume that the graph signal $y$ is in fact composed of a graph signal $z$ of bandwidth 3 contaminated with white noise. Your objective is to recover $z$ by keeping the correct frequency coefficients. Plot $y$, $\hat{y}$, and your reconstruction of $z$.


Signal Processing on Graphs – Classification of Cancer Types

Weiyu Huang, Santiago Segarra, and Alejandro Ribeiro

April 20, 2015

Last week, we studied graph signal processing, which defines the concepts of frequency and graph Fourier transform for signals supported on graphs. We observed that traditional finite discrete time signal processing can be viewed as a particular case of graph signal processing when the graph considered is a directed cycle. In this week's lab, we are going to complete our journey in ESE 224 with an application of graph signal processing to improve the classification of cancer types through the use of genetic networks.

1 Review of graph theory and graph filters

Formally, a graph is a triplet $G = (V, E, W)$ where $V = \{1, 2, \ldots, N\}$ is a finite set of $N$ nodes or vertices, $E \subseteq V \times V$ is a set of edges defined as ordered pairs $(n, m)$, and $W : E \to \mathbb{R}$ is a map from the set of edges to scalar values $w_{nm}$. The weight $w_{nm}$ represents the similarity or level of relationship from $n$ to $m$. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ of a graph is defined as

$$A_{nm} = \begin{cases} w_{nm}, & \text{if } (n, m) \in E; \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$

In unweighted graphs, $A_{nm}$ is either 1 – if nodes $n$ and $m$ are connected – or 0 otherwise. For undirected graphs, the adjacency matrix is symmetric, i.e., $w_{nm} = w_{mn}$ for all nodes $n$ and $m$. When this is not the case, we say that the graph is directed.

Graph signals are mappings $x : V \to \mathbb{R}$ from the vertices of the graph into the real (or complex) numbers. Graph signals can be represented as vectors $x \in \mathbb{R}^N$ where $x_n$ stores the signal value at the $n$th vertex in $V$.


Notice that this assumes an indexing of the nodes, which coincides with the indexing used in the adjacency matrix.

The degree of a node is the sum of the weights of the edges incident to this node. Formally, the degree of node $i$, $\deg(i)$, is defined as

$$\deg(i) = \sum_{j \in N(i)} w_{ij}, \qquad (2)$$

where $N(i)$ stands for the neighborhood of node $i$, i.e., all other nodes connected to node $i$. The degree matrix $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix such that $D_{ii} = \deg(i)$. In directed graphs, each node has an out-degree – the sum of the weights of all edges out of the node – and an in-degree – the sum of the weights of all edges into the node.

Given a graph $G$ with adjacency matrix $A$ and degree matrix $D$, we define the Laplacian matrix $L \in \mathbb{R}^{N \times N}$ as

$$L = D - A. \qquad (3)$$

Given an arbitrary graph $G = (V, E, W)$, a graph-shift operator $S \in \mathbb{R}^{N \times N}$ is a matrix satisfying $S_{ij} = 0$ for $i \neq j$ and $(i, j) \notin E$. For a given graph-shift operator $S = V \Lambda V^H$, the Graph Fourier Transform (GFT) of $x$ is defined as

$$\hat{x}(k) = \langle x, v_k \rangle = \sum_{n=1}^{N} x(n)\, v_k^*(n). \qquad (4)$$

Equation (4) can be rewritten in matrix form to obtain

$$\hat{x} = V^H x. \qquad (5)$$

Since the columns of $V$ are the eigenvectors $v_i$ of $S$, $\hat{x}(k) = v_k^H x$ is the inner product between $v_k$ and $x$. We think of the eigenvectors $v_k$ as oscillation modes associated to the eigenvalues, in the same way that, in discrete time signal processing, different complex exponentials are associated to frequency values. In particular, the GFT is equivalent to the DFT when $V^H = F$, i.e., $v_k = e_{kN}$, the complex exponential vector.

In order to measure how much a signal oscillates within a graph, the concept of total variation can be extended from traditional signal processing. Classically, the total variation of a signal is defined as the sum of squared differences of consecutive signal samples, $\sum_n (x_n - x_{n-1})^2$. This concept can be extended to graphs, where the notion of neighborhood replaces that of consecutive nodes, to obtain

$$TV_G(x) = \frac{1}{2} \sum_{n=1}^{N} \sum_{m \in N(n)} (x_n - x_m)^2\, w_{mn} = x^T L x. \qquad (6)$$


As can be seen from (6), the total variation of a signal on a graph can be written as a quadratic form that depends on the Laplacian of that graph. Total variation allows us to interpret the ordering of the eigenvalues of the Laplacian in terms of frequencies, i.e., larger eigenvalues correspond to higher frequencies (larger total variation). The eigenvectors associated with large eigenvalues oscillate rapidly whereas the eigenvectors associated with small eigenvalues vary slowly.

The inverse graph Fourier transform (iGFT) of a graph signal $x \in \mathbb{R}^N$ is given by

$$x(n) = \sum_{k=0}^{N-1} \hat{x}(k)\, v_k(n), \qquad (7)$$

which can be rewritten in matrix form to obtain

$$x = V \hat{x}. \qquad (8)$$

The orthonormality of $V$ ensures that, indeed, the GFT and iGFT are inverse operations. Orthonormality also allows the extension of other classical results to the graph domain, e.g., Parseval's theorem.

Given a graph signal $x$ with GFT $\hat{x}$, we can filter the graph signal by passing it through a filter $H$. In the frequency domain, the filtered GFT is the multiplication of the original GFT with the frequency response of the filter, i.e.,

$$\hat{\tilde{x}} = H \hat{x}. \qquad (9)$$

The filtered signal in the graph domain, $\tilde{x}$, can then be recovered by applying the inverse GFT to the filtered GFT,

$$\tilde{x} = V \hat{\tilde{x}}. \qquad (10)$$

2 Genetic network

Let us begin by analyzing the genetic network describing gene-to-gene interactions. The network was downloaded from the NCI Nature database (link: http://pid.nci.nih.gov/download.shtml). The network consists of 2458 genes and, loosely speaking, two genes are connected if the proteins encoded by them participate in the same metabolic process.



2.1 Understanding the data. Load the file geneNetwork_rawPCNCI.mat. You will see the gene network $A \in \mathbb{R}^{2458 \times 2458}$. This is our adjacency matrix for the following analysis. Does this graph contain self-loops? Is the graph directed or undirected, weighted or unweighted? For graphs possessing this many nodes, it is very useful to visualize the graph to get a sense of what it looks like. Plot the graph using the spy command in MATLAB.

Construct $L$, the Laplacian of the loaded adjacency matrix $A$. Suppose we define another adjacency matrix $\bar{A}$ with all self-loops removed; is $\bar{L}$, the Laplacian of $\bar{A}$, different from $L$? Explain why.


2.2 Total variations. For the graph-shift operator $S = L$, perform an eigendecomposition and find its eigenvectors. Compute the total variation $TV_G(v_k)$ for each of its eigenvectors $v_k$ using (6). Plot the total variations of all eigenvectors versus their corresponding eigenvalues. What can you say about this? Recall that the total variation of a signal quantifies how much the signal oscillates within a graph; what does your finding imply about the ordering of frequencies, i.e., do eigenvectors associated with larger eigenvalues oscillate more rapidly or more slowly?

3 Genetic profiles

In this section, we are going to study the genetic profiles of 240 patients diagnosed with different subtypes of breast cancer. We will see that by interpreting these genetic profiles as graph signals defined on the genetic network, we will be able to clearly distinguish patients from different subtypes.

Load the file signal_mutation.mat. You will see the aggregated graph signals X ∈ ℝ^(240×2458). The i-th row of this matrix represents the genetic profile x_i of the i-th patient. The n-th gene of this patient is mutated if the n-th entry of x_i is 1 and is not mutated (normal) if the n-th entry is 0.

Patients diagnosed with the same disease may exhibit different phenotypes, i.e., different variations of the same disease. The most effective therapies for different phenotypes may differ substantially and, for this reason, it is very beneficial if we can distinguish phenotypes based on genetic profiles. Load the file histology_subtype.mat and you will see a vector y ∈ ℝ^240. The i-th element being 1 implies that the i-th patient has serous subtype breast cancer, and being 2 indicates endometrioid subtype breast cancer. Our goal is to better differentiate between patients with these two subtypes based on graph Fourier analysis.



3.1 Distinguishing power

3.1 Distinguishing power. For the specific graph-shift operator being the Laplacian, S = L = VΛVᴴ, we want to find the oscillation modes v_k whose corresponding graph Fourier transform coefficients x̂(k) differ most between patients with serous subtype and patients with endometrioid subtype. There are many ways to do this; we consider the following simple heuristic. First compute the GFTs x̂_i = Vᴴx_i for all patients. Then, for each k, define the distinguishing power of the oscillation mode v_k as

DP(v_k) = | ( Σ_{i: y_i = 1} x̂_i(k) ) / ( Σ_i 1{y_i = 1} )  −  ( Σ_{i: y_i = 2} x̂_i(k) ) / ( Σ_i 1{y_i = 2} ) |  /  Σ_i |x̂_i(k)|,   (11)

where 1 is the indicator function,

1{A} = 1 if A is true, and 0 otherwise.   (12)

That is, DP(v_k) computes the normalized difference between the mean GFT coefficient for v_k among patients with serous subtype and the mean GFT coefficient among patients with endometrioid subtype. Generate a plot of DP(v_k) versus k for all frequency indices k.
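Equation (11) is a few lines of vectorized code. Below is a Python/NumPy sketch (the lab itself uses MATLAB) on synthetic stand-ins: a small random matrix X_hat of GFT coefficients and made-up labels y; the real computation would use x̂_i = Vᴴx_i for the 240 patients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 6 patients x 4 frequencies of GFT coefficients,
# and subtype labels y in {1, 2} (serous / endometrioid).
X_hat = rng.normal(size=(6, 4))
y = np.array([1, 1, 1, 2, 2, 2])

# Eq. (11): normalized difference of the per-subtype mean GFT coefficients.
mean1 = X_hat[y == 1].sum(axis=0) / np.sum(y == 1)   # mean over serous patients
mean2 = X_hat[y == 2].sum(axis=0) / np.sum(y == 2)   # mean over endometrioid patients
dp = np.abs(mean1 - mean2) / np.abs(X_hat).sum(axis=0)
```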

3.2 Interpretation

3.2 Interpretation. To get a better sense of the distribution of distinguishing powers, generate a boxplot of DP(v_k) for all k using the command boxplot. In the boxplot, the central mark represents the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points that MATLAB does not consider outliers. The data points MATLAB considers outliers are plotted individually. Combining this analysis with the plot you generated, what can you say about the distribution of distinguishing power? Oscillation modes with high DP contain features that distinguish the two subtypes, while oscillation modes with low DP hardly contain useful information. In the following section, we will design a graph filter in order to improve the classification accuracy of cancer subtypes. Following the study of ESE224 this semester, what is your best guess about the characteristics of reasonable graph filters?

4 Improving classification using graph filters

We have applied k-nearest neighbors (k-NN) classification a couple of times so far in the class, and we will utilize it again in the cancer subtype classification. We will describe the particular k-NN algorithm we



are going to use in this problem and introduce a few vectorized operations in Section 4.1 to accelerate the running time of the k-NN algorithm. Then we will design a graph filter to improve the classification accuracy for cancer subtypes in Section 4.2.

4.1 k-NN for leave-one-out cross validation

We design the following procedure to test the improvement of the filtered graph signals X_f over the original graph signals X. In short, we perform leave-one-out cross validation for a k-nearest neighbors (k-NN) classifier. More precisely, we compute the pairwise distance between all pairs of patients using the graph signals X or the filtered graph signals X_f. Then, for a particular patient, we look at the k nearest patients as given by the distance being evaluated and assign to this patient the most common cancer histology among those k nearest patients. We then compare the assigned histology with her true cancer histology to evaluate the accuracy of the classifier. Finally, we repeat this process for all 240 women considered and obtain a global classification accuracy.

Write a function that, given a matrix of graph signals (either the original ones or the filtered ones), the vector y representing the subtypes of the patients, and the number of neighbors k, computes the global classification accuracy. Run this function for the original graph signals and k = 3, 5, 7, and report your accuracies.

Here are some tips for writing a fast vectorized leave-one-out cross validation algorithm, given arbitrary graph signals Z. To compute the pairwise distances between pairs of patients, we can use the command d = squareform(pdist(Z)). The function pdist outputs a vector containing the distances between each pair of observations in Z, and squareform converts this vector into matrix form.

For leave-one-out cross validation, given the matrix of pairwise distances, we can select the nearest neighbors as follows. Start with the command [~, nn] = sort(d, 1, 'ascend'), whose second output is a matrix whose i-th column lists the patient indexes ordered from the most similar to the i-th patient to the most dissimilar. Since the distance from a patient to herself is always 0, the closest patient to any patient is herself. Given this fact, the command nn_label = y(nn(2:(k + 1), :)) gives us the labels of the k nearest neighbors of all patients; the i-th column of nn_label contains the labels of the k closest patients of the i-th patient. Finally, taking the mode along the right dimension, mode(nn_label, 1), yields the predictions of the k-NN classifier. Comparing the predictions with the actual labels y gives the accuracy.
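The pdist/sort/mode recipe above translates almost line by line into Python/NumPy. The sketch below is an illustrative translation, not the required MATLAB solution; the two well-separated synthetic clusters in the usage example are made up.

```python
import numpy as np

def loo_knn_accuracy(Z, y, k):
    """Leave-one-out k-NN accuracy, mirroring the pdist/sort/mode recipe."""
    # Pairwise Euclidean distances (squareform(pdist(Z)) in MATLAB).
    diff = Z[:, None, :] - Z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each column by distance; row 0 is the patient herself
    # (distance 0), so the k nearest neighbors are rows 1..k.
    nn = np.argsort(d, axis=0)[1:k + 1, :]
    nn_label = y[nn]                       # labels of the k neighbors, per column
    # Majority vote down each column (mode(nn_label, 1) in MATLAB).
    pred = np.array([np.bincount(col).argmax() for col in nn_label.T])
    return np.mean(pred == y)

# Usage on made-up, well-separated data: accuracy should be perfect.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
               rng.normal(10.0, 0.1, (5, 2))])
y = np.array([1] * 5 + [2] * 5)
acc = loo_knn_accuracy(Z, y, 3)
```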



4.2 Graph filters

We consider the following two graph filters to improve the classification accuracy results computed in the previous section. The first graph filter keeps only the information conveyed in the oscillation mode that best distinguishes the two subtypes. To be more specific, for the genetic profile x_i of each patient, its GFT x̂_i is fed into a graph filter H_1 with the frequency response

H_1(k) = 1 if k = argmax_k DP(v_k), and 0 otherwise.   (13)

Denote the filtered GFT by x̂_i^f and its inverse GFT by x_i^f. We can then form a filtered graph signal matrix X_f whose i-th row is the filtered genetic profile x_i^f of the i-th patient. Run the k-NN classifier you wrote in Section 4.1 with this filtered graph signal matrix and report your accuracies for k = 3, 5, 7. Compare the results with those obtained for the original graph signals. What do you observe?
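Applying (13) row-wise is a single elementwise multiplication followed by an inverse GFT. The Python/NumPy sketch below uses made-up stand-ins for the GFT basis V, the patient GFT matrix, and the distinguishing powers dp:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: an orthonormal GFT basis V (columns are modes),
# GFT coefficients for 3 patients over 5 modes, and DP values from eq. (11).
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X_hat = rng.normal(size=(3, 5))
dp = np.array([0.1, 0.5, 0.9, 0.2, 0.3])

# Eq. (13): frequency response that keeps only the most distinguishing mode.
h1 = np.zeros_like(dp)
h1[np.argmax(dp)] = 1.0

X_hat_f = X_hat * h1      # filtered GFTs, one row per patient
X_f = X_hat_f @ V.T       # inverse GFT applied row-wise: x_i^f = V x̂_i^f
```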

We can also consider a more general graph filter H_p with the frequency response

H_p(k) = 1 if DP(v_k) ≥ the p-th percentile of the distribution of DP, and 0 otherwise,   (14)

where p is any real number between 0 and 100. In words, this family of graph filters keeps the information conveyed in the oscillation modes that distinguish the two subtypes to some extent. The value of p controls the number of oscillation modes that are kept. One common choice is p = 50; the corresponding graph filter keeps all the information conveyed in the 50% of the oscillation modes with the most distinguishing power and removes the information conveyed in the remaining 50% of the modes, which cannot differentiate the two subtypes very well. Run the k-NN classifier you wrote in Section 4.1 on the graph signals fed through this graph filter with your choice of p (the choice is very robust, so start with a guess) such that the classification error is much smaller than the classification error obtained with the original graph signals. Report your accuracies for k = 3, 5, 7 and at least 5 different values of p spanning a wide range.
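The percentile thresholding in (14) is one line with a percentile routine. A Python/NumPy sketch on made-up DP values (the exact percentile convention may differ slightly from MATLAB's prctile, so verify against your course's definition):

```python
import numpy as np

# Hypothetical distinguishing powers for 8 oscillation modes.
dp = np.array([0.05, 0.40, 0.90, 0.10, 0.30, 0.70, 0.20, 0.60])

def hp_response(dp, p):
    # Eq. (14): pass mode k iff DP(v_k) is at or above the p-th percentile.
    thresh = np.percentile(dp, p)
    return (dp >= thresh).astype(float)

h50 = hp_response(dp, 50)   # keeps roughly the top half of the modes
```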
