3 Time-frequency analysis (New Mexico State University, …jlakey/papers/book-401-530_chapter3.pdf)


3

Time-frequency analysis

3.1 Time-frequency analysis

3.1.1 Fourier forever

Let f(t) be a mathematical idealization of some physical signal depending on time t. Perhaps f can be considered as a superposition of oscillating components, but these oscillations have to be limited to some finite extension in time. This is a fundamental problem for the Fourier inversion formula, which states that, for a well-behaved signal f, one has

f(t) = \int_{-\infty}^{\infty} \hat f(\xi)\, e^{2\pi i t\xi}\, d\xi

expressing a complex signal f as a superposition of exponentials e^{2\pi i t\xi}. If f vanishes outside some finite set then the exponentials, which extend over all time, must cancel each other in some fantastic way that makes it virtually impossible to quantify in any intuitive way which frequencies play a dominant role at any particular time t.

3.1.2 Frequency local in time

Consider a physical signal to be a square integrable real-valued function of time, x(t). One can define a complex extension z(t) = x(t) + iy(t) by letting y(t) be the inverse Fourier transform of -i\,\mathrm{sgn}(\xi)\,\hat x(\xi), where sgn denotes the signum function \xi/|\xi|. In this case \hat z = \hat x + i\hat y has only positive frequencies.

Exercise 3.1.1. Explain why z(t) has an extension to a complex argument t + is, s > 0, that is analytic in the upper half plane \{t + is : s > 0\}. You may assume that x(t) is a continuous, bounded, absolutely integrable function.

The analytic signal z has the polar form r(t)e^{i\theta(t)} where r = \sqrt{x^2 + y^2} and \theta = \arctan(y/x). The instantaneous frequency can be defined as d\theta/dt. This point of view, however, is a little too simple, because x(t) can be a superposition of multiple oscillating components and the instantaneous frequency cannot resolve multiple oscillating contributions. We will return to this point later in this chapter. First we want to consider some fundamental issues governing the impossibility of joint time-frequency localization and, in view of these limitations, mathematical tools that aim to characterize compositions of signals in terms of time-localized oscillations. One typically refers to such tools as time-frequency representations. As with the case of Fourier transforms we will encounter both continuous and discrete parameter time-frequency representations.
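As a quick numerical illustration (a sketch, not part of the text), the analytic signal can be built by suppressing negative frequencies in the FFT; the chirp below is a made-up example, and the recovered instantaneous frequency d\theta/dt tracks the chirp rate away from the endpoints.

```python
import numpy as np

fs = 1000                                      # sample rate (assumed)
t = np.arange(0, 1, 1 / fs)
x = np.cos(2 * np.pi * (5 * t + 10 * t**2))    # chirp: instantaneous frequency 5 + 20 t

# Analytic signal z = x + i y: zero out negative frequencies, double positive ones
X = np.fft.fft(x)
n = len(x)
h = np.zeros(n)
h[0], h[n // 2] = 1, 1
h[1:n // 2] = 2
z = np.fft.ifft(X * h)

theta = np.unwrap(np.angle(z))                 # phase of the polar form r(t) e^{i theta(t)}
inst_freq = np.diff(theta) * fs / (2 * np.pi)
mid = inst_freq[n // 2]                        # near t = 0.5, expect about 5 + 20*0.5 = 15 Hz
```

This is the same construction used by standard `hilbert`-transform routines; away from the signal edges the phase derivative recovers the chirp rate accurately.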

3.1.3 The Heisenberg-Wiener inequality

Variance inequalities

Theorem 3.1.2. (Heisenberg uncertainty principle) If f \in L^2(\mathbb R) with \|f\|_2 = 1 then

\|x f(x)\|_2\, \|\xi \hat f(\xi)\|_2 \ge \frac{1}{4\pi}.

Moreover, one has equality if and only if f(x) = c\, e^{-\pi\alpha x^2} for some \alpha > 0 and normalizing constant c.

48 3 Time-frequency analysis

This inequality states that f cannot have most of its energy near zero in time or space, and most of its energy near zero (or really any other point) in frequency. This type of inequality is called a variance inequality because, when |f(t)|^2\, dt is regarded as a continuous probability density on \mathbb R, the variance of that density is \|x f(x)\|_2^2 provided its mean satisfies \int x|f(x)|^2\, dx = 0, while \|\xi \hat f(\xi)\|_2^2 is the variance of the density |\hat f(\xi)|^2\, d\xi provided \int \xi |\hat f(\xi)|^2\, d\xi = 0. In quantum mechanics the Heisenberg inequality has an interpretation in terms of the joint variance of position and momentum operators and can be construed, roughly, as saying that one cannot jointly measure the position and momentum of a subatomic particle with arbitrary precision, a fact that was verified in the case of an electron by photon scattering in the famous Compton effect.
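The borderline case is easy to check numerically (a sketch, not from the text): the normalized Gaussian f(x) = 2^{1/4} e^{-\pi x^2} equals its own Fourier transform, so both variances come from the same quadrature.

```python
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f = 2**0.25 * np.exp(-np.pi * x**2)   # normalized so that ||f||_2 = 1

norm_sq = np.sum(f**2) * dx           # should be 1
tvar = np.sum((x * f)**2) * dx        # ||x f(x)||_2^2

# f is its own Fourier transform, so ||xi fhat||_2^2 = tvar and the
# variance product ||x f||_2 ||xi fhat||_2 equals tvar = 1/(4 pi)
```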

Our interest in uncertainty inequalities will take more of a macroscopic interpretation involving the formulation of a joint time-frequency picture of a signal.

Heisenberg's inequality has an interesting and simple extension to the case of (possibly unbounded) operators on a Hilbert space H. Define the domain of a self-adjoint operator A to be the set of u \in H such that Au \in H.

Theorem 3.1.3. If A and B are self-adjoint operators on a Hilbert space H then, whenever u is in the domain of both AB and of BA and a, b \in \mathbb R, one has

\|(A - a)u\|\, \|(B - b)u\| \ge \frac12 \bigl|\langle (AB - BA)u,\, u\rangle\bigr|.

Equality holds precisely when (A - a)u and (B - b)u are pure imaginary scalar multiples of one another.

Exercise 3.1.4. Show that, at least formally,

\langle (AB - BA)u,\, u\rangle = 2i\, \Im \langle (B - b)u,\, (A - a)u\rangle.

Then apply the Cauchy-Schwarz inequality to conclude the theorem.

The Heisenberg inequality has a covariant form called the Robertson-Schrödinger inequality. It takes the form

(\Delta x_j)^2 (\Delta \omega_j)^2 \ge \frac{1}{16\pi^2} + \bigl(\mathrm{Cov}(x_j, \omega_j)\bigr)^2

where the notation refers to covariances of operators (see, e.g., [?]).

Hermite functions

The Fourier transform exchanges differentiation and multiplication; specifically, \bigl(\frac{d}{dt} f\bigr)^\wedge(\xi) = 2\pi i\, \xi \hat f(\xi). Writing D = \frac{1}{2\pi i}\frac{d}{dt} and writing P(D) for a differential operator P(D) = \sum_k a_k D^k, we have, formally, \mathcal F(P(D) f) = P(\xi)\, \hat f(\xi) where P(\xi) = \sum_k a_k \xi^k. Additionally, if P(t, D) is a homogeneous polynomial of degree m in t and D, meaning that it has the form \sum_{k=0}^m a_k t^k D^{m-k}, then, also formally, \mathcal F(P(t, D_t) f) = \sum_k a_k D_\xi^k\, \xi^{m-k} \hat f, where D_\xi = \frac{i}{2\pi}\frac{d}{d\xi}, since the Fourier inversion formula implies that \mathcal F^{-1}(D_\xi \hat f)(t) = t f(t). This and the observation about Gaussians being preserved leads to a description of L^2-eigenfunctions for the Fourier transform on \mathbb R. These eigenfunctions are the Hermite functions

h_m(t) = \frac{2^{1/4 - m}\,(-1)^m}{\sqrt{\pi^m\, m!}}\; e^{\pi t^2}\, \frac{d^m}{dt^m}\bigl(e^{-2\pi t^2}\bigr). \qquad (3.1)

Exercise 3.1.5. Show that h_m is an eigenfunction of the Fourier transform with eigenvalue (-i)^m.

Exercise 3.1.6. Explain why the Hermite functions are orthogonal with respect to the standard inner product on L^2(\mathbb R).

It turns out that, in fact, the Hermite functions form an orthonormal basis for L2(R).
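One can check (and we take this on faith here) that (3.1) is equivalent to the closed form h_m(t) = 2^{1/4}(2^m m!)^{-1/2} H_m(\sqrt{2\pi}\, t)\, e^{-\pi t^2} in terms of the physicists' Hermite polynomials H_m. Assuming that expression, orthonormality (Exercise 3.1.6) can be verified by quadrature:

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite import hermval

def h(m, t):
    """Hermite function h_m via physicists' Hermite polynomials (derived from (3.1))."""
    c = np.zeros(m + 1)
    c[m] = 1.0
    return (2**0.25 / sqrt(2**m * factorial(m))
            * hermval(np.sqrt(2 * pi) * t, c) * np.exp(-pi * t**2))

t = np.linspace(-8, 8, 4001)
dt = t[1] - t[0]
# Gram matrix of h_0,...,h_3 under the L^2(R) inner product, by Riemann sum
G = np.array([[np.sum(h(i, t) * h(j, t)) * dt for j in range(4)] for i in range(4)])
```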

Exercise 3.1.7. Compute the moment \int t\, h_m(t)^2\, dt.


Entropy inequality

There are a vast number of known alternative, precise mathematical statements of the fact that a function and its Fourier transform cannot both be arbitrarily well localized. One important form has to do with information, usually regarded in terms of entropy. For f \in L^2(\mathbb R) with \|f\|_2 = 1 one defines its entropy

E(f) = -\int |f|^2 \ln |f|.

Entropy can take on positive or negative values (including infinity), but if f were highly concentrated then it would have a large negative entropy. For example, if f = \sqrt N on [0, 1/N) and zero elsewhere then (using the rule 0 \ln 0 = 0) one would have E(f) = -\frac12 \ln N, whereas if f = 1/\sqrt N on [0, N) and zero elsewhere then E(f) = +\frac12 \ln N. This typical example indicates that entropy is a measure of energy spread. A very illuminating mathematical discussion of entropy can be found in Landau [?]. The following entropy inequality was proved by Beckner [?]. It says that E(f) and E(\hat f) cannot both have large negative values.

Theorem 3.1.8. If f \in L^2(\mathbb R), \|f\|_2 = 1, then E(f) + E(\hat f) \ge \frac12 (1 - \ln 2).

Exercise 3.1.9. Compute E(g) for g(t) = e^{-\pi t^2}.

One can define the entropy of a vector z = (z_1, \dots, z_N) similarly, namely E(z) = -\sum_{k=1}^N |z_k|^2 \ln |z_k|.

A norm inequality for the discrete Fourier transform

We have seen that the DFT satisfies the Plancherel formula

\|\hat z\|_2 = \|z\|_2

where \|z\|_2 is the usual Euclidean norm. Now let \|z\|_p = \bigl(\sum_{k=1}^N |z_k|^p\bigr)^{1/p} for 1 \le p < \infty and let \|z\|_\infty = \sup_{1\le k\le N} |z_k|.

Exercise 3.1.10. Prove that \|z\|_p defines a norm on \mathbb C^N, 1 \le p \le \infty, but that the triangle inequality can fail when p < 1.

Exercise 3.1.11. Explain why \|\hat z\|_\infty \le \frac{1}{\sqrt N}\, \|z\|_1.

Interpolation

A convexity principle known as the Riesz-Thorin interpolation theorem (e.g. [?]) allows us to conclude from Plancherel's identity (that the DFT is unitary) and from the inequality \|\hat z\|_\infty \le \frac{1}{\sqrt N}\|z\|_1 that

\|\mathcal F(z)\|_{p'} \le N^{(p-2)/(2p)}\, \|z\|_p, \qquad \frac1p + \frac1{p'} = 1, \qquad (3.2)

whenever 1 \le p \le 2. Now define the quantity

H_p(z) = \frac{1}{2-p}\, \ln \sum_{k=1}^N |z_k|^p = \frac{p}{2-p}\, \ln\Bigl(\frac{\|z\|_p}{\|z\|_2}\Bigr)

(the two expressions agree when \|z\|_2 = 1). Taking logarithms of both sides of (3.2) gives

\frac{1}{p'} \ln \sum |\hat z_k|^{p'} \le \frac{p-2}{p}\, \ln\sqrt N + \frac1p \ln \sum |z_k|^p

\ln\sqrt N \le \frac{p}{(p-2)p'}\, \ln \sum |\hat z_k|^{p'} - \frac{1}{p-2}\, \ln \sum |z_k|^p

\ln\sqrt N \le \frac{1}{2-p'}\, \ln \sum |\hat z_k|^{p'} + \frac{1}{2-p}\, \ln \sum |z_k|^p

\ln\sqrt N \le H_p(z) + H_{p'}(\hat z),

which is equivalent to (3.2). Now consider H_p(z) as a function of p when z is fixed. Notice that if one has two increasing functions on (\alpha, \beta] that are continuous and equal at \beta, then the derivative of the smaller function has to be at least as large as that of the larger function at \beta. In this case we are saying that the logarithmic derivative of \|\hat z\|_{p'}/\|z\|_p is at least the logarithmic derivative of N^{(p-2)/(2p)}, and this translates into the statement that

-\sum_k |z_k|^2 \ln |z_k| \;-\; \sum_k |\hat z_k|^2 \ln |\hat z_k| \;\ge\; \frac12 \ln N \qquad (3.3)

which is a form of the entropy inequality for the DFT.

Exercise 3.1.12. Show that (3.3) becomes an identity in the special case when z is the constant vector all of whose entries are 1/\sqrt N, or when z = (1, 0, \dots, 0).
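A numerical check of (3.3), using the unitary DFT so that Plancherel holds exactly (a sketch, not from the text); one of the equality cases from Exercise 3.1.12 is included.

```python
import numpy as np

def E(z):
    """E(z) = -sum |z_k|^2 ln|z_k|, with the convention 0 ln 0 = 0."""
    a = np.abs(z)
    a = a[a > 0]
    return -np.sum(a**2 * np.log(a))

N = 64
F = lambda z: np.fft.fft(z) / np.sqrt(N)       # unitary DFT

rng = np.random.default_rng(1)
z = rng.normal(size=N) + 1j * rng.normal(size=N)
z /= np.linalg.norm(z)                         # normalize: ||z||_2 = 1
gap_random = E(z) + E(F(z)) - 0.5 * np.log(N)  # should be >= 0

const = np.full(N, 1 / np.sqrt(N))             # equality case: constant vector
gap_const = E(const) + E(F(const)) - 0.5 * np.log(N)
```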

Fourier support properties.

If f \in L^2(\mathbb R) vanishes off a finite interval then the integral \hat f(\zeta) = \int_{\mathbb R} f(t)\, e^{-2\pi i t\zeta}\, dt converges absolutely to a differentiable function of the complex variable \zeta. This means that \hat f(\zeta) is analytic and, therefore, can only have isolated zeros in the complex plane. There are a lot of intriguing mathematical variations of this fundamental principle concerning sets where Fourier transforms can vanish. One of the deepest versions of this principle is reflected in the following inequality due to Nazarov [?].

Theorem 3.1.13. There are absolute constants A > 0 and C > 0 such that for any f \in L^2(\mathbb R) and any sets S, \Sigma of finite measure,

\int |f|^2 \le C\, A^{|S||\Sigma|} \Bigl( \int_{\mathbb R\setminus S} |f|^2 + \int_{\mathbb R\setminus \Sigma} |\hat f|^2 \Bigr).

Here |S| denotes the total length of S when S can be expressed as a (possibly infinite) union of pairwise disjoint intervals.

Exercise 3.1.14. Can a function f and its Fourier transform \hat f both be supported on sets of the form \cup_{i=1}^\infty [\alpha_i, \beta_i] such that \sum_{i=1}^\infty (\beta_i - \alpha_i) < \infty? Explain.

Exercise 3.1.15. Let S = [-T/2, T/2] and \Sigma = [-\Omega/2, \Omega/2]. Compute the integrals above in the case g(t) = e^{-\pi t^2}.

Concentration inequalities

One says that f is \varepsilon-concentrated on A \subset \mathbb R if \int_{\mathbb R\setminus A} |f|^2 < \varepsilon^2 \int_{\mathbb R} |f|^2. The following inequality was proved by Donoho and Stark.

Theorem 3.1.16. If f \ne 0 is \varepsilon-concentrated on A and \hat f is \delta-concentrated on B, then |A||B| \ge (1 - \varepsilon - \delta)^2.

Exercise 3.1.17. Relate the Donoho-Stark concentration inequality to Nazarov’s inequality.

3.1.4 Finite Fourier inequalities

Exercise 3.1.18. Formulate a version of the Heisenberg variance inequality for the discrete Fourier transform.

Number theory plays an important role in the fast Fourier transform algorithm. In particular, if N = P is a large prime number, then one cannot use any reduction argument to speed up computation of the DFT. Curiously, concentration and support properties of the finite Fourier transform depend in an equally crucial way on the composite nature of N. Let \#z denote the number of nonzero coordinates of z = (z_1, \dots, z_N) \in \mathbb C^N.

Exercise 3.1.19. Show that, for z \in \mathbb C^N, if \|z\|_2 = 1 then -\sum |z_k|^2 \ln |z_k| \le \frac12 \ln \#z.

Now one has the following corollary to (3.3).

Theorem 3.1.20. If z \ne 0, then \#z \cdot \#\hat z \ge N.


The entropy inequality does not tell us when this inequality becomes an identity. It turns out that equality occurs if and only if z is a shifted or modulated picket fence vector [?]; that is, if an appropriate shift or modulation of z is a multiple of 1_{\mathbb Z_{N/Q}}, where Q divides evenly into N. In other words, 1_{\mathbb Z_{N/Q}} is the vector z such that z_k = 1 if k is a multiple of Q and z_k = 0 otherwise. The inequality between arithmetic and geometric means implies that \#z + \#\hat z \ge 2\sqrt N. One can infer a stronger inequality when N = P is prime (see [?]).

Corollary 3.1.21. If N = P is prime, then for z \ne 0,

\#z + \#\hat z \ge P + 1.

Equality holds only when z is a modulated version of a multiple of the vector all of whose coordinates equal one.

Exercise 3.1.22. Write a matlab script to verify this statement experimentally.

Concentration inequalities can actually be used to say something about approximations from sparse data.

Exercise 3.1.23. Suppose that N = MP and that it is known that z is sparse in the sense that only K < \min(M, P) entries are nonzero. What is the minimal number of coordinates of \hat z required to reconstruct z? Explain.

Exercise 3.1.24. Suppose that z is such that z_k = 0 unless k = nL for some n, where L divides N. Can the complexity of computing the DFT for such z be reduced? Explain.

The problem of estimating z based on the prior assumption that z has only a small number of nonzero coefficients is a deep and challenging active area of applied mathematics. The work of Tao, Candès et al. [?, ?] gives examples that build on earlier work of Donoho and Elad [?].

3.2 Time-frequency bases and frames

The rest of this chapter represents several aspects of the current state of the art that has evolved around trying to make sense of representing harmonic oscillations locally in time when, in point of fact, the uncertainty principle tells us that it is effectively impossible to do so. We will review a range of techniques that all, at some level, attempt to represent a signal as a superposition of frequency information local in time. Although all of these techniques try to accomplish the same basic task, the nuances that distinguish one time-frequency representation from another make all the difference in particular applications, which put different levels of emphasis on the ability to recognize coherent structure within a signal versus the ability to recover significant features, while setting aside features not of interest, in an effective, automatic way. Because these tools span quite a lot of mathematical technique, our treatment will be just a little beyond superficial. The goal is to give just enough detail to get a feel for the mathematical origins of the different methods and a taste of how the tools differ from one another and the purposes that underlie these differences.

3.2.1 Gabor bases

Fourier series provide expansions of periodic functions, but they can also be considered as local expansions of functions in the sense that they only represent given functions for one period. Square integrable functions are not periodic but, thought of as functions of time or space, they can have oscillating components that emerge and decay over time or space. Sines and cosines will have some correlation with such components.

Exercise 3.2.1. Let \chi = \chi_{[0,1)}(x). Show that the functions \varphi_{n,k}(x) = \chi(x - n)\, e^{2\pi i k x}, n, k \in \mathbb Z, form an orthonormal basis for L^2(\mathbb R).

The functions \varphi_{n,k} are sometimes called Gabor functions (pronounced gábor) after the Hungarian Nobel laureate Dennis Gabor. In general, if g \in L^2(\mathbb R) and if \alpha > 0, \beta > 0, then the family g_{\alpha n, \beta k}(x) = e^{2\pi i \beta k x}\, g(x - \alpha n) is called a Gabor family G(g, \alpha, \beta) generated by g and the lattice \alpha\mathbb Z \times \beta\mathbb Z of time-frequency shifts. The function g is sometimes called a Gabor window. There is a fairly well-developed theory now associated with Gabor representations. Much of what was known by 2000 is discussed in Gröchenig's book [?]. We will review some of those facts but we will also discuss some more recent developments.

First, there are fundamental limitations on Gabor orthonormal bases.


Theorem 3.2.2. (Balian-Low) Suppose that G(g, \alpha, \beta) forms an orthonormal basis for L^2(\mathbb R). Then the time-frequency variance product satisfies \|x g(x)\|_2\, \|\xi \hat g(\xi)\|_2 = \infty.

This tells us that the window of a Gabor orthonormal basis cannot have good time-frequency localization. We can ask whether an overcomplete Gabor representation can have good time-frequency localization (finite time-frequency variance). This time we are in luck, but we need a little basic machinery to describe the main result and how it can be applied.

Frames

Given a separable Hilbert space H, a countable subset \{f_n\} is called a frame for H provided that there are constants 0 < A \le B < \infty such that for any f \in H one has

A\|f\|^2 \le \sum_n |\langle f, f_n\rangle|^2 \le B\|f\|^2.

These inequalities imply that the frame operator is bounded and continuously invertible. Frames are necessarily complete sets but typically they are overcomplete or redundant. For example, any three unit vectors in \mathbb R^2 that differ from one another by a rotation by 2\pi/3 will form a frame for the Hilbert space \mathbb R^2. In the case of Gabor systems G(g, \alpha, \beta) one defines the frame operator

S_{g,\alpha,\beta} f = \sum_{n,k \in \mathbb Z} \langle f,\, g_{\alpha n, \beta k}\rangle\, g_{\alpha n, \beta k}.

To g one can assign a canonical dual window \gamma = S_{g,\alpha,\beta}^{-1} g, and then one has the reproducing formula

f = \sum_{n,k} \langle f,\, \gamma_{\alpha n, \beta k}\rangle\, g_{\alpha n, \beta k} = \sum_{n,k} \langle f,\, g_{\alpha n, \beta k}\rangle\, \gamma_{\alpha n, \beta k}.

In this case one can say that f is expressed in a natural way as a superposition of time-frequency localized Gabor atoms if g (or \gamma) is time-frequency localized. At this stage we just have a couple of minor problems. First, what does \gamma look like, and second, if g is time-frequency localized in a suitable sense then will \gamma be localized in the same sense? A third issue comes in determining conditions on g, \alpha, \beta such that one has a Gabor frame. When g is the Gaussian function g(x) = e^{-\pi x^2} one has the following frame density criterion.

Theorem 3.2.3. (Seip and Lyubarskii) When g(x) = e^{-\pi x^2}, the family G(g, \alpha, \beta) forms a frame for L^2(\mathbb R) if and only if \alpha\beta < 1.

The product \alpha\beta can be regarded as a time-frequency density in the sense that 1/(\alpha\beta) is the number of Gabor time-frequency shifts per unit area. This has to be at least one in order that the family is complete in the case of a Gaussian. Overcompleteness implies that typically there will be more than one dual function \gamma for a Gabor frame generator g. The dual function \gamma = S_{g,\alpha,\beta}^{-1} g is called the canonical dual. Except for a minor technical condition, Wexler and Raz [?, ?] characterized the Gabor duals in the analogous finite frame case that will be discussed in a minute. In the case of Gabor frames for L^2(\mathbb R) this characterization was carried out rigorously by Daubechies, H. Landau and Z. Landau as follows.

Theorem 3.2.4. (Wexler-Raz) The pair (g, \gamma) is a pair of dual Gabor windows, in the sense that S_{g,\gamma,\alpha,\beta} f = \sum_{n,k} \langle f,\, \gamma_{\alpha n, \beta k}\rangle\, g_{\alpha n, \beta k} is the identity operator on L^2(\mathbb R), if and only if

\frac{1}{\alpha\beta}\, \bigl\langle \gamma_{n/\beta,\, k/\alpha},\; g_{m/\beta,\, \ell/\alpha} \bigr\rangle = \delta_{mn}\, \delta_{\ell k}.

The frame operator S_{g,\alpha,\beta} itself can be expressed as the composition of a coefficient mapping T_{g,\alpha,\beta}(f) = \bigl(\langle f,\, g_{\alpha n, \beta k}\rangle\bigr)_{n,k} with its adjoint T^*_{g,\alpha,\beta} c = \sum_{n,k} c_{n,k}\, g_{\alpha n, \beta k}. The Gabor expansion of f then has the form f = T^*_{\gamma,\alpha,\beta}\, T_{g,\alpha,\beta} f. As noted, the main difficulty with this expansion is the computation of the dual function \gamma. A very important consequence of the Wexler-Raz identity is that the canonical dual is the same as the so-called Wexler-Raz dual, which is defined as follows:

\gamma = \alpha\beta\; T^*_{g,1/\beta,1/\alpha}\bigl(T_{g,1/\beta,1/\alpha}\, T^*_{g,1/\beta,1/\alpha}\bigr)^{-1} e_{0,0} \qquad (3.4)

where e_{0,0} is the coefficient sequence on \mathbb Z \times \mathbb Z such that e_{0,0}(n, k) = 1 if n = k = 0 and e_{0,0}(n, k) = 0 otherwise. What is important about the formula (3.4) is that it allows for a discrete calculation of the dual function \gamma. This calculation can be performed numerically as follows.


Proposition 3.2.5. (Neumann series expansion) Suppose that S is an operator on a Hilbert space H such that 0 < S < I, in the sense that for any x \ne 0 in H one has 0 < \langle (I - S)x, x\rangle < \|x\|^2. Then one can write

S^{-1} = \sum_{k=0}^\infty (I - S)^k. \qquad (3.5)

Formally, (3.5) is the same as the geometric series expansion \frac1x = \sum_{k=0}^\infty (1 - x)^k, valid when 0 < x < 1, with S substituted in for x. In the case of the operator T_{g,1/\beta,1/\alpha}\, T^*_{g,1/\beta,1/\alpha} one has, at least formally,

\bigl(T_{g,1/\beta,1/\alpha}\, T^*_{g,1/\beta,1/\alpha}\bigr)^{-1} e_{0,0} = \frac{2/(\alpha\beta)}{A+B} \sum_{k=0}^\infty \Bigl(I - \frac{2/(\alpha\beta)}{A+B}\, T_{g,1/\beta,1/\alpha}\, T^*_{g,1/\beta,1/\alpha}\Bigr)^k e_{0,0} = \lim_{K\to\infty} e_K,

which allows one to compute e_K recursively as

e_0 = \frac{2/(\alpha\beta)}{A+B}\, e_{0,0}, \qquad e_{K+1} = e_0 + e_K - \frac{2/(\alpha\beta)}{A+B}\, T_{g,1/\beta,1/\alpha}\, T^*_{g,1/\beta,1/\alpha}\, e_K,

which in turn allows one to write

\gamma \approx \sum_{n,k} \mu^K_{n,k}\, g_{n/\beta,\, k/\alpha}

where the coefficients \mu^K_{n,k} are also defined recursively.

Exercise 3.2.6. Find a recursive expression defining the coefficients \mu^K_{n,k}.
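In a small finite model the canonical dual window and the reproducing formula can be checked directly, without the Neumann iteration (a sketch in Python with a made-up Gaussian-like window; this mirrors the finite setting discussed next, not the ltfat implementation).

```python
import numpy as np

L, a, M = 12, 2, 6            # length, time step, number of modulations (redundancy M/a = 3)
b = L // M                    # frequency step in DFT bins
t = np.arange(L)
g = np.exp(-np.pi * ((t - L / 2) / 3.0) ** 2)   # hypothetical window

def tf_shift(v, n, k):
    """Time shift by n*a samples, then modulate by k*b bins (cyclically)."""
    return np.exp(2j * np.pi * k * b * t / L) * np.roll(v, n * a)

shifts = [(n, k) for n in range(L // a) for k in range(M)]
G = np.stack([tf_shift(g, n, k) for n, k in shifts], axis=1)   # columns = Gabor atoms

S = G @ G.conj().T                      # frame operator: S f = sum <f, g_nk> g_nk
gamma = np.linalg.solve(S, g)           # canonical dual window gamma = S^{-1} g
# S commutes with lattice time-frequency shifts, so the dual atoms are
# the same time-frequency shifts applied to gamma
Gd = np.stack([tf_shift(gamma, n, k) for n, k in shifts], axis=1)

rng = np.random.default_rng(3)
f = rng.normal(size=L) + 1j * rng.normal(size=L)
f_rec = Gd @ (G.conj().T @ f)           # f = sum <f, g_nk> gamma_nk
err = np.linalg.norm(f_rec - f)
```

The reconstruction error is at the level of numerical round-off, illustrating the reproducing formula for an overcomplete Gabor frame.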

Discrete implementations of Gabor expansions

The linear time-frequency analysis toolbox (ltfat) was developed by the numerical harmonic analysis group in Vienna. It contains utilities for computing discrete Gabor transforms from sampled data, along with other time-frequency utilities that will be discussed below. The same sorts of issues of time-frequency localization in the continuous case of functions defined on \mathbb R are present in the finite case, but the context is different. Given a sample vector x of length L one has to ask: on what normalized time (or space) interval was the signal sampled? Equivalently, what was the sample rate? For example, in the case of speech signals the sample rate might be 11025 or 22050 or 44100 samples per second, the latter determined by the response of the human auditory system and not necessarily by the real analogue signal generating the data. The sample rate then represents the finest time scale in the data. The Gabor shift parameter \alpha then has to be encoded as a fraction of a unit of time, hence as a given number of samples. Thus one replaces the time-shift parameter \alpha (in seconds, say) by the sample shift parameter a = \alpha \times Fs (in samples), where Fs denotes the number of samples per second. Next one has to consider the frequency shift parameter \beta. In standard DFTs the rows of the DFT matrix are powers of the vector e^{-2\pi i jk/N}, with N the length of the (possibly zero-padded) signal. Thus the normalized frequencies go up to the number of samples. In principle, the signal of interest is bandlimited to Fs/2 and this can be reflected in the discrete Gabor transform. Then \beta can be regarded as a fraction of the normalized frequency and one can express \beta/Fs = 1/M, where M is the number of Fourier modes that will be considered in the windowed data. Thus, the discrete Gabor transform of data x with sample indices running from 0 to L - 1 will take the form

c = DGT(x, g, a, M)

c(n+1, k+1) = \sum_{\ell=0}^{L-1} x(\ell)\, e^{-2\pi i \ell k/M}\, g(\ell - an),

k = 0, \dots, M - 1; \quad n = 0, \dots, N - 1 = L/a - 1.

The ltfat syntax is

[c,Ls]=dgt(f,g,a,M);
fr=idgt(c,gd,a,Ls);


where f is the input vector x having Ls entries, g is the Gabor window with Gabor dual gd, a is the sample shift parameter and M is the normalized frequency parameter, which is referred to in the ltfat documentation as the number of channels. Practically, it is the effective number of Fourier modes to be considered in the windowed data. Gabor window and dual window design for finite implementations is discussed briefly in the ltfat help. In Figure 3.1 the Gabor transform of the signal buellershort is plotted on a log intensity scale. The parameters were chosen as follows in order to give a fairly robust time-frequency picture.

g=pgauss(1024,256,512);
[c,Ls]=dgt(x,g,2,1024);

In general, the more robust the picture, the more redundancy required. This means small a and large M. Minimum redundancy would require large a and small M, with M = a being the smallest possible choice. One other thing worth mentioning is that the DGT treats data as being a single period of a periodic sequence. The window functions used are also periodic, so that there are no edge effects.

Fig. 3.1. Log scale intensity plot of Gabor transform of bueller signal of length L = 8192.

3.2.2 Compression using Gabor transforms

Having a discrete implementation of the Gabor transform allows one to perform signal processing tasks such as compression and denoising. The upper left part of Figure 3.1 indicates the presence of frequencies in the first half of the signal that are not present in the second half. The lower part indicates the presence of three parallel harmonics with frequencies that gradually increase over time.

The Gabor transform is not so useful for compression of information that is dispersed in the time-frequency plane, such as the bueller speech signal. See Figures 3.2 and 3.3, where the sparser picture just shows the top ten percent magnitude terms. The problem is that one has to compute a rather large number of Gabor coefficients to get a coherent time-frequency picture. In the given example, L = 8192, a = 8 and M = L/a, so there are (L/a)^2 \approx 10^6 Gabor coefficients as opposed to only about 10^4 samples, so this hardly amounts to compression. On the other hand, the Gabor transform could be a useful means of denoising signals when the noise is evenly diluted over the time-frequency plane. See Appendix 3.7.1 for matlab code.


Fig. 3.2. Log scale intensity plot of Gabor transform of bueller signal of length L = 8192.

Fig. 3.3. Log scale intensity plot of Gabor transform of top ten percent of Gabor coefficients.

3.2.3 Denoising using Gabor transforms

To test the denoising performance of Gabor transforms we added uniform random noise to the signal buellershort with amplitude about one fifth of the original signal amplitude. Of course, results will vary with the intensity and structure of the noise. To denoise, we took the Gabor transform of the noisy signal, as shown in Figure 3.4, and zeroed out all coefficients not in the top 10 percent magnitude. The residual is compared to the noise in Figure 3.6. This type of denoising is the same as we applied when using the Fourier transform for denoising the same signal. The result sounds much better in the Gabor case because the Gabor transform isolates the noise from the spectrum of the signal locally in time.

Fig. 3.4. Log intensity plot of Gabor transform of noisy bueller signal.

Fig. 3.5. Log intensity plot of top ten percent of noisy Gabor coefficients.

[use noisy and show soft thresholding/shrinkage]

3.2.4 Short-time Fourier transform and Spectrograms

Gabor coefficients are integrals of the form

S(f, g)(x, \xi) = \int f(t)\, \tilde g(x - t)\, e^{-2\pi i t\xi}\, dt \qquad (3.6)

where g(x) is replaced by its involution \tilde g(x) = \overline{g(-x)}, and x takes the value n\alpha and \xi the value k\beta. However, if one is willing to compute all of the values then one ends up with the short-time Fourier transform S(f, g)(x, \xi), which is a mapping from a function f(t) to a function of the variables (x, \xi). As an


Fig. 3.6. Top plot shows noise added to the bueller signal. Bottom plot shows the residual: the cleaned signal from the top 10 percent of Gabor coefficients minus the noisy bueller signal. This residue looks much the same as the noise, and the cleaned signal sounds much like the original bueller signal.

integral it is linear in f . In fact it is also (conjugate) linear in g but one usually regards g as a fixed windowfunction then regards f 7→ Sg(f) = S(f, g) as a linear mapping. The short-time Fourier transform satisfiesa remarkable inversion property. Suppose that ‖g‖2 = 1. Then

f(t) =∫ ∫

S(f, g)(τ, ξ) g(t− τ)e2πiτξ dτ dξ (3.7)

The inversion formula is very similar, on the one hand to the Fourier inversion formula (corresponding to thecase where g is replaced by the Dirac point mass δ here) as well as to the Gabor representation formula –in the limit as the time and frequency shift parameters tend to zero. Formula 3.7 is interpreted in the senseof convergence in L2(R). We will not justify the formula rigorously (see [?]) but we will give a formal proofbased on the formal identity

\int e^{2\pi i t\xi}\, d\xi = \delta(t). Then

\int\!\!\int S(f, g)(\tau, \xi)\, g(t - \tau)\, e^{2\pi i t\xi}\, d\tau\, d\xi

= \int\!\!\int\!\!\int f(s)\, \overline{g(s - \tau)}\, e^{-2\pi i s\xi}\, ds\; g(t - \tau)\, e^{2\pi i t\xi}\, d\tau\, d\xi

= \int f(s) \int \overline{g(s - \tau)}\, g(t - \tau) \int e^{-2\pi i (s - t)\xi}\, d\xi\, d\tau\, ds

= \int f(s) \int \overline{g(s - \tau)}\, g(t - \tau)\, \delta(s - t)\, d\tau\, ds

= f(t) \int |g(t - \tau)|^2\, d\tau = f(t).

In fact, one can show that S(f, g) is energy preserving in the sense that, when \|g\|_2 = 1,

\int\!\!\int |S(f, g)(t, \xi)|^2\, dt\, d\xi = \int |f(t)|^2\, dt.
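In the finite model the analogous identity is \sum_{n,k} |V(n,k)|^2 = L\, \|f\|_2^2\, \|g\|_2^2 for the length-L discrete STFT with the unnormalized DFT (the factor L reflects the fft convention). A quick sketch, not part of the text:

```python
import numpy as np

L = 64
rng = np.random.default_rng(4)
f = rng.normal(size=L) + 1j * rng.normal(size=L)
g = rng.normal(size=L) + 1j * rng.normal(size=L)

# V[n, k] = sum_l f(l) conj(g(l - n)) e^{-2 pi i l k / L}
V = np.stack([np.fft.fft(f * np.conj(np.roll(g, n))) for n in range(L)])

lhs = np.sum(np.abs(V) ** 2)
rhs = L * np.sum(np.abs(f) ** 2) * np.sum(np.abs(g) ** 2)
```

The identity follows from applying the discrete Parseval formula in k for each fixed shift n, then summing over n.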


3.2.5 The time-frequency toolbox (tftb)

The Time-frequency toolbox (tftb) is a collection of time-frequency representation tools that was developed at the Centre National de la Recherche Scientifique (CNRS) in France. While matlab has a built-in spectrogram function in its signal processing toolbox, this function is proprietary. tftb has substantial overlap with ltfat, but it has many additional time-frequency representation utilities, with the primary goal of providing good tools for analyzing, as opposed to processing, data. The tftb utility for computing spectrograms is called tfrsp. Because of the highly redundant nature of the spectrogram, a typical laptop would not be able to process tfrsp of a signal like buellershort, which has 8192 samples. One possibility is first to downsample. The matlab signal processing toolbox has a utility for downsampling, but it is simple enough to write a script.

d=2; % downsample rate
for i=1:floor(length(x)/d)
  xdownsampled(i)=x(d*i);
end

Figure 3.7 shows a full spectrogram of the bueller signal downsampled by a factor of two. The picture is relatively clean compared to the Gabor pictures because of the higher redundancy. With x the vector of bueller data, the picture was produced as follows.

xdownsampled=xdownsampled';
[tfr,t,f]=tfrsp(xdownsampled);
imagesc(log(1+10*abs(tfr)));

Fig. 3.7. Log intensity spectrogram of bueller signal computed using tfrsp.

3.2.6 Time-frequency representations

Wigner distribution

In the von Neumann model for quantum mechanics, particles are modeled as so-called states, unit vectors in a separable Hilbert space (say L^2(\mathbb R)), with |\varphi(t)|^2\, dt thought of as a probability density. Then a quantity

58 3 Time-frequency analysis

like∫t|ϕ(t)|2 dt can be thought of as the expected location of a particle modelled by ϕ. Wigner sought a

mathematical object to model the joint distribution of such a density in time and frequency. Such a jointdensity W (f) should satisfy, among other things,

• TF1 W is bilinear and W (f, f) ≡W (f)• TF2 If W (f) = W (g) then f = cg for some c ∈ C with |c| = 1.• TF3 W (f)(t, ξ) ≥ 0• TF4

∫W (f)(t, ξ) dξ = |f(t)|2,

∫W (f)(t, ξ) dt = |f(ξ)|2

• TF5 If f is supported in [a, b] then W (f) is supported in [a, b]× R

It turns out that these properties are incompatible. Wigner proposed as a substitute a mapping (f, g) 7→W (f, g), the so-called Wigner distribution, that satisfies all properties except (TF3), but requires insteadthat W (f) be real-valued. The Wigner distribution is defined as

W (f, g)(t, ξ) =∫

e−2πiτξf(t+

τ

2)g(t− τ

2)dτ (3.8)

Exercise 3.2.7. With W (f) = W (f, f) verify properties (TF2), (TF4) and (TF5).

Exercise 3.2.8. Show that for g = e−πt2, W (g)(t, ξ) = e−π(t2+ξ2).

Exercise 3.2.9. Denote Daf(t)(t) =√af(at), Ebf(t) = e2πibtf(t) and Taf(t) = f(t − a). Compute

W (Da)(f)(t, ξ), W (Ebf)(t, ξ) and W (Taf)(t, ξ). Also compute W (f)(t, ξ) where f is the Fourier transformof f . Express your answers in terms of W (f)(t, ξ).

Exercise 3.2.10. (Hard) Show that shifted, dilated and modulated Gaussians are the only L2 functionshaving completely nonnegative Wigner distributions.

Moyal’s formula

The Wigner distribution is a unitary mapping from L2(R) to L2(R2, a fact that is known as Moyal’s formula.

Theorem 3.2.11. If f1, f2, g2, g2 ∈ L2(R) then

〈f1, f2〉〈g1, g2〉 = 〈W (f1, f2), W (g1, g2)〉.

On a formal level the proof is much the same as that of the inversion formula for the short-time Fouriertransform, that is, it involves changes of order of integration and the formal identity

∫e2πixξ dξ = δx.

〈W (f1, f2), W (g1, g2)〉 =∫ ∫ ∫

e−2πiτ1ξf1

(t+

τ12)g1

(t− τ2

2)dτ1

∫e2πiτ2ξf2

(t+

τ22)g2

(t− τ2

2)dτ dξdt

=∫ ∫

f1

(t+

τ12)f2

(t+

τ22)g2

(t− τ2

2)g1

(t− τ2

2) ∫

e−2πi(τ1−τ2)ξ dξ dτ1 dτ2dt

=∫ ∫

f1

(t+

τ12)f2

(t+

τ22)g2

(t− τ2

2)g1

(t− τ2

2)δτ1=τ2=τ dτ2 dτ1dt

=∫ ∫

f1

(t+

τ

2)f2

(t+

τ

2)g2

(t− τ

2)g1

(t− τ

2)dτdt

=∫ ∫

f1(u)f2(u)g2(u− τ)g1(u− τ) dτdu = 〈f1, f2〉〈g1, g2〉

by using the substitution u = t+ τ/2. This formally proves Moyal’s identity.Figure 3.8 shows log scale absolute value image of a Wigner distribution of buellershort. Cross-term

interference is very pronounced in this picture. It illustrates the fact that good mathematical properties donot necessarily correspond to a clean graphic.

3.2 Time-frequency bases and frames 59

200 400 600 800 1000 1200 1400 1600 1800 2000

1000

1100

1200

1300

1400

1500

1600

1700

1800

1900

2000

Fig. 3.8. Log intensity Wigner distribution of bueller signal computed using tfrwv.

3.2.7 Wigner distribution and spectrogram

The spectrogram of f with window g is Spec(f, g)(t, ξ) = |S(f, g)(t, ξ)|2. It is an energy distribution in thesense that it is nonnegative and has the property that

∫ ∫|S(f, g)(t, ξ)|2 dt dξ = ‖f‖2‖g‖2. However, unlike

the Wigner distribution it typically will not generate the correct marginal values when integrating over oneof the variables. Here, the Wigner distribution has another remarkable property.

Proposition 3.2.12. The spectrogram Spec (f, g)(t, ξ) is the bivariate convolution of the Wigner distribu-tions of f and g, that is,

Spec (f, g)(t, ξ) =∫ ∫

W (f)(s, η)W (g)(t− s, ξ − η) ds dη.

Proposition 3.2.12 follows from taking f1 = f = 2 = f and g1 = g2 = g in Moyal’s identity and using thetime-frequency shift covariance properties in Exercise 3.2.9.

Since convolution is a sort of averaging this says that SpecS(f, g)(t, ξ) is an averaged or smoothedversion of W (f). In particular, when g = e−πt

2, W (g) = e−π(t2+ξ2) and Spec (f, g)(t, ξ) then represents

a Gaussian averaging of W (f). Remarkably, Spec (f, g)(t, ξ) is always nonnegative even though W (f) israrely completely nonnegative. This observation suggests that it might be possible to obtain time-frequencydistributions having other desirable properties, i.e. substituting nonnegativity for some other property, byreplacing the time-frequency kernel K(s, t; η, ξ) = W (g)(t− s, ξ− η) by some other time frequency kernel K.When K has the form K(s, t; η, ξ) = P (t− s, ξ − η) the resulting time-frequency distribution

CP (f, g)(t, ξ) =∫ ∫

W (f)(s, η)P (t− s, ξ − η) ds dη (3.9)

is said to be a Cohen class distribution after its inventor Leon Cohen. The Wigner distribution correspondsto P (t−s, ξ−η) = δstδξη. When P (t, ξ) = g(t)H(−ξ) CP is called a pseudo smoothed Wigner distribution andwhen g is Gaussian it is the same as the spectrogram with window g, but in general it is not a spectrogram.The utility of Cohen’s class kernel generally will depend on what application the user has in mind, suchas reducing interference among signal components in the time-frequency plane. In this case, it is useful todesign a kernel having specific properties. How to do so is discussed in the tftb reference manual. The tftbtoolbox has a variety of time-frequency distribution tools and the general usage is

60 3 Time-frequency analysis

[tfr,t,f]=tfrname(signal);

where tfrname is the name of the distribution. For example, the spectrogram name is tfrsp. Some alternativetime-frequency distributions are provided in Figure 3.10.

200 400 600 800 1000 1200 1400 1600 1800 2000

100

200

300

400

500

600

700

800

900

1000

Fig. 3.9. Log intensity plot smoothed pseudo Wigner dis-tribution of bueller signal.

200 400 600 800 1000 1200 1400 1600 1800 2000

100

200

300

400

500

600

700

800

900

1000

Fig. 3.10. Log intensity plot Choi-Williams distributionof bueller signal.

3.2.8 Time-frequency reassignment

Because of the uncertainty principle, no reasonable time-frequency distribution can localize energy in anideal manner. To some extent, methods to clean up time-frequency representations amount to attempts toanswer: what should an ideal tie-frequency representation of the data look like? Time-frequency reassignmentamounts to one such method. Recall that the expected value of a random variable is E(p) =

∫xp(x) dx. In

the case of a time-frequency distribution one can also define the expected value around a point. In the caseof the spectrogram one can define the center of gravity of the time-frequency distribution around (t, ξ) as

t(f ; t, ξ) =∫ ∫

sW (f)(t− s, ξ − η)W (g)(s, η) ds dη|S(f, g)(t, ξ)|2

ξ(f ; t, ξ) =∫ ∫

ξW (f)(t− s, ξ − η)W (g)(s, η) ds dη|S(f, g)(t, ξ)|2

.

One then defines the reassigned spectrogram RSpec (f, g) by

RSpec (f, g)(t, ξ) = Spec (f, g)(t, ξ).

The reassigned spectrogram will no longer be a bilinear mapping since expected values of the distributionsare not linear. On the other hand, remarkably in the case of the spectrogram, some of the other covarianceproperties are linear in expectation and thus are preserved.

Although the definition of RSpec suggests a terribly complex implementation, the reassignment map thatsends (t, ξ) to (t, ξ) actually simplifies dramatically, [?] (??).

Proposition 3.2.13. Let φ(t, ξ) = φ(f, g)(t, ξ) = argS(f, g)(t, ξ) denote the phase of the short-time Fouriertransform of f . Then

t = −∂φ∂ξ

(t, ξ)

ξ = ξ +∂φ

∂t(t, ξ)

Numerical implementation using these rules is not completely effective so one substitutes instead thefollowing

t ≈ t−<S(f, tg(t))(t, ξ)S(f, g)(t, ξ)

|S(f, g)(t, ξ)|2

ξ ≈ ξ −=S(f, ddtg(t))(t, ξ)S(f, g)(t, ξ)

|S(f, g)(t, ξ)|2

3.3 Return to time-frequency orthonormal bases: local trigonometric bases and Wilson bases 61

Exercise 3.2.14. Comment on the quality of approximation using these formulas.

The tftb command for reassigned spectrograms is tfrrsp. The tftb has reassignment tools for a numberof time-frequency distributions beyond the spectrogram.

[tfr,rtfr,bar]=tfrrname(signal);

where tfrrname refers to the time-frequency distribution name, for example, tfrrsp in the case of thespectrogram. The extra ‘r’ is for reassigned. The vector ‘bar’ refers to the local centroid mapping.

Exercise 3.2.15. Give some reasons why reassignment might not be a good idea for transforms like Gabortransforms that are not highly redundant.

Figure 3.12 illustrates a reassigned spectrogram and tfrrspwv of the bueller signal.

200 400 600 800 1000 1200 1400 1600 1800 2000

100

200

300

400

500

600

700

800

900

1000

Fig. 3.11. Log intensity plot of reassigned spectrogram ofbueller signal.

200 400 600 800 1000 1200 1400 1600 1800 2000

100

200

300

400

500

600

700

800

900

1000

Fig. 3.12. Log intensity plot of reassigned smoothedpseudo Wigner distribution of bueller signal.

3.3 Return to time-frequency orthonormal bases: local trigonometric bases andWilson bases

3.3.1 Wilson bases

The Balian-Low theorem 3.2.2 says that if e2πiktg(t− n) forms a Riesz basis for L2(R) then tg(t) /∈ L2 orξg(ξ) /∈ L2(R). It came as a surprise then when K. Wilson suggested the possibility of finding an orthonormalbasis for L2(R) consisting of alternating windowed sines and cosines as follows. Set

ψnk =

2w(t− n/2) if k = 0, n even2w(t− n/2) cos 2πkt k = 1, 2, 3, . . . , n even

2w(t− (n+ 1)/2) sin 2π(k + 1)t k = 0, 1, 2, . . . , nodd

This says that on consecutive intervals of unit length (the origin is slightly exceptional here), one alternatesbetween sines and cosines as the local basis elements. The technique for designing an appropriate windowfunction is a bigger excursion than we want to take at this stage. Utilities for constructing discrete orthonor-mal Wilson bases can be found in the linear time-frequency analysis toolbox ltfat. In Figure 3.13 we haveplotted the lower half of the Wilson transform of x = buellershort as follows.

gamma=wilorth(128,1024); % orthogonal windowc=dwilt(x,gamma,64);

The window length and channel parameters govern the time versus frequency localization of the Wilsonbases elements and the Wilson transform respectively. The Wilson transform plot exhibits essentially thesame structure as a nonredundant Gabor transform, cf. Figure 3.1 but perhaps with better time-frequencylocalization.

62 3 Time-frequency analysis

10 20 30 40 50 60

70

80

90

100

110

120

Fig. 3.13. Log intensity plot of Wilson coefficients of bueller.

3.3.2 Local trigonometric bases

Because the window function can have exponential decay in both time and frequency, the Wilson basis hasexcellent time-frequency localization tradeoffs. A more flexible construction but one for which the windowfunction has compact support in time is the so-called local trigonometric bases. We have already seen inessence one form of this construction with the bell functions in Chapter 2. Here we want to consider a similarbell function construction but instead of the dilation condition

∑∞j=−∞ b2(2jξ) = 1 we want b to satisfy the

shift condition∞∑

n=−∞b2(t− n) = 1. (3.10)

More general constructions are outlined in [?,?] among others.As in Chapter 2, let φ(x) be a nonnegative, symmetric function supported in [−1, 1] having integral π/2.

Define θ(x) =∫ x−∞ φ(t) dt so θ(x) − π/4 nondecreasing, antisymmetric and has lower and upper bounds

±π/4. In particular, θ(−x) = π/2 − θ(x). Set θε(x) = θ(x/ε), sε(x) = sin θε(x) and cε(x) = cos θε(x). Thensε(x) = 0 if x < −ε, sε(x) = 1 if x > ε, sε(0) = 1/

√2 and sε(−x) = cε(x) as is easily checked by properties of

sine and cosine. Now set b(ξ) = s1/2(ξ)c1/2(ξ − 1). Since s1/2(x) = 0 if x < −1/2 and c1/2(x) = 0 if x > 1/2it follows that b(x) = 0 outside of [−1/2, 3/2].

Proposition 3.3.1. The function b(x) just defined satisfies (3.10).

Proof. We will just check that b2(x) + b2(x − 1) = 1 on [1/2, 3/2], the interval where these two overlap.In general, only two consecutive shifts will overlap and these overlaps will follow the same pattern as on[1/2, 3/2]. First, b(x) = c1/2(x − 1) on 1/2 ≤ x ≤ 3/2 while b(x − 1) = s1/2(x − 1) on 1/2 ≤ x ≤ 3/2.Therefore, for 1/2 ≤ x ≤ 3/2 one has

b2(x) + b2(x− 1) = c21/2(x− 1) + s21/2(x− 1) = cos2(θ1/2(x− 1)) + sin2(θ1/2(x− 1)) = 1

because cos2 θ + sin2 θ = 1. It is important here that the same function θ1/2(x − 1) is input into sine andcosine so that the Pythagorean identity can be applied.

3.3 Return to time-frequency orthonormal bases: local trigonometric bases and Wilson bases 63

Theorem 3.3.2. The functions enk =√

2b(x − n) cos((k + 1

2 )π(x − n)), n ∈ Z and k = 0, 1, . . . form an

orthonormal basis for L2(R).

Proof. One must show that overlapping functions are orthogonal. We prove this for the case when one ofthe functions lives on [−1/2, 3/2] so n = 0. The integral defining 〈e0k, em`〉 = 0 unless m = 0 or m = ±1.We will consider the case m = 0 and leave the other cases as an exercise.

〈e0k, e0`〉 = 2

(∫ 1/2

−1/2

+∫ 3/2

1/2

b2(t) cos (k +

12

)πt cos (`+12

)πt

).

The first integral can be split into∫ 0

−1/2+∫ 1/2

0. On [−1/2, 0], b(t) = s1/2(t) and on [0, 1/2) b(t) = c1/2(t).

Also, cosine is an even function so∫ 0

−1/2

+∫ 1/2

0

b2(t) cos (k +

12

)πt cos (`+12

)πt =∫ 1/2

0

(s21/2(t) + c21/2(t)) cos (k +

12

)πt cos (`+12

)πt

=∫ 1/2

0

cos (k +12

)πt cos (`+12

)πt

by the Pythagorean identity. For the integral∫ 3/2

1one uses the fact that cos (k + 1/2)πt = − cos (k + 1/2)π(2− t)

and that, on [1, 3/2), b(t) = c1/2(t− 1) to write∫ 3/2

1

b2(t) cos (k +12

)πt cos (`+12

)πt =∫ 3/2

1

c21/2(t− 1) cos (k +12

)π(2− t) cos (`+12

)π(2− t)

=∫ 1

1/2

c21/2(1− t) cos (k +12

)πt cos (`+12

)πt

=∫ 1

1/2

s21/2(t− 1) cos (k +

12

)πt cos (`+12

)πt

where we have used the fact that cε(−u) = sε(u). Therefore, adding∫ 3/2

1/2to∫ 1

1/2and using the Pythagorean

identity we end up with the integral of the cosine terms alone over [1/2, 1). Altogether, then,

〈e0k, e0`〉 = 2∫ 1

0

cos (k +12

)πt cos (`+12

)πt

= 2∫ 1

0

sinπkt sinπ`t dt =∫ 1

0

[cosπ(k − `)t− cosπ(k + `)t

]dt =

12δk`

which was to be shown. The completeness of the system enk follows from completeness of the trigonometricsystem over each of the unit intervals [n, n+ 1).

Exercise 3.3.3. Show that 〈e0k, e1`〉 = 0 for all k and `. Explain why, in general, all inner products〈enk, em`〉 can be reduced to calculating inner products on [0, 1] to conclude that the enk form an or-thonormal family.

Though we will not go into detail here, local trigonometric bases can in fact be adapted to any partition ofthe real line into intervals In whose left endpoints tn form a strictly increasing sequence such that limn→±∞ =±∞. The main idea is to use a sequence εn of cutoffs in such a way that the bells bn = sεn(t−tn)cεn+1(t−tn+1)satisfy

∑n b

2n = 1 and such that the local trigonometric basis elements alternate polarities at the endpoints

in the same sense as the cosines just considered.

3.3.3 Discrete implementations

Discrete implementations of the local trigonometric bases are often called Malvar bases because H. Malvarwas the first to use them in signal processing applications. The really amount to nothing other than sampledversions of the local trigonometric bases. Coefficient pictures produced by discrete analogues of the systems

64 3 Time-frequency analysis

enk just discussed will look very much like the corresponding Wilson basis pictures such as in Figure 3.13.As will be discussed in more detail later, it is possible to use recursive decision trees to decide whether to splita given interval for local trigonometric analysis into two subintervals with corresponding decompositions foreach subinterval. Such splittings give rise to families of local trigonometric bases indexed by interval splittingsand one can ask which, among this family of bases, represents given data in the most efficient way. Efficiencywill be discussed in Chapter 5. The local trigonometric functions are called cosine packets and routines forimplementing cosine packet analysis can be found in the Stanford WaveLab package WaveLab802. Figure3.14 shows the pattern of nonzero cosine packet basis coefficients in an analysis of buellershort. Althoughthe intensity does not show up in this scheme, the pattern essentially follows that of the other time-frequencyanalysis tools.

0.4 0.5 0.6 0.7 0.8 0.9 1

0

0.1

0.2

0.3

0.4

0.5

0.6

Phase plane: CP Best Basis; Bueller

Time

Frequency

Fig. 3.14. Cosine packet analysis of Bueller signal.

3.4 Sampling and time-frequency localization

The theory of functions that are nearly jointly in the range of PΩ and QT was developed in a seminal seriesof papers by Landau, Slepian and Pollack in the 1960s, [?]. We want to take a slightly different approachfollowing more recent work of Khare and George [?] and of Walter et al., e.g. [?]. The goal is to giveintuitive and computationally useful meaning to, though not an actual proof of, a result that was stated andproved rigorously by Slepian et al. [?]. It says, in effect, that the space of functions essentially timelimitedto [−T/2, T/2] and essentially bandlimited to [−Ω/2, Ω/2] has dimension essentially TΩ. In other words,the dimension of the space of essentially time -and -bandlimited signals is proportional to the area of thetime-frequency region. Further discussion of theoretical results as well as other numerical approaches can befound in Hogan and Lakey [?].

The operator PΩ projects onto bandlimited functions so one can express it in terms of integration againstthe sinc function. Setting P = P1 we have

Pf(x) = f ∗ sinc (x) =∫ ∞−∞

f(x− t) sinπtπt

dt.

3.4 Sampling and time-frequency localization 65

If g = QT f is in the image of the operator QT then g is not bandlimited but PQT f is and we can write

PQT f(x) = QT f ∗ sinc (x) =∫ T/2

−T/2f(t)

sinπ(x− t)π(x− t)

dt.

If f itself is in PW then we can apply the sampling theorem f =∑f(k)sinc (x− k) and write

PQT f(x) = QT f ∗ sinc (x) =∫ T/2

−T/2

∑k∈Z

f(k)sinπ(t− k)π(t− k)

sinπ(x− t)π(x− t)

dt

=∑k∈Z

f(k)∫ T/2

−T/2

sinπ(t− k)π(t− k)

sinπ(x− t)π(x− t)

dt =∑`∈Z〈f(`), sT (k, x)〉`2

where sk,T is the partial correlation

sT (k, x) =∫ T/2

−T/2sinc (t− k) sinc (t− x) dt.

Of course, when T → ∞ this converges to δx−k and one recovers the sampling theorem. Now consider thecase in which f = ϕn is the n-th eigenfunction of the operator PQT . Here we make use of the fact that theoperator PQT has a discrete spectrum λ0 ≥ λ1 ≥ . . . ↓ 0 as a self-adjoint operator on the Hilbert space PW.If ϕ is an eigenfunction of PQT with eigenvalue λ then

PQTϕ(m) = λϕ(m) =∑`∈Z〈ϕ(`), sT (`,m)〉`2 .

In other words, the sample vector vλ = ϕ(m) is a λ-eigenvector of the matrix AT (m, `) = sT (`,m). Thismeans that the eigenvalue/eigenvector problem for the prolate spheroidal wave functions can be reduced tothat of the discrete matrix A = Am,`.

Exercise 3.4.1. Fix T be be a fairly large even integer, say T = 10. Estimate the entries of the matrix ATby first approximating the sinc function by means of Taylor polynomials centered at the origin of sufficientlyhigh order, on the one hand, and by using Legendre polynomials on the other. Then compute the svd of thematrix to obtain approximate sample eigenfunctions and plot several of them.

Now consider the operator QTPQT . The only difference between this operator and PQT is that theelements of the range of PQT are now truncated to [−T/2, T/2] so the eigenfunctions can be considered aseigenfunctions of PQT restricted to [−T/2, T/2]. This is no great observation, but here is a surprising one.

Proposition 3.4.2. The eigenfunctions of PQT are orthogonal on the whole real line and also on the interval[−T/2, T/2], that is, if ϕn is the λn eigenfunction of PQT then∫ T/2

−T/2ϕn(t)ϕm(t) dt = λnλmδnm = λnλmδnm

∫ ∞−∞

ϕn(t)ϕm(t) dt

Proof. Orthogonality on all of R follows from the fact that eigenvectors coming from different eigenvalues areorthogonal. To prove orthogonality on [−T/2, T/2] we use Parseval’s theorem together with an interestinglittle fact that the eigenfunctions are, in a sense, invariant under the Fourier transform. First, if c = TΩ, thetime-frequency area associated with PΩQT , then, setting fα(x) = f(x/α)/

√α

PαΩQT/αfα(x) = PαΩ((QT f)α)

= ( (QT f)α(1)[−αΩ/2,αΩ/2])∨

= (√α(QT f)(αξ)(1)[−αΩ/2,αΩ/2])∨

=√α(((QT f)(1)[−Ω/2,Ω/2)1/α)∨

= (PΩQT f)α.

66 3 Time-frequency analysis

In other words, dilation in a sense commutes with time-frequency localization, and the eigenfunctions of theoperator PαΩQT/α have the form ϕα where ϕ is an eigenfunction of PωQT f .

What does the Fourier transform do to a λ-eigenfunction? Well,

λ(QTϕ)∧(ξ) = (QTPΩQTϕ)∧(ξ)= PT (PΩQTϕ)∧(ξ)= PTQΩ(QTϕ)∧(ξ)

where QΩg(ξ) = g(ξ)1[−Ω/2,Ω/2](ξ). This says that (QTϕ)∧(ξ) is an eigenfunction of the operator PTQΩ =(PΩQT )TΩ and, by what we just observed it follows that the eigenfunctions of PTQΩ are unitary dilationsby a factor T/Ω of the eigenfunctions of PΩQT which tells us that (QTϕ)∧(ξ) = ϕT/Ω(ξ). Therefore, byParseval, ∫ T/2

−T/2(PΩQTϕn)(PΩQTϕm) =

∫ ∞−∞

(QTPΩQTϕn)(QTPΩQTϕm)

=∫ ∞−∞

(QTPΩQTϕ∧) (QTPΩQTϕm)∧

= λnλm

∫ ∞−∞

(QTϕn)∧(QTϕm)∧

= λnλm

∫ ∞−∞

ϕn,T/Ω(ξ)ϕm,T/Ω(ξ) = λnλm

as claimed.

Problem 3.4.3. Do the sample sequences of the prolate spheroidal wave functions satisfy some extrapolationproblem?

In other words, one would like to determine from the sample values of ϕm inside of [−T/2, T/2] the valuesoutside of [−T/2, T/2]. This problem is highly ill-posed but it is less ill-posed when we observe that ϕ is ananalytic function and even less so when we observe that it is an eigenfunction.

3.4.1 Numerical generation of PSWFs

Figure 3.15 shows numerically generated prolate spheroidal wave functions where T = 10 and Ω = 1. Thefigure illustrates the fact that the first several eigenfunctions are highly concentrated in [−T/2, T/2], butthat the concentration of ϕn decreases with n and once n.[TΩ] at most half of the energy of ϕn is localizedinside [−T/2, T/2]. Here is a brief description of how they were created. First a matlab function sincmatrixwas used to generate a partial matrix sT (k, `) for values k and ` running from −N to N for some userinput N . The integral defining sT (k, `) was computed using matlab’s built in quad function for numericalestimation of integrals. Matlab also has a built in sinc function but for earlier versions the sinc functioncan be input manually, with a small correction in the denominator to avoid division by zero at t = k, `.Computing sT (k, `) is computationally intensive but only has to be done once. The eigenvectors are thenestimated numerically by using the matlab built in svd. These eigenvectors are the samples of the PSWFs.Finally, one multiplies the matrix containing the eigenvectors of sT by a matrix containing densely sampledvalues of the shifted sinc functions. The columns of the resulting product are densely sampled approximateprolate spheroidal wave functions. The approximations here depend on two things: (i) the error tolerance inthe quadrature defining sT and (ii) the parameter N governing the size of the partial matrix of sT . In fact,the entries of sT decay fairly rapidly away from the diagonal. This is illustrated in Figure 3.16 showing thatthe entries sT (k, `) are significant only when k, ` are approximately between −5 and 5 and when k ≈ `. Inaddition, one sees in Figure 3.17 that the partial sinc matrix sT has only about 14 significant eigenvaluesand only 10 eigenvalues greater than 1/2. In fact, a theorem due to Landau [?] states that, in general, PΩQThas at most [ΩT ] + 1 eigenvalues larger than 1/2. 
In our case, Ω = 1 and T = 10 so our numerical resultsare certainly consistent with the theory. Taking N any larger will provide slightly better approximations butat a higher computational cost.

Once one has the samples of the PSWFs ϕn one can compute the projection of any f ∈ PW onto therange of PΩQT simply by computing the `2(Z) inner product (or, numerically, the partial inner product) ofthe samples of f with the samples of each ϕn. This is the same as multiplying the partial sample vector bythe orthogonal matrix obtained in the svd above. Then one expands each of the ϕn’s as before.

3.5 Another look at frequency 67

−10 −5 0 5 10

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

phi

2

phi3

phi10

Fig. 3.15. Numerical PSWFs ϕ2, ϕ3 and ϕ10 for Ω = 1 and T = 10.

5 10 15 20 25 30 35 40

5

10

15

20

25

30

35

40

Fig. 3.16. Image of 41 × 41 partial matrix sT (k, `) forT = 10

2 4 6 8 10 12 14 16 18

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Fig. 3.17. Eigenvalues of partial matrix sT

3.5 Another look at frequency

A basic tenet of time-frequency analysis is that all signals of interest can be expressed as superpositionsof damped oscillators. In a lot of cases the signal is really a superposition of not too many oscillators,plus random effects, but there are two fundamental reasons why untangling the individual components isproblematic. The first is that, generally, one does not know the precise nature of the oscillator. The second isthat, generally, one does not know the precise nature of forcing or damping effects. Because there are so manyunknowns one oversimplifies by treating all components as sinusoidal, then undersimplifies by expressing asignal as a complicated superposition of sinusoids. Sinusoidal oscillations, if they are undamped, obey theharmonic oscillator equation

d2x

dt2= −ω2x(t)

This equation is derived from Newton’s law F = ma and Hooke’s law F = −kx with ω = k/m. If, instead,the oscillations are linearly damped (e.g. thermal loss proportional to velocity) then the equation is justslightly more complicated,

68 3 Time-frequency analysis

d2x

dt2+

b

m

dx

dt+ ω2x(t).

Damping proportional to velocity or viscosity damping is an oversimplification; in many cases Coulomb orfrictional damping also is significant. One can also call Hooke’s law into question. For example, if one allowsspring stiffness to depend slightly on displacement then one is quickly led to oscillators such as the duffingoscillator

d2x

dt2+

b

m

dx

dt+ ω2x(t) + βx(t)3.

Thus, even for small amplitude vibrations and for the simplest possible nonlinearities, the result of treatingoscillations as superpositions of pure sinusoids leads to nontrivial bandwidth. Finally, there is the possibilitythat a signal is generated by a coupled system of oscillators.

It is impossible to take all of these issues into account when trying to express a system or signal as asuperposition of damped vibrations or a measurement thereof. In many applications it is more fundamentalthat time-frequency analysis can point to fundamental changes over time in the oscillatory behavior ofobserved signals, as time-frequency representations attempt to do. But when a signal is fundamentally asuperposition of a small number of oscillators, it makes sense to try to identify waveforms beyond sinusoids.

3.5.1 Defining the spectrum of a real signal

3.5.2 Instantaneous frequency and the Hilbert transform

At the beginning of this chapter we discussed the polar or amplitude-phase decomposition z = reiθ of acomplex signal z(t) = x(t) + iy(t) with r =

√x2 + y2 and θ = arctan(y/x). At any time t, r represents

the magnitude of z and eiθt represents the position of a point on the unit circle. In the ideal case z = eiθt,expressed in radians, the signal z oscillates at a rate θ radians per unit time and we call θ the instantaneousfrequency. When z is not a pure exponential we define the instantaneous phase at time t as θ(t) and dθ/dtis called the instantaneous frequency.

These are nice definitions but they suffer a fundamental problem, which is that, ordinarily, measurablesignals are real-valued. In order to make sense out of instantaneous phase one has to make sense out of thevirtual complex signal companion of x(t).

When x is a square integrable signal of time there is a somewhat canonical way of manufacturing ananalytic signal from x. This tool is called the Hilbert transform and it is defined in the following way:

f 7→ f 7→ f1[0,∞) 7→ (7→ f1[0,∞))∧ ≡12

(I + iH)f.

The operator H is called the Hilbert transform and can be defined analytically by the integral formula

Hf(t) = p.v.1π

∫ ∞−∞

f(s)t− s

ds.

Here p.v. stands for principal value meaning that the integral is rigorously defined by taking an appropri-ate limit near the singularity s = t in the denominator of the integrand. Corresponding discrete Hilberttransforms can be defined for discrete and finite signals respectively.

Exercise 3.5.1. Give a reasonable definition of a Hilbert transform on CN .

The instantaneous frequency of f is that of its analytic extension (I + iH)f/2. While instantaneousfrequency can thus be defined in rigorous mathematical terms, its physical interpretation is still problematic.For one thing, the phase θ(t) may not be a differentiable function of t. But just as critically, instantaneousfrequency may not reflect the nature of the signal if it is composed of multiple oscillating components. Weare still haunted by the fundamental tradeoff between localization in time and localization in frequency.

3.5.3 Monocomponent versus multicomponent

Instantaneous frequency only yields one value at any given time. This is fine if a signal is comprised of asingle oscillating component but not if several components are present. One way to quantify the amount offrequency variation a physical signal possesses is in terms of bandwidth.

3.6 Empirical mode decomposition 69

3.5.4 Empirical bandwidth

As before let z(t) = r(t)eiθ(t). Express the Fourier transform of z as Z(ξ) which has expected value

〈ξ〉 =∫ξ|S(ξ)|2 dξ.

By Plancherel’s theorem,

〈ξ〉 = <∫z(t)

1i

dz

dtdt

= <∫ (dθ

dt− i

r

dr

dt

)r2(t) dt =

∫dθ

dtr2(t) dt

by equating real and imaginary parts. In this notation one can define the bandwidth ν in terms of instanta-neous amplitude and frequency averages as

ν2 =ξ2 − 〈ξ〉2

〈ξ〉2=

1〈ξ〉2

∫(ξ − 〈ξ〉)2|Z(ξ)|2 dξ

=1〈ξ〉2

∫z(t)

(1i

d

dt〈ξ〉2

)2

z(t) dt

=1〈ξ〉2

∫ ((drdt

)2

+(dθdt− 〈ξ〉

)2

r2(t))2

dt

For a narrowband signal both terms of the last integral have to be small meaning that both the amplitudeand the instantaneous frequency have to vary slowly. In order that the Fourier transform is supported in[0,∞), one should have ) ≤ ν ≤≤ ξ. However, this definition of bandwidth still takes global averages of localinformation which can lead to negative frequencies.

When signals are generated by a Gaussian stationary process meaning that random errors follow anormal distribution and the nature of the distribution does not change over time, it is possible to computethe expected number of zero crossings per unit time. If the average value of the signal is zero and the signalis essentially one oscillating component, it makes sense to define the frequency locally approximately as thenumber of zero crossings per unit time, for a short time average, provided a local maximum or minimumintervenes between consecutive zero crossings. When the signal does not have an average value of zero, thisapproach can be misleading.

Exercise 3.5.2. Compute the instantaneous frequency of the analytic signal z(t) whose real part is x(t) =a+ sin t for (i) a = 0, (ii) a = 2/3, (ii) a = 4/3. What does this say about using the analytic signal to defineinstantaneous frequency?

3.6 Empirical mode decomposition

3.6.1 Intrinsic modes

Huang et al define an intrinsic mode function (IMF) to be any function that satisfies the following: (i) inthe whole data set, the number of extrema and the number of zero crossings differ by at most one. (2) Atany point, the mean value of the envelope defined by the local maxima and the envelope defined by the localminima is zero.

This last condition is not completely well defined because one has to use some interpolation schemeto produce these envelopes from the maxima and minima. But it can be achieved by adding envelopeinterpolation points by reflecting each extremum in the data across the y-axis. The resultant waveform doesnot always possess well-defined instantaneous frequency [?] (it will introduce an alias in the instantaneousfrequency for nonlinearly deformed waves), but a relatively weak one in comparison with the effect of globalnonstationarities.

Exercise 3.6.1. (Easy) For which values of a is x(t) = sin t + a an IMF? Is e^{−πt²} sin βt an IMF?


The term intrinsic mode function refers to the oscillation mode embedded in the data. An IMF is meant to have only one oscillation mode, although we will see that algorithms for extracting IMFs do not always succeed in this regard. But they do succeed in extracting riding waves such as those that arise in the analytic extension of a + sin t, for example. An IMF is not restricted to be narrowband and is typically nonstationary. For instance, any frequency modulated (FM) signal can be an IMF.

Instantaneous frequency of IMFs

Suppose that x(t) is an IMF and let z(t) = r(t)e^{iθ(t)} be its analytic extension. Then its Fourier transform is

Z(ξ) = ∫_{−∞}^{∞} r(t) e^{2πi(θ(t)/(2π) − tξ)} dt.

By stationary phase, the frequency contributing most to Z(ξ) at any given time satisfies

d/dt (θ(t)/(2π) − tξ) = 0, that is, (1/(2π)) dθ/dt = ξ.

In this sense, an IMF represents a true oscillation.

3.6.2 Sifting: the Empirical Mode Decomposition

Empirical Mode Decomposition

While the IMF concept captures some essential features of the notion of an oscillation of finite duration, data generated by a superposition of several nonstationary components will not necessarily reveal itself readily as a superposition of IMFs, so, to be of practical use, there has to be a method to disentangle IMF components. Huang et al. [?] suggest a sifting method called the empirical mode decomposition. The crucial step is to identify characteristic scales for different oscillations.

Step 1: Extrema. The first step is to identify all local maxima and minima in the data X(t). Here one just needs enough measurements/samples so that any reasonable notion of bandwidth is taken into account.

Step 2: Envelopes. One interpolates all of the local maxima data to produce an upper envelope. Typically cubic spline interpolation is used; the matlab command spline can be used for this. One does the same with all local minima information to produce a lower envelope.

Step 3: Extract zero mean signal. Denote by m(t) the average of the upper and lower envelopes. Then h(t) = X(t) − m(t) has average zero. The signal h may not be an IMF because there can still be extrema with the wrong sign. This is a result of undershoot and overshoot of the interpolation process, which can result from nonlinearities in the data. Additionally, in real oscillations the envelope mean value will not generally be zero, though it might be close. Results also depend on the interpolation method.

Step 4: Iteration over zero mean extractions. One repeats the sifting method on h and on subsequently produced means until some stopping criterion is satisfied. The desired stopping criterion is that the h produced after some number of iterations is an IMF. Huang et al. suggest that if the relative difference in ℓ² norm (on samples) of successive h's is small enough then one should also stop. "Sufficiently small" can be made a tunable, data dependent parameter; Huang et al. suggest 0.2 to 0.3 for the relative error.
The resulting component C1 should contain the finest time scale (highest frequency) oscillation of the data X(t).

Step 5: Iterate on residual signal. This step repeats steps (1)–(4) to define more IMFs. If R1 = X − C1, then iteration on R1 yields a component C2, and so on, so that X = R1 + C1 = R2 + C2 + C1 = ⋯. It makes sense to adopt two separate stopping criteria. The first is that the component or residue becomes small after some number of steps. A second criterion is that the residue does not have any local maxima after some number of steps, in which case it is called a trend.

The components Ci should be locally almost orthogonal in the sense that Ci+1 locally corresponds to a mean value and Ci locally corresponds to a difference from that mean. Huang et al. provide a measure called the index of orthogonality, essentially the ratio of the magnitudes of the cross correlations of different components to the total signal energy.
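The text does not give an explicit formula; one plausible reading of "ratio of the cross correlations to the total signal energy" is sketched below (Python; index_of_orthogonality is our name and one common variant, not necessarily Huang et al.'s exact definition):

```python
import numpy as np

def index_of_orthogonality(components):
    """Summed cross terms between distinct components divided by the
    total energy of their sum. A value near zero indicates the
    components are close to mutually orthogonal."""
    C = np.asarray(components)      # shape (k, N)
    x = C.sum(axis=0)
    total = np.sum(x ** 2)
    cross = 0.0
    for i in range(len(C)):
        for j in range(len(C)):
            if i != j:
                cross += np.sum(C[i] * C[j])
    return abs(cross) / total

t = np.linspace(0, 1, 1000, endpoint=False)
c1 = np.sin(2 * np.pi * 5 * t)
c2 = np.sin(2 * np.pi * 12 * t)
print(index_of_orthogonality([c1, c2]))  # near zero: distinct harmonics
```

Sinusoids at distinct integer frequencies over a full period are orthogonal, so the index is essentially zero; two components at the same frequency but different phases would give a large index.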

Results of the empirical mode decomposition applied to the data buellershort are given in Figure 3.18. The first IMF is plotted over the original data in Figure 3.19.


Spectrogram intensity plots of sums of the first few IMFs are given in Figure 3.20. Each of the figures illustrates one nontrivial issue with EMD, namely that multiple overlapping frequencies can show up in a single IMF.


Fig. 3.18. First eight IMFs of bueller data. EMD identified 13 IMFs but the last several have small amplitude.

3.6.3 Hilbert spectrum

One of the basic premises of EMD is that the complex envelope of x(t) provides an inappropriate measure of instantaneous spectrum because it tries to reduce all of the spectral information in a possibly multicomponent signal to a single number. When x is expressed as a superposition of several component IMFs, it becomes appropriate to compute the instantaneous frequencies of each of the IMFs. That is, let Zi denote the analytic extension of Ci and let Zi(t) = Ri(t)e^{iθi(t)}. This decomposition then resembles the Fourier decomposition of a finite superposition of oscillating components, but in the Fourier series case each component is stationary. Huang et al. refer to the time-frequency distribution H(t, ξ) of the amplitude as the Hilbert spectrum of x. It is a time-frequency distribution in the same manner as the spectrogram, but it is defined intrinsically based only on very simple suppositions about what properties an oscillating component should have.
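A minimal sketch of how such an H(t, ξ) might be assembled, assuming FFT-based analytic extensions and a simple histogram binning of instantaneous frequency (Python; hilbert_spectrum and its binning scheme are illustrative, not Huang et al.'s implementation):

```python
import numpy as np

def analytic(x):
    """FFT-based analytic signal (zero out negative frequencies)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:(N + 1) // 2] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0
    return np.fft.ifft(X * h)

def hilbert_spectrum(imfs, dt, nbins=64):
    """Accumulate amplitude R_i(t) at (t, instantaneous frequency) for
    each IMF C_i, giving a 2-D array H[frequency bin, time sample]."""
    N = len(imfs[0])
    H = np.zeros((nbins, N - 1))
    fmax = 0.5 / dt                              # Nyquist, cycles per unit time
    for c in imfs:
        z = analytic(c)
        amp = np.abs(z)[:-1]                     # R_i(t)
        freq = np.diff(np.unwrap(np.angle(z))) / (2 * np.pi * dt)
        bins = np.clip((freq / fmax * nbins).astype(int), 0, nbins - 1)
        H[bins, np.arange(N - 1)] += amp
    return H
```

Unlike the spectrogram, no window is chosen: each IMF deposits its amplitude along its own instantaneous-frequency curve, which is the sense in which the Hilbert spectrum is intrinsic.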

3.6.4 Limitations of EMD

Empirical mode decomposition has several drawbacks. For one, it does not necessarily track coherent components whose frequencies change with time. Another drawback, illustrated in Figure 3.21, is that intrinsic mode functions sometimes can be aggregates of multiple frequency oscillations. The beating pattern in the first IMF of the buellershort data happens because the IMFs do not take into account fine information about derivatives at zero crossings. In this case, the IMF contains oscillations of the form cos (β − α)t and cos (β + α)t with similar amplitudes, so that one gets effectively twice the product cos αt cos βt. Such behavior arises in real systems. The shortcoming here is that nothing in the EMD algorithm prevents the envelope itself from having a zero. In terms of physical systems the question becomes one of whether one wishes to represent a coupled oscillator as a single component or as a superposition of multiple components.
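The beat comes from the product-to-sum identity; a quick numerical check (Python, with arbitrary illustrative rates α = 2 for the slow envelope and β = 30 for the fast carrier):

```python
import numpy as np

# Product-to-sum identity behind the beating pattern:
#   cos((beta - alpha) t) + cos((beta + alpha) t) = 2 cos(alpha t) cos(beta t)
alpha, beta = 2.0, 30.0
t = np.linspace(0, 10, 2000)
two_tones = np.cos((beta - alpha) * t) + np.cos((beta + alpha) * t)
beat = 2 * np.cos(alpha * t) * np.cos(beta * t)
print(np.allclose(two_tones, beat))  # the two expressions agree pointwise
```

Since the envelope 2 cos αt itself crosses zero, nothing in the EMD algorithm splits the two tones into separate IMFs, which is exactly the behavior seen in Figure 3.21.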



Fig. 3.19. Bueller data and first IMF.


Fig. 3.20. Sums of the first few IMFs. The sum of the first four captures most of the spectrogram in Figure 3.7.



Fig. 3.21. Beating in first IMF of Bueller signal illustrates one limitation of EMD.

3.7 Appendices to Chapter 3

3.7.1 Code for Gabor compression

function y=gaborcompress(X,P)
% gaborcompress reconstructs an approximation
% of a one-dimensional signal from P percent of its largest
% Gabor coefficients.
% X should be a 1xN row vector.
% N should be at least 32.
% Requires the LTFAT toolbox (pgauss, candual, dgt, idgt).

L=2^(floor(log2(length(X))));  % truncate to dyadic length
X=double(X(1:L));              % floating point
p=(100-P)/100;                 % fraction of coefficients to discard
g=pgauss(L/4,L/16,L/8);        % Gaussian window
a=8;                           % default shift factor
M=L/a;                         % number of channels
dg=candual(g,a,M);             % canonical dual window

tic                            % start clock
[dgtX,Ls]=dgt(X,g,a,M);        % discrete Gabor transform
disp('-----------');
disp('Forward DGT');
toc                            % stop clock
disp('-----------');
s1=size(dgtX,1)                % display coefficient array dimensions
s2=size(dgtX,2)
s=s1*s2

figure;
imagesc(log(1+10*abs(flipud(dgtX(1:floor(s1/2),:)))));

tic
fcsort=sort(abs(dgtX(:)));     % sort Gabor coefficients by magnitude
fcerr=cumsum(fcsort.^2);       % cumulative sum of squares
fcerr=flipud(fcerr);           % decreasing order
fthresh=fcsort(floor(p*s));    % threshold at the p-th quantile
cf_X=dgtX.*(abs(dgtX)>fthresh);% keep only the large coefficients
disp('-----------');
disp('Sorting/thresholding');
toc
disp('-----------');

figure;
imagesc(log(1+10*abs(flipud(cf_X(1:floor(s1/2),:)))));

tic
idgt_X=idgt(cf_X,dg,a,Ls);     % inverse Gabor transform
disp('-----------');
disp('Inverse DGT');
toc
disp('-----------');

figure;
y=real(idgt_X);
subplot(1,2,1);
plot(X);
axis([1 length(X) min(X) max(X)]);
title('original data');
subplot(1,2,2);
plot(y);
axis([1 length(X) min(X) max(X)]);
title('reconstruction from large Gabor coefficients');
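The same keep-the-largest-coefficients scheme can be sketched without LTFAT by substituting the Fourier basis for the Gabor frame (Python; fft_compress is an illustrative stand-in, not a translation of the dgt-based code, and loses the time localization that the Gabor coefficients provide):

```python
import numpy as np

def fft_compress(x, P):
    """Keep the largest P percent of FFT coefficients by magnitude,
    zero the rest, and invert -- the same thresholding scheme as
    gaborcompress, with the Fourier basis in place of a Gabor frame."""
    X = np.fft.fft(x)
    mags = np.sort(np.abs(X))                          # ascending magnitudes
    thresh = mags[int(np.floor((100 - P) / 100 * len(X)))]
    X_kept = X * (np.abs(X) > thresh)                  # keep only large ones
    return np.real(np.fft.ifft(X_kept))

t = np.arange(1024) / 1024
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
y = fft_compress(x, 5)  # reconstruct from the top 5 percent of coefficients
```

For a signal concentrated on a few frequencies, a small percentage of coefficients reconstructs it almost exactly; for the nonstationary bueller data, the Gabor version is the appropriate tool because its coefficients are localized in both time and frequency.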