UNIVERSITY OF MIAMI

TONALITY ESTIMATION USING

WAVELET PACKET ANALYSIS

By

Vaibhav Chhabra

A Research Project

Submitted to the Faculty of the University of Miami

in partial fulfillment of the requirements for

the degree of Master of Science

Coral Gables, Florida

May 2005

UNIVERSITY OF MIAMI

A research project submitted in partial fulfillment of

the requirements for the degree of

Master of Science

TONALITY ESTIMATION USING

WAVELET PACKET ANALYSIS

Vaibhav Chhabra

Approved:

Ken Pohlmann, Professor of Music Engineering

Dr. Edward Asmus, Associate Dean of Graduate Studies

Colby Leider, Assistant Professor of Music Engineering

Dr. Paul Mermelstein, Professor of Electrical Engineering

DEDICATION

They say that one’s experience is what defines an individual. After all, you are

what you are because of your experiences. On that note I would like to dedicate this work

to all those who have contributed to my experience in this journey. For what I have

learned has laid the foundation for what I will learn.

I would also like to thank my family, who have always been supportive of me: my

brother Ruchir, who is a natural send-master; Papa and Ma, thanks for keeping the faith.

All the Chachas, Chachis and cousins, thank you all for the support.

Next on my thank you list are my Tae Kwon Do buddies. Sensei Jeff thanks for

all of your advice; someday I'll be a teacher like you. Rico, training with you was an honor

(congratulations on your black belt). Nat (Ryu), sparring with you was almost like

dancing, I’m sure you’ll be an awesome martial artist. Sensei Chikaco, Sensei Gerard,

Sensei Kat, Lora, Lila, Becky, Erwin thank you all, I’ll miss you guys.

Last but definitely not least are my friends in Miami. Jon (silent warrior), for

teaching me all his DSP jutsus, Vishu (2nd best table-tennis player at UM), Marc (water),

Lindsey (Defense-Master), Rian (Gentoo), Becky (Mini Marina), Jess, Jose, Doug (me

and my beer), Drew (Send-Master), Neri (p3), Tiina (Jo-master), Rob.R, Rob.B (adidas-w),

Kai (thanks for taking care of my knee), Joe Abbati (Rathskellar) ...

Rene Descartes - “I think therefore I exist”

Vaibhav Chhabra (Meno) - “I exist therefore I oscillate”

SEND WITH RESPECT ^_^

ACKNOWLEDGMENT

I would like to especially thank my advisor and mentor, Ken Pohlmann, who has

generously given of his precious time and provided me with several great opportunities

during my time at UM. I would also like to thank my thesis committee for being patient

in spending time with me and guiding me at times when I needed them most. I hope

we keep in touch and have opportunities to work again.

Table of Contents

Chapter 1: Introduction......................................................................................................................................................................1

1.1 Masking.................................................................................................................................................................1

1.1.1 Threshold of Masking...............................................................................................................................2

1.2 Tonality..................................................................................................................................................................3

1.2.1 Common Tonality Classification Methods................................................................................................4

1.2.2 Typical Psychoacoustic Model....................................................................................................................4

1.2.3 Psychoacoustic Model 1...............................................................................................................................5

1.2.4 Psychoacoustic Model 2...............................................................................................................................8

Chapter 2: Signal Representation................................................................................................................................................12

2.1 Periodic Signals...............................................................................................................................................12

2.2 Fourier Series....................................................................................................................................................13

2.3 Fourier Transform...........................................................................................................................................14

2.3.1 Fourier Transform Derivation...................................................................................................................15

2.3.2 Dirac Delta Function...................................................................................................................................17

2.3.3 Fourier Coefficients.....................................................................................................................................18

2.3.4 Fourier Coefficients Derivation................................................................................................................19

2.4 Hilbert Transform...........................................................................................................................................22

2.4.1 Analytic Signal.............................................................................................................................................22

2.4.2 Hilbert Transform Theory....................................................................................................................23

2.4.3 Phase Rotation..........................................................................................................................................24

2.4.4 Complex Envelope......................................................................................................................................26

2.4.5 Advantages of the Complex Envelope....................................................................................................27

2.5 Summary.............................................................................................................................................................30

Chapter 3: Time to Frequency Mapping.................................................................................................................................31

3.1 Quadrature Mirror Filter (QMF)...............................................................................................................31

3.2 Aliasing and Imaging.....................................................................................................................................32

3.3 Distortion Transfer Function.......................................................................................................................34

3.4 Polyphase Decomposition............................................................................................................................34

3.4.1 Perfect Reconstruction................................................................................................................................35

3.5 Paraunitary Property......................................................................................................................................36

3.5.1 Unitary Matrix..............................................................................................................................................36

3.6 Summary.............................................................................................................................................................37

3.6.1 Advantages of Paraunitary Filter Banks..................................................................................................37

Chapter 4: Short-Time Fourier Transform (STFT)............................................................................................................38

4.1 Analysis of the STFT equation..................................................................................................................39

4.2 STFT as a Bank of Filters............................................................................................................................40

4.3 Effects of Windowing....................................................................................................................................41

4.3.1 Choice of the Best Window.......................................................................................................................42

4.4 Summary.............................................................................................................................................................43

Chapter 5: The Wavelet Transform..........................................................................................................................................44

5.1 Weakness of the STFT..................................................................................................................................44

5.2 STFT to Wavelets...........................................................................................................................................45

5.2.1 Modifications on the STFT........................................................................................................................46

5.3 Inverse Wavelet Transform.........................................................................................................................48

5.4 Orthonormal Basis..........................................................................................................................................49

5.5 Wavelet Packet Analysis..............................................................................................................................50

5.5.1 Discrete Wavelet Transform......................................................................................................................51

5.6 Wavelet Packet Tree Representation.......................................................................................................52

5.6.1 Energy Representation................................................................................................................................52

5.6.2 Index Representation...................................................................................................................................53

5.6.3 Filterbank Representation..........................................................................................................................54

Chapter 6: Analysis and Results..................................................................................................................................................56

6.1 Detection Scheme...........................................................................................................................................57

6.1.1 Frequency Breakdown................................................................................................................................59

6.1.2 Detector Pseudocode Methodology.........................................................................................................60

6.1.3 Detection Process.........................................................................................................................................61

6.2 Node Reconstruction......................................................................................................................................63

6.3 Tonality Estimation........................................................................................................................................64

6.3.1 Auto-Correlation Function.........................................................................................................................65

6.3.2 Auto-Covariance..........................................................................................................................................67

6.3.3 Type-I Analysis............................................................................................................................................68

6.3.4 Type-II Analysis..........................................................................................................................................69

6.4 Tonality Index (Time-Domain)..................................................................................................................73

6.5 Tonality Index (Frequency-Domain).......................................................................................................75

6.5.1 Comparison with Model 2..........................................................................................................................76

Chapter 7: Conclusions and Recommendations.....................................................................................................................81

References............................................................................................................................................................................................83

Appendix...............................................................................................................................................................................................85

List of Figures:

Figure 1.1: General block diagram of a perceptual coder........................................................................................3

Figure 1.2: General block diagram of a psychoacoustic model............................................................................5

Figure 1.3: Tonal components identified in Model 1..................................................................................................6

Figure 1.4: Maskers Decimation in Model 1...................................................................................................................7

Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1.............................................................................8

Figure 1.6: Example of a predicted masking threshold for a masker...............................................................10

Figure 1.7: General block diagram of Model 2............................................................................................................10

Figure 2.1: A continuous-time signal.................................................................................................................................12

Figure 2.2: Continuous-time sinusoidal signal.............................................................................................................13

Figure 2.3: Discrete-time unit impulse (sample).........................................................................................................17

Figure 2.4: The Dirac Delta Function................................................................................................................................18

Figure 2.5: Dirac Delta in Time-Domain.........................................................................................................................18

Figure 2.6: Dirac Delta in Frequency-Domain.............................................................................................................18

Figure 2.7: Frequency Response of Rectangular Pulse............................................................................................19

Figure 2.8: Periodic Square Wave.......................................................................................................................................20

Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave...............................................................21

Figure 2.10: Cosine Wave Properties.................................................................................................................................24

Figure 2.11: Sine Wave Properties......................................................................................................................................24

Figure 2.12: Rotating Phasors to create a sine wave out of a cosine................................................................25

Figure 2.13: Hilbert Transform shifts the phase of positive frequencies by -90° and negative

frequencies by +90°......................................................................................................................................................................26

Figure 2.14: Spectral Properties of the Complex Exponential............................................................................26

Figure 2.15: Spectral Properties of s(t).....................................................................................................................28

Figure 2.16: The Modulated Signal and its Envelope...............................................................................................28

Figure 2.17: Frequency Domain Representation of Complex Envelope and Analytic Signal..........30

Figure 3.1: QMF filter-bank....................................................................................................................................................31

Figure 3.2: Aliasing......................................................................................................................................................................32

Figure 4.1: FFT Block Diagram............................................................................................................................................38

Figure 4.2: STFT Represented in terms of a Linear System.................................................................................40

Figure 4.3: Rearranged STFT Representation in terms of a Linear System................................................41

Figure 4.4: STFT viewed as a Filter-Bank......................................................................................................................41

Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples...........................................................42

Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed

function v(t)......................................................................................................................................................................................44

Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b)................47

Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform [Robi, P., "The Story of Wavelets", Rowan University].......................................................................................48

Figure 5.4: 3-level Wavelet decomposition tree..........................................................................................................51

Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of

discrete wavelet transform..............................................................................................................52

Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal.........................................................................................53

Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal.............................................................................................54

Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....54

Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....55

Figure 5.10: Discrete wavelet packet tree (analysis stage)....................................................................................55

Figure 6.1: General block diagram of the proposed model...................................................................................57

Figure 6.2: level-1 Wavelet Packet Decomposition of a signal having multiple tones (4kHz,

10kHz, 15kHz)................................................................................................................................................................................58

Figure 6.3: level-3 Wavelet Packet Decomposition of multiple tones (4kHz, 10kHz, 15kHz)........58

Figure 6.4: level-3 Wavelet Packet Index Tree and the Coefficients of the Terminal Nodes...........60

Figure 6.5: level-2 Wavelet Packet Energy Tree Detector Code Pointers....................................................60

Figure 6.6: level-2 Wavelet Index Tree used to trace the Nodes that are sent to the tonality

analyzer...............................................................................................................................................................................................61

Figure 6.7: level-2 Wavelet Packet Energy Tree Detector Stage-I: nodes (4), (5) and (6) are

analyzed first....................................................................................................................................................................................62

Figure 6.8: level-4 Wavelet Packet Energy Tree Detector Stage-II: nodes (4), (5) and (6) are

analyzed first; green lines represent the nodes that are going to be analyzed by the tonality

analyzer...............................................................................................................................................................................................62

Figure 6.9: level-4 Wavelet Packet Energy Tree Detector Stage-III: node (3) is analyzed; green

lines represent the nodes that are going to be analyzed by the tonality analyzer......................................63

Figure 6.10: Wavelet Energy Tree: The white arrows show the two nodes used to calculate our

tonality.................................................................................................................................................................................................64

Figure 6.11: Auto-correlation Function of a Pure Tone..........................................................................................65

Figure 6.12: Auto-correlation Function of White Noise.........................................................................................66

Figure 6.13: Auto-correlation Function of Band limited Noise (0-22kHz)..................................................66

Figure 6.14: Energy Tree, where the blue lines represent the nodes from which the tonality value

is calculated.......................................................................................................................................................................................68

Figure 6.15: A 4kHz tone with selected path (red arrows) and nodes used to calculate tonality

value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]......68

Figure 6.16: A 4kHz tone with -0.9dB white-noise added, selected path (red arrows) and nodes

used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-

covariance [right figure]............................................................................................................................................................69

Figure 6.17: A Snare Crash......................................................................................................................................................70

Figure 6.18a: Auto-Covariance of White-Noise..........................................................................................................70

Figure 6.18b: Auto-Covariance of Band-limited 0-22kHz Noise......................................................................71

Figure 6.18c: Auto-Covariance of Pure-Tone (1kHz)..............................................................................................71

Figure 6.19: A Snare crash analysis: (a) Wavelet Tree, (b) Auto-Covariance..............................................72

Figure 6.20: Snare Crash (Last Frame) Auto-Covariance......................................................................................72

Figure 6.21: Tonality Index (Time-Domain) with Input Signal consisting of 1kHz tone then

Bandlimited Noise (0-22kHz) of power -20dB............................................................................................................73

Figure 6.22: Time-Domain plot of test signal (1kHz tone then Bandlimited Noise (0-22kHz) of

power -20dB)...................................................................................................................................................................................74

Figure 6.23: Tonality Index (Time-Domain) with Input Signal consisting of white noise (power -

20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)....................74

Figure 6.24: Time-Domain plot of test signal of white noise (power -20dB) followed by a 1kHz

tone and then Bandlimited Noise (0-22kHz; power -0.9dB)................................................................................75

Figure 6.25: Frequency Map of Wavelet Tree: The red arrows represent the generated path which

consists of an array of nodes from which the last node values are taken (blue lines) to map................76

Figure 6.26a: Tonality Index – Model 2 (1kHz)..........................................................................................................77

Figure 6.26b: Tonality Index – Proposed Model (1kHz)........................................................................................77

Figure 6.27a: Tonality Index – Model 2 (4kHz)..........................................................................................................78

Figure 6.27b: Tonality Index – Proposed Model (4kHz)........................................................................................79

Figure 6.28a: Tonality Index – Model 2 (6kHz) .........................................................................................................79

Figure 6.28b: Tonality Index – Proposed Model (6kHz)........................................................................................80

CHHABRA, VAIBHAV (M.S., Music Engineering Technology)

Tonality Estimation using Wavelet Packet Analysis (May 2005)

Abstract of a Master's Research Project at the University of Miami. Research project supervised by Professor Ken Pohlmann. No. of pages in text: 124

Abstract: Perceptual audio coding is a novel approach to compressing audio that takes

advantage of models of the human auditory system, also known as psychoacoustic models.

The quality and efficiency of the encoding process depend heavily on how accurately

these models characterize the nature of the audio signal, in particular its tonality attributes.

This paper explores various analysis techniques using wavelet packet tree decomposition

to accurately estimate tonality by exploiting energy and statistical information. More

specifically, the tonality estimation is based on the correlation information of the nodes

and uses the Haar (Daubechies 1) wavelet to decompose the signal.
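The decomposition the abstract describes can be illustrated with a minimal sketch. This is not the thesis's implementation: the helper names (`haar_step`, `wavelet_packet_energies`) and the use of NumPy are assumptions made for illustration. A full wavelet packet decomposition splits both the low-pass and high-pass outputs at every level, and with the Haar (Daubechies 1) wavelet each split is a simple normalized sum/difference of adjacent samples:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: split a signal into an approximation
    (low-pass) half and a detail (high-pass) half, downsampled by 2."""
    x = np.asarray(x, dtype=float)
    s2 = np.sqrt(2.0)
    approx = (x[0::2] + x[1::2]) / s2
    detail = (x[0::2] - x[1::2]) / s2
    return approx, detail

def wavelet_packet_energies(x, depth):
    """Full wavelet packet decomposition: unlike the plain DWT, BOTH
    the low- and high-pass outputs are split at every level. Returns
    the energy of each terminal node (lowest frequency path first)."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_step(node)]
    return [float(np.sum(n ** 2)) for n in nodes]

# A pure tone concentrates its energy in a few terminal nodes,
# while noise spreads its energy across all of them.
t = np.arange(1024) / 44100.0
tone = np.sin(2 * np.pi * 1000 * t)
energies = wavelet_packet_energies(tone, depth=3)
```

Because the Haar transform is orthonormal, the terminal-node energies sum to the total energy of the input; the concentration (or spread) of that energy across nodes is the kind of information the tonality estimator exploits.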

Chapter 1: Introduction

In recent years, several advancements have been made in the field of audio

coding. One must realize that no matter how many advancements we make, the ultimate

receiver of the analysis-coding-transmission-decoding-synthesis chain is the human

auditory system. In fact, all perceptual audio coders, or lossy audio coders, rely solely on

the exploitation of this system. A model based on this system, also known as the

psychoacoustic model, exploits properties and tolerances of the human auditory system to

remove irrelevant components of the audio signal, that is, those components that do not

contribute to the auditory impression of the acoustic stimulus. Thus, these irrelevant

components of the audio signal may be removed at the initial stages of the signal

communication chain (analysis/coding), freeing information capacity which can

be used to code relevant audio components. This operation is called irrelevancy reduction

and is based on the concept of masking.

1.1 Masking

Masking refers to the total and relative inaudibility of one sound component due

to the presence of another one [Ferreira, A.J.S, 1995], with particular relation to

amplitude, frequency, time [Zwicker, E, Fastl, H, 1990] and space [Blauert, J, 1993].

There are two types of masking: frequency masking (simultaneous masking) and

temporal masking. Frequency masking can be viewed as excitation in the

cochlea's basilar membrane that prevents the detection of a weaker sound exciting

the same region of the basilar membrane, whereas temporal masking takes place even

when the masker and maskee are not simultaneously present. One might think of it as the auditory path delay


between the auditory neurons and the brain, or as the time taken by the individual to give meaning to the auditory information.

Among the aspects of masking usually considered, simultaneous masking is by far the most important source of irrelevancy reduction. It is for this reason that most perceptual audio coders involve a time-to-frequency-domain mapping in the form of a sub-band or transform filterbank.

1.1.1 Threshold of Masking

In the context of frequency domain audio coders, the masker is the input audio

signal, containing coherent (tone-like) or incoherent (noise-like) components, and the

maskee is the quantization noise. It should be noted that the ultimate goal of a perceptual

coder is to generate a good estimate for the profile of the quantization noise that does not

cause noticeable impairments when actually added to the original signal [Ferreira, A.J.S,

1995]. In other words, the noise profile, also called Threshold of Masking [Johnston,

J.D,1988], should be optimally shaped in frequency, time and space in such a way that

the quantization noise can be efficiently masked. Several studies [Moore, C.J.B,

1982][Hellman, R.P, Harvard University] reveal that the threshold of masking may vary

substantially as a function of the noise-like or tone-like nature of the audio signal. As a consequence, this aspect has a significant influence on the quality and efficiency of the audio encoding process [Ferreira, A.J.S, 1995]. The general structure of a perceptual coder is shown in Figure 1.1.


Figure 1.1: General block diagram of a perceptual coder (Bosi, M., Goldberg, R., Introduction to Digital Audio Coding and Standards)

1.2 Tonality

One of the key components of the psychoacoustic model is the calculation of tonality, whose values are used to compute the signal-to-mask ratio, which in turn determines the absolute masking threshold of the input signal. Different masking values have been reported in [Hellman, R.P, Harvard University 02138] for tone-masking-noise versus noise-masking-tone. From [Hellman, R.P, Harvard University 02138] and [Zwicker, E, Fastl, H, 1990] it is clear that a narrow band of noise masks a tone much more effectively than a tone masks noise. In fact, the masking effects of a tone and of noise of equal intensity can differ by as much as 20 dB. It is interesting to note that bandlimited noise with constant SPL and varying bandwidth flattens the masking function, whereas increasing the SPL while keeping the bandwidth constant narrows it [Hellman, R.P, Harvard University 02138]. In particular, a signal can be quantized using more or fewer bits according to its tonality properties, which underscores the importance of accurately estimating tonality, leading to improved bit allocation.



1.2.1 Common Tonality Classification Methods

Tonality in most audio coders is generally evaluated by taking a short segment of audio samples (e.g., 512 or 1024 samples) and performing a spectral analysis, for example with an FFT (fast Fourier transform). The power and phase evolution of each spectral component are then examined, making it possible to infer the tonal behavior of the signal in different regions of the spectrum. An average tonality measure for the whole analyzed segment can be computed using the Spectral Flatness Measure (SFM), defined as the ratio of the geometric mean of the power spectrum to its arithmetic mean [Ferreira, A.J.S, 1995]. Once calculated, its value is converted to dB with the reference SFMdBmax = -60 dB as an estimate for tone-like signals. The SFMdB is finally converted to a tonality coefficient α whose values range over [0,1]: lower values indicate globally noise-like behavior and higher values globally tone-like behavior. This particular method is used in psychoacoustic model 2 of the MPEG audio standard to classify tonality.
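The SFM-based scheme can be sketched as follows (an illustrative numpy implementation, not the reference MPEG code; the power floor guarding log(0) is an added assumption):

```python
import numpy as np

def tonality_coefficient(frame):
    """SFM-based tonality: ratio of geometric to arithmetic mean of the
    power spectrum, in dB against SFMdB_max = -60 dB, clipped to [0, 1]."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    power = np.maximum(power, 1e-12)       # floor to avoid log(0) (an assumption)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    sfm_db = 10.0 * np.log10(geometric / arithmetic)   # always <= 0 dB
    return min(sfm_db / -60.0, 1.0)        # 1 -> tone-like, 0 -> noise-like

rng = np.random.default_rng(0)
n = np.arange(1024)
tone = np.sin(2 * np.pi * 100 * n / 1024)  # a pure tone on an FFT bin
noise = rng.standard_normal(1024)
assert tonality_coefficient(tone) > tonality_coefficient(noise)
```

A pure tone drives the geometric mean far below the arithmetic mean (α near 1), while white noise yields a nearly flat spectrum (α near 0).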

1.2.2 Typical Psychoacoustic Model

A typical psychoacoustic model begins by modeling a cochlear filter, which represents the energy or phase information as processed by the ear. This is accomplished by applying a spreading function, a function that models the spread of the masking curves and thus the energy excitation along the basilar membrane. This information is then passed to the tonality estimator, which determines the relevant and irrelevant components of the signal and aids the estimation of the masking threshold, which eventually yields the absolute threshold, as shown in Figure 1.2.


Figure 1.2: General block diagram of a psychoacoustic model (Johnston, J.D)

1.2.3 Psychoacoustic Model-1

The psychoacoustic model 1 of the MPEG standard applies an FFT to the input data, which is windowed using a Hanning window of length 512 samples for layer I and 1024 samples for layers II and III. Adjacent frames overlap by N/16 samples, where N is the number of samples in a frame. After applying the FFT, the signal level of each spectral line k is calculated as

L_k = 96 dB + 10 log10( (4/N^2) |X[k]|^2 (8/3) ),   for k = 0, ..., N/2 - 1    (1.1)

where the 1/N^2 factor comes from Parseval's theorem and a factor of 2 takes into account that only the positive-frequency components of the spectrum are considered; the other factor of 2 deals with the scaling of the amplitude of the spectral components from 1/2 to 1, and the 8/3 factor compensates the gain reduction of the Hanning window [Bosi, M., Goldberg, R., 2003].
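A minimal sketch of eq. (1.1), assuming numpy and a frame already scaled to full scale (the small power floor is an added safeguard, and the ~2 dB tolerance reflects the Hanning window's coherent gain):

```python
import numpy as np

def spectral_line_spl(frame):
    """Per-line signal level of eq. (1.1) for a Hanning-windowed frame:
    L_k = 96 dB + 10*log10((4/N^2) * |X[k]|^2 * (8/3))."""
    N = len(frame)
    X = np.fft.rfft(frame * np.hanning(N))[: N // 2]
    power = np.maximum(np.abs(X) ** 2, 1e-20)     # floor (an added safeguard)
    return 96.0 + 10.0 * np.log10((4.0 / N**2) * power * (8.0 / 3.0))

# A full-scale sinusoid on an exact FFT bin peaks near the 96 dB reference
# (within a couple of dB, because of the window's coherent gain).
n = np.arange(512)
spl = spectral_line_spl(np.sin(2 * np.pi * 64 * n / 512))
assert abs(spl.max() - 96.0) < 3.0
assert spl.argmax() == 64
```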

Once the spectral components are calculated, the sound pressure level L_sb[m] in each sub-band m is computed, corresponding to the maximum-amplitude FFT spectral line. The FFT spectral line is chosen such that it corresponds to the maximum scale factor (scf):

L_sb[m] = max{ L_k,max[m], 20 log10( scf_max[m] * 32768 ) - 10 dB }    (1.2)

Having calculated the sound pressure level, we next compute the masking threshold in order to calculate the signal-to-mask ratio (SMR), which leads us to the tonality estimation



process, where the model identifies peaks that have 7 dB more energy than their neighboring spectral lines as tonal components [MPEG Standard, ISO11172-3]:

L_k - L_{k+j} >= 7 dB    (1.3)

where j is an index offset whose range varies with center frequency.

This is based on the assumption that a local maximum within a critical band represents a tonal component, as shown in Figure 1.3.

Figure 1.3: Tonal components identified in Model 1 (Fabien A. P. Petitcolas, University of Cambridge, England)
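The peak-picking rule of eq. (1.3) can be sketched as below; the fixed neighbour offsets j = ±2 are a simplification for illustration, since the standard widens the offset range with center frequency:

```python
import numpy as np

def find_tonal_peaks(spl):
    """Flag spectral lines that are local maxima and exceed their
    neighbours at offsets j by at least 7 dB (cf. eq. 1.3).
    Offsets j in {-2, +2} only, for brevity."""
    tonal = []
    for k in range(3, len(spl) - 3):
        if spl[k] > spl[k - 1] and spl[k] >= spl[k + 1]:
            if all(spl[k] - spl[k + j] >= 7.0 for j in (-2, 2)):
                tonal.append(k)
    return tonal

spl = np.full(64, 40.0)
spl[20] = 70.0          # isolated strong line -> tonal
spl[40:43] = 65.0       # broad plateau -> not a 7 dB local peak
assert find_tonal_peaks(spl) == [20]
```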

If L_k represents a tonal component, then the adjacent spectral components centered at k are added to it to define a tonal masker L_T; the remaining components are summed to give the noise maskers L_N. Based on this information, the spread of the masking curves is defined by applying the spreading function [MPEG Standard, ISO11172-3].

Having defined the tonal and non-tonal components, the number of maskers is reduced prior to computing the global masking threshold by eliminating maskers whose levels are below the threshold in quiet. Also, maskers extremely close to


stronger maskers are eliminated: if two or more components are separated in frequency by less than 0.5 Bark, only the component with the highest power is retained [Johnston, J.D, Brandenburg, K, 1990; MPEG Standard, ISO11172-3; Bosi, M., Goldberg, R., 2003].

Figure 1.4: Masker decimation in Model 1 (Fabien A. P. Petitcolas, University of Cambridge, England)
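The 0.5-Bark decimation rule can be sketched as follows; Zwicker's arctangent approximation of the Bark scale is used here, which is an assumption, not the table-driven mapping of the standard:

```python
import numpy as np

def bark(f_hz):
    """Zwicker's approximation of critical-band rate (Bark)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def decimate_maskers(maskers):
    """Keep only the strongest of any maskers closer than 0.5 Bark.
    `maskers` is a list of (frequency_hz, level_db) pairs."""
    kept = []
    for f, lvl in sorted(maskers):
        if kept and bark(f) - bark(kept[-1][0]) < 0.5:
            if lvl > kept[-1][1]:        # replace the weaker close neighbour
                kept[-1] = (f, lvl)
        else:
            kept.append((f, lvl))
    return kept

maskers = [(1000.0, 60.0), (1020.0, 52.0), (3000.0, 55.0)]
# 1000 Hz and 1020 Hz are well under 0.5 Bark apart -> the weaker one is dropped.
assert decimate_maskers(maskers) == [(1000.0, 60.0), (3000.0, 55.0)]
```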

Based on this information, the individual masking thresholds are calculated and summed, along with the power of the threshold in quiet, to give the global masking threshold, which leads to the calculation of the SMR in each sub-band. This is done by taking the difference between the maximum sound pressure level and the minimum global masking level of that sub-band. A general block diagram of the whole process is shown in Figure 1.5.


Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1 (MPEG Standard ISO11172-3)

1.2.4 Psychoacoustic Model-2

The psychoacoustic model 2 of the MPEG standard also applies an FFT to a Hann-windowed input block, of size 1024 for all layers; however, for layers II and III the model computes two FFTs per frame. The model uses the output of the FFT analysis to calculate the masking curves and the associated signal-to-mask ratios for the coder sub-bands [Bosi, M., Goldberg, R., 2003; MPEG Standard, ISO11172-3].

For the SPL calculation, the model groups frequency lines into "threshold calculation partitions" whose widths are roughly 1/3 of a critical band. For a sampling rate of 44.1 kHz, a single masker SPL is derived by summing the energies in each partition [MPEG ISO11172-3, Annex D, Table 3-D.3b]. The total masking energy of the input audio frame is then calculated by convolving a spreading function with each of the maskers in the signal, which leads us to the tonality calculation.



The tonality index in this model revolves around the core concept of how predictable the signal is from the prior two frames [Brandenburg, K, Johnston, J.D 1990]. For each frame m and each frequency line k, the signal amplitude A_m[k] and phase θ_m[k] are predicted by linear extrapolation from the prior values as follows [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3]:

A'_m[k] = A_{m-1}[k] + (A_{m-1}[k] - A_{m-2}[k])
θ'_m[k] = θ_{m-1}[k] + (θ_{m-1}[k] - θ_{m-2}[k])    (1.4)

where A'_m[k] and θ'_m[k] represent the predicted values. These values are then mapped into an "unpredictability measure" defined as:

C_m[k] = sqrt( (A_m[k] cos θ_m[k] - A'_m[k] cos θ'_m[k])^2 + (A_m[k] sin θ_m[k] - A'_m[k] sin θ'_m[k])^2 ) / ( A_m[k] + |A'_m[k]| )    (1.5)

where C_m[k] equals zero when the current value is exactly predicted and approaches one when the power of either the predicted or the actual signal is dramatically higher than in the previous frames [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3].
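Equations (1.4) and (1.5) can be sketched as below (illustrative numpy code; the frame length, FFT size and test signal are arbitrary choices):

```python
import numpy as np

def unpredictability(frames):
    """Unpredictability (chaos) measure c_m[k] of eqs. (1.4)-(1.5): predict
    amplitude and phase of each line by linear extrapolation from the two
    previous frames and compare with the actual complex spectral value."""
    X = np.array([np.fft.rfft(f) for f in frames])
    A, th = np.abs(X), np.angle(X)
    A_pred = A[-2] + (A[-2] - A[-3])                 # eq. (1.4)
    th_pred = th[-2] + (th[-2] - th[-3])
    actual = A[-1] * np.exp(1j * th[-1])
    predicted = A_pred * np.exp(1j * th_pred)
    denom = np.maximum(A[-1] + np.abs(A_pred), 1e-12)
    return np.abs(actual - predicted) / denom        # eq. (1.5)

# Three frames of a steady sinusoid: its spectral line is fully predictable.
n = np.arange(256)
frames = [np.cos(2 * np.pi * 32 * (n + m * 256) / 256) for m in range(3)]
c = unpredictability(frames)
assert c[32] < 1e-6
```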

This unpredictability measure is then weighted by the energy in each partition, giving the partitioned unpredictability measure, which is then convolved with the spreading function. The result of this convolution is normalized by the coefficient norm_b derived from the spreading function

norm_b = 1 / ( Σ_{bb=0}^{bb_max} sprdngf(bval_bb, bval_b) )    (1.6)

and then mapped onto a tonality index, a function of the partition number, whose values vary from zero to one.

t_b = -0.299 - 0.43 log_e(c_b)    (1.7)


The tonality index is then used to calculate the masking down-shift Δ(z) in dB, which depends on the tonal character determined by the tonality index t_b. This down-shift value is different for tonal and non-tonal signals, as shown in Figure 1.6:

Δ_tone-masking-noise = 14.5 + z dB   (z is the Bark value, 0-24)    (1.8)
Δ_noise-masking-tone = C dB   (C varies from 3 to 6 dB)
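A sketch of how eqs. (1.7) and (1.8) combine; the clipping of t_b to [0,1] and the 5.5 dB value chosen for C are assumptions within the ranges stated above:

```python
import numpy as np

def masking_downshift(cb, z_bark, c_db=5.5):
    """Down-shift Δ(z) in dB: the tonality index t_b of eq. (1.7) interpolates
    between the tone-masking-noise (14.5 + z dB) and noise-masking-tone
    (C dB) cases of eq. (1.8)."""
    tb = np.clip(-0.299 - 0.43 * np.log(cb), 0.0, 1.0)    # eq. (1.7), clipped
    return tb * (14.5 + z_bark) + (1.0 - tb) * c_db       # eq. (1.8)

# Highly predictable partition (cb -> 0): tonal down-shift of 14.5 + z dB.
assert abs(masking_downshift(1e-3, z_bark=10.0) - 24.5) < 1e-9
# Unpredictable partition (cb = 1): noise down-shift of C dB.
assert abs(masking_downshift(1.0, z_bark=10.0) - 5.5) < 1e-9
```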

Figure 1.6: Example of a predicted masking threshold for a masker (Bosi, M., Goldberg, R., Introduction to Digital Audio Coding and Standards)

Once the global masking threshold is calculated, it is used to compute the SMR values in each partition: the threshold is compared with the threshold in quiet and the maximum of the two is taken. A general block diagram of the whole process is shown in Figure 1.7.

Figure 1.7: General block diagram of Model 2 (Bosi, M., Goldberg, R., Introduction to Digital Audio Coding and Standards)



In general, these two psychoacoustic models are very similar in their masking-threshold calculations but differ in their tonality classification schemes. The basic problem that arises in the classification of tonality in current perceptual models lies in the analysis tool used to examine the frequency content of the incoming audio segment, namely the Fourier transform. Its design necessitates a trade-off between time-domain and frequency-domain resolution: the higher the frequency resolution, the more spectral components are available and the more accurately the masking function can be estimated; on the other hand, higher spectral resolution yields lower time resolution. A solution to this problem is to replace the analysis tool with one that adapts more flexibly to the signal's coherent state. These requirements are met by a relative of the short-time Fourier transform known as the wavelet transform.

The purpose of this thesis is to explore tonality estimation using wavelet packet analysis (the wavelet transform applied in a tree structure) based on the coherence and energy distribution of the input audio segment, which could eventually be used in a psychoacoustic model to increase coding efficiency and quality. The first theory section, Chapters 2 and 3, develops relevant theory used in the later chapters. The second section, Chapters 4 and 5, discusses Fourier analysis and introduces wavelet theory, leading to Chapters 6 and 7, which focus on the tone detector, the tonality analyzer, and experiments that characterize their performance.


Chapter 2: Signal Representation

To understand the basic concept of Fourier analysis, we must understand how it is used to represent a signal. A periodic signal oscillates with a period T and frequency f. We proceed to our first step by analyzing complex exponentials and sinusoidal signals and seeing how the two are related. Once this gap has been bridged, we can draw conclusions on how the Fourier series leads to the Fourier transform and state certain interesting characteristics of the Fourier transform.

2.1 Periodic Signals

A signal can be classified as periodic or aperiodic. A periodic continuous-time signal x(t) has the property that there is a positive value of T for which [Oppenheim, A.V & Willsky, A.S, 1996]

x(t) = x(t + T)    (2.1)

In other words, if the signal were shifted by T it would repeat itself, with fundamental period T, as shown in Figure 2.1.

Figure 2.1: A continuous-time signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

It is this property that complex exponentials share, specifically x(t) = e^{jω0 t}. This can be easily shown when we equate the above equation with


x(t + T) = e^{jω0 (t + T)}    (2.2)

e^{jω0 t} e^{jω0 T} = e^{jω0 t}

e^{jω0 T} = 1    (2.3)

Based on this result, we can conclude that a complex exponential is periodic for any value of T if ω0 = 0; if ω0 ≠ 0, it has fundamental period T0 (the smallest positive such value) equal to 2π/ω0. Similarly, the sinusoidal signal x(t) = A cos(ω0 t + φ) is periodic with fundamental period T0 = 2π/ω0, as shown in Figure 2.2.

Figure 2.2: Continuous-time sinusoidal signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

2.2 Fourier Series

Sinusoidal waves and complex exponentials are periodic signals with fundamental period T0 = 2π/ω0 and fundamental frequency ω0 = 2π/T0. We can therefore extend our expression of the complex exponential by associating the signal with a set of harmonically related exponentials:


φ_k(t) = e^{jkω0 t} = e^{jk(2π/T) t},   k = 0, ±1, ±2, ...    (2.4)

Each of these signals has a frequency that is an integer multiple of the fundamental frequency ω0 and is hence periodic as well. When k = +1 and k = -1, both signals have a fundamental frequency equal to ω0 and are collectively referred to as the first harmonic components. Similarly, k = +2 and k = -2 give the second harmonic components. More generally, the components for k = +N and k = -N are referred to as the Nth harmonic components. A periodic signal can be represented as a linear combination of harmonically related complex exponentials of the form:

x(t) = Σ_{n=-∞}^{+∞} C_n e^{jnω0 t} = Σ_{n=-∞}^{+∞} C_n e^{jn(2π/T) t}    (2.5)

This representation is also known as the Fourier series representation. Note that the complex exponential can be written in terms of sines and cosines, x(t) = e^{jω0 t} = cos(ω0 t) + j sin(ω0 t), therefore making the Fourier series:

x(t) = Σ_{n=-∞}^{+∞} A_n cos(nω0 t) + j Σ_{n=-∞}^{+∞} B_n sin(nω0 t)    (2.6)

where A_n are the coefficients of the cosines and B_n the coefficients of the sines, with C_n = A_n + jB_n and C_{-n} = A_n - jB_n.

2.3 Fourier Transform

Before we derive the Fourier transform from the Fourier series, let's understand what a transform is and why we need it. A transform is a mathematical operation that takes a function or sequence and maps it into another one. In our case, the Fourier transform maps a time-domain function or sequence into the frequency domain. Transforms are useful: they may give us additional or hidden information about the


original function; transformed equations are often easier to solve than the originals; and transforms may require less storage space and hence serve for data compression or reduction. Operations such as convolution are also easier to apply to a transformed function than to the original.

Fourier said, "An arbitrary function, continuous or with discontinuities, defined in a finite interval by an arbitrarily capricious graph can always be expressed as a sum of sinusoids." This is what the Fourier series expresses, and it is by manipulating this series that we can derive the Fourier transform.

2.3.1 Fourier-Transform Derivation

This can be shown by multiplying both sides of eq. (2.5) with e^{-jnω0 t} to obtain [Oppenheim, A.V & Willsky, A.S, 1996]

x(t) e^{-jnω0 t} = Σ_{k=-∞}^{+∞} C_k e^{jkω0 t} e^{-jnω0 t}    (2.7)

Integrating both sides from 0 to T = 2π/ω0, we have

∫_0^T x(t) e^{-jnω0 t} dt = ∫_0^T Σ_{k=-∞}^{+∞} C_k e^{j(k-n)ω0 t} dt

Here, T is the fundamental period of x(t), and consequently, we are integrating over one

period. Now interchanging the order of integration and summation yields:

∫_0^T x(t) e^{-jnω0 t} dt = Σ_{k=-∞}^{+∞} C_k [ ∫_0^T e^{j(k-n)ω0 t} dt ]    (2.8)

The evaluation of the bracketed integral is straightforward. Rewriting this integral using

Euler’s formula, we get:


∫_0^T e^{j(k-n)ω0 t} dt = ∫_0^T cos((k-n)ω0 t) dt + j ∫_0^T sin((k-n)ω0 t) dt    (2.9)

Since each integral may be viewed as measuring the total area under the function over an interval of length T, we see that for k ≠ n both integrals on the right-hand side of eq. (2.9) are zero, while for k = n the integrand on the left-hand side of eq. (2.9) equals 1 and the integral equals T. We therefore have:

∫_0^T e^{j(k-n)ω0 t} dt = { T,  k = n;  0,  k ≠ n }

and consequently, the right-hand side of eq. (2.8) reduces to T C_n, giving:

and consequently, the right-hand side of eq (2.8) reduces to Cn giving:

C_n = (1/T) ∫_0^T x(t) e^{-jnω0 t} dt    (2.10)

Note that this equation looks very similar to the Fourier transform:

F(ω) = ∫_{-∞}^{+∞} f(t) e^{-jωt} dt    (2.11)

Here, we have written an equivalent expression for the Fourier series in terms of the fundamental frequency ω0 and the fundamental period T. Equation (2.5) is referred to as the synthesis equation and eq. (2.10) as the analysis equation. The set of coefficients C_n are often called the Fourier series coefficients or the spectral coefficients of x(t). These complex coefficients measure the portion of the signal x(t) at each harmonic of the fundamental component. It is interesting to note that when n = 0, eq. (2.10) becomes:

C_0 = (1/T) ∫_0^T x(t) dt    (2.12)


This is simply the average value of x(t) over one period.
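Equation (2.10) can be checked numerically with a Riemann sum; the sample count and the test signal cos(ω0 t), whose only nonzero coefficients are C_{±1} = 1/2, are arbitrary illustrative choices:

```python
import numpy as np

def fourier_coefficient(x, T, n, samples=4096):
    """Riemann-sum approximation of eq. (2.10):
    C_n = (1/T) * integral over one period of x(t) * exp(-j n ω0 t) dt."""
    t = np.linspace(0.0, T, samples, endpoint=False)
    w0 = 2.0 * np.pi / T
    return np.mean(x(t) * np.exp(-1j * n * w0 * t))

# For x(t) = cos(ω0 t): C_1 = C_-1 = 1/2, C_0 = 0 (eq. 2.12), others zero.
T = 2.0
x = lambda t: np.cos(2.0 * np.pi * t / T)
assert abs(fourier_coefficient(x, T, 1) - 0.5) < 1e-6
assert abs(fourier_coefficient(x, T, 0)) < 1e-6
assert abs(fourier_coefficient(x, T, 3)) < 1e-6
```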

2.3.2 Dirac Delta Function

One of the best ways to understand Fourier analysis is to analyze a square pulse train. A square pulse can be viewed as a magnified version of the Dirac delta function δ(t), which is defined in the continuous domain. The equivalent of the Dirac delta in the discrete domain is the unit impulse, or Kronecker delta, shown in Figure 2.3:

δ[n] = { 0,  n ≠ 0;  1,  n = 0 }    (2.13)

Figure 2.3: Discrete-time unit impulse (sample) (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

The Dirac delta function δ(t) is zero for t ≠ 0 but infinite at t = 0, in such a way that its integral is unity: a function that is infinitesimally narrow and infinitely tall, yet integrates to one. Perhaps the simplest way to visualize it is as a rectangular pulse from a - ε/2 to a + ε/2 with height 1/ε, as shown in Figure 2.4. As we take the limit ε → 0, the width tends to zero and the height tends to infinity while the total area remains constant at one, as shown in Figure 2.5. The impulse function is often written as δ(t) [Selik, M. & Baraniuk, R]:

1 = ∫_{-∞}^{+∞} δ(t) dt    (2.14)


Figure 2.4: The Dirac Delta Function (Selik, M. & Baraniuk, R., The Impulse Function, Connexions)

Since it is quite difficult to draw something that is infinitely tall, we represent the Dirac

with an arrow centered at the point it is applied.

2.3.3 Fourier Coefficients

The time- and frequency-domain representations of a signal are reciprocally related. A sharp spike in the time domain, represented by a unit Dirac delta function, appears in the frequency domain as a superposition of all frequencies with equal amplitudes, and vice versa [Calvert, J.B]. This is shown in Figures 2.5 and 2.6 below.

Figure 2.5: Dirac Delta in Time-Domain (Calvert, J.B., Time and Frequency)
Figure 2.6: Dirac Delta in Frequency-Domain (Calvert, J.B., Time and Frequency)

The use of the complex exponential in the Fourier transform is very convenient, since the complex coefficients it generates can be expressed using magnitude and phase. As



mentioned earlier, analyzing the square pulse in the frequency domain yields more insight into this relationship. When we have a signal of finite duration, such as a rectangular pulse, the frequency representation is no longer like that of the Dirac delta. Interestingly, the frequency response of a rectangular pulse is a sinc function whose central lobe's width is inversely proportional to the width of the pulse, as seen in Figure 2.7:

sinc(x) = { 1,  x = 0;  sin(x)/x,  otherwise }    (2.15)

Figure 2.7: Frequency Response of a Rectangular Pulse (Calvert, J.B., Time and Frequency)
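The rectangular-pulse/sinc relationship can be verified numerically by approximating eq. (2.11) with a Riemann sum; the pulse width, grid spacing, and tolerances below are arbitrary illustrative choices:

```python
import numpy as np

# The Fourier transform of a rectangular pulse (width 2, height 1) is
# 2*sin(ω)/ω; approximate the continuous integral with a Riemann sum.
dt = 0.001
t = np.arange(-5.0, 5.0, dt)
pulse = np.where(np.abs(t) <= 1.0, 1.0, 0.0)

omega = np.linspace(-10.0, 10.0, 201)
F = np.array([np.sum(pulse * np.exp(-1j * w * t)) * dt for w in omega])

# np.sinc(x) = sin(pi x)/(pi x), so 2*np.sinc(w/pi) = 2*sin(w)/w.
expected = 2.0 * np.sinc(omega / np.pi)
assert np.allclose(F.real, expected, atol=1e-2)
assert abs(F[100].real - 2.0) < 1e-2     # centre value equals the pulse area
```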

2.3.4 Derivation of Fourier Coefficients

To confirm the above relationship and see the mathematical beauty behind it, let's consider a periodic square wave over one period, as shown in Figure 2.8 [Oppenheim, A.V & Willsky, A.S, 1996]:

x(t) = { 1,  |t| < T1;  0,  T1 < |t| < T/2 }    (2.16)



Figure 2.8: Periodic Square Wave (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)

This signal is periodic with fundamental period T and fundamental frequency ω0 = 2π/T. Due to its periodic nature, let's analyze the pulse centered at t = 0 over the interval -T/2 ≤ t < T/2, over which the integration is performed. Using these limits of integration, for n = 0 eq. (2.10) becomes

C_0 = (1/T) ∫_{-T1}^{T1} dt = 2T1/T    (2.17)

As mentioned earlier, C_0 is interpreted as the dc or constant component, which in this case equals the fraction of each period during which x(t) = 1. For k ≠ 0, eq. (2.10) becomes:

C_k = (1/T) ∫_{-T1}^{T1} e^{-jkω0 t} dt = -1/(jkω0 T) [ e^{-jkω0 t} ]_{-T1}^{T1}

This can be rewritten as

C_k = ( 2/(kω0 T) ) [ ( e^{jkω0 T1} - e^{-jkω0 T1} ) / 2j ]    (2.18)

Noting that the term in the brackets is sin(kωoT1), we can therefore express the Fourier

coefficients as:


C_k = 2 sin(kω0 T1) / (kω0 T) = sin(kω0 T1) / (kπ),   where ω0 T = 2π    (2.19)

In Figure 2.9 the coefficients are plotted for a fixed T1 and several values of T. Although our time-domain signal is real, frequency-domain representations may in general be complex (the C_k coefficients). For this specific example the Fourier coefficients are real, so they can be depicted graphically with a single graph. As we change the period T of the square wave, we change the relative width of the rectangular pulse, which affects the width of the center lobe of the sinc function: the narrower the rectangular pulse, the wider the center lobe of the sinc function becomes, since the area under the curve must be conserved.

Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave: (a) T0=4T1; (b) T0=8T1; (c) T0=16T1 (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)


2.4 Hilbert Transform

The complex exponential is a vital component of the Fourier and short-time Fourier transforms. It acts as a kernel and extracts phase and magnitude information from the analyzed signal; it is therefore important to get a good perspective on the exponential term of the Fourier transform, eq. (2.11), and how it is related to phase. The ability of the complex exponential to act as a modulator and frequency shifter helps in understanding the filter-bank structure of the short-time Fourier transform (STFT). In eq. (2.11) we have:

F(ω) = ∫_{-∞}^{+∞} f(t) e^{-jωt} dt,   where e^{-jωt} = cos(ωt) - j sin(ωt)    (2.20)

The exponential term in eq. (2.20) can be interpreted as an analytic signal of a cosine.

2.4.1 Analytic Signal

An analytic signal is a complex signal created by taking a real signal and adding, in quadrature, its Hilbert transform. It is also called the pre-envelope of the real signal [Langton, C, Signal Processing & Simulation Newsletter]. It can be defined as:

g+(t) = g(t) + j ĝ(t)    (2.21)

Substituting cos(ωt) for g(t) in eq. (2.21), we get:

g+(t) = cos(ωt) + j sin(ωt) = e^{jωt}

Before we go any further, let's understand the Hilbert transform and how it relates to the analytic signal.


2.4.2 Hilbert Transform Theory

The Hilbert transform is related to the Fourier series, the representation of a signal as a summation of sines and cosines (eq. 2.22). By analyzing the building blocks of the Fourier series we can understand the Hilbert transform. In general, the Hilbert transform acts as a filter that changes the phase of the spectral components depending on the sign of their frequency. It affects only the phase of the signal and has no effect on the amplitude [Langton, C, Signal Processing & Simulation Newsletter]. Let's take a look at how this conclusion is reached.

Recall that the Fourier series can be written as:

x(t) = Σ_{n=-∞}^{+∞} A_n cos(nω0 t) + j Σ_{n=-∞}^{+∞} B_n sin(nω0 t)    (2.22)

where:

C_n = A_n + jB_n and C_{-n} = A_n - jB_n    (2.23)

A_n and B_n are the spectral coefficients of the cosine and sine waves. The phase of the signal is calculated as

φ = tan^{-1}( B_n / A_n )    (2.24)

Cosine waves are 90° out of phase compared to sine waves, and vice versa. So if a wave consists strictly of cosines, the B_n component of eq. (2.6) is zero and the phase of the signal is zero. One way to look at the phase is as the angle between the real and imaginary axes, which implies that the spectral components of the signal lie on the real axis, as shown in Figure 2.10.


Figure 2.10: Cosine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)

Similarly, the sine terms have their A_n component of eq. (2.6) equal to zero, so the phase of the signal is 90°. The phase of the sine terms is antisymmetric: +90° for positive frequencies and -90° for negative frequencies. This is presented by the variable Q in Figure 2.11.

Figure 2.11: Sine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.4.3 Phase Rotation

In the above section we described important characteristics of the sine and cosine terms in the spectral domain. The term Q is directly related to the phase of the signal. So, if we were to turn the cosine into a sine, we would need to rotate the negative-frequency (-Q) component of the cosine by +90° and the positive-frequency component (+Q) by -90°. In


other words we need to multiply the –Q component by j and the +Q component by –j as

shown in Figure 2.12 [Langton, C, Signal Processing & Simulation Newsletter]

Figure 2.12: Rotating Phasors to create a sine wave out of a cosine (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

Therefore, for any signal g(t), its Hilbert transform in the frequency domain is:

$\hat{G}(f) = \begin{cases} -j\,G(f) & \text{for } f > 0 \\ +j\,G(f) & \text{for } f < 0 \end{cases}$ (2.25)

(The hat over G(f) is the usual way of denoting the Hilbert transform of a signal.)

For example, applying the Hilbert transform on a cosine term gives us a sine term.

Applying it again gives us a negative cosine term and further application gives us a

negative sine term and then at last our original cosine.

$\cos\omega t \;\rightarrow\; \sin\omega t \;\rightarrow\; -\cos\omega t \;\rightarrow\; -\sin\omega t \;\rightarrow\; \cos\omega t$

For this reason the Hilbert transform is also called a "quadrature filter" [Langton, C, Signal Processing & Simulation Newsletter], as seen in Figure 2.13.
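This rotation chain is easy to verify numerically. The following sketch (an illustration, assuming a sampled cosine with an integer number of cycles so that the FFT-based construction is exact) implements eq. (2.25) directly: positive-frequency components are doubled, negative-frequency components are zeroed, and the imaginary part of the result is the Hilbert transform of the input.

```python
import numpy as np

def analytic(x):
    """Analytic signal via eq. (2.25): negative frequencies are removed and
    positive frequencies doubled, which yields x(t) + j*H{x}(t)."""
    N = len(x)
    X = np.fft.fft(x)
    X[1:N // 2] *= 2.0        # double the positive frequencies
    X[N // 2 + 1:] = 0.0      # remove the negative frequencies
    return np.fft.ifft(X)

N = 1024
t = np.arange(N) / N
a = analytic(np.cos(2 * np.pi * 8 * t))   # analytic signal of a cosine
hilbert_cos = a.imag                      # H{cos} -> sin
hilbert_sin = analytic(a.imag).imag       # H{sin} -> -cos
```

Applying the construction twice traces the first two steps of the rotation chain above, and the magnitude of the analytic signal of a cosine is identically one, since $\cos\omega t + j\sin\omega t = e^{j\omega t}$.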


Figure 2.13: The Hilbert transform shifts the phase of positive frequencies by -90° and negative frequencies by +90° (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.4.3 Complex Envelope

Based on our knowledge of the analytic signal and the Hilbert transform we can

now analyze the complex exponential. The analytic signal of a cosine, knowing that its

Hilbert transform is a sine, is given by:

$g_+(t) = \cos(\omega t) + j\sin(\omega t) = e^{j\omega t}$ (2.26)

We know that the spectral components of a cosine term lie on the real axis and the

spectral components of a sine term are antisymmetric in nature and lie on the imaginary

axis (sec. 2.4.2). It is interesting to note that the analytic signal of a cosine (complex

exponential) has all of its spectral components in the positive-frequency domain of the real axis, even though it consists of both cosine and sine terms, as shown in Figure 2.14.

Figure 2.14: Spectral Properties of the Complex Exponential (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)


We now can define the complex envelope as:

$g_+(t) = \tilde{g}(t)\, e^{j2\pi f_c t}$ (2.27)

where $\tilde{g}(t)$ is the complex envelope of the signal $g(t)$. Rewriting this equation (eq. 2.28)

and taking its Fourier transform (eq. 2.29) reveals that the complex envelope is just a

frequency shifted version of the analytic signal [Langton, C, Signal Processing &

Simulation Newsletter]:

$\tilde{g}(t) = g_+(t)\, e^{-j2\pi f_c t}$ (2.28)

$\tilde{G}(f) = \begin{cases} 2G(f + f_c) & \text{for } f > -f_c \\ G(0) & \text{for } f = -f_c \\ 0 & \text{for } f < -f_c \end{cases}$ (2.29)

It is this feature that is used in linear system theory, where $e^{-j\omega t}$ acts as a modulator. We will see its application in the coming sections on the short-time Fourier transform. One might ask why we need this representation: is there an advantage? The following example shows the advantages of the complex envelope.

2.4.4 Advantages of the Complex Envelope

To illustrate the advantages of the complex envelope let us consider an example

where the signal s(t) is a base-band signal:

$s(t) = 4\cos 2t - 6\sin 3t$ (2.30)

(Note: for simplification purposes the 2π factor is omitted)

whose phase and magnitude properties are shown in Figure 2.15.


Figure 2.15: Spectral Properties of s(t) (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

Now let's multiply this signal by cos(100t) to modulate it and make it a band-pass signal, giving us:

$g(t) = s(t)\cos 100t$

$g(t) = 4\cos 2t\, \cos 100t - 6\sin 3t\, \cos 100t$ (2.31)

It is important to note that the envelope of the modulated signal is the information signal.

In Figure 2.16 the solid line represents this information signal.

Figure 2.16: The Modulated Signal and its Envelope (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

After simplifying the signal using trigonometric identities, we will take the Hilbert transform of $g(t)$ and create its analytic signal. The steps are as follows.

The trigonometric identities used to simplify eq. (2.31) are:

$\sin A \cos B = \frac{\sin(A+B) + \sin(A-B)}{2}$

$\cos A \cos B = \frac{\cos(A+B) + \cos(A-B)}{2}$

Using these identities we get:

$g(t) = 2\cos(2+100)t + 2\cos(2-100)t - 3\sin(3+100)t - 3\sin(3-100)t$ (2.32)

Applying the Hilbert Transform to each term gives us:

$\hat{g}(t) = 2\sin(2+100)t - 2\sin(2-100)t + 3\cos(3+100)t - 3\cos(3-100)t$ (2.33)

Now we create the analytic signal by adding the original signal, eq. (2.32), to j times its Hilbert transform, eq. (2.33):

$g_+(t) = g(t) + j\,\hat{g}(t)$

$g_+(t) = \left[ 2\cos(2+100)t + 2\cos(2-100)t - 3\sin(3+100)t - 3\sin(3-100)t \right]$
$\qquad\quad + j\left[ 2\sin(2+100)t - 2\sin(2-100)t + 3\cos(3+100)t - 3\cos(3-100)t \right]$ (2.34)

Rearranging eq. (2.34) using the Euler representation gives us:

$g_+(t) = (4\cos 2t - 6\sin 3t)\, e^{j100t}$ (2.35)

It is interesting to see that eq. (2.35) and eq. (2.30) are similar except for the modulator ($e^{j\omega t}$). On analyzing the complex envelope s(t) and the analytic signal of eq. (2.35) in the frequency domain, it becomes clear why this representation is advantageous when the analytic signal is viewed as a pass-band signal. Taking the Fourier transforms of eq. (2.35) and eq. (2.30), it is clear that, by the sampling theorem, the analytic signal needs a higher bandwidth than the complex envelope. So, with this method of coding it is easier to separate the information signal s(t) from the carrier. This concept is used in time-to-frequency transforms when the STFT is represented as a bank of filters or pass-bands.
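This separation can be sketched numerically. The fragment below (an illustrative NumPy sketch, sampling one full period of the example so that every component lands on an exact FFT bin) builds the analytic signal of the modulated signal g(t) and demodulates it, recovering the base-band signal s(t) as its complex envelope.

```python
import numpy as np

# Sample one full period (2*pi) so every component lands on an exact FFT bin
N = 4096
t = np.arange(N) * 2 * np.pi / N
s = 4 * np.cos(2 * t) - 6 * np.sin(3 * t)   # base-band signal, eq. (2.30)
g = s * np.cos(100 * t)                     # modulated signal, eq. (2.31)

# Analytic signal of g via eq. (2.25): remove negative frequencies,
# double positive ones
G = np.fft.fft(g)
G[1:N // 2] *= 2.0
G[N // 2 + 1:] = 0.0
g_plus = np.fft.ifft(G)

# Complex envelope, eq. (2.28): demodulate the analytic signal
g_tilde = g_plus * np.exp(-1j * 100 * t)
```

The recovered envelope is real and equals s(t), exactly as eq. (2.35) predicts: the analytic signal is s(t) times the carrier $e^{j100t}$.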


Figure 2.17: Frequency Domain Representation of the Complex Envelope s(t) (left) and of the Analytic Signal (right) (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.5 Summary

In this chapter we have seen the basic building blocks that are required to

represent a signal. We have seen that a complex function can be represented through its simple building blocks, also known as basis functions. Using these building blocks we

form a compressed representation of a complex function. One generalized view of a

complex function is of the form:

$\text{Complex Function} = \sum_i \text{weight}_i \cdot \text{Simple Function}_i$

In our case sinusoids (complex exponential) are the building blocks of the Fourier

transform, where for each frequency of the complex exponential the sinusoids at that frequency are compared to the signal. Based on this analysis the frequency correlations are determined. The spectral coefficients are high if the correlation is high, and vice versa.

Along with this we have also seen how the complex exponential term acts as a modulator

(frequency shifter) and how it affects the signal as a complex envelope. This gives us

insight into how the Fourier transform can be viewed as a linear system. Taking this

knowledge we shall proceed to the next chapter where we will discuss the short-time

Fourier transform and the conditions that gave rise to such a concept.


Chapter 3: Time to Frequency Mapping

3.1 Quadrature Mirror Filter (QMF)

The QMF filter-bank gained its popularity with the introduction of decimators and

expanders in its structure. The system was introduced in the mid-seventies [Croisier, et al., 1976] and has since been studied extensively. A simple QMF filter bank consists of two banks, typically low-pass and high-pass, bandlimited to a total width of $\pi$. The input signal x(n) is filtered by $H_0(z)$ and $H_1(z)$ and then decimated by a factor of 2 (down-sampled by a factor of 2, where odd samples are removed and even ones are kept), resulting in $v_0(n)$ and $v_1(n)$. These decimated signals are then sent through expanders (up-sampled by the same factor, in our case 2) and passed on to the filters $F_0(z)$ and $F_1(z)$, whose purpose is to cancel all types of distortion.

Figure 3.1: QMF filter-bank

There are four types of distortions caused by the filterbank structures. They are

aliasing, amplitude distortion, phase distortion and quantization effects. It was found that

in the case of M channel filter banks, the conditions for alias cancellation and perfect

reconstruction are much more complicated. This was the reason pseudo QMF techniques

were introduced [Nussbaumer, 1981], as a means of approximating alias cancellation.

Vetterli and Vaidyanathan later showed that the use of polyphase components leads to

considerable calculation simplification in filter-bank theory. A technique for the design of


M channel perfect reconstruction systems was developed [Vaidyanathan, 1987a,b], based

on polyphase matrices with the so-called paraunitary property. This same property also

finds application in the theory of orthonormal wavelet transforms [Vaidyanathan, P.P,

1993].

3.2 Aliasing and Imaging

In theory, an ideal filter has high stop-band attenuation and therefore produces the least aliasing. Aliasing can be defined in a broad sense as frequency confusion caused by the decimation of a signal. The decimation (down-sampling) process causes overlap between adjacent sub-bands, as shown in Figure 3.2.

Figure 3.2: Aliasing

When substantial signal energy extends beyond the ideal pass-band region, aliasing has a greater effect on the integrity of the signal. In principle it is possible to

choose filters that do not overlap, but this causes severe attenuation in the region of no

overlap. Boosting frequencies in that region will result in severe amplification of noise

(coding noise, channel noise, filter roundoff noise). A solution to this problem might be

increasing the filter order but this can be expensive computationally. The overlapping

response is therefore more practical. Even though this causes aliasing, the effect can be

cancelled by carefully designing the synthesis filters [Vaidyanathan, P.P, 1993].

Let’s examine Figure 3.1 in the z-domain to get a better understanding of the process.

The input signal x(n) can be expressed as:


$X_k(z) = H_k(z)\, X(z)$ (3.1)

where k = 0, 1. The z-transform of the decimated signal $v_k(n)$ can be expressed as:

$V_k(z) = \frac{1}{2}\left[ X_k(z^{1/2}) + X_k(-z^{1/2}) \right]$ (3.2)

for k = 0, 1. Upsampling the decimated signal yields:

$Y_k(z) = V_k(z^2) = \frac{1}{2}\left[ X_k(z) + X_k(-z) \right]$ (3.3)

The reconstructed signal is:

$\hat{X}(z) = F_0(z)Y_0(z) + F_1(z)Y_1(z)$ (3.4)

Substituting eq. (3.3) into eq. (3.4) we obtain:

$\hat{X}(z) = \frac{1}{2}\left[ F_0(z)H_0(z) + F_1(z)H_1(z) \right] X(z) + \frac{1}{2}\left[ F_0(z)H_0(-z) + F_1(z)H_1(-z) \right] X(-z)$

In matrix form:

$2\hat{X}(z) = \begin{bmatrix} X(z) & X(-z) \end{bmatrix} \underbrace{\begin{bmatrix} H_0(z) & H_1(z) \\ H_0(-z) & H_1(-z) \end{bmatrix}}_{H(z)} \underbrace{\begin{bmatrix} F_0(z) \\ F_1(z) \end{bmatrix}}_{f(z)}$ (3.5)

Here the matrix H(z) is known as the alias component matrix. It can be noted that $X(-z) = X(e^{j(\omega - \pi)})$ takes into account aliasing due to the decimators and imaging due to the expanders. So, it is clear that we can cancel aliasing by choosing the filters such that the quantity $F_0(z)H_0(-z) + F_1(z)H_1(-z)$ is zero, implying that $F_0(z) = H_1(-z)$ and $F_1(z) = -H_0(-z)$ satisfy the above condition.
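This choice of synthesis filters can be checked with a short numerical sketch. Taking the simplest two-channel example, the Haar pair $h_0 = [1, 1]/\sqrt{2}$, $h_1 = [1, -1]/\sqrt{2}$ (a standard illustration, not a filter used elsewhere in this thesis), the chain of Figure 3.1 reconstructs the input exactly, up to a one-sample delay.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

# Haar analysis filters: H0(z) low-pass, H1(z) high-pass
h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)
# Alias-cancelling synthesis filters: F0(z) = H1(-z), F1(z) = -H0(-z)
f0 = np.array([1.0, 1.0]) / np.sqrt(2)
f1 = np.array([-1.0, 1.0]) / np.sqrt(2)

def analyze(x, h):
    return np.convolve(x, h)[::2]        # filter, then keep even samples

def synthesize(v, f):
    up = np.zeros(2 * len(v))
    up[::2] = v                          # expander: insert zeros
    return np.convolve(up, f)

v0, v1 = analyze(x, h0), analyze(x, h1)
xhat = synthesize(v0, f0) + synthesize(v1, f1)
# xhat(n) = x(n - 1): perfect reconstruction up to a one-sample delay
```

Because the convolution treats the signal as zero outside its support, the reconstruction is exact for the zero-extended input, not merely approximate at the interior.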


3.3 Distortion Transfer Function

Apart from aliasing, amplitude distortion and phase distortion are related to the

distortion transfer function T(z). The distortion transfer function or ‘overall’ transfer

function is defined as:

$T(z) = \frac{1}{2}\left[ H_0(z)F_0(z) + H_1(z)F_1(z) \right]$ (3.6)

and is related to the input signal:

$\hat{X}(z) = T(z)\, X(z)$ (3.7)

Expressing T(z) in terms of magnitude and phase:

$T(e^{j\omega}) = \left| T(e^{j\omega}) \right| e^{j\phi(\omega)}$ (3.8)

we can rewrite eq. (3.7) as:

$\hat{X}(e^{j\omega}) = \left| T(e^{j\omega}) \right| e^{j\phi(\omega)}\, X(e^{j\omega})$ (3.9)

If $\left| T(e^{j\omega}) \right|$ is not constant for all $\omega$, then we have amplitude distortion. Similarly, if T(z) does not have linear phase, X(z) suffers from phase distortion.
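As a concrete check, take the standard two-channel Haar pair (used here purely for illustration). T(z) and the alias term can be computed by polynomial multiplication of the coefficient sequences; the result is a pure delay, so this bank has neither amplitude nor phase distortion.

```python
import numpy as np

# Haar analysis/synthesis pair with F0(z) = H1(-z), F1(z) = -H0(-z)
h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)
f0 = np.array([1.0, 1.0]) / np.sqrt(2)
f1 = np.array([-1.0, 1.0]) / np.sqrt(2)

# Distortion transfer function, eq. (3.6): products of z-transforms are
# convolutions of the coefficient sequences
T = 0.5 * (np.convolve(h0, f0) + np.convolve(h1, f1))

# Alias term 0.5*[F0(z)H0(-z) + F1(z)H1(-z)]; H(-z) negates odd coefficients
alt = np.array([1.0, -1.0])
alias = 0.5 * (np.convolve(h0 * alt, f0) + np.convolve(h1 * alt, f1))
```

T(z) comes out as the coefficient sequence of $z^{-1}$, i.e. $|T(e^{j\omega})| = 1$ with linear phase, and the alias term vanishes identically.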

3.4 Polyphase Representation

Examining the matrix representation, eq. (3.5), we can in principle cancel aliasing by solving for the synthesis filters from $f(z) = H^{-1}(z)\, t(z)$, where $t(z) = H(z) f(z)$, but this requires calculating f(z) explicitly as:

$f(z) = \frac{\mathrm{Adj}\, H(z)}{\det H(z)}\; t(z)$ (3.10)

This is possible only if the determinant is nonzero. Also, the zeros of the quantity $\det H(z)$ are related to the analysis filters $H_k(z)$ in a very complicated manner, thus making it difficult to ensure that they are inside the unit circle, which is necessary for


stability of Fk(z) [Vaidyanathan, P.P, 1993]. It is for this reason that the polyphase

representation comes in handy.

3.4.1 Perfect Reconstruction

We can now express the $H_k$ in eq. (3.1) as an M-channel filter bank, where:

$H_k(z) = \sum_{l=0}^{M-1} z^{-l}\, E_{kl}(z^M)$ (3.11)

Similarly, the synthesis filters $F_k$ can also be expressed as:

$F_k(z) = \sum_{l=0}^{M-1} z^{-(M-1-l)}\, R_{lk}(z^M)$ (3.12)

Examining eq. (3.11) and eq. (3.12) we can generate the matrices E(z) and R(z), which are related to each other as:

$R(z)E(z) = I$ (3.13)

Our aim is to obtain the reconstructed signal unchanged, and this is only possible if each

matrix nullifies the effect of the other. In other words the product of the two polyphase

matrices equals an identity matrix. The condition still holds if we replace eq. (3.13) with:

$R(z)E(z) = c\, z^{-m_0} I$ (3.14)

Since the output of the filter-bank structure is then just a delayed version of the input, we can make this kind of modification to the equation. A more general matrix form of eq. (3.14) is:

$R(z)E(z) = c\, z^{-m_0} \begin{bmatrix} 0 & I_{M-r} \\ z^{-1} I_r & 0 \end{bmatrix}$

The reconstructed signal is of the form $\hat{x}(n) = c\, x(n - n_0)$, where $n_0 = M m_0 + r + M - 1$ for some integers r and $m_0$, with $0 \le r \le M-1$.
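The polyphase idea behind eq. (3.11) can be sketched in a few lines: splitting a filter into its even- and odd-indexed coefficients lets the decimated output be computed entirely at the low rate. The sketch below (with an arbitrary length-4 filter, assumed here only for illustration) checks that the two forms agree.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(64)
h = rng.standard_normal(4)          # an arbitrary even-length filter H(z)

# Type-1 polyphase components: H(z) = E0(z^2) + z^{-1} E1(z^2)
e0, e1 = h[::2], h[1::2]

# Direct form: filter at the full rate, then discard every other sample
v_direct = np.convolve(x, h)[::2]

# Polyphase form: decimate first, then filter at the low rate
x0 = x[::2]                               # x(2n)
x1 = np.concatenate(([0.0], x[1::2]))     # x(2n - 1)
v_poly = np.convolve(x1, e1)
v_poly[:len(x0) + len(e0) - 1] += np.convolve(x0, e0)
```

The polyphase form performs every multiplication at half the sample rate, which is the "considerable calculation simplification" noted earlier for filter-bank theory.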


3.5 Paraunitary Property

In the above section we expressed an M-channel filter bank in terms of the polyphase matrices E(z) and R(z). It should be noted that if the filters are FIR and the filter bank has the perfect reconstruction property, then the polyphase matrix E(z) has to satisfy the condition that its determinant must be a delay [Vaidyanathan, P.P, 1993]:

$\det E(z) = \alpha z^{-K}$, $\quad \alpha \neq 0$, K an integer (3.15)

A causal transfer matrix H(z) is said to be lossless or paraunitary if (a) each entry $H_{km}(z)$ is stable and (b) $H(e^{j\omega})$ is unitary.

Before we examine the paraunitary property we must understand what a unitary matrix is.

3.5.1 Unitary Matrix

A complex matrix A is said to be unitary if $A^{\dagger}A = I$, where $A^{\dagger} = (A^*)^T$ denotes the conjugate transpose. For example:

$A = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix}$, $\qquad A^* = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ (3.16)

$A^{-1} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$, $\qquad A^{\dagger} = (A^*)^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ (3.17)

so that $A^{\dagger} = A^{-1}$.

This property complements the paraunitary property, which states that a matrix function H(z) is paraunitary if it is unitary for all values of the parameter z:

$H^T(z^{-1})\, H(z) = I$, for all $z \neq 0$ (3.18)

Eq. (3.18) thus defines a lossless system as one that is causal, stable, and paraunitary.
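Both properties are easy to verify numerically. The sketch below checks a 2x2 unitary example matrix, and then the polyphase matrix of the two-channel Haar bank (an illustration, not a filter used elsewhere in this chapter): that E(z) happens to be a constant matrix, so checking that it is unitary establishes the paraunitary condition of eq. (3.18) for all z.

```python
import numpy as np

# A 2x2 unitary example: A†A = I and A† = A^{-1}
A = np.array([[1.0, -1.0j], [-1.0j, 1.0]]) / np.sqrt(2)
A_dag = A.conj().T

# For the two-channel Haar bank, H0(z) = (1 + z^{-1})/sqrt(2) and
# H1(z) = (1 - z^{-1})/sqrt(2), so the polyphase matrix E(z) is the constant
# matrix below; a constant unitary E is trivially paraunitary
E = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
```

This constant-E case is the degenerate one; for longer filters E(z) genuinely depends on z and eq. (3.18) must be checked as a polynomial identity.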


3.6 Summary

3.6.1 Advantages of Paraunitary Filter Banks

The advantage of applying the paraunitary property to E(z) is that no matrix inversion is involved in the design. The synthesis filters are FIR, have the same length as the analysis filters, and can be obtained by time-reversal and conjugation of the analysis filter coefficients. If the paraunitary matrix E(z) is implemented as a cascade structure, then the perfect reconstruction property still holds in spite of multiplier quantization. The cascade paraunitary structure also keeps the computational complexity low. Last but not least, filter banks with paraunitary E(z) can be used to generate an orthonormal basis for wavelet transforms. The orthonormal basis property is discussed in Chapter 6, which leads to orthonormal-basis tree-structured filter banks, also known as wavelet packets.


Chapter 4: Short-Time Fourier Transform

In section 2.3.3 we observed the relationship between time and frequency

representation. We observed that if an impulse in the time domain was viewed as a very

narrow rectangular pulse then its frequency representation would be a sinc function

whose central lobe is affected by the width of the impulse in the time domain. This in

turn reflects on the localization property of the Fourier transform which rejects the notion

of “frequency that varies with time.” According to Fourier analysis, a single frequency is

always associated with infinite time duration, as shown in Figure 4.1. To deal with this

time localization problem, the sampled signal can be windowed.

The basic mechanics of the discrete Fourier transform is to multiply the analyzed

signal with an impulse train of a certain sampling frequency, abiding by the sampling

theorem. Assuming the sampled signal is harmonic over N samples we then window the

digitized signal. Starting with the fundamental frequency we multiply the signal by a

complex exponential and perform summation (calculating the area under the curve) of the

result. Recall that the Fourier transform equation is the summation of $f(t)\,e^{-j2\pi ft}$ over an interval. This term can be interpreted as the block diagram in Figure 4.1, where the cosine and sine multipliers are part of the complex exponential.


Figure 4.1: FFT Block Diagram


When the signal is multiplied by the cosine and sine terms, the area of the resultant signal is considered. If the resulting area is zero then there is no correlation. Harmonic frequencies of sines and cosines are multiplied and the Fourier coefficients are thus derived.

Though this approach may seem trivial, the discrete Fourier transform has drawbacks based on its formulation. For it to work flawlessly it must have a sample space ranging from negative infinity to positive infinity, which is not practical. So, in order to tackle this

problem, a discrete set of samples was windowed and then applied to the Fourier

transform. The window is then shifted in uniform amounts and the above computation is

repeated. This is also known as the short-time Fourier transform.

4.1 Analysis of the STFT Equation

The short-time Fourier transform consists of three main components: the signal to be analyzed x(n), the window function v(n), and the Fourier transform kernel or basis function $e^{-j\omega n}$. First the signal is multiplied by the window, which is typically of a finite duration. After this is done, the kernel is applied to the product x(n)v(n-m) to calculate the Fourier transform. The window is then shifted and the process is repeated

again.

$X_{STFT}(e^{j\omega}, m) = \sum_{n=-\infty}^{+\infty} x(n)\, v(n-m)\, e^{-j\omega n}$ (4.1)

The function $X_{STFT}(e^{j\omega}, m)$ has two variables, $\omega$ and m. The frequency variable $\omega$ is

continuous and ranges from ‐π ≤ ω < π. The shift variable m is typically an integer

multiple of some fixed integer. Essentially the window captures features of the signal

around m and helps to localize time domain data.
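Eq. (4.1) can be evaluated directly, and, when $\omega$ is a DFT bin frequency, it reduces to a phase factor times the FFT of the windowed frame. The sketch below (window length, shift, and bin chosen arbitrarily for illustration) checks this equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
N = 64                        # window length
v = np.hanning(N)             # window v(n), nonzero for 0 <= n < N
m = 100                       # window shift (start of the frame)
k = 5                         # DFT bin, so that omega = 2*pi*k/N
w = 2 * np.pi * k / N

# Direct evaluation of eq. (4.1): sum over n of x(n) v(n - m) e^{-j w n}
n = np.arange(m, m + N)
X_direct = np.sum(x[n] * v * np.exp(-1j * w * n))

# Equivalent evaluation: e^{-j w m} times the FFT of the windowed frame
X_fft = np.exp(-1j * w * m) * np.fft.fft(x[m:m + N] * v)[k]
```

The factor $e^{-j\omega m}$ appearing here is exactly the modulator that the filter-bank interpretation of the next section isolates.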


4.2 STFT as a Bank of Filters

Since most signal processing is done using linear time-invariant systems, it is beneficial to explore this representation of the STFT. Furthermore, this interpretation helps

us to generalize the STFT to obtain more flexibility [Vaidyanathan, P.P, 1993]. In section

2.4.3 we discussed the complex envelope and showed how the complex exponential acts

as a modulator that performs a frequency shift; more specifically, it shifts the Fourier transform towards the left by $f_c$. The STFT can thus be looked at as a bank of band-pass filters. The basic block diagram of a single frequency channel is shown in Figure 4.2.

Figure 4.2: STFT Represented in terms of a Linear System

To gain further insight into this, let's rewrite eq. (4.1) by factoring out $e^{-j\omega m}$:

$X_{STFT}(e^{j\omega}, m) = e^{-j\omega m} \sum_{n=-\infty}^{+\infty} x(n)\, v(n-m)\, e^{-j\omega(n-m)}$ (4.2)

This equation represents an LTI system, as shown in Figure 4.3, where m represents the center of the STFT window. Although k is not mentioned in the above equation, it is related to m in such a way that m is an integer multiple of k. So, if the window were to shift it would move through v(n), v(n-k), v(n-2k) and so on. In this example let k = 1, so the output is computed at every sample, as in a traditional sliding Fourier transform. The impulse response of the LTI system is a band-pass filter of the form $v(-n)e^{j\omega_0 n}$, whose frequency response is $V(e^{-j(\omega - \omega_0)})$. The output sequence $t_0(n)$ is therefore a band-pass signal,


whose pass-band is centered around ω0. The modulator acts as a frequency shifter which

re-centers the frequency response around zero [Vaidyanathan, P.P, 1993].

Figure 4.3: Rearranged STFT Representation in terms of a Linear System

Examining this in the frequency domain, we see that the STFT reduces to a filter bank of M band-pass filters with responses $H_k(e^{j\omega}) = V(e^{-j(\omega - \omega_k)})$, as shown in Figure 4.4. The pass-band of $H_k(e^{j\omega})$ is centered around $\omega_k$, where k = 0, 1, 2, ..., M-1.

Figure 4.4: STFT viewed as a Filter-Bank

4.3 Effects of Windowing

Unlike the traditional Fourier transform, the STFT is uniquely defined by the type of window chosen, v(n). The choice of window governs the tradeoff between

time localization and frequency resolution. It is interesting to see that as the window

function gets wider the frequency information gets more localized and vice versa. Figure

4.5 shows how a small window of 512 samples has wider lobes compared to the larger

window of 2048 samples. This confirms our previous statement that wider windows have

better frequency resolution or better information localization in the frequency domain.
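This lobe-width behavior can be quantified by locating the first spectral null of a rectangular window of length L, which sits at frequency bin NFFT/L of a zero-padded transform (a numerical sketch; the 8192-point padding is an arbitrary choice made only for illustration).

```python
import numpy as np

NFFT = 8192

def first_null(L):
    """Bin index of the first spectral null of a length-L rectangular
    window, zero-padded to NFFT points (the null sits at bin NFFT/L)."""
    W = np.abs(np.fft.rfft(np.ones(L), NFFT))
    return int(np.argmax(W < 1e-6 * L))   # first bin where |W| vanishes

null_short, null_long = first_null(512), first_null(2048)
```

The 2048-sample window's main lobe is one quarter the width of the 512-sample window's, matching the narrower lobes seen in Figure 4.5.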


Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples

4.3.1 Choice of the Best Window

We saw earlier that a narrow window in the time domain leads to a broader frequency transform and vice versa. To make this concept more precise, the rms (root-mean-square) duration of a signal was introduced [Gabor, 1946], [Papoulis 1977a]. The two non-negative quantities $D_t$ and $D_f$ are defined as the rms durations in the time and frequency domains:

$D_t^2 = \frac{1}{E}\int_{-\infty}^{+\infty} t^2\, v^2(t)\, dt$ (4.3)

$D_f^2 = \frac{1}{2\pi E}\int_{-\infty}^{+\infty} \Omega^2 \left| V(j\Omega) \right|^2 d\Omega$ (4.4)

where E is the window energy, that is, $E = \int v^2(t)\, dt$, and v(t) is the signal in the time domain. Interestingly enough, the rms duration of a triangular waveform is smaller than that of a rectangular one even though they both share the same duration. This is because the $t^2$ factor in the definition of $D_t^2$ weights the non-zero values of v(t) away from the origin more heavily. It

turns out, based on the uncertainty principle, that the product $D_t D_f$ cannot be arbitrarily small: $D_t D_f \ge 0.5$, with equality if and only if $v(t) = A e^{-\alpha t^2}$, $\alpha > 0$. Therefore, the optimal window takes the form of a Gaussian waveform, with its ideal length being infinite [Vaidyanathan, P.P, 1993].
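These definitions can be checked numerically. By Parseval's theorem, the frequency-domain integral in eq. (4.4) equals the energy of dv/dt, so both durations can be computed on a time grid. The sketch below (a discretized illustration with an arbitrarily chosen grid) shows the Gaussian meeting the bound $D_t D_f = 0.5$ while a triangular window exceeds it.

```python
import numpy as np

t = np.linspace(-10.0, 10.0, 4001)
dt = t[1] - t[0]

def rms_durations(v):
    """Discretized eqs. (4.3)-(4.4); by Parseval's theorem the frequency-
    domain integral in (4.4) equals the energy of the derivative dv/dt."""
    E = np.sum(v**2) * dt
    Dt = np.sqrt(np.sum(t**2 * v**2) * dt / E)
    dv = np.gradient(v, dt)
    Df = np.sqrt(np.sum(dv**2) * dt / E)
    return Dt, Df

Dt_g, Df_g = rms_durations(np.exp(-t**2))                        # Gaussian
Dt_r, Df_r = rms_durations(np.maximum(0.0, 1 - np.abs(t) / 2))   # triangular
```

The triangular window's product works out to $\sqrt{0.3} \approx 0.55$, strictly above the Gaussian's 0.5.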

4.4 Summary

In this chapter we have seen the transition of the traditional Fourier transform to

the short-time Fourier transform. We have come to realize that the windowing function is

what uniquely defines the STFT, which underlines its weakness. The STFT is the result

of the evolution of the Fourier transform in order to gain better flexibility in localizing

time and frequency. In the next chapter we shall explore a new way of analyzing a time

signal in order to gain better frequency resolution.


Chapter 5: The Wavelet Transform

The short-time Fourier transform is a convenient way to analyze the frequency

information of a signal. However, we know that audio signals and most of the signals that

exist in the real world are very dynamic in nature. Information can be hidden by means of

modulation. To get a better understanding of the weakness of the STFT, let’s consider

Figure 5.1 which shows two cases.

5.1 Weakness of the STFT

In the first case, x(t) is a high-frequency signal and v(t) is the window function. It is apparent from part (a) of Figure 5.1 that the window function captures many more cycles of the input signal x(t) than in part (b). Thus, the accuracy of the estimated

Fourier transform is poor at low frequencies, and improves as the frequency increases

[Vaidyanathan, P.P , 1993]. To gain more information about the signal it would be

appropriate to have a window whose width adjusts with the frequency of the input signal.

An ideal filter bank structure would have narrow bandwidths (wider windows) at low

frequencies and wider bandwidths (shorter windows) at high frequencies. Keeping this


Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed function v(t)


concept in mind one can tackle this problem by replacing the window function v(t) with a

function of both frequency and time, so that the time domain window gets wider (narrow

bandwidth) as frequency decreases and vice versa. This way the window function captures the same number of zero crossings of the input signal irrespective of the change in frequency. Furthermore, as the window gets wider, it is also desirable to have wider step sizes for moving the window. This also means that the decimation ratio increases as you go lower in frequency.

5.2 STFT to Wavelets

When the STFT is viewed as a bank of filters, it consists of band-pass filters of equal bandwidth, which are obtained by the modulating component $e^{-j\omega m}$. It is this that restricts the time resolution of the STFT. To overcome this, one must abandon this modulation scheme and replace it with a function of both frequency and time, thus obtaining the filters $h_k(t)$, where a is greater than one and k is an integer:

$h_k(t) = a^{-k/2}\, h(a^{-k} t)$ (5.1)

Here k plays the role of frequency, frequency-scaling the response rather than frequency-shifting it (as in the STFT case). The scale factor $a^{-k/2}$ in eq. (5.1) is meant to ensure that the energy $\int_{-\infty}^{\infty} |h_k(t)|^2\, dt$ is independent of k. An equivalent representation of eq. (5.1) in the frequency domain can be written as:

$H_k(j\Omega) = a^{k/2}\, H(j a^k \Omega)$ (5.2)
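The energy-normalization claim can be verified directly. Taking a Gaussian-windowed cosine as an arbitrary band-pass prototype h(t) (an illustrative choice, not a filter defined in this thesis), the dilated filters $h_k$ all carry the same energy.

```python
import numpy as np

t = np.linspace(-30.0, 30.0, 60001)
dt = t[1] - t[0]
h = lambda u: np.exp(-u**2) * np.cos(5.0 * u)   # band-pass prototype h(t)

# h_k(t) = a^{-k/2} h(a^{-k} t) with a = 2: each step stretches the filter
# in time by 2 and scales its amplitude by 1/sqrt(2)
energies = [np.sum((2.0**(-k / 2) * h(2.0**(-k) * t))**2) * dt
            for k in range(3)]
```

A change of variables $u = a^{-k}t$ shows why: the Jacobian exactly cancels the $a^{-k}$ from the squared amplitude factor, leaving $\int h^2(u)\,du$ for every k.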

We know the ear can be viewed as a set of non-linear band-pass filters whose frequency resolution decreases as frequency increases. Based on this analogy, let's take $H(j\Omega)$ to be a band-pass filter with cutoff frequencies $\alpha$ and $\beta$. Since discrete systems are


efficiently described in powers of two, let's assume a = 2 and $\beta = 2\alpha$. We can define the center frequency to be the geometric mean of the two cutoff edges, that is [Vaidyanathan, P.P, 1993]:

$\Omega_k = \sqrt{\alpha\beta}\; 2^{-k} = \sqrt{2}\,\alpha\, 2^{-k}$ (5.3)

5.2.1 Modification of the STFT

If we consider the continuous-time version of eq. (4.1) we get:

$X_{STFT}(j\Omega_k, \tau) = e^{-j\Omega_k \tau} \int x(t)\, h_k(\tau - t)\, dt$ (5.4)

where $e^{-j\Omega_k \tau}$ is the kernel, $\Omega_k$ is the frequency (analog domain), and the $h_k$ are the filters.

Keeping this equation structure we substitute eq. (5.1) in eq. (5.4) and get:

dttahtxea kjk k ))(()(2/ ∫∞

∞−

−Ω−− −ττ

(5.5)

Now we know from our earlier discussion that the bandwidth of H_k(jΩ) gets smaller as k increases. With this varying bandwidth the windowing (time domain) is also affected. As the window size varies one must account for the step sizes, so we replace the continuous variable τ with n a^k T, where n is an integer. This means that the step size for window movement is a^k T, and it increases with k. In other words, the window movement increases as the center frequency Ω_k of the filter decreases.

Eq. (5.5) takes care of the first case of changing bandwidths. Removing the kernel and taking account of the step size (n a^k T) as the frequency resolution (bandwidth) changes, we can modify eq. (5.5) and get:

X_DWT(k, n) = ∫_{-∞}^{∞} x(t) h_k(n a^k T − t) dt    (5.6)


Note the above equation represents the convolution between x(t) and h_k(t) evaluated at a discrete set of points n a^k T. In other words, the output of the convolution is sampled with spacing a^k T [Vaidyanathan, P.P, 1993]. To summarize the fundamental differences between the STFT and the wavelet transform one can look at the time-frequency plot. The STFT has uniform time and frequency spacing, whereas for the wavelet transform the frequency spacing gets smaller at lower frequencies and the corresponding time spacing gets larger [Vaidyanathan, P.P, 1993], as shown in Figure 5.2.

Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b) Vaidyanathan, P.P, “Multirate Systems and Filter Banks”

The beauty of wavelets is that the wavelet transform is not explicitly implemented by a

moving window because there is in reality no unique window, as seen in equation (5.6).

The system is in essence a filter bank, and is somewhat analogous to the family of

windows.

Another general form of the wavelet transform is:

X_CWT(p, q) = (1/√p) ∫_{-∞}^{∞} x(t) f((t − q)/p) dt    (5.7)



here p and q are real-valued continuous variables, where p = a^k, q = n a^k T and f(t) = h(−t). This is known as the continuous wavelet transform. The variable p can also be considered a scaling function, where the scale factor of a wavelet is inversely related to its frequency; in other words, the larger the scale of the wavelet, the lower the frequency of the wavelet and the narrower the bandwidth, and vice versa. The variable q can be looked upon as the translation parameter, which is responsible for the shifting of the wavelet. Its step-size movement increases as the value of k increases. This can be seen in Figure 5.3.

Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform Robi, P., “The Story of Wavelets”, Rowan University ©

5.3 Inversion of the Wavelet Transform

The original signal x(t) is reconstructed from the wavelet coefficients. The

reconstruction of X_DWT depends on the filter h(t) and the parameters a and T, which completely characterize the transformation. Changing a will change the spacing of the


band-pass filters and the frequency resolution of the filter banks, as described earlier in section 5.2. If the inverse transform exists it appears as:

x(t) = Σ_k Σ_n X_DWT(k, n) ψ_kn(t)    (inverse DWT) (5.8)

5.4 Orthonormal Basis

A subset v_1, ..., v_k of a vector space V with an inner product is called orthonormal if ⟨v_i, v_j⟩ = 0 when i ≠ j, that is, when the vectors are mutually perpendicular. Moreover, they are all required to have length one: ⟨v_i, v_i⟩ = 1. An orthonormal set must be linearly independent (no nontrivial linear combination of the vectors equals zero), so it is a vector space basis for the space it spans. Such a basis is called an orthonormal basis [Eric W. Weisstein et al].
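As a small illustration (a Python sketch using the two-tap Haar vectors that appear later in this thesis), the two conditions ⟨v_i, v_j⟩ = 0 for i ≠ j and ⟨v_i, v_i⟩ = 1 can be verified directly:

```python
import numpy as np

# Two Haar analysis vectors: scaling (average) and wavelet (difference).
v1 = np.array([1.0, 1.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0]) / np.sqrt(2)

unit_norms = bool(np.isclose(v1 @ v1, 1.0) and np.isclose(v2 @ v2, 1.0))
mutually_orthogonal = bool(np.isclose(v1 @ v2, 0.0))
```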

Of particular interest is the case where ψ_kn(t) is a set of orthonormal functions, where the integral of one basis function against the conjugate of another is unity only when their indices match:

∫_{-∞}^{∞} ψ_kn(t) ψ*_lm(t) dt = δ(k − l) δ(n − m)    (5.9)

Applying the orthogonality property to eq. (5.6):

X_DWT(k, n) = ∫_{-∞}^{∞} x(t) ψ*_kn(t) dt

we conclude:

ψ_kn(t) = a^{-k/2} h*(nT − a^{-k} t)    (5.10)

ψ_kn(t) = h*_k(n a^k T − t)    (5.11)

But we have ψ(t) = f(t), so that in the orthonormal case f(t) = h*(−t) and thus


f_k(t) = h*_k(−t), which is very similar to the perfect reconstruction paraunitary QMF banks.

5.5 Wavelet Packet Analysis

Wavelet packets are smooth versions of Walsh functions [Coifman, R, Ronald & Wickerhauser, V, Mladen]. Walsh functions consist of trains of square pulses with −1 and +1 states, such that transitions may only occur at fixed intervals of a unit time step; the initial state is always +1.

Wavelet packet analysis is a generalization of the wavelet decomposition that offers a rich range of possibilities for analyzing a signal. It is a tree-structured filter bank that splits the signal into two sub-bands; after decimation, each sub-band is again split into two and decimated. The sub-bands are then recombined, two at a time, by use of two-channel synthesis banks. Each node in the tree structure represents a subspace of the original signal. Each subspace is the orthogonal direct sum (direct sum of two orthogonal subspaces) of its two children nodes. The leaves of every connected subtree give an orthonormal basis. This procedure permits the segmentation of acoustic signals into those dyadic windows best adapted to the local frequency content [Coifman, R, Ronald & Wickerhauser, V, Mladen]. The low-pass sub-band of the decomposition is known as the approximation and the high-pass sub-band is known as the detail. For an n-level decomposition, there are n+1 possible ways to decompose the signal [Matlab documentation, wavelet packet analysis]. A three-level wavelet decomposition tree is shown in Figure 5.4, where the signal S can be reconstructed by adding the approximation and its previous details.


S = A1+D1

= A2+D2+ D1

= A3+D3+ D2+D1

Figure 5.4: 3-level Wavelet decomposition tree
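The identity S = A1 + D1 can be demonstrated with a minimal one-level Haar-style split (a Python sketch; the pairwise-average low-pass below is an illustrative stand-in for a full wavelet filter bank, and the signal values are arbitrary):

```python
import numpy as np

def haar_split(s):
    """One analysis level: project s onto pairwise averages (A1) and keep the residual (D1)."""
    s = np.asarray(s, dtype=float)
    avg = (s[0::2] + s[1::2]) / 2.0   # low-pass: pairwise means
    a1 = np.repeat(avg, 2)            # approximation, reconstructed at full length
    d1 = s - a1                       # high-pass detail: what the approximation misses
    return a1, d1

s = np.array([4.0, 2.0, 5.0, 7.0])
a1, d1 = haar_split(s)
# The signal is exactly the sum of its approximation and its detail: S = A1 + D1.
```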

5.5.1 Discrete Wavelet Transform

The information obtained from the continuous wavelet transform (CWT) is often redundant in nature. In fact, the CWT computed by computers is actually a discretized version of itself:

X_DWT(k, n) = 2^{-k/2} Σ_t x(t) h(2^{-k}(n 2^k T − t)),  k, n integers    (5.12)

An elegant way to represent the non-redundant data in the time-frequency plane is to sample the plane on a dyadic (octave) grid. This representation is ideal since it maps the frequency spectrum onto a logarithmic scale similar to that of the ear. The dyadic sampling of the time-frequency plane can be achieved by a series of up/down-sampling operations. This approach gives us a multi-resolution representation, as seen in Figure 5.5.



5.6 Wavelet Packet Tree Representation

The wavelet packet tree can be represented in many ways. Among them the most important to us are the Energy Representation, the Index Representation, and the Filterbank Representation.

5.6.1 Energy Representation

The energy representation of the wavelet decomposition tree displays the energy of each node in the tree. The frequency response of the wavelet plays an important role in how the energy is distributed along the wavelet decomposition tree, which is determined by the cut-off frequency of the wavelet. The pseudocode methodology for calculating the energy of the wavelet tree is as follows:

1) get all coefficients of the parent node

2) calculate the total energy of the parent node

E_Total = Σ_n C_n²,  where C_n are the wavelet coefficients    (5.13)

3) get terminal nodes

4) calculate the energy of the terminal nodes

E_TerminalNode = Σ_n C_n²,  where C_n are the wavelet coefficients of the terminal node


Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of discrete wavelet transform


5) Represent the energy in terms of a percentage:

%E_TerminalNode = 100 × E_TerminalNode / E_Total

Below is the energy representation of a 1kHz signal, analyzed using a Haar wavelet at

depth level 3.

Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal
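The five steps above can be sketched as follows (Python; the coefficient values are hypothetical, and the sketch uses the fact that for an orthonormal wavelet the parent's total energy equals the sum of the terminal-node energies):

```python
import numpy as np

def node_energy(coeffs):
    """Energy of one node: the sum of its squared wavelet coefficients (eq. 5.13)."""
    c = np.asarray(coeffs, dtype=float)
    return float(np.sum(c ** 2))

def energy_percentages(terminal_nodes):
    """Each terminal node's energy as a percentage of the total energy."""
    energies = [node_energy(c) for c in terminal_nodes]
    total = sum(energies)  # equals the parent-node energy for an orthonormal wavelet
    return [100.0 * e / total for e in energies]

# Hypothetical coefficients for two terminal nodes, each with energy 25.
pcts = energy_percentages([[3.0, 4.0], [0.0, 5.0]])
```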

5.6.2 Index Representation

The index representation shows the index value of each node in the tree. The values progress from left to right in ascending order. It is important to note that the order of the index nodes does not change even if one of the nodes is not decomposed into its children. Figure 5.7 is an index representation of a 1kHz signal with node 5 not decomposed.


Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal

This type of representation is important in terms of coding the detection algorithm which

relies on keeping track of the terminal nodes (7,8,9,10,5,13,14).

5.6.3 Filterbank Representation

One can also look at the wavelet tree decomposition as a tree of filter banks where each parent node is split into its low-pass and high-pass sub-bands. These nodes in turn can also act as parent nodes, and so on. Figure 5.8 shows the frequency mapping of the wavelet tree in Figure 5.9.

Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree



Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree

The discrete wavelet packet tree obtains its multi-resolution representation by sampling the time-frequency plane on a dyadic (octave) grid. This is done by down-sampling by a factor of two in the analysis stage and up-sampling by a factor of two in the reconstruction stage. Figure 5.10 is a discrete wavelet packet tree in its analysis stage. Note that the number of samples decreases as the depth of the wavelet tree increases; also, the bandwidth of the filters is halved at each decomposition.

Figure 5.10: Discrete wavelet packet tree (analysis stage)



Chapter 6: Analysis and Results

To estimate tonality it is important to understand the masking phenomenon. With regard to masking we are more concerned with instantaneous masking, or frequency masking. Data suggest that a pure tone and narrow-band noise of equal intensity and equal loudness have different masking abilities. Narrow-band noise in particular is complicated by the fact that a reduction in bandwidth is accompanied by a decrease in the rate of intensity fluctuations [Stevens, 1956]. Bos and de Boer [1966] point out that the slow rate of intensity fluctuations inherent in the structure of narrow-band noise increases its ability to mask. Young and Wenner also indicated that the 20-dB difference between the masking effect of a tone and a narrow critical-band noise disappears when the pure tone is replaced by a tone that is frequency-modulated at a rate of 25 Hz. More research is needed to see how frequency-modulated tones can be used as partial maskers [Hellman, R.P, Harvard University, Cambridge, Massachusetts 02138].

In previous work on tonality estimation [Johnston, J.D, 1988], the spectral flatness measure was used to interpolate between the masking threshold formulas of [Hellman, R.P, Harvard University, Cambridge, Massachusetts 02138] and [Scharf, B, 1970]. The problem arises with the notion of global tonality [Johnston, J.D, 1990]. Signals such as speech have "tonal" parts and "noisy" parts of considerable energy at high frequencies. The resultant unpredictability measure will not show the parts of the signal that are very tonal (due to the fixed block size of the transform). Tonality by definition, in terms of perceptual coding, estimates the amount of masking a signal can achieve based on its type (tonal or noisy). The tonality index, on the other hand, is a global value that characterizes a signal's tonality based on its correlation information. The wavelet packet


analysis is suitable for this task because one can control the bandwidth of the frequency bands, making it possible to detect transients (changes in frequency, attack regions) more accurately.

Our proposed model uses this analysis tool to estimate tonality and uses Fourier

analysis to determine signal levels (SPL) and spread signal levels. A general block

diagram is shown in Figure 6.1

Figure 6.1: General block diagram of the proposed model

6.1 Detection Scheme

The proposed detection scheme relies on the flow of energy, which is analyzed

using the wavelet tree decomposition. Each audio frame is considered to have an energy

value of 100 and is decomposed to the first level as shown in Figure 6.2. The energy

ratios of the child nodes (low-end and high-end) are then calculated and compared to the



parent node; these ratios are then compared to a threshold (in our case 1.0 ≤ ratio < 2.4). Nodes with ratios in this range are further decomposed.

Figure 6.2: level-1 Wavelet Packet Decomposition of a signal having multiple tones (4kHz, 10kHz, 15kHz)

A signal having multiple tones (4kHz, 10kHz, 15kHz) would have a decomposition tree as shown in Figure 6.3, where the marked branches indicate the nodes that the detector has detected.

Figure 6.3: level-3 Wavelet Packet Decomposition of multiple tones (4kHz, 10kHz, 15kHz)

It is important to know the frequency response of the wavelet being used for decomposition, since that determines the cut-off frequency. The energy distribution is determined by the wavelet coefficients the wavelet generates, and in order to map the wavelet decomposition tree onto the frequency axis one must know where it cuts off in frequency. The filterbank representation in the previous chapter is a good way to picture the frequency distribution in the decomposition tree. This thesis uses the simplest case, the Haar (Daubechies 1) wavelet, whose cut-off frequency is half of Nyquist.
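This half-Nyquist cut-off can be checked from the two-tap Haar low-pass filter itself (a Python sketch, not part of the original implementation): evaluating the filter's frequency response at ω = π/2 gives the familiar −3 dB point relative to the DC gain.

```python
import numpy as np

h = np.array([1.0, 1.0]) / np.sqrt(2)   # Haar (Daubechies 1) low-pass analysis filter
w = np.pi / 2                            # half of Nyquist (Nyquist = pi rad/sample)
H = h[0] + h[1] * np.exp(-1j * w)        # two-tap DTFT evaluated at w
gain_db = 20 * np.log10(np.abs(H) / np.sqrt(2))  # gain relative to DC, where |H| = sqrt(2)
```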

6.1.1 Frequency Breakdown

The resolution of the detector is based on how far it is allowed to decompose. The level of decomposition is also known as the depth of the wavelet tree. At present the threshold for the depth is set to 5, which means that the wavelet decomposition can produce 62 nodes. The concept of the frequency breakdown is analogous to how the filterbank (wavelet) splits the energy spectrum. The algorithm checks the energy ratios and splits the node with the highest energy, which means it will split nodes having energies above 95% and will stop splitting nodes below 23%.

Energy residing between these percentage criteria is usually a strong indicator of a tone. For example, a tone present in one of the sub-bands could be detected along its adjacent sub-band as one decomposes a node (decreases the bandwidth); this is usually seen as energy being shifted from an approximation (low-pass) to a detail (high-pass) sub-band or vice versa. It is at this stage that the detector has successfully detected a tone; this is also known as frequency breakdown, as shown in Figure 6.4.
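The split decision described above could be sketched as follows (Python; the exact form of the ratio test is an assumption based on the threshold range 1.0 ≤ ratio < 2.4 and the depth limit of 5 stated in the text):

```python
def should_split(parent_energy: float, child_energy: float, depth: int,
                 max_depth: int = 5, lo: float = 1.0, hi: float = 2.4) -> bool:
    """Decide whether a child node is decomposed further: a parent-to-child
    energy ratio inside [lo, hi) marks a tone candidate, up to max_depth."""
    if depth >= max_depth or child_energy == 0:
        return False                      # depth limit reached, or an empty band
    ratio = parent_energy / child_energy
    return lo <= ratio < hi
```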


6.1.2 Detector Pseudocode Methodology

The detector code methodology is set up in such a way that there are four pointers (AA, AD, DD, DA). Two of the pointers are responsible for keeping track of the offspring nodes of the approximations (low-end: AA, AD) and the other two for the details (high-end: DD, DA). These pointers point to values that contain parent-to-child node energy ratios and are updated whenever a node is split. A variable keeps track of the energy of the child nodes; it ensures that the algorithm picks the highest-energy node for the decomposition tree path. Figure 6.5 is a 2nd-level decomposition tree with pointers (AA, AD, DD, DA) pointing to the corresponding wavelet tree branches.


As nodes are split, the terminal nodes of the decomposition tree are stored. This information is important since one can trace back the path the decomposition tree takes and send the desired nodes to the tonality analyzer. The tracing of the nodes is done by examining the energy difference and the nodes that correspond to it. A simple formula generates the parent node from its child node:

PN = (CN − 1) / 2    (6.1)

A condition is set if the child node is even: it is decremented by one before applying equation (6.1). This is shown in Figure 6.6.

Figure 6.6: level-2 Wavelet Index Tree used to trace the Nodes that are sent to the tonality analyzer
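Equation (6.1), together with the even-child adjustment, can be written directly as a small helper (a Python sketch assuming the root node is indexed 0 and the children of node i are 2i+1 and 2i+2, as in the index tree of Figure 5.7):

```python
def parent_node(child: int) -> int:
    """Parent index via eq. (6.1), PN = (CN - 1) / 2; even-numbered
    children are first decremented by one, as described in the text."""
    if child % 2 == 0:
        child -= 1
    return (child - 1) // 2

# Terminal nodes 13 and 14 share parent 6; nodes 1 and 2 share the root 0.
```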

6.1.3 Detection Process

For an input signal containing three tones (4kHz, 10kHz, 15kHz), the detector first analyzes the signal's mid and high range of frequencies. In other words, nodes (4), (5) and (6) are analyzed, as shown in Figure 6.7.


Figure 6.7: level-2 Wavelet Packet Energy Tree Detector Stage-I: nodes (4), (5) and (6) are analyzed first

It then calculates the energy ratios based on the threshold ratio criterion (1.0 ≤ ratio < 2.4) and splits the nodes that meet this criterion. The decomposition tree stops splitting once it reaches a depth of 5; this condition in turn defines the frequency resolution of the wavelet decomposition tree. This is seen in Figure 6.8, where the green lines are the nodes selected by the detector for synthesis or reconstruction. It is these nodes that are passed on to the node reconstructor.

Figure 6.8: level-4 Wavelet Packet Energy Tree Detector Stage-II: nodes (4), (5) and (6) are analyzed first; green lines represent the nodes that are going to be analyzed by the tonality analyzer


Once the decompositions of the mid and high range frequencies (nodes 4, 5, 6) are done, the algorithm further analyzes and decomposes the low frequencies (node 3) using a new threshold criterion (1.0 ≤ ratio) to ensure better resolution in the low frequency range, as shown in Figure 6.9.

Figure 6.9: level-4 Wavelet Packet Energy Tree Detector Stage-III: node (3) is analyzed; green lines represent the nodes that are going to be analyzed by the tonality analyzer

6.2 Node Reconstruction

In Figure 6.9 the green lines represent the nodes that are selected for reconstruction. This process involves applying the inverse discrete wavelet transform to the selected wavelet coefficients (selected nodes) along the wavelet tree. The index values of these nodes are then mapped along the frequency axis based on the bandwidth of each node. Among the selected nodes, the correlation information of the first two nodes, or the nodes with the highest energy, is used to estimate tonality, as shown in Figure 6.10; this is due to the integrity of information these nodes contain.


Figure 6.10 Wavelet Energy Tree: The white-arrows showing the two nodes used to calculate our tonality

The correlation information is obtained by performing the auto-correlation function. One might ask why the auto-correlation function is used here rather than a single fixed-lag correlation measure: the auto-correlation function varies the lag (the step size), thus giving us more information about the reconstructed signal. The core concept of estimating tonality relies on how much the reconstructed nodes are correlated with their parent. This information is then passed on to the tonality estimator discussed in section 6.3.
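A sample auto-correlation of the kind used here can be sketched as follows (Python, not part of the original implementation; the lag-0 normalization matches the ACF plots in the next section, and the tone frequency of 0.05 cycles/sample, i.e. a 20-sample period, is arbitrary):

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Biased sample auto-correlation, normalized so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")   # lags -(N-1) .. N-1
    acf = full[full.size // 2:]              # keep non-negative lags only
    return acf[: max_lag + 1] / acf[0]

# A pure tone keeps near-unit peaks at multiples of its period;
# for white noise the same curve collapses after lag 0.
t = np.arange(1024)
tone_acf = sample_acf(np.sin(2 * np.pi * 0.05 * t))
```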

6.3 Tonality Estimation

Once the desired nodes are reconstructed (explained in Chapter 5) they are sent to the tonality estimator. It is here that the tonality estimator decides whether the input audio frame has tone-like or noise-like characteristics. These characteristics are accurately determined by the auto-covariance of the auto-correlation function. To justify this process, let us look at the characteristics of the auto-correlation function.


6.3.1 Auto-Correlation Function

A pure tone has periodic peaks in its auto-correlation function. These peaks decrease in amplitude as the amount of lag (amount of overlap) is increased. Beyond a lag of 10 the correlation information repeats itself, just like a periodic signal, as shown in Figure 6.11.

Figure 6.11: Auto-correlation Function of a Pure Tone

Conversely, the auto-correlation function of a random (noise-like) process does not contain periodic peaks, since there is no correlation information contained in the signal, as shown in Figure 6.12 for white noise and Figure 6.13 for bandlimited noise.



Figure 6.12: Auto-correlation Function of White Noise

Figure 6.13: Auto-correlation Function of Band limited Noise (0-22kHz)

Generally, the auto-correlation function tells us how our random signal or process changes as a function of time. It also determines whether our process has periodic components or random ones. So, if the signal were tone-like it



would have characteristics similar to Figure 6.11, and if noise-like it would have characteristics similar to Figures 6.12 and 6.13. A signal in between these characteristics will exhibit a variation in the amplitude of its peaks. To get an estimate of how these auto-correlation peaks change in time we take the auto-covariance of our generated auto-correlation function, because by definition the auto-covariance of a random process tells us to what extent its values co-vary [Garcia, A.L, 1994].

It is important to note that the auto-covariance is similar to the auto-correlation except that the effects of the means are removed. They are mathematically related by the following relations:

R_XX[n, n+m] = E[x[n] x[n+m]]  if m ≠ 0;  R_XX[n, n] = E[x²[n]]  if m = 0    (6.2)

C_XX[n, n+m] = R_XX[n, n+m] − E[x[n]] E[x[n+m]]    (6.3)

where R_XX is the auto-correlation function and C_XX is the auto-covariance.
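Relation (6.3) can be verified numerically on sample estimates (a Python sketch; the mean, lag, and sample size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5000)  # random process with a nonzero mean
m = 3                                           # lag

a, b = x[:-m], x[m:]
r = np.mean(a * b)                              # sample auto-correlation R_XX[n, n+m]
c = np.mean((a - a.mean()) * (b - b.mean()))    # sample auto-covariance C_XX[n, n+m]
# Eq. (6.3): the covariance is the correlation with the mean products removed.
```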

6.3.2 Auto-Covariance

The nodes that are most important to us are the top two nodes of the selected path. It is from these nodes that we estimate our tonality, as discussed in section 6.2. In Figure 6.14 the blue lines represent the nodes that were used to calculate tonality for a signal having pure-tone characteristics. These types of signals are analyzed using the Type-I analysis technique.



Figure 6.14: Energy Tree, where the blue lines represent the nodes from which the tonality value is calculated

6.3.3 Type-I Analysis

Different areas of the auto-covariance plot are exploited for the estimation of tonality. In the pure tone case (Type-I analysis) the difference in the maximum values of the selected nodes is our tonality estimate, as shown in Figure 6.15.


Figure 6.15: A 4kHz tone with selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]
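A Type-I estimate along these lines could be sketched as follows (Python; the exact lag range and normalization are assumptions, since the thesis does not list them): the auto-covariance of each node's ACF is computed, and the difference of the two maxima is taken.

```python
import numpy as np

def autocov_of_acf(x, max_lag=20):
    """Auto-covariance sequence of the sample ACF of x (lag-0 normalized, mean removed)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    full = np.correlate(x, x, mode="full")
    acf = full[full.size // 2:][: max_lag + 1]  # non-negative lags of the ACF
    acf = acf / acf[0]                          # normalize so lag 0 equals 1
    acf = acf - acf.mean()                      # covariance: remove the mean of the ACF
    cov = np.correlate(acf, acf, mode="full")
    return cov[cov.size // 2:]                  # non-negative lags only

def type1_tonality(node_a, node_b):
    """Type-I estimate: difference of the maxima of the two nodes' auto-covariances."""
    return float(np.max(autocov_of_acf(node_a)) - np.max(autocov_of_acf(node_b)))
```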


It is interesting to see that as we add noise to our 4kHz signal, the peak difference varies. This variation is directly proportional to the masking effect of the noise over the tone: as the peak difference (tonality value) decreases, the effect of noise masking the tone increases. When white noise of power −0.9 dB is added to the 4kHz tone, the auto-covariance plot (Figure 6.16 b) has a peak difference of 0.53922.

Figure 6.16: A 4kHz tone with -0.9dB white-noise added, selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]

6.3.4 Type-II Analysis

Type-II analysis applies when the input signal has noise-like characteristics, for example the snare crash shown in Figure 6.17.



Figure 6.17: A Snare Crash

The size of the auto-covariance side-lobe plays a vital role in estimating the tonality value. It is the difference between the max and min points of the first 10 points that gives us an estimated tonality value. Figure 6.18 shows three cases: white noise, band-limited noise, and a 1kHz pure tone.

Figure 6.18a: Auto-Covariance of White-Noise



Figure 6.18b: Auto-Covariance of Band-limited 0-22kHz Noise


Figure 6.18c: Auto-Covariance of Pure-Tone (1kHz)
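The Type-II measure, the spread of the first 10 points of the auto-covariance, could be sketched as follows (Python; the input values are hypothetical):

```python
import numpy as np

def type2_tonality(autocov, n_points=10):
    """Type-II estimate: spread (max minus min) of the first n points of the
    auto-covariance sequence; large for tonal frames, small for noise-like ones."""
    seg = np.asarray(autocov, dtype=float)[:n_points]
    return float(seg.max() - seg.min())

# A strongly oscillating auto-covariance (tone-like) gives a large spread;
# the value at index 10 lies outside the first 10 points and is ignored.
spread = type2_tonality([1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0])
```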


It is important to note that band-limiting the noise defines the side-lobes, but they are not as well defined as those of a pure tone. Figure 6.19 shows Type-II analysis (side-lobe auto-covariance) applied to the snare crash example.

Figure 6.19: A Snare Crash analysis (first frame): (a) Wavelet Tree, (b) Auto-Covariance

The last frame of the snare crash converges to a pure-tone auto-covariance, as shown in Figure 6.20.


Figure 6.20: Snare Crash (Last Frame) Auto-Covariance



Interestingly, when observing the snare crash, the tonality estimator switches from Type-II analysis to Type-I analysis, displaying its accuracy in detecting the attack of the snare hit (noise characteristics), which later becomes tonal. The tonality estimator switches between these two analysis techniques as the behavior of the input signal changes, and it compiles the overall tonality index. The index is later mapped to the frequency domain.

6.4 Tonality Index (Time-Domain)

Figure 6.21 shows the tonality index of a test signal (a train of a 1kHz tone followed by noise of power -20dB) in the time domain. It confirms the on/off state of the tone detector, giving a value of 1 for tone-like behavior and a value of 0 for noise-like behavior. The x-axis is the number of frames (of size 1024 with 50% overlap) and the y-axis shows the tonality index. The band-limited noise is generated by applying a low-pass filter (Fstop = 22050 Hz, Fpass = 9600 Hz, fs = 44100 Hz). The green line is the depth of the wavelet tree and the blue line is the tonality index.
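For reference, a signal of this kind can be built in a few lines. The Python sketch below (an illustrative stand-in for the thesis's MATLAB code) concatenates a 1 kHz tone with band-limited noise, using a simple windowed-sinc FIR low-pass in place of the thesis's designed filter, then frames the result with frame size 1024 and 50% overlap; the filter length, segment durations, and RNG seed are arbitrary choices.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs                               # one second per segment
tone = np.sin(2 * np.pi * 1000 * t)                  # 1 kHz tone

rng = np.random.default_rng(1)
noise = rng.standard_normal(fs) * 10 ** (-20 / 20)   # noise at -20 dB

# Band-limit the noise with a windowed-sinc FIR low-pass whose cut-off
# sits at the stated passband edge (Fpass = 9600 Hz).
fc, taps = 9600 / fs, 255
h = np.sinc(2 * fc * (np.arange(taps) - (taps - 1) / 2)) * np.hamming(taps)
h /= h.sum()
bl_noise = np.convolve(noise, h, mode="same")

signal = np.concatenate([tone, bl_noise])

# Frame with size 1024 and 50% overlap, as in the thesis.
N, hop = 1024, 512
frames = np.lib.stride_tricks.sliding_window_view(signal, N)[::hop]
```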


Figure 6.21: Tonality Index (Time-Domain) with Input Signal consisting of 1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB


The time-domain plot of the test signal (a train of a 1kHz tone followed by noise of power -20dB) is shown in Figure 6.22.

Figure 6.22: Time-Domain plot of test signal (1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB)

The test signal shown in Figure 6.23 is a train of white noise (power -20dB), a 1kHz tone, and band-limited noise (power -0.9dB). Figure 6.24 is the time-domain representation of the signal in Figure 6.23.

Figure 6.23: Tonality Index (Time-Domain) with Input Signal consisting of white noise (power -20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)



Figure 6.24: Time-Domain plot of test signal of white noise (power -20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)

Observing Figure 6.23, it is clear that the detector is tolerant to different noise powers. The tonality index is also not affected by band-limited noise.

6.5 Tonality Index (Frequency-Domain)

Once the wavelet tree is generated, the selected nodes are used to trace back to the parent nodes (as described in Figure 6.6). The trace stops once the nodes reach node 1 or node 2. These nodes are then mapped to the frequency domain along with their calculated tonality. The mapping is done by looking at the index value of each node and assigning it a frequency value based on the bandwidth to which the node corresponds. For simplicity, the chosen cut-off frequency is exactly half the sub-band's bandwidth. Figure 6.25 illustrates this with nodes up to 14.
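The node-to-frequency mapping can be sketched as follows. One subtlety worth noting is that wavelet-packet nodes in natural (filter-bank) order are permuted relative to frequency order by a Gray code, because decimation spectrally inverts the high-pass branch. The function below is an illustrative Python sketch under that assumption, not the thesis code; the returned midpoint reflects the half-bandwidth cut-off simplification described above.

```python
def wp_node_to_band(depth, pos, fs=44100):
    """Map a wavelet-packet node (natural filter-bank order `pos` at
    level `depth`) to its frequency band.  Natural order relates to
    frequency order by a Gray-code permutation."""
    freq_pos = pos ^ (pos >> 1)          # Gray code: natural -> frequency order
    bw = (fs / 2) / (2 ** depth)         # bandwidth of one node at this depth
    lo = freq_pos * bw
    return lo, lo + bw, lo + bw / 2      # (low edge, high edge, midpoint)

# Depth-2 nodes in natural order 0..3 cover frequency bands 0, 1, 3, 2:
print(wp_node_to_band(2, 3))   # → (11025.0, 16537.5, 13781.25)
```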



Figure 6.25: Frequency Map of the Wavelet Tree: the red arrows represent the generated path, an array of nodes from which the last node values are taken (blue lines) for mapping.

6.5.1 Comparison with Model 2

The frequency axis is further mapped into threshold calculation partitions, whose widths are roughly 1/3 of a critical band. The partition values are listed in Appendix-I [MPEG standard, ISO 11172-3]. Figure 6.26a shows the tonality index of Model 2 and Figure 6.26b shows the tonality index of the proposed model.


Figure 6.26a: Tonality Index – Model 2 (1kHz)

Figure 6.26b: Tonality Index – Proposed Model (1kHz)



The error seen from partition zero to partition ten in Figure 6.26b is due to the nodes sent for frequency mapping: rather than the single node corresponding to 1 kHz, the detector keeps sending every node that has high energy. This can be solved by implementing a better detection algorithm that retains knowledge of the energy path and sends only the specific terminal node.

According to Table 3D-b in the MPEG standard, a 1 kHz signal lies in partition 26, which is seen in Figure 6.26b.
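The partition numbers quoted in this section can be checked directly against the table reproduced in Appendix-I. The small Python sketch below (assuming, as the table's line indices imply, a 2048-point FFT at 44.1 kHz) maps a frequency to its threshold calculation partition using only the table's whigh column:

```python
# Upper FFT line (whigh) of each of the 57 partitions, copied from
# Table 3-D.3b in Appendix-I (2048-point FFT at fs = 44.1 kHz).
WHIGH = list(range(1, 17)) + [
    19, 22, 25, 28, 31, 34, 37, 40, 44, 48, 52, 56, 60, 64, 69, 74,
    80, 86, 93, 100, 108, 116, 124, 134, 144, 155, 166, 177, 192, 207,
    222, 243, 264, 286, 314, 342, 371, 401, 431, 469, 513,
]

def partition_of(f_hz, fs=44100, n=2048):
    """Return the threshold calculation partition containing f_hz."""
    line = round(f_hz / (fs / n))            # nearest FFT line
    for p, hi in enumerate(WHIGH, start=1):
        if line <= hi:
            return p
    raise ValueError("frequency above the last partition")

print(partition_of(1000))   # → 26
print(partition_of(4000))   # → 45
print(partition_of(6000))   # → 50
```

The three printed values reproduce the partitions cited in the text for the 1 kHz, 4 kHz, and 6 kHz test signals.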

Figure 6.27a shows the tonality index of a 4 kHz signal, which lies in partition 45 according to Table 3D-b of MPEG standard ISO 11172-3, and Figure 6.27b shows the tonality index of the proposed model.

Figure 6.27a: Tonality Index – Model 2 (4kHz)



Figure 6.27b: Tonality Index – Proposed Model (4kHz)

Figure 6.28a shows the tonality index of a 6 kHz signal, which lies in partition 50 according to Table 3D-b of MPEG standard ISO 11172-3, and Figure 6.28b shows the tonality index of the proposed model.


Figure 6.28a: Tonality Index – Model 2 (6kHz)


Figure 6.28b: Tonality Index – Proposed Model (6kHz)

Based on these results, the proposed model performs well compared to the tonality measure in psychoacoustic Model 2. The error in the tonality index is due to the depth constraints set during the wavelet decomposition: a depth constraint of 5 limits the decomposition and frequency resolution of the discrete wavelet packet tree, which carries over to the frequency mapping.

One must be careful in scaling the energy of the nodes, as the energy decreases on decimation. This scaling is taken into consideration by representing the energy of the nodes as a percentage value.
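The percentage normalization can be sketched in a few lines of Python (an illustrative stand-in for the MATLAB implementation): each node's energy is expressed as a share of the frame total, so the sample loss caused by decimation at deeper levels does not bias comparisons between nodes at different depths.

```python
import numpy as np

def node_energy_percent(node_signals):
    """Express each node's energy as a percentage of the frame total,
    so nodes at different tree depths remain comparable (sketch)."""
    e = np.array([np.sum(np.square(x)) for x in node_signals], dtype=float)
    return 100.0 * e / e.sum()

# Two hypothetical terminal nodes with energies 2 and 4:
nodes = [np.array([1.0, 1.0]), np.array([1.0, -1.0, 1.0, -1.0])]
print(node_energy_percent(nodes))
```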



Chapter 7 Conclusions and Recommendations

The correct estimation of tonality is vital for a perceptual audio coder, since it allows the quantization noise to be optimally shaped for an arbitrary audio signal. Optimality here means removing as much redundant data as possible from the digital representation of the audio signal while maintaining perceptual transparency or, in the low-bit-rate case, minimizing the perceptual disturbance caused by increased quantization noise.

Optimal noise shaping is achieved by considering the masking model of the human auditory system, which identifies the components of a signal that are relevant for encoding. For this reason, the masking properties of tonal and non-tonal (noise) signals are important for coding a signal with high efficiency.

The tonality classification algorithms of the MPEG standard are confined by the block constraints of the Fourier transform, which leads us to the discrete wavelet transform. Used in a tree structure, the discrete wavelet transform gives us the flexibility to vary the time-frequency resolution, enabling a mapping of the frequency spectrum similar to that of the basilar membrane of the ear.

In our investigation of classifying tonality using wavelet packet analysis, several interesting discoveries were made. First, it seems logical to base tonality on the correlation information of the audio frame rather than on a prediction scheme. The two types of analysis (Type-I and Type-II) are good estimates of tonality whether the input is noise-like or tone-like. It should be noted that the number of samples decreases as a node is decomposed; thus, calculating the tonality estimate from the first and second decompositions was appropriate.


Also, it was found that lower-order wavelets with a poor cut-off performed better at detecting tones than higher-order wavelets with a sharp cut-off: since the detection scheme relies on energy, attenuating it would only lead to inaccurate readings of tonality.

The flaws of the proposed tonality estimator are apparent when the detector decides which nodes are to be sent for frequency mapping. The detector sends all nodes that have high energy to the node frequency mapping module, rather than sending only the terminal node after the detection process.

The concept of splitting nodes based on their energy ratio is an effective method for detecting tones, but to work perfectly it requires an additional variable that retains knowledge of the energy path and sends only the specific terminal node for frequency mapping. Comparing energy differences of the terminal nodes should also be added to the detector's detection scheme; this is necessary when two tones lie in the same sub-band.

On the whole, it can be concluded that our tonality estimation using wavelet packet analysis performs well compared to the one proposed in the MPEG standard ISO 11172-3. With an accurate mapping of the wavelets' frequency response and an improved detection scheme, this analysis technique can prove its worth.


References:

1) Oppenheim, A.V. & Willsky, A.S., "Signals and Systems", 2nd ed., Prentice Hall Signal Processing Series

2) Selik, M. & Baraniuk, R., "The Impulse Function", Connexions

3) Calvert, J.B., "Time and Frequency Domain"

4) Langton, C., "Hilbert Transform, Analytic Signal and the Complex Envelope", Signal Processing & Simulation Newsletter

5) Vaidyanathan, P.P., "Multirate Systems and Filter Banks", Pearson Education

6) Hitachi Denshi, Inc., Operation Manual, Model V-1050F Oscilloscope

7) Dynascan Corporation, Instruction Manual, Function Generator, B&K Precision 3010

8) Weisstein, E.W. et al., "Orthonormal Basis", from MathWorld

9) Coifman, R.R. & Wickerhauser, M.V., "Entropy-Based Algorithms for Best Basis Selection"

10) Matlab Documentation, "Wavelet Packet Analysis"

11) Hellman, R.P., "Asymmetry of masking between noise and tone", Harvard University, Cambridge, Massachusetts 02138

12) Johnston, J.D. & Brandenburg, K., "Second Generation Perceptual Audio Coding: The Hybrid Coder"

13) Johnston, J.D., "Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6 (1988), pp. 314-323

14) Scharf, B., Chapter 5 of "Foundations of Modern Auditory Theory", New York, Academic Press, 1970

15) Garcia, A.L., "Probability and Random Processes for Electrical Engineering", 2nd ed.

16) Ferreira, A.J.S., "Tonality Detection in Perceptual Coding of Audio", AT&T Bell Laboratories, New Jersey, USA

17) Zwicker, E. & Fastl, H., "Psychoacoustics: Facts and Models", Springer-Verlag, 1990


18) Blauert, J., "Spatial Hearing", The MIT Press, 1993

19) Moore, B.C.J., "An Introduction to the Psychology of Hearing", Academic Press, 1982

20) Jayant, N.S. & Noll, P., "Digital Coding of Waveforms", Prentice-Hall, 1984

21) Wickerhauser, M.V. & Coifman, R.R., "Entropy-Based Algorithms for Best Basis Selection"

22) Dubnov, S., "Generalization of Spectral Flatness Measure for Non-Gaussian Linear Processes"

23) Johnston, J.D., "Estimation of Perceptual Entropy Using Noise Masking Criteria"

24) Learned, R.E., Karl, W.C. & Willsky, A.S., "Wavelet Packet Based Transient Signal Classification"

25) Chen, Y.L., Ching, C.H. & Lin, K.W., "Robust Block Switching Decision for Transform-based Audio Coder"

26) Erne, M., Moschytz, G. & Faller, C., "Best Wavelet-Packet Bases for Audio Coding using Perceptual and Rate-Distortion Criteria"

27) Bosi, M. & Goldberg, R., "Introduction to Digital Audio Coding and Standards"

28) Johnston, J.D., United States Patent No. 5,267,938

29) MPEG Standard, ISO 11172-3

30) Polikar, R., "The Story of Wavelets", Rowan University


Appendix-I

Table 3-D.3b: Calculation Partition Table. This table is valid at a sampling rate of 44.1 kHz.

Index  wlow  whigh   bval  minval   TMN
    1     1      1   0.00    0.0   24.5
    2     2      2   0.43    0.0   24.5
    3     3      3   0.86    0.0   24.5
    4     4      4   1.29   20.0   24.5
    5     5      5   1.72   20.0   24.5
    6     6      6   2.15   20.0   24.5
    7     7      7   2.58   20.0   24.5
    8     8      8   3.01   20.0   24.5
    9     9      9   3.45   20.0   24.5
   10    10     10   3.88   20.0   24.5
   11    11     11   4.28   20.0   24.5
   12    12     12   4.67   20.0   24.5
   13    13     13   5.06   20.0   24.5
   14    14     14   5.42   20.0   24.5
   15    15     15   5.77   20.0   24.5
   16    16     16   6.11   17.0   24.5
   17    17     19   6.73   17.0   24.5
   18    20     22   7.61   15.0   24.5
   19    23     25   8.44   10.0   24.5
   20    26     28   9.21    7.0   24.5
   21    29     31   9.88    7.0   24.5
   22    32     34  10.51    4.4   25.0
   23    35     37  11.11    4.5   25.6
   24    38     40  11.65    4.5   26.2
   25    41     44  12.24    4.5   26.7
   26    45     48  12.85    4.5   27.4
   27    49     52  13.41    4.5   27.9
   28    53     56  13.94    4.5   28.4
   29    57     60  14.42    4.5   28.9
   30    61     64  14.86    4.5   29.4
   31    65     69  15.32    4.5   29.8
   32    70     74  15.79    4.5   30.3
   33    75     80  16.26    4.5   30.8
   34    81     86  16.73    4.5   31.2
   35    87     93  17.19    4.5   31.7
   36    94    100  17.62    4.5   32.1
   37   101    108  18.05    4.5   32.5
   38   109    116  18.45    4.5   32.9
   39   117    124  18.83    4.5   33.3
   40   125    134  19.21    4.5   33.7
   41   135    144  19.60    4.5   34.1
   42   145    155  20.00    4.5   34.5
   43   156    166  20.38    4.5   34.9
   44   167    177  20.74    4.5   35.2
   45   178    192  21.12    4.5   35.6
   46   193    207  21.48    4.5   36.0
   47   208    222  21.84    4.5   36.3
   48   223    243  22.20    4.5   36.7
   49   244    264  22.56    4.5   37.1
   50   265    286  22.91    4.5   37.4
   51   287    314  23.26    4.5   37.8
   52   315    342  23.60    4.5   38.1
   53   343    371  23.95    4.5   38.4
   54   372    401  24.30    4.5   38.8
   55   402    431  24.65    4.5   39.1
   56   432    469  25.00    4.5   39.5
   57   470    513  25.33    3.5   39.8


Appendix-II

Matlab Files:

1) Main Function (Encoder)

% Vaibhav Chhabra
% Thesis - Main Function (Encoder)
% Last Modified: 04/12/2005
%
% This program sets up all the variables and framework to encode and decode
% the audio stream. The function CalculateAll (Psychoacoustic Model) calls
% modules that perform the SMR calculation.
function new_codec()
clear all; clc;
global FRAMES frame_count s iblen_index iblen r f Fs earlyblock prevblock SMR_interp

scalebits = 4;
bitrate = 128000;
N = 2048;                                   % frame length
original_filename = sprintf('Sound 6.wav');
coded_filename    = sprintf('encoded_file.enc');
decoded_filename  = sprintf('decoded_file.wav');
[Y,Fs,NBITS] = wavread(original_filename);
tone = Y;
num_subbands   = floor(fftbark(N/2,N/2,Fs))+1;
bits_per_frame = floor(((bitrate/Fs)*(N/2)) - (scalebits*num_subbands));

% Enframe Audio
tonality_index = 1;
un_index = 1;
FRAMES = enframe(tone,N,N/2);
r = zeros(3,1024);
f = zeros(3,1024);
earlyblock = [];
prevblock  = [];
s = zeros(1,512);
iblen = 512;

% Write File Header
fid = fopen(coded_filename,'w');
fwrite(fid, Fs, 'ubit16');                  % Sampling Frequency
fwrite(fid, N, 'ubit12');                   % Frame Length
fwrite(fid, bitrate, 'ubit18');             % Bit Rate
fwrite(fid, scalebits, 'ubit4');            % Number of Scale Bits per Sub-Band
fwrite(fid, length(FRAMES(:,1)), 'ubit26'); % Number of frames

% Computations
for frame_count = 1:length(FRAMES(:,1))
    if (mod(frame_count,2) == 0) | (mod(frame_count,2) == 1)
        outstring = sprintf('NOW ENCODING FRAME %i of %i', frame_count, length(FRAMES(:,1)));
        disp(outstring);


    end
    fft_frame = fft(FRAMES(frame_count,:));
    if fft_frame == zeros(1,N)
        Gain      = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
        bit_alloc = zeros(1,floor(fftbark(N/2,N/2,Fs))+1);
    else
        for iblen_index = 0:1
            s = FRAMES(frame_count,((iblen_index*iblen)+1:(iblen_index*iblen)+iblen));
            CalculateAll()
        end                                 % End Main Loop
        New_FFT2 = SMR_interp;
        if frame_count == 25
            figure;
            semilogx([0:(Fs/2)/(N/2):Fs/2-1],New_FFT2);
            title('SMR'); xlabel('Frequency'); ylabel('dB')
            figure;
            stem(allocate(New_FFT2,bits_per_frame,N,Fs));
            title('Bits perceptually allocated'); xlabel('Critical Bands'); ylabel('Bits Allocated')
        end
        bit_alloc = allocate(New_FFT2,bits_per_frame,N,Fs);
        [Gain,Data] = p_encode(mdct(FRAMES(frame_count,:)),Fs,N,bit_alloc,scalebits);
    end                                     % end of computations

    % Write Audio Data to File
    qbits = sprintf('ubit%i', scalebits);
    fwrite(fid, Gain, qbits);
    fwrite(fid, bit_alloc, 'ubit4');
    for i = 1:25
        indices = find((floor(fftbark([1:N/2],N/2,Fs))+1)==i);
        qbits = sprintf('ubit%i', bit_alloc(i)); % bits(floor(fftbark(i,framelength/2,48000))+1)
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            fwrite(fid, Data(indices(1):indices(end)), qbits);
        end
    end
end                                         % end of frame loop
fclose(fid);

% RUN DECODER
disp('Decoding...');
p_decode(coded_filename,decoded_filename);
disp('Okay, all done!');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFTBARK                    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b = fftbark(bin,N,Fs)
% b=fftbark(bin,N,Fs)
% Converts fft bin number to bark scale


% N is the fft length
% Fs is the sampling frequency
f = bin*(Fs/2)/N;
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ENFRAME                    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function f = enframe(x,win,inc)
%ENFRAME split signal up into (overlapping) frames: one per row. F=(X,WIN,INC)
%
% F = ENFRAME(X,LEN) splits the vector X up into
% frames. Each frame is of length LEN and occupies
% one row of the output matrix. The last few frames of X
% will be ignored if its length is not divisible by LEN.
% It is an error if X is shorter than LEN.
%
% F = ENFRAME(X,LEN,INC) has frames beginning at increments of INC
% The centre of frame I is X((I-1)*INC+(LEN+1)/2) for I=1,2,...
% The number of frames is fix((length(X)-LEN+INC)/INC)
%
% F = ENFRAME(X,WINDOW) or ENFRAME(X,WINDOW,INC) multiplies
% each frame by WINDOW(:)

% Copyright (C) Mike Brookes 1997
% Last modified Tue May 12 13:42:01 1998
% VOICEBOX home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You can obtain a copy of the GNU General Public License from
% ftp://prep.ai.mit.edu/pub/gnu/COPYING-2.0 or by writing to
% Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

nx = length(x);
nwin = length(win);
if (nwin == 1)
    len = win;
else
    len = nwin;
end
if (nargin < 3)
    inc = len;
end


nf = fix((nx-len+inc)/inc);
f = zeros(nf,len);
indf = inc*(0:(nf-1)).';
inds = (1:len);
f(:) = x(indf(:,ones(1,len))+inds(ones(nf,1),:));
if (nwin > 1)
    w = win(:)';
    f = f .* w(ones(nf,1),:);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SCHROEDER                  %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function m = Schroeder(freq,spl,downshift)
% Calculate the Schroeder masking spectrum for a given frequency and SPL
N = 2048;
f_kHz = [1:48000/N:48000/2];
f_kHz = f_kHz/1000;
A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4;
f_Hz = f_kHz*1000;
% Schroeder Spreading Function
dz = bark(freq) - bark(f_Hz);
mask = 15.81 + 7.5*(dz+0.474) - 17.5*sqrt(1 + (dz+0.474).^2);
New_mask = (mask + spl - downshift);
m = New_mask;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BARK                       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function b = bark(f)
% b=bark(f)
% Converts frequency to bark scale
% Frequency should be specified in Hertz
b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ALLOCATE                   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x = allocate(y,b,N,Fs)
% x=allocate(y,b,N)
% Allocates b bits to the 25 subbands
% of y (a length N/2 MDCT, in dB SPL)
bits(floor(bark( (Fs/2)*[1:N/2]/(N/2) )) +1) = 0;
for i = 1:N/2
    bits(floor(bark( (Fs/2)*i/(N/2) )) +1) = max(bits(floor(bark( (Fs/2)*i/(N/2) )) +1), ceil( y(i)/6 ));
end


indices = find(bits(1:end) < 2);
bits(indices(1:end)) = 0;

% NEED TO CALCULATE SAMPLES PER SUBBAND
n = 0:N/2-1;
f_Hz  = n*Fs/N;
f_kHz = f_Hz / 1000;
A_f = 3.64*f_kHz.^-.8 - 6.5*exp(-.6*(f_kHz-3.3).^2) + 1e-3*f_kHz.^4;  % *** Threshold in Quiet
z = 13*atan(0.76*f_kHz) + 3.5*atan((f_kHz/7.5).^2);                   % *** bark frequency scale
crit_band = floor(z)+1;
num_crit_bands = max(crit_band);
num_crit_band_samples = zeros(num_crit_bands,1);
for i = 1:N/2
    num_crit_band_samples(crit_band(i)) = num_crit_band_samples(crit_band(i)) + 1;
end
x = zeros(1,25);
bitsleft = b;
[blah,i] = max(bits);
while bitsleft > num_crit_band_samples(i)
    [blah,i] = max(bits);
    x(i) = x(i) + 1;
    bits(i) = bits(i) - 1;
    bitsleft = bitsleft - num_crit_band_samples(i);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% P_ENCODE                   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Quantized_Gain,quantized_words] = p_encode(x2,Fs,framelength,bit_alloc,scalebits)
for i = 1:floor(fftbark(framelength/2,framelength/2,Fs))+1
    indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
    Gain(i) = 2^(ceil(log2((max(abs(x2(indices(1):indices(end))+1e-10))))));
    if Gain(i) < 1
        Gain(i) = 1;
    end
    x2(indices(1):indices(end)) = x2(indices(1):indices(end)) / (Gain(i)+1e-10);
    Quantized_Gain(i) = log2(Gain(i));
end
for i = 1:length(x2)
    quantized_words(i) = midtread_quantizer(x2(i), max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0)+1e-10); % 03/20/03
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MIDTREAD_QUANTIZER         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_quantizer(x,R)
Q = 2 / (2^R - 1);


q = quant(x,Q);
s = q < 0;
ret_value = uint16(abs(q)./Q + s*2^(R-1));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MIDTREAD_DEQUANTIZER       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_dequantizer(x,R)
sign = (2 * (x < 2^(R-1))) - 1;
Q = 2 / (2^R - 1);
x_uint = uint32(x);
x = bitset(x_uint,R,0);
x = double(x);
ret_value = sign * Q .* x;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% P_DECODE                   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fs = p_decode(coded_filename,decoded_filename)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% READ FILE HEADER           %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fid = fopen(coded_filename,'r');
Fs          = fread(fid,1,'ubit16'); % Sampling Frequency
framelength = fread(fid,1,'ubit12'); % Frame Length
bitrate     = fread(fid,1,'ubit18'); % Bit Rate
scalebits   = fread(fid,1,'ubit4');  % Number of Scale Bits per Sub-Band
num_frames  = fread(fid,1,'ubit26'); % Number of frames

for frame_count = 1:num_frames

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % READ FILE CONTENTS         %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    qbits = sprintf('ubit%i', scalebits);
    gain = fread(fid,25,qbits);
    bit_alloc = fread(fid,25,'ubit4');
    for i = 1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            qbits = sprintf('ubit%i', bit_alloc(i));
            InputValues(indices(1):indices(end)) = fread(fid, length(indices), qbits);
        else
            InputValues(indices(1):indices(end)) = 0;
        end
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % DEQUANTIZE VALUES          %


    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i = 1:length(InputValues)
        if InputValues(i) ~= 0
            if max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0) ~= 0
                InputValues(i) = midtread_dequantizer(InputValues(i), ...
                    max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0));
            end
        end
    end
    for i = 1:25
        gain2(i) = 2^gain(i);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % APPLY GAIN                 %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i = 1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        InputValues(indices(1):indices(end)) = InputValues(indices(1):indices(end)) * gain2(i);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % INVERSE MDCT               %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    x2((frame_count-1)*framelength+1:frame_count*framelength) = imdct(InputValues(1:framelength/2));
end
status = fclose(fid);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% RECOMBINE FRAMES           %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x3 = zeros(1,(length(x2)-1)/2+1);
for i = 0:0.5:floor(length(x2)/(2*framelength))-1
    x3(i*framelength+1 : (i+1)*framelength) = x3(i*framelength+1 : (i+1)*framelength) + ...
        x2((2*i)*framelength+1 : (2*i+1)*framelength);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% WRITE FILE                 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
wavwrite(x3/2,Fs,decoded_filename);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% MDCT                       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = mdct(x)
x = x(:);
N = length(x);
n0 = (N/2+1)/2;
wa = sin(([0:N-1]'+0.5)/N*pi);


y = zeros(N/2,1);
x = x .* exp(-j*2*pi*[0:N-1]'/2/N) .* wa;
X = fft(x);
y = real(X(1:N/2) .* exp(-j*2*pi*n0*([0:N/2-1]'+0.5)/N));
y = y(:);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% IMDCT                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = imdct(X)
X = X(:);
N = 2*length(X);
ws = sin(([0:N-1]'+0.5)/N*pi);
n0 = (N/2+1)/2;
Y = zeros(N,1);
Y(1:N/2) = X;
Y(N/2+1:N) = -1*flipud(X);
Y = Y .* exp(j*2*pi*[0:N-1]'*n0/N);
y = ifft(Y);
y = 2*ws .* real(y .* exp(j*2*pi*([0:N-1]'+n0)/2/N));

function CalculateAll()
% Calculate All Variables
global iblen Fs earlyblock prevblock newblock r f
global Y s r_hat f_hat cw e cb en cbb spreadplot epart npart
global tbb SNRb bcb nbb nbw thrw SMR THRw SMR2 Fs iblen_index

newblock = s;

% Step 1 - Reconstruct 1024 Samples of the Input Signal
[earlyblock, prevblock, newblock, s] = ...
    Reconstruct(earlyblock, prevblock, newblock, iblen);

if length(s) == 1024
    % Step 2 - Calculate the magnitude and phase using FFT
    [r,f] = Spectrum(s, r, f);

    % Step 3 - Calculate the energy and unpredictability in the threshold calculation partitions
    [e] = Energy_Unpredictability(r);

    % Step 4 - Convolve the partitioned energy with the spreading function
    load('tables.mat');                      % Tables and Spreading Functions
    abstable = smooth(abstable,9,'moving')';
    [en] = Spread(e);

    % Step 5 - Calculate tonality index
    tbb = MAIN(s);


    % Step 6 - Calculate the Required SNR in Each Partition
    SNRb = CalcSNR(tbb);

    % Step 7 - Calculate the Power Ratio
    bcb = CalcPwrRatio(SNRb);

    % Step 8 - Calculation of Actual Energy Threshold, nbb
    nbb = CalcNbb(en, bcb);

    % Step 9 - Spread the Threshold Energy over FFT Lines, Yielding nbw
    nbw = CalcNb(nbb);

    % Step 10 - Include Absolute Thresholds, Yielding the Final Energy Threshold of Audibility, thrw
    thrw = CalcThresh(nbw);

    % Step 11 - Calculate the Signal-to-Mask Ratios, SMRn
    [SMR(iblen_index+1,:),epart,npart] = CalcSMR(r, thrw);
end

%--------------------------------------------------------------------------
% Proposed Model - Calculations begin
% Spreading Function (has been calculated and stored in 'tables.mat'; for
% more information refer to p129 of the standard)
% load('tables.mat'); % Tables and Spreading Functions for sampling rate 44.1 kHz
%--------------------------------------------------------------------------

% Reconstruct 1024 samples of the input signal
function [earlyblock, prevblock, newblock, s] = ...
    Reconstruct(earlyblock, prevblock, newblock, iblen)
if iblen >= 512,
    block = [prevblock newblock];
else
    block = [earlyblock prevblock newblock];
end
earlyblock = prevblock;
prevblock  = newblock;
if length(block) >= 1024
    s = block(end-1023:end);    % Newest 1024 samples
else
    s = zeros(1,512);
end

%--------------------------------------------------------------------------
% Calculate the complex spectrum of the input signal
function [r,f] = Spectrum(s, r, f)
global frame_count iblen_index Fs
sw = s .* (0.5 - 0.5*cos((2*pi*([1:1024]-0.5))/1024));   % Hann Window
r(1:2,:) = r(2:3,:);   % Shift previous two magnitude values
f(1:2,:) = f(2:3,:);   % Shift previous two phase values
r(3,:) = abs(fft(sw));
f(3,:) = angle(fft(sw));
mag = r(3,:);
if frame_count == 25 && iblen_index == 1
    figure
    freq = (Fs/2)*(1:513)/1024;


    plot(freq,mag(1:513));
    title('Magnitude Component of the fft')
    xlabel('Frequency')
    ylabel('Magnitude')
    figure
    plot(f(3,:));
    title('Phase Component of the fft')
    xlabel('Frequency')
    ylabel('Phase')
end

%--------------------------------------------------------------------------
% Calculate the Energy and Unpredictability using the (Threshold)
% Calculation Partition Table D.3b p134 of the standard
function [e] = Energy_Unpredictability(r)
global frame_count iblen_index
load('tables.mat');                      % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')'; % Applying smooth function to AbsTable
w_lo(1:57) = Table3D3b([1:57],1);        % Median Bark Values of the Partition
w_hi(1:57) = Table3D3b([1:57],2);
for w = 1:57
    e(w) = sum((r(3,w_lo(w):w_hi(w))).^2);
end
if frame_count == 25 && iblen_index == 1
    for i = 1:57, spreadplot(i,:) = sprdngf(i,:)*e(i); end
    figure; plot(spreadplot', 'r:');
    hold on; plot(e, 'b'); hold off;
    title('Energy in each partition using Table D.3b p134 of standard');
    ylabel('Energy "e" (blue) and Spreading Functions (red)');
    xlabel('Partitions 1-57')
end

%--------------------------------------------------------------------------
% Convolve the partitioned energy and unpredictability with the spreading function
function [en] = Spread(e)
global frame_count iblen_index
load('tables.mat');                      % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';
ecb = zeros(1,57);
ct  = zeros(1,57);
for i = 1:57
    for j = 1:57
        ecb(i) = ecb(i) + (e(j) * sprdngf(j,i));
    end
end
% normalizing
rnormb = 1 ./ (sum(sprdngf,1));          % normalizing coefficient used to normalize ecb
en = ecb .* rnormb;
if frame_count == 25 && iblen_index == 1
    for i = 1:57, spreadplot(i,:) = sprdngf(i,:)*e(i); end
    figure; plot(spreadplot', 'r:');
    hold on; plot(ecb); hold off
    title('Convolved Partitioned Energy with Spreading Function ecb')
    xlabel('Partitions 1-57'); ylabel('ecb (blue) and Spreading Functions (red)')


    for i=1:57, spreadplot(i,:) = sprdngf(i,:)*ecb(i); end
    figure; plot(spreadplot', 'r:'); hold on; plot(en); hold off
    title('Convolved Partitioned Energy with Spreading Function (normalized) enb')
    xlabel('Partitions 1-57'); ylabel('enb (blue) and Spreading Function (red)')
end
%--------------------------------------------------------------------------
% Calculating the SNR in each partition
function SNRb = CalcSNR(tbb)
global frame_count iblen_index
load ('tables.mat');                    % Tables and Spreading Functions
abstable=smooth(abstable,9,'moving')';
NMTb = 5.5;                             % Downshift for Noise Masking Tone (in dB)
SNRb = max(Table3D3b(:,4)', tbb .* Table3D3b(:,5)'+(1-tbb)*NMTb);
if frame_count==25 && iblen_index==1
    figure
    plot(SNRb)
    title('SNR in each partition')
    xlabel('Partitions 1-57'); ylabel('SNR')
end
%--------------------------------------------------------------------------
% Calculating the power ratio
function bcb = CalcPwrRatio(SNRb)
global frame_count iblen_index
bcb = 10.^(-SNRb/10);
if frame_count==25 && iblen_index==1
    figure
    plot(bcb)
    title('Power Ratio bcb'); xlabel('Partition 1-57'); ylabel('bcb')
end
%--------------------------------------------------------------------------
% Calculation of actual energy threshold, nb
function nbb = CalcNbb(tbb, bcb)
global frame_count iblen_index
load ('tables.mat');
abstable=smooth(abstable,9,'moving')';
nbb = tbb .* bcb;
if frame_count==25 && iblen_index==1
    figure; plot(nbb)
    title('Actual Energy threshold nbb')
    xlabel('Partition 1-57'); ylabel('nbb (blue) and Spreading Function (red)')
end
%--------------------------------------------------------------------------
% Spread the threshold energy over FFT lines, yielding nb(w)
function nb = CalcNb(nbb)
global frame_count iblen_index Fs
load ('tables.mat');                    % Tables and Spreading Functions
abstable=smooth(abstable,9,'moving')';
w_lo(1:57) = Table3D3b([1:57],1);
w_hi(1:57) = Table3D3b([1:57],2);


for b=1:57
    for w=1:513
        if ((w>=w_lo(b))&(w<=w_hi(b)))
            nb(w) = nbb(b)/(w_hi(b)-w_lo(b)+1);
        end;
    end
end
% TRANSFORM FFT'S TO SPL VALUES
fftmax = 471.4874;   % max(abs(fft(1kHz tone)))... defined as 96dB  %586.7143
nb = 96 +20*log10(abs(nb)/fftmax);
if frame_count==25 && iblen_index==1
    figure
    % freq = (Fs/2)*(1:513)/1024;
    semilogx(nb)
    title('Spread the threshold energy over FFT lines, yielding nb(w)')
    xlabel('Frequency'); ylabel('dB')
end
%--------------------------------------------------------------------------
% Include absolute threshold, yielding the final energy threshold of
% audibility, thrw
function thrw = CalcThresh(nb)
global frame_count iblen_index
load ('tables.mat');                    % Tables and Spreading Functions
abstable=smooth(abstable,9,'moving')';
thrw = max(nb, abstable);
if frame_count==25 && iblen_index==1
    figure
    semilogx(thrw)
    title('Absolute Thresholds, yielding final energy threshold of audibility thrw')
    xlabel('Frequency'); ylabel('dB')
end
%--------------------------------------------------------------------------
% Calculating the Signal-to-Mask Ratio (SMR)
function [SMR_interp,epart,npart] = CalcSMR(r, thrw)
global SMR_interp frame_count iblen_index
w_low=1;
for i=1:31
    epart(i)=sum((r(3,w_low:w_low+16)).^2);
    if i < 13
        npart(i) = sum(thrw(w_low:w_low+16));
    else
        npart(i) = min(thrw(w_low:w_low+16)) * 17;
    end
    w_low=w_low+16;
end
SMR = 10 * log10(epart./npart);
SMR_interp=interp(SMR,33);
SMR_interp=real(SMR_interp);
SMR_interp=[SMR_interp SMR_interp(1023)];
%--------------------------------------------------------------------------

2) Main Function (Tonality Index Calculation)


% Vaibhav Chhabra
% Thesis - Main Function (Tonality Index Calculation)
% Last Modified: 04/12/2005
%
% This program sends the frame of audio for wavelet analysis, scales the
% tonality index (frequency), then converts the frequency axis to bin
% values and maps it to the partition table of the MPEG ISO11172-3
% Table-3D.b
function [tbb]=MAIN(s)
length_block=length(s);
Fs=44100;
global frame_count frames length_block diff tin tIndex_f
if length(s) == 1024
    tIndex_f=zeros(1,22050);
    % sending input block for wavelet analysis
    [rcfs,count,gNodes,wpt,tn]=WPDALGb(s);
    % scaling the tonality index (frequency axis)
    tIndex_f=tIndex_f./20;
    % converting frequency to bin values
    for f=1:22050
        bin(f)=(f*length_block)/(Fs/2);
    end
    storeTINDEX=zeros(22050,2);
    storeTINDEX(:,1)=bin;
    storeTINDEX(:,2)=tIndex_f;
    load tables.mat
    % map to partition table in MPEG standard ISO11172-3 Table 3D-b
    tIndex_p=zeros(57,1);
    tIndex_p(1)=storeTINDEX(22,2);    tIndex_p(2)=storeTINDEX(44,2);
    tIndex_p(3)=storeTINDEX(65,2);    tIndex_p(4)=storeTINDEX(87,2);
    tIndex_p(5)=storeTINDEX(108,2);   tIndex_p(6)=storeTINDEX(130,2);
    tIndex_p(7)=storeTINDEX(151,2);   tIndex_p(8)=storeTINDEX(174,2);
    tIndex_p(9)=storeTINDEX(195,2);   tIndex_p(10)=storeTINDEX(216,2);
    tIndex_p(11)=storeTINDEX(237,2);  tIndex_p(12)=storeTINDEX(259,2);
    tIndex_p(13)=storeTINDEX(280,2);  tIndex_p(14)=storeTINDEX(302,2);
    tIndex_p(15)=storeTINDEX(323,2);  tIndex_p(16)=storeTINDEX(345,2);
    tIndex_p(17)=storeTINDEX(367,2);  tIndex_p(18)=storeTINDEX(431,2);
    tIndex_p(19)=storeTINDEX(496,2);


    tIndex_p(20)=storeTINDEX(560,2);  tIndex_p(21)=storeTINDEX(625,2);
    tIndex_p(22)=storeTINDEX(690,2);  tIndex_p(23)=storeTINDEX(754,2);
    tIndex_p(24)=storeTINDEX(819,2);  tIndex_p(25)=storeTINDEX(883,2);
    tIndex_p(26)=storeTINDEX(969,2);  tIndex_p(27)=storeTINDEX(1056,2);
    tIndex_p(28)=storeTINDEX(1142,2); tIndex_p(29)=storeTINDEX(1230,2);
    tIndex_p(30)=storeTINDEX(1314,2); tIndex_p(31)=storeTINDEX(1400,2);
    tIndex_p(32)=storeTINDEX(1508,2); tIndex_p(33)=storeTINDEX(1615,2);
    tIndex_p(34)=storeTINDEX(1745,2); tIndex_p(35)=storeTINDEX(1874,2);
    tIndex_p(36)=storeTINDEX(2025,2); tIndex_p(37)=storeTINDEX(2175,2);
    tIndex_p(38)=storeTINDEX(2348,2); tIndex_p(39)=storeTINDEX(2520,2);
    tIndex_p(40)=storeTINDEX(2692,2); tIndex_p(41)=storeTINDEX(2908,2);
    tIndex_p(42)=storeTINDEX(3123,2); tIndex_p(43)=storeTINDEX(3360,2);
    tIndex_p(44)=storeTINDEX(3596,2); tIndex_p(45)=storeTINDEX(3835,2);
    tIndex_p(46)=storeTINDEX(4156,2); tIndex_p(47)=storeTINDEX(4479,2);
    tIndex_p(48)=storeTINDEX(4802,2); tIndex_p(49)=storeTINDEX(5254,2);
    tIndex_p(50)=storeTINDEX(5707,2); tIndex_p(52)=storeTINDEX(6783,2);
    tIndex_p(53)=storeTINDEX(7386,2); tIndex_p(54)=storeTINDEX(8011,2);
    tIndex_p(55)=storeTINDEX(8657,2); tIndex_p(56)=storeTINDEX(9303,2);
    tIndex_p(57)=storeTINDEX(10121,2);
    % condition for setting the index values to zero (noise detection)
    for z=1:length(tIndex_p)
        if (tIndex_p(z) <= 18)
            tIndex_p(z)=0;
        end
    end
    % applying the spreading function to the tonality index
    tbb=zeros(1,57);
    for i=1:57
        for j=1:57
            tbb(i) = tbb(i) + (tIndex_p(j) * sprdngf(j,i));
        end
    end
    tbb=tbb./max(tbb);
end % end Calculations
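The double loop that convolves a per-partition quantity with the spreading function (tbb above, and Spread() in the psychoacoustic model listing) is a matrix-vector product in disguise. A minimal Python sketch of that step, using a made-up triangular spreading matrix as a stand-in for the sprdngf table loaded from tables.mat:

```python
import numpy as np

def spread_partitions(e, sprdngf):
    """ecb(i) = sum_j e(j) * sprdngf(j, i), then normalize by column sums,
    mirroring the Spread() / tbb double loops in the MATLAB listings."""
    ecb = sprdngf.T @ e                  # the i/j double loop, vectorized
    rnormb = 1.0 / sprdngf.sum(axis=0)   # normalizing coefficients (rnormb)
    return ecb * rnormb

# Stand-in triangular spreading matrix (NOT the table from tables.mat)
n = 5
sprdngf = np.maximum(0.0, 1.0 - 0.5 * np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
e = np.array([0.0, 0.0, 4.0, 0.0, 0.0])   # all energy in partition 3
en = spread_partitions(e, sprdngf)        # energy leaks into the neighbours
```

With the toy matrix, an impulse of energy in partition 3 spreads into partitions 2 and 4 after normalization, which is exactly the smearing effect the plots in the listing visualize.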


% Scaling Tonality Index (Time Domain)
tIndex=zeros(size(diff));
if (diff(:,1)==zeros)
    tIndex(:,1)=diff(:,1);
else
    tIndex(:,1)=diff(:,1);
    tIndex(:,1)=diff(:,1)./max(diff(:,1));
end
tIndex(:,2)=diff(:,2)./1000;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Reconstruct block
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [containerblock, prevblock, newblock, s] = reconstruct(containerblock, prevblock, newblock, iblen)
% global iblen containerblock newblock prevblock
if iblen >= 512,
    block = [prevblock newblock];
else
    block = [containerblock prevblock newblock];
end
containerblock=prevblock;
prevblock=newblock;
if length(block) >= 1024
    s = block(end-1023:end);   % Newest 1024 samples
else
    s = zeros(1,512);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
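The bin conversion inside MAIN assigns each frequency f (in Hz) the FFT-bin index f*N/(Fs/2), for an N = 1024-sample block at Fs = 44100 Hz. A one-function Python sketch of the same mapping:

```python
def freq_to_bin(f_hz, n=1024, fs=44100):
    """bin(f) = f * length_block / (Fs/2), as computed in MAIN().
    The Nyquist frequency (Fs/2) lands exactly on bin N."""
    return f_hz * n / (fs / 2.0)
```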

3) Wavelet Packet Analysis

% Vaibhav Chhabra
% Thesis - Wavelet Packet Analysis Function
% Last Modified: 04/12/2005
%
% This program analyzes the frame of audio using the discrete wavelet
% packet tree. It detects the nodes with high energy and sends them to the
% node reconstruction module
function [rcfs,count,gNodes,wpt,tn]=WPDALGb(block)
global frame_count frames diff tin tIndex_f
% Initializing variables
PN_E=100;
PN_AA_ratio=[];
PN_DD_ratio=[];
PN_AD_ratio=[];
PN_DA_ratio=[];
store_energy=[];
store_d=[];
store_child_node=zeros(1,2);


store_energy=PN_E;
% Doing a level 1 decomposition using Daubechies 1 wavelet
wpt=wpdec(block,1,'db1','shannon');
[d,tn]=get(wpt,'depth','tn');   % getting depth and terminal nodes
PN=tn(1);
E=wenergy(wpt);                 % getting energy of terminal nodes
Eratio=zeros(length(E));
% Initializing pointers
SN_AA=1; SN_DD=2; SN_AD=[]; SN_DA=[];
var_block=var(block);
Eratio=ratio(store_energy,E);
store_energy=E;
E_diff=abs(E(end-1)-E(end));
% Setting pointers
PN_AA_ratio=Eratio(tn(1));
PN_DD_ratio=Eratio(tn(2));
store_Enodes_tn=[SN_AA SN_DD];
if (E(1) > E(2))
    theRightPath=[1];
else
    theRightPath=[2];
end
% Start detection scheme
while (((1.0<=PN_AA_ratio) & PN_AA_ratio<2.4) | ((1.0<=PN_DD_ratio) & PN_DD_ratio<2.4) | ...
       ((1.0<=PN_AD_ratio) & PN_AD_ratio<2.4) | ((1.0<=PN_DA_ratio) & PN_DA_ratio<2.4)) & (d<5) & (E_diff>0)
    check_flag=0;
    if ((1.0<=PN_AA_ratio) & PN_AA_ratio<2.4) & (d<5) & (E_diff>0)
        wpt=wpsplt(wpt,SN_AA);   % split node
        [CN_A,CN_D]=Cnode(SN_AA,store_child_node);
        PN=Pnode(CN_A);
        store_d=d;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn=[CN_A CN_D];
        if (CN_A > tn(end)) & (store_d<d)
            Enodes=E(end-1:end);
            Pnode_energy=E(end-1)+E(end);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
            E_diff=abs(E(end-1)-E(end));
        end
        Eratio=ratio(Pnode_energy,Enodes);
        PN_AA_ratio=Eratio(1);
        PN_AD_ratio=Eratio(2);


        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                          CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes);
        store_energy=E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_DD_ratio) & (PN_DD_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt=wpsplt(wpt,SN_DD);
        [CN_A,CN_D]=Cnode(SN_DD,store_child_node);
        PN=Pnode(CN_A);
        store_d=d;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn=[CN_A CN_D];
        if (tn(end) > tn(1)) & (store_d<=d) & (d==2)
            Pnode_energy=E(end-1)+E(end);
            Enodes=E(end-1:end);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
        end
        if (7<CN_A) & (store_d==d) & (CN_A<14)
            Pnode_energy=E(end-3)+E(end-2);
            Enodes=E(end-3:end-2);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
        end
        Eratio=ratio(Pnode_energy,Enodes);
        PN_DA_ratio=Eratio(1);
        PN_DD_ratio=Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                          CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes);
        store_energy=E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_AD_ratio) & (PN_AD_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt=wpsplt(wpt,SN_AD);
        [CN_A,CN_D]=Cnode(SN_AD,store_child_node);
        PN=Pnode(CN_A);
        store_d=d;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);


        store_Enodes_tn=[CN_A CN_D];
        if CN_A < CN_D & (store_d < d)
            Pnode_energy=E(end-1)+E(end);
            Enodes=E(end-1:end);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
        end
        Eratio=ratio(Pnode_energy,Enodes);
        PN_AA_ratio=Eratio(1);
        PN_AD_ratio=Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                          CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes);
        store_energy=E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_DA_ratio) & (PN_DA_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt=wpsplt(wpt,SN_DA);
        [CN_A,CN_D]=Cnode(SN_DA,store_child_node);
        PN=Pnode(CN_A);
        store_d=d;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn=[CN_A CN_D];
        if (CN_A < CN_D) & (store_d<=d)
            Pnode_energy=E(end-1)+E(end);
            Enodes=E(end-1:end);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
        end
        Eratio=ratio(Pnode_energy,Enodes);
        PN_DA_ratio=Eratio(1);
        PN_DD_ratio=Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                          CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes);
        store_energy=E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end
    end
    if (E(1) > E(2:end))==ones(1,length(E)-1) & (1.0<=PN_AA_ratio) & (E_diff>0)
        CN_A=tn(1);


        SN_AA=CN_A;
        Pnode_energy=E(1);
        wpt=wpsplt(wpt,SN_AA);
        [CN_A,CN_D]=Cnode(SN_AA,store_child_node);
        SN_AA=CN_A;
        SN_AD=CN_D;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);
        Enodes=E(2:3);
        store_Enodes=Enodes;
        store_Enodes_tn=[CN_A CN_D];
        Eratio=ratio(Pnode_energy,Enodes);
        PN_AA_ratio=Eratio(1);
        PN_AD_ratio=Eratio(2);
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end
    end
end % end while-I
% Start detection scheme for low frequencies
while ((PN_AA_ratio<1.0) & (PN_DD_ratio>2.4)) & (d<5) & (E_diff>0)
    if (PN_AA_ratio<1.0) & (PN_DD_ratio>2.4) & (d<5)
        check_flag=0;
        wpt=wpsplt(wpt,SN_AA);
        [CN_A,CN_D]=Cnode(SN_AA,store_child_node);
        PN=Pnode(CN_A);
        store_d=d;
        [d,tn]=get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn=[CN_A CN_D];
        if (CN_A > tn(end)) & (store_d<d)
            Enodes=E(end-1:end);
            Pnode_energy=E(end-1)+E(end);
            store_Enodes=Enodes;
            store_Enodes_tn=[CN_A CN_D];
            E_diff=abs(E(end-1)-E(end));
        end
        Eratio=ratio(Pnode_energy,Enodes);
        PN_AA_ratio=Eratio(1);
        PN_AD_ratio=Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                          CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes);
        store_energy=E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath=[theRightPath store_Enodes_tn(1)];
        else
            theRightPath=[theRightPath store_Enodes_tn(2)];
        end


    end
end % end while-II
% pass theRightPath on to node generation if the energy difference meets the criteria
if ((E_diff>0))
    [gNodes,count,rcfs]=generateNodes(wpt,theRightPath,store_Enodes,store_Enodes_tn);
else
    odd_Nodes_index=find(mod(tn,2)==1);
    gNodes=tn(odd_Nodes_index);
    count=length(gNodes);
    rcfs=[];
    store_Enodes=[];
    store_Enodes_tn=[];
    tin=zeros(1,22050);
    [gNodes,count,rcfs,diff]=generateNodes(wpt,theRightPath,store_Enodes,store_Enodes_tn);
end
% end main
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Pointer Update Function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
    pointerUpdate(tn,Eratio,PN_AA_ratio,PN_AD_ratio,PN_DD_ratio,PN_DA_ratio, ...
                  CN_A,CN_D,SN_AA,SN_AD,SN_DD,SN_DA,PN,check_flag,d,store_Enodes)
if (diff(tn)==ones(length(tn)-1,1)) & (tn(end) > tn(1)) & (check_flag==0)
    SN_DA=CN_A;
    SN_DD=CN_D;
    check_flag=1;
end
if (1.0<=PN_AA_ratio) & (1.0<=PN_AD_ratio) & (tn(1) < tn(end)) & (check_flag==0) & ...
   (d > 3) & (PN_AA_ratio<4) & (PN_AD_ratio<4) & (CN_A < 10)
    CN_A=tn(1);
    CN_D=tn(1)+1;
    SN_AA=CN_A;
    SN_AD=CN_D;
    check_flag=1;
end
if (10 <CN_A) & (check_flag==0) & (CN_A < 14) & (d >3)
    SN_DA=CN_A;
    SN_DD=CN_D;
    check_flag=1;
end
if (PN==1) & (CN_A==3)
    SN_AA=CN_A;
    SN_AD=CN_D;
    check_flag=1;
end


if (PN == 2) & (CN_A==5),   SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 6) & (CN_A==13),  SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 5) & (CN_A==11),  SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 4) & (CN_A==9),   SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 3) & (CN_A==7),   SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 6) & (CN_A==13),  SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 14) & (CN_A==29), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 13) & (CN_A==27), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 12) & (CN_A==25), SN_DD=CN_D; SN_DA=CN_A; end


if (PN == 11) & (CN_A==23), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 7) & (CN_A==15),  SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 8) & (CN_A==17),  SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 9) & (CN_A==19),  SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 10) & (CN_A==21), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 15) & (CN_A==31), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 16) & (CN_A==33), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 17) & (CN_A==35), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 18) & (CN_A==37), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 19) & (CN_A==39), SN_AD=CN_D;


    SN_AA=CN_A;
end
if (PN == 20) & (CN_A==41), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 21) & (CN_A==43), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 22) & (CN_A==45), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 23) & (CN_A==47), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 24) & (CN_A==49), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 25) & (CN_A==51), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 26) & (CN_A==53), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 27) & (CN_A==55), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 28) & (CN_A==57), SN_DD=CN_D; SN_DA=CN_A;


end
if (PN == 29) & (CN_A==59), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 30) & (CN_A==61), SN_DD=CN_D; SN_DA=CN_A; end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Energy Ratio
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Eratio=ratio(store_energy,E)
for i=1:length(store_energy)
    for j=1:length(E)
        Eratio(i,j)=store_energy(i)/E(j);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Child Nodes from Parent Node
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [CN_A,CN_D]=Cnode(PN,store_child_node)
CN=(PN*2)+1;
container_child_node_A(1,1)=CN;
container_child_node_A(1,2)=CN+1;
CN_A=CN;
CN_D=CN+1;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Parent Nodes of Updated Child Nodes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function PN=Pnode(CN_A)
if (mod(CN_A,2)==0)
    CN_A=CN_A-1;
end
PN=((CN_A-1)/2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate Nodes that will be sent to the Node Reconstructor
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [gNodes,count,rcfs,diff]=generateNodes(wpt,tn,store_Enodes,store_Enodes_tn)
global frame_count tin tIndex_f
clear gNodes diff
odd_Nodes = tn(find(mod(tn,2)==1))';
count_gn = 0;
temp=[];
rcfs_nodes=[];
gNodes=[];
rcfs=[];
check_flag=0;
for path=1:length(tn)


    HE_N=tn(path);
    check=isempty(store_Enodes);
    if (check==0)
        if (length(tn)==1)
            pn_HE_N=Pnode(HE_N);
            temp=[pn_HE_N];
            if (rcfs_nodes==temp)
                rcfs_nodes=[rcfs_nodes];
            else
                rcfs_nodes=[temp,rcfs_nodes];
            end
            gNodes=rcfs_nodes(:);
            gNodes=gNodes';
            count_gn=length(gNodes);
            [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count_gn,tn);
            check_flag=1;
        end
        if (HE_N==1) | (HE_N==2)
            temp=HE_N;
            if (rcfs_nodes==temp)
                rcfs_nodes=[rcfs_nodes];
            else
                rcfs_nodes=[temp,rcfs_nodes];
            end
            gNodes=rcfs_nodes(:);
            gNodes=gNodes';
            count_gn=length(gNodes);
            [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count_gn,tn);
        end
        % If the nodes are not 1 or 2, generate them by walking up the tree
        % (note: this step contains a known flaw in the tonality index calculation)
        while ((HE_N~=1) & (HE_N~=2)) & (length(tn) >1) & check_flag==0
            clear gNodes
            HE_N=Pnode(HE_N);
            temp=HE_N;
            rcfs_nodes=[temp, rcfs_nodes];
            if (HE_N==1) | (HE_N==2)
                rcfs_nodes=[rcfs_nodes, tn(path)];
            end
            gNodes=rcfs_nodes(:);
            gNodes=gNodes';
            count_gn=length(gNodes);
        end % end while-III
        % send the nodes for reconstruction
        [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count_gn,tn);
        % generate tonality index with frequency mapping


        tIndex_f=tIndex_f+tin;
        % clear node list
        rcfs_nodes=[];
        check_flag=0;
    else
        count=count_gn;
        container_diff_sl_rcfs10(frame_count,1) = 0;
        container_diff_sl_rcfs10(frame_count,2) = 0;
        diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000;
        disp('rcfs is empty')
    end
end % end for
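Cnode() and Pnode() above implement the index arithmetic of the 1-based wavelet packet tree: node n has children 2n+1 (approximation path) and 2n+2 (detail path). A Python sketch of the same two helpers:

```python
def cnode(pn):
    """Children of parent node pn (cf. Cnode): 2*pn+1 and 2*pn+2."""
    cn_a = 2 * pn + 1
    return cn_a, cn_a + 1

def pnode(cn):
    """Parent of child node cn (cf. Pnode); a detail child (even index)
    is first mapped down to its approximation sibling."""
    if cn % 2 == 0:
        cn -= 1
    return (cn - 1) // 2
```

This is why the pointerUpdate cases pair e.g. PN == 19 with CN_A == 39: 2*19 + 1 = 39, and pnode(39) recovers 19.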

4) Node Reconstruction

% Vaibhav Chhabra
% Thesis - Node Reconstruction Function
% Last Modified: 04/12/2005
%
% This program gets the nodes from the wavelet analysis function and
% reconstructs them using the inverse discrete wavelet transform. It then
% sends the array of reconstructed nodes to the tonality estimator
function [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count,tn)
global length_block
rcfs=zeros(count,length_block);   % Initializing the array
for i=1:length(gNodes)
    rcfs_str = ['rcfs',int2str(gNodes(i)),' = wprcoef(wpt, [gNodes(i)]);'];
    eval(rcfs_str);
    rcfs_store_str=['rcfs(',int2str(i) ',:) = rcfs',int2str(gNodes(i)),';'];
    eval(rcfs_store_str);
end
% check if gNodes is empty
check=isempty(gNodes);
if (check==0)
    [diff,count,gNodes]=ACFALG(rcfs,count,gNodes,tn);
end
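reconsCoef() uses wprcoef() to rebuild the time-domain contribution of each selected node. A self-contained numpy sketch of the same idea for a single level of the Haar (db1) transform used by WPDALGb: analyze, keep one node's coefficients while zeroing the other, and run the synthesis step. The two partial reconstructions sum back to the input, which is the perfect-reconstruction property wprcoef relies on.

```python
import numpy as np

def haar_node_recon(x):
    """Single-level Haar analysis, then per-node synthesis: returns the
    time-domain contribution of the approximation and detail nodes."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients

    def synth(c, sign):
        # Upsample and filter with the Haar synthesis pair
        y = np.empty_like(x)
        y[0::2] = c / np.sqrt(2)
        y[1::2] = sign * c / np.sqrt(2)
        return y

    return synth(a, +1.0), synth(d, -1.0)

rec_a, rec_d = haar_node_recon([1.0, 2.0, 3.0, 4.0])
# rec_a + rec_d reproduces the input exactly
```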

5) Tonality Estimation

% Vaibhav Chhabra
% Thesis - Tonality Estimator Function
% Last Modified: 04/12/2005
%
% This program gets the reconstructed nodes and analyzes them based on
% their auto-covariance peaks
function [diff,count,gNodes] = ACFALG(rcfs,count,gNodes,tn)
global frame_count frames diff f
check_flag=0;


check=isempty(rcfs);
if (check==1) & (check_flag==0)
    container_diff_sl_rcfs10(frame_count,1) = 0;
    container_diff_sl_rcfs10(frame_count,2) = 0;
    diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000;
    disp('rcfs is empty')
    check_flag=1;
end
if (check==0) & (check_flag==0)
    % Calculate Autocorrelation Function
    ACFrcfs=zeros(count,21);
    for acf_count=1:count
        ACFrcfs(acf_count,:) = autocorr(rcfs(acf_count,:));
    end
    % Calculate Autocovariance
    varACF=zeros(count,41);
    for var_count=1:count
        varACF(var_count,:) = xcov(ACFrcfs(var_count,:));
    end
    % Uncomment this if you want to plot the auto-covariance of the ACF
    % for p=1:count
    %     if p==1, plot(varACF(p,:)); hold on; end
    %     if p==2, plot(varACF(p,:),'g'); end
    %     if p==3, plot(varACF(p,:),'r'); end
    %     if p==4, plot(varACF(p,:),'y'); end
    %     if p==5, plot(varACF(p,:),'c'); end
    %     if p==6, plot(varACF(p,:),'*k'); end
    %     if p==7, plot(varACF(p,:),'om'); end
    % end
    % hold off; title('AutoCovariance of ACF'); xlabel('0-21 lags of ACF'); ylabel('AC')


    % legend=sprintf('Legend: blue-rcfs10, green-rcfs20, red-rcfs30, yellow-rcfs40, cyan-rcfs50, black*-rcfs60, purpleo-rcfs70');
    % disp(legend);
    % Calculate AC difference
    if (var_count==1) & check_flag==0
        max_var_rcfs10=max(varACF(1,1:10));
        diff_sl_rcfs10=abs(max_var_rcfs10);
        container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
        container_diff_sl_rcfs10(frame_count,2) = count;
        diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000;
        check_flag=1;
        disp('Almost all the energy is in the Low-end');
        gNodes;
    else
        if (max(varACF(1,:)) > max(varACF(2,:))) & check_flag==0
            diff_rcfs1020=abs((max(varACF(1,:))-max(varACF(2,:))));
            disp('type I Analysis - Peak Difference of AC-ACF');
            gNodes;
            container_diff_rcfs1020(frame_count,1) = diff_rcfs1020;
            container_diff_rcfs1020(frame_count,2) = count;
            diff(frame_count,:)=container_diff_rcfs1020(frame_count,:)*1000;
            check_flag=1;
        end
        if (var_count>=4 & check_flag==0)
            min_var_rcfs10=min(varACF(1,(1:10)));
            if (min_var_rcfs10<0)
                min_var_rcfs10=0;
            end
            diff_sl_rcfs10=min_var_rcfs10;
            disp('var_count >=4 taking min of side-lobe')
            container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
            container_diff_sl_rcfs10(frame_count,2) = count;
            diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000;
            check_flag=1;
        end
        % side lobe variance difference
        if (max(varACF(2,:)) > max(varACF(1,:))) & check_flag==0
            max_var_rcfs10=max(varACF(1,(1:10)));
            min_var_rcfs10=min(varACF(1,(1:10)));
            diff_sl_rcfs10=(max_var_rcfs10-min_var_rcfs10);
            disp('type II Analysis - Side Lobe Peaks of AC-ACF-rcfs10');
            gNodes;
            container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
            container_diff_sl_rcfs10(frame_count,2) = count;
            diff(frame_count,:)=container_diff_sl_rcfs10(frame_count,:)*1000;
            check_flag=1;
        end


        if (var_count==4) & check_flag==0
            if (max(varACF(4,:)) > max(varACF(3,:)))
                max_var_rcfs40=max(varACF(4,:));
                min_var_rcfs30=min(varACF(3,:));
                diff_sl_rcfs4030=(max_var_rcfs40-min_var_rcfs30);
                disp('type III Analysis - noise like characteristics varACF40 and varACF30 peaks compared');
                gNodes;
                container_diff_sl_rcfs4030(frame_count,1) = diff_sl_rcfs4030;
                container_diff_sl_rcfs4030(frame_count,2) = count;
                diff(frame_count,:)=container_diff_sl_rcfs4030(frame_count,:)*1000;
                check_flag=1;
            end
        end
    end
end % if for check==0
% send gNodes for frequency mapping
fTable(gNodes,diff,f);
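ACFALG() separates tonal from noise-like nodes by comparing peaks of the autocovariance of each node's ACF. A simplified Python illustration of why this works (it compares ACF peaks directly rather than the AC-of-ACF used above): a sinusoid has a strongly periodic ACF with large peaks away from lag zero, while white noise concentrates its ACF at lag zero.

```python
import numpy as np

def acf(x, nlags=20):
    """Normalized autocorrelation at lags 0..nlags (cf. autocorr)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    full = np.correlate(x, x, mode='full')[len(x) - 1:]
    return full[:nlags + 1] / full[0]

rng = np.random.default_rng(0)
n = np.arange(2048)
tone = np.sin(2 * np.pi * 0.05 * n)   # period of 20 samples
noise = rng.standard_normal(2048)

# largest ACF peak away from lag 0, used here as a crude tonality cue
tone_peak = np.max(np.abs(acf(tone)[1:]))
noise_peak = np.max(np.abs(acf(noise)[1:]))
```

The tonal signal scores near 1 and the noise near 0, which is the same separation the type I-III comparisons above exploit on a per-node basis.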

6) Frequency Mapping

% Vaibhav Chhabra
% Thesis - Frequency Mapping Function
% Last Modified: 04/12/2005
%
% This program gets the last "generated nodes" array and maps the nodes
% onto a frequency axis. It then stores them in an array corresponding to
% the frame value (frame_count)
function tin=fTable(gNodes,diff,f)   % signature matched to the call in ACFALG
global frame_count tin
tin=zeros(1,22050);
i=gNodes(end);
% Brute-force mapping of the terminal node to its frequency band
if (gNodes(end)==1)
    for j=1:11024
        if (round(diff(frame_count))==0)
            tin(j)=round(diff(frame_count));
        else
            tin(j)=diff(frame_count);
        end
    end
end
if (gNodes(end)==2)
    for j=11024:22049
        if (round(diff(frame_count))==0)
            tin(j)=round(diff(frame_count));
        else
            tin(j)=diff(frame_count);
        end


    end
end
if (gNodes(end)==3)
    for j=1:5512
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==4)
    for j=5512:11024
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==5)
    for j=11024:16538
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==6)
    for j=16538:22049
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==7)
    for j=1:2756
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==8)
    for j=2756:5512
        if (round(diff(frame_count))==0)


            tin(j)=round(diff(frame_count));
        else
            tin(j)=diff(frame_count);
        end
    end
end
if (gNodes(end)==9)
    for j=5512:8268
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==10)
    for j=8268:11024
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==11)
    for j=11024:13780
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==12)
    for j=13780:16536
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==13)
    for j=16536:19292
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==14)
    for j=19292:22049
        if (round(diff(frame_count))==0)
            tin(j)=round(diff(frame_count));
        else


            tin(j)=diff(frame_count);
        end
    end
end
if (gNodes(end)==15)
    for j=1:1378
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==16)
    for j=1378:2756
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==17)
    for j=2756:4134
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==18)
    for j=4134:5512
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==19)
    for j=5512:6890
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end
end
if (gNodes(end)==20)
    for j=6890:8268
        if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else tin(j)=diff(frame_count); end
    end


end
if (gNodes(end)==21), for j=8268:9646, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==22), for j=9646:11024, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==23), for j=11024:12402, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==24), for j=12402:13780, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==25), for j=13780:15158, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==26), for j=15158:16536, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end


if (gNodes(end)==27), for j=16536:17914, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==28), for j=17914:19292, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==29), for j=19292:20670, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==30), for j=20670:22049, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==31), for j=1:689, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==32), for j=689:1378, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==33), for j=1378:2067,


if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==34), for j=2067:2756, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==35), for j=2756:3445, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==36), for j=3445:4134, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==37), for j=4134:4823, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==38), for j=4823:5512, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==39), for j=5512:6201, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count));


else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==40), for j=6201:6890, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==41), for j=6890:7579, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==42), for j=7579:8268, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==43), for j=8268:8957, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==44), for j=8957:9646, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==45), for j=9646:10335, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end


end, end
if (gNodes(end)==46), for j=10335:11024, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==47), for j=11024:11713, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==48), for j=11713:12402, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==49), for j=12402:13091, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==50), for j=13091:13780, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==51), for j=13780:14469, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end


if (gNodes(end)==52), for j=14469:15158, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==53), for j=15158:15847, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==54), for j=15847:16536, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==55), for j=16536:17225, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==56), for j=17225:17914, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==57), for j=17914:18603, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==58)


for j=18603:19292, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==59), for j=19292:19981, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==60), for j=19981:20670, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==61), for j=20670:21359, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
if (gNodes(end)==62), for j=21359:22049, if (round(diff(frame_count))==0), tin(j)=round(diff(frame_count)); else, tin(j)=diff(frame_count); end, end, end
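The long chain of if-blocks above hard-codes one frequency-bin range per wavelet packet node: nodes 7-14 (tree depth 3) each span 22049/8 ≈ 2756 bins, nodes 15-30 (depth 4) span 1378, and nodes 31-62 (depth 5) span 689. Assuming the standard breadth-first numbering of a full binary tree (node n sits at depth floor(log2(n+1))), the same boundaries can be computed instead of enumerated. The following Python sketch illustrates the mapping; the helper name `node_band` is ours, not part of the thesis code:

```python
import math

NYQUIST_BIN = 22049  # highest bin index used in the listing (fs = 44100 Hz)

def node_band(n, nyq=NYQUIST_BIN):
    """Bin range (lo, hi) covered by wavelet packet tree node n.

    Assumes breadth-first numbering of a full binary tree, so node n
    lies at depth floor(log2(n+1)) and is the (n - (2**depth - 1))-th
    band at that depth, counting from zero.
    """
    depth = int(math.log2(n + 1))
    band = n - (2**depth - 1)
    width = nyq / 2**depth              # bins per band at this depth
    lo = max(1, math.floor(band * width))
    hi = math.floor((band + 1) * width)
    return lo, hi

# Spot checks against the hard-coded ranges in the listing:
# node_band(9)  -> (5512, 8268)
# node_band(15) -> (1, 1378)
# node_band(62) -> (21359, 22049)
```

Under this assumption, one such computed range could replace all of the per-node branches; each branch in the listing is the special case of a single terminal node.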