
Project Title

SPANA – Development of Multimedia Tool for Learning Speech Analysis

Supervisor: Dr. M.W. Mak

Student Name: Sit Chin Hung

Student ID: 00146713D

Period: Aug 2003 – Apr 2004


Abstract

Digital speech processing has wide applications today, such as mobile phone communication, voice recognition and voice verification systems. Speech analysis is fundamental to these applications. To help students learn the abstract concepts in speech analysis, a software tool, SPANA, was developed.

For this version of SPANA, six functions were added: Plotting of Pitch Contour, Plotting of the LPCC-based spectral envelope, Plotting of the spectral envelope by FFT-based cepstral liftering, Zooming of the speech signal in the time domain, Interactive Fast Display and Interactive Spectral Plot.

For the Plotting of Pitch Contour, the average magnitude difference function (AMDF) method was used for pitch detection. To reduce errors in pitch detection, a probabilistic approach was applied.

Plotting of the LPCC-based spectral envelope and of the spectral envelope by FFT-based cepstral liftering were integrated into the PlotSpectralEnvelope( ) function, which was already responsible for plotting the frequency spectrum and the LP spectral envelope. As a result, PlotSpectralEnvelope( ) can plot all of these envelopes on the same screen.

The Zoom function was implemented by creating an event handler, OnMouseMove( ), to handle the “mouse move” event. As a result, the waveform of the speech signal in the time domain is blown up as the mouse pointer moves across it.

Interactive Fast Display was completed by adding a KeyDown event handler, OnKeyDown( ), which responds to the keys pressed on the keyboard.


Interactive Spectral Plot was implemented by updating the LPCC-based spectral envelope in response to the user's changes to the poles, zeros and sensitivity of the LP parameters.

SPANA was developed in the MS Visual C++ environment with MFC.

It is believed that the addition of these functions is a step forward in making SPANA more user-friendly and in helping students learn speech analysis more easily.


Acknowledgments

I would like to offer my special thanks to my supervisor, Dr. M. W. Mak, for his valuable advice and useful materials on handling the project. I was impressed by his willingness to give his time so generously to guide me towards finding the solution rather than giving it to me directly.

I would also like to extend my thanks to the technicians of the laboratory of the EIE department for their help in providing the resources needed for the development of the project.


Table of Contents

Chapter 1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Organization
Chapter 2 Project Specification
  2.1 Plotting the LPCC-based spectral envelope
  2.2 Plotting the spectral envelope by FFT-based cepstral liftering
  2.3 Plotting of Pitch Contour
  2.4 Adding a Zoom function to view the speech signal in time domain
  2.5 Interactive Fast Display
  2.6 Interactive Spectral Plot
Chapter 3 Theories of Speech Analysis
  3.1 Pitch Estimation
  3.2 Smoothing the LP-based spectral envelopes by cepstral processing
  3.3 FFT-based cepstral liftering
Chapter 4 Windows Programming
  4.1 Introduction
    4.1.1 The Windows Programming Model
    4.1.2 Microsoft Foundation Class (MFC)
    4.1.3 The Document/View Architecture
  4.2 Guide to create a simple windows program using MFC
    4.2.1 Create a Single Document Interface (SDI) Application
    4.2.2 Add a menu entry to Menu
    4.2.3 Adding function to the Menu
    4.2.4 Drawing a line
  4.3 GDI, Memory DC and Bitmap
Chapter 5 Methodology
  5.1 Plotting the LPCC-based spectral envelope
    5.1.1 Introduction
    5.1.2 Program Flowchart
    5.1.3 Getting the frame index
    5.1.4 Windowing the samples of the selected frame
    5.1.5 Computing the LPC coefficients
    5.1.6 Computing the Makhoul’s “a”
    5.1.7 Computing the LPC gain
    5.1.8 Computing the cepstral coefficients
    5.1.9 Appending zero quefrency
    5.2.9 Computing the smooth spectrum from LP-derived cepstrum
  5.2 Plotting the spectral envelope by FFT-based cepstral liftering
    5.2.1 Introduction
    5.2.1 Program flowchart
    5.2.2 Computing the frame index and then the windowed signals of a frame
    5.2.3 Computing the Short-Term Real Cepstrum (stRC)
    5.2.4 Perform liftering, cut-off time at pOrder
    5.2.5 Computing spectral envelope based on stRC
    5.2.6 Flow chart of the plotting function
    5.2.7 The plotting function
    5.2.8 Starting to plot the two spectral envelopes
  5.3 Plotting the Pitch Contour
    5.3.1 Introduction
    5.3.2 Program Flowchart
    5.3.3 Computation of the Mean and Standard Deviation
      5.3.3.1 Lowpass filtering the entire speech signal
      5.3.3.2 Hanning windowing the speech samples
      5.3.3.3 Computing the zero crossing rate
      5.3.3.4 Computing the pitch period estimates using AMDF
      5.3.3.5 Mean and standard deviation of the pitch period estimates
    5.3.4 Computing all candidate pitch periods for the selected frame
    5.3.5 Computing the zero crossing rate
    5.3.6 Filtering out the markers with the constraints
      5.3.6.1 Finding the constraints
    5.3.7 Weighting the markers with the normal distribution
  5.4 Adding a zoom function to view the speech signal in time domain
    5.4.1 Introduction
    5.4.2 Program flowchart
    5.4.3 Creating and showing the Zoom scale dialog
    5.4.3 The OnMouseMove( ) handler
    5.4.4 Getting the zoom factor to compute the new frame size
    5.4.5 Calculating the Start Play Sample index
    5.4.6 Windowing the frame samples
    5.4.7 Appending zeros for FFT
    5.4.8 Computation of frequency spectrum and the spectral envelopes
      5.4.8.1 Computing the frequency spectrum
      5.4.8.2 Computing the LPC envelope
      5.4.8.3 Computing the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering
    5.4.9 The Zoom Scale Dialog
  5.6 Interactive Fast Display
    5.6.1 Introduction
    5.6.2 Program Flowchart
    5.6.3 Getting the current frame index and next frame index
    5.6.4 Setting the red vertical line to new position
  5.6 Interactive Spectral Plot
    5.6.1 Introduction
    5.6.2 Program flowchart
    5.6.3 Computing a new set of LP coefficients
    5.6.4 Computing the new LPCC-based spectral envelope
    5.6.5 Plotting the new LPCC-based spectral envelope
Chapter 6 Results and Discussion
  6.1 Plotting of the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering
  6.2 Plotting of Pitch Contour
  6.3 Zooming the speech signals in time domain
  6.4 Interactive Fast Display
  6.5 Interactive Spectral Plot
Chapter 7 Conclusion and Recommendations
  7.1 Conclusion
  7.2 Recommendations for further work
References


Chapter 1 Introduction

To develop and build applications using digital speech processing technology, such as mobile phone communication, speech synthesis and speech recognition, we have to understand the characteristics of the speech signal. Speech analysis refers to the analysis and extraction of these characteristics. To this end, a speech analysis learning tool, SPANA, was developed to help students learn speech analysis.

The project began in August 2003 and was completed in April 2004.

1.1 Background

SPANA was first developed a few years ago and has been enhanced continuously since then. It is developed in the Visual C++ environment using MFC and runs as a 32-bit Windows (Win32) application.

1.2 Objectives

The objective of this project is to enhance SPANA by integrating new features into it. The new features are Plotting of Pitch Contour, Plotting of the LPCC-based spectral envelope, Plotting of the spectral envelope by FFT-based cepstral liftering, Zooming of the speech signal in the time domain, Interactive Fast Display and Interactive Spectral Plot.

Past versions of SPANA already include many features, such as Spectrogram Display, line spectrum pair analysis, and average energy and zero crossing measurements.

This report covers the theories of both speech analysis and Windows programming. A fundamental knowledge of both is therefore recommended for a better understanding of this project.

1.3 Organization

The background and the objectives of this project are presented in this chapter. The rest of the dissertation is organized as follows.

Chapter 2 presents the specifications of the project.

Chapter 3 covers the speech analysis theories involved in this project.

Chapter 4 provides information about Windows programming. Since MFC was used in this project, the chapter includes a brief introduction to MFC, covering how to create a Single Document Interface application, how to add a function to a menu and how to paint using a device context.

Chapter 5 describes the methodologies used in the development of this project. It includes the flow charts of the algorithms involved, the implementation procedures and the program code for those procedures.

Chapter 6 presents the results and discussion. Finally, conclusions are presented in Chapter 7 together with recommendations for further work.


Chapter 2 Project Specification

The specifications of this project are as follows:

1. Plotting the LPCC-based spectral envelope

2. Plotting the spectral envelope by FFT-based cepstral liftering

3. Plotting of Pitch Contour

4. Adding a zoom function to view the speech signal in time domain

5. Interactive Fast Display

6. Interactive Spectral Plot

2.1 Plotting the LPCC-based spectral envelope

The LPCC-based spectral envelope refers to the envelope obtained by smoothing the LP-based spectral envelope by cepstral processing. Its advantage is that it provides a more consistent representation of a speaker’s vocal tract characteristics. The envelope created from LP-derived cepstral coefficients (LPCCs) can track the peaks of the speech spectrum, and hence it can be used as a feature for speaker recognition.

2.2 Plotting the spectral envelope by FFT-based cepstral liftering

The spectral envelope by FFT-based cepstral liftering is obtained by applying cepstral liftering and then an FFT to the Short-Term Real Cepstrum (stRC). Cepstral liftering in the quefrency domain is analogous to filtering in the usual frequency domain. The spectral envelope can be applied in formant estimation and pitch detection. The LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering are painted on the same screen so that their relationship can be seen.


2.3 Plotting of Pitch Contour

The pitch contour of a speech signal shows the pitch period for every frame of voiced sound, letting users see how the pitch period varies from frame to frame.

2.4 Adding a Zoom function to view the speech signal in time domain

The Zoom function zooms the speech signal in the time domain. The scale of the waveform blow-up is adjustable: users tune the zoom scale by adjusting a scale slider, which offers four scales. When users move the mouse pointer, the zoom window shifts along with it.

The zoom window contains two parts. The upper part shows the zoomed speech signal in the time domain, while the lower part displays the spectrum of the signal within the zoom window together with its corresponding LPC envelope, LPCC-based spectral envelope and spectral envelope by FFT-based cepstral liftering.

2.5 Interactive Fast Display

In this version of SPANA, users can interact with the Fast Display dialog by using the keyboard. The red vertical line (indicating the frame that is being displayed in the Fast Display dialog) shifts in response to the “←” and “→” keys: “→” shifts the red vertical line to the right, while “←” shifts it to the left. For example, if the frame currently displayed in the Fast Display dialog is 10 and the user presses “→” once, the red vertical line shifts to the right and frame 11 is displayed in the Fast Display dialog.


2.6 Interactive Spectral Plot

While the spectral envelopes are displayed, users can change the LPCC-based spectral envelope interactively by adjusting the following parameters:

i. Poles in LP Pole Control dialog (Figure 2.1 A)

ii. Zeros in LSP Control dialog (Figure 2.1 B)

iii. Sensitivity of LP parameters in the Sensitivity of LP Parameters dialog (Figure 2.1 C)

Figure 2.1 – (A) LP Pole Control dialog, (B) LSP Control dialog, (C) Sensitivity of LP Parameters dialog


Chapter 3 Theories of Speech Analysis

The following theories of speech analysis were applied in this project.

1. Pitch Estimation

2. Smoothing the LP-based spectral envelopes by cepstral processing

3. Cepstral liftering

3.1 Pitch Estimation

Basic algorithm for AMDF

For each frame n, the short-term average magnitude difference function (AMDF) is defined as follows:

$$\mathrm{AMDF}_n(j) = \frac{1}{N}\sum_{i=1}^{N} \left| x_n(i) - x_n(i+j) \right|, \qquad 1 \le j \le \mathrm{MAXLAG} \tag{3.1}$$

where $x_n(i)$ is the i-th sample of frame n, N is the number of samples over which the differences are averaged, and MAXLAG is the maximum number of AMDF values generated in each frame. The difference function has a local minimum when the lag j is equal to or very close to the fundamental period. Thus, for each frame, the lag for which the AMDF is a global minimum is a strong candidate for the pitch period of that frame [9].

Problem for this algorithm:

The disadvantage of this algorithm is that the minimum in each frame is strongly affected by the intensity variation and the background noise of the speech signal. To reduce the errors caused by this problem, the pitch detection system requires a global error correction routine to locate the incorrect estimates and correct the errors [9].
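As a rough illustration of Eq. 3.1, the sketch below computes the AMDF values of one frame and picks the lag with the smallest value as the pitch-period candidate. It is a minimal standalone C++ sketch and not the SPANA implementation: the function names `amdf` and `pitchPeriodCandidate`, the zero-based indexing and the `minLag` parameter (a lower search bound that keeps very small lags from being chosen) are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// AMDF values for lags 1..maxLag of the frame that starts at sample `start`.
// Element 0 of the result is unused so that d[j] corresponds to lag j.
// Assumes the signal holds at least start + N + maxLag samples.
std::vector<double> amdf(const std::vector<double>& x, std::size_t start,
                         std::size_t N, std::size_t maxLag) {
    assert(N > 0 && start + N + maxLag <= x.size());
    std::vector<double> d(maxLag + 1, 0.0);
    for (std::size_t j = 1; j <= maxLag; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < N; ++i)
            sum += std::fabs(x[start + i] - x[start + i + j]);
        d[j] = sum / static_cast<double>(N);  // average magnitude difference
    }
    return d;
}

// Lag with the smallest AMDF value in [minLag, maxLag].
// minLag is a hypothetical lower bound, not a parameter named in the report.
std::size_t pitchPeriodCandidate(const std::vector<double>& d,
                                 std::size_t minLag) {
    assert(minLag >= 1 && minLag < d.size());
    std::size_t best = minLag;
    for (std::size_t j = minLag + 1; j < d.size(); ++j)
        if (d[j] < d[best]) best = j;
    return best;
}
```

For a clean periodic signal the minimum falls at the period itself; on real speech the candidate can be wrong for the reasons given above, which is why an error correction stage is needed.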


3.2 Smoothing the LP-based spectral envelopes by cepstral processing

The linear prediction (LP) analysis is based on the assumption that the current sample of the speech signal s(n) can be predicted from the past P speech samples. This can be expressed by the following equation:

$$ s(n) \approx \tilde{s}(n) = -\sum_{k=1}^{P} a_k\, s(n-k) \tag{2.2} $$

where $\{a_k\}_{k=1}^{P}$ are called the LP coefficients. Another assumption is that the excitation source Gu(n), where G is the gain and u(n) is the normalized excitation, can be separated from the vocal tract. Using these two assumptions, the vocal tract can be represented by an IIR filter of the form:

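$$ H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 + \sum_{k=1}^{P} a_k z^{-k}} \tag{2.3} $$

where S(z) and U(z) are the z-transforms of s(n) and u(n).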

The time-domain representation of the output s(n) of this IIR filter is a linear regression on its past output values and the present input Gu(n):

$$ s(n) = -\sum_{k=1}^{P} a_k\, s(n-k) + G\,u(n) \tag{2.4} $$

The LP analysis aims at computing a set of LP coefficients $a = \{a_1, \ldots, a_P\}$ for each frame of speech such that the frequency response of Eq. 2.3 is as close as possible to the frequency spectrum of the speech signal. The vocal tract of a speaker can therefore be modeled by the LP coefficients [1].
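To make the link between the LP coefficients and the spectral envelope concrete, the following sketch evaluates the gain-scaled magnitude response G / |1 + Σ a_k e^{-jωk}| of the all-pole filter on a grid of frequencies from 0 to π. It is a minimal standalone C++ sketch under stated assumptions: the function name `lpEnvelope`, the convention that `a[0]` is unused, and the uniform frequency grid are choices made for this example rather than details taken from SPANA.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Gain-scaled magnitude response of the all-pole model, sampled at numPoints
// frequencies from 0 to pi inclusive. a[0] is unused so a[k] matches a_k.
std::vector<double> lpEnvelope(const std::vector<double>& a, double G,
                               std::size_t numPoints) {
    assert(!a.empty() && numPoints >= 2);
    const double PI = std::acos(-1.0);
    std::vector<double> mag(numPoints);
    for (std::size_t m = 0; m < numPoints; ++m) {
        const double w = PI * static_cast<double>(m) /
                         static_cast<double>(numPoints - 1);
        // A(e^{jw}) = 1 + sum_{k=1}^{P} a_k e^{-jwk}
        std::complex<double> A(1.0, 0.0);
        for (std::size_t k = 1; k < a.size(); ++k)
            A += a[k] * std::polar(1.0, -w * static_cast<double>(k));
        mag[m] = G / std::abs(A);
    }
    return mag;
}
```

With a single coefficient a_1 = -0.5 and G = 1, the response is 2 at ω = 0 and 2/3 at ω = π, which is the lowpass shape expected from a pole at z = 0.5.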

However, although the LP coefficients represent the spectral envelope of the speech signal, it was found that a more consistent representation of a speaker’s vocal tract characteristics can be obtained by smoothing the LP-based spectral envelope by cepstral processing. The cepstral coefficients $c_n$ can be computed from the LP coefficients $a_k$ as follows:

$$ c_0 = \ln G \tag{2.5} $$

$$ c_n = -a_n - \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \qquad 1 \le n \le P \tag{2.6} $$

$$ c_n = -\sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \qquad n > P \tag{2.7} $$

where G is the estimated model gain, P is the prediction order, and $a_j$ is taken as zero for j > P. Figure 3.1 shows the process of computing the LP-based cepstral parameters. Since the parameters are derived from LP analysis, they are called LP-derived cepstral coefficients (LPCCs) [1].

Figure 3.1 – Computation of LPCCs from speech signals

Since the envelope created by the LPCCs can track the peaks of the speech spectrum, LPCCs can be used as a feature for speaker recognition.
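The recursion from LP coefficients to cepstral coefficients given in this section can be sketched in a few lines. The code below is a minimal standalone C++ illustration, not the routine used in SPANA: the function name `lpcToCepstrum`, the parameter `numCep` (how many coefficients to generate) and the convention that `a[0]` is unused are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts LP coefficients a[1..P] (a[0] unused) and gain G into cepstral
// coefficients c[0..numCep] using the recursion described in the text.
std::vector<double> lpcToCepstrum(const std::vector<double>& a, double G,
                                  std::size_t numCep) {
    assert(!a.empty() && G > 0.0);
    const std::size_t P = a.size() - 1;  // prediction order
    std::vector<double> c(numCep + 1, 0.0);
    c[0] = std::log(G);
    for (std::size_t n = 1; n <= numCep; ++n) {
        double sum = 0.0;
        for (std::size_t k = 1; k < n; ++k) {
            if (n - k <= P)  // a_{n-k} is zero beyond the prediction order
                sum += (static_cast<double>(k) / static_cast<double>(n)) *
                       c[k] * a[n - k];
        }
        // The -a_n term is present only while n does not exceed P.
        c[n] = (n <= P ? -a[n] : 0.0) - sum;
    }
    return c;
}
```

A quick sanity check is the first-order case a_1 = -0.5: expanding ln(G / (1 - 0.5 z^{-1})) as a power series in z^{-1} gives c_n = 0.5^n / n, and the recursion reproduces these values.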

3.3 FFT-based cepstral liftering

The spectral envelope by FFT-based cepstral liftering is obtained by applying cepstral liftering and then an FFT (Fast Fourier Transform) to the Short-Term Real Cepstrum (stRC). Figure 3.2 shows the computation of the spectral envelope by FFT-based cepstral liftering.

[Figure 3.1 block labels: speech, Pre-emphasis, Windowing and Frame Blocking, LP Analysis, LPC vector, Cepstral Transformation, Cepstral vector]


Figure 3.2 – Computation of the spectral envelope by FFT-based cepstral liftering

Liftering

Liftering refers to filtering in the quefrency domain. A low-time lifter is therefore analogous to a lowpass filter in the usual frequency domain.
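The chain from a frame of samples to a liftered spectral envelope can be sketched as follows: log-magnitude spectrum, inverse transform to obtain the real cepstrum, low-time lifter, and a forward transform back to the frequency domain. This is a minimal standalone C++ sketch under several assumptions that are not taken from SPANA: a naive O(N²) DFT stands in for the FFT, the function names are hypothetical, a small floor of 1e-12 guards the logarithm, and the lifter is applied symmetrically (it keeps quefrencies n ≤ L and their mirror images n ≥ N − L) so that the resulting envelope is real-valued.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive DFT. sign = -1 gives the forward transform, +1 the unscaled inverse.
static cvec dft(const cvec& x, int sign) {
    const std::size_t N = x.size();
    const double PI = std::acos(-1.0);
    cvec X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double ang = sign * 2.0 * PI * static_cast<double>(k) *
                               static_cast<double>(n) / static_cast<double>(N);
            acc += x[n] * std::polar(1.0, ang);
        }
        X[k] = acc;
    }
    return X;
}

// Log-magnitude spectrum of a (windowed, zero-padded) frame.
std::vector<double> logMagnitude(const std::vector<double>& frame) {
    assert(!frame.empty());
    const cvec X = dft(cvec(frame.begin(), frame.end()), -1);
    std::vector<double> out(X.size());
    for (std::size_t k = 0; k < X.size(); ++k)
        out[k] = std::log(std::abs(X[k]) + 1e-12);  // floor avoids log(0)
    return out;
}

// Short-term real cepstrum: inverse DFT of the log-magnitude spectrum.
std::vector<double> realCepstrum(const std::vector<double>& logMag) {
    assert(!logMag.empty());
    const cvec c = dft(cvec(logMag.begin(), logMag.end()), +1);
    std::vector<double> out(c.size());
    for (std::size_t n = 0; n < c.size(); ++n)
        out[n] = c[n].real() / static_cast<double>(c.size());
    return out;
}

// Low-time liftering followed by a forward DFT. The result is a smoothed
// log-magnitude spectrum, i.e. the spectral envelope on a log scale.
std::vector<double> lifteredEnvelope(const std::vector<double>& cep,
                                     std::size_t L) {
    assert(!cep.empty());
    const std::size_t N = cep.size();
    cvec kept(N, std::complex<double>(0.0, 0.0));
    for (std::size_t n = 0; n < N; ++n)
        if (n <= L || n + L >= N) kept[n] = cep[n];  // symmetric low-time lifter
    const cvec E = dft(kept, -1);
    std::vector<double> out(N);
    for (std::size_t k = 0; k < N; ++k) out[k] = E[k].real();
    return out;
}
```

Two limiting cases help when checking such a routine: with L = 0 only the zeroth cepstral coefficient survives and the envelope is flat at the mean of the log spectrum, while with L ≥ N/2 nothing is removed and the log spectrum is returned unchanged. Intermediate values of L trade detail for smoothness.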

Figure 3.3 – Liftering, where

$$ l(n) = \begin{cases} 1, & n = 0, 1, \ldots, L \\ 0, & \text{otherwise} \end{cases} $$

[Figure 3.2 block labels: s(n), Zero padding, FFT, Log( ), IFFT, stRC, Low-time lifter l(n), FFT, Spectral envelope]


Chapter 4 Windows Programming

4.1 Introduction

4.1.1 The Windows Programming Model

A DOS application uses a procedural programming model, while Windows programming is based on an event-driven model. In the Windows programming model, a message queue stores the events to be handled later (Fig 4.1). An event can be a mouse move, a mouse click, minimizing a window frame, etc. When an event happens, for instance when the mouse pointer is moved over the window frame, the corresponding message, WM_MOUSEMOVE, is generated. After the message enters the message queue, the message loop retrieves it and dispatches it to the corresponding message handler, and the procedures in the handler are run. The message handler for WM_MOUSEMOVE is OnMouseMove(). Fig 4.1 depicts the Windows programming model.

Figure 4.1 – Windows programming model
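The message queue, message loop and handler dispatch described above can be imitated in plain C++ without any Windows headers. The sketch below is only a simplified model of the idea and not how Windows or MFC is implemented: `MiniWindow`, `Msg`, `Event`, `on`, `post` and `run` are all hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Windows message identifiers such as WM_MOUSEMOVE.
enum class Msg { MouseMove, KeyDown };

struct Event {
    Msg id;  // which kind of event occurred
    int x;   // event data, e.g. the mouse position
    int y;
};

class MiniWindow {
public:
    // Register the handler to be called for a given message.
    void on(Msg id, std::function<void(const Event&)> handler) {
        handlers_[id] = std::move(handler);
    }
    // An event happened: store its message in the queue for later handling.
    void post(const Event& e) { queue_.push(e); }
    // Message loop: take messages in arrival order and dispatch each one to
    // its handler. Returns the number of messages that had a handler.
    std::size_t run() {
        std::size_t handled = 0;
        while (!queue_.empty()) {
            const Event e = queue_.front();
            queue_.pop();
            const auto it = handlers_.find(e.id);
            if (it != handlers_.end()) {
                it->second(e);
                ++handled;
            }
        }
        return handled;
    }

private:
    std::queue<Event> queue_;
    std::map<Msg, std::function<void(const Event&)>> handlers_;
};
```

The point of the model is that the application code never polls the mouse or keyboard itself. It only registers handlers, and the loop decides when each handler runs.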


4.1.2 Microsoft Foundation Class (MFC)

If we want to create software with a Graphical User Interface (GUI), the Windows Application Programming Interface (API) can help us. However, the Windows API contains a very large number of functions, and developing software by calling the Windows API directly is quite time-consuming. MFC is a library that provides multiple levels of support to developers. At one level, it provides a C++ class library that encapsulates the Windows API. Many of these classes encapsulate intrinsic Windows objects and their associated functions, allowing developers to work at a somewhat higher level of abstraction than with the raw API. For example, to create a simple window in MFC, you declare an instance of the CWnd class and call its Create() function. All of the steps that are required to create a window (defining a WndProc, registering a window class, etc.) are provided by the CWnd class implementation.

An application uses MFC by including the header file “Afxwin.h”.

4.1.3 The Document/View Architecture

The document/view architecture was introduced in MFC 2.0. With this architecture, when you create a Single Document Interface (SDI) application, four specific classes are created to make up the application:

-The CWinApp-derived class

-The CFrameWnd-derived class

-The CDocument-derived class

-The CView-derived class

The CWinApp class receives all the event messages and then passes them to the CFrameWnd and CView classes.


The CFrameWnd class is the window frame. It is responsible for holding the menu, toolbar, scrollbars, and any other visible objects attached to the frame. It also determines how much of the document is visible at any time.

The CDocument class houses your document. It is responsible for the storage and manipulation of the data that makes up the document. The class receives input from the CView class and passes display information to the CView class. Retrieving the document data from files is also done by this class.

The CView class displays the visual representation of the document to the user. It is responsible for passing input information to the CDocument class and receiving display information from the CDocument class.

It should be noted that only one document can be open at a time in an SDI application. A multiple document interface (MDI) application, on the other hand, allows multiple documents, each with multiple views, and the frame window object hosts those views.

Fig 4.2 – Data flow of the Document/View Architecture

[Figure labels: Application object (CWinApp); Document object (CDocument); messages passed to the frame window and view; two-way flow of information between the document and the view objects]


Fig 4.2 shows a simple data flow in the document/view architecture. A message loop in the application object retrieves the event-driven messages. The application object (CWinApp) acts as a receiver: it receives all the event messages and then passes them to the view object. The view object requests data from the document object, and the document object responds by providing the data necessary to render the output in the view object.

There are many advantages to using the document/view architecture. Because the data source is centralized, the same data can be shown in multiple views, for example one in the form of a table and another in the form of a chart. Moreover, when the data is modified through any one of the views, the other views can easily be synchronized by calling the UpdateAllViews() function.
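The idea of one document feeding several synchronized views can be shown without MFC. The following is a framework-free C++ sketch of the pattern only: the classes `Document`, `View`, `TableView` and `ChartView` and the methods `append`, `onUpdate` and `updateAllViews` are hypothetical stand-ins and do not reproduce the MFC class interfaces. In particular, this sketch notifies the views automatically inside `append`, which is a simplification chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class Document;

// A view renders data that it pulls from the document.
class View {
public:
    virtual ~View() = default;
    virtual void onUpdate(const Document& doc) = 0;
};

// The document owns the data and knows which views are attached to it.
class Document {
public:
    void addView(View* v) {
        assert(v != nullptr);
        views_.push_back(v);
    }
    const std::vector<int>& data() const { return data_; }
    // Modify the data, then ask every attached view to refresh itself.
    void append(int value) {
        data_.push_back(value);
        updateAllViews();
    }
    void updateAllViews() {
        for (View* v : views_) v->onUpdate(*this);
    }

private:
    std::vector<int> data_;
    std::vector<View*> views_;  // not owned by the document
};

// Shows the data as a semicolon-separated row.
class TableView : public View {
public:
    std::string text;
    void onUpdate(const Document& doc) override {
        text.clear();
        for (int v : doc.data()) text += std::to_string(v) + ";";
    }
};

// Shows the same data as one bar of '#' characters per value.
class ChartView : public View {
public:
    std::string bars;
    void onUpdate(const Document& doc) override {
        bars.clear();
        for (int v : doc.data())
            bars += std::string(static_cast<std::size_t>(v > 0 ? v : 0), '#') + "\n";
    }
};
```

Because both views read from the same document object, neither needs to know that the other exists, and adding a third representation does not require changing the data-handling code.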

Another important feature of MFC is command routing [5]. The command routing mechanism enables a command message to be handled almost anywhere in the application.

4.2 Guide to create a simple windows program using MFC

4.2.1 Create a Single Document Interface (SDI) Application

The following steps show how to create a new SDI application. Let us start a new project by selecting “File” → “New”.

1.) In the Project Tab, select “MFC AppWizard (exe)”.

2.) Type the project name and project location. Click “OK”.

3.) Select “single document” and check the “Document/View Architecture Support”. Click “Next”.

4.) Select “None” for no database support and then click “Next”.

5.) Select “None” for no compound document support and then click “Next”.

6.) Click the expected features of user interface and then click “Next”.


7.) Click “Next” again and then click “Finish”.

A workspace is created and we can now develop our application through this framework.

4.2.2 Add a menu entry to Menu

The following steps show you how to add a menu entry to the menu:

1. Select the Resource View tab in the workspace pane.

2. Select the project resources folder at the top of the tree.

3. Click the “+” of the “Menu” folder.

4. Double-click “IDR_MAINFRAME”, as shown in Figure 4.3.

Figure 4.3 – The Insert Resource dialog

6. Click the last rectangular box (the red circle) and input “Test”, then press “Enter”.

7. A rectangular box will appear below “Test”. Click to highlight it and input “Draw Line”, as shown in Figure 4.4.


Figure 4.4 – Entering the menu entry

8. Right click “Draw Line” and select “Properties” in the pop-up menu.

9. Input all the parameters, as shown in Figure 4.5, and press “Enter”.

Figure 4.5 – The Menu Item Properties dialog

10. The menu entry has been created, as shown in Figure 4.6.


Figure 4.6 – The new menu entry

4.2.3 Adding function to the Menu

A Windows program is event-driven. When we select a menu item, a message for this event is generated and sent to the message queue to invoke an operation. The operation for the event depends on the code in the message handler. Thus, we have to add the necessary code to the handler.

1.) Select “View” → “ClassWizard”.

2.) Select “ID_MENUDrawLine” in the “Object ID” column and “Command” in the “Messages” column.

3.) Click “Add Function” to add the handler for the selected ID and click “OK”.

4.) Click the “Edit Code” button to add the required program code. See Figure 4.7.


Figure 4.7 – MFC ClassWizard dialog for adding function

5.) Now you can add the code to the handler, OnMenuDrawLine( ), as shown in Figure 4.8.

Fig 4.8 – Adding code to the handler


4.2.4 Drawing a line

In Windows programming, graphics are drawn through the device context (DC). In Visual C++, the MFC device context provides numerous drawing functions for drawing circles, squares, lines, curves, and so on. The operating system uses the device context to learn in which context a graphic is being drawn, how much of the area is visible, and where on the screen it is currently located. In MFC, the drawing functions are wrapped by the CDC class. To draw a line, we use the MoveTo() function to move to a starting point and then the LineTo() function to draw a line to the destination point. The following code draws a line.

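Listing 4.1 – Draw a line from the point (20, 20) to the point (120, 120)

    void CHelloView::OnPaint()
    {
        CPaintDC dc(this);    // Device context for painting the view
        dc.MoveTo(20, 20);    // Move the current position to the starting point
        dc.LineTo(120, 120);  // Draw a line to the destination point
    }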

4.3 GDI, Memory DC and Bitmap

GDI stands for "Graphics Device Interface" and DC for "Device Context". The designers of Windows decided that it would be nice to have a single way of drawing to all "things". GDI was therefore developed to provide a universal set of routines that can be used to draw onto a screen, printer, plotter or bitmap image in memory.

Associated with a device context are a number of tools that can be used to act on the associated drawing surface: pens, brushes, fonts, etc. For a memory DC, a number of preset pens are provided, and more can be created as needed.


A device context is a handle to a drawing surface on some device. It can typically be obtained for display devices as well as for printers and plotters. The most commonly used are the window DC, which is a display DC that represents the area of a single window, and the memory DC, which represents a bitmap as a device.

A Bitmap is the in-memory representation of a drawing surface. By “linking” a bitmap into a

memory DC, the DC then represents that bitmap as a drawing surface, and all the normal GDI

operations can be performed on the bitmap. GDI also has a number of functions that can copy

areas from the drawing surface of one DC to another, so bitmaps then are a useful way to store

images in memory that will later be copied to the display (or other devices).

The bitmap and memory DC can be used to remove the flicker effect when updating the screen by means of double buffering: the image is first drawn off-screen and the finished image is then copied to the screen. A bitmap object is an instance of the CBitmap class. It is not exactly the traditional bitmap graphic file (BMP). Instead, a CBitmap object is a GDI object: an array of bits in which one or more bits correspond to each display pixel. We can load a bitmap graphic from a file into a CBitmap object, or we can construct the bitmap data of the CBitmap object ourselves.

To create a CBitmap object, the following code is used. The third statement defines the attributes of the object, such as the resolution and color depth. In this case, the attributes are the same as those of the screen device context dcScreen, with both width and height equal to 100.

Listing 4.2 – Create a CBitmap object

CClientDC dcScreen (this); // Device Context of the Client Window

CBitmap bitmap;

bitmap.CreateCompatibleBitmap (&dcScreen, 100, 100);


A memory DC is then created with the attributes of the screen DC. To enable the GDI output functions on the memory DC, the CBitmap object is selected into the memory DC. In the example below, the GDI output function is FillRect( ), which draws a solid blue rectangle.

Listing 4.3 – Use of Memory DC

CDC dcMem; // Create a Memory DC with attributes the same as the dcScreen

dcMem.CreateCompatibleDC (&dcScreen);

CBrush brush (RGB (0, 0, 255));

CBitmap* pOldBitmap = dcMem.SelectObject (&bitmap);

dcMem.FillRect (CRect (0, 0, 100, 100), &brush);

dcMem.SelectObject (pOldBitmap);

With the use of CBitmap and a memory DC, an image can be pasted onto the screen in a single operation instead of pixel by pixel.

Listing 4.5 – Paste the image from memory DC to the screen DC

dcScreen.BitBlt (0, 0, 100, 100, &dcMem, 0, 0, SRCCOPY);


Chapter 5 Methodology

5.1 Plotting the LPCC-based spectral envelope

5.1.1 Introduction

The LPCC-based spectral envelope was obtained by smoothing the LP-based spectral envelope through cepstral processing. The function for plotting the envelope was added under the “Plot” → “Spectral Envelope” menu, and the envelope was plotted on the same screen as the LPC envelope.

5.1.2 Program Flowchart

Figure 5.1.1 – Flowchart of computation of the LPCC-based spectral envelope

Start of calculation function
→ Get the frame index
→ Window the samples of the selected frame
→ Compute the LPC coefficients
→ Compute Makhoul’s “a” using the LPC coefficients
→ Compute the LPC gain
→ Compute the cepstral coefficients of Makhoul’s “a”
→ Append the zero quefrency to the cepstral coefficients
→ Compute the LPCC-based spectral envelope
→ End of calculation function

5.1.3 Getting the frame index

Before getting the frame index, it was necessary to know the total number of frames.

Figure 5.1.2 – Frame overlapping (the speech signal is cut into frames of windowsize samples whose starting points are offset samples apart; the last, incomplete frame is discarded)

Assume the following parameters: Overlapping = 50 %, Number of samples = 95, windowsize = 20. Then offset = windowsize × Overlapping = 10 and

Number_of_frames = floor(Number_of_samples / offset - 1) = floor(95/(20*50%) - 1) = 8

Listing 5.1.1 – Getting the total number of frames (SpanaView.cpp)

void CSpanaView::allocate(speech_parameter *sp)

{

..........

sp->offset=(__int16)((sp->windowsize)*sp->window_overlap+0.5);

sp->number_of_frames=(__int16)((float)sp->number_of_samples/

(float)sp->offset-(float)(sp->windowsize)/sp->offset)+1;

..........

}
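The frame count can also be sketched in portable C++. This is only an illustration: count_frames and its parameter names are hypothetical and not part of SPANA, and the sketch assumes, as in Listing 5.1.1, that the frame shift is windowsize × overlap rounded to the nearest sample and that an incomplete last frame is discarded.

```cpp
#include <cassert>

// Hypothetical helper (not part of SPANA): number of complete frames when
// consecutive frames start `offset` samples apart.
int count_frames(int num_samples, int window_size, double overlap)
{
    assert(window_size > 0 && overlap > 0.0 && overlap <= 1.0);
    // Frame shift in samples, rounded to the nearest integer.
    int offset = static_cast<int>(window_size * overlap + 0.5);
    if (offset < 1) offset = 1;
    if (num_samples < window_size) return 0;   // not even one complete frame
    // The last counted frame must fit entirely inside the signal.
    return (num_samples - window_size) / offset + 1;
}
```

For the parameters of Figure 5.1.2 (95 samples, windowsize 20, 50 % overlap), count_frames(95, 20, 0.5) gives 8 frames, in agreement with the worked example.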

When the user left-clicked on the waveform of the speech signal, a red vertical line was drawn at the point of the click and the Fast Display dialog was shown, as in Figure 5.1.3. Based on the x-coordinate of the red vertical line, the frame index can be evaluated.

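Figure 5.1.3 – Finding the frame index (the figure marks the red vertical line at point.x, the margin xoffs on each side of the client rectangle rect1 whose origin is (0, 0) and whose width is m_dfMaxX, and the Fast Display dialog)

$$\text{Index} = \frac{X}{Y} \times \text{Number of frames}, \qquad X = \texttt{point.x} - \texttt{xoffs}, \qquad Y = \texttt{m\_dfMaxX} - 2 \times \texttt{xoffs}$$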

Listing 5.1.2 – Finding the frame index (SpanaView.cpp)

void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)

{……….

CRect rect1;

this->GetClientRect(&rect1);

……….

m_dfMaxX = rect1.right;

index = (short)((sp.number_of_frames)/(m_dfMaxX-2*xoffs)*(point.x-xoffs)+1);

……….

}
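The mapping from a click position to a frame index can be sketched without MFC as follows. The function name and its parameters are hypothetical; the sketch assumes 1-based frame indices as in Listing 5.1.2 and additionally clamps clicks that fall in the margins, which the original listing does not show.

```cpp
#include <cassert>

// Hypothetical helper: map the x-coordinate of a click to a 1-based frame index.
// x_offs is the margin on each side of the plotted waveform.
int frame_index_from_x(int x, int x_offs, int client_width, int num_frames)
{
    const int usable = client_width - 2 * x_offs;   // width actually covered by the waveform
    assert(usable > 0 && num_frames > 0);
    const double fraction = static_cast<double>(x - x_offs) / usable;
    int index = static_cast<int>(fraction * num_frames) + 1;
    if (index < 1) index = 1;                        // click left of the waveform
    if (index > num_frames) index = num_frames;      // click right of the waveform
    return index;
}
```

With a 600-pixel-wide client area, a 50-pixel margin and 100 frames, a click in the middle of the window (x = 300) maps to frame 51.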

5.1.4 Windowing the samples of the selected frame

The frame index was then used to get the windowed signals for the selected frame. Each frame

of the speech signals had been windowed once the speech file was loaded. The windowed signal

for the entire speech file could be referenced by the following pointer.

float **w; // pointer to matrix containing windowed data
           // range: w[0 .. sp->number_of_frames-1][0 .. sp->windowsize-1]

After getting the frame index, the windowed frame signal can be referenced by the following code.

……….
sp.w[index-1]; // index = 1, 2, 3, …, number_of_frames
……….

Figure 5.1.4 – Structure of windowed data for the selected frame

sp.w[0]:      w[0][0]      w[0][1]      …  w[0][windowsize-1]
sp.w[1]:      w[1][0]      w[1][1]      …  w[1][windowsize-1]
sp.w[2]:      w[2][0]      w[2][1]      …  w[2][windowsize-1]
…
sp.w[num-1]:  w[num-1][0]  w[num-1][1]  …  w[num-1][windowsize-1]

where num = number_of_frames

5.1.5 Computing the LPC coefficients

The LPC coefficients were computed by calc_lpc(order, autocc, lpc, K), which takes autocc and order as inputs and produces lpc and K as outputs, where

autocc = autocorrelation of x (x = sp.w[index-1])
order = prediction order
K = reflection coefficients

lpc = [1 a(1) a(2) a(3) … a(pOrder)] (pOrder+1 elements), where a(1) … a(pOrder) are the LPC coefficients and pOrder is the prediction order.

Figure 5.1.5 – Structure of lpc

5.1.6 Computing the Makhoul’s “a”

cal_a_Markhoul(pOrder, lpc, windowsize) takes pOrder, lpc and windowsize as inputs and returns Makhoul’s “a”:

Makhoul’s “a” = [a(1) a(2) … a(pOrder) 0 0 … 0] (pOrder coefficients followed by windowsize zeros)

Figure 5.1.6 – Structure of Makhoul’s “a”

Listing 5.1.3 – Computing the Makhoul’s “a” (SpanaView.cpp)

float * CSpanaView::cal_a_Markhoul(__int16 pOrder, float *lpc,__int16 windowsize)

{

float * a=new float[pOrder+windowsize];

//append the lpc to a_makhoul first

for(int i=0;i<pOrder;i++)

{

a[i]=lpc[i+1];

}

//then append zeros to a_Marhoul

for(i=0;i<windowsize;i++)

{

a[i+pOrder]=0;

}

return a;

}

5.1.7 Computing the LPC gain

LPCGain(x, a_makhoul, pOrder, framesize) takes x, a_makhoul, pOrder and framesize as inputs and returns the gain, where a_makhoul = Makhoul’s “a” and x = sp.w[index-1].

Listing 5.1.4 – Computing the LPC gain (SpanaView.cpp)

float CSpanaView::LPCGain(float *x, float *a_makhoul,__int16 pOrder,__int16 framesize)

{

// R0=dot(x,x);

float temp=0;

float *R=new float[pOrder];

float R0=0;

float energy;

float gain;

//calculate the dot product of x with itself

for(int i=0;i<framesize;i++)

{

R0+=x[i]*x[i];

}

// for j=1:pOrder,

for (int j=1;j<=pOrder;j++)

{

temp=0;

for (int m=0;m<framesize-j;m++)

{

temp=temp+x[m]*x[m+j];

}

R[j-1]=temp;

}

temp=0;

for (int k=0;k<pOrder;k++)

{

temp=temp+a_makhoul[k]*R[k];

}

energy=R0+temp;

gain=float(pow((double)energy,0.5));

delete[] R; // R was allocated with new[], so it must be released with delete[]

return gain;

}
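The same gain computation can be sketched with standard containers, which removes the manual memory management. The function name lpc_gain is hypothetical; the sketch assumes the sign convention of Listing 5.1.4, i.e. the squared gain is R(0) plus the sum of a(k)·R(k) for k = 1 … pOrder, where R is the autocorrelation of the windowed frame.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: LPC gain from a windowed frame x and coefficients a(1)..a(p).
double lpc_gain(const std::vector<double>& x, const std::vector<double>& a)
{
    assert(a.size() < x.size());
    // Autocorrelation of x at a given lag.
    auto autocorr = [&x](std::size_t lag) {
        double r = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i) r += x[i] * x[i + lag];
        return r;
    };
    double energy = autocorr(0);
    for (std::size_t k = 0; k < a.size(); ++k) energy += a[k] * autocorr(k + 1);
    // Guard against a tiny negative value caused by rounding.
    return std::sqrt(energy > 0.0 ? energy : 0.0);
}
```

For example, for the two-sample frame {1, 1} and the single coefficient a(1) = -0.5, R(0) = 2 and R(1) = 1, so the gain is the square root of 1.5.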

5.1.8 Computing the cepstral coefficients

lpc2cep(a_makhoul, pOrder) takes a_makhoul and pOrder as inputs and returns tempc:

tempc = [c(0) c(1) c(2) … c(2*pOrder-1)] (2*pOrder elements)

Figure 5.1.6 – Structure of cepstral coefficients, tempc

Listing 5.1.5 – Computing the cepstral coefficients (SpanaView.cpp)

float * CSpanaView::lpc2cep(float *a_makhoul, __int16 pOrder)

{

//Convert to c(1) to c(pOrder)

float temp=0;

__int16 n,m;

float *c=new float[2*pOrder];

for(n=1;n<=pOrder;n++)

{

temp=0;

for (m=1;m<=(n-1);m++)

{

temp=temp+m*c[m-1]*a_makhoul[n-m-1]/n;

}

c[n-1]=a_makhoul[n-1]-temp;

}

//Convert to c(pOrder+1) to c(pOrder*2)

for (n=pOrder+1;n<=2*pOrder;n++)

{

temp=0;

for (m=1;m<=(n-1);m++)

{

temp=temp+m*c[m-1]*a_makhoul[n-m-1]/n;

}

c[n-1]=-temp;

}

//Convert to cepstral coefficients of H(z)

for(int i=0;i<2*pOrder;i++)

{

c[i]=-1*c[i];

}

return c;}
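The recursion of Listing 5.1.5 can be sketched in a single loop. The function name lpc_to_cepstrum is hypothetical; the sketch assumes A(z) = 1 + a(1)z^-1 + … + a(p)z^-p, first computes the cepstrum of A(z) and then negates it to obtain the cepstrum of 1/A(z), as the listing does. Coefficients a(k) with k > p are treated as zero instead of being read from a zero-padded array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: cepstral coefficients c(1)..c(n_cep) of 1/A(z),
// where a holds a(1)..a(p).
std::vector<double> lpc_to_cepstrum(const std::vector<double>& a, std::size_t n_cep)
{
    const std::size_t p = a.size();
    std::vector<double> c(n_cep, 0.0);   // c[n-1] first holds the cepstrum of A(z)
    for (std::size_t n = 1; n <= n_cep; ++n) {
        double acc = 0.0;
        for (std::size_t m = 1; m < n; ++m) {
            if (n - m <= p)               // a(n-m) is zero beyond the prediction order
                acc += static_cast<double>(m) * c[m - 1] * a[n - m - 1];
        }
        const double a_n = (n <= p) ? a[n - 1] : 0.0;
        c[n - 1] = a_n - acc / static_cast<double>(n);
    }
    for (double& v : c) v = -v;           // cepstrum of H(z) = 1/A(z)
    return c;
}
```

A quick check is the single-pole case A(z) = 1 - 0.5 z^-1, for which the cepstrum of 1/A(z) is known in closed form as c(n) = 0.5^n / n.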

5.1.9 Appending zero quefrency

c = [Log(gain) tempc(0) … tempc(2*pOrder-1) 0 0 … 0], where N = windowsize. The vector has N elements: 1 zero-quefrency term, 2*pOrder cepstral coefficients and N-2*pOrder-1 trailing zeros.

Figure 5.1.7 – Structure of c

Listing 5.1.6 – Appending zero quefrency (SpanaView.cpp)

void CSpanaView::PlotLPCSpectral()

{

……….

tempc=lpc2cep(a,sp.order);

//Append zero quefrency

c[0]=(float)log(gain);

for(i=0;i<2*sp.order;i++)

{

c[i+1]=tempc[i];

}

//append (N - 2*sp.order - 1) zeros to c

for (i=2*sp.order+1;i<sp.windowsize;i++)

{

c[i]=0;

}

……….

}

5.1.10 Computing the smooth spectrum from LP-derived cepstrum

Y is the complex spectrum obtained from FFT(c, N), where N is the windowsize. It should be noted that FFT(c, N) and Real(Y) together form the function FFT_complex(…), which stores the real part of FFT(c, N) in c_fft.

Figure 5.1.8 – Structure of the smooth spectrum

c → FFT(c, N) → Y → Real(Y) → c_fft → exp(c_fft) → lpCepSpecEnv_buffer

c_fft = [c_fft(0) c_fft(1) … c_fft(N/2)] (N/2+1 elements)

lpCepSpecEnv_buffer = [e^{c_fft(0)} e^{c_fft(1)} … e^{c_fft(N/2)}] (N/2+1 elements)

Listing 5.1.7 – Computing the smooth spectrum (SpanaView.cpp)

float * CSpanaView::lpc2cep(float *a_makhoul, __int16 pOrder)

{

……….

//Compute smooth spectrum from LP-derived cepstrum

//lpCepSpecEnv=exp(real((fft(c,N))));

float *c_fft =new float[sp.windowsize];

float *lpCepSpecEnv=new float[sp.windowsize];

FFT_complex(c,c_fft,sp.windowsize);

for(i=0;i<=sp.windowsize/2;i++)

{

lpCepSpecEnv_buffer[i]=(float)exp(c_fft[i]);

}

……….

}
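The step exp(real(fft(c, N))) can be illustrated with a direct (slow) DFT, which keeps the sketch independent of SPANA's FFT_complex( ) routine. The function name cepstrum_to_envelope is hypothetical, and a real implementation would use an FFT instead of the O(N²) loop shown here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: spectral envelope exp(Re(DFT(c))) for bins 0..N/2.
std::vector<double> cepstrum_to_envelope(const std::vector<double>& c)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = c.size();
    std::vector<double> env(n / 2 + 1);
    for (std::size_t k = 0; k < env.size(); ++k) {
        double re = 0.0;                  // real part of the k-th DFT bin
        for (std::size_t i = 0; i < n; ++i)
            re += c[i] * std::cos(2.0 * pi * static_cast<double>(k * i) / static_cast<double>(n));
        env[k] = std::exp(re);
    }
    return env;
}
```

If only the zero-quefrency term is non-zero, for instance c = {log 2, 0, 0, 0}, the envelope is flat and equal to 2 in every bin, which shows that c(0) = log(gain) sets the overall level of the envelope.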


5.2 Plotting the spectral envelope by FFT-based cepstral liftering

5.2.1 Introduction

The spectral envelope by FFT-based cepstral liftering is obtained by applying cepstral liftering to the Short-Term Real Cepstrum (stRC) and then taking the FFT of the result. Similarly, the function for plotting the spectral envelope by FFT-based cepstral liftering was added under the “Plot” → “Spectral Envelope” menu. The envelope was plotted on the same screen as the LPC envelope.

5.2.1 Program flowchart

Figure 5.2.1 – Flowchart of plotting the spectral envelope by FFT-based liftering

Calculation function starts
→ Get the frame index
→ Get the windowed signal based on the frame index
→ Compute the short-time real cepstrum
→ Perform liftering
→ Compute the spectral envelope
→ Calculation function ends

5.2.2 Computing the frame index and then the windowed signals of a frame

The computation of the frame index and of the windowed signal for the selected frame has been discussed in Sections 5.1.3 and 5.1.4 respectively.

5.2.3 Computing the Short-Term Real Cepstrum (stRC).

Figure 5.2.2 – Flowchart of computing the short-time real cepstrum

sp.w[index-1], N → FFT(sp.w, N) → Y → Abs(Y) → sp.winbuffer1 → Log(sp.winbuffer1) → x_fft → IFFT → stRC

where Y is the complex spectrum returned from FFT( ). The FFT(…) function used in the listing integrates both FFT(sp.w, N) and Abs(Y), i.e. it returns sp.winbuffer1 directly.

Listing 5.2.1 – Computing the short-time real cepstrum (SpanaView.cpp)

void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)

{

……….

FFT(sp.w[index-1], sp.winbuffer1, sp.windowsize);

……….

}

void CSpanaView::PlotLPCSpectral()

{

……….

float *stRC =new float[sp.windowsize];

float *x_fft =new float[sp.windowsize];

float *x_ifft =new float[sp.windowsize];

//get log(abs(fft(x,N)))

for(i=0;i<=sp.windowsize/2;i++)

{

x_fft[i]=(float)log(sp.winbuffer1[i]);

}

//make x_fft[i] symmetrical for IFFT

for(i=1;i<sp.windowsize/2;i++)

x_fft[sp.windowsize/2+i]=x_fft[sp.windowsize/2-i];

//perform ifft(log(abs(fft(x,N))))

IFFT(x_fft,x_ifft,sp.windowsize);

stRC=x_ifft;

……….

}

Figure 5.2.3 – Structures of Short Term Real Cepstrum, stRC

Y = [Re(Y(0)) Im(Y(0)) Re(Y(1)) Im(Y(1)) … Re(Y(N/2)) Im(Y(N/2))] (N+2 elements)

sp.winbuffer1 = [|Y(0)| |Y(1)| … |Y(N/2)|] (N/2+1 elements)

x_fft = [Log(|Y(0)|) Log(|Y(1)|) … Log(|Y(N/2)|)] (N/2+1 elements)

stRC = [stRC(0) stRC(1) … stRC(N-1)] (N elements)

5.2.4 Perform liftering, cut-off time at pOrder

After liftering was performed, stRC became:

stRC = [stRC(0) stRC(1) … 0 0 … 0 stRC(N-pOrder) … stRC(N-1)] (N elements: pOrder retained values, N-2*pOrder zeros, pOrder retained values)

Figure 5.2.4 – Structure of stRC after liftering

Listing 5.2.2 – Perform liftering (SpanaView.cpp)

void CSpanaView::PlotLPCSpectral( )

{

……….

//Perform liftering, cut-off time at pOrder

for(i=sp.order;i<(sp.windowsize-sp.order);i++)

{

stRC[i]=0;

}

……….

}

5.2.5 Computing spectral envelope based on stRC

Figure 5.2.4 – Flowchart of computing spectral envelope based on stRC

stRC, N → FFT(stRC, N) → Y → Real(Y) → stRC_fft → exp(stRC_fft) → cepSpecEnv

Figure 5.2.5 – Structure of the spectral envelope, cepSpecEnv

Listing 5.2.3 – Computing the spectral envelope, cepSpecEnv (SpanaView.cpp)

void CSpanaView::PlotLPCSpectral( )

{

……….

//Compute spectral envelope based on stRC

//cepSpecEnv=exp(real(fft(stRC,N)));

///float *cepSpecEnv=new float[sp.windowsize];

float *stRC_fft =new float[sp.windowsize];

FFT_complex(stRC,stRC_fft,sp.windowsize);

for(i=0;i<=sp.windowsize/2;i++)

{

CepSpecEnv[i]=(float)exp(stRC_fft[i]);

}

……….

}
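The liftering step that precedes this listing can be sketched as a small stand-alone function. The name lifter is hypothetical; the sketch assumes the symmetric low-time lifter of Listing 5.2.2, which keeps the first cutoff and the last cutoff cepstral values and zeroes everything in between.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: keep cep[0..cutoff-1] and cep[N-cutoff..N-1], zero the rest.
std::vector<double> lifter(std::vector<double> cep, std::size_t cutoff)
{
    assert(2 * cutoff <= cep.size());
    for (std::size_t i = cutoff; i < cep.size() - cutoff; ++i)
        cep[i] = 0.0;
    return cep;
}
```

For an 8-point cepstrum {1, 2, 3, 4, 5, 6, 7, 8} and a cut-off of 2, the result is {1, 2, 0, 0, 0, 0, 7, 8}. Both ends are kept because the real cepstrum of a real signal is symmetric, so the high indices mirror the low quefrencies.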

Structure of the data shown in Figure 5.2.5:

Y = [Re(Y(0)) Im(Y(0)) Re(Y(1)) Im(Y(1)) … Re(Y(N/2)) Im(Y(N/2))] (N+2 elements)

stRC_fft = [Re(Y(0)) Re(Y(1)) … Re(Y(N/2))] (N/2+1 elements)

cepSpecEnv = [e^{Re(Y(0))} e^{Re(Y(1))} … e^{Re(Y(N/2))}] (N/2+1 elements)

This FFT(…) function (FFT_complex in the listing) integrates both FFT( ) and Real( ), i.e. it returns the real part only.


5.2.6 Flow chart of the plotting function

Figure 5.2.6 – Flow chart of the plotting function for the spectral envelopes

Start of plotting function
→ Declare a Memory DC, a Screen DC and a CBitmap object
→ Declare a CRect object to be the Virtual Screen in the Memory DC
→ Select the CBitmap into the Memory DC
→ Plot the speech signal in time domain in the upper part of the Virtual Screen
→ Plot the x-axis, y-axis and other general information
→ Plot the LPCC-based spectral envelope in the lower part of the Screen in the Memory DC
→ Plot the spectral envelope by FFT-based cepstral liftering
→ Plot the x-axis, y-axis and other general information
→ End of plotting function

5.2.7 The plotting function.

Plotting the spectral envelope by FFT-based cepstral liftering (variable name: CepSpecEnv) and the LPCC-based spectral envelope (variable name: lpCepSpecEnv_buffer) was done by the PlotSpectralenvelope( ) function.

Listing 5.2.4 – Plotting the two spectral envelopes (SpanaView.cpp)

void CSpanaView::PlotSpectralenvelope()

{

……….

// line 4(lpCepSpecEnv_buffer)start here

CPen pen_lpCepSpecEnv(PS_SOLID, 1, RGB(0,0,255));

pOldPen = dc.SelectObject(&pen_lpCepSpecEnv);

dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv_buffer[0]-miny)*stepy));

x_coor=0;

for(i=0;i<number_of_values;x_coor+=stepx,i++)

{

dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv_buffer[i]-miny)*stepy));

}

//line 5 (cepSpecEnv) start here//////////////////////

CPen pen_CepSpecEnv(PS_SOLID,1,RGB(255,0,255));

pOldPen = dc.SelectObject(&pen_CepSpecEnv);

dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)CepSpecEnv[0]-miny)*stepy));

x_coor=0;

for(i=0;i<number_of_values;x_coor+=stepx,i++)

{

dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)CepSpecEnv[i]-miny)*stepy));

}

// line 5 end here/////////////////////////////////

……….

}

5.2.8 Starting to plot the two spectral envelopes

When the user left-clicks on the waveform of the speech signal, the FastDisplay dialog pops up, as shown in Figure 5.2.7.


Figure 5.2.7 – The Fast Display dialog

The FastDisplay dialog provides the user with a fast display of the speech signal in the time domain and its corresponding frequency spectrum and spectral envelope. In this version of Spana, the spectral envelope by FFT-based cepstral liftering and the LPCC-based spectral envelope could also be plotted on the Fast Display dialog.

For simplicity, we could use the above data for plotting the two envelopes. In other words, plotting the two envelopes on the Fast Display dialog and plotting them in the main window use the same data source. After the two envelopes had been calculated, their data were assigned to variables in the FastDisplay dialog class. The following code assigns the data in CepSpecEnv and lpCepSpecEnv_buffer to CepSpecEnv and lpCepSpecEnv, respectively, which belong to the FastDisplay dialog class.

Listing 5.2.5 – Assigning data to the Fast Display dialog (SpanaView.cpp)

void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)

{

……….

m_FastDisplayDlg.CepSpecEnv = new float[sp.windowsize];


m_FastDisplayDlg.CepSpecEnv = CepSpecEnv;

m_FastDisplayDlg.lpCepSpecEnv = new float[sp.windowsize];

m_FastDisplayDlg.lpCepSpecEnv = lpCepSpecEnv_buffer;

……….

}

Listing 5.2.6 –Plotting of the two spectral envelopes on Fast Display dialog (SpEnGraphDlg.cpp)

void CSpEnGraphDlg::OnPaint()

{

……….

/////////////////plot CepSpecEnv///////////////

CPen pen_CepSpecEnv(PS_SOLID ,1 ,RGB(0,255,0));

pOldPen = dc.SelectObject(&pen_CepSpecEnv);

dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)CepSpecEnv[0]-miny)*stepy));

x_coor=0;

for(i=0;i<number_of_values;x_coor+=stepx,i++)

{

dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)CepSpecEnv[i]-miny)*stepy));

}

////////////////CepSpecEnv line here////////////

////////////////plot lpCepSpecEnv////////////////

CPen pen_lpCepSpecEnv(PS_SOLID ,1 ,RGB(0 ,0 ,0));

pOldPen = dc.SelectObject(&pen_lpCepSpecEnv);

dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv[0]-miny)*stepy));

x_coor=0;

for(i=0;i<number_of_values;x_coor+=stepx,i++)

{

dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv[i]-miny)*stepy));

}

/////////////////lpCepSpecEnv end/////////////////////////

……….

}


5.3 Plotting the Pitch Contour

5.3.1 Introduction

Every frame of a voiced signal has a pitch period. A line joining the pitch periods of all frames of the whole speech represents the pitch contour. In this project, the AMDF (Average Magnitude Difference Function) was used to compute the pitch period, together with a probabilistic approach to correct errors in the computed pitch period.


Figure 5.3.1 – Flowchart of Pitch Detection algorithm

Start of Pitch Detection
→ Compute the mean and standard deviation
→ Compute all candidate pitch periods of the selected frame
→ Compute the zero crossing rate
→ Decision: the frame is voiced? (YES / NO)
→ Filter out the markers with the constraints
→ Weight the markers with the normal distribution
→ Store the pitch period marker in m_pitch[f]
→ Decision: end of the file? (NO: next frame; YES: end)
→ End of Pitch Detection

5.3.3 Computation of the Mean and Standard Deviation

Computation of the mean and standard deviation of the pitch period estimates for the whole speech was done by the FindMean_Std( ) function.

Figure 5.3.2 – The program flow of the FindMean_Std( )

Start of FindMean_Std( )
→ Low pass filter
→ HANNING windowing
→ Compute the zero crossing rate
→ Decision: voiced frame? (YES: find the pitch period estimate and store the pitch period)
→ Decision: end of file? (NO: next frame)
→ Compute mean for the array
→ Compute standard deviation
→ End of FindMean_Std( )

5.3.3.1 Lowpass filtering the entire speech signal

In order to eliminate the effects of intensity variation and background noise, the speech samples were passed through a lowpass filter with 3 dB attenuation at 600 Hz and 40 dB attenuation at 900 Hz. The required filter was designed with the help of FDATool (Filter Design & Analysis Tool) in MATLAB.

Design Procedure

A. Designing the filter

MATLAB was run and fdatool was typed at the command line. The FDATool window then appeared, as shown in Figure 5.3.3. All the filter parameters were entered and the “Filter Design” button was clicked to start the design process.

Figure 5.3.3 – The FDATool


When the filter design finished, the frequency response of the filter could be obtained, as shown in Figure 5.3.4.

Figure 5.3.4 – Frequency Response of the required filter

B. Obtaining the filter coefficients

From “File” → “Export”, “Export To Text-file” was selected and OK was clicked. The coefficients of the filter were then exported to a text file.

Figure 5.3.5 – Export the filter coefficients to a text file

The text file that had been created was opened and all the coefficients were copied into the lowpassfilter( ) function. The lowpassfilter( ) function convolves the input speech samples with these coefficients.
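The convolution performed by an FIR lowpass filter can be sketched generically as follows. The function name fir_filter is hypothetical, and unlike the SPANA listing the sketch accumulates each output sample in double precision and leaves any conversion back to 16-bit samples to the caller; samples before the start of the signal are treated as zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: y[n] = sum over m of b[m] * x[n-m], with x[k] = 0 for k < 0.
std::vector<double> fir_filter(const std::vector<double>& x, const std::vector<double>& b)
{
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t m = 0; m < b.size() && m <= n; ++m)
            y[n] += b[m] * x[n - m];
    }
    return y;
}
```

With the two-tap averaging filter b = {0.5, 0.5} and the input {2, 4, 6}, the output is {1, 3, 5}; the first output is smaller because the sample before the start of the signal is taken as zero.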


Listing 5.3.1 – Lowpass filter the speech samples (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size)

{

……….

//filter the speech signal

lowpassfilter(sp.spcdata,spcdata_filtered,sp.number_of_samples);

……….

}

void CSpanaView::lowpassfilter(__int16 *spcdata,__int16 *spcdata_filtered,long num_spc_samples)

{

const double B[59] = {

0.006815591950326,-8.261294934178e-005,-0.0008504860921417,-0.002135981039487,

-0.003923588510516,-0.006167886847072,-0.008771445369777,-0.01161054026741,

-0.01450768607681, -0.01725123021524, -0.01960134629328, -0.02130724375831,

-0.02211597742747, -0.02179485795837, -0.02014755490484, -0.01703070669192,

-0.0123684615914,-0.006166552489493, 0.001479595354589, 0.0103867511485,

0.02028557794243, 0.03083282948718, 0.04162511803079, 0.0522280014446,

0.06218524704965, 0.07105391473281, 0.07842105987528, 0.08396391887656,

0.08738033546011, 0.08853662302354, 0.08738033546011, 0.08396391887656,

0.07842105987528, 0.07105391473281, 0.06218524704965, 0.0522280014446,

0.04162511803079, 0.03083282948718, 0.02028557794243, 0.0103867511485,

0.001479595354589,-0.006166552489493, -0.0123684615914, -0.01703070669192,

-0.02014755490484, -0.02179485795837, -0.02211597742747, -0.02130724375831,

-0.01960134629328, -0.01725123021524, -0.01450768607681, -0.01161054026741,

-0.008771445369777,-0.006167886847072,-0.003923588510516,-0.002135981039487,

-0.0008504860921417,-8.261294934178e-005, 0.006815591950326

};

// Convolution of the speech samples with the filter coefficients

for(int n=0;n<num_spc_samples;n++)

{

spcdata_filtered[n]=0;

for(int m=0;m<59;m++)

{

if ((n-m)>=0)

spcdata_filtered[n]=__int16(B[m]*spcdata[n-m]+spcdata_filtered[n]);

}

}

}

5.3.3.2 HANNING windowing the speech samples

In order to increase the accuracy of the AMDF, each frame of samples was preprocessed by a HANNING window. HANNING windowing was performed by calling the windowing function directly and passing “HANNING” as a parameter. The computation steps of the windowing have been discussed in the theory.

Listing 5.3.2 – HANNING windowing a frame of samples (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size) {

……….

windowing(x, frameData, win_size, HANNING, sp.norm_factor,sp.prem_factor);

……….

}

5.3.3.3 Computing the zero crossing rate

The purpose of computing the zero crossing rate was to determine whether the frame of samples was voiced or unvoiced. The frequency of voiced sound is lower than that of unvoiced sound, so a voiced frame has a lower zero crossing rate.

Listing 5.3.3 – Computing the zero crossing rate (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size)

{

……….

for(int f=0; f<num_frames; f++){

……….

zerox = 0;

for(int m=1; m<m_win_size; m++)


zerox += abs(sgn(frameData[m])-sgn(frameData[m-1]))/2;

zerox /= m_win_size;}

}

……….

}
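A stand-alone version of the zero crossing rate is sketched below. The function name and the sign convention (zero counts as positive) are assumptions, since the sgn( ) helper used in the listing is not shown in this report; the rate is normalised by the frame length as in Listing 5.3.3.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: fraction of adjacent sample pairs whose signs differ,
// normalised by the frame length.
double zero_crossing_rate(const std::vector<double>& frame)
{
    if (frame.empty()) return 0.0;
    auto sign = [](double v) { return v >= 0.0 ? 1 : -1; };   // assumption: 0 is positive
    std::size_t crossings = 0;
    for (std::size_t m = 1; m < frame.size(); ++m)
        if (sign(frame[m]) != sign(frame[m - 1])) ++crossings;
    return static_cast<double>(crossings) / static_cast<double>(frame.size());
}
```

An alternating frame such as {1, -1, 1, -1} has three crossings in four samples, giving a rate of 0.75, whereas a frame that never changes sign gives 0.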

5.3.3.4 Computing the pitch period estimates using AMDF

$$\mathrm{AMDF}_n(j) = \frac{1}{N}\sum_{i=1}^{N} \left| x_n(i) - x_n(i+j) \right|, \qquad 1 \le j \le \mathrm{MAXLAG}$$

If the frame of samples was voiced, we would start the computation of AMDF. For each frame,

the AMDF would be computed for each lag, j, and the magnitude of AMDF would be stored in

an array, Delta [ ], which would be used in searching for the pitch period estimate. The pitch

period estimate was the lag for which the magnitude of AMDF was the global minimum in the

selected frame and the distribution of the pitch period estimates would be approximated with a

normal distribution.

On the other hand, if the frame of samples was unvoiced, the pitch period for that frame would

be set to zero. The pitch period for unvoiced frame would not be counted into the distribution of

pitch period estimates.

Listing 5.3.4 – Setting the pitch period to zero for unvoiced frame and computation of AMDF (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size)

{

……….

if((zerox>0.3 && wh.bps==16) || zerox == 0)

m_pitch[f] = 0;

else if((zerox>0.6 && wh.bps==8) || zerox == 0)

m_pitch[f] = 0;

else {


// AMDF

for(int i=1; i<=m_win_size; i++) {

N = 0;

Delta[i-1] = 0;

for(int j=0; j<m_win_size; j++) {

Delta[i-1] += (float)fabs(frameData[m_win_size+j-i]-frameData[m_win_size+j]);

N++;

}

Delta[i-1] /= N;

}}

……….

}
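The AMDF and the search for its global minimum can be sketched on a plain vector. The names amdf and global_min_lag are hypothetical, and the sketch makes one simplifying assumption that differs from the listing: for each lag it averages over the sample pairs that are available inside the frame, instead of reading from a buffer of twice the window size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: delta[j-1] holds the AMDF at lag j, for j = 1..max_lag.
std::vector<double> amdf(const std::vector<double>& x, std::size_t max_lag)
{
    assert(max_lag < x.size());
    std::vector<double> delta(max_lag, 0.0);
    for (std::size_t j = 1; j <= max_lag; ++j) {
        const std::size_t n = x.size() - j;        // number of available pairs
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += std::fabs(x[i] - x[i + j]);
        delta[j - 1] = sum / static_cast<double>(n);
    }
    return delta;
}

// Lag (1-based) at which the AMDF reaches its global minimum.
std::size_t global_min_lag(const std::vector<double>& delta)
{
    assert(!delta.empty());
    return static_cast<std::size_t>(std::min_element(delta.begin(), delta.end()) - delta.begin()) + 1;
}
```

For a signal that repeats every four samples, such as {0, 1, 0, -1, 0, 1, 0, -1, 0, 1, 0, -1}, the AMDF is zero at lag 4, so global_min_lag returns 4 as the pitch period estimate.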

5.3.3.5 Mean and standard deviation of the pitch period estimates

After evaluating all the pitch period estimates for the whole speech, we started to compute the mean and standard deviation of the pitch period estimates.

Listing 5.3.5 – Mean of the pitch period estimates (SpanaView.cpp)

float CSpanaView::average(int num_frame,int *global_min_location)

{

int sum=0; //sum of all the lags

float mean=0;//mean of the speech

for(int i=0;i<num_frame;i++)

{

sum+=global_min_location[i];

}

mean=sum/(float)num_frame;

return mean;

}

Listing 5.3.6 – Standard deviation of the pitch period estimates (SpanaView.cpp)

float CSpanaView::STD(float mean,int *global_min_location,int num_frame)

{

float std=0; //standard deviation

float var=0; //variance

for(int i=0;i<num_frame;i++){


var+=(global_min_location[i]-mean)*(global_min_location[i]-mean)/(num_frame);

}

std=(float)pow(var,0.5);

return std;

}
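The two statistics can also be computed together, as sketched below. The struct and function names are hypothetical. The sketch assumes that unvoiced frames are marked with a pitch period of zero and skips them, following the statement in Section 5.3.3.4 that unvoiced frames are not counted in the distribution; it uses the population standard deviation, as Listing 5.3.6 does.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two statistics.
struct PitchStats {
    double mean;
    double std_dev;
};

// Mean and population standard deviation of the non-zero (voiced) pitch periods.
PitchStats pitch_stats(const std::vector<int>& pitch)
{
    double sum = 0.0;
    std::size_t n = 0;
    for (int p : pitch)
        if (p > 0) { sum += p; ++n; }
    PitchStats s{0.0, 0.0};
    if (n == 0) return s;                  // no voiced frame at all
    s.mean = sum / static_cast<double>(n);
    double var = 0.0;
    for (int p : pitch)
        if (p > 0) var += (p - s.mean) * (p - s.mean);
    s.std_dev = std::sqrt(var / static_cast<double>(n));
    return s;
}
```

For the estimates {2, 0, 4, 4, 4, 5, 5, 0, 7, 9}, the two unvoiced frames are ignored and the remaining eight values have a mean of 5 and a standard deviation of 2.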

5.3.4 Computing all candidate pitch periods for the selected frame

Candidate pitch periods of a frame refer to the lags for which the AMDF were the local minima

in a frame. Searching for the local minima was accomplished by the Findlocal_min( ) function.

Figure 5.3.6 – Input and output of Findlocal_min( ) function

Input: Delta[ ]
Outputs: num_min (number of minima), local_min (values of minima), local_min_location (locations of minima)

Figure 5.3.7 – Program flow of computing the candidate pitch periods

Start of FindLocalmin( )
→ Set i=1, counter=0
→ Decision: Delta[i] is the last sample? (YES: test Delta[i] < Delta[i-1]; NO: test Delta[i] < Delta[i-1] and Delta[i] < Delta[i+1])
→ If the test succeeds, store Delta[i] into local_min and increment counter
→ Increment i; decision: end of Delta[ ]? (NO: next sample)
→ Assign counter to num_min
→ End of FindLocalmin( )

Listing 5.3.7 – Finding all the candidate pitch periods in a frame (SpanaView.cpp)

int *CSpanaView::Findlocal_min(float *Delta,int MaxLag,int *counter,float *MinData)

{

for(int j=1;j<MaxLag;j++)

{

if(j==MaxLag-1) //the last sample

{

if(Delta[j]<Delta[j-1])

{

*counter+=1;

pData[*counter-1]=j+1;

MinData[*counter-1]=Delta[j];

}

}

else

{

if((Delta[j]<Delta[j-1])&&(Delta[j+1]>Delta[j]))

{

tempDelta=Delta[j];

tempIndex=j;

*counter+=1;

pData[*counter-1]=j+1;

MinData[*counter-1]=Delta[j];

}

}

}

return pData;

}
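The local-minimum search can be sketched so that it returns the candidate lags directly. The function name local_min_lags is hypothetical; like Listing 5.3.7, the sketch starts at the second AMDF value, reports lags as index + 1, and accepts the last value if it is smaller than its left neighbour.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: lags (1-based) at which the AMDF has a local minimum.
std::vector<std::size_t> local_min_lags(const std::vector<double>& delta)
{
    std::vector<std::size_t> lags;
    for (std::size_t j = 1; j < delta.size(); ++j) {
        const bool below_left = delta[j] < delta[j - 1];
        const bool is_last = (j + 1 == delta.size());
        // The last value has no right neighbour, so only the left test applies.
        const bool below_right = is_last || delta[j] < delta[j + 1];
        if (below_left && below_right) lags.push_back(j + 1);
    }
    return lags;
}
```

For the AMDF values {3, 1, 2, 0, 5, 4}, the candidate lags are 2, 4 and 6.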

5.3.5 Computing the zero crossing rate

Again, in order to determine whether the frame was voiced or unvoiced, it was necessary to compute the zero crossing rate. Since the computation of the zero crossing rate has been discussed in Section 5.3.3.3, please refer to that section for details.

5.3.6 Filtering out the markers with the constraints

In most AMDF-based PDAs (Pitch Detection Algorithms), the lag for which the magnitude of the difference function is a global minimum is chosen as the pitch period estimate for that frame. In this AMDF PDA, we not only computed the lag with the global minimum, but also selected a set of candidates for the pitch period in each frame. Please refer to the theory. To be a marker, a candidate pitch period must satisfy the AMDF pattern constraints that were stated in the theory. The computation of markers was implemented by the FindMarker( ) function.


Listing 5.3.8 – Finding the marker_location (lag) and marker_height (magnitude of AMDF for the lag) (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size)

{

……….

FindMarker(marker_height,marker_location,m_win_size,Delta,num_marker);

……….

}

Figure 5.3.8 – Flowchart of filtering out the markers

Start of FindMarker( )
→ Find all the candidate pitch periods
→ Decision: any candidates? (NO: end)
→ Find the constraints for each candidate
→ Decision: constraints satisfied? (YES: store the candidate as a marker)
→ End of FindMarker( )

5.3.6.1 Finding the constraints

a. global_max

The global_max was found by Findglobal_max( ) function.

Figure 5.3.9 – Program flow of finding the global_max

Input: Delta[ ]
Outputs: global_max, global_max_location

Start of Findglobal_max( )
→ i=0, store Delta[i] in buffer
→ Compare buffer with Delta[i+1] and keep the larger value in buffer
→ i increments; decision: end of Delta[ ]? (NO: next sample)
→ global_max=buffer
→ End of Findglobal_max( )

Listing 5.3.9 – Finding the global_max (SpanaView.cpp)

int CSpanaView::Findglobal_max(float *Delta,int MaxLag,float *MaxDelta )

{

……….

for(int j=0;j<MaxLag-1;j++)

{

if (Delta[tempIndex]>Delta[j+1])

{

tempDelta=Delta[tempIndex];

}

else

{

tempDelta=Delta[j+1];

tempIndex=j+1;

}

*MaxDelta=tempDelta;

}

return tempIndex+1;

}

b. height_i

$height_i = \min(left\_height_i,\ right\_height_i)$, which was computed by the FindHeight( ) function.

Figure 5.3.10 – Input and output of FindHeight( ) function (inputs: Local_max, num_local_max; output: height_i, the array of height_i values)

Note that Local_max was found before running Findlocal_min( ).

Listing 5.3.10 – Finding height_i (SpanaView.cpp)

void CSpanaView::FindHeight(float *local_max,int num_min,float *height_i,float *local_min)

{

for(int i=0;i<num_min;i++)

{

if (local_max[i]<local_max[i+1])

height_i[i]=local_max[i]-local_min[i];

else

height_i[i]=local_max[i+1]-local_min[i];

}

}

c. peak_ratio

The peak_ratio was computed by the Findpeak_ratio( ) function using the formula:

peak_ratio=local_maximum/global_max.

Figure 5.3.11 – Input and output of Findpeak_ratio( ) function (inputs: Local_max, global_max; output: peak_ratio)

Listing 5.3.11 – Find peak_ratio (SpanaView.cpp)

void CSpanaView::Findpeak_ratio(int num_min,float *local_max,float *global_max

,float *peak_ratio)

{

for(int i=0;i<num_min;i++)

{

peak_ratio[i]=local_max[i]/(*global_max);

}

}

d. lobe_width_i

The FindLobe_width( ) function finds lobe_width_i by using the formula:

lobe_width = distance between right and left local maxima.

Figure 5.3.12 – Input and output of FindLobe_width( ) function (input: local_max_location; output: lobe_width_i)

Listing 5.3.12 – Finding lobe_width_i

void CSpanaView::FindLobe_width(int *local_max_location,int num_min,int *lobe_width_i)

{

for(int i=0;i<num_min;i++)

{

lobe_width_i[i]=local_max_location[i+1]-local_max_location[i];

}

}

Listing 5.3.13 – Finding the four constraints

void CSpanaView::FindMarker(float *marker_height,int *marker_location,int m_win_size,float *Delta,int *num_marker)

{

……….

//Find the min(left_height,right_height)

FindHeight(local_max,*num_min,height_i,local_min);

//Find the difference of heights between two consecutive maxima

FindDiff_i(local_max,*num_min,diff_i);

//Find the lobe width between two consecutive maxima

FindLobe_width(local_max_location,*num_min,lobe_width);

//Get the peak ratio

Findpeak_ratio(*num_min,local_max,global_max,peak_ratio);

……….

}

To be a marker, the i-th candidate needed to satisfy:

1. peak_ratio_i ≥ 0.8
2. height_i ≥ 0.3 × global_max
3. diff_i ≤ 0.1 × global_max
4. lobe_width_i ≤ 100 lags


Listing 5.3.14 – Filter the candidates with the constraints (SpanaView.cpp)

void CSpanaView::FindMarker(float *marker_height,int *marker_location,int m_win_size,float *Delta,int *num_marker)

{

……….

if((peak_ratio[i]>=0.7)&&(height_i[i]>=0.3*(*global_max))&&(diff_i[i]<=0.1*(*global_max))&&(lobe_width[i]<=100))

{

num+=1;

marker_location[num-1]=local_min_location[i];

marker_height[num-1]=height_i[i];

}

……….

}
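The filtering step can also be sketched in a self-contained form. This is only an illustration under stated assumptions: the Candidate struct and the FilterMarkers function are hypothetical names that do not appear in Spana, and the peak-ratio threshold is left as a parameter because the text quotes 0.8 while Listing 5.3.14 uses 0.7.

```cpp
#include <vector>

// Hypothetical container for the per-candidate features described in the text.
struct Candidate {
    int   location;    // lag of the local minimum
    float peak_ratio;  // local_max / global_max
    float height;      // min(left_height, right_height)
    float diff;        // height difference between two consecutive maxima
    int   lobe_width;  // lags between the right and left local maxima
};

// Keep only the candidates that satisfy all four constraints.
// peak_ratio_min is a parameter (assumption) because the report gives
// 0.8 in the text and 0.7 in the listing.
std::vector<Candidate> FilterMarkers(const std::vector<Candidate>& cands,
                                     float global_max,
                                     float peak_ratio_min = 0.8f)
{
    std::vector<Candidate> markers;
    for (const Candidate& c : cands) {
        const bool ok = c.peak_ratio >= peak_ratio_min
                     && c.height     >= 0.3f * global_max
                     && c.diff       <= 0.1f * global_max
                     && c.lobe_width <= 100;
        if (ok) markers.push_back(c);
    }
    return markers;
}
```

Returning a new vector instead of writing into caller-supplied arrays avoids the separate num counter used in the listing; the constraint logic itself is unchanged.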

5.3.7 Weighting the markers with the normal distribution

The probability density function of the normal distribution with mean µ and standard deviation σ is an example of a Gaussian function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Figure 5.3.14 – The graph of normal distribution

After all the markers of a frame had been computed, the next step was to weight them with a normal distribution. Figure 5.3.15 shows the markers of a frame.

Figure 5.3.15 – AMDF and markers for a voiced frame

The location of each marker was substituted into the Gaussian function to weight the marker with the normal distribution. The marker with the highest value after weighting would be regarded as the pitch period of the frame.

Figure 5.3.16 – (A) Distribution approximation of the initial pitch period estimate (B) AMDF for a voiced frame. The dashed line showed the normal distribution approximation

In Figure 5.3.16, marker 2 was initially selected as the pitch period of the frame. However, after weighting the markers with the normal distribution, marker 1 made a better candidate for the pitch period of the frame.

Listing 5.3.15 – Weighting the markers with the normal distribution of the initial pitch period estimates (SpanaView.cpp)

void CSpanaView::pitch_detection(int *m_pitch_array_size)

{

……….

markers_weighted[i]=weight(marker_location[i],*std,*mean);

int temp_pitch_index=0;

for(i=0;i<*num_marker-1;i++)

{


//keep the marker with the larger weight (ties go to the later marker)

if (markers_weighted[i+1]>=markers_weighted[temp_pitch_index])

temp_pitch_index=i+1;

}

m_pitch[f]=marker_location[temp_pitch_index];

……….

}

//find the probability of the normal distribution of a given x

float CSpanaView::weight(int x,float std,float mean)

{

float expvalue=0;

expvalue=(float)((1/pow(2*3.1415926*pow(std,2),0.5))*exp(-1*(x-mean)*(x-mean)/(2*std*std)));

return expvalue;

}

The marker with the highest weighted value would be selected as the pitch period of the frame.
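The weighting and selection steps can be sketched together as follows. This is a minimal illustration, not Spana's code: GaussianWeight and SelectPitchPeriod are hypothetical names, and the mean and standard deviation are assumed to come from the initial pitch period estimates.

```cpp
#include <cmath>
#include <vector>

// Normal probability density for x, given the mean and standard deviation.
double GaussianWeight(double x, double mean, double stddev)
{
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return std::exp(-0.5 * z * z) / (stddev * std::sqrt(2.0 * kPi));
}

// Return the marker location with the largest Gaussian weight,
// or -1 when there are no markers. Ties go to the later marker,
// which matches the comparison used in Listing 5.3.15.
int SelectPitchPeriod(const std::vector<int>& marker_location,
                      double mean, double stddev)
{
    int best = -1;
    double best_weight = -1.0;  // any density value is >= 0, so the first marker always wins
    for (int loc : marker_location) {
        const double w = GaussianWeight(loc, mean, stddev);
        if (w >= best_weight) {
            best_weight = w;
            best = loc;
        }
    }
    return best;
}
```

Because the weight depends only on the distance between the marker location and the mean, the marker closest to the expected pitch period is the one selected.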


5.4 Adding a zoom function to view the speech signal in the time domain

5.4.1 Introduction

The zoom function magnifies the speech signal in the time domain, and the corresponding spectral envelopes can also be seen in the zooming window. The zoom function was added under the “Plot” → “Zoom” menu.

5.4.2 Program flowchart

Figure 5.4.1 – (A) Flowchart of the OnZoom( ) handler. (B) Flowchart of the calculation of the frequency spectrum and spectral envelopes.

(A) Start of OnZoom( ) handler → create and show the Zoom Scale dialog → set the Zoom Indicator to TRUE → end of OnZoom( ) handler.

(B) Start of computation of the frequency spectrum and the three spectral envelopes → compute the LPC and autocorrelation → compute the frequency spectrum → compute the LPCC-based spectral envelope → compute the spectral envelope by FFT-based cepstral liftering → convert from linear to dB → end of computation function.

Figure 5.4.2 – Flowchart of the OnMouseMove( ) handler

Start of OnMouseMove( ) handler → Zoom Indicator is TRUE? (NO: end) → mouse's coordinate within the boundary? (NO: end) → get the zoom factor to compute the new frame size → compute the start play sample index → windowing → append zeros for FFT → start the computation of the frequency spectrum and the three spectral envelopes → start of the plotting function → end of OnMouseMove( ).

5.4.3 Creating and showing the Zoom scale dialog.

Selecting “Plot” → “Zoom” in the menu bar invoked the handler OnZoom( ). The OnZoom( ) function created and showed the Zoom Scale dialog and set the Zoom Indicator to TRUE. The Zoom Indicator was then used by the mouse-move handler, OnMouseMove( ).

Figure 5.4.3 – Zooming the speech signal

Listing 5.4.1 – Created and showed the Zoom Scale dialog (SpanaView.cpp)

void CSpanaView::OnZoom()

{

……….

m_FastZoomDlg.Create(IDD_DISPLAY_ZOOM,this);

m_FastZoomDlg.ShowWindow(SW_SHOW);

m_ZoomIndicator=TRUE;

m_FastZoomDlg.SetIndicator(m_ZoomIndicator);

m_FastZoomDlg.Invalidate();

……….

}


5.4.3 The OnMouseMove( ) handler.

When the mouse pointer moved across the document, the OnMouseMove( ) handler was invoked, which in turn ran the PlotZoom( ) function.

Listing 5.4.2 – PlotZoom( ) ran when mouse pointer moved (SpanaView.cpp)

void CSpanaView::OnMouseMove(UINT nFlags, CPoint point)

{

……….

PlotZoom(point);

……….

}

5.4.4 Getting the zoom factor to compute the new frame size

When the PlotZoom( ) function was run, the first step was to get the zoom factor. If the Zoom Indicator was TRUE and the x-coordinate of the mouse pointer was within the painting area, the value returned from the slider in the Zoom Scale dialog was assigned to the zoom factor.

Figure 5.4.4 – The Zoom Scale dialog. The slider scales the zoom factor.

Listing 5.4.3 – Get the Zoom factor (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

m_ZoomIndicator=m_FastZoomDlg.GetZoomIndicator();

if (m_ZoomIndicator==TRUE)

{

……….

//proceed only if the mouse's x-coordinate is inside the painting area

if((rect.right>point.x)&&(point.x>xoffs))

{

……….

SliderIndicator=m_FastZoomDlg.GetSliderIndicator();

if (SliderIndicator==FALSE)

factor=1;

else

factor=m_FastZoomDlg.GetFactorValue();

//find the new window size

Zoomwindowsize=sp.windowsize;

Zoomwindowsize=int(Zoomwindowsize*factor);

……….

}

}

}


5.4.5 Calculating the Start Play Sample index

Figure 5.4.5 – Calculating the Start Play Sample Index

Listing 5.4.4 – Computed the Start Play Sample Number (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

// Calculate the Start Play Sample Number

m_bPlayIndex = (unsigned long)((sp.number_of_samples)/ (m_dfMaxX-2*xoffs)*(point.x-xoffs)+0.5);

……….

}
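The mapping from the mouse x-coordinate to a sample index can be sketched as a standalone function. This is an illustration only: StartSampleIndex is a hypothetical name, plot_right and x_offset stand in for m_dfMaxX and xoffs, and the guard and clamping steps are assumptions that are not shown in Listing 5.4.4.

```cpp
// Map a mouse x-coordinate to a sample index, rounding to the nearest sample.
// plot_right plays the role of m_dfMaxX and x_offset the role of xoffs.
unsigned long StartSampleIndex(unsigned long num_samples, double plot_right,
                               double x_offset, int mouse_x)
{
    const double plot_width = plot_right - 2.0 * x_offset;  // drawable width in pixels
    if (plot_width <= 0.0 || mouse_x <= x_offset) {
        return 0;  // assumption: positions left of the plot map to the first sample
    }
    const double samples_per_pixel = num_samples / plot_width;
    unsigned long index =
        (unsigned long)(samples_per_pixel * (mouse_x - x_offset) + 0.5);
    if (num_samples > 0 && index >= num_samples) {
        index = num_samples - 1;  // assumption: clamp to the last valid sample
    }
    return index;
}
```

For example, with 8000 samples drawn across 800 pixels, each pixel covers 10 samples, so a pointer 100 pixels into the plot selects sample 1000.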

Start Play Sample Index = Num of samples × (point.x − xoffs) / (m_dfMaxX − 2 × xoffs)

(Figure 5.4.5 labels: origin (0, 0), xoffs, point.x and m_dfMaxX along the x-axis.)

5.4.6 Windowing the frame samples

Where
signal_timeDomain = speech signal
frameData = the speech signal after windowing

Figure 5.4.5 – Windowing the speech samples

Listing 5.4.5 – HAMMING windowing the speech samples (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

float *frameData=new float[sp.windowsize];

signal_timeDomain = (__int16 *)sp.spcdata+(long)m_bPlayIndex;

windowing(signal_timeDomain, frameData, Zoomwindowsize, HAMMING,

sp.norm_factor,sp.prem_factor);

……….

}
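The body of windowing( ) is not shown in the report. As a rough sketch of the Hamming step alone, the function below multiplies each 16-bit sample by the standard Hamming coefficient. ApplyHamming is a hypothetical name, and the normalisation and pre-emphasis factors passed to windowing( ) are deliberately left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Apply a Hamming window, w[i] = 0.54 - 0.46*cos(2*pi*i/(n-1)), to n samples.
// Assumption: no normalisation or pre-emphasis is applied here.
std::vector<float> ApplyHamming(const short* samples, std::size_t n)
{
    const double kPi = 3.14159265358979323846;
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // A single-sample frame has no defined taper, so it is passed through.
        const double w = (n > 1)
            ? 0.54 - 0.46 * std::cos(2.0 * kPi * i / (n - 1))
            : 1.0;
        out[i] = static_cast<float>(samples[i] * w);
    }
    return out;
}
```

The window tapers the two ends of the frame towards 8% of their original amplitude, which reduces spectral leakage when the frame is later transformed by the FFT.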

5.4.7 Appending zeros for FFT

The FFT( ) function required the length of frameData to be a power of 2. sp.windowsize was a power of 2, but Zoomwindowsize generally was not. frameData was therefore allocated with a length of sp.windowsize. However, only the first Zoomwindowsize elements were assigned by the windowing step, so the memory locations beyond Zoomwindowsize held unknown values.


Figure 5.4.5 shows that the data beyond Zoomwindowsize were unknown. These unknown data were errors. To remove them, zeros were written to the memory locations beyond Zoomwindowsize, as illustrated in Figure 5.4.6.

Figure 5.4.6 – Append zeros to memory locations beyond Zoomwindowsize

Listing 5.4.6 – Append zeros to fit the FFT( ) (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

//append zeros for fft

//since there is Zoomwindowsize of data, we need to append

//(sp.windowsize-Zoomwindowsize) zeros to framedata

for(int i=0;i<sp.windowsize-Zoomwindowsize;i++)

{

frameData[i+Zoomwindowsize]=0;

}

……….

}
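The same zero-padding idea can be sketched with std::vector, which removes the risk of reading unassigned memory because every element starts at zero. NextPowerOfTwo and ZeroPad are hypothetical helper names used only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n == 0).
std::size_t NextPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Copy the first valid_len samples of frame into a buffer of fft_len
// elements; the remaining elements stay at zero.
std::vector<float> ZeroPad(const std::vector<float>& frame,
                           std::size_t valid_len, std::size_t fft_len)
{
    std::vector<float> padded(fft_len, 0.0f);
    const std::size_t n = std::min(std::min(valid_len, frame.size()), fft_len);
    std::copy(frame.begin(), frame.begin() + n, padded.begin());
    return padded;
}
```

In Spana the FFT length stays fixed at sp.windowsize, so a call such as ZeroPad(frame, Zoomwindowsize, sp.windowsize) would correspond to the loop in Listing 5.4.6.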

5.4.8 Computation of frequency spectrum and the spectral envelopes

5.4.8.1 Computing the frequency spectrum

The frequency spectrum was obtained by transforming the windowed speech samples into the frequency domain.


Figure 5.4.7 – Transforming the speech samples into frequency domain

Listing 5.4.7 – Transform the windowed speech samples into frequency domain (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

FFT(frameData, sp.winbuffer1, sp.windowsize);

……….

}

5.4.8.2 Computing the LPC envelope

Before the LPC envelope could be computed, it was necessary to compute the autocorrelation of the windowed signal. The second step was to use the result of the autocorrelation, tempautocc, to calculate the LPC coefficients, tempLPC. Finally, tempLPC was passed to spectral_envelope( ) to compute the LPC envelope. The spectral envelope was stored in sp.winbuffer2.

Figure 5.4.8 – Flowchart of computing the LPC envelope: frameData → autocorrelation → tempautocc → calc_lpc( ) → tempLPC → spectral_envelope( ) → sp.winbuffer2. (Figure 5.4.7: frameData → FFT → sp.winbuffer1.)

Listing 5.4.8 – Computation of the LPC envelope (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

td_autoc(Zoomwindowsize,sp.order,frameData,tempautocc);

calc_lpc(sp.order,tempautocc,tempLPC,sp.K[index-1]);

spectral_envelope(sp.order, tempLPC, sp.windowsize,

sp.winbuffer2, gain, sp.dspflag);

……….

}

Listing 5.4.9 – Computation steps of autocorrelation (SpanaView.cpp)

void CSpanaView::td_autoc(__int16 win_size, __int16 order, float *indata,float *autocc)

{

……….

for (k=0;k<=order;k++)

{

sum = 0.0;

for (m=0;m<win_size-k;m++)

sum += indata[m] * indata[m+k];

autocc[k] = sum;

}

}

5.4.8.3 Computing the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering

Both envelopes were computed by the function PlotLPCSpectralZoom( ). Since the computation steps in PlotLPCSpectralZoom(…) were the same as those in PlotSpectral(…), the implementation of PlotLPCSpectralZoom(…) is not discussed here. Please refer to sections 5.1 and 5.2 for the details.

After the spectral envelopes had been computed, they could be plotted. Before plotting, however, it was necessary to convert the frequency spectrum, the LPC envelope, the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering from linear values to dB.

Where
sp.winbuffer1 = frequency spectrum
sp.winbuffer2 = LPC envelope
lpCepSpecEnv_Zoom = LPCC-based spectral envelope
CepSpecEnv_Zoom = spectral envelope by FFT-based cepstral liftering

Figure 5.4.9 – Converting data from linear to dB

Listing 5.4.10 – Convert the linear data to dB (SpanaView.cpp)

void CSpanaView::PlotZoom(CPoint point)

{

……….

PlotLPCSpectralZoom(Zoomwindowsize,tempLPC,frameData);

linear_to_log10(sp.winbuffer1, Zoomwindowsize/2, 1.0);

// convert linear data to dB

linear_to_log10(sp.winbuffer2, Zoomwindowsize/2, 1.0);


linear_to_log10(lpCepSpecEnv_Zoom,Zoomwindowsize/2,1.0);


linear_to_log10(CepSpecEnv_Zoom,Zoomwindowsize/2,1.0);

……….

}
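The body of linear_to_log10( ) is not shown. A minimal sketch of an in-place conversion is given below, assuming the values are power quantities so that the factor is 10, in line with the 10*log( ) label in Figure 5.4.9. LinearToDb is a hypothetical name, and the meaning of the reference argument and the floor that guards against log10(0) are both assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Convert n linear values in place to decibels: 10*log10(x / reference).
// Values are floored first so that zero bins do not produce -infinity.
void LinearToDb(float* data, std::size_t n,
                float reference = 1.0f, float floor_value = 1e-12f)
{
    for (std::size_t i = 0; i < n; ++i) {
        const float ratio = std::max(data[i] / reference, floor_value);
        data[i] = 10.0f * std::log10(ratio);
    }
}
```

With the default arguments, a value of 100 becomes 20 dB and a zero bin becomes about −120 dB rather than an invalid number, which keeps the plotting code free of special cases.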

5.4.9 The Zoom Scale Dialog

The Zoom Scale dialog contained a slider that set the zoom factor, which in turn scaled the window size used for zooming. The slider had four scales.

Figure 5.4.10 – Four scales of zoom were supported

(Figure 5.4.9: sp.winbuffer1, sp.winbuffer2, lpCepSpecEnv_Zoom and CepSpecEnv_Zoom each pass through 10*log( ). Figure 5.4.10: the zoom increases across the four scales of the slider.)

Figure 5.4.11 – The four scales of zoom

Listing 5.4.11 – Set the four scales (SpanaView.cpp)

void CFastZoomDlg::OnHScroll(UINT nSBCode, UINT nPos, CScrollBar* pScrollBar)

{

if (GetZoomIndicator()==TRUE)

{

m_SliderZoom.SetRange(1,4,TRUE);


m_SliderZoom.SetPageSize(1);

m_SliderZoom.SetTicFreq(1);

switch(m_SliderZoom.GetPos())

{

case 1: ZoomFactor=1;

break;

case 2: ZoomFactor=(float)0.8;

break;


break;


}

}

……….

}



5.6.1 Introduction

The frame of samples shown in the Fast Display dialog is determined by the location of the red vertical line in the main window, as shown in Figure 5.6.1. Thus, the next frame of samples could be displayed by moving the red vertical line. To do this, it was necessary to get the current location of the red vertical line. This functionality was added through a handler, OnKeyDown( ), for the KeyDown event. The main purpose of the OnKeyDown( ) function was to get the frame index so that the Fast Display dialog could show the required frame of samples.

Figure 5.6.1 – The red vertical line



Figure 5.6.2 – Flowchart of the OnKeyDown( ) function

Start of OnKeyDown( ) → current frame index selected? (NO: end) → key = VK_RIGHT? (YES: frame index increments) → otherwise key = VK_LEFT? (YES: frame index decrements; NO: end) → set the new location for the red line → start the computation and painting process → end of OnKeyDown( ).

5.6.3 Getting the current frame index and next frame index

Current frame index:

In order to select the next frame of samples for display in the Fast Display dialog, it was necessary to know the current frame index, because any increment or decrement of the frame index must be based on it. Otherwise, it would be impossible to know which frame of samples the user wanted to display. The current frame index was selected by clicking the left mouse button on the waveform of the speech signal. Please refer to section 5.1.3 for the details. An indicator, m_bCheckMouse, was used to verify whether the current frame index had been selected; if so, it was set to TRUE. There would be no response to the “→” or “←” key if the indicator was FALSE.

New frame index:

There would be no response to any key pressed on the keyboard unless the key was “→” or “←”. The frame index would increment for the “→” key and decrement for the “←” key. The frame index would then be passed to the class of the Fast Display dialog for further computation.

5.6.4 Setting the red vertical line to new position

The red line should move to the right if the “→” key was pressed and to the left if the “←” key was pressed. The position of the red vertical line was determined by the variable x_indicator, which is the x-coordinate of the red vertical line in the main window. Figure 5.6.3 shows how the new location was calculated.

Figure 5.6.3 – Calculation of the new location of the red vertical line

Listing 5.5.1 – Getting the new frame index and setting the red vertical line to new position (SpanaView.cpp)

void CSpanaView::OnKeyDown(UINT nChar, UINT nRepCnt, UINT nFlags)

{

……….

if(nChar==VK_RIGHT||nChar==VK_LEFT)

{

if(nChar==VK_RIGHT)

{

x_indicator=x_indicator+x_movement_step;

//Figure 5.6.3: x_movement_step = (m_dfMaxX - 2*xoffs) / number of frames

//new x_indicator = x_indicator + x_movement_step (right key) or x_indicator - x_movement_step (left key)

m_FastDisplayDlg.win_index = m_FastDisplayDlg.win_index +1;

}

else

{

x_indicator=x_indicator-x_movement_step;

m_FastDisplayDlg.win_index = m_FastDisplayDlg.win_index -1;

Invalidate();

}

………

}

}



5.6.1 Introduction

When the poles and zeros in the Z-plane or the sensitivity of the LP parameters are adjusted, the LPCC-based spectral envelope changes, because adjusting these parameters changes the values of the LP coefficients. Therefore, my approach was to get the new LP coefficients and use them to compute the new LPCC-based spectral envelope.

5.6.2 Program flowchart

Figure 5.6.1 – Program flow for updating the LPCC-based spectral envelope in response to changes in the poles, zeros or sensitivity of LP parameters

Any change in zeros, poles or sensitivity of LP parameters → compute a new set of LP coefficients → compute the new LPCC-based spectral envelope → plot the new LPCC-based spectral envelope → end.

5.6.3 Computing a new set of LP coefficients

Event handlers for changes in the poles, zeros and sensitivity of LP parameters already existed in the previous version of Spana, so I did not need to create any. Since these handlers already computed a new set of LP coefficients, I did not need to add code to them either; I only needed to get the new coefficients. After the computation of a new set of LP coefficients, the array sp.LPC[index-1] would be updated with them. Therefore, the new set of LP coefficients could be referenced by the following code:

……….

sp.LPC[index-1]; //LP coefficients

……….

5.6.4 Computing the new LPCC-based spectral envelope

After updating the array, sp.LPC[index-1], with the new set of LP coefficients, we could start the

computation of the new LPCC-based spectral envelope. The computation was done in the

PlotSpectralEnvelope( ) function.

Listing 5.6.1 – Computation of the new LPCC-based spectral envelope (SpanaView.cpp)

void CSpanaView::PlotSpectralenvelope()

{

……….

gain=calc_gain(sp.autocc[index-1], sp.LPC[index-1], sp.order);

FFT(sp.w[index-1], sp.winbuffer1, sp.windowsize);

spectral_envelope(sp.order, sp.LPC[index-1], sp.windowsize, sp.winbuffer2, gain, sp.dspflag);

PlotLPCSpectral();

linear_to_log10(sp.winbuffer1, sp.windowsize/2, 1.0);


linear_to_log10(sp.winbuffer2, sp.windowsize/2, 1.0);


linear_to_log10(lpCepSpecEnv_buffer,sp.windowsize/2,1.0);



linear_to_log10(CepSpecEnv,sp.windowsize/2,1.0); // convert linear data to dB

……….

}

5.6.5 Plotting the new LPCC-based spectral envelope

Plotting of the new LPCC-based spectral envelope was also completed in the

PlotSpectralEnvelope( ) function.

Listing 5.6.2 – Plotting the new LPCC-based spectral envelope (SpanaView.cpp)

void CSpanaView::PlotSpectralenvelope()

{

……….

// new (lpCepSpecEnv_buffer) starts here

CPen pen_lpCepSpecEnv(PS_SOLID, 1, RGB(0,0,255));


dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv_buffer[0]-miny)*stepy));

x_coor=0;


{

dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv_buffer[i]-miny)*stepy));

}

……….

}

Whenever the poles, zeros or sensitivity of the LP parameters change again, the above process is repeated.

Chapter 6 Results and Discussion

6.1 Plotting of the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering

Results:

Figure 6.1.1 – The spectral envelopes including LPCC-based spectral envelope (Blue), spectral envelope by FFT-based cepstral liftering (Pink) and LPC spectral envelope (Red)

Discussion

It can be seen from Figure 6.1.1 that the three spectral envelopes are very close to each other. With this function, students can examine the relationship between these spectral envelopes. They can also verify that the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering both model the frequency spectrum. It is therefore easier to show them that the two spectral envelopes can help in locating the formants.

6.2 Plotting of Pitch Contour

Results:

Figure 6.2.1 – The pitch contours for the speech “seven” from (A) WaveSurfer 1.6.0, (B) Spana (current version), (C) Spana (previous version). Axis label: period in ms.

Figure 6.2.2 – Pitch contours for the speech “welcome” from (A) WaveSurfer 1.6.0, (B) Spana (current version), (C) Spana (previous version)

Discussion

Figures 6.2.1 and 6.2.2 show that the pitch contours from the current version of Spana are closer to those from WaveSurfer than the pitch contours from the previous version of Spana. Thus, it can be concluded that the plotting of the pitch contour has been improved in the current version.

6.3 Zooming the speech signal in the time domain

Results:

Figure 6.3.1 – The zooming window

Figure 6.3.2 – Zooming in greater scale


Discussion

Figure 6.3.1 shows that the waveform can be magnified with the zooming function, and Figure 6.3.2 shows that the zooming scale can be changed. Another important feature of the zooming function is that the view in the zooming window follows the mouse pointer. This is convenient because users do not need to select a portion of the waveform and then press a zoom button in order to zoom the speech signal.


Results:

Figure 6.4.1 – Original view in the Fast Display dialog (frame 45)

Figure 6.4.2 – The next view in the Fast Display dialog (frame 46) when the “→” key was pressed once

Discussion

In the previous version of Spana, a user who wanted to view frame 41 in the Fast Display dialog had to locate frame 41 with the mouse pointer and then click the left mouse button at that location. To shift the view to the next frame, she had to repeat this process. To view an entire utterance containing 100 frames, she would have to repeat it 100 times, which is inconvenient. The Interactive Fast Display feature provides a more convenient way to shift the view to the next frame by using the “→” or “←” keys on the keyboard.


Figure 6.5.1 – Changing the LPC spectral envelope and LPCC-based spectral envelope by moving the poles on Z-Plane

Figure 6.5.2 – Changing the LPC spectral envelope and LPCC-based spectral envelope by adjusting the zeros on Z-Plane

Figure 6.5.3 – Changing the LPC spectral envelope and LPCC-based spectral envelope by adjusting the sensitivity of LP parameters

Discussion

Figures 6.5.1, 6.5.2 and 6.5.3 show that the LPC spectral envelope and the LPCC-based spectral envelope can be changed by adjusting the poles and zeros on the Z-plane and the sensitivity of the LP parameters. This interactive function can help students understand how the zeros, poles and sensitivity of the LP parameters affect the LPC spectral envelope and the LPCC-based spectral envelope.

Chapter 7 Conclusion and Recommendations

7.1 Conclusion

This project aimed to enhance Spana by adding new features so that it is more useful in helping students learn the abstract concepts of speech analysis. The new features stated in Chapter 2 include plotting of the LPCC-based spectral envelope, the spectral envelope by FFT-based cepstral liftering and the pitch contour, a zooming function to view the speech signal in the time domain, and selection of the frame index by keyboard. All these functions have been completed. Therefore, the objective of this project has been met.

For the pitch detection algorithm, some error remained even though the plotting of the pitch contour had been improved. Experiments on extending the probabilistic approach indicate that the error in pitch detection can be further reduced by using a finer approximation of the normal distribution [9]. By including both distributions, it is expected that a more desirable result of pitch detection can be obtained.

During the development of this project, it was found to be difficult to verify whether a newly written function worked correctly. This problem can be solved by comparing the results of the new function with those obtained in MATLAB, which has a large number of built-in functions. For example, to verify a self-written FFT function, one can simply call the FFT function in MATLAB and compare its results with those of the new function.

7.2 Recommendations for further work

For the zooming function, the Fast Zoom dialog flickers slightly while the speech waveform is being magnified, which may be irritating to users. The flicker occurs because the Fast Zoom dialog is repainted whenever the mouse pointer moves across the waveform of the speech signal. Hopefully, this problem can be solved in a later version of Spana.

The zooming function in this version of Spana can only zoom the speech signal in the time domain. Because of limited time, zooming the speech signal in the spectral domain is not supported in this version and is suggested for further work.

There is still room for enhancement to Spana. Other new features can be added, including spectrogram analysis and plotting of the MFCC envelope. The interface of Spana can also be improved, for example by displaying different characteristics of the speech signal at the same time using a Multiple Document Interface.

References

[1] http://www.codeproject.com/bitmap/gditutorial.asp

[2] http://www.codeguru.com/

[1] S.Y. Kung, M.W. Mak and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, to appear.

[2] Deller, J.R. et al., Discrete-Time Processing of Speech Signals, Macmillan Pub. Company, 2000.

[3] Kondoz, A.M., Digital Speech: Coding for Low Bit Rate Communications Systems, J. Wiley, 1994.

[4] Rabiner, L.R. and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, 1993.

[5] Prosise, J., Programming Windows with MFC, 2nd Edition, Microsoft Press, 1999.

[6] Kruglinski, D.J., Wingo, S. and Shepherd, G., Programming Visual C++, 5th Edition, Microsoft Press, 1998.

[7] Kain, E., The MFC Answer Book – Solutions for Effective Visual C++ Applications, Addison Wesley, 1998.

[8] Barnwell III, T.P. et al., Speech Coding: A Computer Laboratory Textbook, John Wiley & Sons, Inc., 1996.

[9] Ying, G.S., Jamieson, L.H. and Michell, C.D., A probabilistic approach to AMDF pitch detection, Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), Volume 2, 3-6 Oct. 1996, pp. 1201-1204.
