Project Title
SPANA – Development of Multimedia Tool for Learning Speech Analysis
Supervisor: Dr. M.W. Mak
Student Name: Sit Chin Hung
Student ID: 00146713D
Period: Aug 2003 – Apr 2004
Abstract
Digital speech processing has wide applications today, such as mobile phone communication,
voice recognition and voice verification systems. Speech analysis is fundamental to all of these
applications. To help students learn the abstract concepts of speech analysis, a software tool,
SPANA, was developed.
For this version of Spana, six functions, namely, Plotting of Pitch Contour, Plotting of
LPCC-based spectral envelope, Plotting of spectral envelope by FFT-based cepstral liftering,
Zooming of the speech signal in time domain, Interactive Fast Display and Interactive Spectral
Plot were added.
For the Plotting of Pitch Contour, the AMDF method was used for pitch detection. To reduce
errors in pitch detection, a probabilistic approach was applied.
Plotting of the LPCC-based spectral envelope and of the spectral envelope by FFT-based cepstral
liftering were integrated into the PlotSpectralEnvelope( ) function, which was already responsible
for plotting the frequency spectrum and the LP spectral envelope. As a result,
PlotSpectralEnvelope( ) can plot all of these envelopes on the same screen.
The Zoom function was implemented by creating an event handler, OnMouseMove( ), to handle
the “mouse move” event. The waveform of the speech signal in the time domain is therefore
magnified as the mouse pointer moves across it.
Interactive Fast Display was implemented by adding a KeyDown event handler, OnKeyDown( ),
which responds to keys pressed on the keyboard.
Interactive Spectral Plot was implemented by updating the LPCC-based spectral envelope in
response to the user's changes to the poles, zeros and sensitivity of the LP parameters.
Spana was developed in the MS Visual C++ environment with MFC.
It is believed that adding these functions has made Spana more user-friendly and will help
students to learn speech analysis more easily.
Acknowledgments
I would like to offer my special thanks to my supervisor Dr M. W. Mak for his valuable advice
and useful materials on handling the project. I was impressed by his willingness to give his time
so generously to guide me to find the solution instead of giving me the solution directly.
I would also like to extend my thanks to the technicians of the laboratory of the EIE department
for their help in offering me the resources in the development of the project.
Table of Contents
Chapter 1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Organization
Chapter 2 Project Specification
  2.1 Plotting the LPCC-based spectral envelope
  2.2 Plotting the spectral envelope by FFT-based cepstral liftering
  2.3 Plotting of Pitch Contour
  2.4 Adding a Zoom function to view the speech signal in time domain
  2.5 Interactive Fast Display
  2.6 Interactive Spectral Plot
Chapter 3 Theories of Speech Analysis
  3.1 Pitch Estimation
  3.2 Smoothing the LP-based spectral envelopes by cepstral processing
  3.3 FFT-based cepstral liftering
Chapter 4 Window Programming
  4.1 Introduction
    4.1.1 The Windows Programming Model
    4.1.2 Microsoft Foundation Class (MFC)
    4.1.3 The Document/View Architecture
  4.2 Guide to create a simple windows program using MFC
    4.2.1 Create a Single Document Interface (SDI) Application
    4.2.2 Add a menu entry to Menu
    4.2.3 Adding function to the Menu
    4.2.4 Drawing a line
  4.3 GDI, Memory DC and Bitmap
Chapter 5 Methodology
  5.1 Plotting the LPCC-based spectral envelope
    5.1.1 Introduction
    5.1.2 Program Flowchart
    5.1.3 Getting the frame index
    5.1.4 Windowing the samples of the selected frame
    5.1.5 Computing the LPC coefficients
    5.1.6 Computing the Makhoul’s “a”
    5.1.7 Computing the LPC gain
    5.1.8 Computing the cepstral coefficients
    5.1.9 Appending zero quefrency
    5.2.9 Computing the smooth spectrum from LP-derived cepstrum
  5.2 Plotting the spectral envelope by FFT-based cepstral liftering
    5.2.1 Introduction
    5.2.1 Program flowchart
    5.2.2 Computing the frame index and then the windowed signals of a frame
    5.2.3 Computing the Short-Term Real Cepstrum (stRC)
    5.2.4 Perform liftering, cut-off time at pOrder
    5.2.5 Computing spectral envelope based on stRC
    5.2.6 Flow chart of the plotting function
    5.2.7 The plotting function
    5.2.8 Starting to plot the two spectral envelopes
  5.3 Plotting the Pitch Contour
    5.3.1 Introduction
    5.3.2 Program Flowchart
    5.3.3 Computation of the Mean and Standard Deviation
      5.3.3.1 Lowpass filtering the entire speech signal
      5.3.3.2 Hanning windowing the speech samples
      5.3.3.3 Computing the zero crossing rate
      5.3.3.4 Computing the pitch period estimates using AMDF
      5.3.3.5 Mean and standard deviation of the pitch period estimates
    5.3.4 Computing all candidate pitch periods for the selected frame
    5.3.5 Computing the zero crossing rate
    5.3.6 Filtering out the markers with the constraints
      5.3.6.1 Finding the constraints
    5.3.7 Weighting the markers with the normal distribution
  5.4 Adding a zoom function to view the speech signal in time domain
    5.4.1 Introduction
    5.4.2 Program flowchart
    5.4.3 Creating and showing the Zoom scale dialog
    5.4.3 The OnMouseMove( ) handler
    5.4.4 Getting the zoom factor to compute the new frame size
    5.4.5 Calculating the Start Play Sample index
    5.4.6 Windowing the frame samples
    5.4.7 Appending zeros for FFT
    5.4.8 Computation of frequency spectrum and the spectral envelopes
      5.4.8.1 Computing the frequency spectrum
      5.4.8.2 Computing the LPC envelope
      5.4.8.3 Computing the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering
    5.4.9 The Zoom Scale Dialog
  5.6 Interactive Fast Display
    5.6.1 Introduction
    5.6.2 Program Flowchart
    5.6.3 Getting the current frame index and next frame index
    5.6.4 Setting the red vertical line to new position
  5.6 Interactive Spectral Plot
    5.6.1 Introduction
    5.6.2 Program flowchart
    5.6.3 Computing a new set of LP coefficients
    5.6.4 Computing the new LPCC-based spectral envelope
    5.6.5 Plotting the new LPCC-based spectral envelope
Chapter 6 Results and Discussion
  6.1 Plotting of the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering
  6.2 Plotting of Pitch Contour
  6.3 Zooming the speech signals in time domain
  6.4 Interactive Fast Display
  6.5 Interactive Spectral Plot
Chapter 7 Conclusion and Recommendations
  7.1 Conclusion
  7.2 Recommendations for further work
References
Chapter 1 Introduction
To develop applications that use digital speech processing technology, such as mobile phone
communication, speech synthesis and speech recognition, we have to understand the
characteristics of speech signals. Speech analysis refers to the analysis and extraction of these
characteristics. A speech analysis learning tool, SPANA, was therefore developed to help
students learn speech analysis.
The project began in August 2003 and was completed in April 2004.
1.1 Background
SPANA was first developed a few years ago and has been enhanced continually since then. It is
developed in the Visual C++ environment using MFC and runs as a 32-bit Windows application.
1.2 Objectives
The objective of this project is to enhance SPANA by integrating new features. The new
features are Plotting of Pitch Contour, Plotting of LPCC-based Envelope, Plotting of spectral
envelope by FFT-based cepstral liftering, Zooming of the speech signal in time domain,
Interactive Fast Display and Interactive Spectral Plot.
Past versions of SPANA already include many features, such as Spectrogram Display, line
spectrum pair analysis, and average energy and zero crossing measurements.
This report covers the theories of both speech analysis and Windows programming. A
fundamental knowledge of both is therefore recommended for a better understanding of this
project.
1.3 Organization
The introduction to the background and the objectives of this project are presented in this chapter.
The rest of the dissertation is organized as follows.
Chapter 2 presents the specifications of the project.
Chapter 3 describes the speech analysis theories involved in this project.
Chapter 4 provides information about Windows programming. Since MFC was used in this
project, a brief introduction to MFC is included, covering how to create a Single Document
Interface application, how to add a function to a menu, how to paint using a device context, etc.
Chapter 5 describes the methodologies used in the development of this project. It includes flow
charts of the algorithms, the implementation procedures and the program code.
Chapter 6 presents and discusses the results.
Finally, conclusions are presented in Chapter 7 together with recommendations for further work.
Chapter 2 Project Specification
The specifications of this project are as follows:
1. Plotting the LPCC-based spectral envelope
2. Plotting the spectral envelope by FFT-based cepstral liftering
3. Plotting of Pitch Contour
4. Adding a zoom function to view the speech signal in time domain
5. Interactive Fast Display
6. Interactive Spectral Plot
2.1 Plotting the LPCC-based spectral envelope
The LPCC-based spectral envelope refers to the envelope that is obtained by smoothing the
LP-based spectral envelopes by cepstral processing. The advantage of LPCC-based spectral
envelope is that it can provide a more consistent representation of a speaker’s vocal tract
characteristics. The envelope created from LP-derived cepstral coefficients (LPCCs) can track
the peaks of the speech spectrum and hence it can be used as a feature for speaker recognition.
2.2 Plotting the spectral envelope by FFT-based cepstral liftering
The spectral envelope by FFT-based cepstral liftering is obtained by applying cepstral liftering
and then an FFT to the Short-Term Real Cepstrum (stRC). Cepstral liftering is analogous to
filtering in the usual frequency domain. The spectral envelope can be applied in formant
estimation and pitch detection. The LPCC-based spectral envelope and the spectral envelope by
FFT-based cepstral liftering are painted on the same screen so that their relationship can be
seen.
2.3 Plotting of Pitch Contour
The pitch contour of a speech signal shows the pitch period of every frame of voiced sound, so
users can see how the pitch period changes from one voiced frame to the next.
2.4 Adding a Zoom function to view the speech signal in time domain
The Zoom function magnifies the speech signal in the time domain. The magnification is
adjustable: users can tune the zoom scale with the scale slider, which offers four scales. When
users move the mouse pointer, the zoom window shifts with it.
The zoom window contains two parts. The upper part shows the magnified speech signal in the
time domain, while the lower part displays the spectrum of the signal within the zoom window
together with its LPC envelope, LPCC-based spectral envelope and spectral envelope by
FFT-based cepstral liftering.
2.5 Interactive Fast Display
In this version of Spana, users can interact with the Fast Display dialog by using the keyboard.
The red vertical line (indicating the frame that is being displayed in the Fast Display dialog)
shifts when the “→” or “←” key is pressed: “→” shifts the red vertical line to the right while
“←” shifts it to the left. For example, if the frame currently displayed in the Fast Display dialog
is 10 and the user presses “→” once, the red vertical line shifts to the right and frame 11 is
displayed in the Fast Display dialog.
2.6 Interactive Spectral Plot
During the display of the spectral envelopes, users can change the LPCC-based spectral envelope
simultaneously by adjusting the following parameters:
i. Poles in LP Pole Control dialog (Figure 2.1 A)
ii. Zeros in LSP Control dialog (Figure 2.1 B)
iii. Sensitivity of LP parameters in the Sensitivity of LP Parameters dialog (Figure 2.1 C)
Figure 2.1 – (A) LP Pole Control dialog, (B) LSP Control dialog, (C) Sensitivity of LP Parameters dialog
Chapter 3 Theories of Speech Analysis
The following theories of speech analysis were applied in this project.
1. Pitch Estimation
2. Smoothing the LP-based spectral envelopes by cepstral processing
3. Cepstral liftering
3.1 Pitch Estimation
Basic algorithm for AMDF
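For each frame n, the short-term average magnitude difference function (AMDF) is defined as
follows:
$$\mathrm{AMDF}_n(j) = \frac{1}{N}\sum_{i=1}^{N} \left| x_n(i) - x_n(i+j) \right|, \qquad 1 \le j \le \mathrm{MAXLAG} \tag{3.1}$$
where x_n(i) is the i-th sample of frame n, N is the number of samples summed, and MAXLAG
is the maximum number of AMDF values generated in each frame. The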
difference function would have a local minimum if the lag j is equal to or very close to the
fundamental period. Thus, for each frame, the lag for which the AMDF is a global minimum is a
strong candidate for the pitch period of that frame [9].
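As a minimal sketch of Eq. 3.1, the following standard C++ fragment evaluates the AMDF for each lag and returns the lag with the smallest value. It is an illustration only, not the code used in SPANA: the function names `amdf` and `estimatePitchPeriod` are hypothetical, samples are indexed from 0 rather than 1, and the frame buffer is assumed to hold at least N + MAXLAG samples so that x(i + j) is always available.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// AMDF value for one lag j (Eq. 3.1), using 0-based sample indices.
// Assumption: x holds at least N + j samples, so x[i + j] is always valid.
double amdf(const std::vector<double>& x, std::size_t N, std::size_t j) {
    assert(N > 0 && N + j <= x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += std::fabs(x[i] - x[i + j]);
    }
    return sum / static_cast<double>(N);
}

// Returns the lag in [1, maxLag] at which the AMDF is a global minimum.
// This lag is only a candidate pitch period; no error correction is applied.
std::size_t estimatePitchPeriod(const std::vector<double>& x, std::size_t N,
                                std::size_t maxLag) {
    assert(maxLag >= 1);
    std::size_t bestLag = 1;
    double bestValue = amdf(x, N, 1);
    for (std::size_t j = 2; j <= maxLag; ++j) {
        const double value = amdf(x, N, j);
        if (value < bestValue) {
            bestValue = value;
            bestLag = j;
        }
    }
    return bestLag;
}
```

For a signal that repeats exactly every 40 samples, the AMDF at lag 40 is zero, so with MAXLAG below 80 the function returns 40.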
Problem for this algorithm:
The disadvantage of this algorithm is that the minimum in each frame is strongly affected by the
intensity variation and the background noise of the speech signal. In order to reduce the errors
due to the problem mentioned above, a global error correction routine is required for the pitch
detection system to locate the incorrect estimates and correct the errors [9].
3.2 Smoothing the LP-based spectral envelopes by cepstral processing
The linear prediction (LP) analysis is based on the assumption that the current sample of speech
signals s(n) can be predicted from the past P speech samples. This can be illustrated by the
following equation:
$$s(n) \approx \tilde{s}(n) = -\sum_{k=1}^{P} a_k\, s(n-k) \tag{2.2}$$
where $\{a_k\}_{k=1}^{P}$ are called the LP coefficients. Another assumption is that the excitation
source Gu(n), where G is the gain and u(n) is the normalized excitation, can be separated from
the vocal tract. Using these two assumptions, the vocal tract can be represented by an IIR filter
of the form:
$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 + \sum_{k=1}^{P} a_k z^{-k}} \tag{2.3}$$
where S(z) and U(z) are the z-transforms of s(n) and u(n).
The time-domain representation of the output s(n) of this IIR filter is a linear regression of its
past output values and the present input Gu(n):
$$s(n) = -\sum_{k=1}^{P} a_k\, s(n-k) + G\,u(n) \tag{2.4}$$
The LP analysis aims to compute a set of LP coefficients $a = \{a_1, \ldots, a_P\}$ for each frame of
speech such that the frequency response of Eq. 2.3 is as close as possible to the frequency
spectrum of the speech signal. The vocal tract of a speaker can therefore be modeled by the LP
coefficients [1].
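To make the link between the LP coefficients and the spectral envelope concrete, the sketch below evaluates the filter of Eq. 2.3 on the unit circle and scales it by the gain G. It is a hedged illustration rather than SPANA's implementation: the function name `lpEnvelope`, the storage convention (a[0] unused, a[1..P] holding the coefficients) and the choice of equally spaced frequencies from 0 up to, but excluding, π are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude of G / A(e^{jw}), where A(z) = 1 + sum_{k=1}^{P} a_k z^{-k}.
// Assumption: a[0] is unused and a[1..P] hold the LP coefficients.
// The envelope is sampled at numPoints frequencies w = pi * m / numPoints.
std::vector<double> lpEnvelope(const std::vector<double>& a, double G,
                               std::size_t numPoints) {
    assert(!a.empty() && numPoints > 0);
    const double pi = std::acos(-1.0);
    std::vector<double> magnitude(numPoints);
    for (std::size_t m = 0; m < numPoints; ++m) {
        const double w = pi * static_cast<double>(m) / static_cast<double>(numPoints);
        std::complex<double> A(1.0, 0.0);
        for (std::size_t k = 1; k < a.size(); ++k) {
            // a_k * e^{-jwk}
            A += a[k] * std::polar(1.0, -w * static_cast<double>(k));
        }
        magnitude[m] = G / std::abs(A);
    }
    return magnitude;
}
```

With a single coefficient a_1 = -0.5 and G = 1, the value at w = 0 is 1 / (1 - 0.5) = 2, which is an easy hand check of the sign convention used in Eq. 2.3.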
However, although the LP coefficients represent the spectral envelope of the speech signal, it
was found that a more consistent representation of a speaker’s vocal tract characteristics can be
obtained by smoothing the LP-based spectral envelopes by cepstral processing. The cepstral
coefficients $c_n$ can be computed from the LP coefficients $a_k$ as follows:
$$c_0 = \ln G \tag{2.5}$$
$$c_n = -a_n - \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \qquad 1 \le n \le P \tag{2.6}$$
$$c_n = -\sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \qquad n > P \tag{2.7}$$
where G is the estimated model gain, P is the prediction order, and $a_{n-k}$ is taken as zero
whenever n − k > P. Figure 3.1 shows the process of computing the LP-based cepstral
parameters. Since the parameters are derived from LP analysis, they are called LP-derived
cepstral coefficients (LPCCs) [1].
Figure 3.1 – Computation of LPCCs from speech signals
Since the envelope created by the LPCCs can track the peaks of the speech spectrum, LPCCs can
be used as a feature for speaker recognition.
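The recursion from LP coefficients to cepstral coefficients can be sketched in a few lines of standard C++. This is an illustrative version under stated assumptions, not the routine used in SPANA: the name `lpcToCepstrum` is hypothetical, a[0] is unused so that a[k] corresponds to a_k, and the sign convention is the one in which the all-pole filter has denominator 1 + Σ a_k z^{-k}.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts LP coefficients a_1..a_P and gain G into cepstral coefficients
// c_0..c_numCep using the LPCC recursion.
// Assumption: a[0] is unused and a[1..P] hold the LP coefficients.
std::vector<double> lpcToCepstrum(const std::vector<double>& a, double G,
                                  std::size_t numCep) {
    assert(!a.empty() && G > 0.0);
    const std::size_t P = a.size() - 1;
    std::vector<double> c(numCep + 1, 0.0);
    c[0] = std::log(G);
    for (std::size_t n = 1; n <= numCep; ++n) {
        double sum = 0.0;
        for (std::size_t k = 1; k < n; ++k) {
            // a_{n-k} is zero beyond the prediction order P.
            if (n - k <= P) {
                sum += static_cast<double>(k) * c[k] * a[n - k];
            }
        }
        c[n] = -sum / static_cast<double>(n);
        if (n <= P) {
            c[n] -= a[n];  // the extra -a_n term applies only for 1 <= n <= P
        }
    }
    return c;
}
```

A first-order model gives a simple check: with a_1 = -0.5 and G = 1, the log of 1 / (1 - 0.5 z^{-1}) expands to Σ (0.5^n / n) z^{-n}, so the recursion should yield c_1 = 0.5, c_2 = 0.125 and c_3 = 0.125 / 3.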
3.3 FFT-based cepstral liftering
The spectral envelope by FFT-based cepstral liftering is obtained by applying cepstral liftering
and then an FFT (Fast Fourier Transform) to the Short-Term Real Cepstrum (stRC). Figure 3.2
shows the computation of the spectral envelope by FFT-based cepstral liftering.
[Figure 3.1 blocks: speech, Pre-emphasis, Windowing and Frame Blocking, LP Analysis, LPC vector, Cepstral Transformation, Cepstral vector]
Figure 3.2 – Computation of the spectral envelope by FFT-based cepstral liftering
Liftering
Liftering refers to filtering in the quefrency domain. A low-time lifter is therefore analogous to
a lowpass filter in the usual frequency domain.
Figure 3.3 – Liftering
where
$$l(n) = \begin{cases} 1, & n = 0, 1, \ldots, L \\ 0, & \text{otherwise} \end{cases}$$
[Figure 3.2 blocks: s(n), zero padding, FFT, Log( ), IFFT, stRC, low-time lifter l(n), FFT, spectral envelope]
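The low-time liftering step can be illustrated with the following standard C++ sketch. It is not SPANA's code, and several simplifications are assumptions: the function names are hypothetical, the real cepstrum is computed from a given log-magnitude spectrum by a naive O(N²) inverse DFT instead of an FFT, the log-magnitude spectrum is assumed to be symmetric (as it is for a real signal), and the final transform is written as a cosine sum in which the factor 2 accounts for the mirrored quefrencies N − n of the symmetric cepstrum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Real cepstrum of a symmetric log-magnitude spectrum sampled at N DFT bins.
// A naive inverse DFT is used for clarity; an FFT would be used in practice.
std::vector<double> realCepstrum(const std::vector<double>& logMag) {
    const std::size_t N = logMag.size();
    assert(N > 0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> c(N, 0.0);
    for (std::size_t n = 0; n < N; ++n) {
        double sum = 0.0;
        for (std::size_t k = 0; k < N; ++k) {
            sum += logMag[k] *
                   std::cos(twoPi * static_cast<double>(k * n) / static_cast<double>(N));
        }
        c[n] = sum / static_cast<double>(N);
    }
    return c;
}

// Applies the low-time lifter l(n) = 1 for n = 0..L (0 otherwise) and
// transforms back to the frequency domain, giving a smoothed log envelope.
std::vector<double> lifteredLogEnvelope(const std::vector<double>& c, std::size_t L) {
    const std::size_t N = c.size();
    assert(N > 0 && 2 * L < N);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> envelope(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        double sum = c[0];
        for (std::size_t n = 1; n <= L; ++n) {
            sum += 2.0 * c[n] *
                   std::cos(twoPi * static_cast<double>(k * n) / static_cast<double>(N));
        }
        envelope[k] = sum;
    }
    return envelope;
}
```

In this sketch a larger cut-off L keeps more of the fine spectral detail, while a smaller L gives a smoother envelope, which is the lowpass-like behaviour described above.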
Chapter 4 Window Programming
4.1 Introduction
4.1.1 The Windows Programming Model
A DOS application uses a procedural programming model, whereas Windows programming is
based on an event-driven model. In the Windows programming model, a message queue stores
the events to be handled (Fig 4.1). An event can be a mouse move, a mouse click, the
minimizing of a window frame, etc. When an event occurs, for instance when the mouse pointer
is moved over the window frame, the corresponding message, WM_MOUSEMOVE, is
generated. The message enters the message queue, is retrieved by the message loop and is
dispatched to the corresponding message handler, whose code is then run. The message handler
for WM_MOUSEMOVE is OnMouseMove().
Fig 4.1 depicts the Windows programming model.
Figure 4.1 – Windows programming model
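The queue, loop and handler relationship described above can be mimicked in plain standard C++. The sketch below is only a conceptual model and does not use the Windows API: `Message`, `MessageLoop`, `registerHandler`, `post` and `run` are hypothetical names chosen for illustration, and the enumerators merely stand in for messages such as WM_MOUSEMOVE.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <utility>

// Hypothetical message identifiers standing in for WM_MOUSEMOVE, WM_KEYDOWN, etc.
enum class Message { MouseMove, KeyDown };

class MessageLoop {
public:
    // Associates a handler (for example an OnMouseMove-style function) with a message.
    void registerHandler(Message msg, std::function<void()> handler) {
        assert(static_cast<bool>(handler));
        handlers_[msg] = std::move(handler);
    }

    // Places a message in the queue; it is handled later, when run() is called.
    void post(Message msg) { queue_.push(msg); }

    // Retrieves messages in arrival order and dispatches each to its handler.
    // Messages without a registered handler are skipped.
    // Returns the number of messages that were handled.
    int run() {
        int handled = 0;
        while (!queue_.empty()) {
            const Message msg = queue_.front();
            queue_.pop();
            const auto it = handlers_.find(msg);
            if (it != handlers_.end()) {
                it->second();
                ++handled;
            }
        }
        return handled;
    }

private:
    std::queue<Message> queue_;
    std::map<Message, std::function<void()>> handlers_;
};
```

The point of the design is that the code posting an event does not call the handler directly; the loop decides when and to which handler each message goes.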
4.1.2 Microsoft Foundation Class (MFC)
To create software with a graphical user interface (GUI), we can use the Windows Application
Programming Interface (API). However, the Windows API contains a very large number of
functions, and developing software by calling it directly is time consuming. MFC is a library
that provides multiple levels of support to developers. At one level, it provides a C++ class
library that encapsulates the Windows API. Many of these classes encapsulate intrinsic Windows
objects and their associated functions, allowing developers to work at a somewhat higher level
of abstraction than with the raw API. For example, to create a simple window in MFC, you
declare an instance of the CWnd class and call its Create() function. All of the steps required to
create a window (defining a WndProc, registering a window class, etc.) are provided by the
CWnd class implementation.
MFC is made available by including the header file “Afxwin.h” in the application.
4.1.3 The Document/View Architecture
The document/view architecture was introduced in MFC 2.0. With this architecture, when you
create a Single Document Interface (SDI) application, four specific classes are created to make
up the application:
-The CWinApp-derived class
-The CFrameWnd-derived class
-The CDocument-derived class
-The CView-derived class
The CWinApp class receives all the event messages and then passes the messages to the
CFrameWnd and CView classes.
The CFrameWnd class is the window frame. It holds the menu, toolbar, scrollbars and any other
visible objects attached to the frame. It also determines how much of the document is visible at
any time.
The CDocument class houses your document. It is responsible for the storage and manipulation
of data that makes up the document. The class receives input from the CView class and passes
display information to the CView class. Moreover, retrieving the document data from files is
done by this class.
The CView class is for the display of the visual representation of the document for the user. It is
responsible for passing input information to the CDocument class and receiving display
information from the CDocument class.
It should be noted that only one document can be open at a time in an SDI application. A
multiple document interface (MDI) application, on the other hand, allows multiple documents to
exist, each with multiple views, and the frame window object hosts those views.
Fig 4.2 – Data flow of the Document/View Architecture
[Figure 4.2 labels: Application object (CWinApp); Document object (CDocument); messages passed to the frame window and view; two-way flow of information between the document and the view objects]
Fig 4.2 shows a simple data flow in the document/view architecture. A message loop in the
application object retrieves the event-driven messages. The application object (CWinApp)
receives all the event messages and passes them to the view object. The view object requests
data from the document object, and the document object responds by providing the data needed
to render the output in the view object.
The document/view architecture has many advantages. Because the data source is centralized,
the same data can be shown in multiple views, for example one as a table and another as a chart.
Moreover, when the data is modified through any one of the views, the other views can easily be
synchronized by calling the UpdateAllViews() function.
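The idea of one data source feeding several synchronized views can be sketched without MFC. The following standard C++ fragment is a conceptual model only: `Document`, `View`, `CountView` and `PeakView` are hypothetical classes, and `updateAllViews` merely imitates the role that the MFC document class plays when it notifies its views.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class Document;

// Hypothetical view interface; each view re-reads the document when notified.
class View {
public:
    virtual ~View() = default;
    virtual void onUpdate(const Document& doc) = 0;
};

// Hypothetical document: owns the data and notifies every attached view.
class Document {
public:
    void addView(View* view) {
        assert(view != nullptr);
        views_.push_back(view);
    }
    void setSamples(std::vector<double> samples) {
        samples_ = std::move(samples);
        updateAllViews();
    }
    const std::vector<double>& samples() const { return samples_; }

private:
    void updateAllViews() {
        for (View* view : views_) {
            view->onUpdate(*this);
        }
    }
    std::vector<double> samples_;
    std::vector<View*> views_;  // non-owning; views must outlive the document's use of them
};

// Two different presentations of the same data.
class CountView : public View {
public:
    void onUpdate(const Document& doc) override { count = doc.samples().size(); }
    std::size_t count = 0;
};

class PeakView : public View {
public:
    void onUpdate(const Document& doc) override {
        peak = 0.0;
        for (double s : doc.samples()) {
            if (s > peak) peak = s;
        }
    }
    double peak = 0.0;
};
```

Because the views hold no copy of the data and only react to the notification, adding a third presentation would not require any change to the document class.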
Another important feature of MFC is command routing [5]. The command routing mechanism
enables command messages to be handled almost anywhere in the application.
4.2 Guide to create a simple windows program using MFC
4.2.1 Create a Single Document Interface (SDI) Application
The following steps show how to create a new SDI application. Let us start a new project by
selecting “File” → “New”.
1.) In the Project Tab, select “MFC AppWizard (exe)”.
2.) Type the project name and project location. Click “OK”.
3.) Select “single document” and check the “Document/View Architecture Support”.
Click “Next”.
4.) Select “None” for no database support and then click “Next”.
5.) Select “None” for no compound document support and then click “Next”.
6.) Click the expected features of user interface and then click “Next”.
7.) Click “Next” again and then click “Finish”.
A workspace is created and we can now develop our application through this framework.
4.2.2 Add a menu entry to Menu
The following steps show how to add a menu entry to the menu.
1. Select the Resource View tab in the workspace pane
2. Select the project resources folder at the top of the tree;
3. Click the “+” of the “Menu” folder
4. Double-click the “IDR_MAINFRAME”, as shown in Figure 4.3.
Figure 4.3 – The Insert Resource dialog
6. Click the last rectangular box (the red circle) and input “Test”, then press “Enter”.
7. A rectangular box will appear below “Test”; click to highlight it and input “Draw Line”, as
shown in Figure 4.4.
Figure 4.4 – Enter menus entity
8. Right click “Draw Line” and select “Properties” in the pop-up menu.
9. Input all the parameters, as shown in Figure 4.5, and press “Enter”.
Figure 4.5 – The Menu Item Properties dialog
10. The menu entry has been created as shown in Figure 4.6
Figure 4.6 – The new menu entry
4.2.3 Adding function to the Menu
A Windows program is event-driven. When we select a menu item, a message for this event is
generated and sent to the message queue to invoke an operation. The operation performed
depends on the code in the message handler. Thus, we have to add the necessary code to the
handler.
1.) Select “View” → “ClassWizard”.
2.) Select “ID_MENUDrawLine” in the “Object ID” column and “Command” in the “Messages”
column.
3.) Click “Add Function” to add the handler for the selected ID and click “OK”.
4.) Click the “Edit Code” button to add the required program code. See Figure 4.7.
Figure 4.7 – MFC ClassWizard dialog for adding function
5.) Now you can add the code to the handler, OnMenuDrawLine( ), as shown in Figure 4.8.
Fig 4.8 – Adding code to the handler
4.2.4 Drawing a line
In Windows programming, graphics are drawn through a device context (DC). In Visual C++,
the MFC device context provides numerous drawing functions for drawing circles, squares,
lines, curves, and so on. The operating system uses the device context to learn in which context
a graphic is being drawn, how much of the area is visible, and where on the screen it is currently
located. In MFC, the drawing functions are wrapped by the CDC class. To draw a line, we use
the MoveTo() function to move to the starting point and then the LineTo() function to draw a
line to the destination point. The following code draws such a line.
Listing 4.1 – Draw a line from the point (20, 20) to the point (120, 120)
void CHelloView::OnPaint()
{
    CPaintDC dc(this); // Device context for painting
    dc.MoveTo(20, 20);
    dc.LineTo(120, 120);
}
4.3 GDI, Memory DC and Bitmap
GDI stands for "Graphics Device Interface" and DC for "Device Context". The designers of
Windows wanted a single way of drawing to all kinds of devices, so GDI was developed to
provide a universal set of routines that can be used to draw onto a screen, printer, plotter or
bitmap image in memory.
Associated with a device context are a number of tools that act on its drawing surface: pens,
brushes, fonts, etc. For a memory DC, a number of preset pens are provided, and more can be
created as needed.
A device context is a handle to a drawing surface on some device. It can typically be obtained
for display devices as well as for printers and plotters. The most commonly used are the window
DC, a display DC that represents the area of a single window, and the memory DC, which
represents a bitmap as a device.
A Bitmap is the in-memory representation of a drawing surface. By “linking” a bitmap into a
memory DC, the DC then represents that bitmap as a drawing surface, and all the normal GDI
operations can be performed on the bitmap. GDI also has a number of functions that can copy
areas from the drawing surface of one DC to another, so bitmaps then are a useful way to store
images in memory that will later be copied to the display (or other devices).
The bitmap and memory DC can be used to remove flicker when updating the screen,
using the double-buffering technique: the image is first drawn off-screen and then copied to the
display in one operation. A bitmap object is an instance of the CBitmap class. It is not
exactly the traditional bitmap graphic file (BMP). Instead, a CBitmap object is a GDI object: an
array of bits in which one or more bits correspond to each display pixel. We can load a bitmap
graphic from a file into a CBitmap object, or we can construct our own bitmap data in the CBitmap
object.
To create a CBitmap object, the following code is used. The third statement defines the
attributes of the object, such as the resolution and color depth. In this case, the attributes are the
same as those of the screen device context dcScreen, with both width and height equal to 100.
Listing 4.2 – Create a CBitmap object
CClientDC dcScreen (this); // Device Context of the Client Window
CBitmap bitmap;
bitmap.CreateCompatibleBitmap (&dcScreen, 100, 100);
A memory DC is then created with the attributes of the screen DC. To direct the GDI output
functions to the bitmap, the CBitmap object is selected into the memory DC. In the example
below, the GDI output function is FillRect( ), which draws a solid blue rectangle.
Listing 4.3 – Use of Memory DC
CDC dcMem; // Create a Memory DC with attributes the same as the dcScreen
dcMem.CreateCompatibleDC (&dcScreen);
CBrush brush (RGB (0, 0, 255));
CBitmap* pOldBitmap = dcMem.SelectObject (&bitmap);
dcMem.FillRect (CRect (0, 0, 100, 100), &brush);
dcMem.SelectObject (pOldBitmap);
With the use of CBitmap and a memory DC, an image can be copied to the screen in a single operation instead of being drawn pixel by pixel.
Listing 4.5 – Paste the image from memory DC to the screen DC
dcScreen.BitBlt (0, 0, 100, 100, &dcMem, 0, 0, SRCCOPY);
Chapter 5 Methodology
5.1 Plotting the LPCC-based spectral envelope
5.1.1 Introduction
The LPCC-based spectral envelope was obtained by smoothing the LP-based spectral envelope with
cepstral processing. The function for plotting the envelope was added under the “Plot” →
“Spectral Envelope” menu, and the envelope was plotted on the same screen as the LPC envelope.
5.1.2 Program Flowchart
Figure 5.1.1 – Flowchart of computation of the LPCC-based spectral envelope
Start of calculation function
1.) Get the frame index
2.) Window the samples of the selected frame
3.) Compute the LPC coefficients
4.) Compute Makhoul’s “a” from the LPC coefficients
5.) Compute the LPC gain
6.) Compute the cepstral coefficients of Makhoul’s “a”
7.) Append the zero quefrency to the cepstral coefficients
8.) Compute the LPCC-based spectral envelope
End of calculation function
5.1.3 Getting the frame index
Before getting the frame index, it was necessary to know the total number of frames.
Figure 5.1.2 – Frame overlapping (a speech signal of 95 samples divided into overlapping frames that start every offset samples; the incomplete last frame is discarded)
Assume the following parameters: Overlapping = 50 %, Number of samples = 95, windowsize = 20, so that offset = windowsize × Overlapping = 10. Then
Number_of_frames = floor(Number_of_samples / offset - 1) = floor(95 / (20 × 50%) - 1) = 8
Listing 5.1.1 – Getting the total number of frames (SpanaView.cpp)
void CSpanaView::allocate(speech_parameter *sp)
{
..........
sp->offset=(__int16)((sp->windowsize)*sp->window_overlap+0.5);
sp->number_of_frames=(__int16)((float)sp->number_of_samples/
(float)sp->offset-(float)(sp->windowsize)/sp->offset)+1;
..........
}
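The frame count above can be checked with a small stand-alone sketch. The function name count_frames and its signature are assumptions made for illustration; only the arithmetic follows Listing 5.1.1.

```cpp
// Hypothetical helper (not part of SPANA): number of complete, overlapping
// frames in a signal, using the same arithmetic as Listing 5.1.1.
int count_frames(int num_samples, int window_size, float overlap)
{
    // Distance between the starts of consecutive frames, rounded to the nearest sample.
    int offset = static_cast<int>(window_size * overlap + 0.5f);
    if (offset <= 0 || num_samples < window_size)
        return 0; // assumption: no frame is counted when not even one window fits

    // Frames start at 0, offset, 2*offset, ...; an incomplete last frame is discarded.
    return static_cast<int>(static_cast<float>(num_samples) / offset
                            - static_cast<float>(window_size) / offset) + 1;
}
```

With the parameters of Figure 5.1.2 (95 samples, windowsize 20, 50 % overlap), count_frames(95, 20, 0.5f) gives 8, in agreement with the floor formula.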
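When the user left-clicked on the waveform of the speech signal, a red
vertical line was drawn at the point of the click and the Fast Display dialog was shown, as in Figure
5.1.3. The frame index was evaluated from the x-coordinate of the red vertical line.
Figure 5.1.3 – Finding the frame index (client rectangle rect1 with origin (0, 0) and width m_dfMaxX; the waveform is drawn with a margin of xoffs on each side; the red vertical line marks the click position point.x; the Fast Display dialog is shown beside it)
With X = point.x - xoffs and Y = m_dfMaxX - 2*xoffs,
$$\text{frame index} = \frac{X}{Y} \times \text{number of frames} = \frac{\text{point.x} - \text{xoffs}}{\text{m\_dfMaxX} - 2\,\text{xoffs}} \times \text{number of frames}$$
Listing 5.1.2 adds 1 to this value so that the frame index starts from 1.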
Listing 5.1.2 – Finding the frame index (SpanaView.cpp)
void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)
{
……….
CRect rect1;
this->GetClientRect(&rect1);
……….
m_dfMaxX = rect1.right;
index = (short)((sp.number_of_frames)/(m_dfMaxX-2*xoffs)*(point.x-xoffs)+1);
……….
}
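As a rough illustration of the mapping from a click position to a frame index, the sketch below reproduces the calculation outside MFC. The function name, the plain int parameters and the clamping to the range 1..num_frames are assumptions added here; the listing itself does not clamp.

```cpp
#include <algorithm>

// Hypothetical helper (not part of SPANA): map the x-coordinate of a mouse
// click to a 1-based frame index.
int frame_index_from_click(int point_x, int xoffs, int client_width, int num_frames)
{
    int span = client_width - 2 * xoffs; // width of the plotted waveform in pixels
    if (span <= 0 || num_frames <= 0)
        return 0; // assumption: 0 signals that nothing is plotted

    // Position of the click in units of frames, measured from the left margin.
    double pos = static_cast<double>(point_x - xoffs) * num_frames / span;
    int index = static_cast<int>(pos) + 1; // frame indices start from 1

    // Assumption: clicks in the margins are clamped to the first or last frame.
    return std::min(std::max(index, 1), num_frames);
}
```

Clamping is a design choice of this sketch: a click exactly on the right edge of the waveform would otherwise give number_of_frames + 1, which is outside the range of sp.w.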
5.1.4 Windowing the samples of the selected frame
The frame index was then used to get the windowed samples of the selected frame. Every frame
of the speech signal had already been windowed when the speech file was loaded. The windowed signal
for the entire speech file could be referenced through the following pointer.
float **w; // pointer to matrix containing windowed data
           // range: w[0 .. sp->number_of_frames-1][0 .. sp->windowsize-1]
After getting the frame index, the windowed signal of the selected frame could be referenced by the following code.
……….
sp.w[index-1]; // index = 1, 2, 3, …, number_of_frames
……….
Figure 5.1.4 – Structure of windowed data for the selected frame
sp.w[0]     = w[0][0]     w[0][1]     . . .  w[0][windowsize-1]
sp.w[1]     = w[1][0]     w[1][1]     . . .  w[1][windowsize-1]
sp.w[2]     = w[2][0]     w[2][1]     . . .  w[2][windowsize-1]
. . .
sp.w[num-1] = w[num-1][0] w[num-1][1] . . .  w[num-1][windowsize-1]
where num = number_of_frames
5.1.5 Computing the LPC coefficients
The LPC coefficients were computed by calc_lpc(order, autocc, lpc, K), which takes autocc and order as inputs and returns lpc and K, where
autocc = autocorrelation of x (x = sp.w[index-1])
order = prediction order
K = reflection coefficients
Figure 5.1.5 – Structure of lpc
lpc = [1 a(1) a(2) a(3) … a(pOrder)] (pOrder+1 elements), where a(1) … a(pOrder) are the LPC coefficients and pOrder is the prediction order
5.1.6 Computing the Makhoul’s “a”
Makhoul’s “a” was computed from lpc by cal_a_Markhoul(pOrder, lpc, windowsize): the leading 1 of lpc was dropped and windowsize zeros were appended.
Figure 5.1.6 – Structure of Makhoul’s “a”
Makhoul’s “a” = [a(1) a(2) … a(pOrder) 0 0 … 0] (pOrder coefficients followed by windowsize zeros)
Listing 5.1.3 – Computing the Makhoul’s “a” (SpanaView.cpp)
float * CSpanaView::cal_a_Markhoul(__int16 pOrder, float *lpc,__int16 windowsize)
{
float * a=new float[pOrder+windowsize];
int i;
//append the lpc to a_makhoul first
for(i=0;i<pOrder;i++)
{
a[i]=lpc[i+1];
}
//then append zeros to a_makhoul
for(i=0;i<windowsize;i++)
{
a[i+pOrder]=0;
}
return a;
}
5.1.7 Computing the LPC gain
The LPC gain was computed by LPCGain(x, a_makhoul, pOrder, framesize), which takes x, a_makhoul, pOrder and framesize as inputs and returns gain, where
a_makhoul = Makhoul’s “a”
x = sp.w[index-1]
Listing 5.1.4 – Computing the LPC gain (SpanaView.cpp)
float CSpanaView::LPCGain(float *x, float *a_makhoul,__int16 pOrder,__int16 framesize)
{
float temp=0;
float *R=new float[pOrder]; // R[j-1] holds the autocorrelation of x at lag j
float R0=0; // autocorrelation of x at lag 0
float energy;
float gain;
// R0=dot(x,x);
for(int i=0;i<framesize;i++)
{
R0+=x[i]*x[i];
}
// for j=1:pOrder,
for (int j=1;j<=pOrder;j++)
{
temp=0;
for (int m=0;m<framesize-j;m++)
{
temp=temp+x[m]*x[m+j];
}
R[j-1]=temp;
}
temp=0;
for (int k=0;k<pOrder;k++)
{
temp=temp+a_makhoul[k]*R[k];
}
energy=R0+temp;
gain=float(pow((double)energy,0.5));
delete[] R; // array form of delete, to match new[]
return gain;
}
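Read as a formula, the listing evaluates gain = sqrt(R(0) + Σ a(k) R(k)) for k = 1 … pOrder, where R(k) is the autocorrelation of the windowed frame at lag k. The following is a minimal sketch of the same computation with standard containers; the function name lpc_gain and the guard against a slightly negative energy are assumptions of this sketch, not part of SPANA.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: LPC gain from a windowed frame x and the predictor
// coefficients a(1..p) (without the leading 1 of the lpc array).
double lpc_gain(const std::vector<double>& x, const std::vector<double>& a)
{
    double energy = 0.0;
    for (double v : x)
        energy += v * v; // R(0)

    for (std::size_t k = 1; k <= a.size(); ++k) {
        double r = 0.0; // R(k), autocorrelation at lag k
        for (std::size_t m = 0; m + k < x.size(); ++m)
            r += x[m] * x[m + k];
        energy += a[k - 1] * r;
    }
    // Assumption: rounding errors that push the energy below zero are clipped.
    return std::sqrt(std::max(energy, 0.0));
}
```

For example, the frame {1, 1} with a single coefficient a(1) = -0.5 has R(0) = 2 and R(1) = 1, so the gain is sqrt(1.5).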
5.1.8 Computing the cepstral coefficients
The cepstral coefficients were computed by lpc2cep(a_makhoul, pOrder), which takes a_makhoul and pOrder as inputs and returns tempc.
Figure 5.1.6 – Structure of cepstral coefficients, tempc
tempc = [c(0) c(1) c(2) … c(2*pOrder-1)] (2*pOrder elements)
Listing 5.1.5 – Computing the cepstral coefficients (SpanaView.cpp)
float * CSpanaView::lpc2cep(float *a_makhoul, __int16 pOrder)
{
//Convert to c(1) to c(pOrder)
float temp=0;
__int16 n,m;
float *c=new float[2*pOrder];
for(n=1;n<=pOrder;n++)
{
temp=0;
for (m=1;m<=(n-1);m++)
{
temp=temp+m*c[m-1]*a_makhoul[n-m-1]/n;
}
c[n-1]=a_makhoul[n-1]-temp;
}
//Convert to c(pOrder+1) to c(pOrder*2)
for (n=pOrder+1;n<=2*pOrder;n++)
{
temp=0;
for (m=1;m<=(n-1);m++)
{
temp=temp+m*c[m-1]*a_makhoul[n-m-1]/n;
}
c[n-1]=-temp;
}
//Convert to cepstral coefficients of H(z)
for(int i=0;i<2*pOrder;i++)
{
c[i]=-1*c[i];
}
return c;}
5.1.9 Appending zero quefrency
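The zero-quefrency term, log(gain), was placed in front of tempc, and zeros were appended to give a vector c of length N, where N = windowsize.
Figure 5.1.7 – Structure of c
c = [log(gain) tempc(0) … tempc(2*pOrder-1) 0 0 … 0] (1 + 2*pOrder + (N-2*pOrder-1) = N elements)
Listing 5.1.6 – Appending zero quefrency (SpanaView.cpp)
void CSpanaView::PlotLPCSpectral()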
{
……….
tempc=lpc2cep(a,sp.order);
//Append zero quefrency
c[0]=(float)log(gain);
for(i=0;i<2*sp.order;i++)
{
c[i+1]=tempc[i];
}
//append (N - 2*sp.order - 1) zeros to c
for (i=2*sp.order+1;i<sp.windowsize;i++)
{
c[i]=0;
}
……….
}
5.1.10 Computing the smooth spectrum from LP-derived cepstrum
Y is the complex spectrum obtained from FFT(c, N), where N is the windowsize. It should be
noted that FFT(c, N) and Real(Y) together form the function FFT_complex(…), which writes the
real part of FFT(c, N) to c_fft. The smooth spectrum was then obtained as exp(c_fft).
Figure 5.1.8 – Structure of the smooth spectrum (c → FFT(c, N) → Y → Real(Y) → c_fft → exp(c_fft) → lpCepSpecEnv_buffer)
c_fft = [c_fft(0) c_fft(1) … c_fft(N/2)] (N/2+1 elements)
lpCepSpecEnv_buffer = [exp(c_fft(0)) exp(c_fft(1)) … exp(c_fft(N/2))] (N/2+1 elements)
Listing 5.1.7 – Computing the smooth spectrum (SpanaView.cpp)
float * CSpanaView::lpc2cep(float *a_makhoul, __int16 pOrder)
{
……….
//Compute smooth spectrum from LP-derived cepstrum
//lpCepSpecEnv=exp(real((fft(c,N))));
float *c_fft =new float[sp.windowsize];
float *lpCepSpecEnv=new float[sp.windowsize];
FFT_complex(c,c_fft,sp.windowsize);
for(i=0;i<=sp.windowsize/2;i++)
{
lpCepSpecEnv_buffer[i]=(float)exp(c_fft[i]);
}
……….
}
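To make the step exp(real(fft(c, N))) concrete without the project's FFT routine, the sketch below evaluates the real part of the DFT directly as a cosine sum. This O(N²) sum is swapped in here only for clarity and is not the FFT_complex(…) function used by SPANA; the function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: envelope[k] = exp(Re{DFT(c)[k]}) for k = 0 .. N/2,
// where Re{DFT(c)[k]} = sum over n of c[n] * cos(2*pi*k*n/N).
std::vector<double> envelope_from_cepstrum(const std::vector<double>& c)
{
    const std::size_t N = c.size();
    const double two_pi = 6.283185307179586;
    std::vector<double> env(N / 2 + 1);

    for (std::size_t k = 0; k <= N / 2; ++k) {
        double re = 0.0; // real part of the k-th DFT bin
        for (std::size_t n = 0; n < N; ++n)
            re += c[n] * std::cos(two_pi * static_cast<double>(k * n)
                                  / static_cast<double>(N));
        env[k] = std::exp(re);
    }
    return env;
}
```

If c contains only the zero-quefrency term log(gain), every bin of the envelope equals gain, which is a convenient check of the scaling.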
5.2 Plotting the spectral envelope by FFT-based cepstral liftering
5.2.1 Introduction
The spectral envelope by FFT-based cepstral liftering was obtained by applying cepstral
liftering to the Short-Term Real Cepstrum (stRC) and then taking the FFT. Similarly, the function for
plotting the spectral envelope by FFT-based cepstral liftering was added under the “Plot” →
“Spectral Envelope” menu. The envelope was plotted on the same screen as the LPC envelope.
5.2.1 Program flowchart
Figure 5.2.1 – Flowchart of plotting the spectral envelope by FFT-based liftering
Calculation function starts
1.) Get the frame index
2.) Based on the frame index, get the windowed signal
3.) Compute the short-time real cepstrum
4.) Perform liftering
5.) Compute the spectral envelope
Calculation function ends
5.2.2 Computing the frame index and then the windowed signals of a frame
The computation of the frame index and of the windowed signal for the selected frame has been
discussed in Sections 5.1.3 and 5.1.4 respectively.
5.2.3 Computing the Short-Term Real Cepstrum (stRC).
Figure 5.2.2 – Flowchart of computing the short-time real cepstrum (sp.w[index-1], N → FFT → Y → Abs(Y) → sp.winbuffer1 → Log → x_fft → IFFT → stRC)
Here Y is the complex spectrum returned from FFT( ). The FFT(…) function used in the listing integrates both FFT(sp.w[index-1], N) and Abs(Y), i.e. it returns sp.winbuffer1 directly.
Listing 5.2.1 – Computing the short-time real cepstrum (SpanaView.cpp)
void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)
{
……….
FFT(sp.w[index-1], sp.winbuffer1, sp.windowsize);
……….
}
void CSpanaView::PlotLPCSpectral()
{
……….
float *stRC =new float[sp.windowsize];
float *x_fft =new float[sp.windowsize];
float *x_ifft =new float[sp.windowsize];
//get log(abs(fft(x,N)))
for(i=0;i<=sp.windowsize/2;i++)
{
x_fft[i]=(float)log(sp.winbuffer1[i]);
}
//make x_fft[i] symmetrical for IFFT
for(i=1;i<sp.windowsize/2;i++)
x_fft[sp.windowsize/2+i]=x_fft[sp.windowsize/2-i];
//perform ifft(log(abs(fft(x,N))))
IFFT(x_fft,x_ifft,sp.windowsize);
stRC=x_ifft;
……….
}
Figure 5.2.3 – Structures of Short Term Real Cepstrum, stRC
Y = [Re(Y(0)) Im(Y(0)) Re(Y(1)) Im(Y(1)) . . . Re(Y(N/2)) Im(Y(N/2))] (N + 2 elements)
sp.winbuffer1 = [|Y(0)| |Y(1)| . . . |Y(N/2)|] (N/2 + 1 elements)
x_fft = [Log(|Y(0)|) Log(|Y(1)|) . . . Log(|Y(N/2)|)] (N/2 + 1 elements)
stRC = [stRC(0) stRC(1) . . . stRC(N-1)] (N elements)
5.2.4 Perform liftering, cut-off time at pOrder
After liftering was performed, all elements of stRC except the first pOrder and the last pOrder were set to zero.
Figure 5.2.4 – Structure of stRC after liftering
Listing 5.2.2 – Perform liftering (SpanaView.cpp)
void CSpanaView::PlotLPCSpectral( )
{
……….
//Perform liftering, cut-off time at pOrder
for(i=sp.order;i<(sp.windowsize-sp.order);i++)
{
stRC[i]=0;
}
……….
}
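The liftering step is short enough to isolate in a stand-alone sketch. The function name and the use of std::vector are assumptions; the index range is the one used in Listing 5.2.2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: low-time liftering of a real cepstrum of length N.
// Keeps cep[0 .. cutoff-1] and cep[N-cutoff .. N-1] and zeroes the rest.
void lifter_low_time(std::vector<float>& cep, std::size_t cutoff)
{
    const std::size_t N = cep.size();
    if (2 * cutoff >= N)
        return; // the two retained parts already cover the whole cepstrum
    for (std::size_t i = cutoff; i < N - cutoff; ++i)
        cep[i] = 0.0f;
}
```

Both ends are kept because the real cepstrum of a real signal is symmetric, so the low-quefrency terms appear at the start and, mirrored, at the end of the array.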
5.2.5 Computing spectral envelope based on stRC.
Figure 5.2.4 – Flowchart of computing spectral envelope based on stRC (stRC, N → FFT(stRC, N) → Y → Real(Y) → stRC_fft → exp(stRC_fft) → cepSpecEnv)
The input was the liftered cepstrum
stRC = [stRC(0) stRC(1) … 0 0 … 0 … stRC(N-1)] (N elements: pOrder retained values, N-2*pOrder zeros, pOrder retained values)
Figure 5.2.5 – Structure of the spectral envelope, cepSpecEnv
Listing 5.2.3 – Computing the spectral envelope, cepSpecEnv (SpanaView.cpp)
void CSpanaView::PlotLPCSpectral( )
{
……….
//Compute spectral envelope based on stRC
//cepSpecEnv=exp(real(fft(stRC,N)));
///float *cepSpecEnv=new float[sp.windowsize];
float *stRC_fft =new float[sp.windowsize];
FFT_complex(stRC,stRC_fft,sp.windowsize);
for(i=0;i<=sp.windowsize/2;i++)
{
CepSpecEnv[i]=(float)exp(stRC_fft[i]);
}
……….
}
The structures of the arrays involved are:
Y = [Re(Y(0)) Im(Y(0)) Re(Y(1)) Im(Y(1)) . . . Re(Y(N/2)) Im(Y(N/2))] (N + 2 elements)
stRC_fft = [Re(Y(0)) Re(Y(1)) . . . Re(Y(N/2))] (N/2 + 1 elements)
cepSpecEnv = [exp(Re(Y(0))) exp(Re(Y(1))) . . . exp(Re(Y(N/2)))] (N/2 + 1 elements)
The FFT_complex(…) function integrates both FFT( ) and Real( ), i.e. it returns the real part only.
5.2.6 Flow chart of the plotting function
Figure 5.2.6 – Flow chart of the plotting function for the spectral envelopes
Start of plotting function
1.) Declare a Memory DC, Screen DC and a CBitmap object
2.) Declare a CRect object to be the Virtual Screen in Memory DC
3.) Select the CBitmap into the Memory DC
4.) Plot the speech signal in time domain in the upper part of the Virtual Screen
5.) Plot the x-axis, y-axis and other general information
6.) Plot the LPCC-based spectral envelope in the lower part of the Screen in Memory DC
7.) Plot the spectral envelope by FFT-based cepstral liftering
8.) Plot the x-axis, y-axis and other general information
End of plotting function
5.2.7 The plotting function.
The spectral envelope by FFT-based cepstral liftering (variable name: CepSpecEnv) and the
LPCC-based spectral envelope (variable name: lpCepSpecEnv_buffer) were plotted by the
PlotSpectralenvelope( ) function.
Listing 5.2.4 – Plotting the two spectral envelopes (SpanaView.cpp)
void CSpanaView::PlotSpectralenvelope()
{
……….
// line 4(lpCepSpecEnv_buffer)start here
CPen pen_lpCepSpecEnv(PS_SOLID, 1, RGB(0,0,255));
pOldPen = dc.SelectObject(&pen_lpCepSpecEnv);
dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv_buffer[0]-miny)*stepy));
x_coor=0;
for(i=0;i<number_of_values;x_coor+=stepx,i++)
{
dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv_buffer[i]-miny)*stepy));
}
//line 5 (cepSpecEnv) start here//////////////////////
CPen pen_CepSpecEnv(PS_SOLID,1,RGB(255,0,255));
pOldPen = dc.SelectObject(&pen_CepSpecEnv);
dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)CepSpecEnv[0]-miny)*stepy));
x_coor=0;
for(i=0;i<number_of_values;x_coor+=stepx,i++)
{ dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)CepSpecEnv[i]-miny)*stepy));
}
// line 5 end here/////////////////////////////////
……….
}
5.2.8 Starting to plot the two spectral envelopes
When the user left-clicked on the waveform of the speech signal, the FastDisplay dialog
popped up, as shown in Figure 5.2.7.
Figure 5.2.7 – The Fast Display dialog
The FastDisplay dialog provides the user with a fast display of the speech signal in the time domain and its
corresponding frequency spectrum and spectral envelope. In this version of Spana, the spectral
envelope by FFT-based cepstral liftering and the LPCC-based spectral envelope could also be plotted on
the Fast Display dialog.
For simplicity, the data computed above were reused for plotting the two envelopes. In other words,
plotting the two envelopes on the Fast Display dialog and plotting them in the main window used the
same data source. After the two envelopes had been calculated, their data were assigned
to member variables of the FastDisplay dialog class. The following code assigned the data in
CepSpecEnv and lpCepSpecEnv_buffer to CepSpecEnv and lpCepSpecEnv respectively, which
belong to the FastDisplay dialog class.
Listing 5.2.5 – Assigning data to the Fast Display dialog (SpanaView.cpp)
void CSpanaView::OnLButtonDown(UINT nFlags, CPoint point)
{
……….
// The dialog shares the buffers of the view, so no separate allocation is needed
m_FastDisplayDlg.CepSpecEnv = CepSpecEnv;
m_FastDisplayDlg.lpCepSpecEnv = lpCepSpecEnv_buffer;
……….
}
Listing 5.2.6 –Plotting of the two spectral envelopes on Fast Display dialog (SpEnGraphDlg.cpp)
void CSpEnGraphDlg::OnPaint()
{
……….
/////////////////plot CepSpecEnv///////////////
CPen pen_CepSpecEnv(PS_SOLID ,1 ,RGB(0,255,0));
pOldPen = dc.SelectObject(&pen_CepSpecEnv);
dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)CepSpecEnv[0]-miny)*stepy));
x_coor=0;
for(i=0;i<number_of_values;x_coor+=stepx,i++)
{
dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)CepSpecEnv[i]-miny)*stepy));
}
////////////////CepSpecEnv line here////////////
////////////////plot lpCepSpecEnv////////////////
CPen pen_lpCepSpecEnv(PS_SOLID ,1 ,RGB(0 ,0 ,0));
pOldPen = dc.SelectObject(&pen_lpCepSpecEnv);
dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv[0]-miny)*stepy));
x_coor=0;
for(i=0;i<number_of_values;x_coor+=stepx,i++)
{
dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv[i]-miny)*stepy));
}
/////////////////lpCepSpecEnv end/////////////////////////
……….
}
5.3 Plotting the Pitch Contour
5.3.1 Introduction
For every frame of voiced speech, there is a pitch period. A line joining
the pitch periods of all frames of the speech represents the pitch contour. In this project, the AMDF
(Average Magnitude Difference Function) was used to compute the pitch period, together with a
probabilistic approach to correct errors in the computed pitch period.
5.3.2 Program Flowchart
Figure 5.3.1 – Flowchart of Pitch Detection algorithm
Start of Pitch Detection
1.) Compute the mean and standard deviation
2.) Compute all candidate pitch periods of the selected frame
3.) Compute the zero crossing rate
4.) Is the frame voiced? (YES / NO)
5.) Filter out the markers with the constraints
6.) Weight the markers with the normal distribution
7.) Store the pitch period marker in m_pitch[f]
8.) End of the file? If NO, go to the next frame; if YES, end of Pitch Detection
5.3.3 Computation of the Mean and Standard Deviation
The mean and standard deviation of the pitch period estimates for the whole speech
were computed by the FindMean_Std( ) function.
Figure 5.3.2 – The program flow of the FindMean_Std( )
Start of FindMean_Std( )
1.) Low pass filter
2.) HANNING windowing
3.) Compute the zero crossing rate
4.) Voiced frame? (YES / NO)
5.) Find the pitch period estimate
6.) Store the pitch period
7.) End of file? If NO, go to the next frame
8.) Compute the mean for the array
9.) Compute the standard deviation
End of FindMean_Std( )
5.3.3.1 Lowpass filtering the entire speech signal
In order to eliminate the effects of intensity variation and background noise, the speech
samples were passed through a lowpass filter with 3 dB attenuation at 600 Hz and 40 dB attenuation at 900 Hz.
The required filter was designed with the help of FDATool (Filter Design & Analysis Tool) in
MATLAB.
Design Procedure
A. Designing the filter
MATLAB was started and fdatool was typed at the command line. The FDATool window then appeared, as
shown in Figure 5.3.3. All the filter parameters were entered and the “Filter Design” button was clicked to
start the design process.
Figure 5.3.3 – The FDATool
When the filter design finished, the frequency response of the filter could be obtained as shown
in Figure 5.3.4
Figure 5.3.4 – Frequency Response of the required filter
B. Obtaining the filter coefficients
From “File” → “Export”, “Export To Text-file” was selected and OK was clicked. The coefficients of the
filter were then exported to a text file.
Figure 5.3.5 – Export the filter coefficients to a text file
The text file was opened and all the coefficients were copied into the lowpassfilter( )
function, which convolves the input speech samples with the filter coefficients.
Listing 5.3.1 – Lowpass filter the speech samples (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size)
{
……….
//filter the speech signal
lowpassfilter(sp.spcdata,spcdata_filtered,sp.number_of_samples);
……….
}
void CSpanaView::lowpassfilter(__int16 *spcdata,__int16 *spcdata_filtered,long num_spc_samples)
{
const double B[59] = {
0.006815591950326,-8.261294934178e-005,-0.0008504860921417,-0.002135981039487,
-0.003923588510516,-0.006167886847072,-0.008771445369777,-0.01161054026741,
-0.01450768607681, -0.01725123021524, -0.01960134629328, -0.02130724375831,
-0.02211597742747, -0.02179485795837, -0.02014755490484, -0.01703070669192,
-0.0123684615914,-0.006166552489493, 0.001479595354589, 0.0103867511485,
0.02028557794243, 0.03083282948718, 0.04162511803079, 0.0522280014446,
0.06218524704965, 0.07105391473281, 0.07842105987528, 0.08396391887656,
0.08738033546011, 0.08853662302354, 0.08738033546011, 0.08396391887656,
0.07842105987528, 0.07105391473281, 0.06218524704965, 0.0522280014446,
0.04162511803079, 0.03083282948718, 0.02028557794243, 0.0103867511485,
0.001479595354589,-0.006166552489493, -0.0123684615914, -0.01703070669192,
-0.02014755490484, -0.02179485795837, -0.02211597742747, -0.02130724375831,
-0.01960134629328, -0.01725123021524, -0.01450768607681, -0.01161054026741,
-0.008771445369777,-0.006167886847072,-0.003923588510516,-0.002135981039487,
-0.0008504860921417,-8.261294934178e-005, 0.006815591950326
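}; // filter coefficients
// Convolution of the speech samples with the filter coefficients
for(int n=0;n<num_spc_samples;n++)
{
double acc=0; // accumulate in double so that the sum is truncated only once
for(int m=0;m<59;m++)
{
if ((n-m)>=0)
acc+=B[m]*spcdata[n-m];
}
spcdata_filtered[n]=(__int16)acc;
}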
}
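The convolution performed by lowpassfilter( ) is an ordinary FIR filter. A minimal generic sketch is given below; the name fir_filter and the double-precision vectors are assumptions, and samples before the start of the signal are treated as zero, as in the listing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: y[n] = sum over m of b[m] * x[n-m], with x[k] = 0 for k < 0.
std::vector<double> fir_filter(const std::vector<double>& x,
                               const std::vector<double>& b)
{
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t m = 0; m < b.size() && m <= n; ++m)
            y[n] += b[m] * x[n - m];
    return y;
}
```

With the two-tap averaging filter b = {0.5, 0.5} and the input {2, 4, 6}, the output is {1, 3, 5}; the first output is smaller because the sample before the start is taken as zero.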
5.3.3.2 HANNING windowing the speech samples
In order to increase the accuracy of the AMDF, each frame of samples was
preprocessed by a HANNING window. HANNING windowing was performed by calling the windowing
function directly and passing “HANNING” as a parameter.
The computation steps of the windowing have been discussed in the theory.
Listing 5.3.2 – HANNING windowing a frame of samples (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size) {
……….
windowing(x, frameData, win_size, HANNING, sp.norm_factor,sp.prem_factor);
……….
}
5.3.3.3 Computing the zero crossing rate
The purpose of computing the zero crossing rate was to determine whether the frame of samples was
voiced or unvoiced. Voiced sounds are dominated by lower frequencies than unvoiced sounds, so they cross zero less often.
Listing 5.3.3 – Computing the zero crossing rate (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size)
{
……….
for(int f=0; f<num_frames; f++){
……….
zerox = 0;
for(int m=1; m<m_win_size; m++)
zerox += abs(sgn(frameData[m])-sgn(frameData[m-1]))/2;
zerox /= m_win_size;
……….
}
……….
}
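A stand-alone sketch of the zero crossing rate is shown below. The function name is hypothetical, and the sketch assumes that a sample equal to zero counts as non-negative; the behaviour of the project's sgn( ) at zero is not shown in the listing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction of sample positions at which the sign of the
// signal changes, normalised by the frame length as in Listing 5.3.3.
double zero_crossing_rate(const std::vector<double>& frame)
{
    if (frame.size() < 2)
        return 0.0; // assumption: too short to contain a crossing

    int crossings = 0;
    for (std::size_t m = 1; m < frame.size(); ++m) {
        bool prev_negative = frame[m - 1] < 0.0;
        bool curr_negative = frame[m] < 0.0;
        if (prev_negative != curr_negative)
            ++crossings;
    }
    return static_cast<double>(crossings) / static_cast<double>(frame.size());
}
```

The returned rate can then be compared with a threshold such as the 0.3 (16-bit) or 0.6 (8-bit) values used in Listing 5.3.4 to label the frame as unvoiced.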
5.3.3.4 Computing the pitch period estimates using AMDF
$$AMDF_n(j) = \frac{1}{N}\sum_{i=1}^{N}\left|x_n(i) - x_n(i+j)\right|, \qquad 1 \le j \le MAXLAG$$
If the frame of samples was voiced, we would start the computation of AMDF. For each frame,
the AMDF would be computed for each lag, j, and the magnitude of AMDF would be stored in
an array, Delta [ ], which would be used in searching for the pitch period estimate. The pitch
period estimate was the lag for which the magnitude of AMDF was the global minimum in the
selected frame and the distribution of the pitch period estimates would be approximated with a
normal distribution.
On the other hand, if the frame of samples was unvoiced, the pitch period for that frame would
be set to zero. The pitch period for unvoiced frame would not be counted into the distribution of
pitch period estimates.
Listing 5.3.4 – Setting the pitch period to zero for unvoiced frame and computation of AMDF (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size)
{
……….
// If unvoiced, set the pitch period to zero
if((zerox>0.3 && wh.bps==16) || zerox == 0)
m_pitch[f] = 0;
else if((zerox>0.6 && wh.bps==8) || zerox == 0)
m_pitch[f] = 0;
else {
// AMDF
for(int i=1; i<=m_win_size; i++) {
N = 0;
Delta[i-1] = 0;
for(int j=0; j<m_win_size; j++) {
Delta[i-1] += (float)fabs(frameData[m_win_size+j-i]-frameData[m_win_size+j]);
N++;
}
Delta[i-1] /= N;
}
}
……….
}
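The AMDF itself can be illustrated independently of the buffering used in the listing. In this sketch the number of terms N is taken as the signal length minus the largest lag so that every index stays inside the buffer; that choice, the function name and the vector interface are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: d[j-1] = (1/N) * sum over i of |x[i] - x[i+j]| for
// j = 1 .. max_lag, with N = x.size() - max_lag terms in every sum.
std::vector<double> amdf(const std::vector<double>& x, std::size_t max_lag)
{
    std::vector<double> d;
    if (max_lag == 0 || x.size() <= max_lag)
        return d; // not enough samples for the requested lags

    const std::size_t N = x.size() - max_lag;
    d.resize(max_lag);
    for (std::size_t j = 1; j <= max_lag; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < N; ++i)
            sum += std::fabs(x[i] - x[i + j]);
        d[j - 1] = sum / static_cast<double>(N);
    }
    return d;
}
```

For a signal that repeats every 4 samples, the AMDF is exactly zero at lag 4, which is why the lag of the deepest minimum serves as the pitch period estimate.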
5.3.3.5 Mean and standard deviation of the pitch period estimates
After evaluating the pitch period estimates for the whole speech, we computed the
mean and standard deviation of the pitch period estimates.
Listing 5.3.5 – Mean of the pitch period estimates (SpanaView.cpp)
float CSpanaView::average(int num_frame,int *global_min_location)
{
int sum=0; //sum of all the lags
float mean=0;//mean of the speech
for(int i=0;i<num_frame;i++)
{
sum+=global_min_location[i];
}
mean=sum/(float)num_frame;
return mean;
}
Listing 5.3.6 – Standard deviation of the pitch period estimates (SpanaView.cpp)
float CSpanaView::STD(float mean,int *global_min_location,int num_frame)
{
float std=0; //standard deviation
float var=0; //variance
for(int i=0;i<num_frame;i++){
var+=(global_min_location[i]-mean)*(global_min_location[i]-mean)/(num_frame);
}
std=(float)pow(var,0.5);
return std;
}
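The two statistics can be combined in one pass-friendly sketch. The struct and function names are hypothetical, and skipping zero entries (the value stored for unvoiced frames) inside the function is an assumption of this sketch; in SPANA the exclusion of unvoiced frames may happen before average( ) and STD( ) are called.

```cpp
#include <cmath>
#include <vector>

// Hypothetical container for the statistics of the pitch period estimates.
struct PitchStats {
    double mean;
    double stddev;
};

// Hypothetical helper: mean and population standard deviation of the lags,
// ignoring entries equal to 0 (assumed here to mark unvoiced frames).
PitchStats pitch_stats(const std::vector<int>& lags)
{
    double sum = 0.0;
    int count = 0;
    for (int lag : lags) {
        if (lag != 0) {
            sum += lag;
            ++count;
        }
    }
    if (count == 0)
        return {0.0, 0.0}; // no voiced frame

    const double mean = sum / count;
    double var = 0.0;
    for (int lag : lags)
        if (lag != 0)
            var += (lag - mean) * (lag - mean) / count;
    return {mean, std::sqrt(var)};
}
```

As in Listing 5.3.6, the variance is divided by the number of estimates rather than by that number minus one.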
5.3.4 Computing all candidate pitch periods for the selected frame
Candidate pitch periods of a frame refer to the lags at which the AMDF has local minima
in that frame. Searching for the local minima was accomplished by the Findlocal_min( ) function.
Figure 5.3.6 – Input and output of Findlocal_min( ) function (input: Delta[ ]; outputs: num_min, the number of minima; local_min, the values of the minima; local_min_location, the locations of the minima)
Figure 5.3.7 – Program flow of computing the candidate pitch periods (set i=1 and counter=0; for each sample, if Delta[i] is smaller than its neighbours Delta[i-1] and Delta[i+1], or smaller than Delta[i-1] when it is the last sample, store Delta[i] into local_min and increment counter; at the end of Delta[ ], assign counter to num_min)
Listing 5.3.7 – Finding all the candidate pitch periods in a frame (SpanaView.cpp)
int *CSpanaView::Findlocal_min(float *Delta,int MaxLag,int *counter,float *MinData)
{
for(int j=1;j<MaxLag;j++)
{
if(j==MaxLag-1) //the last sample
{
if(Delta[j]<Delta[j-1])
{
*counter+=1;
pData[*counter-1]=j+1;
MinData[*counter-1]=Delta[j];
}
}
else
{
if((Delta[j]<Delta[j-1])&&(Delta[j+1]>Delta[j]))
{
tempDelta=Delta[j];
tempIndex=j;
*counter+=1;
pData[*counter-1]=j+1;
MinData[*counter-1]=Delta[j];
}
}
}
return pData;
}
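The search for local minima can be sketched with a small result type instead of separate output arrays. The struct LocalMin and the function name are hypothetical; the test for a minimum and the convention lag = index + 1 follow Listing 5.3.7.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: one candidate pitch period.
struct LocalMin {
    int lag;      // lag of the minimum (array index + 1)
    double value; // AMDF magnitude at that lag
};

// Hypothetical helper: all local minima of the AMDF array delta. The last
// sample counts as a minimum if it is smaller than its left neighbour.
std::vector<LocalMin> find_local_minima(const std::vector<double>& delta)
{
    std::vector<LocalMin> minima;
    const std::size_t n = delta.size();
    for (std::size_t j = 1; j < n; ++j) {
        bool below_left = delta[j] < delta[j - 1];
        bool below_right = (j + 1 == n) || (delta[j + 1] > delta[j]);
        if (below_left && below_right)
            minima.push_back({static_cast<int>(j) + 1, delta[j]});
    }
    return minima;
}
```

For the array {3, 1, 2, 0.5, 4, 3} the candidates are the lags 2, 4 and 6, the last one because the final sample is lower than its left neighbour.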
5.3.5 Computing the zero crossing rate
Again, in order to determine whether the frame was voiced or unvoiced, it was necessary to compute
the zero crossing rate. Since the computation of the zero crossing rate has been discussed in 5.3.3.3,
please refer to that section for details.
5.3.6 Filtering out the markers with the constraints
In most AMDF-based PDAs (Pitch Detection Algorithms), the lag at which the magnitude of the
difference function has its global minimum is chosen as the pitch period estimate for that frame. In
this AMDF PDA, we computed not only the lag with the global minimum but also a set of
candidates for the pitch period in each frame (please refer to the theory). To be a marker,
a candidate pitch period must satisfy the AMDF pattern constraints stated in the theory.
The computation of markers was implemented by the FindMarker( ) function.
Listing 5.3.8 – Finding the marker_location (lag) and marker_height (magnitude of AMDF for the lag) (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size)
{
……….
FindMarker(marker_height,marker_location,m_win_size,Delta,num_marker);
……….
}
Figure 5.3.8 – Flowchart of filtering out the markers
Start of FindMarker( )
1.) Find all the candidate pitch periods
2.) Find the constraints for each candidate
3.) Constraints satisfied? If YES, store the candidate as a marker
4.) Any candidates left? If YES, repeat for the next candidate
End of FindMarker( )
5.3.6.1 Finding the constraints
a. global_max
The global_max was found by Findglobal_max( ) function.
Figure 5.3.9 – Program flow of finding the global_max (input: Delta[ ]; outputs: global_max, global_max_location; starting from i=0, the larger of the value in the buffer and Delta[i+1] is kept in the buffer until the end of Delta[ ], and then global_max = buffer)
Listing 5.3.9 – Finding the global_max (SpanaView.cpp)
int CSpanaView::Findglobal_max(float *Delta,int MaxLag,float *MaxDelta )
{
……….
for(int j=0;j<MaxLag-1;j++)
{
if (Delta[tempIndex]>Delta[j+1])
{
tempDelta=Delta[tempIndex];
}
else
{
tempDelta=Delta[j+1];
tempIndex=j+1;
}
*MaxDelta=tempDelta;
}
return tempIndex+1;
}
b. height_i
$height_i = \min(left\_height_i,\ right\_height_i)$, which was computed by the
FindHeight( ) function.
Figure 5.3.10 – Input and output of FindHeight( ) function (inputs: local_max, num_local_max; output: height_i, the array of height_i)
Note that local_max was found before running Findlocal_min( ).
Listing 5.3.10 – Finding height_i (SpanaView.cpp)
void CSpanaView::FindHeight(float *local_max,int num_min,float *height_i,float *local_min)
{
for(int i=0;i<num_min;i++)
{
if (local_max[i]<local_max[i+1])
height_i[i]=local_max[i]-local_min[i];
else
height_i[i]=local_max[i+1]-local_min[i];
}
}
c. peak_ratio
The peak_ratio was computed by the Findpeak_ratio( ) function using the formula:
peak_ratio=local_maximum/global_max.
Figure 5.3.11 – Input and output of Findpeak_ratio( ) function (inputs: local_max, global_max; output: peak_ratio)
Listing 5.3.11 – Find peak_ratio (SpanaView.cpp)
void CSpanaView::Findpeak_ratio(int num_min,float *local_max,float *global_max
,float *peak_ratio)
{
for(int i=0;i<num_min;i++)
{
peak_ratio[i]=local_max[i]/(*global_max);
}
}
d. lobe_width_i
The FindLobe_width( ) function found the lobe_width using the formula:
lobe_width = distance between right and left local maxima.
Figure 5.3.12 – Input and output of FindLobe_width( ) function (input: local_max_location; output: lobe_width_i)
Listing 5.3.12 – Finding lobe_width_i
void CSpanaView::FindLobe_width(int *local_max_location,int num_min,int *lobe_width_i)
{
for(int i=0;i<num_min;i++)
{
lobe_width_i[i]=local_max_location[i+1]-local_max_location[i];
}
}
Listing 5.3.13 – Finding the four constraints
void CSpanaView::FindMarker(float *marker_height,int *marker_location,int m_win_size,float *Delta,int *num_marker)
{
……….
//Find the min(left_height,right_height)
FindHeight(local_max,*num_min,height_i,local_min);
//Find the difference of heights between two consecutive maxima
FindDiff_i(local_max,*num_min,diff_i);
//Find the lobe width between two consecutive maxima
FindLobe_width(local_max_location,*num_min,lobe_width);
//Get the peak ratio
Findpeak_ratio(*num_min,local_max,global_max,peak_ratio);
……….
}
To be a marker, the i-th candidate needed to satisfy:
1. $peak\_ratio \ge 0.8$
2. $height_i \ge 0.3 \times global\_max$
3. $diff_i \le 0.1 \times global\_max$
4. $lobe\_width_i \le 100$ lags
Listing 5.3.14 – Filter the candidates with the constraints (SpanaView.cpp)
void CSpanaView::FindMarker(float *marker_height,int *marker_location,int m_win_size,float *Delta,int *num_marker)
{
……….
if((peak_ratio[i]>=0.7)&&(height_i[i]>=0.3*(*global_max))&&(diff_i[i]<=0.1*(*global_max))&&(lobe_width[i]<=100))
{
num+=1;
marker_location[num-1]=local_min_location[i];
marker_height[num-1]=height_i[i];
}
……….
}
5.3.7 Weighting the markers with the normal distribution
The probability density function of the normal distribution with mean µ and standard deviation σ
is an example of a Gaussian function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Figure 5.3.14 – The graph of normal distribution
After all the markers of a frame had been computed, the next step was to weight the markers with a normal
distribution. Figure 5.3.15 shows the markers of a frame.
Figure 5.3.15 – AMDF and markers for a voiced frame
Substituted the marker into the Gaussian function to weight the marker with the normal
distribution. The marker with the highest height after weighting with the normal distribution
would be regarded as the pitch period of the frame.
Figure 5.3.16 – (A) Distribution approximation of the initial pitch period estimate (B) AMDF for a voiced frame. The dashed line showed the normal distribution approximation
Without weighting, marker 2 would have been selected as the pitch period of the frame. After
the markers were weighted with the normal distribution, however, marker 1 became the better
candidate for the pitch period of the frame.
Listing 5.3.15 – Weighting the markers with the normal distribution of the initial pitch period estimates (SpanaView.cpp)
void CSpanaView::pitch_detection(int *m_pitch_array_size)
{
……….
markers_weighted[i]=weight(marker_location[i],*std,*mean);
int temp_pitch_index=0;
for(i=0;i<*num_marker-1;i++)
{
if (markers_weighted[temp_pitch_index]>markers_weighted[i+1])
temp_pitch_index=temp_pitch_index;
else
temp_pitch_index=i+1;
}
m_pitch[f]=marker_location[temp_pitch_index];
……….
}
//find the probability of the normal distribution of a given x
float CSpanaView::weight(int x,float std,float mean)
{
float expvalue=0;
expvalue=(float)((1/pow(2*3.1415926*pow(std,2),0.5))*exp(-1*(x-mean)*(x-mean)/(2*std*std)));
return expvalue;
}
The marker with the greatest weighted height was selected as the pitch period of the frame.
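The weighting and selection steps of Listing 5.3.15 can be sketched in isolation as follows. The function names gaussianWeight( ) and selectPitchMarker( ) are hypothetical. As in the listing, the weight is the normal density evaluated at the marker's lag, and ties go to the later marker.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normal probability density at x for the given mean and standard deviation.
double gaussianWeight(double x, double mean, double stddev)
{
    assert(stddev > 0.0);
    const double kPi = 3.14159265358979323846;
    const double z = (x - mean) / stddev;
    return std::exp(-0.5 * z * z) / (stddev * std::sqrt(2.0 * kPi));
}

// Return the index of the marker whose lag receives the largest weight.
// Ties go to the later marker, matching the comparison in Listing 5.3.15.
std::size_t selectPitchMarker(const std::vector<int>& marker_location,
                              double mean, double stddev)
{
    assert(!marker_location.empty());
    std::size_t best = 0;
    double best_weight = gaussianWeight(marker_location[0], mean, stddev);
    for (std::size_t i = 1; i < marker_location.size(); ++i) {
        const double w = gaussianWeight(marker_location[i], mean, stddev);
        if (w >= best_weight) {
            best = i;
            best_weight = w;
        }
    }
    return best;
}
```

With markers at lags 50 and 100 and an initial estimate of mean 55 and standard deviation 10, the first marker is selected because its lag lies much closer to the mean.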
5.4 Adding a zoom function to view the speech signal in the time domain
5.4.1 Introduction
The zoom function magnifies the speech signal in the time domain, and the corresponding spectral
envelopes are shown in the zoom window. The function was added to the menu under
“Plot” → “Zoom”.
5.4.2 Program flowchart
Figure 5.4.1 – (A) Flowchart of OnZoom( ) handler. (B) Flowchart of calculation of frequency spectrum and spectral envelopes.
(A) Start of OnZoom( ) handler → Create and show the Zoom Scale dialog → Set the Zoom Indicator to TRUE → End of OnZoom( ) handler
(B) Start of computation of the frequency spectrum and the three spectral envelopes → Compute the LPC and autocorrelation → Compute the frequency spectrum → Compute the LPCC-based spectral envelope → Compute the spectral envelope by FFT-based cepstral liftering → Convert from linear to dB → End of computation function
Figure 5.4.2 – Flowchart of OnMouseMove( ) handler
Start of OnMouseMove( ) handler → Zoom Indicator is TRUE? (NO: End of OnMouseMove( )) → Mouse's coordinate within the boundary? (NO: End of OnMouseMove( )) → Get the zoom factor to compute the new frame size → Compute the start play sample index → Windowing → Append zeros for FFT → Start the computation of frequency spectrum and the three spectral envelopes → Start of the plotting function → End of OnMouseMove( )
5.4.3 Creating and showing the Zoom scale dialog.
Selecting “Plot” → “Zoom” in the menu bar invoked the handler OnZoom( ). OnZoom( ) created
and showed the Zoom Scale dialog and set the Zoom Indicator to TRUE; the mouse-move handler,
OnMouseMove( ), later checked this indicator.
Figure 5.4.3 – Zooming the speech signal
Listing 5.4.1 – Created and showed the Zoom Scale dialog (SpanaView.cpp)
void CSpanaView::OnZoom()
{
……….
m_FastZoomDlg.Create(IDD_DISPLAY_ZOOM,this);
m_FastZoomDlg.ShowWindow(SW_SHOW);
m_ZoomIndicator=TRUE;
m_FastZoomDlg.SetIndicator(m_ZoomIndicator);
m_FastZoomDlg.Invalidate();
……….
}
5.4.3 The OnMouseMove( ) handler.
When the mouse pointer moved across the document, the OnMouseMove( ) handler was invoked,
which in turn ran the PlotZoom( ) function.
Listing 5.4.2 – PlotZoom( ) ran when mouse pointer moved (SpanaView.cpp)
void CSpanaView::OnMouseMove(UINT nFlags, CPoint point)
{
……….
PlotZoom(point);
……….
}
5.4.4 Getting the zoom factor to compute the new frame size
When the PlotZoom( ) function ran, the first step was to get the zoom factor. If the Zoom
Indicator was TRUE and the mouse pointer’s x-coordinate was within the painting area, the
value returned from the slider in the Zoom Scale dialog was assigned to the zoom factor.
Figure 5.4.4 – The Zoom Scale dialog
Listing 5.4.3 – Get the Zoom factor (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
m_ZoomIndicator=m_FastZoomDlg.GetZoomIndicator();
if (m_ZoomIndicator==TRUE)
{
……….
//check that the mouse's x-coordinate is inside the painting area
if((rect.right>point.x)&&(point.x>xoffs))
{
……….
SliderIndicator=m_FastZoomDlg.GetSliderIndicator();
if (SliderIndicator==FALSE)
factor=1;
else
factor=m_FastZoomDlg.GetFactorValue();
//find the new window size
Zoomwindowsize=sp.windowsize;
Zoomwindowsize=int(Zoomwindowsize*factor);
……….
}
}
}
5.4.5 Calculating the Start Play Sample index
Figure 5.4.5 – Calculating the Start Play Sample Index
Listing 5.4.4 – Computed the Start Play Sample Number (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
// Calculate the Start Play Sample Number
m_bPlayIndex = (unsigned long)((sp.number_of_samples)/ (m_dfMaxX-2*xoffs)*(point.x-xoffs)+0.5);
……….
}
$$\text{Start Play Sample Index} = \frac{\text{Num of samples}}{\text{m\_dfMaxX} - 2 \times \text{xoffs}} \times (\text{point.x} - \text{xoffs})$$
(Figure labels: origin (0, 0), xoffs, point.x, m_dfMaxX.)
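The mapping from mouse position to sample index can be sketched as a standalone function. The name startSampleIndex( ) and its parameters are hypothetical: plot_width stands in for m_dfMaxX and x_offset for xoffs. The + 0.5 rounds to the nearest sample, as in Listing 5.4.4.

```cpp
#include <cassert>

// Map the mouse x-coordinate to the index of the first sample to analyse.
// plot_width stands in for m_dfMaxX and x_offset for xoffs (assumed names).
unsigned long startSampleIndex(unsigned long number_of_samples,
                               double plot_width, double x_offset, int mouse_x)
{
    const double drawable = plot_width - 2.0 * x_offset;
    assert(drawable > 0.0);
    // The caller is expected to have checked that the pointer is inside the plot.
    assert(mouse_x >= x_offset && mouse_x <= plot_width - x_offset);
    const double samples_per_pixel = number_of_samples / drawable;
    return static_cast<unsigned long>(samples_per_pixel * (mouse_x - x_offset) + 0.5);
}
```

For example, with 8000 samples drawn across 800 pixels (plot_width 840, x_offset 20), each pixel corresponds to 10 samples, so a pointer at x = 420 maps to sample 4000.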
5.4.6 Windowing the frame samples
where signal_timeDomain = the speech signal and frameData = the speech signal after windowing
Figure 5.4.5 – Windowing the speech samples
Listing 5.4.5 – HAMMING windowing the speech samples (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
float *frameData=new float[sp.windowsize];
signal_timeDomain = (__int16 *)sp.spcdata+(long)m_bPlayIndex;
windowing(signal_timeDomain, frameData, Zoomwindowsize, HAMMING,
sp.norm_factor,sp.prem_factor);
……….
}
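The report does not show the body of windowing( ), so the following is only a sketch of the Hamming weighting step, using the standard definition w[n] = 0.54 − 0.46 cos(2πn/(N−1)). The function name hammingWindow( ) is hypothetical, and the normalisation and pre-emphasis factors that Spana passes to windowing( ) are deliberately not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Multiply n 16-bit samples by a Hamming window and return them as floats.
std::vector<float> hammingWindow(const std::int16_t* samples, int n)
{
    assert(samples != nullptr && n > 1);
    const double kPi = 3.14159265358979323846;
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        const double w = 0.54 - 0.46 * std::cos(2.0 * kPi * i / (n - 1));
        out[i] = static_cast<float>(samples[i] * w);
    }
    return out;
}
```

The window tapers the frame towards 8% of its amplitude at both ends and leaves the centre sample unchanged, which reduces spectral leakage in the FFT that follows.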
5.4.7 Appending zeros for FFT
The FFT( ) function required the length of frameData to be a power of 2. sp.windowsize was a
power of 2, but Zoomwindowsize generally was not, so frameData was allocated with length
sp.windowsize. Only the first Zoomwindowsize locations were filled by the windowing step,
however, so the memory locations beyond Zoomwindowsize were left unassigned and held
unknown values.
As Figure 5.4.5 shows, the data beyond Zoomwindowsize were unknown, and these unknown values
were errors. To remove them, zeros were written to the memory locations beyond
Zoomwindowsize, as illustrated in Figure 5.4.6.
Figure 5.4.6 – Append zeros to memory locations beyond Zoomwindowsize
Listing 5.4.6 – Append zeros to fit the FFT( ) (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
//append zeros for fft
//since there is Zoomwindowsize of data, we need to append
//(sp.windowsize-Zoomwindowsize) zeros to framedata
for(int i=0;i<sp.windowsize-Zoomwindowsize;i++)
{
frameData[i+Zoomwindowsize]=0;
}
……….
}
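The same zero-padding step can be sketched with std::vector, which avoids uninitialised memory altogether because the buffer is created already filled with zeros. The function name zeroPad( ) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the windowed samples into a buffer of fft_size and leave the rest zero.
std::vector<float> zeroPad(const std::vector<float>& frame, std::size_t fft_size)
{
    assert(fft_size >= frame.size());
    // A radix-2 FFT needs a power-of-two length.
    assert(fft_size != 0 && (fft_size & (fft_size - 1)) == 0);
    std::vector<float> padded(fft_size, 0.0f);
    std::copy(frame.begin(), frame.end(), padded.begin());
    return padded;
}
```

Zero-padding does not add information to the signal; it only interpolates the spectrum onto a finer frequency grid.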
5.4.8 Computation of frequency spectrum and the spectral envelopes
5.4.8.1 Computing the frequency spectrum
The frequency spectrum was obtained by transforming the windowed speech samples into the
frequency domain.
Figure 5.4.7 – Transforming the speech samples into frequency domain
Listing 5.4.7 – Transform the windowed speech samples into frequency domain (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
FFT(frameData, sp.winbuffer1, sp.windowsize);
……….
}
5.4.8.2 Computing the LPC envelope
Prior to the computation of the LPC envelope, it was necessary to compute the autocorrelation of
the windowed signal. The second step was to use the result of the autocorrelation, tempautocc, to
calculate the LPC coefficients, tempLPC. Finally, tempLPC was passed to spectral_envelope( ) to
compute the LPC envelope, which was stored in sp.winbuffer2.
Figure 5.4.8 – Flowchart of computing the LPC envelope
frameData → autocorrelation → tempautocc → calc_lpc( ) → tempLPC → spectral_envelope( ) → sp.winbuffer2
Listing 5.4.8 – Computation of the LPC envelope (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
td_autoc(Zoomwindowsize,sp.order,frameData,tempautocc);
calc_lpc(sp.order,tempautocc,tempLPC,sp.K[index-1]);
spectral_envelope(sp.order, tempLPC, sp.windowsize,
sp.winbuffer2, gain, sp.dspflag);
……….
}
Listing 5.4.9 – Computation steps of autocorrelation (SpanaView.cpp)
void CSpanaView::td_autoc(__int16 win_size, __int16 order, float *indata,float *autocc)
{
……….
for (k=0;k<=order;k++)
{
sum = 0.0;
for (m=0;m<win_size-k;m++)
sum += indata[m] * indata[m+k];
autocc[k] = sum;
}
}
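The body of calc_lpc( ) is not shown in this report. A common way to obtain LP coefficients from autocorrelation values is the Levinson-Durbin recursion, sketched below under that assumption. The function name levinsonDurbin( ) is hypothetical, as is the sign convention x[n] ≈ Σ a[k] x[n−k], and Spana's own routine may differ in both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve the autocorrelation normal equations with the Levinson-Durbin
// recursion. r holds r[0..order]; the returned a[1..order] are predictor
// coefficients for x[n] ~ sum_k a[k] * x[n-k] (a[0] is unused and stays 0).
std::vector<double> levinsonDurbin(const std::vector<double>& r, int order)
{
    assert(order >= 1 && r.size() >= static_cast<std::size_t>(order) + 1);
    assert(r[0] > 0.0);
    std::vector<double> a(order + 1, 0.0), prev(order + 1, 0.0);
    double error = r[0];                      // prediction error energy
    for (int i = 1; i <= order; ++i) {
        if (error <= 0.0) break;              // signal already fully predicted
        double acc = r[i];
        for (int j = 1; j < i; ++j) acc -= prev[j] * r[i - j];
        const double k = acc / error;         // reflection coefficient
        a[i] = k;
        for (int j = 1; j < i; ++j) a[j] = prev[j] - k * prev[i - j];
        error *= (1.0 - k * k);
        prev = a;
    }
    return a;
}
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, the recursion returns a[1] = 0.5 and a[2] = 0, as expected.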
5.4.8.3 Computing the LPCC-based spectral envelope and the spectral envelope by FFT-based cepstral liftering
Both envelopes were computed by the function PlotLPCSpectralZoom( ). Since the
computation steps in PlotLPCSpectralZoom(…) were the same as those in PlotSpectral(…), the
implementation of PlotLPCSpectralZoom(…) is not discussed here. Please refer to
sections 5.1 and 5.2 for the details.
After the spectral envelopes had been computed, they could be plotted. Before plotting, however,
the frequency spectrum, LPC envelope, LPCC-based spectral envelope and spectral envelope by
FFT-based cepstral liftering had to be converted from linear scale to dB.
where sp.winbuffer1 = frequency spectrum, sp.winbuffer2 = LPC envelope, lpCepSpecEnv_Zoom = LPCC-based spectral envelope, CepSpecEnv_Zoom = spectral envelope by FFT-based cepstral liftering
Figure 5.4.9 – Converting data from linear to dB
Listing 5.4.10 – Convert the linear data to dB (SpanaView.cpp)
void CSpanaView::PlotZoom(CPoint point)
{
……….
PlotLPCSpectralZoom(Zoomwindowsize,tempLPC,frameData);
linear_to_log10(sp.winbuffer1, Zoomwindowsize/2, 1.0); // convert linear data to dB
linear_to_log10(sp.winbuffer2, Zoomwindowsize/2, 1.0); // convert linear data to dB
linear_to_log10(lpCepSpecEnv_Zoom,Zoomwindowsize/2,1.0); // convert linear data to dB
linear_to_log10(CepSpecEnv_Zoom,Zoomwindowsize/2,1.0);
……….
}
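The conversion step can be sketched as below, following the 10*log( ) relation shown in Figure 5.4.9. The function name linearToDb( ) is hypothetical. Treating the third argument of linear_to_log10( ) as a reference level is an assumption, and the floor that keeps the logarithm finite is an addition that the report does not describe.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert linear values to decibels in place using 10*log10(x / reference).
// Values at or below floor_value are clamped first so log10 stays finite.
void linearToDb(std::vector<float>& data, float reference = 1.0f,
                float floor_value = 1e-10f)
{
    assert(reference > 0.0f && floor_value > 0.0f);
    for (float& v : data) {
        const float clamped = (v > floor_value) ? v : floor_value;
        v = 10.0f * std::log10(clamped / reference);
    }
}
```

With the default reference of 1.0, the linear values 1, 10 and 100 map to 0 dB, 10 dB and 20 dB respectively.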
5.4.9 The Zoom Scale Dialog
The Zoom Scale dialog contained a slider that set the zoom factor, which in turn scaled the
window size of the zoom. The slider had four scales.
Figure 5.4.10 – Four scales of zoom were supported
(Figure 5.4.9: 10*log( ) is applied to sp.winbuffer1, sp.winbuffer2, lpCepSpecEnv_Zoom and CepSpecEnv_Zoom. Figure 5.4.10: the zoom increases across the four slider scales.)
Figure 5.4.11 – The four scales of zoom
Listing 5.4.11 – Set the four scales (SpanaView.cpp)
void CFastZoomDlg::OnHScroll(UINT nSBCode, UINT nPos, CScrollBar* pScrollBar)
{
if (GetZoomIndicator()==TRUE)
{
m_SliderZoom.SetRange(1,4,TRUE);
m_SliderZoom.SetPageSize(1);
m_SliderZoom.SetTicFreq(1);
switch(m_SliderZoom.GetPos())
{
case 1: ZoomFactor=1;
break;
case 2: ZoomFactor=(float)0.8;
break;
case 3: ZoomFactor=(float)0.6;
break;
case 4: ZoomFactor=(float)0.4;
}
}
……….
}
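The relation between slider position, zoom factor and zoomed window size can be summarised in a short sketch. The function names zoomFactorForSlider( ) and zoomWindowSize( ) are hypothetical; the factor values come from Listing 5.4.11 and the truncation to int follows Listing 5.4.3.

```cpp
#include <cassert>

// Zoom factor for slider positions 1..4, using the values in Listing 5.4.11.
float zoomFactorForSlider(int position)
{
    assert(position >= 1 && position <= 4);
    static const float kFactors[] = {1.0f, 0.8f, 0.6f, 0.4f};
    return kFactors[position - 1];
}

// Number of samples shown in the zoom window (truncated, as in Listing 5.4.3).
int zoomWindowSize(int window_size, int position)
{
    assert(window_size > 0);
    return static_cast<int>(window_size * zoomFactorForSlider(position));
}
```

A smaller factor means fewer samples are spread over the same window, so the waveform appears more magnified. With a 512-sample frame, the four positions show 512, 409, 307 and 204 samples.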
5.6 Interactive Fast Display
5.6.1 Introduction
The frame of samples shown in the Fast Display dialog is determined by the location of the red
vertical line in the main window, as shown in Figure 5.6.1. The next frame of samples could
therefore be displayed by moving the red vertical line, which required its current location to be
known. This functionality was added through a handler, OnKeyDown( ), for the KeyDown event.
The main purpose of OnKeyDown( ) was to obtain the frame index so that the Fast Display dialog
could show the required frame of samples.
Figure 5.6.1 – The red vertical line
5.6.2 Program Flowchart
Figure 5.6.2 – Flowchart of the OnKeyDown( ) function
Start of OnKeyDown( ) → Current frame index selected? (NO: End of OnKeyDown( )) → key = VK_RIGHT? (YES: Frame index increments; NO: key = VK_LEFT? (YES: Frame index decrements; NO: End of OnKeyDown( ))) → Set the new location for the red line → Start the computation and painting process → End of OnKeyDown( )
5.6.3 Getting the current frame index and next frame index
Current frame index:
To select the next frame of samples for display in the Fast Display dialog, the current frame
index had to be known, because any increment or decrement must be applied to the current frame
index. Otherwise, it would be impossible to know which frame of samples the user wanted to
display. The current frame index was selected by clicking the left mouse button on the waveform
of the speech signal; please refer to section 5.1.3 for the details. An indicator,
m_bCheckMouse, recorded whether the current frame index had been selected and was set to
TRUE once it had. While the indicator was FALSE, the “→” and “←” keys had no effect.
New frame index:
Key presses other than “→” and “←” were ignored. The frame index was incremented for the
“→” key and decremented for the “←” key, and was then passed to the Fast Display dialog
class for further computation.
5.6.4 Setting the red vertical line to new position
The red line was moved to the right if the “→” key was pressed and to the left if the “←” key
was pressed. Its position was determined by the variable x_indicator, the x-coordinate of
the red vertical line in the main window. Figure 5.6.3 shows how the new position was calculated.
Figure 5.6.3 – Calculation of the new location of the red vertical line
Listing 5.5.1 – Getting the new frame index and setting the red vertical line to new position (SpanaView.cpp)
void CSpanaView::OnKeyDown(UINT nChar, UINT nRepCnt, UINT nFlags)
{
……….
if(nChar==VK_RIGHT||nChar==VK_LEFT)
{
if(nChar==VK_RIGHT)
{
x_indicator=x_indicator+x_movement_step;
// Figure 5.6.3: x_movement_step = (m_dfMaxX - 2*xoffs) / number of frames
// new x_indicator = x_indicator + x_movement_step (minus for VK_LEFT)
m_FastDisplayDlg.win_index = m_FastDisplayDlg.win_index +1;
}
else
{
x_indicator=x_indicator-x_movement_step;
m_FastDisplayDlg.win_index = m_FastDisplayDlg.win_index -1;
Invalidate();
}
………
}
}
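The key handling in Listing 5.5.1 can be sketched without MFC as follows. The ArrowKey enum, the FrameCursor struct and both function names are hypothetical. The sketch assumes zero-based frame indices, and the range check at both ends is an addition that the listing does not show.

```cpp
#include <cassert>

enum class ArrowKey { Left, Right };

struct FrameCursor {
    int    frame_index;   // frame shown in the Fast Display dialog
    double x_indicator;   // x-coordinate of the red vertical line
};

// Pixel distance between adjacent frames: (m_dfMaxX - 2*xoffs) / frames.
double movementStep(double plot_width, double x_offset, int number_of_frames)
{
    assert(number_of_frames > 0 && plot_width > 2.0 * x_offset);
    return (plot_width - 2.0 * x_offset) / number_of_frames;
}

// Move one frame left or right; out-of-range moves leave the cursor unchanged.
FrameCursor stepFrame(FrameCursor cur, ArrowKey key, double step,
                      int number_of_frames)
{
    const int delta = (key == ArrowKey::Right) ? 1 : -1;
    const int next = cur.frame_index + delta;
    if (next < 0 || next >= number_of_frames) return cur;  // stay in range
    cur.frame_index = next;
    cur.x_indicator += delta * step;
    return cur;
}
```

Keeping the frame index and the line position in one value makes it harder for the two to drift apart when a key press is rejected.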
5.6 Interactive Spectral Plot
5.6.1 Introduction
When the poles and zeros in the Z-plane or the sensitivity of the LP parameters are adjusted, the
LPCC-based spectral envelope changes, because these adjustments change the values of the LP
coefficients. Therefore, I retrieved the new LP coefficients and used them to compute the new
LPCC-based spectral envelope.
5.6.2 Program flowchart
Figure 5.6.1 – Program flow for reacting to a change in poles, zeros or sensitivity of LP parameters by changing the LPCC-based spectral envelope
Any change in zeros, poles or sensitivity of LP parameters → Compute a new set of LP coefficients → Compute the new LPCC-based spectral envelope → Plot the new LPCC-based spectral envelope → End
5.6.3 Computing a new set of LP coefficients
Event handlers for changes in the poles, zeros and sensitivity of the LP parameters already
existed in the previous version of Spana, so I did not need to create any. These handlers already
computed a new set of LP coefficients, so no code had to be added to them for that purpose; I
only needed to retrieve the result. After a new set of LP coefficients was computed, the array
sp.LPC[index-1] was updated with it. Therefore, the new set of LP coefficients could be
referenced by the following code:
……….
sp.LPC[index-1]; //LP coefficients
……….
5.6.4 Computing the new LPCC-based spectral envelope
After updating the array, sp.LPC[index-1], with the new set of LP coefficients, we could start the
computation of the new LPCC-based spectral envelope. The computation was done in the
PlotSpectralEnvelope( ) function.
Listing 5.6.1 – Computation of the new LPCC-based spectral envelope (SpanaView.cpp)
void CSpanaView::PlotSpectralenvelope()
{
……….
gain=calc_gain(sp.autocc[index-1], sp.LPC[index-1], sp.order);
FFT(sp.w[index-1], sp.winbuffer1, sp.windowsize);
spectral_envelope(sp.order, sp.LPC[index-1], sp.windowsize, sp.winbuffer2, gain, sp.dspflag);
PlotLPCSpectral();
linear_to_log10(sp.winbuffer1, sp.windowsize/2, 1.0); // convert linear data to dB
linear_to_log10(sp.winbuffer2, sp.windowsize/2, 1.0); // convert linear data to dB
linear_to_log10(lpCepSpecEnv_buffer,sp.windowsize/2,1.0); // convert linear data to dB
linear_to_log10(CepSpecEnv,sp.windowsize/2,1.0); // convert linear data to dB
……….
}
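The step from LP coefficients to LP cepstral coefficients is not shown in this section. The sketch below uses the standard recursion for an all-pole model H(z) = G / (1 − Σ a[k] z^−k). The function name lpcToCepstrum( ) is hypothetical, and both the sign convention and the omission of the gain term c[0] are assumptions that may not match Spana's own implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert predictor coefficients a[1..p] (a[0] unused) into cepstral
// coefficients c[1..n_ceps] using
//   c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k],  with a[m] = 0 for m > p.
// c[0] (the log-gain term) is left at 0 in this sketch.
std::vector<double> lpcToCepstrum(const std::vector<double>& a, int n_ceps)
{
    assert(a.size() >= 2 && n_ceps >= 1);
    const int p = static_cast<int>(a.size()) - 1;
    std::vector<double> c(static_cast<std::size_t>(n_ceps) + 1, 0.0);
    for (int n = 1; n <= n_ceps; ++n) {
        double sum = (n <= p) ? a[n] : 0.0;
        for (int k = 1; k < n; ++k) {
            if (n - k <= p) {
                sum += (static_cast<double>(k) / n) * c[k] * a[n - k];
            }
        }
        c[n] = sum;
    }
    return c;
}
```

For a single pole with a[1] = 0.5 the recursion gives c[n] = 0.5^n / n, which matches the series expansion of −ln(1 − 0.5 z^−1).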
5.6.5 Plotting the new LPCC-based spectral envelope
Plotting of the new LPCC-based spectral envelope was also completed in the
PlotSpectralEnvelope( ) function.
Listing 5.6.2 – Plotting the new LPCC-based spectral envelope (SpanaView.cpp)
void CSpanaView::PlotSpectralenvelope()
{
……….
// new (lpCepSpecEnv_buffer) start here
CPen pen_lpCepSpecEnv(PS_SOLID, 1, RGB(0,0,255));
pOldPen = dc.SelectObject(&pen_lpCepSpecEnv);
dc.MoveTo(int(xoffs),int((rect.bottom)+yoffs-((float)lpCepSpecEnv_buffer[0]-miny)*stepy));
x_coor=0;
for(i=0;i<number_of_values;x_coor+=stepx,i++)
{
dc.LineTo((int)x_coor+xoffs,int((rect.bottom)-((float)lpCepSpecEnv_buffer[i]-miny)*stepy));
}
……….
}
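The coordinate mapping inside the LineTo( ) loop can be sketched without a device context. The PixelPoint struct and the toPolyline( ) function are hypothetical; the formula follows the LineTo( ) call in Listing 5.6.2, where y is measured upward from the bottom of the drawing rectangle.

```cpp
#include <cassert>
#include <vector>

struct PixelPoint { int x; int y; };

// Map values to screen points: x advances by step_x per value and y is
// measured upward from bottom, so larger values are drawn higher.
std::vector<PixelPoint> toPolyline(const std::vector<float>& values,
                                   float min_value, float step_x, float step_y,
                                   int x_offset, int bottom)
{
    assert(step_x > 0.0f && step_y > 0.0f);
    std::vector<PixelPoint> points;
    points.reserve(values.size());
    float x = 0.0f;
    for (float v : values) {
        points.push_back({static_cast<int>(x) + x_offset,
                          static_cast<int>(bottom - (v - min_value) * step_y)});
        x += step_x;
    }
    return points;
}
```

Computing the points first and drawing them afterwards keeps the scaling logic testable independently of the painting code.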
Whenever the poles, zeros or sensitivity of the LP parameters are changed again, the above
process is repeated.
Chapter 6 Results and Discussion
6.1 Plotting of the LPCC-based spectral envelope and the spectral envelope
by FFT-based cepstral liftering
Results:
Figure 6.1.1 – The spectral envelopes including LPCC-based spectral envelope (Blue), spectral envelope by FFT-based cepstral liftering (Pink) and LPC spectral envelope (Red)
Discussion
It can be seen from Figure 6.1.1 that the three spectral envelopes are very close to each other.
With this function, students can examine the relationship between these spectral envelopes.
They can also verify that the LPCC-based spectral envelope and the spectral envelope by
FFT-based cepstral liftering both model the frequency spectrum. This makes it easier to show
them that the two spectral envelopes can help in locating the formants.
6.2 Plotting of Pitch Contour
Results:
Figure 6.2.1 – The pitch contours for the speech “seven” from (A) WaveSurfer 1.6.0, (B) Spana (current version), (C) Spana (previous version). Axis label: Period in ms
Figure 6.2.2 – Pitch contours for the speech “welcome” from (A) WaveSurfer 1.6.0, (B) Spana (current version), (C) Spana (previous version)
Discussion
By observing Figures 6.2.1 and 6.2.2, it is found that the pitch contours from the current
version of Spana are closer to those from WaveSurfer than the pitch contours from the previous
version of Spana are. Thus, it can be concluded that the plotting of the pitch contour has been
improved in the current version.
6.3 Zooming the speech signals in time domain
Results:
Figure 6.3.1 – The zooming window
Figure 6.3.2 – Zooming in greater scale
Discussion
Figure 6.3.1 shows that the zooming function makes it possible to blow up the waveform, and
Figure 6.3.2 shows that the zooming scale can be changed. Another important feature of the
zooming function is that the view in the zooming window follows the mouse pointer. This is
convenient because users need not select a portion of the waveform and then press a zoom
button in order to zoom the speech signal.
6.4 Interactive Fast Display
Results:
Figure 6.4.1 – Original view in the Fast Display dialog (frame 45)
Figure 6.4.2 – The next view in the Fast Display dialog (frame 46) when the key “→” was pressed once
Discussion
In the previous version of Spana, a user who wants to view frame 41 in the Fast Display dialog
must locate frame 41 with the mouse pointer and click the left mouse button at that location.
To shift the view to the next frame, she must repeat this process, so viewing an entire utterance
of 100 frames takes 100 repetitions, which is inconvenient. The Interactive Fast Display feature
gives users a more convenient way to shift the view to an adjacent frame, using the “→” or “←”
key on the keyboard.
6.5 Interactive Spectral Plot
Figure 6.5.1 – Changing the LPC spectral envelope and LPCC-based spectral envelope by moving the poles on Z-Plane
Figure 6.5.2 – Changing the LPC spectral envelope and LPCC-based spectral envelope by adjusting the zeros on Z-Plane
Figure 6.5.3 – Changing the LPC spectral envelope and LPCC-based spectral envelope by adjusting the sensitivity of LP parameters
Discussion
Figures 6.5.1, 6.5.2 and 6.5.3 show that the LPC spectral envelope and the LPCC-based spectral
envelope change when the poles and zeros on the Z-plane or the sensitivity of the LP parameters
are adjusted. This interactive function helps students see how the zeros, poles and sensitivity
of the LP parameters affect the two envelopes.
Chapter 7 Conclusion and Recommendations
7.1 Conclusion
This project aimed to enhance Spana by adding new features so that it is more useful in
helping students learn the abstract concepts of speech analysis. The new features stated in
Chapter 2 include plotting of the LPCC-based spectral envelope, the spectral envelope by
FFT-based cepstral liftering and the pitch contour, a zooming function for viewing the speech
signal in the time domain, and selection of the frame index by keyboard. All these functions
have been completed. Therefore, the objective of this project has been met.
For the pitch detection algorithm, some error remained even though the plotting of the pitch
contour had been improved. Experiments on extending the probabilistic approach indicate that
the error in pitch detection can be further reduced by using a finer approximation of the normal
distribution [9]. By including both distributions, it is expected that a more desirable result of
pitch detection can be obtained.
During the development of this project, it was found to be difficult to verify whether a newly
written function worked. This problem can be solved by comparing the results of the new
function with those obtained in MATLAB, which has an enormous number of built-in functions.
For example, to verify that your own FFT function is correct, simply call the FFT function in
MATLAB on the same input and compare the two results.
7.2 Recommendations for further work
For the zooming function, the Fast Zoom dialog flickers slightly during the blow-up of the
speech waveform, which may irritate users. The flicker occurs because the dialog is repainted
whenever the mouse pointer moves across the waveform of the speech signal. Hopefully, this
problem can be solved in a later version of Spana.
The zooming function in this version of Spana can only zoom the speech signal in the time
domain. Owing to limited time, zooming the speech signal in the spectral domain is not
supported in this version and is suggested for further work.
There is still room for enhancement to Spana. Other new features, such as spectrogram analysis
and plotting of the MFCC envelope, can be added. The interface of Spana can also be improved,
for example by displaying different characteristics of the speech signal at the same time using
the Multiple Document Interface.
References
[1] http://www.codeproject.com/bitmap/gditutorial.asp
[2] http://www.codeguru.com/
[1] S.Y. Kung, M.W. Mak and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, to appear.
[2] Deller, J.R. et al., Discrete-Time Processing of Speech Signals, Macmillan Pub. Company, 2000.
[3] Kondoz, A.M., Digital Speech: Coding for Low Bit Rate Communications Systems, J. Wiley, 1994.
[4] Rabiner, L.R. and Juang, B.H., Fundamentals of Speech Recognition, Prentice Hall, 1993.
[5] Prosise, J., Programming Windows with MFC, 2nd Edition, Microsoft Press, 1999.
[6] Kruglinski, D.J., Wingo, S. and Shepherd, G., Programming Visual C++, 5th Edition, Microsoft Press, 1998.
[7] Kain, E., The MFC Answer Book – Solutions for Effective Visual C++ Applications, Addison Wesley, 1998.
[8] Barnwell III, T.P. et al., Speech Coding: A Computer Laboratory Textbook, John Wiley & Sons, Inc., 1996.
[9] Ying, G.S., Jamieson, L.H. and Michell, C.D., A probabilistic approach to AMDF pitch detection, Proceedings of the Fourth International Conference on Spoken Language (ICSLP 96), Volume 2, 3-6 Oct. 1996, pp. 1201-1204.