109
4. Implementation of speech compression using Wavelet Transform
4.1 Introduction The design of the software for the speech compression is based on the concepts
of the lifting scheme and conventional filter bank method of wavelet transforms.
This software is developed in C language. However to provide graphical user
interface, to visualise the speech waveforms, recording and playing speech files
and to simplify data analysis, this software is also written in visual C++.
4.2 Design criteria • We choose to implement Haar, Cubic, CDF and Daubechies-4 wavelets with
the lifting scheme and Haar, Daubechies series (Daub-4, Daub-6, Daub-8,
Daub-10, Daub-12, and Daub-16 Wavelets with filter bank method.
• If speech file is small (less then 10 KB) then it can be compressed without
dividing into blocks (frames) or it can be compressed in blocks.
• If the speech file is large then entire file is divided into the blocks of size 256,
512 or 1024 bytes.
• To simulate near real time speech compression, speech buffers of the size
256 bytes are used.
• Level dependant and global thresholding techniques are used for lossy
speech compression. Hard thresholding and soft thresholding techniques are
used for de-noising.
• Programming is done in C language because it is flexible, structured and
efficient language. Programs written in C language requires less execution
time compare to other high level languages. Most of digital signal processors
manufacturers (Texas, AT&T, Motorola etc.) provide C compilers, simulators
and emulators. These C language compilers provide standard C language
with extensions for DSP to generated efficient code. C language has high-
level language capabilities (structure, functions, arrays etc.) and low-level
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
110
assembly language capabilities (Bit manipulation, direct hardware input-
output, calling of assembly language macros etc.)
• The software is also developed using Visual C++ to provide graphical
interface, to display speech waveform, recording and playing speech file. This
software displays original file size, compressed file size, number of zeros,
compression ratio, statistical comparison between original and reconstructed
speech that simplify results analysis.
• Optimization techniques are used to reduce computational time50. Logic
optimisation is done. In lifting scheme, floating point operations are converted
into fixed point operations. In filter bank method, fast wavelet transform
algorithm is used to reduce computational burden..
4.3 Wavelet Transform algorithms
Speech compression using wavelet transform is performed with following steps:
1. Read speech samples from the standard speech file frame by frame. Each
frame consists of 256 samples.
2. Apply wavelet transform on frame of 256 samples.
3. Apply thresholding (replace small wavelet coefficients by zero).
4. Encode wavelet coefficients.
5. Store encoded wavelet coefficients.
6. Read next frame and repeat steps 2 to 5 for entire speech file.
Speech is reconstructed using inverse wavelet transform with following steps:
1. Read the first frame of wavelet coefficients from the compressed file.
2. Decode wavelet coefficients. (Restore zeros).
3. Apply inverse wavelet transform on the frame.
4. Store the frame in the standard speech file.
5. Read next frame of wavelet coefficient and repeat steps 2 to 4.
The wavelet transform algorithms using Haar, Daubechies series, cubic
interpolation, CDF (Cohen-Daubechies-Feauveau) wavelet used in this thesis are
presented here.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
111
4.3.1 Haar Wavelet Transform algorithm Let us consider a sequence of data samples: data[0], data[1], …….., data[n] Haar transform on this sequence can be calculated with following steps:
• The sequence of data samples is divided into two frames: odd frame and
even frame.
• Samples of odd frame are subtracted from samples of even frame that
gives detail coefficients.
• Even samples are replaced by average of even and odd samples that
gives average coefficients.
The approximate coefficients “A” and detail coefficients “D” are overwrite on the
same data sequence to save memory: data[1] = D or data[1] = data[1] - data[0] data[0] = A or data[0] = data[0] + data[1]/2
Inverse Haar transform can be calculated by changing the sign of operation.
data[0] = data[0]-data[1]/2 data[1] = data[1] + data[0]
This steps can be implemented with C code as given below:
Forward Haar Wavelet Transform:
for(s=2; s<=n; s=s<<1) {
for(k=0; k<n; k+=s) { data[k+(s>>1)] = data[k+(s>>1)]-data[k];
data[k] = data[k] +((data[k+(s>>1)])>>1); }
}
A=(data[0]+data[1])/2
D= data[1]-data[0]
Input samples data[0],data[1] ..data[n]
Approximate Coefficients
Detail Coefficients
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
112
Inverse Haar Wavelet Transform:
for(s=n; s>=2; s=s>>2 ) {
for(k=0; k<n; k+=s ) {
data[k] = data[k]-((data[k+(s>>1)])>>1); data[k+(s>>1)] = data[k+(s>>1)]+data[k];
} }
Example:
Let us consider data sequence: Data = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31] After splitting data into even frame and odd frame we get: Even data = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30] Odd data = [1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31] After first iteration of the Haar wavelet transform we get: Even data = cA#
0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30] Odd data = cD0 = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ] In the second iteration, detail coefficient cD0 are kept as it is and even
coefficients cA0 are further decomposed into two frames. After second iteration
we get:
Evan data = cA1 = [1 5 9 13 17 23 25 29 ] Odd data = cD1 = [2 2 2 2 2 2 2 2 ] Similarly, after third iteration we get: Even data = cA2 = [3 11 19 27] Odd data = cD2 = [ 4 4 4 4] If we arrange the transformed data in the sequence cA2, cD2, cD1, cD0, then
wavelet-transformed sequence can be given as:
Data=[3 11 19 27 4 4 4 4 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
# cA0 means approximation coefficients at level 0. cD0 means detail coefficient at level 0.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
113
In the inverse Haar transform, cA1 is calculated from cA2 and cD2. cA0 is
calculated from cA1 and cD1. Original data sequence is calculated from cA0 and
cD0.
If we apply inverse wavelet transform on the coefficient sequence without
thresholding, we get reconstructed data same as original. If we truncate small
wavelet coefficients to zero, then we get error between original data and
reconstructed data. The value of threshold depends on how much error we want
to tolerate.
Let us apply thresholding in the transformed sequence. Let us use threshold=2
i.e. all the coefficients less then “2” are reduced to zero. In our example, all the
values of the cD0 are less then “2” so we get,
Data=[ 3 11 19 27 4 4 4 4 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cA2 cD2 cD1 cD0
In the first iteration, cA1 is calculated from cA2 and cD2: cA1 = [1 5 9 13 17 23 25 29 ] cD1 = [2 2 2 2 2 2 2 2 ] In second iteration, cA0 is calculated from cA1 and cD1: cA0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30] cD0 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] In third iteration data “rData” is reconstructed form cA0 and cD0. rData = [ 0 0 2 2 4 4 6 6 8 8 10 10 12 12 14 14 16 16 18 18 20 20 22 22 24 24 26 26 28 28 30 30] It can be seen that reconstructed data “rData” is not same as that of original data
because of thresholding. Reconstructed data is approximate version of original
data. As threshold value increases, error between original data and
reconstructed data increases. The detailed error analysis is given in chapter 5.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
114
4.3.2 CDF (Cohen-Daubechies-Feauveau) Wavelet Transform Cohen-Daubechies-Feauveau wavelet is a family of bi-orthogonal wavelets. It is
having a finite support. Scaling function is always symmetric. It can be
implemented in two ways:
1. Using lifting scheme
2. Using conventional sub-band coding method (Filter bank method) Family of Cohen-Daubechies-Feauveau wavelet is represented by CDF(n,ñ).
Primal wavelet has n vanishing moments and dual wavelet has ñ vanishing
moments. Smoothness is directly related with the vanishing moments. As
number of vanishing moments increases, wavelet function becomes more
smooth.
In CDF(2,2), the primal and dual wavelet has two vanishing moments. It has 5
analysis filter coefficients and 3 synthesis filter coefficients.
Implementation of CDF(2,2) wavelet with lifting scheme is discussed here:
Let us consider a data set of “n” speech samples λ0,k. Where, k = 0 to n
��This data set is divided into two parts. Even frame consists of even
samples such as λ0,0 λ0,2 λ0,4 etc. Odd frame consists of odd samples
such as λ0,1 λ0,3 λ 0,5 etc.
��Odd samples are predicted from even samples by prediction function.
Difference between predicted value and actual value represents loss of
information.
Hence, loss of information can be represented by,
ϒ-1, k = λ 0,2k+1 – ½( λ -1,k + λ -1, k+1) Where, λ 0,2k+1 is actual value of odd sample. ½ ( λ -1,k + λ -1, k+1) is average of neighbouring even samples.
��In second level, all even samples are further divided into two frames and
the process repeated as shown in lifting process chart in the Fig 3.4.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
115
Now let us assume that original speech samples are stored in the data file then
following procedure is required:
��Read speech data file dynamically and put it in array called data[k].
$OO�WKH�VSHHFK�VDPSOHV� �0,k are placed in this array.
��Run wavelet transform up to Nth level. Where, N=log2(n)
��Calculate wavelet coefficient for odd sample at each level.
��Store wavelet coefficients in the output file called “output.wlt”.
For simplicity, let us consider total 16 samples. Hence value of N=4.
First Iteration:
data[1] = data[1] – ½(data[2]+data[0]) data[3] = data[3] – ½(data[4]+data[2])
data[5] = data[5] – ½(data[6]+data[4]) data[7] = data[7] – ½(data[8]+data[6])
: :
data[15] = data[15] – ½(data[16]+data[14]) (Assuming boundary data[16]=0;) Second Iteration:
data[2] = data[2] – ½(data[4]+data[0]) data[6] = data[6] – ½(data[8]+data[4]) data[10] = data[10] – ½(data[12]+data[8]) : data[14] = data[14] – ½(data[16]+data[12]) Third Iteration: data[4] = data[4] – ½(data[8]+data[0]) data[12] = data[12] – ½(data[2]+data[0]) Fourth Iteration: data[8] = data[8] – ½(data[16]+data[0])
These iterations can be implemented by following C code:
for(x=3; x>0; x--) { for(k=0; k<4; k=k+(1<<(3-x)) { data[2*k+(1<<(3-x)]-=(data[2*k+(1<<(2-x)]+data[2*k])>>1; } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
116
In general, we may write above loops for “n” samples:
N=log(n)/log 2; for(x=N; x>0; x--) { for(k=0; k<(1<<(N>>1); k=k+(1<<(N-x)) { data[2*k+(1<<(N-x)]-=(data[2*k+(1<<(N-x)]+data[2k])>>1; } }
Inverse wavelet transform is performed to obtain reconstructed samples from the
transformed coefficients. Samples are reconstructed by adding loss of
information into predicted value.
�0,2k+1 = ϒ-1, k ���ò�� �-1, k �� �-1, k+1 )
Loss of information is stored at the same location to perform in-place calculation.
If we consider n=16 and N=4 inverse wavelet transform can be calculated as under:
First Iteration: data[8] = data[8] + ½ (data[16]+data[0]) Second Iteration: data[4] = data[4] + ½(data[8]+data[0])
data[12] = data[12] + ½(data[2]+data[0])
Third Iteration: data[2] = data[2] + ½(data[4]+data[0])
data[6] = data[6] + ½(data[8]+data[4]) data[10] = data[10] + ½(data[12]+data[8]) : data[14] = data[14] + ½(data[16]+data[12])
Fourth Iteration: data[1] = data[1] + ½(data[2]+data[0])
data[3] = data[3] + ½(data[4]+data[2]) data[5] = data[5] + ½(data[6]+data[4]) data[7] = data[7] + ½(data[8]+data[6])
: :
data[15] = data[15] + ½(data[16]+data[14])
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
117
Inverse wavelet transform can be implemented with following C code:
for(x=1; x<=3; x++) { for(k=0; k<4; k=k+(1<<(3-x)) {
data[2k+(1<<(3-x)]+=(data[2k+(1<<(2-x) ]+data[2k])>>1; } }
In General, for n samples:
N=log n/log 2; for(x=1; x<=N; x++) { for(k=0; k<(1<<(N>>1); k=k+(1<<(N-x)) { data[2*k+(1<<(N-x)]+=(data[2*k+(1<<(N-x) ]+data[2*k])>>1; } }
Modified C code for Forward and inverse wavelet transform is given below: Forward wavelet transform:
for(s=2; s<n; s<<=1) { for(k=0; k<n; k=k+s) { data[k+(s>>1)]- = (data[k]+data[k+s])>>1; } }
Inverse wavelet Transform:
for(s=(n>>1); s>=2; s>>=1) { for(k=0; k<n; k=k+s) { data[k+s/2]+=(data[k]+data[k+s])>>1; } }
Example: Let us consider the same data sequence as we have used in Haar wavelet: Data = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
118
After splitting data into even frame and odd frame we get: Even data = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30] Odd data = [1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31] After first iteration of the CDF(2,2) wavelet transform we get:
Even data = cA0 = [0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30] Odd data = cD0 = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] Second iteration:
Even data = cA1 = [0 4 8 12 16 20 24 30 ] Odd data = cD1 = [0 0 0 0 0 0 0 0 ] Third iteration:
Even data = cA2 = [0 8 16 24] Odd data = cD2 = [ 0 0 0 0] Fourth Iteration:
Even data = cA3 = [0 16] Odd data = cD3 = [ 0 0 ] If we arrange the transformed data in the sequence cA3, cD3, cD2, cD1, cD0, then
wavelet-transformed sequence can be given as:
Data=[ 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
cA3 cD3 cD2 cD1 cD0
If we encode above sequence with run-length encoding, then it can be
represented with four bytes compare to original data size 32 bytes. Thus, CDF
wavelet gives more compact representation of data compare to Haar wavelet. In
this example, we have considered linear data; hence it can be compressed
efficiently without applying thresholding. Thus efficient loss less compression of
linear data is possible. However, real time data, such as speech samples are not
linear, hence we have to apply thresholding operator for the compression.
Filter bank Implementation of CDF(2,2) wavelet uses following filter coefficients:
Analysis LPF h = { -0.125, 0.25, 0.75, 0.25, -0.125} Analysis HPF g = { 0.25, -0.5, 0.25 } Synthesis LPF Ih = { 0.25, 0.5, 0.25 } Synthesis HPF Ig = {0.125, 0.25, -0.75, 0.25, 0.125} C code for the filter bank implementation is given in the section 4.3.4.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
119
4.3.3 Cubic interpolations in lifting scheme of wavelet transform In CDF(2,2) wavelet transform, linear interpolation is used to predict odd sample
from neighbouring even samples. If we use cubic interpolation then odd samples
are predicted from four neighbouring samples. Weighting factors associated with
each sample.are derived in the section 3.6. We will use weighting factors given in
Table 3.2. Cubic interpolating subdivision gives good prediction for the speech
signal. Boundary treatment is required for right most and left most samples.
Prediction in four different cases:
• Two neighbouring samples on left and two samples on right.
)0625.05625.05625.00625.0( 3,1,1,3,1, +−+−−−−−+−− −++−−= kjkjkjkjjkj λλλλλγ
• One neighbouring sample on left and three samples on right.
)0625.03125.09375.03125.0( 5,3,1,1,1, +−+−+−−−+−− +−+−= kjkjkjkjjkj λλλλλγ
• Three samples on left and one sample on right.
)3125.09375.03125.00625.0( 1,1,3,5,1, +−−−−−−−+−− ++−−= kjkjkjkjjkj λλλλλγ
• Four sample on left and no sample on right
)1875.21875.23125.13125.0( 1,3,5,7,1, −−−−−−−−+−− +−+−−= kjkjkjkjjkj λλλλλγ
Cubic interpolation prediction can be implemented with following C code: void CubicTransform(unsigned char data[ ],int n){ int s,k; unsigned char predict; for( s = 2; s <= (n>>1); s <<= 2 ) { for(k = 0; k < n; k += s ){ if(k==0) /* 1 left & 3 right sample*/ predict=k2*data[k]+k3*data[s]-k2*data[2*s]+k0*data[3*s]; else if(k==n-s) /* 1 right & 3 left sample*/ predict=k0*data[k-2*s]-k2*data[k-s]+k3*data[k]+k2*data[k+s]; else if(k==n-s/2) /* 4 left sample & 0 right sample*/ predict=-k2*data[k-3*s]+k5*data[k-2*s]-k4*data[k-s]+k4*data[k]; else /* 2 left sample & 2 right sample*/ predict= -k0*data[k-s]+k1*data[k]+k1*data[k+s]-k0*data[k+2*s]; data[k+s/2] -= predict; } } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
120
Inverse transform can be given by following C code: void inv_CubicTransform(unsigned char data[],int n) { int s,k; unsigned char predict; for( s = (n>>1); s >= 2; s >>= 2 ) { for(k = 0; k < n; k += s ) { if(k==0) predict=k2*data[k]+k3*data[s]-k2*data[2*s]+k0*data[3*s]; else if(k==n-s) predict=k0*data[k-2*s]-k2*data[k-s]+k3*data[k]+k2*data[k+s]; else if(k==n-s/2) predict=-k2*data[k-3*s]+k5*data[k-2*s]-k4*data[k-s]+k4*data[k]; else predict= -k0*data[k-s]+k1*data[k]+k1*data[k+s]-k0*data[k+2*s]; data[k+s/2] += predict; } } } 4.3.4 Daubechies wavelet transforms Daubechies wavelets are very popular because they are good compromise
between compact support and smoothness.
Lifting scheme implementation of Daubechies wavelet transform: Daubechies wavelet transform can be implemented using lifting scheme of
wavelet transform by factorising it into lifting steps as discussed in section 3.5.3.
Let us consider a data set of “n” speech samples λ0,k. Where, k = 0 to n. Lifting
steps for Daubechies-4 wavelet transform are as under:
• Split the data set into even and odd frames.
λ-1,k ←λ0,2k (Even samples) and ϒ-1,k ← λ0,2k+1 (Odd samples)
• Update 1 stage: Update even samples from the odd samples.
λ-1,k ←λ-1,k + √3ϒ-1,k
• Predict stage: Predict odd samples from the even samples.
ϒ-1,k ← ϒ-1,k - 43 λ-1, k –
423 − λ-1, k-1
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
121
• Update 2 stage: Update even samples from predicted odd samples
λ-1,k ← λ-1,k - ϒ-1,k+1
• Normalize stage:
λ-1,k ← 2
13 − λ-1,k and ϒ-1,k ←
2
13 + ϒ-1,k
Above lifting steps are implemented with following C code: void update1(int data[], const int N) { int half=N>>1; int j; float sqrt3 = sqrt(3); for(j=0;j<half; j++) { data[j]=data[j]+sqrt3*data[half+j]; } } void predict_daub(int data[],const int N) { int i; int half=N>>1; data[half]=data[half]-sqrt3*data[0]/4- (sqrt3-2)*data[half-1]/4; for(i=1; i<half; i++){ data[half+i]=data[half+i]-sqrt3*data[i]/4-(sqrt3-2)*data[i-1]/4; } } void update2(int data[], const int N) { int i; int half=N>>1; int j=half; for(i=0; i<half-1; i++) { data[i]=data[i]-data[half+1+i]; } data[half-1]=data[half-1]-data[half]; } void normalize(int data[], const int N) { int i; int half=N>>1; for(i=0; i<half; i++) { data[i]=(sqrt3-1)*data[i]/(sqrt(2)); data[i+half]=(sqrt3+1)*data[i+half]/(sqrt(2)); } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
122
Daubechies Inverse wavelet transform using lifting scheme is exactly reverse
process as described under:
• Undo normalize: λ-1,k ← 2
13 + λ-1,k and ϒ-1,k ←
2
13 − ϒ0,k
• Undo Update 2: λ-1,k ← λ-1,k + ϒ-1,k+1
• Undo Predict: ϒ-1,k ← f(2k+1) + 43 λ-1, k +
423 − λ-1, k-1
• Undo Update 1: λ-1,k ←λ-1,k - √3ϒ-1,k • Merge: λ0,k ← λ-1,k + ϒ-1,k
Daubechies inverse wavelet transform can be implemented with following C
code:
void undo_update1(int data[],int N) { int half=N>>1; int j; for(j=0;j<half; j++) { data[j]=data[j]-sqrt(3)*data[half+j]; } } void undo_predict_daub(int data[],const int N) { int i; int half=N>>1; data[half]=data[half]+sqrt3*data[0]/4+(sqrt3-2)*data[half-1]/4; for(i=1; i<half; i++) { data[half+i]=data[half+i]+sqrt3*data[i]/4+(sqrt3-2)*data[i-1]/4; } } void undo_update2(int data[], int N) { int i; int half=N>>1; int j=half; for(i=0; i<half-1; i++) { data[i]=data[i]+data[half+1+i]; } data[half-1]=data[half-1]+data[half]; }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
123
void undo_normalize(int data[], int N) { int i; int half=N>>1; for(i=0; i<half; i++) { data[i]=(sqrt3+1)*data[i]/(sqrt(2)); data[i+half]=(sqrt3-1)*data[i+half]/(sqrt(2)); } } Filter bank implementation of Daubechies wavelets transforms: In the filter bank implementation, speech samples f(k) are applied at the input of
Daubechies wavelet filter bank. The input speech samples are convolved with
low-pass and high-pass analysis filters h and g. The output of each filter is down
sampled by two, yielding the transformed signals cA0 and cD0. Down sampling is
performed to keep total number of transformed coefficients same as that of input
samples.
The coarse information cA0 is further applied at the input of analysis filter bank
and decomposed into cA1 and cD1. This process is repeated “N” times to achieve
“N level” decomposition.
Equation 4.1 and 4.2 describes practical implementation of Daubechies wavelet
transform. This implementation is fast compare to pyramidal algorithm in which
wavelet filter coefficients are arranged in form of matrix and matrix is multiplied
with input samples. In pyramidal algorithm, half of the computation is waste
because of down sampling.
∑−
=
+=1
0
)2()()(N
mj kmfmhkcA 4.1
∑−
=
+=1
0
)2()()(N
mj kmfmgkcD 4.2
Where, cAj(k) =Coarse coefficeints at level j
cDj(k)= Detail coefficients at level j
h(m) = Low pass filter coefficients
g(m) = High pass filter coefficients
N = Number of filter coefficients
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
124
C code for the Daubechies wavelet transform is given here:
for(k=0; k<(length_data>>1); k++) { cA[k]=0.0;
cD[k]=0.0; for(m=0;m<length_filt; m++) {
cA[k]+=h[m]*data[(m+2*k)%length_data]; cD[k]+=g[m]*data[(m+2*k)%length_data]; }
} for(int i=0; i<(length_data>>1); i++) {
data[i]=cA[i]; data[(length_data>>1)+i]=cD[i];
} As described in section 2.3.7, Daubechies Inverse wavelet transform gives
perfect reconstruction because synthesis filters are mirror image of analysis
filters. In the inverse wavelet transform, the coefficients sequences “cA” and “cD”
are applied at the input of synthesis filter bank after up sampling.
C code for the Daubechies inverse wavelet transforms:
for(m=0; m<(length_data>>1); m++) {
cA[m]=data[m]; cD[m]=data[length_data/2+m];
} for(k=0;k<length_data;k+=2) { buff[k]=0.0; data[k]=0.0; for(m=0;m<length_filt; m+=2) { buff[k]+=Ih[m]*cA[((k-m)/2+half_length)%half_length]; data[k]+=Ig[m]*cD[((k-m)/2+half_length)%half_length]; } }
for(k=1;k<length_data; k+=2) { buff[k]=0.0; data[k]=0.0;
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
125
for(m=1;m<length_filt; m+=2) { buff[k]+=Ih[m]*cA[((k-m)/2+half_length)%half_length]; data[k]+=Ig[m]*cD[((k-m)/2+half_length)%half_length]; } } for(k=0;k<length_data; k++) data[k]+=buff[k];
The list of filter Daubechies filter coefficients used in this thesis is given in
Appendix D.
4.3.5 Encoding We get number of zeros and small wavelet coefficients after applying wavelet
transform on the speech signal. If we replace small wavelet coefficients with
zeros by applying thresholding, then number of zeros will increase. If we store
the wavelet coefficients after thresholding then we will not get any compression
because storage of zeros will also require memory space. Advantage of applying
wavelet transform for the speech compression becomes evident if we apply
encoding to take advantage of series of zeros present in the sequence. If we
apply encoding directly on the speech samples, we will not get much
compression. Wavelet transform gives us sparse organized data with sequence
of zeros, hence encoding after the wavelet transform gives good compression.
Hence some encoding methods are necessary to take advantage of wavelet
transform for the speech compression.
Encoder allocates less number of bits to the zeros. To encode the wavelet-
transformed speech, we have used different methods like run length encoding,
Huffman coding and LZW coding. Run length encoding is very attractive from the
point of view of real time speech compression. Huffman coding72 is a popular
loss-less coding technique that compresses a signal by mapping the signal
symbols to variable length binary codes. LZW coding32 is dictionary based
encoding system that gives good compression ratio. All the results presented in
this study are based on LZW encoding after wavelet transform.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
126
4.4 Software development in C and Visual C++ Main programs performs following tasks:
• Read speech file. • Write speech file. • Reading and writing bit files. • Apply forward wavelet transform using different wavelets. • Encode wavelet coefficients. • Decode wavelet coefficients. • Perform inverse wavelet transform using different wavelets to reconstruct
speech file. • Statistical comparison of two speech files. • Recording and playing speech files (In Visual C++)
4.4.1 Functions used in the C program
Various functions used to perform above tasks in this software are explained
here:
Name: split Declaration: void split(int[],int) Description: This function splits input frame of 256 bytes into odd and even frames.
Splitting of the given array can be done with following codes
for(i=0; i<n; i++) { even_data[i]=data[2*i]; odd_data[i]=data[2*i+1]; }
The problem with this code is that it requires separate array for the storage of
even and odd data. The better method is to split data array in such a way that no
duplicate array is required. For example if there are total N number of data split
function should arrange first N/2 data as odd samples and rest N/2 data as even
samples.
Let us consider sample array data[8];
0 1 2 3 4 5 6 7 1st Iteration 0 2 1 4 3 6 5 7 2nd Iteration 0 2 4 1 6 3 5 7 3rd Iteration 0 2 4 6 1 3 5 7
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
127
This can be done with help of following C code:
void split(int data[],const int N) { int start=1; int end = N-1; while(start<end) { int i; int tmp; for(i=start; i<end; i+=2) { tmp=data[i]; data[i]=data[i+1]; data[i+1]=tmp; } start=start+1; end=end-1; }
Name: predict Declaration: void predict_cdf(int data[],const int N) Description: This function is used to predict odd samples from the set of even samples. Then
predicted value is subtracted from the original odd sample values in order to
remove redundancies. This function is applied on the data array after split()
function.
void predict_cdf(int data[],const int N) {
int i; int half=N>>1; int j=0; for(i=half; i<N-1; i++) { data[i]=data[i]-((data[j]+data[j+1])>>1); j++; }
} Name: update_cdf Declaration: void update_cdf(int data[], const int N) Description: This function update even samples from the predicted set of odd samples in
order to maintain overall average value of the signal. This function update (lift)
values of even samples (coarse signal) from the detail samples (original value -
predicted value). If our prediction is correct and if we are getting all details equal
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
128
to zero then this function does not perform any job. If prediction is wrong then
even samples are modified to maintain average of the signal. void update_cdf(int data[], const int N) {
int half=N>>1; int j=half; for(int i=1; i<half; i++) { data[i]=data[i]+((data[j]+data[j+1])>>2); j++; }
} Name: undo_update_cdf Declaration: void undo_update_cdf(int[], const int) Description: Undo update is the reverse process of update, which is used for inverse wavelet
transform. By using undo_update function, even samples are recovered back by
subtracting update information. Sign of mathematical operation is reversed to
obtain undo_update.
void undo_update_cdf(int data[], const int N) { int half=N>>1; int j=half; int i; for(i=1; i<half; i++) {
data[i]=data[i]-((data[j]+data[j+1])>>2); j++;
} }
Name: undo_predict_cdf Declaration: void undo_predict_cdf(int [], const int) Description: Undo predict is reverse process of predict, which is used for inverse wavelet
transform. This function adds loss of information to the predicted value. Odd
samples can be recovered by using even samples and detail values (loss of
information). Sign of mathematical operation is reversed in predict function to
obtain undo_predict.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
129
void undo_predict_cdf(int data[], const int N) { int i; int half=N>>1; int j=0; for(i=half; i<N-1; i++) {
data[i]=data[i]+((data[j]+data[j+1])>1); j++;
} }
Name: merge Declaration: void merge(int [], const int) Description:. This function is used to combine odd and even samples to obtain reconstructed
original data. This is reverse process of split function. This function passes data
array and length of data array as arguments. First half of the data array is even
samples and next half is odd samples. Both are merged using following code.
void merge(int data[],const int N) { int half=N>>1; int start=half-1; int end = half; int tmp,i; while(start>0) { for(i=start; i<end; i+=2) { tmp=data[i]; data[i]=data[i+1]; data[i+1]=tmp; } start=start-1; end=end+1; } }
Let us consider sample array data[8], N=8, half=4 0 2 4 6 1 3 5 7 First iteration: 0 2 4 1 6 3 5 7 (start=3 end=4) Second iteration: 0 2 1 4 3 6 5 7 (start=2 end=5) Third iteration: 0 1 2 3 4 5 6 7 (start=1 end=6)
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
130
Name: CdfTransform() Declaration: void CdfTransform(int [],int) Description: This function performs wavelet transform on input data samples. This function
calls three basic functions of the wavelet transform using lifting scheme: split,
predict and update function. The level of transformation is N=log2n (Where n is
length of the data). It arranges wavelet coefficients in the sequence cAN, cDN,
cDN-1, cDN-2, … . cD0. In the first iteration, it decomposes input data sequence into
coarser sequence cA0 and detail sequence cD0. In the second iteration, it further
decomposes cA0 into cA1 and cD1 and so on up to cAN and cDN. The function is
written such that it does not require separate arrays to store these sequences. In-
place calculations allow us to place all the values in the single array.
void CdfTransform(int data[],int n) { int i; for(i=n; i>1; i=i>>1) { split(data,i); predict(data,i); update(data,i); } }
Name: inv_CdfTransform Declaration: void inv_CdfTransform(int [], int) Description: This function performs inverse wavelet transform to reconstruct speech data from
wavelet coefficients. This function calls three basic functions of the inverse
wavelet transform with lifting scheme: undo update, undo predict and merge.
void inv_CdfTransform(int data[], int n) { int i; for(i=2; i<=n; i=i<<1) { undo_update(data,i); undo_predict(data,i); merge(data,i); } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
131
Name: max Declaration: int max(unsigned char[], int) Description: This function finds out maximum value of wavelet coefficient out of block (Block
may consists of 256, 512 or more samples depends on block length). This
function is useful for level dependant thresholding.
int max(unsigned char data[], int n) { int i; int maxvalue=0; for(i=0;i<n;i++) { if(maxvalue<data[i]) maxvalue=data[i]; } return maxvalue; }
Name: open_obfile Declaration: BFILE *open_obfile( char * ) Description: This function opens output bit file. This file allows us to write bit. Sample code is
given below: BFILE *open_obfile( char *name ) { BFILE *b1; b1 = (BFILE*) calloc(1, sizeof( BFILE)); if (b1==NULL) return(b1); b1->file = fopen(name, "wb"); b1->row = 0; b1->mask = 0x80; return(b1); }
Structure BFILE consists of member variables file pointer f1, unsigned char mask
and integer row. The variables mask and row are used to manage bit read write
operation.
Name: open_ibfile Declaration: BFILE *open_ibfile(char * ) Description: This function opens input bit file. This file allows us to read bit. This routine is
same as open_obfile except that the file is opened in read binary mode. Instead
of b1->file = fopen(name, "wb"); b1->file = fopen(name, "rb"); is used.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
132
Name: close_obfile Declaration: void close_obfile( BFILE * ) Description: This function closes output bit file.
void close_obfile(BFILE *b1) { if (b1->mask != 0x80) putc( b1->row, b1->file ); fclose( b1->file ); free((char*)b1 ); }
Name: close_ibfile Declaration: void close_ibfile( BFILE * ) Description: This function closes input bit file.
void close_ibfile(BFILE *b1 ) { fclose( b1->file ); free((char*)b1); }
Name: write_bit Declaration: void write_bit( BFILE *bit_file, int bit ) Description: Normal file functions available in C language allow us to write byte at a time but it
does not allow us to write single bit. This function allows us to write individual bit
in the file. The variable “mask” and “row” are defined to read and write bit. The
variable mask is initialised to 0x80. The mask is shifted to right after every bit
read/write operation. The value of mask becomes zero after total eight shift
operations and row is completed. This row is written into or read from the file
using standard C functions. Sample code for write_bit is given below:
void write_bit( BFILE *b1, int bit ) { if ( bit ) b1->row =b1->row | b1->mask; b1->mask >>= 1; if ( bit_file->mask == 0 ) { putc( b1->row, b1->f1 ); b1->row = 0; b1->mask = 0x80; } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
133
Name: write_bits Declaration: void write_bits( BFILE *bit_file,unsigned long code, int count ) Description: This function allows us to write more than one bit at a time in the specified bit file.
The number of bits is passed as an argument to the function. Local variable
“mask” is used to keep track of bits.
void write_bit(BFILE *b1, unsigned long code int bit_count) { unsigned long mask; mask=1; mask=mask<<(bit_count-1); while(mask!=0) { if ( mask & code ) b1->row =b1->row | b1->mask; b1->mask >>= 1; if ( bit_file->mask == 0 ) { putc( b1->row, b1->f1 ); b1->row = 0; b1->mask = 0x80; } mask=mask>>1; }
Name: read_bit Declaration: int read_bit( BFILE* bit_file ) Description: Normal file functions in C allow us to read byte at a time but it does not allow us
to read a single bit. This function allows us to read individual bit in the file.
int read_bit( BFILE* b1 ) { int value; if ( b1->mask == 0x80 )b1->row = getc( b1->file ); value = b1->row & b1->mask; b1->mask >>= 1; if ( b1->mask == 0 ) b1->mask = 0x80; return( value ? 1 : 0 ); }
Name: read_bits Declaration: unsigned long read_bits( BFILE *bit_file, int bit_count ) Description: This function allows us to read more than one bit at a time from the specified bit
file.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
134
unsigned long read_bits( BFILE *b1, int bit_count ) { unsigned long mask=1; unsigned long value=0; mask = mask << (bit_count-1); while ( mask != 0) { if (b1->mask == 0x80) b1->row = getc( b1->file ); if (b1->row & b1->mask) value = value| mask; mask >>= 1; b1->mask=b1->mask>>1; if (b1->mask == 0) b1->mask = 0x80; } return(value ); }
Name: count_bytes Declaration: void count_bytes( FILE *, unsigned long *); Description: To build the Huffman Tree, it is necessary to find out relative frequencies of each
wavelet coefficient. This function counts bytes in the specified file. The position of
the file input is saved and it is restored after counting.
void count_bytes(FILE *input, unsigned long *counts) { long position; int c; position = ftell( input ); while ( ( c = getc( input )) != EOF ) counts[ c ]++; fseek( input, position, SEEK_SET ); }
Name: scale_counts Declaration: void scale_counts( unsigned long *, NODE *); Description: This function finds out maximum count for any symbol in the file. All the count
values are scaled in such a fashion that it can fit in single unsigned char.
void scale_counts( unsigned long *counts, NODE *nodes ) {
unsigned long max_count; int i; max_count = 0; for ( i = 0 ; i < 256 ; i++ ) if ( counts[ i ] > max_count )
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
135
max_count = counts[ i ]; if ( max_count == 0 ) { counts[ 0 ] = 1; max_count = 1; } max_count = max_count / 255; max_count = max_count + 1; for (i = 0;i<256;i++) {
nodes[i].count = (unsigned int)(counts[i]/max_count); if ( nodes[i].count == 0 && counts[i]!=0) nodes[i].count = 1;
} nodes[ END_OF_STREAM ].count = 1;
}//End of scale_counts() NODE is the structure consists of integer variables “child0”, “child1” and associated counts. Name: build_tree Declaration: int build_tree( NODE* ); Description: This function builds Huffman tree by combining two free nodes with lowest weight
into new internal node with combine weight of the nodes. This is done by one
loop and loop exit when only one node is left.
int build_tree(NODE *nodes ) { int next_free; int i; int min_1; int min_2; nodes[ 513 ].count = 0xffff; for ( next_free = END_OF_STREAM + 1 ; ; next_free++ ) { min_1 = 513; min_2 = 513; for ( i = 0 ; i < next_free ; i++ ) if ( nodes[i].count != 0 )
{ if (nodes[i].count<nodes[ min_1 ].count)
{ min_2 = min_1; min_1 = i; }
else if (nodes[i].count<nodes[min_2].count) min_2 = i; }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
136
if ( min_2 == 513 )break; nodes[next_free].count=nodes[min_1].count+nodes[min_2].count; nodes[ min_1 ].count = 0; nodes[ min_2 ].count = 0; nodes[ next_free ].child_0 = min_1; nodes[ next_free ].child_1 = min_2; } next_free--; return( next_free ); } All nodes below 257 have count value set to their frequency. If it is non-zero
value then it is active node. Node 513 is used for comparison purpose. Initially its
value is set at 65535 that is maximum value. It is sure that all nodes have less
than this value.
Name: convert_tree_to_code Declaration: void convert_tree_to_code( NODE*,CODE*,unsigned int,int,int) Description: This function is used to assign code to each symbol value. Root node of the tree
is assigned zero and by travelling down to individual branch of the tree one or
zero is added to the code each time. Whenever we rich to leaf of desired symbol,
we store the code values for that leaf in the code array and return back to
previous node from where we start journey at the other side of the tree.
void convert_tree_to_code( NODE *nodes, CODE *codes, unsigned int code_so_far, int bits, int node ) { if ( node <= END_OF_STREAM ) { codes[ node ].code = code_so_far; codes[ node ].code_bits = bits; return; } code_so_far <<= 1; bits++; convert_tree_to_code( nodes, codes, code_so_far, bits, nodes[node].child_0 ); convert_tree_to_code( nodes, codes, code_so_far | 1, bits, nodes[ node ].child_1 ); }
Name: output_counts Declaration: void output_counts( BFILE*, NODE*); Description: This function is used to store the symbol counts in the compressed file so that
decoder can read it. All the sample values are not stored. Only present sample
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
137
values are stored. A count run consists of the value of the first symbol in the run
followed by the value of last symbol in the run and it is followed by count values
for all the symbols from first to last.
void output_counts(BFILE *output, NODE *nodes) { int first; int last; int next; int i; first = 0; while ( first < 255 && nodes[ first ].count == 0 ) first++; for ( ; first < 256 ; first = next ) { last = first + 1; for ( ; ; ) { for ( ; last < 256 ; last++ ) if ( nodes[ last ].count == 0 ) break; last--; for ( next = last + 1; next < 256 ; next++ ) if ( nodes[ next ].count != 0 ) break; if ( next > 255 ) break; if ( ( next - last ) > 3 ) break; last = next; } putc( first, output->file); putc( last, output->file); for (i = first ; i <= last ; i++ )
{ putc( nodes[ i ].count, output->file ) } } if ( putc( 0, output->file ) != 0 ) fatal_error( "Error writing byte counts\n" ); }
Name: input_counts Declaration: void input_counts( BFILE*, NODE*); Description: This function is used for decoding. This function is used to read count values
those stored during encoding.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
138
void input_counts(BFILE *input, NODE *nodes) { int first; int last; int i; int c; for ( i = 0 ; i < 256 ; i++ ) nodes[ i ].count = 0; first = getc( input->file ); last = getc( input->file ); for ( ; ; ) { for ( i = first ; i <= last ; i++ ) c = getc(input->file); nodes[ i ].count = (unsigned int) c; first = getc( input->file ); if ( first == 0 ) break; last = getc( input->file ); } nodes[ END_OF_STREAM ].count = 1; }
Name: compress_data Declaration: void compress_data( FILE*, BFILE*,CODE*); Description: This function is used to store Huffman code corresponds to wavelet coefficients
in the output file. This function uses Huffman tree and code table.
void compress_data(FILE *input, BFILE *output, CODE *codes) { int c; while((c=getc(input))!=EOF) write_bits( output,(unsigned long)codes[ c ].code, codes[c].code_bits ); write_bits(output,(unsigned long)codes[ END_OF_STREAM ].code, codes[END_OF_STREAM].code_bits ); } Name: expand_data Declaration: void expand_data( BFILE*, FILE *, NODE *,int); Description: This function reads Huffman code and decodes wavelet coefficient. The tree is
traversed starting at root node, reading a bit in, and taking either the child_0 or
child_1 path. Eventually, the tree winds down to leaf node and corresponding
symbol (wavelet coefficient) is output.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
139
void expand_data(BFILE* input,FILE* output,NODE* nodes, int root_node) { int node; for ( ; ; ) { node = root_node; do { if ( read_bit(input)) node = nodes[node].child_1; else node = nodes[node].child_0; } while (node > END_OF_STREAM); if ( node == END_OF_STREAM ) break; putc(node, output); } }
Name: print_ratios Declaration: void print_ratios( char*, char* ); Description: This function is used to print compression ratio. It takes ratio of original file size to
compressed file size and prints the ratio.
void print_ratios( char *input, char *output ) { long input_size; long output_size; int ratio; input_size = file_size( input ); if ( input_size == 0 ) input_size = 1; output_size = file_size( output ); ratio = input_size * 100L / output_size; printf( "\nInput bytes: %ld\n", input_size ); printf( "Output bytes: %ld\n", output_size ); if ( output_size == 0 ) output_size = 1; printf( "Compression ratio: %d%%\n", ratio ); }
Name: file_size Declaration: long file_size( char*); Description: This function finds out size of the specified file.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
140
long file_size( char *name ) { long eof_ftell; FILE *file; file = fopen( name, "r" ); if ( file == NULL ) return( 0L ); fseek( file, 0L, SEEK_END ); eof_ftell = ftell( file ); fclose( file ); return( eof_ftell ); }
Name: encode Declaration: void encode(char*,char*); Description: This function encodes wavelet coefficients using Huffman algorithm72. This
function calls other functions to calculate probabilities, built Huffman tree, convert
tree to code and finally allocates bits. This function assigns less number of bits to
frequently occurring symbols. For example, zeros in wavelet-transformed data. void encode(char *filein, char *fileout) { BFILE *output; FILE *input; unsigned long *counts; NODE *nodes; CODE *codes; int root_node; input = fopen( filein, "rb" ); if(input == NULL ) fatal_error( "Error in opening input file: %s\n",filein ); output = open_obfile( fileout ); if(output == NULL ) fatal_error( "Error opening output file %s\n", fileout ); counts = (unsigned long *) calloc(256, sizeof(unsigned long)); if(counts == NULL ) fatal_error( "Error allocating counts array\n" ); if((nodes = (NODE *) calloc( 514, sizeof( NODE )))== NULL ) fatal_error( "Error allocating nodes array\n" ); if((codes = (CODE *)calloc(257, sizeof( CODE )))== NULL ) fatal_error( "Error allocating codes array\n" ); count_bytes( input, counts ); scale_counts( counts, nodes ); output_counts( output, nodes ); root_node = build_tree( nodes ); convert_tree_to_code( nodes, codes, 0, 0, root_node ); compress_data( input, output, codes ); free( (char *) counts );
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
141
free( (char *) nodes ); free( (char *) codes ); close_obfile( output ); fclose( input ); print_ratios( filein, fileout); }
Name: decode Declaration: void decode(char*, char*); Description: This function decodes encoded wavelet coefficients. This function calls other
functions to read counts, built Huffman tree based on it and expand bit at a time.
Finally decoded wavelet coefficients are stored in the output file. void decode(char* filein, char* fileout) { FILE *output; BFILE *input; NODE *nodes; int root_node; input = open_ibfile( filein ); if ( input == NULL ) fatal_error( "Error opening input file %s\n", filein); output = fopen( fileout, "wb" ); if ( output == NULL ) fatal_error( "Error opening output file %s\n", fileout ); if ((nodes = (NODE *) calloc(514, sizeof( NODE)))== NULL) fatal_error( "Error allocating nodes array\n" ); input_counts(input, nodes ); root_node = build_tree( nodes ); expand_data(input, output, nodes, root_node ); free((char *) nodes ); close_ibfile( input ); fclose( output ); } Name: zero_encode Declaration: void zero_encode(char*); Description: This function encodes wavelet coefficients by run length encoding. This is the
simplest type of encoding. In this, sequence of zeros is replaced by count value
followed by zero. In the wavelet-transformed speech, there are so many zeros,
so this type of encoding is effective. void zero_encode(char filename[]) { FILE *f1,*f2; char n1,count=0; f1=fopen(filename,"rb");
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
142
f2=fopen("noutput","wb"); while(!feof(f1)) { n1=getc(f1); if(n1==0){count++;} else { if(count!=0) { putc(count,f2); putc(0x00,f2);count=0;} putc(n1,f2); } } if(count!=0) { putc(count,f2); putc(0x00,f2);} fclose(f1); fclose(f2); }
Name: zero_decode Declaration: void zero_decode(char*); Description: This function decodes wavelet coefficients from the encoded file. It reads values
sequentially. When it finds zero value, it reads previous value as a count and
substitutes zeros as per the count value.
void zero_decode(char filename[]) { FILE *f1,*f2; int n1,n2=0; int count=0; int i; f1=fopen(filename,"rb"); if(f1==NULL) printf("Error in opening %s file",filename); f2=fopen("newout","wb"); if(f2==NULL) printf("Can not create newout file"); while(!feof(f1)) { n1=getc(f1); if(n1==0) { count=n2; for(i=1;i<=count;i++) putc(0x00,f2); } else { if(n2!=0) putc((char)n2,f2); } if(feof(f1)) break; n2=getc(f1); if(n2==0) { count=n1; for(i=1;i<=count;i++) putc(0x00,f2); }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
143
else { if(n1!=0) putc((char)n1,f2); } } fclose(f1); fclose(f2); }
Name: zencode Declaration: int zencode(int[] ,int ) Description: It performs same job as zero_encode() but it encodes the wavelet coefficients
frame by frame.
int zero_encode(int data[], int size) { int n1,count=0; int i=0,j=0; while(i<size) { n1=data[i]; if(n1==0){count++;} else { if(count!=0) { ndata[j++]=count; ndata[j++]=0x00; count=0; } ndata[j++]=n1; } i++; } if(count!=0) { ndata[j++]=count; ndata[j++]=0x00;} for(i=0; i<j; i++) data[i]=ndata[i]; return(j); }
Name: zdecode Declaration: int zdecode(FILE *, int data, int size) Description: It performs the same job as zero_decode but it decodes wavelet coefficients
frame by frame from the file.
int zero_decode(FILE *input, int data[],int size) { int n1,n2=0; int count=0; int i;
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
144
int j=0; while(1) { if(feof(input)){data[j]=n2; break;} n1=getw(input); if(n1==0) { count=n2; for(i=1;i<=count;i++) {data[j]=0; j++; if(j==size) return(j);} } else { if(n2!=0) {data[j]=n2; j++; if(j==size){ fseek(input,-2,SEEK_CUR); return(j);}} } if(feof(input)) {data[j]=n1; break;} n2=getw(input); if(n2==0) { count=n1; for(i=1;i<=count;i++)
{data[j]=0; j++;if(j==size) return(j);} } else { if(n1!=0){data[j]=n1; j++; if(j==size){ fseek(input,-2,SEEK_CUR); return(j);}} } } return(j); } Name: lz_encode Declaration: void lz_encode(char*,char*); Description: This function encodes wavelet coefficients using Lempel and Ziv (LZ78)
encoding method72. void lz_encode(char *file1, char *file2) { BFILE *output; FILE *input; int character; int string_code; unsigned int index; input =fopen( file1, "rb" ); if(input == NULL ) fatal_error( "Error opening %s for input\n", file1 ); output = open_obfile(file2); if ( output == NULL ) fatal_error( "Error opening %s for output\n", file2 ); init_storage();
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
145
init_diction(); if ((string_code=getc(input))==EOF) string_code = END_OF_STREAM; while((character=getc(input))!=EOF) { index = find_child_node( string_code, character); if(DICT( index ).code_value !=-1) string_code = DICT( index ).code_value; else { DICT(index).code_value = next_code++; DICT(index).parent_code = string_code; DICT(index).character = (char) character; write_bits( output, (unsigned long) string_code, current_code_bits ); string_code = character; if(next_code>MAX_CODE ) { write_bits(output, (unsignedlong)FLUSH_CODE,current_code_bits ); init_diction(); } else if ( next_code > next_bump_code ) { write_bits(output, (unsignedlong)BUMP_CODE,current_code_bits); current_code_bits++; next_bump_code <<= 1; next_bump_code |= 1; } } close_obfile(output); fclose(input); } Name: lz_decode Declaration: void lz_decode( char *, char *) Description: This function decodes wavelet coefficients using Lempel and Ziv (LZ78)
decoding method72.
void lz_decode( char *file1, char *file2 ) { FILE *output; BFILE *input; unsigned int new_code; unsigned int old_code; int character; unsigned int count; input=open_ibfile( file1);
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
146
if ( input == NULL ) fatal_error( "Error opening %s for input\n", file1 ); output = fopen(file2 , "wb" ); if ( output == NULL ) fatal_error( "Error opening %s for output\n", file2 ); init_storage(); for ( ; ; ) { init_diction(); old_code =(unsigned int)read_bits(input, current_code_bits); if ( old_code == END_OF_STREAM ) return; character = old_code; putc( old_code, output ); for ( ; ; ) { new_code=(unsigned int)read_bits( input,current_code_bits); if (new_code==END_OF_STREAM ) return; if (new_code==FLUSH_CODE )break; if (new_code==BUMP_CODE ) { current_code_bits++; continue; } if ( new_code>=next_code ) { decode_stack[0]=(char) character; count = decode_string(1,old_code); } else count = decode_string(0,new_code); character = decode_stack[count-1]; while ( count > 0 ) putc( decode_stack[ --count ], output ); DICT( next_code ).parent_code = old_code; DICT( next_code ).character=(char)character; next_code++; old_code = new_code; } } close_ibfile( input ); fclose( output ); }
Name: read_data Declaration: void read_data(char*); Description: This function is used to read sample values from the file. This function is used for
the testing purpose.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
147
void read_data(char filename[]) { int m,i=0; int n; char c; FILE *f1; f1=fopen(filename,"rb"); if(f1==NULL) fatal_error( "Error in reading file %s\n",filename); while(!feof(f1)) { n=getc(f1); if(i%16==0) printf("\n%02x----->",i); printf("%02x ",n);i++; if(i%256==0) {if(getch()==0x1b) break;} } printf("\n End of reading\n"); fclose(f1); }
Name: write_data Declaration: void write_data(char*); Description: This function is used to write sample data (dummy data) in the file. This is used
to write sine wave data for testing purpose.
void write_data(char filename[]) { int n; long m; FILE *f1; f1=fopen(filename,"wb"); if(f1==NULL) fatal_error( "Error opening output file %s\n", filename); printf("\nEnter total no. of bytes :"); scanf("%ld",&m); while(m>=0) { n=(int)(100*sin(2*pi*p/100)); putc((char)n,f1); m--; } fclose(f1); }
Name: read_wave Declaration: void read_wave(char*); Description:
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
148
This function is used to read content of speech file (wav file). This function reads
header information such as length of file, samples per second, bytes per
samples, channels etc. and actual sample values stored in the file.
void read_wave() { int choice; int n; char filename[20]; FILE *fp; char data[17]; data[16]=’\0’; printf("Enter name of the file :"); scanf("%s",filename); fp = fopen(filename,"rb"); int i=fread(data,1,16,fp); if(i!=16){printf("Error!"); } printf("READING WAVE FILE:\n"); printf("header %c %c %c %c %ld %8s\n", *data,*(data+1),*(data+2),*(data+3),*(long*)(data+4),data+8); i=fread(data,1,16,fp); int type=*((int*)(data+4)); int channels=*((int*)(data+6)); long sample_rate=*((long*)(data+8)); long byte_per_sec= *((long*)(data+12)); printf("\n type= %d\n channels= %d\n sample rate=%ld", type,channels,sample_rate); printf("\n byte per second=%ld\n",byte_per_sec); i=fread(data,1,4,fp); int block_size=*((int*)(data)); int bits=*((int*)(data+2)); printf("Block length : %d \n",block_size); printf("No of bits per sample : %d\n",bits); fseek(fp,50,0); i=fread(data,1,8,fp); printf("%c%c%c%c\n",*(data),*(data+1),*(data+2),*(data+3)); long data_size = *((long*)(data+4)); printf("size of file %s is %ld\n",filename,data_size); printf("Press Enter to observe content of the file :\n"); if(getch()==0x0d) { i=0; fseek(fp,58,0); while(!feof(fp)) { if(i%16==0){printf("\n%02d --->",i);} n=getc(fp); printf("%02x ",n); i++; if(i%256==0){getch();} } } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
149
Name: compare Declaration: void compare(char*); Description: This function is used for statistical comparison between original speech file and
reconstructed speech file. This is used for objective speech quality measure. This
function find outs mean absolute deviation, root mean square error (RMSE) and
signal to noise ratio.
void compare(char filename1[]) { int i,r,s,diff; long n=0; long deviation=0; long sum_sqrs=0; long sqdiff,sqrs,sqrr,error=0; double mse,rmse,mean_dev; float SNR; char filename2[25],c,d; FILE *f1,*f2; f1=fopen(filename1,"rb"); check(filename1); printf("\n\nComprasion with file:"); scanf("%s",filename2); f2=fopen(filename2,"rb"); check(filename2); while(!feof(f1)) { s=getc(f1); //Signal r=getc(f2); //Reconstructed Signal diff=s-r; sqrs=pow(s,2); sqdiff= pow(diff,2); sum_sqrs=sum_sqrs+sqrs; error=error+(long)pow(diff,2); deviation=deviation+abs(diff); n++; } mse = (float)error/(n-1); rmse=sqrt(mse); if(error!=0) SNR=10*log(sum_sqrs/error)/log(10); else SNR=100; mean_dev=(double)deviation/n; printf("\n %d samples are different",j); printf("\n Total mean absolute deviation is = %f",mean_dev); printf("\n Total Mean square error MSE = %lf",mse); printf("\n Total RMSE %lf",rmse); printf("\n SNR : %f",SNR); fclose(f1); fclose(f2); }//End of comparision
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
150
4.4.2 Classes and its members for Visual C++ program All the functions explained in the section 4.4.1 are also used in the Visual C++
program as a member function of the class “CspeechDlg”. Classes and its
member functions used in Visual C++ software are explained here.
Name: CSpeechDlg
Member functions:
The CspeechDlg class provides a encapsulation of everything required for
speech compression using wavelet transform. This class consists of wide verities
of the functions to perform wavelet transform using Haar, Daubechies and
Cohen-Daubechies-Feauveau biorthogonal wavelets using filter bank and lifting
scheme. The list of member functions for the wavelet transform is given here.
The source code for these functions is discussed in section 4.3 and 4.4.1. The
same source code is used in Visual C++ software.
void CSpeechDlg::split(BYTE[], const int) void CSpeechDlg::predict(BYTE[], const int) void CSpeechDlg::update(BYTE[], const int) void CSpeechDlg::CdfTransform(BYTE[], const int) void CSpeechDlg::undo_update(BYTE[], const int) void CSpeechDlg::undo_predict(BYTE[],const int) void CSpeechDlg::merge(BYTE[], const int); void CSpeechDlg::inv_CdfTransform(BYTE[],const int) void CSpeechDlg::HaarTransform(BYTE[], const int) void CSpeechDlg::predict_Haar(BYTE[],const int) void CSpeechDlg::update_Haar(BYTE[],const int) void CSpeechDlg::undo_update_Haar(BYTE[],const int) void CSpeechDlg::undo_predict_Haar(BYTE[],int n) void CSpeechDlg::inv_HaarTransform(BYTE[], const int) void CSpeechDlg::transform_daub4(int [], int n)
void CSpeechDlg::inv_transform_daub4(int [], int n) void CSpeechDlg::normalize(int[],const int)
void CSpeechDlg::update_ int[],const int) void CSpeechDlg::predict_daub(int[],const int)
void CSpeechDlg::update_daub1(int[],const int) void CSpeechDlg::undo_normalize(int[],const int) void CSpeechDlg::undo_update_daub2(int[],const int) void CSpeechDlg::undo_predict_daub(int[],const int) void CSpeechDlg::undo_update_daub1(int[],const int) void CSpeechDlg::CubicTransform(int[],const int)
void CSpeechDlg::inv_CubicTransform(int[],const int) void CSpeechDlg::compare(char*, char*)
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
151
CSpeechDlg class consists of other member functions that are invoked by
clicking the command buttons.
void CSpeechDlg::OnPlay() void CSpeechDlg::OnRecord() void CSpeechDlg::OnFileFwt() void CSpeechDlg::OnFileIwt() void CSpeechDlg::OnCompress() void CSpeechDlg::OnExpand() void CSpeechDlg::OnSelectFile() void CSpeechDlg::OnAddNoise() void CSpeechDlg::OnDenoise() void CSpeechDlg::OnSelendokComboWavelet() void CSpeechDlg::OnSelendokComboBlockLength() void CSpeechDlg::OnRadioLifting() void CSpeechDlg::OnRadioFilter() void CSpeechDlg::OnClear()
Name: CWave The CWave class provides a simple encapsulation for the recording and
playback of waveform audio. This class is responsible for combining actions of
gettings samples from the speech file or input devices. It saves samples to the
file. It provides buffer for the temporary storage of samples. It consists of follwing
member functions.
Member functions: SaveSamples: This function is used to save samples coming from recording
device to standard wave file. Name of the file is passed as an argument. Header
information such as file size, channels, samples rate, bytes per second etc., is
stored first. Samples are obtained from buffer and stored in the file.
void CWave::SaveSamples(CString strFile) { ASSERT( m_buffer.GetNumSamples()>0); CFile *f; CFile myfile(strFile,CFile::modeCreate|CFile::modeWrite); f=&myfile; f->Write("RIFF", 4); DWORD dwFileSize= m_buffer.GetNumSamples()*m_pcmWaveFormat.nBlockAlign+36; f->Write(&dwFileSize, sizeof(dwFileSize)) ; f->Write("WAVEfmt ", 8) ;
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
152
DWORD dwFmtSize = 16L; f->Write(&dwFmtSize, sizeof(dwFmtSize)) ; f->Write(&m_pcmWaveFormat.wFormatTag, sizeof(m_pcmWaveFormat.wFormatTag)); f->Write(&m_pcmWaveFormat.nChannels, sizeof(m_pcmWaveFormat.nChannels)); f->Write(&m_pcmWaveFormat.nSamplesPerSec, sizeof(m_pcmWaveFormat.nSamplesPerSec)); f->Write(&m_pcmWaveFormat.nAvgBytesPerSec, sizeof(m_pcmWaveFormat.nAvgBytesPerSec)); f->Write(&m_pcmWaveFormat.nBlockAlign, sizeof(m_pcmWaveFormat.nBlockAlign)); f->Write(&m_pcmWaveFormat.wBitsPerSample, sizeof(m_pcmWaveFormat.wBitsPerSample)); f->Write("data", 4) ; DWORD dwNum = m_buffer.GetNumSamples() * m_pcmWaveFormat.nBlockAlign; f->Write(&dwNum, sizeof(dwNum)) ; f->Write(m_buffer.GetBuffer(), dwNum) ; myfile.Close(); f->Close(); } GetSamples: This function is used to read samples from the specified file. It
reads header information as well as actual sample values. void CWave::GetSamples(CString strFile) { char szTmp[10]; CFile *f; CFile myfile(strFile, CFile::modeRead); f=&myfile; WAVEFORMATEX pcmWaveFormat; ZeroMemory(szTmp, 10 * sizeof(char)); f->Read(szTmp, 4 * sizeof(char)) ; if (strncmp(szTmp, _T("RIFF"), 4) != 0) ::AfxThrowFileException(CFileException::invalidFile,-1, f->GetFileName()); DWORD dwFileSize = m_buffer.GetNumSamples()* m_pcmWaveFormat.nBlockAlign+36; f->Read(&dwFileSize, sizeof(dwFileSize)); ZeroMemory(szTmp, 10 * sizeof(char)); f->Read(szTmp, 8 * sizeof(char)) ; if (strncmp(szTmp, _T("WAVEfmt "), 8) != 0) ::AfxThrowFileException(CFileException::invalidFile, -1, f->GetFileName()); DWORD dwFmtSize f->Read(&dwFmtSize, sizeof(dwFmtSize)); f->Read(&pcmWaveFormat.wFormatTag, sizeof(pcmWaveFormat.wFormatTag));
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
153
f->Read(&pcmWaveFormat.nChannels, sizeof(pcmWaveFormat.nChannels)); f->Read(&pcmWaveFormat.nSamplesPerSec, sizeof(pcmWaveFormat.nSamplesPerSec)); f->Read(&pcmWaveFormat.nAvgBytesPerSec, sizeof(pcmWaveFormat.nAvgBytesPerSec)); f->Read(&pcmWaveFormat.nBlockAlign,
sizeof(pcmWaveFormat.nBlockAlign)); f->Read(&pcmWaveFormat.wBitsPerSample, sizeof(pcmWaveFormat.wBitsPerSample)); ZeroMemory(szTmp,10*sizeof(char)); f->Read(szTmp, 4*sizeof(char)) ; m_pcmWaveFormat = pcmWaveFormat; DWORD dwNum=dwFileSize-36; DWORD dwNum1; f->Read(&dwNum1, sizeof(dwNum)) ; m_buffer.SetNumSamples(dwNum / pcmWaveFormat.nBlockAlign, pcmWaveFormat.nBlockAlign); f->Read(m_buffer.GetBuffer(), dwNum) ; myfile.Close(); f->Close(); } SetBuffer: This function is used to copy data buffer to internal wave data buffer.
if flag bCopy is set, then it calls member function CopyBuffer of class
CWaveBuffer. . If bCopy is reset, then it calls member function SetBuffer of class
CWaveBuffer. void CWave::SetBuffer(void* pBuffer,DWORD dwNumSample,bool bCopy) { ASSERT(pBuffer); ASSERT(dwNumSample > 0); ASSERT(m_pcmWaveFormat.nBlockAlign > 0); if (bCopy) { m_buffer.CopyBuffer(pBuffer, dwNumSample, m_pcmWaveFormat.nBlockAlign); } else { m_buffer.SetBuffer(pBuffer, dwNumSample, m_pcmWaveFormat.nBlockAlign); } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
154
Following functions are used to get the speech buffer, number of samples and
length of buffer from the CWaveBuffer class.
void* CWave::GetBuffer() const { return m_buffer.GetBuffer(); } DWORD CWave::GetNumSamples() const { return m_buffer.GetNumSamples(); } DWORD CWave::GetBufferLength() const { return(m_buffer.GetNumSamples()*m_pcmWaveFormat.nBlockAlign); } Following function is used to build the format of the “wav” file to save recorded
speech. This information will be saved as header information.
void CWave::BuildFormat(WORD nChannels, DWORD nFrequency, WORD nBits) { m_pcmWaveFormat.wFormatTag = WAVE_FORMAT_PCM; m_pcmWaveFormat.nChannels = nChannels; m_pcmWaveFormat.nSamplesPerSec = nFrequency; m_pcmWaveFormat.nAvgBytesPerSec = nFrequency*nChannels*nBits/8; m_pcmWaveFormat.nBlockAlign = nChannels*nBits / 8; m_pcmWaveFormat.wBitsPerSample = nBits; m_buffer.SetNumSamples(0L, m_pcmWaveFormat.nBlockAlign); } Name: CWaveBuffer
The CWaveBuffer class handles intermediate buffer. It performs various tasks
like it copies the buffer, set the buffer, set number of samples in buffer, get
number of samples from existing buffer. To perform these tasks, following
member functions are used.
Member functions: void CWaveBuffer::CopyBuffer(void* pBuffer,DWORD dwNumSamples, int nSize) { ASSERT(dwNumSamples >= 0); ASSERT(nSize); if (!m_pBuffer) SetNumSamples(dwNumSamples, nSize);
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
155
if(__min(m_dwNum, dwNumSamples) * nSize > 0) { ZeroMemory(m_pBuffer, m_dwNum * m_nSampleSize); CopyMemory(m_pBuffer, pBuffer, __min(m_dwNum, dwNumSamples)*nSize); } } void CWaveBuffer::SetBuffer(void *pBuffer, DWORD dwNumSamples, int nSize) { ASSERT(dwNumSamples >= 0); ASSERT(nSize); delete[] m_pBuffer; m_pBuffer = pBuffer; m_dwNum = dwNumSamples; m_nSampleSize = nSize; } int CWaveBuffer::GetSampleSize() const { return m_nSampleSize; } DWORD CWaveBuffer::GetNumSamples() const { return m_dwNum; } void CWaveBuffer::SetNumSamples(DWORD dwNumSamples, int nSize) { ASSERT(dwNumSamples >= 0); ASSERT(nSize > 0); void* pBuffer = NULL; pBuffer = new char[nSize * dwNumSamples]; SetBuffer(pBuffer, dwNumSamples, nSize); } Name: CWaveDevice
The CWaveDevice class is used to manage Audio input/output device. It gets
device number, determines whether wave can be played during play and
determines whether wave can be recorded during recording.
Member Functions: GetDevice() function returns MS Windows device number. IsInputFormat()
checks whether wave can be recorded or not. IsOutputFormat() checks whether
wave can be played or not.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
156
bool CWaveDevice::IsInputFormat(const CWave& wave) { return (waveInOpen( NULL, GetDevice(), &wave.GetFormat(), NULL, NULL, WAVE_FORMAT_QUERY) == MMSYSERR_NOERROR); } bool CWaveDevice::IsOutputFormat(const CWave& wave) { return (waveOutOpen( NULL, GetDevice(), &wave.GetFormat(), NULL, NULL, WAVE_FORMAT_QUERY) == MMSYSERR_NOERROR); } CWaveDevice::CWaveDevice(const CWaveDevice ©) { m_nDevice = copy.GetDevice(); } inline UINT CWaveDevice::GetDevice() const { return m_nDevice; } Name: CWaveIn
The CWaveIn class provides an encapsulation of the functionality of input device
driver. It handles recording function. It sets wave format, sets device for the
recording and adds new buffers with help of various member functions.
Member functions:
Record() function used to record speech from the sound card(input device).
Pause() used to pause the recording process. Continue() to continue recording
process. Open() to open input device for recording. SetDevice() sets the device
for the recording. MakeWave() creates new wave buffer so as it can be used to
store samples of speech in the file.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
157
bool CWaveIn::Record(UINT nTaille) { ASSERT(nTaille > 0); ASSERT(m_hWaveIn); if ( !Stop() ) { return false; } m_bResetRequired = false; FreeListOfBuffer(); FreeListOfHeader(); SetWaveFormat( m_wave.GetFormat() ); m_nIndexWaveHdr = NUMWAVEINHDR - 1; m_nBufferSize = nTaille; for (int i = 0; i < NUMWAVEINHDR; i++) { if ( !AddNewHeader(m_hWaveIn) ) return false; } if ( IsError(waveInStart(m_hWaveIn)) ) return false; return true; } DWORD CWaveIn::GetNumSamples() { DWORD dwTotal = 0L; POSITION pos = m_listOfBuffer.GetHeadPosition(); while (pos){ CWaveBuffer* p_waveBuffer =(CWaveBuffer*) listOfBuffer.GetNext(pos); dwTotal += p_waveBuffer->GetNumSamples(); } return dwTotal; } CWave CWaveIn::MakeWave() { void* pBuffer=new char[GetNumSamples()* m_wave.GetFormat().nBlockAlign]; DWORD dwPosInBuffer = 0L; POSITION pos = m_listOfBuffer.GetHeadPosition(); while (pos) { CWaveBuffer* p_waveBuffer = (CWaveBuffer*) m_listOfBuffer.GetNext(pos); CopyMemory((char*)pBuffer+dwPosInBuffer, p_waveBuffer->GetBuffer(), p_waveBuffer->GetNumSamples() * p_waveBuffer->GetSampleSize()); dwPosInBuffer += p_waveBuffer->GetNumSamples() * p_waveBuffer->GetSampleSize(); }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
158
m_wave.SetBuffer( pBuffer, GetNumSamples() ); return m_wave; } bool CWaveIn::Continue() { if (m_hWaveIn) { return !IsError( waveInStart(m_hWaveIn) ); } return true; } bool CWaveIn::Open() { return !IsError( waveInOpen(&m_hWaveIn,m_waveDevice.GetDevice(), &m_wave.GetFormat(),(DWORD)waveInProc,NULL,CALLBACK_FUNCTION)); } bool CWaveIn::Pause() { if (m_hWaveIn) { return !IsError( waveInStop(m_hWaveIn) ); } return true; } Name: CWaveInterface
The CWaveInterface provides interface with input output device for playing and
recording speech files. Its member functions provide device name for input and
output. It also returns number of input and output devices.
Member functions:
GetWaveInDevice() returns input device name. GetWaveOutDevice() returns
output device. GetWaveInCount() returns number of input devices detected.
GetWaveOutCount() returns number of output devices detected.
CString CWaveInterface::GetWaveInName(UINT nIndex) { ASSERT(nIndex < GetWaveInCount()); WAVEINCAPS tagCaps; switch (waveInGetDevCaps(nIndex, &tagCaps, sizeof(tagCaps))) { case MMSYSERR_NOERROR: return tagCaps.szPname; break; default: return ""; } }
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
159
UINT CWaveInterface::GetWaveOutCount() { return waveOutGetNumDevs(); } UINT CWaveInterface::GetWaveInCount() { return waveInGetNumDevs(); } Name: CWaveOut
The CWaveOut class provides an encapsulation of the functionality of output
device driver. It handles play function. It sets wave format, sets device for the
play and adds new buffers by using various member functions.
Member functions: bool CWaveOut::Play(DWORD dwStart, DWORD dwEnd) { if ( !Stop() ) return false; m_dwStartPos=(dwStart==-1)? 0L :dwStart* m_wave.GetFormat().nBlockAlign; m_dwEndPos = (dwEnd == -1) ? m_wave.GetBufferLength() : __min(m_wave.GetBufferLength(),dwEnd)* m_wave.GetFormat().nBlockAlign; m_nIndexWaveHdr = NUMWAVEOUTHDR - 1; for (int i = 0; i < NUMWAVEOUTHDR; i++) { if (!AddNewHeader(m_hWaveOut)) return false; } return true; } bool CWaveOut::Stop() { if (m_hWaveOut != NULL) { m_dwStartPos = m_dwEndPos; if ( IsError(waveOutReset(m_hWaveOut)) ) { return false; } } return true; } bool CWaveOut::AddFullHeader(HWAVEOUT hwo, int nLoop) { if ( GetBufferLength() == 0) { return false; } m_nIndexWaveHdr=(m_nIndexWaveHdr==NUMWAVEOUTHDR-1)?
0 : m_nIndexWaveHdr + 1;
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
160
m_tagWaveHdr[m_nIndexWaveHdr].lpData = (char*)m_wave.GetBuffer()+m_dwStartPos; m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = m_dwEndPos-m_dwStartPos; m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = WHDR_BEGINLOOP | WHDR_ENDLOOP; m_tagWaveHdr[m_nIndexWaveHdr].dwLoops = nLoop; m_tagWaveHdr[m_nIndexWaveHdr].dwUser = (DWORD)(void*)this; if ( IsError(waveOutPrepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr],sizeof(WAVEHDR)))) { return false; } if ( IsError(waveOutWrite(hwo, &m_tagWaveHdr[m_nIndexWaveHdr],sizeof(WAVEHDR)))) { waveOutUnprepareHeader( hwo, &m_tagWaveHdr[m_nIndexWaveHdr],sizeof(WAVEHDR) ); m_tagWaveHdr[m_nIndexWaveHdr].lpData = NULL; m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = 0; m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0; m_tagWaveHdr[m_nIndexWaveHdr].dwUser = NULL; m_nIndexWaveHdr--; return false; } m_dwStartPos = m_dwEndPos - m_dwStartPos; return true; } bool CWaveOut::AddNewHeader(HWAVEOUT hwo) { if(GetBufferLength()==0) return false; m_nIndexWaveHdr=(m_nIndexWaveHdr==NUMWAVEOUTHDR-1)? 0 : m_nIndexWaveHdr + 1; m_tagWaveHdr[m_nIndexWaveHdr].lpData = (char*)m_wave.GetBuffer()+m_dwStartPos; m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = GetBufferLength(); m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0; m_tagWaveHdr[m_nIndexWaveHdr].dwUser = (DWORD)(void*)this; if ( IsError(waveOutPrepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR))) ) { return false; } if(IsError(waveOutWrite(hwo,&m_tagWaveHdr[m_nIndexWaveHdr], sizeof(WAVEHDR)))) { waveOutUnprepareHeader(hwo, &m_tagWaveHdr[m_nIndexWaveHdr],sizeof(WAVEHDR)); m_tagWaveHdr[m_nIndexWaveHdr].lpData = NULL;
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
161
m_tagWaveHdr[m_nIndexWaveHdr].dwBufferLength = 0; m_tagWaveHdr[m_nIndexWaveHdr].dwFlags = 0; m_tagWaveHdr[m_nIndexWaveHdr].dwUser = NULL; m_nIndexWaveHdr--; return false; } m_dwStartPos += GetBufferLength(); return true; } DWORD CWaveOut::GetBufferLength() { return __min(m_dwWaveOutBufferLength, m_dwEndPos-m_dwStartPos); } DWORD CWaveOut::GetPosition() { if (m_hWaveOut) { MMTIME mmt; mmt.wType = TIME_SAMPLES; if ( IsError(waveOutGetPosition(m_hWaveOut, &mmt, sizeof(MMTIME)))) { return -1; } else return mmt.u.sample; } return -1; } bool CWaveOut::IsPlaying() { bool bResult = false; if(m_nIndexWaveHdr>-1&& m_tagWaveHdr[m_nIndexWaveHdr].dwFlags!=0) { bResult|=!(m_tagWaveHdr[m_nIndexWaveHdr].dwFlags & WHDR_DONE==WHDR_DONE); } return bResult; } bool CWaveOut::ResetRequired(CWaveOut* pWaveOut) { return (pWaveOut->m_dwStartPos >= pWaveOut->m_dwEndPos); }
Name: CWaveformDlg The CWaveformDlg class provides display of waveforms. It is having member
function OnPaint() responsible for drawing the speech waveforms.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
162
Member function: void CWaveformDlg::OnPaint() { int Gain,Offset; CPaintDC dc(this); // device context for painting CBrush mybrush(RGB(255,255,0)); CRect rect(0,0,1000,500); dc.Rectangle(rect); dc.FillRect(rect,&mybrush); CPen mypen(PS_SOLID, 1, RGB(0,0,255)); int m_nMapMode=MM_TWIPS; dc.SetMapMode(m_nMapMode); CSpeechDlg *pWnd=(CSpeechDlg*)GetParent(); if(pWnd) { if(pWnd->cmd==1) { if(pWnd->BufferLength<10000) { m_nMapMode=MM_LOMETRIC; dc.SetMapMode(m_nMapMode); Gain=50-pWnd->m_Gain/2; Offset=-pWnd->m_Offset*2; } else { Gain=(100-pWnd->m_Gain)*2; Offset=-pWnd->m_Offset*20; } for(int i=0; i<pWnd->BufferLength; i++) { if(i==0) dc.MoveTo(0,0); dc.LineTo(i,Offset-Gain*(pWnd->sample[i])/5); } } } else MessageBox("No Valid parent pointer"); } 4.4.3 Test Set up Test set up is prepared in C and Visual C++ for the speech compression using
wavelet transform.
In C Programs command line arguments are used to specify file name and type
of operation.
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
163
For Example: daub <operation> <filename> accepts two arguments. The first
argument specifies type of operation i.e. forward wavelet transform, inverse
wavelet transform, reading a file, writing a file etc. The second argument
specifies name of the file on which operation is performed. If user does not
specify any argument it displays following usage message:
Following commands are used for different wavelets: Lifting scheme of wavelet transform: Wavelet Command Haar lifthaar <operation> <filename> CDF cdf <operation> <filename> Daubechies Liftdaub <operation> <filename> Cubic prediction Cubic <operation> <filename> Filter bank algorithm: Wavelet Command Haar Haar <operation> <filename> Daub-8 daub8 <operation> <filename> Daubechies series daub <operation> <filename>
Speech Compression using Wavelet Transform Daubechies wavelet using filter bank method Usage: daub <operation> <filename> Example: daub fwt data : It performs forward wavelet transform on "data" file Other Operations: iwt : Inverse wavelet transform read: Read data file write: Write data file compare : compare two data files plotdiff: Plot the difference between two data files
CHAPTER 4: IMPLEMENTATION OF SPEECH COMPRESSION USING WAVELET TRANSFORM
164
A test set up developed by us using Visual C++ Programming is given in Fig. 4.1 This test set up provides following facilities for the analysis.
• Select the Wavelet.
• Record and play speech files.
• Selection of speech files.
• Threshold selection.
• Block length selection.
• Threshold variation for lossy compression.
• Display of speech waveform.
• Displays number of zeros after wavelet transforms.
• Displays original file size, compressed file size and compression ratio.
• Display parameters like SNR, Mean Absolute Deviation, Mean square error
(MSE), RMSE for statistical comparison between original speech file and
reconstructed speech file.
The detailed user manual for Visual C++ based “Wavelet based speech
compression” software is given in the Appendix C.
Fig 4.1 Snap shot of the test setup for speech compression using wavelet transform