ee269 signal processing for machine learning - cepstrum

EE269Signal Processing for Machine Learning

Cepstrum

Instructor : Mert Pilanci

Stanford University

October 11 2021

Linear systems and additive noise

I Linear systems, e.g., filters, can easily separate additive noisefrom useful information when we know the frequency range ofthe noise and information

y[n] = x[n] + w[n]

I In vector notation

Hy = Hx+Hw

Multiplicative or convolutive noise

I This is harder if the signal and noise are convoluted, e.g., inspeech processing

y[n] = x[n] ∗ w[n]

I w[n] is the flowing air (noise source)

I h[n] is the vocal tract (filter)

We can develop an operator that can separate convolutedcomponents by transforming convolution into addition

Cepstrum

I Developed to separate convoluted signals

y[n] = x[n] ∗ w[n]

Discrete Fourier Domain:

Y [k] = X[k]W [k]

I Take logarithms

log[Y [k]] = logX[k] + logW [k]

I we can apply a linear filter to log Y [k] to separate

I equivalently we can take DFT of log Y [k] and process infrequency domain

cepstrum is the DFT (or DCT) of the log spectrum

Application: Mel-frequency spectrum

I perceptual scale of pitches

I 1 mels = 1000 Hz

I a formula to convert f hertz into m mels

m = 2595 log10

⇣1 + f

I weighted DFT magnitude

I mel-frequency spectrum MF [r] is defined as

MF [r] =X

|Vr[k]X[k]|2

I Vr[k] is the triangular weighting function for the rth filter.

I bandwidths are constant for center frequencies ¡ 1kHz and

then increase exponentially

I identical to convolutions with 22 filters

I weighted DFT magnitude

I mel-frequency spectrum MF [r] is defined as

MF [r] =X

|Vr[k]X[k]|2

I Vr[k] is the triangular weighting function for the rth filter.

I bandwidths are constant for center frequencies ¡ 1kHz and

then increase exponentially

I identical to convolutions with 22 filters

MF[r] =X

|Vr[k]X[k]|2

I Mel Frequency Cepstral Coe�cient (MFCC)

MFCC[m] =RX

log(MF[r]) cos

✓r +

�(1)

I i.e., inner-product with cosines MFCC[m] = hlogMF[r], cm[r]i

Application: Speaker Identification

I train a k-Nearest Neighbor classifier to classify frames

I AN4 dataset (CMU): 5 male and 5 female subjects speaking

words and numbers

I collect the training samples into frames of 30 ms with an

overlap of 75%

I calculate MFCC

I train a k-Nearest Neighbor classifier on the frames

I for a given test signal, predictions are made every frame

I most frequently occurring label is declared as the speaker

speaker 1 (blue) and speaker 2 (red) time domain signals

frame based MFCC features

I average accuracy is 92.93%

ee269 signal processing for machine learning - cepstrum

Documents

signal processing examples with c64x digital signal...

digital signal processing:digital signal...

sam signal processing examples statistical signal processing...

multirate signal processing multirate signal processing :...

international journal of signal processing systems vol. 2...

minimum-phase signal calculation using the real cepstrum ·...

signal representations:...

ieee signal processing society conference organizer...

sphinx iii signal processing front end...

homomorphic filtering and speech processing using cepstrum...

chapter 9 basic signal processing - cs.princeton.edu 9 basic...

wavelets and signal processing - ieee signal processing...

coreel.comcoreel.com/...radardigitalsignalprocessingsystem.pdf ·...

mapping signal processingmapping signal processing

biological signal & signal processing

signal processing for electronic nose, signal processing

introduction - parra lab · biomedical signal processing...

advanced digital signal processing part 5: multi-rate...

genomic signal processing - ieee signal processing magazine

higher - order gabor spectra a mathematical model...