TRANSCRIPT
ECE 420 – Embedded DSP Laboratory
Lecture 3 – Pitch Detection
Thomas Moon, February 10, 2020
• Lab 2 – Audio filtering
• Lab 3 – Domain transformation (Fourier transform); Spectrogram (STFT)
• Lab 4 (this week) – Autocorrelation; Pitch detection
Lab Summary So Far
Composite Algorithms
• Algorithms composed of one or more steps/pieces
• Might involve decision points (“control flow”) or combinations from different paths
• Allows for sophisticated algorithms but substantially increases system complexity
• A flow graph helps manage the system and convey its operation to others
(Flow graph: example composite algorithm with Steps A through E connected by branching paths.)
• Always implement components as separate functions/classes
• Sometimes these will be in different libraries anyway (e.g., Kiss FFT)
• Separation of concerns keeps the code focused, generally resulting in better coding style and code that is much easier to debug
• Creates code segments that are portable among projects and not tied to a specific implementation
Implementing Composite Algorithms
• Good practice to validate components of your code first
• Writing each step as a separate function facilitates good unit testing
• Increases the probability of success when putting everything together
• Types of testing: unit, integration, system/end-to-end
• These testing categories cover progressively larger aspects of the system, but at higher cost
A Note on Testing
• How do you know if something is working? Test signals!
• What test signals do you use? They might be provided for you, you can make them up, you can use a repository of test data, or you can acquire them
• Classes of test data: 'easy' vs. 'hard', 'noisy' vs. 'clean', corner cases
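One way to "make them up": synthesize frames with a known pitch, which gives you ground-truth data in both the 'clean' and 'noisy' classes. A minimal sketch; the helper name and all parameter values are illustrative, not part of the lab code:

```python
import numpy as np

def make_test_signal(f0, fs=8000, n=512, noise_std=0.0, seed=0):
    # Cosine at a known pitch f0; noise_std > 0 turns a 'clean'/'easy'
    # case into a 'noisy'/'hard' one. (Illustrative helper, not lab-provided.)
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    return np.cos(2 * np.pi * f0 * t) + noise_std * rng.standard_normal(n)

clean = make_test_signal(f0=100.0)                 # 'easy'/'clean' case
noisy = make_test_signal(f0=100.0, noise_std=0.5)  # 'hard'/'noisy' case
```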
• A key is to have 'high quality' data to establish 'ground truth'
• Nothing is worse than being misled about algorithm performance due to incorrect data
• A sufficiently large set also allows for quantitative measures of performance
• Simple and clean data can facilitate establishing basic algorithm functionality
• Always a good idea to at least try a non-ideal test signal as a sanity check
• Some algorithms can break down catastrophically when certain assumptions are not met or noise is introduced!
• A speech signal is an acoustic signal produced by the human speech production system.
• It is non-stationary: its frequency content changes over time.
Speech Signals
Source-filter Model

(Figure: source-filter model of speech production. An excitation generator (glottal airflow) drives a linear system h(t) (the vocal tract), producing the output speech signal. Excitation parameters: voiced/unvoiced, loudness, pitch, etc. Vocal tract parameters: resonance frequencies, filter response. In the frequency domain, the source spectrum, with harmonics spaced 1/T_0 apart, multiplied by the filter response, with one or more resonances, gives the output spectrum.)
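A minimal numerical sketch of the source-filter idea: an impulse train at the pitch period (the glottal source) convolved with a single decaying resonance standing in for the vocal-tract response h(t). Every constant here is an illustrative assumption, not a value from the lab.

```python
import numpy as np

fs = 8000                      # assumed sample rate
T0 = 100                       # pitch period in samples -> f0 = 80 Hz
n = np.arange(400)

# Source: voiced excitation modeled as an impulse train at the pitch period
excitation = np.zeros(n.size)
excitation[::T0] = 1.0

# Filter: one decaying resonance standing in for the vocal tract
f_res = 500.0                  # illustrative resonance frequency (Hz)
h = np.exp(-n / 40.0) * np.cos(2 * np.pi * f_res * n / fs)

# Output speech = source convolved with the vocal-tract response
speech = np.convolve(excitation, h)[:n.size]
```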
• This lab is a pitch detection implementation
• What exactly is pitch? Not actually strictly defined: the period of the signal in the time domain, or the fundamental frequency in the spectral domain.
Pitch
(Figure: periodic waveform with one period T_0 marked.)
• Operate on a single frame of audio input:
1. Initial decision: voiced or unvoiced (silent)?
2. If voiced, perform pitch analysis and report the pitch frequency.
Basic Pitch Detection Algorithm
(Flow graph: Frame → Voiced Decision → [Y] Pitch Analysis → f_0; [N] → "No Signal".)
1. Voiced sounds – generated by vibration of the vocal cords
2. Unvoiced sounds – speech generated without vibration of the vocal cords
3. Silence/noise – no active speech
• We will only be concerned with classifying and measuring pitch for Voiced frames.
Categorization of Frames
• Non-stationary speech signal: pitch changes over time (the frame size should be short enough)
• Frame size selection
• The frame is likely not an integer multiple of the period → spectral leakage in the DFT (the pitch does not fall into an exact DFT bin)
• Need at least one period or more
• A larger frame size means more latency and more pitch variation within the frame
• Non-ideal environment• Multiple concurrent signals• Noise
Challenges of Pitch Detection
• Voiced signals tend to be louder and more sustained
• Unvoiced signals & silence/noise have more abrupt characteristics
• Over the span of a frame, we would therefore expect voiced signals to have a higher energy than unvoiced/silent frames
• Calculate the energy of the frame, compare to a threshold for voiced decision (“Train” the threshold value)
Voiced Detection
(Flow graph: Frame → compute energy Σ_n |x[n]|² → greater than threshold E_th? [Y] Voiced; [N] Not Voiced.)
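The energy gate above fits in a few lines. A sketch; the threshold value is an assumption you would "train" on your own recordings, and the frames here are synthetic stand-ins:

```python
import numpy as np

def is_voiced(frame, energy_threshold):
    # Voiced decision: frame energy sum(|x[n]|^2) compared to a threshold
    return bool(np.sum(np.abs(frame) ** 2) > energy_threshold)

loud = 0.5 * np.ones(256)    # sustained, higher-energy frame
quiet = 0.01 * np.ones(256)  # near-silent frame
# With an illustrative trained threshold of 1.0, loud is voiced, quiet is not.
```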
Pitch Calculation by Autocorrelation
R_xx[l] = x[n] ⨂ x*[−n] = ( Σ_{n=0}^{N−1} x[n] x*[n−l] ) / ( Σ_{n=0}^{N−1} |x[n]|² )

Autocorrelation by circular convolution (normalized by the frame energy).

(Figure: the frame x[n] plotted against a copy circular-delayed by l samples; R_xx[l] peaks at l = T_0.)
• Provides an estimate of how self-similar a signal is given a particular delay (lag) 𝑙.
• When offset corresponds to period of signal, values coherently combine to yield a large correlation value.
• When is R_xx[l] maximized? → Always at R_xx[0]! (This can cause a wrong pitch estimate.)
• Given f_s = 8 kHz, R_xx has a peak at l = 100. What is the pitch frequency of this signal?
→ f_0 = f_s / l = 8000 / 100 = 80 Hz
→ Use this relation to set your frequency (lag) search range.
*Fundamental frequency: male 85-180 Hz, female 165-255 Hz
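The same relation f_0 = f_s / l converts the expected pitch band into a lag search range. A small sketch; the helper name is made up for illustration:

```python
fs = 8000  # sampling rate from the example above

def lag_range(fs, f_min, f_max):
    # f0 = fs / l, so the highest frequency maps to the smallest lag
    return int(fs / f_max), int(fs / f_min)

pitch_from_peak = fs / 100            # peak at l = 100 -> 80.0 Hz
male_lags = lag_range(fs, 85, 180)    # roughly lags 44..94 samples
female_lags = lag_range(fs, 165, 255) # roughly lags 31..48 samples
```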
Autocorrelation: Noise Reduction

Case 1: pure tone, x[n] = cos(2π f_0 n).
R_xx[l] = Σ_{n=0}^{N−1} cos(2π f_0 n) · cos(2π f_0 (n − l))
        = (1/2) Σ_{n=0}^{N−1} ( cos(2π f_0 l) + cos(2π f_0 (2n − l)) )
        = (N/2) cos(2π f_0 l)   (the oscillating second term sums to ≈ 0)

Case 2: pure noise, x[n] = v[n] with v[n] ~ N(0, σ²).
R_xx[l] = Σ_{n=0}^{N−1} v[n] · v[n − l]
        = Σ_{n=0}^{N−1} (v[n])² = Nσ²   if l = 0,
        ≈ 0                              if l ≠ 0.
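A quick numerical check of this property (frame length, period, and noise level are arbitrary choices): the noise only piles up at lag 0, so the autocorrelation of a noisy cosine still peaks at, or very near, the true period.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T0 = 512, 64                        # frame length and true period (arbitrary)
n = np.arange(N)
x = np.cos(2 * np.pi * n / T0) + 0.3 * rng.standard_normal(N)

# Circular autocorrelation via the FFT, normalized by the frame energy
X = np.fft.fft(x)
r = np.real(np.fft.ifft(X * np.conj(X))) / np.sum(np.abs(x) ** 2)

# Search lags around one period, skipping the trivial peak at lag 0
lag = T0 // 2 + int(np.argmax(r[T0 // 2 : T0 + T0 // 2]))
```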
Basic Pitch Detection Algorithm
(Flow graph: Frame → Voiced Decision → [Y] Autocorrelation → Peak Detection → Estimated Pitch; [N] Not Voiced.)
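The flow graph can be put together in one short function. A sketch under stated assumptions: the energy threshold and the pitch band defaults are illustrative, and the function name is made up for this example.

```python
import numpy as np

def detect_pitch(frame, fs, energy_threshold, f_min=85.0, f_max=255.0):
    # 1. Voiced decision: energy gate
    if np.sum(np.abs(frame) ** 2) <= energy_threshold:
        return None                          # "Not Voiced"
    # 2. Autocorrelation via the FFT (circular)
    X = np.fft.fft(frame)
    r = np.real(np.fft.ifft(X * np.conj(X)))
    # 3. Peak detection: search only lags allowed by the pitch band,
    #    which skips lag 0 and most spurious harmonic peaks
    l_min, l_max = int(fs / f_max), int(fs / f_min)
    lag = l_min + int(np.argmax(r[l_min : l_max + 1]))
    # 4. Estimated pitch in Hz
    return fs / lag

fs = 8000
n = np.arange(480)                           # 6 exact periods of an 80-sample tone
frame = np.cos(2 * np.pi * 100.0 * n / fs)   # 100 Hz test tone
```

Note how the band-limited lag search doubles as the fix for the "R_xx[0] is always largest" problem.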
• "Big O" notation describes how run-time grows as input size grows: O(N), O(N²), O(log N), O(1), O(2^N)
• Key property of complexity analysis: the highest-order term dominates
• O(N + N²) = O(N²)
• O(3N²) = O(N²)
• Always a good idea to know the complexity of the algorithms you are using: know what you are getting yourself into!
• For a complicated composite algorithm, each step might have a different complexity
• Knowing the complexity tells you where the dominant computation step is: optimize there!
Algorithmic Complexity
Autocorrelation: Computation

Direct computation:
R_xx[l] = x[n] ⨂ x*[−n] = ( Σ_{n=0}^{N−1} x[n] x*[n−l] ) / ( Σ_{n=0}^{N−1} |x[n]|² )
→ O(N²) operation

DFT properties:
• Circular convolution: x[n] ⨂ y[n] ⟺ X[k] Y[k]
• Conjugate in time: x*[n] ⟺ X*[N−k]
• Time/frequency reversal: x[N−n] ⟺ X[N−k]
• Combining them: x[n] ⨂ x*[−n] ⟺ X[k] X*[k]

R_xx[l] = x[n] ⨂ x*[−n] = IFFT{ X[k] X*[k] }
→ O(N log N) operation

O(N²) vs O(N log N)
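The equivalence of the two routes is easy to verify numerically. A sketch comparing the O(N²) direct sum with the O(N log N) FFT route (unnormalized here, i.e. without dividing by the frame energy):

```python
import numpy as np

def autocorr_direct(x):
    # O(N^2): R[l] = sum_n x[n] * conj(x[n - l]), indices taken circularly
    N = len(x)
    return np.array([np.sum(x * np.conj(np.roll(x, l))) for l in range(N)])

def autocorr_fft(x):
    # O(N log N): R = IFFT{ X[k] X*[k] } by the circular-convolution property
    X = np.fft.fft(x)
    return np.fft.ifft(X * np.conj(X))

x = np.random.default_rng(1).standard_normal(256)
# The two agree up to floating-point rounding.
```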
• Pitch estimation searches for a peak value in the autocorrelation function
• Two main challenges:
• Lag 0 will generally have the largest overall value, but it is certainly not the value we are looking for
• Multiples of the pitch period (higher-order harmonics) will also yield local peaks but must be rejected
Categories of Pitch Analysis
Difference function (YIN):
d(τ) = Σ_{n=1}^{W} ( x[n] − x[n + τ] )²
d(T_0) = Σ_{n=1}^{W} ( x[n] − x[n + T_0] )² = 0

Autocorrelation: R_xx[l] = IFFT{ X[k] X*[k] } = IFFT{ |X[k]|² }
Cepstrum: C[l] = IFFT{ log |X[k]| }
• The log operation equalizes strong components and raises low-level energy regions. Thus, the cepstrum is robust to strong formants but becomes sensitive to noise.
• The source (pitch) and the filter (formants) are additive in the logarithm, and thus separate clearly.
• YIN's algorithm enhances accuracy with normalization, thresholding, and interpolation techniques.
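A sketch of just the squared-difference function d(τ), without the normalization, thresholding, and interpolation refinements YIN adds on top; the window length and test signal are arbitrary choices:

```python
import numpy as np

def yin_difference(x, tau_max, W):
    # d(tau) = sum_{n=1}^{W} (x[n] - x[n + tau])^2 for tau = 1..tau_max
    return np.array([np.sum((x[1 : W + 1] - x[1 + tau : 1 + tau + W]) ** 2)
                     for tau in range(1, tau_max + 1)])

T0 = 50                                   # true period in samples (arbitrary)
x = np.sin(2 * np.pi * np.arange(400) / T0)
d = yin_difference(x, tau_max=60, W=200)
# d(T0) = 0 for a perfectly periodic signal, so the minimum marks the period
```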
• Lead-in to the Final Project: forming groups for both the Assigned Lab and the Final Project
• Explore a DSP algorithm from the literature: implementation in Python for this stage, NOT on the tablet yet
• Proposal for the Assigned Lab:
1. Overview of the proposed algorithm, citing source(s)
2. Plan for testing and validation
3. Rough idea(s) for the Final Project application
→ Due March 9
• Assigned Lab Report submission at the end of the project
• Demo incorporated into the Final Proposal design review
Assigned Lab
• Lab 3: Real-time spectral analyzer quiz/demo
• Lab 4: Pitch Analyzer
• Be thinking about Assigned Project Labs / Groups
This week…