TRANSCRIPT
ECE 420 – Embedded DSP Laboratory
Lecture 3 – Pitch Detection
Thomas Moon, February 10, 2020
• Lab 2 – Audio filtering
• Lab 3 – Domain transformation (Fourier transform); Spectrogram (STFT)
• Lab 4 (this week) – Autocorrelation; Pitch detection
Lab Summary So Far
Composite Algorithms
• Algorithms composed of one or more steps/pieces
• Might involve decision points (“control flow”) or combinations from different paths
• Allows for sophisticated algorithms but substantially increases system complexity
• A flow graph helps manage the system and convey its operation to others
(Flow graph: example composite algorithm with Steps A through E connected by branching paths.)
• Always implement components as separate functions/classes
• Sometimes these will be in different libraries anyway (e.g., Kiss FFT)
• Separation of concerns keeps the code focused, generally resulting in better coding style and code that is much easier to debug
• Creates code segments that are portable among projects and not tied to a specific implementation
Implementing Composite Algorithms
• Good practice to validate components of your code first
• Writing each step as a separate function facilitates good unit testing
• Increases the probability of success when putting everything together
• Types of testing: unit, integration, system/end-to-end
• These testing categories cover progressively larger aspects of the system, but at higher cost
A Note on Testing
• How do you know if something is working? Test signals!
• What test signals do you use? They might be provided for you, you can make them up, you can use a repository of test data, or you can acquire them
• Classes of test data: 'easy' vs. 'hard', 'noisy' vs. 'clean', corner cases
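One way to "make them up": synthesize frames with a known pitch, which gives you ground-truth data in both the 'clean' and 'noisy' classes. A minimal sketch; the helper name and all parameter values are illustrative, not part of the lab code:

```python
import numpy as np

def make_test_signal(f0, fs=8000, n=512, noise_std=0.0, seed=0):
    # Cosine at a known pitch f0; noise_std > 0 turns a 'clean'/'easy'
    # case into a 'noisy'/'hard' one. (Illustrative helper, not lab-provided.)
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    return np.cos(2 * np.pi * f0 * t) + noise_std * rng.standard_normal(n)

clean = make_test_signal(f0=100.0)                 # 'easy'/'clean' case
noisy = make_test_signal(f0=100.0, noise_std=0.5)  # 'hard'/'noisy' case
```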
• A key is to have 'high quality' data to establish 'ground truth'
• Nothing is worse than being misled about algorithm performance due to incorrect data
• A sufficiently large set also allows for quantitative measures of performance
• Simple and clean data can facilitate establishing basic algorithm functionality
• Always a good idea to at least try a non-ideal test signal as a sanity check
• Some algorithms can break down catastrophically when certain assumptions are not met or noise is introduced!
• A speech signal is an acoustic signal produced by the human speech production system.
• It is non-stationary: its frequency content changes over time.
Speech Signals
Source-filter Model

(Figure: source-filter model of speech production. An excitation generator (glottal airflow) drives a linear system h(t) (the vocal tract), producing the output speech signal. Excitation parameters: voiced/unvoiced, loudness, pitch, etc. Vocal tract parameters: resonance frequencies, filter response. In the frequency domain, the source spectrum, with harmonics spaced 1/T_0 apart, multiplied by the filter response, with one or more resonances, gives the output spectrum.)
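A minimal numerical sketch of the source-filter idea: an impulse train at the pitch period (the glottal source) convolved with a single decaying resonance standing in for the vocal-tract response h(t). Every constant here is an illustrative assumption, not a value from the lab.

```python
import numpy as np

fs = 8000                      # assumed sample rate
T0 = 100                       # pitch period in samples -> f0 = 80 Hz
n = np.arange(400)

# Source: voiced excitation modeled as an impulse train at the pitch period
excitation = np.zeros(n.size)
excitation[::T0] = 1.0

# Filter: one decaying resonance standing in for the vocal tract
f_res = 500.0                  # illustrative resonance frequency (Hz)
h = np.exp(-n / 40.0) * np.cos(2 * np.pi * f_res * n / fs)

# Output speech = source convolved with the vocal-tract response
speech = np.convolve(excitation, h)[:n.size]
```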
• This lab is a pitch detection implementation
• What exactly is pitch? Not actually strictly defined: the period of the signal in the time domain, or the fundamental frequency in the spectral domain.
Pitch
(Figure: periodic waveform with one period T_0 marked.)
• Operate on a single frame of audio input:
1. Initial decision: voiced or unvoiced (silent)?
2. If voiced, perform pitch analysis and report the pitch frequency.
Basic Pitch Detection Algorithm
(Flow graph: Frame → Voiced Decision → [Y] Pitch Analysis → f_0; [N] → "No Signal".)
1. Voiced sounds – generated by vibration of the vocal cords
2. Unvoiced sounds – speech generated without vibration of the vocal cords
3. Silence/noise – no active speech
• We will only be concerned with classifying and measuring pitch for Voiced frames.
Categorization of Frames
• Non-stationary speech signal: pitch changes over time (the frame size should be short enough)
• Frame size selection
• The frame is likely not an integer multiple of the period → spectral leakage in the DFT (the pitch does not fall into an exact DFT bin)
• Need at least one period or more
• A larger frame size means more latency and more pitch variation within the frame
• Non-ideal environment• Multiple concurrent signals• Noise
Challenges of Pitch Detection
• Voiced signals tend to be louder and more sustained
• Unvoiced signals & silence/noise have more abrupt characteristics
• Over the span of a frame, we would therefore expect voiced signals to have a higher energy than unvoiced/silent frames
• Calculate the energy of the frame, compare to a threshold for voiced decision (“Train” the threshold value)
Voiced Detection
(Flow graph: Frame → compute energy Σ_n |x[n]|² → greater than threshold E_th? [Y] Voiced; [N] Not Voiced.)
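The energy gate above fits in a few lines. A sketch; the threshold value is an assumption you would "train" on your own recordings, and the frames here are synthetic stand-ins:

```python
import numpy as np

def is_voiced(frame, energy_threshold):
    # Voiced decision: frame energy sum(|x[n]|^2) compared to a threshold
    return bool(np.sum(np.abs(frame) ** 2) > energy_threshold)

loud = 0.5 * np.ones(256)    # sustained, higher-energy frame
quiet = 0.01 * np.ones(256)  # near-silent frame
# With an illustrative trained threshold of 1.0, loud is voiced, quiet is not.
```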
Pitch Calculation by Autocorrelation
R_xx[l] = x[n] ⨂ x*[−n] = ( Σ_{n=0}^{N−1} x[n] x*[n−l] ) / ( Σ_{n=0}^{N−1} |x[n]|² )

Autocorrelation by circular convolution (normalized by the frame energy).

(Figure: the frame x[n] plotted against a copy circular-delayed by l samples; R_xx[l] peaks at l = T_0.)
• Provides an estimate of how self-similar a signal is given a particular delay (lag) 𝑙.
• When offset corresponds to period of signal, values coherently combine to yield a large correlation value.
• When is R_xx[l] maximized? → Always at R_xx[0]! (This can cause a wrong pitch estimate.)
• Given f_s = 8 kHz, R_xx has a peak at l = 100. What is the pitch frequency of this signal?
→ f_0 = f_s / l = 8000 / 100 = 80 Hz
→ Use this relation to set your frequency (lag) search range.
*Fundamental frequency: male 85-180 Hz, female 165-255 Hz
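The same relation f_0 = f_s / l converts the expected pitch band into a lag search range. A small sketch; the helper name is made up for illustration:

```python
fs = 8000  # sampling rate from the example above

def lag_range(fs, f_min, f_max):
    # f0 = fs / l, so the highest frequency maps to the smallest lag
    return int(fs / f_max), int(fs / f_min)

pitch_from_peak = fs / 100            # peak at l = 100 -> 80.0 Hz
male_lags = lag_range(fs, 85, 180)    # roughly lags 44..94 samples
female_lags = lag_range(fs, 165, 255) # roughly lags 31..48 samples
```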
Autocorrelation: Noise Reduction

Case 1: pure tone, x[n] = cos(2π f_0 n).
R_xx[l] = Σ_{n=0}^{N−1} cos(2π f_0 n) · cos(2π f_0 (n − l))
        = (1/2) Σ_{n=0}^{N−1} ( cos(2π f_0 l) + cos(2π f_0 (2n − l)) )
        = (N/2) cos(2π f_0 l)   (the oscillating second term sums to ≈ 0)

Case 2: pure noise, x[n] = v[n] with v[n] ~ N(0, σ²).
R_xx[l] = Σ_{n=0}^{N−1} v[n] · v[n − l]
        = Σ_{n=0}^{N−1} (v[n])² = Nσ²   if l = 0,
        ≈ 0                              if l ≠ 0.
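A quick numerical check of this property (frame length, period, and noise level are arbitrary choices): the noise only piles up at lag 0, so the autocorrelation of a noisy cosine still peaks at, or very near, the true period.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T0 = 512, 64                        # frame length and true period (arbitrary)
n = np.arange(N)
x = np.cos(2 * np.pi * n / T0) + 0.3 * rng.standard_normal(N)

# Circular autocorrelation via the FFT, normalized by the frame energy
X = np.fft.fft(x)
r = np.real(np.fft.ifft(X * np.conj(X))) / np.sum(np.abs(x) ** 2)

# Search lags around one period, skipping the trivial peak at lag 0
lag = T0 // 2 + int(np.argmax(r[T0 // 2 : T0 + T0 // 2]))
```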
Basic Pitch Detection Algorithm
(Flow graph: Frame → Voiced Decision → [Y] Autocorrelation → Peak Detection → Estimated Pitch; [N] Not Voiced.)
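The flow graph can be put together in one short function. A sketch under stated assumptions: the energy threshold and the pitch band defaults are illustrative, and the function name is made up for this example.

```python
import numpy as np

def detect_pitch(frame, fs, energy_threshold, f_min=85.0, f_max=255.0):
    # 1. Voiced decision: energy gate
    if np.sum(np.abs(frame) ** 2) <= energy_threshold:
        return None                          # "Not Voiced"
    # 2. Autocorrelation via the FFT (circular)
    X = np.fft.fft(frame)
    r = np.real(np.fft.ifft(X * np.conj(X)))
    # 3. Peak detection: search only lags allowed by the pitch band,
    #    which skips lag 0 and most spurious harmonic peaks
    l_min, l_max = int(fs / f_max), int(fs / f_min)
    lag = l_min + int(np.argmax(r[l_min : l_max + 1]))
    # 4. Estimated pitch in Hz
    return fs / lag

fs = 8000
n = np.arange(480)                           # 6 exact periods of an 80-sample tone
frame = np.cos(2 * np.pi * 100.0 * n / fs)   # 100 Hz test tone
```

Note how the band-limited lag search doubles as the fix for the "R_xx[0] is always largest" problem.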
• "Big O" notation describes how run-time grows as input size grows: O(N), O(N²), O(log N), O(1), O(2^N)
• Key property of complexity analysis: the highest-order term dominates
• O(N + N²) = O(N²)
• O(3N²) = O(N²)
• Always a good idea to know the complexity of the algorithms you are using: know what you are getting yourself into!
• For a complicated composite algorithm, each step might have a different complexity
• Knowing the complexity tells you where the dominant computation step is: optimize there!
Algorithmic Complexity
Autocorrelation: Computation

Direct computation:
R_xx[l] = x[n] ⨂ x*[−n] = ( Σ_{n=0}^{N−1} x[n] x*[n−l] ) / ( Σ_{n=0}^{N−1} |x[n]|² )
→ O(N²) operation

DFT properties:
• Circular convolution: x[n] ⨂ y[n] ⟺ X[k] Y[k]
• Conjugate in time: x*[n] ⟺ X*[N−k]
• Time/frequency reversal: x[N−n] ⟺ X[N−k]
• Combining them: x[n] ⨂ x*[−n] ⟺ X[k] X*[k]

R_xx[l] = x[n] ⨂ x*[−n] = IFFT{ X[k] X*[k] }
→ O(N log N) operation

O(N²) vs O(N log N)
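The equivalence of the two routes is easy to verify numerically. A sketch comparing the O(N²) direct sum with the O(N log N) FFT route (unnormalized here, i.e. without dividing by the frame energy):

```python
import numpy as np

def autocorr_direct(x):
    # O(N^2): R[l] = sum_n x[n] * conj(x[n - l]), indices taken circularly
    N = len(x)
    return np.array([np.sum(x * np.conj(np.roll(x, l))) for l in range(N)])

def autocorr_fft(x):
    # O(N log N): R = IFFT{ X[k] X*[k] } by the circular-convolution property
    X = np.fft.fft(x)
    return np.fft.ifft(X * np.conj(X))

x = np.random.default_rng(1).standard_normal(256)
# The two agree up to floating-point rounding.
```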
• Pitch estimation searches for a peak value in the autocorrelation function
• Two main challenges:
• Lag 0 will generally have the largest overall value, but it is certainly not the value we are looking for
• Multiples of the pitch period (higher-order harmonics) will also yield local peaks but must be rejected
Categories of Pitch Analysis
Difference function (YIN):
d(τ) = Σ_{n=1}^{W} ( x[n] − x[n + τ] )²
d(T_0) = Σ_{n=1}^{W} ( x[n] − x[n + T_0] )² = 0

Autocorrelation: R_xx[l] = IFFT{ X[k] X*[k] } = IFFT{ |X[k]|² }
Cepstrum: C[l] = IFFT{ log |X[k]| }
• The log operation equalizes strong components and raises low-level energy regions. Thus, the cepstrum is robust to strong formants but becomes sensitive to noise.
• The source (pitch) and the filter (formants) are additive in the logarithm, and thus separate clearly.
• YIN's algorithm enhances accuracy with normalization, thresholding, and interpolation techniques.
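A sketch of just the squared-difference function d(τ), without the normalization, thresholding, and interpolation refinements YIN adds on top; the window length and test signal are arbitrary choices:

```python
import numpy as np

def yin_difference(x, tau_max, W):
    # d(tau) = sum_{n=1}^{W} (x[n] - x[n + tau])^2 for tau = 1..tau_max
    return np.array([np.sum((x[1 : W + 1] - x[1 + tau : 1 + tau + W]) ** 2)
                     for tau in range(1, tau_max + 1)])

T0 = 50                                   # true period in samples (arbitrary)
x = np.sin(2 * np.pi * np.arange(400) / T0)
d = yin_difference(x, tau_max=60, W=200)
# d(T0) = 0 for a perfectly periodic signal, so the minimum marks the period
```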
• Lead-in to the Final Project: forming groups for both the Assigned Lab and the Final Project
• Explore a DSP algorithm from the literature: implementation in Python for this stage, NOT on the tablet yet
• Proposal for the Assigned Lab:
1. Overview of the proposed algorithm, citing source(s)
2. Plan for testing and validation
3. Rough idea(s) for the Final Project application
→ Due March 9
• Assigned Lab Report submission at the end of the project
• Demo incorporated into the Final Proposal design review
Assigned Lab
• Lab 3: Real-time spectral analyzer quiz/demo
• Lab 4: Pitch Analyzer
• Be thinking about Assigned Project Labs / Groups
This week…