lecture 3: audio applications - angewandte numerische …...topological time series analysis -...
Post on 04-Jul-2020
9 Views
Preview:
TRANSCRIPT
Lecture 3: Audio Applications
Topological Time Series Analysis - Theory And Practice
Jose Perea, Michigan State University. Chris Tralie, Duke University
7/20/2016
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Table of Contents
I Audio Data / Biphonation
B Music Data
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Digital Audio Basics: Representation/Sampling
I 1D time series x [n], sampled at 44100hzB Shannon Nyquist: Need to sample at at least twice the
highest frequency of a bandlimited signal to avoidaliasing
I Very high sampling rate!B 1 second chunk lives in R44100
B 3 second chunk lives in R132300!
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Digital Audio Basics: Representation/Sampling
I 1D time series x [n], sampled at 44100hzB Shannon Nyquist: Need to sample at at least twice the
highest frequency of a bandlimited signal to avoidaliasing
I Very high sampling rate!B 1 second chunk lives in R44100
B 3 second chunk lives in R132300!
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Biphonation
I 2 noncommensurate frequencies present at the same timein biological phenomena
e.g. cos(t) + cos(πt)
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnies
High Valence Negative
Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnies
High Valence Positive
B We’ll be focusing on the positive clip today...
Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnies
High Valence Positive
B We’ll be focusing on the positive clip today...Briefer, Elodie F., et al. ”Segregation of information about emotional arousal and valence in horse whinnies.”Scientific reports 4 (2015).
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnie Audio
Interactively Show Audio File
I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time
(≈23 milliseconds of audio)
Have Students Find Steady State Region
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnie Audio
Interactively Show Audio File
I Base frequencies on the order of 1000hz (Window size?)
I By default, only using 512 samples after the starting time(≈23 milliseconds of audio)
Have Students Find Steady State Region
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnie Audio
Interactively Show Audio File
I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time
(≈23 milliseconds of audio)
Have Students Find Steady State Region
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Horse Whinnie Audio
Interactively Show Audio File
I Base frequencies on the order of 1000hz (Window size?)I By default, only using 512 samples after the starting time
(≈23 milliseconds of audio)
Have Students Find Steady State Region
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Biphonation Finding Competition
I Pan through audio file to find best region of biphonation, asmeasured by persistence of second most persistent class
I May be corrupted due to noiseI Will keep a running tab of best score on the board!
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Table of Contents
B Audio Data / Biphonation
I Music Data
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Tempo / Repetition
I Music is full of repetition
I Tempo is determined by a train of music “pulses”/“beats” ina periodic pattern
I Foot tappingI Tempo usually 50 - 200 beats per minute
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Tempo / Repetition
I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in
a periodic pattern
I Foot tappingI Tempo usually 50 - 200 beats per minute
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Tempo / Repetition
I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in
a periodic pattern
I Foot tapping
I Tempo usually 50 - 200 beats per minute
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Tempo / Repetition
I Music is full of repetitionI Tempo is determined by a train of music “pulses”/“beats” in
a periodic pattern
I Foot tappingI Tempo usually 50 - 200 beats per minute
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Tempo / Repetition
Don’t Stop Believin (120 beats per minute)
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Raw Audio Delay Embedding
I τ ∗ dim = 22050 (why?)
I dT = 441I Taking first 3 seconds of audio
Run it! What happens?
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Raw Audio Delay Embedding
I τ ∗ dim = 22050 (why?)I dT = 441
I Taking first 3 seconds of audio
Run it! What happens?
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Raw Audio Delay Embedding
I τ ∗ dim = 22050 (why?)I dT = 441I Taking first 3 seconds of audio
Run it! What happens?
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Raw Audio Delay Embedding
I τ ∗ dim = 22050 (why?)I dT = 441I Taking first 3 seconds of audio
Run it! What happens?
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms: Definition
Aka the Squared Magnitude Short-Time Fourier Transform.GivenB A discrete signal xB A window size W (implicitly τ = 1)B A hop size H (like dT )
S[k ,n] =
∣∣∣∣∣∣∣∣∣FFT
x
nH
nH + 1...
nH + W − 1
[k ]
∣∣∣∣∣∣∣∣∣2
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms: Definition
Aka the Squared Magnitude Short-Time Fourier Transform.GivenB A discrete signal xB A window size W (implicitly τ = 1)B A hop size H (like dT )
S[k ,n] =
∣∣∣∣∣∣∣∣∣FFT
x
nH
nH + 1...
nH + W − 1
[k ]
∣∣∣∣∣∣∣∣∣2
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms: Definition
S[k ,n] =
∣∣∣∣∣∣∣∣∣FFT
x
nH
nH + 1...
nH + W − 1
[k ]
∣∣∣∣∣∣∣∣∣2
Window 2
Window 3
Window 1
hop
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Spectrograms
B Look at Journey example, show percussion
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
f [n] =W−1∑k=0
s(log(S[k + 1,n])− log(S[k ,n]))
where
s(x) ={
x x > 00 otherwise
}Indicator function for “audio onsets”
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
Show module, show Journey example
B By what factor have we reduced the sampling rate?
B Show synchronized audio
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
Show module, show Journey example
B By what factor have we reduced the sampling rate?
B Show synchronized audio
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
Show module, show Journey example
B By what factor have we reduced the sampling rate?
B Show synchronized audio
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
Lots of variants1 Ellis, Daniel PW. ”Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007):
51-60.
2 Gouyon, Fabien, Simon Dixon, and Gerhard Widmer. ”Evaluating low-level features for beat classificationand tracking.” 2007 IEEE International Conference on Acoustics, Speech and SignalProcessing-ICASSP’07. Vol. 4. IEEE, 2007.
3 Boeck, Sebastian, and Gerhard Widmer. ”Maximum filter vibrato suppression for onset detection.”Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland.2013.
e.g. in [1]
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Audio Novelty Functions
Lots of variants1 Ellis, Daniel PW. ”Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007):
51-60.
2 Gouyon, Fabien, Simon Dixon, and Gerhard Widmer. ”Evaluating low-level features for beat classificationand tracking.” 2007 IEEE International Conference on Acoustics, Speech and SignalProcessing-ICASSP’07. Vol. 4. IEEE, 2007.
3 Boeck, Sebastian, and Gerhard Widmer. ”Maximum filter vibrato suppression for onset detection.”Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland.2013.
e.g. in [1]
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Music Vs Speech
Show module
B A sliding window of sliding windows!
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Music Vs Speech
Show module
B A sliding window of sliding windows!
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Conclusions
I Quasiperiodicity (biphonation) is present in nature
I Due to noise/artifacts, sometimes necessary to searcharound
I Summary features often better than raw dataI After proper preprocessing, TDA on sliding window
embeddings can pick up on rhythmic periodicities in music
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Conclusions
I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search
around
I Summary features often better than raw dataI After proper preprocessing, TDA on sliding window
embeddings can pick up on rhythmic periodicities in music
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Conclusions
I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search
aroundI Summary features often better than raw data
I After proper preprocessing, TDA on sliding windowembeddings can pick up on rhythmic periodicities in music
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
Conclusions
I Quasiperiodicity (biphonation) is present in natureI Due to noise/artifacts, sometimes necessary to search
aroundI Summary features often better than raw dataI After proper preprocessing, TDA on sliding window
embeddings can pick up on rhythmic periodicities in music
Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications
top related