Lecture 3: Audio Applications

Topological Time Series Analysis - Theory And Practice

Jose Perea, Michigan State University. Chris Tralie, Duke University

7/20/2016

Topological Time Series Analysis - Theory And Practice Lecture 3: Audio Applications

Table of Contents

- Audio Data / Biphonation
- Music Data

Digital Audio Basics: Representation/Sampling

- 1D time series x[n], sampled at 44100 Hz
  - Shannon-Nyquist: need to sample at at least twice the highest frequency of a bandlimited signal to avoid aliasing
- Very high sampling rate!
  - 1 second chunk lives in R^44100
  - 3 second chunk lives in R^132300!
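As a rough check on these numbers, the sketch below computes the dimension of an audio chunk and illustrates aliasing. It assumes Python with NumPy; the function names are illustrative and not from the lecture materials. A 30000 Hz cosine lies above the Nyquist frequency of 22050 Hz, so its samples at 44100 Hz coincide with those of a 14100 Hz cosine.

```python
import numpy as np

FS = 44100  # sampling rate in Hz, as on the slide

def chunk_dimension(seconds, fs=FS):
    """Number of samples (ambient dimension) of a chunk of audio."""
    return int(round(seconds * fs))

def sampled_cosine(freq_hz, n_samples, fs=FS):
    """Samples of cos(2*pi*f*t) taken at times t = n / fs."""
    n = np.arange(n_samples)
    return np.cos(2 * np.pi * freq_hz * n / fs)

# 30000 Hz is above the Nyquist frequency FS / 2 = 22050 Hz,
# so it aliases to FS - 30000 = 14100 Hz.
above_nyquist = sampled_cosine(30000, 1000)
alias = sampled_cosine(14100, 1000)

print(chunk_dimension(1), chunk_dimension(3))
print(np.allclose(above_nyquist, alias))
```

Once sampled, the two tones cannot be told apart, which is why the signal has to be bandlimited below half the sampling rate before sampling.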

Biphonation

- Two noncommensurate frequencies present at the same time in biological phenomena, e.g. cos(t) + cos(πt)
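A minimal sketch of a sliding window (delay) embedding applied to the example signal, assuming Python with NumPy. The helper `sliding_window` and the parameter values (`dim=20`, `tau=5`, `dT=2`) are illustrative choices, not taken from the lecture code. Because 1 and π are incommensurate, the signal never exactly repeats, and the embedded points are expected to fill out a torus-like shape rather than a single closed loop.

```python
import numpy as np

def sliding_window(x, dim, tau, dT):
    """Delay embedding with integer delay tau and integer step dT.

    Row i is [x[i*dT], x[i*dT + tau], ..., x[i*dT + (dim - 1)*tau]].
    """
    x = np.asarray(x, dtype=float)
    span = (dim - 1) * tau                     # offset of the last coordinate
    n_windows = (len(x) - span - 1) // dT + 1  # windows that fit entirely in x
    starts = np.arange(n_windows) * dT
    offsets = np.arange(dim) * tau
    return x[starts[:, None] + offsets[None, :]]

# Quasiperiodic example from the slide: cos(t) + cos(pi * t)
t = np.linspace(0, 60, 3000)
x = np.cos(t) + np.cos(np.pi * t)
X = sliding_window(x, dim=20, tau=5, dT=2)
print(X.shape)
```

Each row of `X` is one point of the embedded point cloud, which could then be passed to a persistent homology package of your choice.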

Horse Whinnies

High Valence Negative

Briefer, Elodie F., et al. "Segregation of information about emotional arousal and valence in horse whinnies." Scientific Reports 4 (2015).

Horse Whinnies

High Valence Positive

- We'll be focusing on the positive clip today...

Horse Whinny Audio

Interactively Show Audio File

- Base frequencies on the order of 1000 Hz (window size?)
- By default, only using 512 samples after the starting time (≈23 milliseconds of audio)
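The window-size question can be explored with a little arithmetic, sketched below in plain Python. The clip's sampling rate is not stated on the slide, so the rates used here are assumptions: 512 samples span about 23 ms at 22050 Hz, whereas at 44100 Hz the same 512 samples would span about 11.6 ms.

```python
def window_duration_ms(n_samples, fs):
    """Duration of n_samples at sampling rate fs, in milliseconds."""
    return 1000.0 * n_samples / fs

def cycles_in_window(freq_hz, n_samples, fs):
    """How many periods of a freq_hz tone fit in the window."""
    return freq_hz * n_samples / fs

# Assumed sampling rates; the slide does not state the clip's rate.
for fs in (22050, 44100):
    print(fs,
          round(window_duration_ms(512, fs), 2),
          round(cycles_in_window(1000, 512, fs), 2))
```

Under either assumption, a 1000 Hz base frequency completes at least ten cycles inside the window, so several periods are available to the embedding.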

Have Students Find Steady State Region

Biphonation Finding Competition

- Pan through audio file to find best region of biphonation, as measured by persistence of second most persistent class
- May be corrupted due to noise
- Will keep a running tab of best score on the board!
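The score used in the competition can be sketched as follows, assuming Python with NumPy and assuming the persistence diagram is available as an array of finite (birth, death) pairs for 1-dimensional classes. The function name and the example diagram are hypothetical; the lecture module may compute the score differently.

```python
import numpy as np

def second_most_persistent(dgm):
    """Persistence (death - birth) of the second most persistent class.

    dgm is an (m, 2) array of finite (birth, death) pairs; returns 0.0
    when fewer than two classes are present.
    """
    dgm = np.asarray(dgm, dtype=float).reshape(-1, 2)
    pers = np.sort(dgm[:, 1] - dgm[:, 0])[::-1]  # largest persistence first
    return float(pers[1]) if len(pers) >= 2 else 0.0

# Made-up diagram: two prominent loops and one short-lived (noisy) class
example_h1 = np.array([[0.1, 0.9], [0.2, 0.7], [0.3, 0.35]])
score = second_most_persistent(example_h1)
print(score)
```

A single periodic signal would give one prominent class and a score near zero, while a biphonic region should give two prominent classes and therefore a larger score.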

Tempo / Repetition

- Music is full of repetition
- Tempo is determined by a train of music "pulses"/"beats" in a periodic pattern
- Foot tapping
- Tempo usually 50 - 200 beats per minute

Example: "Don't Stop Believin'" by Journey (120 beats per minute)

Raw Audio Delay Embedding

- τ · dim = 22050 (why?)
- dT = 441
- Taking first 3 seconds of audio

Run it! What happens?
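One plausible reading of the "why?" is that 22050 samples at 44100 Hz is half a second, which is one beat at 120 beats per minute. The sketch below checks that arithmetic in plain Python and counts the points in the resulting cloud, assuming each window covers 22050 consecutive samples. The helper names are illustrative.

```python
FS = 44100   # sampling rate in Hz (from the earlier slides)
BPM = 120    # tempo of the example song, as stated above

def samples_per_beat(bpm, fs=FS):
    """Length of one beat in samples."""
    return int(round(60.0 / bpm * fs))

def num_windows(n_samples, window_len, dT):
    """Number of windows of length window_len, stepped by dT, that fit."""
    return (n_samples - window_len) // dT + 1

window_len = samples_per_beat(BPM)                  # tau * dim
n_points = num_windows(3 * FS, window_len, dT=441)  # first 3 seconds
print(window_len, n_points)
```

Under these assumptions the embedding has only a few hundred points, each living in a space with roughly 22050 dimensions.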

Audio Spectrograms: Definition

Aka the Squared Magnitude Short-Time Fourier Transform. Given

- A discrete signal x
- A window size W (implicitly τ = 1)
- A hop size H (like dT)

\[
S[k, n] = \left| \mathrm{FFT}\left( \begin{bmatrix} x[nH] \\ x[nH + 1] \\ \vdots \\ x[nH + W - 1] \end{bmatrix} \right)[k] \right|^2
\]
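The definition translates almost line by line into code. Below is a minimal sketch, assuming Python with NumPy and a rectangular window, since the definition above applies no taper. The function name and the test tone are illustrative.

```python
import numpy as np

def spectrogram(x, W, H):
    """Squared-magnitude STFT with a rectangular window.

    Returns S with S[k, n] = |FFT(x[nH : nH + W])[k]|^2, so S has W rows
    (frequency bins) and one column per window.
    """
    x = np.asarray(x, dtype=float)
    n_windows = (len(x) - W) // H + 1
    S = np.empty((W, n_windows))
    for n in range(n_windows):
        frame = x[n * H : n * H + W]          # x[nH], ..., x[nH + W - 1]
        S[:, n] = np.abs(np.fft.fft(frame)) ** 2
    return S

# Test tone with exactly 64 cycles per window, so its energy sits in bin 64
W, H = 1024, 512
n = np.arange(8 * W)
x = np.cos(2 * np.pi * 64 * n / W)
S = spectrogram(x, W, H)
print(S.shape, np.argmax(S[: W // 2, 0]))
```

In practice each frame is often multiplied by a tapered window such as a Hann window before the FFT to reduce spectral leakage; that step is omitted here to match the definition.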

[Figure: Windows 1, 2, and 3 laid over the signal, with consecutive windows offset by one hop]

Audio Spectrograms

- Look at Journey example, show percussion

Audio Novelty Functions

\[
f[n] = \sum_{k=0}^{W-1} s\big( \log(S[k, n+1]) - \log(S[k, n]) \big)
\]

where

\[
s(x) = \begin{cases} x & x > 0 \\ 0 & \text{otherwise} \end{cases}
\]

Indicator function for "audio onsets"
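A minimal sketch of such a novelty function, assuming Python with NumPy. It assumes the difference is taken between consecutive windows (time index n) within each frequency bin k, which is the usual spectral-flux convention. The small `eps` guarding against log(0) is an addition for numerical safety, not part of the formula.

```python
import numpy as np

def novelty(S, eps=1e-12):
    """Half-wave rectified log-spectral difference, summed over frequency.

    S[k, n] is a spectrogram (rows: frequency bins, columns: windows).
    Returns f with f[n] = sum_k s(log S[k, n+1] - log S[k, n]).
    """
    logS = np.log(np.asarray(S, dtype=float) + eps)  # eps avoids log(0)
    diff = np.diff(logS, axis=1)              # change from window n to n + 1
    return np.maximum(diff, 0.0).sum(axis=0)  # s(x) = max(x, 0), summed over k

# Toy spectrogram with 2 frequency bins and 3 windows
S_toy = np.exp(np.array([[0.0, 1.0, 0.0],
                         [0.0, 0.0, 2.0]]))
f_toy = novelty(S_toy, eps=0.0)
print(f_toy)
```

Only increases in energy contribute, so a drum hit that suddenly adds energy across many bins produces a spike, while decays are ignored.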

Show module, show Journey example

- By what factor have we reduced the sampling rate?
- Show synchronized audio
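One way to approach the sampling-rate question: a novelty function has one value per spectrogram window, so its rate is the audio rate divided by the hop size H. The sketch below, in plain Python, uses a hypothetical hop of 512 samples; the hop used in the lecture module is not stated here.

```python
def novelty_rate(fs, hop):
    """Samples per second of a novelty function computed with hop size hop."""
    return fs / hop

def novelty_samples_per_beat(bpm, fs, hop):
    """Length of one beat, measured in novelty-function samples."""
    return 60.0 / bpm * fs / hop

FS = 44100
HOP = 512  # hypothetical hop size, chosen for illustration only
print(novelty_rate(FS, HOP))
print(novelty_samples_per_beat(120, FS, HOP))
```

With these assumed values the rate drops by a factor of 512, and one beat at 120 beats per minute shrinks from 22050 audio samples to about 43 novelty samples.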

Lots of variants

1. Ellis, Daniel P. W. "Beat tracking by dynamic programming." Journal of New Music Research 36.1 (2007): 51-60.
2. Gouyon, Fabien, Simon Dixon, and Gerhard Widmer. "Evaluating low-level features for beat classification and tracking." 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07). Vol. 4. IEEE, 2007.
3. Boeck, Sebastian, and Gerhard Widmer. "Maximum filter vibrato suppression for onset detection." Proceedings of the 16th International Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland. 2013.

e.g. in [1]

Music Vs Speech

Show module

- A sliding window of sliding windows!

Conclusions

- Quasiperiodicity (biphonation) is present in nature
- Due to noise/artifacts, sometimes necessary to search around
- Summary features often better than raw data
- After proper preprocessing, TDA on sliding window embeddings can pick up on rhythmic periodicities in music
