GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Audio and Music Representations (Part 2)

Upload: emory-eaton

Post on 11-Jan-2016


TRANSCRIPT

Introduction to MIR

Outline
- Audio and music representations (cont'd)
- Frequency scaling using spectrogram
- Pitch scale in music
- Pitch scale in human hearing
- Mapping frequency scale using STFT
- Constant-Q transform
- Auditory filter bank
- Tools

Motivation
- The spectrogram is the most standard way of visualizing sounds.
  - It shows the harmonic structure of a single tone well.
  - However, is it the best way to visualize musical signals?
- Musical signals
  - Musical notes are not linearly spaced in frequency.
  - The majority of notes are located in the low-frequency range.
  - A linear frequency scale is not intuitive for human hearing either.

[Figure panels: Piano (Chromatic Scale); Aerosmith, "Jaded"]

Pitch Scale in Music
- Musical notes are scaled logarithmically in frequency.
- Music tuning systems: different ways of subdividing the octave
  - Just intonation: uses harmonic ratios, e.g., 1:1, 9:8, 5:4, 4:3, 3:2, 5:3, 15:8, 2:1 (diatonic scale)
  - Pythagorean tuning: derives all notes from the 3:2 ratio (perfect fifth)
  - Equal temperament: a ratio of 1 : 2^(1/12) between two adjacent notes
    - e.g., MIDI note number (m) and frequency (f) in Hz


http://newt.phys.unsw.edu.au/jw/notes.html
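
The equal-temperament relation between MIDI note number (m) and frequency (f) can be sketched as follows. The sketch assumes the common convention that A4 is MIDI note 69 at 440 Hz, which the slide text itself does not state; the function names are illustrative only.

```python
import math

A4_MIDI = 69     # assumed reference note (A4); a common convention, not from the slides
A4_HZ = 440.0    # assumed reference tuning in Hz

def midi_to_hz(m):
    """Frequency in Hz of MIDI note m under 12-tone equal temperament."""
    return A4_HZ * 2.0 ** ((m - A4_MIDI) / 12.0)

def hz_to_midi(f):
    """Fractional MIDI note number of a frequency f in Hz."""
    return A4_MIDI + 12.0 * math.log2(f / A4_HZ)

# Adjacent notes always differ by the same ratio, 2**(1/12).
semitone_ratio = midi_to_hz(70) / midi_to_hz(69)

# Seven semitones give the equal-tempered fifth, close to the just 3:2.
tempered_fifth = midi_to_hz(76) / midi_to_hz(69)
```

Under these assumptions the equal-tempered fifth comes out at about 1.4983, slightly narrower than the 3:2 ratio used in just intonation and Pythagorean tuning.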

Pitch Scale in Human Hearing
- Humans also perceive tones on a log scale.
- Psychoacoustic pitch scales
  - Mel scale: based on the perceived pitch ratio of tones
  - Bark scale: based on critical-band measurements
  - Equivalent Rectangular Bandwidth (ERB) rate: also based on critical-band measurements, but with a different approach
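
The formulas for these three scales are not preserved in the transcript. As a rough sketch, the conversions below use widely cited closed-form approximations; they are assumptions here, and the lecture may have used different variants. Function names are illustrative.

```python
import math

def hz_to_mel(f):
    """Mel value of f (Hz), using a common 2595*log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark value of f (Hz), using a common arctangent approximation."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    """ERB-rate value of f (Hz), using a common 21.4*log10(1 + 0.00437 f) approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

# All three scales compress high frequencies: the same 4 kHz span covers
# fewer scale units from 4 to 8 kHz than from 0 to 4 kHz.
low_span = hz_to_mel(4000.0) - hz_to_mel(0.0)
high_span = hz_to_mel(8000.0) - hz_to_mel(4000.0)
```

With these approximations, 1000 Hz maps to roughly 1000 mel, which is the usual anchor point of the mel scale.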

Mapping Frequency Scale Using STFT
- Map the linear frequency scale to a log-like scale.
  - Each mapped point is computed by multiplying the spectrogram bins by weights (i.e., interpolation coefficients).
- Limitation
  - Simple, but the time-frequency resolution is still constrained by the STFT.
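
A minimal sketch of such a mapping, assuming linear interpolation between the two nearest STFT bins and log-spaced target frequencies. The function name, parameters, and the random stand-in spectrogram are hypothetical; the slides do not specify how the weights are chosen.

```python
import numpy as np

def log_mapping_matrix(n_fft, sr, f_min, bins_per_octave, n_bins):
    """Build M so that M @ X resamples a linear-frequency spectrogram X
    (shape: n_fft//2 + 1 by frames) onto log-spaced center frequencies."""
    n_freqs = n_fft // 2 + 1
    centers = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    positions = centers * n_fft / sr          # fractional STFT bin index of each center
    M = np.zeros((n_bins, n_freqs))
    for k, p in enumerate(positions):
        lo = int(np.floor(p))
        frac = p - lo
        if lo + 1 < n_freqs:
            M[k, lo] = 1.0 - frac             # weight of the lower neighbor
            M[k, lo + 1] = frac               # weight of the upper neighbor
        elif lo < n_freqs:
            M[k, lo] = 1.0                    # center sits on the last bin
        # centers above the Nyquist frequency keep an all-zero row
    return M, centers

sr, n_fft = 16000, 1024
M, centers = log_mapping_matrix(n_fft, sr, f_min=55.0, bins_per_octave=12, n_bins=84)
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((n_fft // 2 + 1, 10)))   # stand-in magnitude spectrogram
Y = M @ X                                               # log-frequency spectrogram
```

With these example settings the STFT bins are 15.625 Hz apart, so the two lowest targets (55 Hz and about 58.3 Hz) are both interpolated from the same pair of STFT bins. This is the limitation noted above: the mapping cannot add resolution that the STFT does not have.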

Y = MX (M: mapping matrix, X: spectrogram, Y: scaled spectrogram)

Constant-Q Transform
- A more sophisticated way of obtaining a log-frequency scale
- Uses a set of sinusoidal kernels (wavelets) such that
  - the frequencies are logarithmically spaced
  - the kernels (i.e., filters) have constant Q = frequency / bandwidth
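
The kernel idea can be sketched by direct evaluation for a single frame, in the spirit of Brown's formulation listed in the references. This is a slow illustrative version, not the toolbox implementation; the Hann window, the function name, and the parameter values are assumptions.

```python
import numpy as np

def naive_cqt_frame(x, sr, f_min, bins_per_octave, n_bins):
    """Constant-Q coefficients of one frame starting at x[0]."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)    # constant frequency / bandwidth
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    coeffs = np.zeros(n_bins, dtype=complex)
    for k, f_k in enumerate(freqs):
        N_k = int(np.ceil(Q * sr / f_k))                # kernel length shrinks as f_k grows
        if N_k > len(x):
            raise ValueError("signal is too short for the lowest bin")
        n = np.arange(N_k)
        kernel = np.hanning(N_k) * np.exp(-2j * np.pi * f_k * n / sr)
        coeffs[k] = np.dot(x[:N_k], kernel) / N_k       # normalize by kernel length
    return freqs, coeffs

sr = 8000
t = np.arange(2000) / sr
x = np.sin(2 * np.pi * 440.0 * t)                       # test tone at A4
freqs, coeffs = naive_cqt_frame(x, sr, f_min=110.0, bins_per_octave=12, n_bins=36)
peak_bin = int(np.argmax(np.abs(coeffs)))               # bin with the largest magnitude
```

With 12 bins per octave starting at 110 Hz, bin 24 is centered on 440 Hz, so the test tone should peak there.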

Constant-Q Transform
- Time-frequency resolution is not uniform.
  - High frequency resolution and low time resolution in the low-frequency range
  - Low frequency resolution and high time resolution in the high-frequency range
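
The trade-off follows directly from holding Q constant. A small sketch, assuming bandwidth = f / Q and kernel duration = Q / f (function names and the example settings are illustrative):

```python
def cq_resolution(f_center, bins_per_octave):
    """Bandwidth (Hz) and kernel duration (s) of one constant-Q bin."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bandwidth_hz = f_center / Q     # narrow at low frequencies: fine frequency resolution
    duration_s = Q / f_center       # long at low frequencies: coarse time resolution
    return bandwidth_hz, duration_s

def stft_resolution(n_fft, sr):
    """Bin spacing (Hz) and window duration (s) of an STFT; the same for every bin."""
    return sr / n_fft, n_fft / sr

low = cq_resolution(55.0, 12)       # low-frequency bin
high = cq_resolution(3520.0, 12)    # bin six octaves higher
fixed = stft_resolution(1024, 16000)
```

With 12 bins per octave, the 55 Hz bin is roughly 3.3 Hz wide and about 0.31 s long, while the 3520 Hz bin is roughly 209 Hz wide and about 4.8 ms long. The STFT with these example settings keeps 15.625 Hz and 64 ms everywhere.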

[Figure panels: Short-Time Fourier Transform; Constant-Q transform]

Example of Constant-Q Transform
[Figure panels: Regular Spectrogram; Log-frequency Spectrogram (mapping); Log-frequency Spectrogram (Constant-Q transform)]

Example of Constant-Q Transform
[Figure panels: Regular Spectrogram; Log-frequency Spectrogram (mapping); Log-frequency Spectrogram (Constant-Q transform)]

Auditory Filter Bank
- A filter bank that imitates the magnitude and delay of traveling waves on the basilar membrane in the cochlea
- Produces a 3-D representation (time-channel-lag), or auditory image

[Diagram labels: input; cochlear filter banks; hair cells (HC), one per channel; stabilize & combine; output; oval window; high freq.; low freq.]

Types of Auditory Filter Banks
- Gamma-tone filter banks
  - Used in Patterson's auditory filter banks, based on ERB
- Pole-Zero Filter Cascade (Lyon)
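
For the Gamma-tone filters listed above, the defining equation is not preserved in the transcript. The sketch below assumes the commonly used impulse response t^(n-1) exp(-2 pi b t) cos(2 pi fc t) with order n = 4 and b = 1.019 ERB(fc), where ERB(fc) = 24.7 (4.37 fc / 1000 + 1). These constants and all function names are assumptions, and filtering is done by plain convolution rather than an efficient recursive implementation.

```python
import numpy as np

def erb_hz(fc):
    """Equivalent rectangular bandwidth (Hz) at center frequency fc (assumed formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, sr, duration=0.05, order=4, b_factor=1.019):
    """Peak-normalized impulse response of one Gamma-tone filter."""
    t = np.arange(int(duration * sr)) / sr
    b = b_factor * erb_hz(fc)
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_filterbank(x, sr, center_freqs):
    """Filter x with one Gamma-tone filter per center frequency (channels by time)."""
    return np.stack([np.convolve(x, gammatone_ir(fc, sr))[:len(x)]
                     for fc in center_freqs])

sr = 16000
t = np.arange(sr // 4) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
channels = gammatone_filterbank(tone, sr, [250.0, 1000.0, 4000.0])
energy = (channels ** 2).sum(axis=1)    # the 1000 Hz channel should dominate
```

Because the bandwidth grows with center frequency, the high-frequency channels have short impulse responses and the low-frequency channels long ones, which mirrors the non-uniform resolution of the constant-Q transform.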

Hair Cell
- (Inner) hair cell
  - Transforms mechanical movement into neural spikes
- Modeled as a cascade of
  - Half-wave rectification
  - Compression
  - Low-pass filtering
- This is a non-linear process
  - Generates new harmonic partials
  - Associated with missing fundamentals
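
The cascade above can be sketched for a single channel as follows. The square-root compression, the one-pole low-pass filter, and the 1 kHz cutoff are assumptions chosen for illustration; the slides do not give these details. The example feeds in harmonics 3, 4, and 5 of 200 Hz, with no 200 Hz partial, to show the non-linearity creating energy at the missing fundamental.

```python
import numpy as np

def hair_cell(x, sr, cutoff_hz=1000.0, exponent=0.5):
    """Half-wave rectify, compress, then low-pass filter one channel signal."""
    rectified = np.maximum(x, 0.0)                 # half-wave rectification
    compressed = rectified ** exponent             # static power-law compression (assumed)
    a = np.exp(-2.0 * np.pi * cutoff_hz / sr)      # one-pole low-pass coefficient
    y = np.empty_like(compressed)
    prev = 0.0
    for n, v in enumerate(compressed):
        prev = (1.0 - a) * v + a * prev            # y[n] = (1 - a) * v[n] + a * y[n-1]
        y[n] = prev
    return y

sr = 16000
t = np.arange(sr) / sr                             # 1 second, so FFT bins are 1 Hz apart
x = sum(np.sin(2 * np.pi * f * t) for f in (600.0, 800.0, 1000.0))
before = np.abs(np.fft.rfft(x))[200]               # magnitude at 200 Hz in the input
after = np.abs(np.fft.rfft(hair_cell(x, sr)))[200] # magnitude at 200 Hz after the model
```

The input has essentially no component at 200 Hz, while the output does, because rectification and compression produce difference tones between neighboring partials.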

Example of Auditory Filter Bank (Correlogram)
[Figure: Piano (Chromatic Scale)]

Example of Auditory Filter Bank (Correlogram)
[Figure: Aerosmith, "Jaded"]

Tools
- Audio editors and analysis tools
  - Audacity
  - Adobe Audition
  - Praat
  - Sonic Visualiser
  - SndTool
- Software libraries
  - Constant-Q transform Toolbox (Matlab): http://www.cs.tut.fi/sgn/arg/CQT/
  - Auditory Toolbox (Matlab): https://engineering.purdue.edu/~malcolm/interval/1998-010/
  - Auditory Image Model (C++): https://code.google.com/p/aimc/

References
- Constant-Q transform
  - J. C. Brown, "Calculation of a constant Q transform," 1991
  - Schörkhuber and Klapuri, "Constant-Q transform toolbox for music processing," 2010
  - M. Dörfler, N. Holighaus, T. Grill, and G. Velasco, "Constructing an Invertible Constant-Q Transform with Non-stationary Gabor Frames," 2011
- Auditory filter bank
  - R. D. Patterson, M. H. Allerhand, and C. Giguere, "Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform," 1995
  - R. F. Lyon, "Machine Hearing: An Emerging Field," 2010
  - R. F. Lyon, A. C. Katsiamis, and E. M. Drakakis, "History and future of auditory filter models," 2010
  - R. F. Lyon, "Cascades of two-pole two-zero asymmetric resonators are good models of peripheral auditory function," 2011