gct731 fall 2014 topics in music technology - music information retrieval audio and music...
Introduction to MIR
GCT731 Fall 2014
Topics in Music Technology - Music Information Retrieval
Audio and Music Representations (Part 2)

Outline
Audio and music representations (cont'd)
  Frequency scaling using spectrogram
  Pitch scale in music
  Pitch scale in human hearing
  Mapping frequency scale using STFT
  Constant-Q transform
  Auditory filter bank
Tools

Motivation
The spectrogram is the most standard way of visualizing sounds.
  Good for seeing the harmonic structure of a single tone.
  However, is it the best way to visualize musical signals?
Musical signals
  Musical notes are not linearly spaced in frequency.
  The majority of notes are located in the low frequency range.
  Linear frequency is not intuitive for human hearing either.
[Figure panels: Piano (Chromatic Scale); Aerosmith, "Jaded"]

Pitch Scale in Music
Musical notes are scaled logarithmically in frequency.
Music tuning systems: different ways of sub-dividing the octave
  Just intonation: uses harmonic ratios, e.g. 1:1, 9:8, 5:4, 4:3, 3:2, 5:3, 15:8, 2:1 (diatonic scale)
  Pythagorean tuning: all notes derived from the 3:2 ratio (perfect fifth)
  Equal temperament: ratio of 1 : 2^(1/12) between two adjacent notes
    e.g. MIDI note (m) and frequency (f) in Hz
http://newt.phys.unsw.edu.au/jw/notes.html
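
The MIDI-to-frequency relation mentioned on the slide did not survive extraction. A minimal sketch of the usual equal-temperament convention follows, assuming A4 = MIDI note 69 = 440 Hz (that reference and the function names are assumptions, not taken from the slide):

```python
import math

A4_MIDI = 69    # assumed reference note number (A4)
A4_HZ = 440.0   # assumed concert pitch in Hz

def midi_to_hz(m):
    """Equal-tempered frequency in Hz of MIDI note number m."""
    return A4_HZ * 2.0 ** ((m - A4_MIDI) / 12.0)

def hz_to_midi(f):
    """MIDI note number (possibly fractional) of frequency f in Hz."""
    return A4_MIDI + 12.0 * math.log2(f / A4_HZ)

def ratio_to_cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# Middle C (MIDI 60) in Hz, and how far a just 3:2 fifth lies
# from the equal-tempered fifth of 700 cents.
print(midi_to_hz(60))
print(ratio_to_cents(3 / 2) - 700.0)
```

Each semitone multiplies the frequency by 2^(1/12), so twelve semitones give exactly one octave (a factor of 2).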

Pitch Scale in Human Hearing
Humans also perceive tones on a log-like scale.
Psychoacoustic pitch scales
  Mel scale: based on the perceived pitch ratio of tones
  Bark scale: based on critical band measurements
  Equivalent Rectangular Bandwidth (ERB) rate: also based on critical band measurements, but with a different approach
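
The slide names the three scales without giving formulas. As a rough sketch, the commonly quoted closed-form approximations can be written as below. The specific constants are assumptions (a 2595/700 mel formula, a Zwicker-style Bark formula, and a Glasberg and Moore style ERB-rate formula); other variants exist.

```python
import math

def hz_to_mel(f):
    """Mel value of frequency f in Hz (assumed 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark value of frequency f in Hz (assumed Zwicker-style approximation)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    """ERB-rate value of frequency f in Hz (assumed Glasberg-Moore style formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

# All three scales are roughly linear at low frequencies and
# compress progressively toward high frequencies.
for f in (100.0, 1000.0, 10000.0):
    print(f, hz_to_mel(f), hz_to_bark(f), hz_to_erb_rate(f))
```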

Mapping Frequency Scale Using STFT
Map the linear frequency scale of the STFT to a log-like scale.
  Each mapped point is computed by multiplying neighboring STFT bins by weights (i.e. interpolation coefficients).
Limitation
  Simple, but the time and frequency resolutions are still constrained by the STFT.
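
A minimal sketch of such a mapping, assuming linear interpolation between the two nearest STFT bins and semitone-spaced target frequencies (the function name, parameters, and default values are illustrative assumptions). The matrix M is applied to a magnitude spectrogram X to obtain the scaled spectrogram Y = M X:

```python
import numpy as np

def log_mapping_matrix(n_fft, sr, f_min=55.0, bins_per_octave=12, n_bins=72):
    """Build M (n_bins x n_fft//2+1) that linearly interpolates STFT bins
    at logarithmically spaced center frequencies."""
    n_lin = n_fft // 2 + 1
    bin_hz = sr / n_fft
    centers = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    if centers[-1] > sr / 2:
        raise ValueError("highest center frequency exceeds Nyquist")
    pos = centers / bin_hz                      # fractional STFT bin index
    lo = np.minimum(np.floor(pos).astype(int), n_lin - 2)
    frac = pos - lo                             # interpolation coefficient
    M = np.zeros((n_bins, n_lin))
    rows = np.arange(n_bins)
    M[rows, lo] = 1.0 - frac
    M[rows, lo + 1] = frac
    return M, centers

sr, n_fft = 16000, 4096
M, centers = log_mapping_matrix(n_fft, sr)

# Stand-in magnitude spectrogram (frequency bins x frames).
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((n_fft // 2 + 1, 10)))
Y = M @ X   # log-frequency spectrogram
print(Y.shape)
```

With these settings the STFT bin spacing (about 3.9 Hz) is wider than a semitone near 55 Hz (about 3.3 Hz), so neighboring low-frequency rows interpolate between the same pair of STFT bins. This is the resolution limitation noted above.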
Y = MX (M: mapping matrix, X: spectrogram, Y: scaled spectrogram)

Constant-Q Transform
A more sophisticated way of obtaining a log-frequency scale.
Use a set of sinusoidal kernels (wavelets) such that
  the center frequencies are logarithmically spaced
  the kernels (i.e. filters) have constant Q = frequency / bandwidth

Constant-Q Transform
Time-frequency resolution is not uniform.
  High frequency resolution and low time resolution in the low frequency range
  Low frequency resolution and high time resolution in the high frequency range
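
To make the non-uniform resolution concrete, here is a naive single-frame sketch in the spirit of the direct calculation in Brown (1991). It is not the efficient toolbox implementation. The Hamming window, the start-aligned kernels, and all parameter values are assumptions for illustration:

```python
import numpy as np

def naive_cqt_frame(x, sr, f_min=110.0, bins_per_octave=12, n_bins=36):
    """Constant-Q coefficients of one frame starting at x[0].
    The kernel length shrinks as frequency grows, so Q stays constant."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    lengths = np.ceil(q * sr / freqs).astype(int)   # samples per kernel
    if lengths[0] > len(x):
        raise ValueError("signal shorter than the longest kernel")
    out = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        n = np.arange(lengths[k])
        # Windowed complex sinusoid at the bin's center frequency.
        kernel = np.hamming(lengths[k]) * np.exp(-2j * np.pi * freqs[k] * n / sr)
        out[k] = np.dot(x[:lengths[k]], kernel) / lengths[k]
    return out, freqs, lengths

sr = 8000
t = np.arange(sr // 2) / sr
x = np.sin(2.0 * np.pi * 440.0 * t)
coeffs, freqs, lengths = naive_cqt_frame(x, sr)
print(freqs[np.argmax(np.abs(coeffs))])
print(lengths[0], lengths[-1])
```

The kernel lengths show the trade-off directly: the lowest bin integrates over many more samples (fine frequency resolution, coarse time resolution) than the highest bin.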
[Figure panels: Short-Time Fourier Transform; Constant-Q transform]

Example of Constant-Q Transform
[Figure panels: Regular spectrogram; Log-frequency spectrogram (mapping); Log-frequency spectrogram (Constant-Q transform)]

Example of Constant-Q Transform
[Figure panels: Regular spectrogram; Log-frequency spectrogram (mapping); Log-frequency spectrogram (Constant-Q transform)]

Auditory Filter Bank
A bank of filters that imitates the magnitude and delay of traveling waves on the basilar membrane in the cochlea.
Produces a 3-D representation (time-channel-lag), or auditory image.
[Diagram: input -> cochlear filter bank -> hair cells (HC) -> stabilize & combine -> output; basilar membrane from the oval window, high frequency to low frequency]

Types of Auditory Filter Banks
Gammatone filter banks
  Used in Patterson's auditory filter banks, based on ERB
Pole-Zero Filter Cascade (Lyon)
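
As a rough illustration of the gammatone filter named above, the sketch below generates one impulse response. It assumes the common fourth-order form t^(n-1) exp(-2 pi b t) cos(2 pi fc t), with b = 1.019 ERB(fc) and the Glasberg and Moore style ERB formula. None of these constants come from the slides.

```python
import numpy as np

def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth in Hz at center frequency fc (assumed formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, sr, duration=0.05, order=4):
    """Peak-normalized gammatone impulse response centered at fc Hz."""
    t = np.arange(int(duration * sr)) / sr
    b = 1.019 * erb_bandwidth(fc)               # assumed bandwidth scaling
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t)
    g = envelope * np.cos(2.0 * np.pi * fc * t)
    return g / np.max(np.abs(g))

sr = 16000
g = gammatone_ir(1000.0, sr)

# The magnitude response should peak close to the center frequency.
n_fft = 8192
spectrum = np.abs(np.fft.rfft(g, n=n_fft))
peak_hz = np.argmax(spectrum) * sr / n_fft
print(peak_hz, erb_bandwidth(1000.0))
```

A filter bank would repeat this for a set of center frequencies spaced evenly on the ERB-rate scale and convolve the input with each response.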

Hair Cell
(Inner) hair cell
  Transforms mechanical movement into neural spikes
Modeled as a cascade of
  Half-wave rectification
  Compression
  Low-pass filtering
This performs non-linear processing, which
  generates new harmonic partials
  is associated with the perception of missing fundamentals
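
The three-stage cascade can be sketched as follows. The power-law compression exponent, the one-pole low-pass filter, and its cutoff are assumptions chosen for simplicity, not the parameters of any particular auditory model. The example feeds in only the 4th and 5th harmonics of 200 Hz and measures how much energy appears at the absent fundamental after processing:

```python
import numpy as np

def hair_cell(x, sr, cutoff_hz=1000.0, exponent=0.3):
    """Half-wave rectification -> compression -> low-pass filtering."""
    rectified = np.maximum(x, 0.0)               # half-wave rectification
    compressed = rectified ** exponent           # assumed power-law compression
    a = np.exp(-2.0 * np.pi * cutoff_hz / sr)    # one-pole low-pass coefficient
    y = np.empty_like(compressed)
    state = 0.0
    for i, v in enumerate(compressed):
        state = (1.0 - a) * v + a * state
        y[i] = state
    return y

def amplitude_at(x, sr, freq_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (via the DFT)."""
    k = int(round(freq_hz * len(x) / sr))
    return 2.0 * np.abs(np.fft.rfft(x)[k]) / len(x)

sr = 16000
t = np.arange(sr) / sr
f0 = 200.0
# Only harmonics 4 and 5 are present; the fundamental itself is missing.
x = np.sin(2.0 * np.pi * 4 * f0 * t) + np.sin(2.0 * np.pi * 5 * f0 * t)
y = hair_cell(x, sr)

print(amplitude_at(x, sr, f0))   # essentially zero in the input
print(amplitude_at(y, sr, f0))   # clearly non-zero after the non-linearity
```

The component at 200 Hz is created by the non-linear stages, which is one way to see why this model is associated with missing fundamentals.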

Example of Auditory Filter Bank (Correlogram)
[Figure: Piano (Chromatic Scale)]

Example of Auditory Filter Bank (Correlogram)
[Figure: Aerosmith, "Jaded"]

Tools
Audio Editor and Analysis
  Audacity
  Adobe Audition
  Praat
  Sonic Visualiser
  SndTool
Software Library
  Constant-Q transform Toolbox (Matlab): http://www.cs.tut.fi/sgn/arg/CQT/
  Auditory Toolbox (Matlab): https://engineering.purdue.edu/~malcolm/interval/1998-010/
  Auditory Image Model (C++): https://code.google.com/p/aimc/

References
Constant-Q transform
  J. C. Brown, "Calculation of a constant Q transform", 1991
  Schörkhuber and Klapuri, "Constant-Q transform toolbox for music processing", 2010
  M. Dörfler, N. Holighaus, T. Grill and G. Velasco, "Constructing an Invertible Constant-Q Transform with Non-stationary Gabor Frames", 2011
Auditory Filter bank
  R. D. Patterson, M. H. Allerhand, C. Giguere, "Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform", 1995
  R. F. Lyon, "Machine Hearing: An Emerging Field", 2010
  R. F. Lyon, A. C. Katsiamis, and E. M. Drakakis, "History and future of auditory filter models", 2010
  R. F. Lyon, "Cascades of two-pole-two-zero asymmetric resonators are good models of peripheral auditory function", 2011