automatic chord recognition using neural networks · automatic chord recognition using neural...
TRANSCRIPT
Automatic Chord Recognition using Neural Networks
Audio Signal Processing
Ayoub Ghriss
MVA, 2017
Computational Statistics MVA, 2017 p 1 / 20
Contents
1 Introduction
2 Features Extraction
3 Features Processing
4 Learning Architecture
5 Conclusion
Computational Statistics MVA, 2017 p 2 / 20
Introduction
Computational Statistics MVA, 2017 p 3 / 20
A bit of music and history
• Automatic Chord Recognition, first start in 1991• Classic methods : KNN, Logistic, HMM• Leap in precision with the introduction of NeuralNetworks
Computational Statistics MVA, 2017 p 4 / 20
Chords and Pitch
• Pitch : subjective perception of a note’s height• The pitch system is composed of 12 classes,• An octave means a doubling of the frequency,each octave : fn = 2
112 fn−1
• Define an equivalence relation between notes• A chord is combination of 2 or more pitches
Computational Statistics MVA, 2017 p 5 / 20
Chords and Pitch
Computational Statistics MVA, 2017 p 6 / 20
Features Extraction
Computational Statistics MVA, 2017 p 7 / 20
Short-Time Fourier Transform
• Discrete FT on equally spaced segments of thesong
• Frequency mapping can be scaled (Mel,Logarithmic)
Drawbacks :• Constant resolution and frequency difference• Not suited to represent the pitch class concept
Computational Statistics MVA, 2017 p 8 / 20
Constant Q Transform
fk = fmin.2kB (1)
where k is the frequency index,B is the number ofbins per octave.
X (k ; r) =1Nk
∑n
x(nr)w(n)e j2πnQ/Nk (2)
Q = 12B−1 , and the resolution−1Nk = [Q fs
fk]
Computational Statistics MVA, 2017 p 9 / 20
PCP, the first ACR System
• Can be seen as 1-bin Constant Q Transform• Uses pitches pattern matching using NN andhand crafted score
Takuya Fujishima, Realtime Chord Recognition ofMusical Sound
Computational Statistics MVA, 2017 p 10 / 20
CQT vs STFT : who’s the best ?
Muhammad Huzaifah, Comparison ofTime-Frequency Representations.
Computational Statistics MVA, 2017 p 11 / 20
Features Processing
Computational Statistics MVA, 2017 p 12 / 20
Pre-processing
• Time slicing : concatenating adjacent frames• First low pass filter : yn = αyn−1 + (1− α)xn• A pair of low pass filters : exponentiallyweighted mean ∑
i=−r
ra−|i |x [.+ r ]
• Other papers : Geometric mean/ Median filter
Computational Statistics MVA, 2017 p 13 / 20
Learning Architecture
Computational Statistics MVA, 2017 p 14 / 20
Computational Statistics MVA, 2017 p 15 / 20
Deep Belief Networks
• Layers of fully connected layers of RestrictedBoltzmann Machines
• A stochastic neural network :• One layer of visible units : chords• One layer of hidden units : latent variables• the hidden units of layer i are the visible for thelayer i + 1
• Seen as an out-dated model in the Deepcommunity
Computational Statistics MVA, 2017 p 16 / 20
Deep architectures
• Recurrent Neural NetworksNicolas Boulanger-Lewandowski, AudioChord Recogntion with Recurent NeuralNetworks.
• Convolutional NN:Anis Rojb al., Music Transcription by DeepLearning with Data and “Artificial Semantic”Augmentation
Computational Statistics MVA, 2017 p 17 / 20
Conclusion
Computational Statistics MVA, 2017 p 18 / 20
Conclusions
• Improvement with the introduction with neuralnetworks
• little difference in performance between differentstructures
• Smoothness of solutions to be improved
Matthias Mauch & Katy Noland & SimonDixon (2009) Using Musical Structure toEnhance Automatic Chord Transcription.
Computational Statistics MVA, 2017 p 19 / 20
Conclusions
Computational Statistics MVA, 2017 p 20 / 20