Fraunhofer IDMT
Jakob Abeßer (1,3), Stefan Balke (2), Klaus Frieler (3),
Martin Pfleiderer (3), Meinard Müller (2)
Deep Learning for Jazz Walking Bass Transcription
1 Semantic Music Technologies Group, Fraunhofer IDMT, Ilmenau, Germany
2 International Audio Laboratories Erlangen, Germany
3 Jazzomat Research Project, University of Music Franz Liszt, Weimar, Germany
Motivation: What is a Walking Bass Line?
- Example: Miles Davis: So What (Paul Chambers: bass)
- Our assumptions for this work:
  - Quarter notes (mostly chord tones)
  - Representation: beat-wise pitch values
[Figure: score excerpt of a walking bass line over Dm7 (D, F, A, C); beat-wise pitches: C A F A D F A D A D A F A F]
Motivation: How is this useful?
- Harmonic analysis
  - Composition (lead sheet) vs. actual performance
  - Polyphonic transcription from ensemble recordings is challenging
  - Walking bass line can provide first clues about local harmonic changes
- Features for style & performer classification
Problem Setting
- Challenges
  - Bass is typically not salient
  - Overlap with drums (bass drum) and piano (lower register)
  - High variability of recording quality and playing styles
  - Example: Lester Young: Body and Soul
- Goals
  - Train a DNN to extract a bass pitch saliency representation
  - Postprocessing: use manual beat annotations to obtain beat-wise bass pitch values
Outline
- Dataset
- Approach
  - Bass Saliency Mapping
  - Semi-Supervised Model Training
  - Beat-Informed Late Fusion
- Evaluation
- Results
- Summary & Outlook
Dataset
- Weimar Jazz Database (WJD) [1]
  - 456 high-quality jazz solo transcriptions
  - Annotations: solo melody, beats, chords, segments (phrase, chorus, ...)
  - 41 files with bass annotations
- Data augmentation (+): pitch-shifting by ±1 semitone (sox library [2])
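The augmentation step above can be sketched as follows, assuming the sox command-line tool [2] is used for the ±1-semitone shifts; the file names and helper name are illustrative, not the authors' actual code. sox's `pitch` effect takes the shift in cents (100 cents = 1 semitone).

```python
# Sketch: build one sox command per pitch shift for a single input file.
# sox's "pitch" effect expects the shift in cents (100 cents = 1 semitone).

def augmentation_commands(infile, semitones=(-1, 1)):
    """Return sox command lines that pitch-shift `infile` by each semitone offset."""
    cmds = []
    for n in semitones:
        outfile = infile.replace(".wav", f"_shift{n:+d}.wav")
        cmds.append(["sox", infile, outfile, "pitch", str(100 * n)])
    return cmds

for cmd in augmentation_commands("bass_track.wav"):
    print(" ".join(cmd))
```

Each command could then be run with `subprocess.run`, tripling the training material (original plus two shifted copies).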
Bass-Saliency Mapping
- Data-driven approach
  - Use a Deep Neural Network (DNN) to learn a mapping from the magnitude spectrogram to a bass saliency representation
- Spectral analysis
  - Resampling to 22.05 kHz
  - Constant-Q magnitude spectrogram (librosa [3])
  - Pitch range: MIDI pitch 28 (41.2 Hz) to 67 (392 Hz)
- Multilabel classification
  - Input dimensions: 40 * NContextFrames
  - Output dimensions: 40
  - Learning target: bass pitch annotations from the WJD
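The input layout above (40 pitch bins times NContextFrames) can be illustrated with a small NumPy sketch; `spec` stands for the constant-Q magnitude spectrogram, and the function name and edge-padding choice are assumptions, not the authors' exact implementation.

```python
import numpy as np

def stack_frames(spec, n_context=5):
    """Stack n_context neighbouring spectrogram frames into one input vector per frame.

    spec: array of shape (n_bins, n_frames), here (40, T).
    Returns an array of shape (n_bins * n_context, n_frames).
    """
    n_bins, n_frames = spec.shape
    half = n_context // 2
    # Pad at the edges so every frame has a full context window.
    padded = np.pad(spec, ((0, 0), (half, half)), mode="edge")
    # Column t of the output gathers frames t-half .. t+half of the original.
    out = np.stack([padded[:, t:t + n_frames] for t in range(n_context)])
    return out.reshape(n_context * n_bins, n_frames)

spec = np.random.rand(40, 100)          # 40 pitch bins, 100 frames
X = stack_frames(spec, n_context=5)
print(X.shape)                          # (200, 100): 40 * 5 inputs per frame
```

With 5 context frames this yields the 200-dimensional input per frame implied by the slide, while the target stays a 40-dimensional saliency vector.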
DNN Hyperparameters
- Layer-wise training [4, 5]
  - Least-squares estimate for weight & bias initialization
- 3-5 fully connected layers, MSE loss
- Frame stacking (3-5 context frames) & feature normalization
- Activation functions: ReLU, sigmoid (final layer)
- Dropout & L2 weight regularization
- Adadelta optimizer
  - Mini-batch size = 500
  - 500 epochs / layer
  - Learning rate = 1
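A minimal NumPy sketch of the network shape described on this slide: fully connected layers with ReLU hidden activations, a sigmoid output layer, and an MSE loss. The layer sizes and the random initialization are illustrative assumptions (the authors initialize with a least-squares estimate and train layer-wise with Adadelta).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layer(n_in, n_out):
    # Simplification: random init instead of the least-squares estimate [4, 5].
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

# Illustrative sizes: 200 stacked inputs (40 bins * 5 context frames) -> 40 pitches.
layers = [init_layer(200, 128), init_layer(128, 128), init_layer(128, 40)]

def forward(x):
    """Forward pass: ReLU on hidden layers, sigmoid on the final layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        x = sigmoid(x) if i == len(layers) - 1 else relu(x)
    return x

def mse(pred, target):
    return np.mean((pred - target) ** 2)

x = rng.random((500, 200))   # one mini-batch of 500 stacked frames
y = forward(x)
print(y.shape)               # (500, 40): per-frame bass pitch saliency
```

The sigmoid output keeps each of the 40 pitch activations in [0, 1], matching the multilabel saliency formulation.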
Semi-Supervised Training
- Goal: use predictions on unseen data as additional training data
[Diagram legend: F = features, T = targets]
Semi-Supervised Training: Sparsity-based Selection
- Train model on labelled dataset D1+ (3899 notes)
- Predict on unlabelled dataset D2+ (11697 notes)
- Select additional training data whose saliency sparsity [6] exceeds a threshold t
- Re-train the model on the enlarged set
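The selection step above can be sketched with the sparseness measure of Hoyer [6]; the threshold value and function names here are assumptions for illustration.

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer sparseness [6]: 1 for a one-hot vector, 0 for a constant vector."""
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

def select_frames(saliency, threshold=0.6):
    """Keep indices of predicted frames whose pitch saliency is sparse enough."""
    scores = np.array([hoyer_sparseness(frame) for frame in saliency.T])
    return np.where(scores > threshold)[0]

one_hot = np.eye(40)[:, :1]      # maximally sparse frame (one active pitch)
flat = np.full((40, 1), 0.5)     # maximally dense frame (no clear pitch)
sal = np.hstack([one_hot, flat])
print(select_frames(sal))        # [0]: only the sparse frame is kept
```

Intuitively, frames where the model is confident produce a near one-hot saliency column and pass the threshold, while ambiguous frames are discarded rather than fed back as pseudo-labels.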
Beat-Informed Late Fusion
- Use manual beat annotations from the Weimar Jazz Database
- Find the most salient pitch per beat
[Figure: frame-wise saliency grouped into beats 1-4]
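A small sketch of this fusion step, assuming beat positions are given as frame indices and the saliency map uses MIDI pitch 28 as its lowest bin; the averaging within each beat is one plausible aggregation, not necessarily the authors' exact choice.

```python
import numpy as np

def beatwise_pitches(saliency, beat_frames, midi_offset=28):
    """Aggregate frame-wise saliency within each beat and keep the most salient pitch.

    saliency: array of shape (40, n_frames); beat_frames: frame index per beat.
    Returns one MIDI pitch per beat interval.
    """
    pitches = []
    for start, end in zip(beat_frames[:-1], beat_frames[1:]):
        segment = saliency[:, start:end].mean(axis=1)   # average over the beat
        pitches.append(midi_offset + int(np.argmax(segment)))
    return pitches

saliency = np.zeros((40, 8))
saliency[22, 0:4] = 1.0   # bin 22 -> MIDI 50 (D3) active in the first beat
saliency[17, 4:8] = 1.0   # bin 17 -> MIDI 45 (A2) active in the second beat
print(beatwise_pitches(saliency, [0, 4, 8]))  # [50, 45]
```

This reduces the frame-wise saliency map to the beat-wise pitch representation assumed for walking bass lines (one quarter-note pitch per beat).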
Evaluation
- Use manual beat annotations from the Weimar Jazz Database
- Compare against state-of-the-art bass transcription algorithms
  - D: Dittmar, Dressler, and Rosenbauer [8]
  - SG: Salamon, Serrà, and Gómez [9]
  - RK: Ryynänen and Klapuri [7]
Example
- Chet Baker: Let's Get Lost (0:04-0:09)
- Initial model
  - M1: without data augmentation
  - M1+: with data augmentation
- Semi-supervised learning
  - M2(0,+): threshold t0
  - M2(1,+): threshold t1
  - M2(2,+): threshold t2
  - M2(3,+): threshold t3
- Reference methods
  - D: Dittmar et al.
  - SG: Salamon et al.
  - RK: Ryynänen & Klapuri
Results
Summary
- The data-driven approach seems to enhance non-salient instruments.
- Beneficial:
  - Data augmentation & dataset enlargement
  - Frame stacking (stable bass pitches)
  - Beat-informed late fusion
- Semi-supervised training did not improve accuracy but made bass-saliency maps sparser
- The model is limited to the training set's pitch range
Acknowledgements
- Thanks to the authors for sharing bass transcription algorithms / results.
- Thanks to the students for transcribing the bass lines.
- German Research Foundation
  - DFG PF 669/7-2: Jazzomat Research Project (2012-2017)
  - DFG MU 2686/6-1: Stefan Balke and Meinard Müller
References
[1] Weimar Jazz Database: http://jazzomat.hfm-weimar.de
[2] sox: http://sox.soundforge.net
[3] McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., and Nieto, O., "librosa: Audio and Music Signal Analysis in Python," in Proc. of the Scientific Computing with Python Conference (SciPy), Austin, Texas, 2015.
[4] Uhlich, S., Giron, F., and Mitsufuji, Y., "Deep neural network based instrument extraction from music," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2135-2139, Brisbane, Australia, 2015.
[5] Balke, S., Dittmar, C., Abeßer, J., and Müller, M., "Data-Driven Solo Voice Enhancement for Jazz Music Retrieval," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.
[6] Hoyer, P. O., "Non-negative Matrix Factorization with Sparseness Constraints," J. of Machine Learning Research, 5, pp. 1457-1469, 2004.
[7] Ryynänen, M. P. and Klapuri, A., "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music J., 32, pp. 72-86, 2008.
[8] Dittmar, C., Dressler, K., and Rosenbauer, K., "A Toolbox for Automatic Transcription of Polyphonic Music," in Proc. of the Audio Mostly Conf., pp. 58-65, 2007.
[9] Salamon, J., Serrà, J., and Gómez, E., "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming," Int. J. of Multimedia Inf. Retrieval, 2, pp. 45-58, 2013.
- Jazzomat Research Project & Weimar Jazz Database: http://jazzomat.hfm-weimar.de/
- Python code and trained model available: https://github.com/jakobabesser/walking_bass_transcription_dnn
- Additional online demos: https://www.audiolabs-erlangen.de/resources/MIR/2017-AES-WalkingBassTranscription
Thank You!