
  • Fraunhofer IDMT

Jakob Abeßer1,3, Stefan Balke2, Klaus Frieler3

Martin Pfleiderer3, Meinard Müller2

    Deep Learning for Jazz Walking Bass Transcription

    1 Semantic Music Technologies Group, Fraunhofer IDMT, Ilmenau, Germany
    2 International Audio Laboratories Erlangen, Germany
    3 Jazzomat Research Project, University of Music Franz Liszt, Weimar, Germany

  • Fraunhofer IDMT 2

    Motivation: What is a Walking Bass Line?

    - Example: Miles Davis: So What (Paul Chambers, bass)
    - Our assumptions for this work:
      - Quarter notes (mostly chord tones)
      - Representation: beat-wise pitch values

    [Figure: score excerpt of a walking bass line over Dm7 (D, F, A, C); beat-wise pitches: C A F A D F A D A D A F A F]

  • Fraunhofer IDMT 3

    Motivation: How is this useful?

    - Harmonic analysis
      - Composition (lead sheet) vs. actual performance
      - Polyphonic transcription from ensemble recordings is challenging
      - Walking bass line can provide first clues about local harmonic changes
    - Features for style & performer classification

  • Fraunhofer IDMT 4

    Problem Setting

    - Challenges
      - Bass is typically not salient
      - Overlap with drums (bass drum) and piano (lower register)
      - High variability of recording quality and playing styles
      - Example: Lester Young: Body and Soul
    - Goals
      - Train a DNN to extract a bass pitch saliency representation
      - Postprocessing: use manual beat annotations to obtain beat-wise bass pitches

  • Fraunhofer IDMT 5

    Outline

    - Dataset
    - Approach
      - Bass Saliency Mapping
      - Semi-Supervised Model Training
      - Beat-Informed Late Fusion
    - Evaluation
    - Results
    - Summary & Outlook

  • Fraunhofer IDMT 6

    Dataset

    - Weimar Jazz Database (WJD) [1]
      - 456 high-quality jazz solo transcriptions
      - Annotations: solo melody, beats, chords, segments (phrase, chorus, ...)
      - 41 files with bass annotations
    - Data augmentation (+): pitch-shifting by ±1 semitone (sox library [2]); see the sketch below
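    A minimal sketch of this augmentation step, assuming SoX is installed on the PATH; the file names are placeholders, and SoX's pitch effect is specified in cents (100 per semitone).

    import subprocess

    def pitch_shift(in_wav, out_wav, semitones):
        """Render a pitch-shifted copy of in_wav with the SoX command-line tool."""
        subprocess.run(["sox", in_wav, out_wav, "pitch", str(semitones * 100)],
                       check=True)

    # Augment each labelled file with +/-1 semitone versions.
    for shift in (-1, 1):
        pitch_shift("solo.wav", "solo_shift%+d.wav" % shift, shift)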

  • Fraunhofer IDMT 7

    Bass-Saliency Mapping

    - Data-driven approach
      - Use a Deep Neural Network (DNN) to learn a mapping from the magnitude spectrogram to a bass saliency representation
    - Spectral analysis (see the sketch after this list)
      - Resampling to 22.05 kHz
      - Constant-Q magnitude spectrogram (librosa [3])
      - Pitch range: MIDI pitch 28 (41.2 Hz) to 67 (392 Hz)
    - Multilabel classification
      - Input dimensions: 40 x NContextFrames
      - Output dimensions: 40
      - Learning target: bass pitch annotations from the WJD
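    A minimal sketch of this feature extraction with librosa [3]: the sample rate, pitch range, and bin count follow the slide, while the hop length and the frame-stacking helper are assumptions.

    import librosa
    import numpy as np

    # Constant-Q magnitude spectrogram: 40 semitone bins from MIDI pitch 28
    # (41.2 Hz) to 67 (392 Hz); the hop length is an assumption.
    y, sr = librosa.load("solo.wav", sr=22050)
    S = np.abs(librosa.cqt(y, sr=sr, fmin=librosa.midi_to_hz(28),
                           n_bins=40, bins_per_octave=12, hop_length=512))

    def stack_frames(S, n_context=5):
        """Stack n_context neighbouring frames into one input vector per frame."""
        pad = n_context // 2
        Sp = np.pad(S, ((0, 0), (pad, pad)), mode="edge")
        return np.hstack([Sp[:, i:i + S.shape[1]].T for i in range(n_context)])

    X = stack_frames(S)  # shape: (frames, 40 * n_context), matching the input layer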

  • Fraunhofer IDMT 8

    DNN Hyperparameters

    - Layer-wise training [4, 5]
      - Least-squares estimate for weight & bias initialization
    - 3–5 fully connected layers, MSE loss (see the sketch below)
    - Frame stacking (3–5 context frames) & feature normalization
    - Activation functions: ReLU (hidden layers), sigmoid (final layer)
    - Dropout & L2 weight regularization
    - Adadelta optimizer
      - Mini-batch size = 500
      - 500 epochs per layer
      - Learning rate = 1
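    For illustration, a hedged Keras sketch of a network with these hyperparameters; the layer widths and dropout rate are assumptions, and it trains end-to-end rather than layer-wise with least-squares initialization as in [4, 5].

    import tensorflow as tf

    n_context = 5  # stacked context frames (the slides use 3-5)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(40 * n_context,)),
        tf.keras.layers.Dense(512, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(512, activation="relu",
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(40, activation="sigmoid"),  # per-pitch bass saliency
    ])
    model.compile(optimizer=tf.keras.optimizers.Adadelta(learning_rate=1.0),
                  loss="mse")                    # MSE loss as on the slide
    # model.fit(X, T, batch_size=500, epochs=500)  # X: stacked CQT frames, T: targets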

  • Fraunhofer IDMT 9

    Semi-Supervised Training

    - Goal: select model predictions on unseen data as additional training data

    [Figure: semi-supervised training scheme; F = features, T = targets]

  • Fraunhofer IDMT 10

    Semi-Supervised Training: Sparsity-based Selection

    - Train model on labelled dataset D1+ (3899 notes)
    - Predict on unlabelled dataset D2+ (11697 notes)
    - Select additional training data whose prediction sparsity [6] exceeds a threshold t (see the sketch below)
    - Re-train the model
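    A small sketch of the selection criterion, using Hoyer's sparseness measure from [6]; the threshold value and the (pitches x frames) saliency layout are assumptions.

    import numpy as np

    def hoyer_sparsity(x):
        """Hoyer's sparseness [6]: 1 for a one-hot vector, 0 for a flat one."""
        n = x.size
        l1, l2 = np.abs(x).sum(), np.linalg.norm(x)
        return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

    # S_pred: predicted (40 x frames) saliency map on the unlabelled data;
    # a random stand-in here, with t = 0.6 as a placeholder threshold.
    S_pred = np.random.rand(40, 1000)
    keep = [f for f in range(S_pred.shape[1]) if hoyer_sparsity(S_pred[:, f]) > 0.6]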

  • Fraunhofer IDMT 11

    Beat-Informed Late Fusion

    - Use manual beat annotations from the Weimar Jazz Database
    - Find the most salient pitch per beat (see the sketch below)

    [Figure: frame-wise saliency values are aggregated within each beat interval (beats 1-4)]
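    A minimal sketch of this fusion step, assuming S is the (40 x frames) saliency map and the annotated beats are given as frame indices; summing saliency within each beat interval is one plausible aggregation.

    import numpy as np

    def beatwise_pitch(S, beat_frames, midi_min=28):
        """Return the most salient MIDI pitch for each beat interval."""
        pitches = []
        for start, end in zip(beat_frames[:-1], beat_frames[1:]):
            # Aggregate saliency over the beat's frames, then pick the peak bin.
            pitches.append(midi_min + int(np.argmax(S[:, start:end].sum(axis=1))))
        return pitches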

  • Fraunhofer IDMT 12

    Evaluation

    - Use manual beat annotations from the Weimar Jazz Database
    - Compare against state-of-the-art bass transcription algorithms (a hedged metric sketch follows below):
      - D: Dittmar, Dressler, and Rosenbauer [8]
      - SG: Salamon, Serrà, and Gómez [9]
      - RK: Ryynänen and Klapuri [7]
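    The slides do not name the evaluation metric; a plausible beat-wise check, assuming pitch accuracy against the WJD bass annotations, could look like this.

    import numpy as np

    def pitch_accuracy(pred_pitches, ref_pitches):
        """Fraction of beats whose predicted MIDI pitch matches the annotation."""
        pred, ref = np.asarray(pred_pitches), np.asarray(ref_pitches)
        return float(np.mean(pred == ref))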

  • Fraunhofer IDMT 13

    Example

    - Chet Baker: Let's Get Lost (0:04–0:09)

    [Figure: bass saliency maps for this excerpt, comparing the initial models M1 (without data augmentation) and M1+ (with data augmentation), the semi-supervised models M2^0,+ through M2^3,+ (selection thresholds t0 through t3), and the baselines D (Dittmar et al.), SG (Salamon et al.), and RK (Ryynänen & Klapuri)]

  • Fraunhofer IDMT 14

    Results

  • Fraunhofer IDMT 15

    Summary

    - Data-driven approach seems to enhance non-salient instruments.
    - Beneficial:
      - Data augmentation & dataset enlargement
      - Frame stacking (stable bass pitches)
      - Beat-informed late fusion
    - Semi-supervised training did not improve accuracy but made bass saliency maps sparser
    - Model is limited to the training set's pitch range

  • Fraunhofer IDMT 16

    Acknowledgements

    - Thanks to the authors for sharing bass transcription algorithms / results.
    - Thanks to the students for transcribing the bass lines.
    - German Research Foundation:
      - DFG PF 669/7-2: Jazzomat Research Project (2012–2017)
      - DFG MU 2686/6-1: Stefan Balke and Meinard Müller

  • Fraunhofer IDMT 17

    References

    [1] Weimar Jazz Database: http://jazzomat.hfm-weimar.de
    [2] SoX: http://sox.sourceforge.net
    [3] McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., and Nieto, O., "librosa: Audio and Music Signal Analysis in Python," in Proc. of the Scientific Computing with Python Conference (SciPy), Austin, Texas, 2015.
    [4] Uhlich, S., Giron, F., and Mitsufuji, Y., "Deep neural network based instrument extraction from music," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2135–2139, Brisbane, Australia, 2015.
    [5] Balke, S., Dittmar, C., Abeßer, J., and Müller, M., "Data-Driven Solo Voice Enhancement for Jazz Music Retrieval," in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.
    [6] Hoyer, P. O., "Non-negative Matrix Factorization with Sparseness Constraints," J. of Machine Learning Research, 5, pp. 1457–1469, 2004.
    [7] Ryynänen, M. P. and Klapuri, A., "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music J., 32, pp. 72–86, 2008.
    [8] Dittmar, C., Dressler, K., and Rosenbauer, K., "A Toolbox for Automatic Transcription of Polyphonic Music," in Proc. of the Audio Mostly Conf., pp. 58–65, 2007.
    [9] Salamon, J., Serrà, J., and Gómez, E., "Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming," Int. J. of Multimedia Inf. Retrieval, 2, pp. 45–58, 2013.

  • Fraunhofer IDMT 18

    - Jazzomat Research Project & Weimar Jazz Database
      - http://jazzomat.hfm-weimar.de/
    - Python code and trained model available:
      - https://github.com/jakobabesser/walking_bass_transcription_dnn
    - Additional online demos:
      - https://www.audiolabs-erlangen.de/resources/MIR/2017-AES-WalkingBassTranscription

    Thank You!