audio: generation & extractionfidler/teaching/2015/slides/csc2523/... · experiment 1- learning...
TRANSCRIPT
Audio: Generation & Extraction
Charu Jaiswal
Music Composition – which approach?
• Feedforward NN can’t store information about the past (or keep track of position in a song)
• RNN as a single-step predictor struggles with composition, too
  • Vanishing gradients mean error flow vanishes or grows exponentially
  • Network can’t deal with long-term dependencies
• But music is all about long-term dependencies!
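
The vanishing or exploding error flow mentioned above can be illustrated with a minimal sketch. It assumes a scalar linear RNN whose only parameter is a single recurrent weight, and it ignores the activation derivative; the function name and the weight values are hypothetical and chosen only for illustration.

```python
def backprop_error_norms(recurrent_weight, steps):
    """Return |error| after each backward step through a scalar linear RNN.

    Simplifying assumption: the error is scaled by the same recurrent
    weight at every time step (no activation derivative).
    """
    error = 1.0
    norms = []
    for _ in range(steps):
        error *= recurrent_weight  # one step of backpropagation through time
        norms.append(abs(error))
    return norms


shrinking = backprop_error_norms(0.5, 32)  # weight < 1: error vanishes
growing = backprop_error_norms(1.5, 32)    # weight > 1: error blows up
constant = backprop_error_norms(1.0, 32)   # weight = 1: constant error flow
```

Only the weight of exactly 1 keeps the error unchanged over 32 steps, which is the "constant error flow" property that LSTM is designed around.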
Music
• Long-term dependencies define style:
  • Spanning bars and notes contribute to metrical and phrasal structure
• How do we introduce structure at multiple levels?
  • Eck and Schmidhuber → LSTM
Why LSTM?
• Designed to obtain constant error flow through time
• Protects error from perturbations
• Uses linear units to overcome decay problems with RNN
• Input gate: protects flow from perturbation by irrelevant inputs
• Output gate: protects other units from perturbation from irrelevant memory
• Forget gate: resets memory cell when content is obsolete
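
The three gates can be sketched as a single time step of an LSTM cell. This is a minimal NumPy sketch of the commonly used formulation, not necessarily the exact variant (cell blocks and their connectivity) used in the cited work; the parameter names (`W_i`, `W_f`, `W_o`, `W_g`) and the helper `init_params` are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in "ifog":
        params[f"W_{name}"] = rng.normal(0.0, 0.1, size=(n_hidden, n_in + n_hidden))
        params[f"b_{name}"] = np.zeros(n_hidden)
    return params


def lstm_step(x, h_prev, c_prev, params):
    """One time step: returns the new output h and the new cell state c."""
    z = np.concatenate([x, h_prev])
    i = sigmoid(params["W_i"] @ z + params["b_i"])  # input gate: filters irrelevant inputs
    f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate: resets obsolete content
    o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate: shields other units
    g = np.tanh(params["W_g"] @ z + params["b_g"])  # candidate cell input
    c = f * c_prev + i * g                          # linear (additive) cell update
    h = o * np.tanh(c)
    return h, c
```

When the forget gate is near 1 and the input gate near 0, the cell state passes through a step unchanged, which is how the linear unit avoids the decay seen in a plain RNN.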
Hochreiter & Schmidhuber, 1997
Data Representation
[Figures: chord representation; note representation] (Eck and Schmidhuber, 2002)
Only quarter notes
No rests
Training melodies written by Eck
Dataset of 4096 segments
Experiment 1 – Learning Chords
• Objective: show that LSTM can learn/represent chord structure in the absence of melody
• Network:
  • 4 cell blocks w/ 2 cells each are fully connected to each other + input
  • Output layer is fully connected to all cells and to input layer
• Training & testing: predict probability of a note being on or off
  • Use network predictions for ensuing time steps with decision threshold
  • CAVEAT: treat outputs as statistically independent. This is untrue! (Issue #1)
• Result: generated chord sequences
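
The generation procedure described above (threshold the predicted note probabilities, then feed the result back as the next input) might be sketched as follows. The `predict_next` callable stands in for the trained network and `toy_predictor` is a hypothetical placeholder; a real LSTM would also carry its internal state from step to step, which is omitted here.

```python
import numpy as np


def generate(predict_next, seed_step, n_steps, threshold=0.5):
    """Feed thresholded predictions back in as the input for the next step."""
    current = np.asarray(seed_step, dtype=float)
    sequence = []
    for _ in range(n_steps):
        probs = predict_next(current)  # per-note probability of being "on"
        # Each note is thresholded on its own, i.e. outputs are treated as
        # statistically independent (the caveat noted on the slide).
        current = (probs >= threshold).astype(float)
        sequence.append(current)
    return np.stack(sequence)


def toy_predictor(step):
    """Hypothetical stand-in for the network: shifts active notes up by one."""
    return 0.9 * np.roll(step, 1) + 0.05


generated = generate(toy_predictor, seed_step=[1, 0, 0, 0], n_steps=3)
```

Because each note is decided separately, nothing in this loop prevents note combinations that never co-occur in the training data.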
Experiment 2 – Learning Melody and Chords
• Can LSTM learn chord & melody structure, and use these structures for composition?
• Network:
  • Difference from Exp. 1: chord cell blocks have recurrent connections to themselves + melody; melody cell blocks are only recurrently connected to melody
• Training: predict probability for a note to be on or off
Sample composition
• Training set: http://people.idsia.ch/~juergen/blues/train.32.mp3
• Chord + melody sample: http://people.idsia.ch/~juergen/blues/lstm_0224_1510.32.mp3
Issues
• No objective way to judge quality of compositions
• Repetition and similarity to training set
• Considered notes to be independent
• Limited to quarter notes + no rests
• Uses symbolic representations (modified sheet notation) → how could it handle real-time performance music (MIDI or audio)?
  • Would allow interaction (live improvisation)
Audio Extraction (source separation)
• How do we separate sources?
• Engineering approach: decompose mixed audio signal into a spectrogram, assign each time-frequency element to a source
• Ideal binary mask: each element is attributed to the source with the largest magnitude in the source spectrograms
• This is then used to estimate the reference separation
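
The ideal binary mask can be sketched in a few lines of NumPy. This is a minimal sketch assuming the magnitude spectrograms of the individual sources are available as equally shaped arrays; the function names are hypothetical, and ties are broken in favour of the first source.

```python
import numpy as np


def ideal_binary_masks(source_magnitudes):
    """Boolean masks from an array of shape (n_sources, n_freq, n_frames).

    Each time-frequency element is attributed to the source with the
    largest magnitude there (ties go to the first source).
    """
    mags = np.asarray(source_magnitudes)
    winner = np.argmax(mags, axis=0)
    return np.stack([winner == k for k in range(mags.shape[0])])


def apply_mask(mixture_spectrogram, mask):
    """Keep only the mixture elements attributed to one source."""
    return mixture_spectrogram * mask


vocal = np.array([[3.0, 1.0], [0.0, 2.0]])
backing = np.array([[1.0, 4.0], [5.0, 2.0]])
masks = ideal_binary_masks([vocal, backing])
vocal_estimate = apply_mask(vocal + backing, masks[0])
```

Every element belongs to exactly one source, so the masks of all sources sum to one at each time-frequency position.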
DNN Approach
• Dataset: 63 pop songs (50 for training)
• Binary mask computed: determined by comparing magnitudes of vocal/non-vocal spectrograms and assigning the mask a ‘1’ when vocal had greater magnitude
DNN
• Trained a feed-forward DNN to predict binary masks for separating vocal and non-vocal signals for a song
• Spectrogram window was unpacked into a vector
• Probabilistic binary mask: testing used a sliding window, and the output of the model described predictions of the binary mask in sliding-window format
• Confidence threshold (alpha): Mv binary mask
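
One way to turn sliding-window predictions into a single mask is sketched below. It assumes the overlapping window predictions are averaged per time-frequency element to form the probabilistic mask and that the binary mask keeps elements whose probability exceeds alpha; both the averaging step and the function names are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np


def average_window_predictions(window_preds, hop=1):
    """Average overlapping predictions of shape (n_windows, n_freq, win_len).

    Returns a probabilistic mask of shape (n_freq, n_frames), where
    n_frames = (n_windows - 1) * hop + win_len.
    """
    window_preds = np.asarray(window_preds, dtype=float)
    n_windows, n_freq, win_len = window_preds.shape
    n_frames = (n_windows - 1) * hop + win_len
    total = np.zeros((n_freq, n_frames))
    count = np.zeros((n_freq, n_frames))
    for w in range(n_windows):
        start = w * hop
        total[:, start:start + win_len] += window_preds[w]
        count[:, start:start + win_len] += 1
    return total / np.maximum(count, 1)


def threshold_mask(prob_mask, alpha):
    """Binary mask: True where the averaged prediction exceeds alpha."""
    return prob_mask > alpha


preds = np.array([[[1.0, 1.0]], [[0.0, 0.0]], [[1.0, 1.0]]])
prob_mask = average_window_predictions(preds)
binary_mask = threshold_mask(prob_mask, alpha=0.5)
```

Raising alpha keeps only elements the model is confident about, trading missed vocal energy against leaked non-vocal energy.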
Separation of sources using DNN
Separation quality as a function of alpha
SIR (red) = signal-to-interference ratio
SDR (green) = signal-to-distortion ratio
SAR (blue) = signal-to-artefact ratio
SAR and SIR can be interpreted as energetic equivalents of positive hit rate (SIR) and false positive rate (SAR)
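
The three ratios are energy ratios in dB, and can be sketched as below. The sketch assumes the estimated source has already been decomposed into a target component, an interference component and an artefact component (the decomposition itself and the noise term of the standard BSS Eval definitions are left out); the function names are hypothetical.

```python
import numpy as np


def energy_ratio_db(numerator, denominator):
    """10 * log10 of the energy ratio between two signals."""
    return 10.0 * np.log10(np.sum(numerator ** 2) / np.sum(denominator ** 2))


def separation_ratios(s_target, e_interf, e_artif):
    """Return (SDR, SIR, SAR) in dB for an already decomposed estimate."""
    sdr = energy_ratio_db(s_target, e_interf + e_artif)  # all distortion
    sir = energy_ratio_db(s_target, e_interf)            # leakage from other sources
    sar = energy_ratio_db(s_target + e_interf, e_artif)  # processing artefacts
    return sdr, sir, sar


sdr, sir, sar = separation_ratios(
    s_target=np.array([10.0, 0.0, 0.0]),
    e_interf=np.array([0.0, 1.0, 0.0]),
    e_artif=np.array([0.0, 0.0, 1.0]),
)
```

SDR combines both error types, so it can never exceed SIR when the error components do not cancel each other.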
Like-to-like Comparison
Plots mean SAR as a function of mean SIR for both models
DNN provides ~3 dB better SAR performance for a given SIR index (mean): ~5 dB for vocal signals and only a small advantage for non-vocal signals
DNN seems to have biased its learning toward making good predictions via correct positive identification of vocal sounds
Critique of Paper + Next Steps
• DNN seems to have biased its learning toward making good predictions via correct positive identification of vocal sounds
• Only a small advantage to using DNN vs. traditional approach
• Expand dataset