Audio-visual Source Association for String Ensembles through Multi-modal Vibrato Analysis
Bochen Li, Chenliang Xu, Zhiyao Duan
University of Rochester
14th Sound and Music Computing Conference, July 5–8, 2017, Espoo, Finland
Background
• Music → a multi-modal art form
• Seeing and listening → more enjoyment
• Music video streaming services are popular
[Chart: Music 38.4%; Others]
Background
Multi-modal MIR
• Instrument Recognition
• Playing Activity Detection
• Polyphonic Music Analysis
• Fingering Estimation
• Conductor Following
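The Problem – Audio-visual Source Association
[Figure: a string music performance, with the detected players on one side and the separated sound tracks on the other; audio-visual source association links each player to a track]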
The Problem – Audio-visual Source Association
Application
• Intuitive and user-friendly interaction with music performance videos
• Smart Music Editor
• Concert cameras automatically take close-up shots of the leading player/instrument
Prior Work
Bow Motion Analysis
• Bow motion ↔ note onsets
Prior Work
Limitations
• Ambiguous when players have the same rhythm: their note onsets coincide, so bow motion alone cannot tell them apart
Proposed System Overview
Vibrato Features for String Instruments
• Vibrato → audio pitch fluctuations
• Vibrato → fine motions of the left hand
• Correlate pitch fluctuations with fine motions of the left hand
Method – Audio Analysis
Score-informed Source Separation [2]
• Audio-score alignment
• Harmonic mask
Vibrato Extraction
• Score-informed pitch refinement on separated sources
• Auto-correlation on pitch trajectory
[2] Z. Duan and B. Pardo, “Soundprism: An online system for score-informed source separation of music audio,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp., 2011.
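The vibrato extraction step (auto-correlation on the pitch trajectory) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 4–8 Hz search range, the 0.5 threshold, and the use of MIDI-number pitch values sampled at a fixed frame rate are all assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a mean-removed sequence for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    energy = sum(v * v for v in d)
    if energy == 0.0:
        # A perfectly flat trajectory has no periodicity to measure.
        return [0.0] * (max_lag + 1)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]

def detect_vibrato(pitch, frame_rate, min_hz=4.0, max_hz=8.0, threshold=0.5):
    """Return (is_vibrato, rate_hz) for the pitch trajectory of one note.

    min_hz, max_hz and threshold are illustrative assumptions.
    """
    min_lag = max(1, int(frame_rate / max_hz))
    max_lag = min(len(pitch) - 1, int(math.ceil(frame_rate / min_hz)))
    if max_lag <= min_lag:
        return False, 0.0  # note too short to cover one vibrato period
    ac = autocorrelation(pitch, max_lag)
    # The strongest autocorrelation peak in the search range gives the period.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: ac[k])
    return ac[best_lag] >= threshold, frame_rate / best_lag

# Usage: a 1-second note at 100 frames/s with a 5 Hz pitch oscillation.
frames = [69.0 + 0.3 * math.sin(2 * math.pi * 5.0 * t / 100.0) for t in range(100)]
is_vibrato, rate_hz = detect_vibrato(frames, frame_rate=100.0)
```

Notes flagged as vibrato here would be the ones passed on to the audio-visual matching stage; non-vibrato notes carry no usable pitch fluctuation.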
Method – Video Analysis
Hand Tracking
• Kanade-Lucas-Tomasi (KLT) tracker with 30 feature points
• Bounding box: 70 × 70 pixels, centered at the median position of the feature points
• Re-initialize feature points every 20 frames
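The bounding-box rule above can be sketched without any tracking library. The box size and re-initialization interval come from the slide; taking the median separately on each axis, and the function names, are assumptions made for illustration. The KLT tracker itself is not shown.

```python
from statistics import median

BOX_SIZE = 70         # pixels, from the slide
REINIT_INTERVAL = 20  # frames, from the slide

def hand_bounding_box(points, box_size=BOX_SIZE):
    """Return (left, top, right, bottom) of a square box centered at the
    per-axis median of the tracked feature points (assumed definition)."""
    cx = median(p[0] for p in points)
    cy = median(p[1] for p in points)
    half = box_size / 2
    return (cx - half, cy - half, cx + half, cy + half)

def needs_reinit(frame_index, interval=REINIT_INTERVAL):
    """True on frames where the feature points should be re-initialized."""
    return frame_index % interval == 0

# Usage: one drifting outlier point barely moves the box, which is the
# practical benefit of a median over a mean.
box = hand_bounding_box([(100, 200), (110, 210), (300, 50)])
```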
Method – Video Analysis
Fine-grained Motion Capture
• Optical flow estimation → pixel-level motion velocities
• Average the motion velocities within the bounding box
• Subtract the moving average of this mean velocity to eliminate body motion
[Figure: original frame; color-encoded optical flow; v(t)]
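The slide's equations for these two steps did not survive extraction, so the following is only a sketch of what the bullets describe. The window length, the centered (rather than causal) moving average, and the function names are assumptions; the optical flow vectors are taken as given.

```python
def mean_velocity(flow_vectors):
    """Average the (vx, vy) flow vectors of the pixels inside the bounding box."""
    n = len(flow_vectors)
    return (sum(v[0] for v in flow_vectors) / n,
            sum(v[1] for v in flow_vectors) / n)

def remove_moving_average(series, window=11):
    """Subtract a centered moving average from a sequence of (vx, vy) pairs.

    The window is truncated at the sequence ends. window=11 is an
    illustrative value, not one taken from the slides.
    """
    half = window // 2
    out = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        n = hi - lo
        avg_x = sum(s[0] for s in series[lo:hi]) / n
        avg_y = sum(s[1] for s in series[lo:hi]) / n
        out.append((series[t][0] - avg_x, series[t][1] - avg_y))
    return out

# Usage: a constant drift (slow body motion) is removed entirely.
residual = remove_moving_average([(2.0, -1.0)] * 30)
```

Slow body sway changes little within one window, so it cancels, while the faster vibrato oscillation passes through.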
Method – Video Analysis
Fine-grained Motion Capture
• Principal Component Analysis (PCA) → identify the principal motion along the fingerboard → 1-D motion velocity curve V(t)
• Integration of V(t) → motion displacement curve
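For 2-D velocity vectors, the PCA and integration steps can be sketched in a few lines. The closed-form principal-axis angle for a 2 × 2 covariance matrix is standard; the function names, the frame period `dt`, and the rectangular (cumulative-sum) integration rule are assumptions made for illustration.

```python
import math

def principal_direction(vectors):
    """Unit vector along the first principal component of 2-D vectors."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / n
    syy = sum((v[1] - my) ** 2 for v in vectors) / n
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def project(vectors, direction):
    """1-D velocity curve: each vector projected onto the principal direction.

    The vectors are assumed to be roughly zero-mean already, so they are
    not re-centered here.
    """
    dx, dy = direction
    return [v[0] * dx + v[1] * dy for v in vectors]

def integrate(velocity, dt):
    """Cumulative sum of velocity * dt, giving a displacement curve."""
    out, total = [], 0.0
    for v in velocity:
        total += v * dt
        out.append(total)
    return out

# Usage: motion confined to the (0.6, 0.8) direction.
amplitudes = [1.0, -2.0, 3.0, -1.0, 0.5]
vectors = [(0.6 * a, 0.8 * a) for a in amplitudes]
direction = principal_direction(vectors)
displacement = integrate(project(vectors, direction), dt=1.0 / 30.0)
```

The sign of a principal component is arbitrary, so a displacement curve obtained this way is defined only up to a sign flip.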
Method – Source-player Association
[Figure: normalized pitch contour and motion displacement curve over one note, shown for the associated player and for a non-associated player]
Method – Source-player Association
• Note-level matching score → cross-correlation between normalized pitch and normalized motion
• Track-level matching score → sum of the note-level matching scores
[Equation annotations: audio track index (p), player index, note index, normalized pitch, normalized motion, total number of vibrato notes in the p-th track]
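Since the slide's formulas were lost, the sketch below shows one plausible reading of the two scores rather than the authors' exact definitions. It assumes the pitch contour and motion displacement curve of each note are sampled on a common time grid, uses zero-mean, unit-variance normalization, and evaluates the cross-correlation at zero lag only; all names are hypothetical.

```python
import math

def normalize(x):
    """Zero-mean, unit-variance version of a sequence (all zeros if constant)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in x]

def note_score(pitch, motion):
    """Zero-lag normalized cross-correlation of one note's pitch and motion."""
    if len(pitch) != len(motion):
        raise ValueError("pitch and motion must share the same time grid")
    p, m = normalize(pitch), normalize(motion)
    return sum(a * b for a, b in zip(p, m)) / len(p)

def track_score(notes_pitch, notes_motion):
    """Sum of note-level scores over the vibrato notes of one track,
    matched against the motion of one candidate player."""
    return sum(note_score(p, m) for p, m in zip(notes_pitch, notes_motion))

# Usage: motion that follows the pitch scores higher than unrelated motion.
pitch = [math.sin(0.5 * i) for i in range(40)]
matching_motion = [3.0 * v + 2.0 for v in pitch]
other_motion = [math.cos(0.5 * i) for i in range(40)]
scores = (note_score(pitch, matching_motion), note_score(pitch, other_motion))
```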
Method – Source-player Association
Matrix of track-level matching scores M (example with 4 tracks and 4 players):
M1,1 M2,1 M3,1 M4,1
M1,2 M2,2 M3,2 M4,2
M1,3 M2,3 M3,3 M4,3
M1,4 M2,4 M3,4 M4,4
• Association score: computed for one permutation from the track-level matching scores over all tracks (i.e., players)
• Output the permutation that maximizes the association score
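Choosing the best permutation can be sketched as an exhaustive search, which is practical for ensembles of up to five players (at most 120 candidates). The slide's formula for the association score was not recovered, so summing the selected matrix entries is an assumption here, and so are the `M[p][q]` index order (track p, player q) and the function name.

```python
from itertools import permutations

def best_association(M):
    """Return (permutation, score), where permutation[p] is the player
    assigned to track p and the score is the sum of the selected entries
    (an assumed combination rule)."""
    n = len(M)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(M[p][perm[p]] for p in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

# Usage: three tracks, three players, made-up scores.
M = [
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
assignment, score = best_association(M)  # assignment == (1, 0, 2)
```

If the association score is a product of track-level scores instead, only the line computing `score` changes.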
Experiments
Dataset: URMP Dataset [3]
• Individually recorded and assembled together
• 14 instruments, 44 piece arrangements
[3] B. Li*, X. Liu*, K. Dinesh, Z. Duan, and G. Sharma, “Creating a musical performance dataset for multimodal music analysis: Challenges, insights, and applications,” IEEE Trans. Multimedia, under review.
Experiments
Piece Selection
• 19 pieces → 5 duets, 4 trios, 7 quartets, 3 quintets
• Selection criterion: each piece contains at most 1 non-string instrument
• Same set as the baseline system (bow motion ↔ note onset)
Evaluation Measure
• Note-level Matching Accuracy: the % of vibrato notes that are best matched to the correct player, according to the note-level matching score
• Piece-level Association Accuracy: the % of pieces for which the correct association is returned, according to the piece-level association score
(As polyphony increases, the number of incorrect candidate associations grows factorially)
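The factorial growth can be made concrete with a short calculation. It assumes a uniform random guess: a note is matched to one of N players, and a piece-level association is one of the N! permutations, of which exactly one is correct. The function name is hypothetical.

```python
from math import factorial

def chance_levels(num_players):
    """Return (note_level, piece_level) accuracy of a uniform random guess."""
    return 1.0 / num_players, 1.0 / factorial(num_players)

# Usage: duets through quintets, the ensemble sizes in the piece selection.
for n in range(2, 6):
    note_chance, piece_chance = chance_levels(n)
    wrong_candidates = factorial(n) - 1
    print(n, note_chance, piece_chance, wrong_candidates)
```

For a quintet there are 119 incorrect candidates, so a random guess is correct for less than 1% of pieces.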
Experiments
Results: Note-level Matching Accuracy
[Figure: note-level matching accuracy, with median/mean markers and the accuracy by random guess]
Experiments
Results: Piece-level Association Accuracy
• Overall accuracy: 94.7% (18 out of 19)
• Baseline (based on bow motion/audio onset): 89.5%
• Error case: no vibrato is used in the performance
Conclusions & Future Work
Conclusions
• Audio-visual source association for string music, by correlating pitch fluctuations and left-hand motions
• Highly effective, not demanding on camera angles
• Limitation: vibrato is not guaranteed to appear in all pieces
Future Work
• Combine all motion features in string music
Bow & Vibrato & Body movement & …
• Video → vibrato analysis (rate & extent)
From monophonic to polyphonic
• Extend to woodwind & brass instruments
• Audio-visual source separation