Trends in Audio and Acoustic Signal Processing
ICASSP 2011
Malcolm Slaney, Yahoo! Research Silicon Valley, USA
Patrick A. Naylor, Imperial College London, UK
What do we mean by ‘Trends’?
ACTIVITY: number of papers submitted
ACHIEVEMENTS: Things we can do now that we couldn’t before
CHALLENGES: Things we want to do but can’t do yet
OPPORTUNITIES: Things we didn’t know we wanted to do
CURIOSITIES: Things that might be interesting but we don’t know what to do with them
Research
Historical Trends – ICASSP Submissions
[Figure: number of submissions per year]
Historical Trends – ICASSP Accepted Papers
[Figure: number of accepted papers per year]
Music at ICASSP
• Three sessions
– SS-L5: Music Signal Processing Exploiting Musical Knowledge
– AE-L3: Music Signal Processing
– AE-P7: Music Signal Processing
• Reasons
– New EDICS
– More content
– Commercially relevant
Big Datasets
• Easy to rip CDs
– Copyright issues
• Million-Song Dataset
– Distribute features
– Columbia and The Echo Nest
MIREX Competition
Musical Separation
• Sound separation
– Uses
• Understanding (key, melody)
• Transcriptions
• Multipitch estimation
– With better models
• HMM
• Scores
– Techniques
• NMF
• Matching Pursuit
• PLCA
From: POLYPHONIC AUDIO-TO-SCORE ALIGNMENT BASED ON BAYESIAN LATENT HARMONIC ALLOCATION HIDDEN MARKOV MODEL. Akira Maezawa,
Hiroshi G. Okuno, Tetsuya Ogata, Kyoto University, Japan; Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan
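NMF, listed among the techniques above, factors a non-negative magnitude spectrogram into spectral templates and their activations over time. The block below is a minimal numpy sketch using the standard Lee-Seung multiplicative updates for the squared Euclidean cost; the function name `nmf`, the random initialisation and the toy spectrogram are illustrative assumptions, not the method of the cited paper.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (freq x time) as W @ H with
    Lee-Seung multiplicative updates for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral templates, one per column
    H = rng.random((rank, n_time)) + eps    # activations, one row per template
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps only guards
        # against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "magnitude spectrogram": two fixed spectra switching on and off.
rng = np.random.default_rng(1)
true_W = rng.random((64, 2))
true_H = (rng.random((2, 100)) > 0.5).astype(float)
V = true_W @ true_H

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a separation setting, each column of `W` would typically be a note or instrument spectrum and each row of `H` its activation; scaling the mixture by one component's share of `W @ H` (a soft mask) gives an estimate of that component.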
Music Research
• Tagging
– Genre
– Emotion
• Miscellaneous
– Morphing
– Similarity
AASP-P2.1: SOUND MORPHING BY FEATURE INTERPOLATION, Marcelo Caetano, Xavier Rodet, Institut de Recherche
et Coordination Acoustique/Musique, France
Applications vs. Algorithms?
If I’m going to be Queen, I suppose I will not have much time left for Audio and Acoustic Signal Processing
My Expectation-Maximization algorithm has converged!
If there is a trend towards things that look nice (applications), let’s not
lose sight of the fundamental power behind them (algorithms).
Microphone Array Signal Processing
APPLICATIONS
• Hearing aids
• TV / Entertainment
GEOMETRY and DISTRIBUTION
• Linear, planar/cylindrical, spherical, distributed
• Spacing and orientation
TASKS
• Localization of sources
• Tracking
• Extraction/Separation
• Inference of room geometry
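For the localization task listed above, a common building block with a pair of microphones is estimating the time difference of arrival (TDOA) with generalized cross-correlation and phase transform weighting (GCC-PHAT). A minimal numpy sketch, assuming two equal-length signals and a simulated integer-sample delay; `gcc_phat` is a hypothetical helper name, not a library function.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay (seconds) of y relative to x using GCC-PHAT.
    A positive result means y lags x."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Simulated pair: y is white noise x delayed by 12 samples.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 12
x = s
y = np.concatenate((np.zeros(delay), s[:-delay]))
tau = gcc_phat(x, y, fs)
print(f"estimated delay: {tau * fs:.0f} samples")
```

With microphone spacing d and speed of sound c, a far-field direction of arrival then follows from θ = arcsin(cτ/d), clipped to the valid range when noise pushes |cτ/d| above 1.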
the Eigenmike®
– mh Acoustics
32 elements
8.4 cm rigid sphere
2 - 64 elements, 0.5 m, linear ‘wing’ array
Planar Array Geometry
[Figure: directivity index, measured vs. theoretical]
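A theoretical directivity index can be computed from the array geometry alone. The sketch below assumes omnidirectional sensors, delay-and-sum weights w = d/M and spherically isotropic (diffuse) noise, for which D = |wᴴd|² / (wᴴΓw) with coherence Γₘₙ = sin(k rₘₙ)/(k rₘₙ); the function name and the 4 x 4 grid are illustrative assumptions, not the array on the slide.

```python
import numpy as np

def directivity_index(positions, freq, look_dir, c=343.0):
    """Directivity index (dB) of a delay-and-sum beamformer with
    omnidirectional sensors in spherically isotropic (diffuse) noise.

    positions: (M, 3) sensor coordinates in metres
    look_dir:  3-vector pointing towards the desired source
    """
    p = np.asarray(positions, dtype=float)
    u = np.asarray(look_dir, dtype=float)
    u = u / np.linalg.norm(u)
    k = 2 * np.pi * freq / c                           # wavenumber
    d = np.exp(1j * k * (p @ u))                       # plane-wave steering vector
    r = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    gamma = np.sinc(2 * freq * r / c)                  # np.sinc(x) = sin(pi x)/(pi x)
    # With w = d / M: |w^H d|^2 = 1, so D = M^2 / (d^H Gamma d).
    M = len(p)
    denom = np.real(np.conj(d) @ gamma @ d)
    return 10 * np.log10(M ** 2 / denom)

# 4 x 4 planar grid with 5 cm spacing in the x-y plane, steered broadside (+z).
grid = [[0.05 * i, 0.05 * j, 0.0] for i in range(4) for j in range(4)]
for f in (500.0, 1000.0, 2000.0, 4000.0):
    print(f"{f:6.0f} Hz: DI = {directivity_index(grid, f, [0, 0, 1]):5.2f} dB")
```

A useful sanity check: at half-wavelength spacing the diffuse-noise coherence between sensors of a uniform linear array vanishes, so the index reduces to 10 log₁₀ M.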
Source Separation
• ‘Cocktail party problem’
– Colin Cherry in the 1950s
– Audio signals from multi-talker
distant talking scenarios
– Behavior of a listener presented
with two speech signals
simultaneously
Colin Cherry
‘TRENDSETTER’
Source Separation
• Determined and underdetermined scenarios
– Clustering-based blind source separation
– Permutation problem (EM)
– Reverberation times of, say, 100 - 500 ms
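Frequency-domain BSS separates each frequency bin independently, so the order of the outputs can differ from bin to bin: the permutation problem. The slide mentions an EM-based treatment; the sketch below instead uses a different, widely used heuristic, aligning bins by correlating the amplitude envelopes of the outputs with a running reference. The function name and the simulated envelopes are assumptions for illustration.

```python
import itertools
import numpy as np

def align_permutations(env):
    """Reorder the K outputs in each frequency bin so that output k refers to
    the same source in every bin.

    env: array (F, K, T) of magnitude envelopes of the separated outputs,
         where the output order in each bin is arbitrary.
    Returns (perms, aligned) with aligned[f] = env[f, perms[f]].
    """
    F, K, T = env.shape
    aligned = np.empty_like(env)
    aligned[0] = env[0]                       # bin 0 fixes the reference order
    perms = [tuple(range(K))]
    centroid = env[0].copy()                  # running sum of aligned envelopes
    for f in range(1, F):
        best_perm, best_score = None, -np.inf
        for perm in itertools.permutations(range(K)):   # exhaustive; fine for small K
            cand = env[f, list(perm)]
            score = sum(np.corrcoef(cand[k], centroid[k])[0, 1] for k in range(K))
            if score > best_score:
                best_perm, best_score = perm, score
        aligned[f] = env[f, list(best_perm)]
        perms.append(best_perm)
        centroid += aligned[f]
    return perms, aligned

# Simulated outputs: two sources with different on/off activity patterns,
# seen in every bin with a bin-dependent gain, noise and a random order.
rng = np.random.default_rng(0)
F, K, T = 32, 2, 200
activity = (rng.random((K, T // 10)) > 0.5).repeat(10, axis=1).astype(float)
true_perm = [rng.permutation(K) for _ in range(F)]
env = np.empty((F, K, T))
for f in range(F):
    gain = rng.uniform(0.5, 2.0)
    bin_env = gain * activity + 0.05 * rng.random((K, T))
    env[f] = bin_env[true_perm[f]]

perms, aligned = align_permutations(env)
```

The exhaustive search over K! orderings is only practical for a few sources; larger problems usually switch to greedy or assignment-based matching.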
Speech Enhancement
• Dereverberation [1]: technology for real-world applications
– Single and multichannel
– Acoustic channel inversion
– Speech and Music
[1] P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. Springer, 2010.
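Acoustic channel inversion, listed above, can be illustrated in the single-channel case with a least-squares inverse filter and a modelling delay. This is a minimal sketch under assumed parameters (filter length, delay and a short toy response), not a method from [1]; real room responses are long and non-minimum-phase, which is why multichannel approaches such as MINT are usually preferred in practice.

```python
import numpy as np
from scipy.linalg import toeplitz

def ls_inverse_filter(h, length, delay):
    """Least-squares inverse filter g (`length` taps) such that
    np.convolve(h, g) approximates a unit impulse delayed by `delay` samples."""
    h = np.asarray(h, dtype=float)
    n_out = len(h) + length - 1
    first_col = np.concatenate((h, np.zeros(length - 1)))
    first_row = np.zeros(length)
    first_row[0] = h[0]
    H = toeplitz(first_col, first_row)       # convolution matrix, (n_out, length)
    target = np.zeros(n_out)
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return g

# A short, hypothetical non-minimum-phase response (one zero lies outside
# the unit circle), equalised with a 64-tap filter and a 20-sample delay.
h = np.array([0.4, 1.0, -0.3, 0.2, 0.1])
g = ls_inverse_filter(h, length=64, delay=20)
eq = np.convolve(h, g)
target = np.zeros(len(eq))
target[20] = 1.0
print("residual energy:", float(np.sum((eq - target) ** 2)))
```

The modelling delay lets a causal filter approximate the anti-causal part of the inverse that a non-minimum-phase response requires; with zero delay the same response would equalise much less well.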
Synergies
• Joint dereverberation and blind source separation
• Speech recognition of reverberant speech
[Figure: clean-speech HMM and reverberation model]
Recurring Theme
S P A R S I T Y!
History of Sparsity at ICASSP
Matching Pursuit
Compressed Sensing
Deep Belief Networks
L1 Regularization
I thought:
Dumb!
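L1 regularization, listed above, turns a least-squares fit into a sparse one by minimising ½‖y − Ax‖² + λ‖x‖₁. A minimal numpy sketch using the iterative shrinkage-thresholding algorithm (ISTA), one standard solver among several; the function names and the compressed-sensing style toy problem are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (element-wise shrinkage towards zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimise 0.5*||y - A x||_2^2 + lam*||x||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)               # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

def objective(A, y, x, lam):
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# Toy problem: a 5-sparse x observed through 50 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x_true = np.zeros(200)
x_true[rng.choice(200, size=5, replace=False)] = rng.standard_normal(5)
y = A @ x_true
x_hat = ista(A, y, lam=0.01)
print("non-zeros in estimate:", np.count_nonzero(np.abs(x_hat) > 1e-3))
```

With step size 1/L each iteration does not increase the objective, which makes ISTA simple to reason about; accelerated variants such as FISTA converge faster.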
Spectro-Temporal Receptive Fields
• Original sparse representation (spikes!)
SS-L7.1: SPEECH PROCESSING WITH A CORTICAL REPRESENTATION OF AUDIO, Nima Mesgarani, Shihab Shamma,
University of Maryland, United States
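A spectro-temporal receptive field can be pictured as a 2-D filter over a log-frequency spectrogram, tuned to a temporal modulation rate (Hz) and a spectral modulation scale (cycles per octave). The sketch below uses a generic Gabor-shaped filter as an assumption; it is not the filter bank of the cited paper, and every parameter value is illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_strf(rate_hz, scale_cpo, frame_rate=100.0, chans_per_oct=24,
               dur_s=0.25, span_oct=2.0):
    """Hypothetical Gabor-shaped spectro-temporal filter (freq x time) tuned
    to a temporal modulation `rate_hz` and a spectral modulation `scale_cpo`."""
    t = np.arange(-dur_s / 2, dur_s / 2, 1.0 / frame_rate)           # seconds
    x = np.arange(-span_oct / 2, span_oct / 2, 1.0 / chans_per_oct)  # octaves
    T, X = np.meshgrid(t, x)                  # arrays of shape (len(x), len(t))
    envelope = np.exp(-(T / (dur_s / 4)) ** 2 - (X / (span_oct / 4)) ** 2)
    # The sign of rate*T + scale*X sets the sweep direction the filter prefers.
    carrier = np.cos(2 * np.pi * (rate_hz * T + scale_cpo * X))
    return envelope * carrier

# Hypothetical log-frequency spectrogram: 5 octaves x 2 s at 100 frames/s.
rng = np.random.default_rng(0)
spec = rng.random((5 * 24, 200))
strf = gabor_strf(rate_hz=4.0, scale_cpo=1.0)
response = fftconvolve(spec, strf, mode="same")   # one rate-scale channel
```

A bank of such filters over several rates and scales yields a redundant rate-scale representation in which a given sound activates relatively few channels strongly.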
Deep Belief Network
From: SS-L7.4: LEARNING A BETTER REPRESENTATION OF SPEECH SOUND WAVES USING RESTRICTED BOLTZMANN
MACHINES, Navdeep Jaitly, Geoffrey Hinton, University of Toronto, Canada
These are NOT your average wavelet/Gabor response!!
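Deep belief networks are built by stacking restricted Boltzmann machines trained with contrastive divergence. The sketch below is a minimal CD-1 update for a binary RBM on toy binary patterns, an assumption chosen for brevity; real-valued waveform samples, as in the cited paper, would call for a variant with real-valued visible units.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One contrastive-divergence (CD-1) step for a binary RBM; updates the
    parameters in place and returns the mean squared reconstruction error."""
    p_h0 = sigmoid(v0 @ W + b_hid)                        # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)    # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                      # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_hid)                      # negative phase
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return float(np.mean((v0 - p_v1) ** 2))

# Toy data: two binary 16-pixel patterns, each repeated 50 times.
rng = np.random.default_rng(0)
patterns = (rng.random((2, 16)) > 0.5).astype(float)
data = np.repeat(patterns, 50, axis=0)
W = 0.01 * rng.standard_normal((16, 8))
b_vis = np.zeros(16)
b_hid = np.zeros(8)
errors = [cd1_update(W, b_vis, b_hid, data, lr=0.1, rng=rng) for _ in range(1000)]
print(f"reconstruction error: first {errors[0]:.3f}, last {errors[-1]:.3f}")
```

The learned columns of `W` play the role of the filters shown on the slide; reconstruction error is only a rough progress signal, since CD-1 does not minimise it directly.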
Nonlinear Modeling via Sparsity
[Figure: Basis 1 and Basis 2, and the over-complete basis formed from all possible combinations]
Industrial Perspectives
“Remaining challenges [in source separation] could include BSS for
unknown/dynamic number of sources.”
Tomohiro Nakatani, NTT Communication Science Laboratories
Industrial Perspectives
Mixed-signal ICs for mobile phones
“Moore’s Law is driving DSP speed and memory capacity … enabling
implementation of sophisticated DSP functions that have resulted from years of
research in acoustic signal processing. The end-user experience is one of natural
wideband voice communication, devoid of acoustic background noise and
unwanted artefacts.”
Anthony Magrath, Director of DSP Technology, Wolfson Microelectronics
Industrial Perspectives
“The applications of sound capture, speech enhancement, and audio processing
technologies shift gradually from communications mostly, towards speech
recognition and building natural human-machine interfaces for mobile devices, in
cars, and in our living rooms.”
Ivan Tashev, Microsoft Research
iTunes Screen Shot
After-thought
• Trend
– Origin: Old English trendan ‘revolve, rotate’, of Germanic origin
• “What goes around, comes around” (?)
Texture
Sparsity
• Better representations
– Sparse
• Matching pursuit
• DBNs
• features for recognition
– New angles (cortical and textures)
– Subspaces (latent and otherwise)
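Matching pursuit, listed above, greedily decomposes a signal over an over-complete dictionary, picking at each step the atom most correlated with the current residual. A minimal numpy sketch, assuming a dictionary formed from two orthonormal bases (unit spikes and DCT-II cosines, chosen here purely for illustration) and a test signal made of one atom from each.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def matching_pursuit(D, x, n_atoms):
    """Greedy matching pursuit with a dictionary D of unit-norm columns."""
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual                 # correlation with every atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

n = 64
D = np.hstack([np.eye(n), dct_basis(n).T])    # 2n atoms: spikes, then cosines
x = 2.0 * D[:, 5] + 1.5 * D[:, n + 3]         # one spike plus one cosine
coeffs, residual = matching_pursuit(D, x, n_atoms=2)
print("two largest coefficients at atoms:", np.argsort(np.abs(coeffs))[-2:])
```

Because the two bases are not mutually orthogonal, a small residual remains after two steps; orthogonal matching pursuit re-fits all selected coefficients at each step to remove it.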