
Machine Listening in Silicon. Part of the “Accelerated Perception & Machine Learning in Stochastic Silicon” project.


TRANSCRIPT

  • Slide 1

  • Slide 2: Machine Listening in Silicon. Part of the “Accelerated Perception & Machine Learning in Stochastic Silicon” project.

  • Slide 3: Who? UIUC students: M. Kim, J. Choi, A. Guzman-Rivera, G. Ko, S. Tsai, E. Kim. Faculty: Paris Smaragdis, Rob Rutenbar, Naresh Shanbhag. Intel: Jeff Parkhurst; Ryszard Dyrga and Tomasz Szmelczynski (Intel Technology Poland); Georg Stemmer (Intel, Germany); Dan Wartski and Ohad Falik (Intel Audio, Voice and Speech (AVS), Israel).

  • Slide 4: Project overview. Motivating ideas: make machines that can perceive; use stochastic hardware for stochastic software; discover new modes of computation. Machine Listening component: perceive == listen, and escape the local optimum of Gaussian/MSE/ℓ2 modeling.

  • Slide 5: Machine Listening? Making systems that understand sound: think computer vision, but for sound. A broad range of fundamentals and applications: machine learning, DSP, psychoacoustics, music, speech, media analysis, surveying, monitoring, and more. What can we gather from this?

  • Slide 6: Machine listening in the wild. Highlight discovery in videos, incident discovery in streets, surveillance for emergencies. Some of this work is already in place, mostly projects on recognition and detection. More applications in the medical, mechanical, geological, and architectural domains, among others.

  • Slide 7: The CrowdMic project: PhotoSynth for audio, constructing audio recordings from crowdsourced audio snippets. Collaborative audio devices: harnessing the power of untethered open mics, e.g. a conference call using all the phones and laptops in the room. And there's more to come.

  • Slide 8: The Challenge. Today is all about small form factors: we all carry a couple of mics in our pockets, but we don't carry the vector processors they need! Can we come up with new, better systems that run on more efficient hardware and perform just as well, or better?

  • Slide 9: The Testbed: Sound Mixtures. Sound has a pesky property: additivity. We almost always observe sound mixtures, yet models for sound analysis are monophonic, designed for isolated, clean sounds. So we like to first extract and then process.

  • Slide 10: Focusing on a single sound. There's no shortage of methods (they all suck, by the way), but these are computationally some of the most demanding algorithms in audio processing. So we instead catered to a different approach that would be a good fit for hardware, i.e. Rob told me that he can do MRFs fast.

  • Slide 11: A bit of background. We like to visualize sounds as spectrograms: 2D representations of energy over time and frequency. With multiple mics we observe level differences, known as ILDs (Interaural Level Differences).

  • Slide 12: Finding sources. For each spectrogram pixel we take an ILD and plot the histogram of these values; each sound/location will produce a mode.

  • Slide 13: And we use these as labels. Assign each pixel to a source, et voilà. But it looks a little ragged.

  • Slide 14: Thus, a Markov Random Field. Each pixel is a node that influences its neighbors; the model incorporates ILDs and smoothness constraints, and it makes my hardware friends happy.

  • Slide 15: The whole pipeline: take the LEFT and RIGHT spectrograms (time x frequency), observe the ILDs, run inference on a binary, pairwise MRF, and obtain a binary mask that says which frequencies belong to which source (source 0 or source 1) at each time point. A ~15 dB SIR boost. (A code sketch of this pipeline follows Slide 16.)

  • Slide 16: Oh, and we use this for stereo vision too. A Markov Random Field whose nodes carry a data cost and whose edges carry a smoothness cost; a 3D depth map is computed by MRF MAP inference, reusing the same core. [Figures: per-pixel depth information and objective value per iteration.]
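
To make the masking pipeline of Slides 11-15 concrete, here is a minimal Python sketch (referenced at the end of Slide 15). It is not the project's implementation: a simple two-centroid clustering of the per-pixel ILDs stands in for the binary pairwise-MRF inference, and the synthetic two-source stereo signal, the STFT settings (nperseg = 512), and all variable names are illustrative assumptions.

    # Sketch of ILD-based time-frequency masking for two sources.
    # Assumption: two-centroid 1-D k-means replaces the MRF inference step.
    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    t = np.arange(fs) / fs
    # Toy sources with different inter-channel levels (panned left/right).
    s1 = np.sin(2 * np.pi * 440 * t)
    s2 = np.sign(np.sin(2 * np.pi * 97 * t))
    left = 1.0 * s1 + 0.3 * s2
    right = 0.3 * s1 + 1.0 * s2

    # Stereo spectrograms: 2-D representations of energy over time and frequency.
    _, _, L = stft(left, fs, nperseg=512)
    _, _, R = stft(right, fs, nperseg=512)

    # Interaural Level Difference (dB) for every spectrogram pixel.
    eps = 1e-12
    ild = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))

    # Two-mode split of the ILD values (1-D k-means with two centroids).
    c = np.array([ild.min(), ild.max()])
    for _ in range(20):
        assign = np.abs(ild[..., None] - c).argmin(axis=-1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = ild[assign == k].mean()

    # Binary mask: which frequencies belong to which source at each time point.
    mask0 = assign == 0
    _, est_source0 = istft(L * mask0, fs, nperseg=512)
    _, est_source1 = istft(L * ~mask0, fs, nperseg=512)

In the full system, the clustering step above is replaced by MAP inference on a binary, pairwise MRF that combines the same per-pixel ILD observations with smoothness costs between neighboring pixels, which is what removes the "ragged" look mentioned on Slide 13.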

  • Slide 17: It's also pretty fast. Our work outperforms up-to-date GPU implementations. Performance result for a single frame of Tsukuba (384x288, 16):

        Method                       GPU                       # Iterations                   Time (msec)   Min. energy
        Real-time BP [Yang 2006]     NVIDIA GeForce 7900 GTX   (4 scales) = (5, 5, 10, 2)     80.8          N/A
        Tile-based BP [Liang 2011]   NVIDIA GeForce 8800 GTS   (B, T_I, T_O) = (12, 20, 5)    97.3          396,953
        Fast BP [Xiang 2012]         NVIDIA GeForce GTX 260    (3 scales) = (9, 6, 2)         61.4          N/A
        Our work                     N/A                       T_O = 5                        26.10         393,434

  • Slide 18: And we made it error resilient. Error-resilient MRF inference via ANT (Algorithmic Noise Tolerance). Power saving by ANT: estimated 42% at Vdd = 0.75 V; complexity overhead = 45%.

  • Slide 19: Back to source separation again. ILDs suffer from front-back confusion and require some distance between the microphones, so we also added Interaural Phase Differences (IPDs).

  • Slide 20: Why add IPDs? They work best when ILDs fail, e.g. when sensors are far apart. [Figure: separation results for the input, ILD-only, IPD-only, and joint features at microphone spacings of 1 cm, 15 cm, and 30 cm.] (A small IPD sketch follows the publication list below.)

  • Slide 21: Adding one more element. We incorporated NMF-based denoisers: systems that learn by example what to separate.

  • Slide 22: So what's next? Porting the whole system to hardware (we haven't ported the front end yet), evaluating the results with speech recognition, and extending this model to multiple devices, as opposed to one device with multiple mics.

  • Slide 23: Relevant publications.
    - Kim, Smaragdis, Ko, Rutenbar. “Stereophonic Spectrogram Segmentation Using Markov Random Fields,” IEEE International Workshop on Machine Learning for Signal Processing, 2012.
    - Kim, Smaragdis. “Manifold Preserving Hierarchical Topic Models for Quantization and Approximation,” International Conference on Machine Learning, 2013.
    - Kim, Smaragdis. “Single Channel Source Separation Using Smooth Nonnegative Matrix Factorization with Markov Random Fields,” IEEE International Workshop on Machine Learning for Signal Processing, 2013.
    - Kim, Smaragdis. “Non-Negative Matrix Factorization for Irregularly-Spaced Transforms,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013.
    - Traa, Smaragdis. “Blind Multi-Channel Source Separation by Circular-Linear Statistical Modeling of Phase Differences,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
    - Choi, Kim, Rutenbar, Shanbhag. “Error Resilient MRF Message Passing Hardware for Stereo Matching via Algorithmic Noise Tolerance,” IEEE Workshop on Signal Processing Systems, 2013.
    - Zhang, Ko, Choi, Tsai, Kim, Rivera, Rutenbar, Smaragdis, Park, Narayanan, Xin, Mutlu, Li, Zhao, Chen, Iyer. “EMERALD: Characterization of Emerging Applications and Algorithms for Low-power Devices,” IEEE International Symposium on Performance Analysis of Systems and Software, 2013.
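
As a companion to Slides 19-20 (referenced in the note on Slide 20), here is a minimal, self-contained sketch of the Interaural Phase Difference cue. The single toy source with a fixed inter-channel delay, and reading the raw wrapped phase difference directly, are illustrative assumptions; the actual system models IPDs statistically (see the Traa & Smaragdis reference above).

    # Sketch of the Interaural Phase Difference (IPD) cue.
    # Assumption: one toy source, equal level in both channels, with the
    # right channel delayed by a few samples, so the ILD carries no
    # information while the IPD still does.
    import numpy as np
    from scipy.signal import stft

    fs = 16000
    delay = 8                        # inter-channel delay in samples
    t = np.arange(fs) / fs
    src = np.sin(2 * np.pi * 440 * t)
    left = src
    right = np.roll(src, delay)      # same level, delayed copy: ILD ~ 0 dB

    f, _, L = stft(left, fs, nperseg=512)
    _, _, R = stft(right, fs, nperseg=512)

    # Per-pixel phase difference, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))

    # At the bin nearest 440 Hz, the IPD should be close to
    # 2*pi*440*delay/fs: the delay shows up in phase even though both
    # channels have the same level.
    bin_idx = np.argmin(np.abs(f - 440))
    print("median IPD near 440 Hz:", np.median(ipd[bin_idx]))
    print("expected:", 2 * np.pi * 440 * delay / fs)

In the joint system, these phase differences are used alongside the ILDs, which is what keeps the separation working when the level cue alone is uninformative.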