MSEE Defense
www.vis.uky.edu | Dedicated to Research, Education and Industrial Outreach | 859.257.1257
Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment
Phil Townsend
MSEE Candidate
University of Kentucky
Overview
1) Introduction
- Adaptive Beamforming and the GSC
2) Amplitude Scaling Improvements
- 1/r Model, Acoustic Physics, Statistical
3) Automatic Target Alignment
- Thresholded Cross Correlation using PHAT-β
4) Array Geometry Analysis
- Volumetric Beamfield Plots
- Monte Carlo Test of Geometric Parameters
5) Final Conclusions and Questions
Part 1: Introduction
• What's beamforming?
• A spatial filter that enhances sound based on its spatial position through the coherent processing of signals from distributed microphones
  – Reduce room noise/effects
  – Suppress interfering speakers
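The coherent processing described above can be sketched as a delay-and-sum beamformer: each microphone track is advanced by its steering delay so the target components line up, and the tracks are then averaged. This is a minimal sketch assuming integer-sample delays that are already known; the function name `delay_and_sum` is hypothetical and not taken from the thesis code.

```python
def delay_and_sum(tracks, delays):
    """Delay-and-sum beamformer with integer steering delays.

    tracks: equal-length sample lists, one per microphone.
    delays: delays[m] is how many samples later the target reaches mic m
            than the reference mic; each track is advanced by that amount
            so the target components line up, then the tracks are averaged.
    """
    num_mics = len(tracks)
    n = len(tracks[0])
    out = [0.0] * n
    for x, d in zip(tracks, delays):
        for i in range(n - d):
            out[i] += x[i + d] / num_mics  # advance this track by d samples
    return out


if __name__ == "__main__":
    import math

    target = [math.sin(0.3 * i) for i in range(100)]
    delays = [0, 3, 7]  # hypothetical arrival delays in samples
    tracks = [[target[i - d] if i >= d else 0.0 for i in range(100)] for d in delays]
    y = delay_and_sum(tracks, delays)
    # Inside the fully aligned region only rounding error remains.
    print(max(abs(y[i] - target[i]) for i in range(100 - max(delays))))
```

Aligned target samples add coherently, while noise that is uncorrelated across microphones partially averages out. Fractional delays would need interpolation, which this sketch omits.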
Adaptive Beamforming
• Optimization of generalized filter coefficients
  – Often requires minimizing output energy while keeping the target component unchanged
• Estimate statistics on the fly
  – Input correlation matrix is unknown/changing
• Gradient descent toward optimal taps
  – The constrained lowest-energy output forms a unique minimum of a bowl-shaped (quadratic) error surface
$y[n] = \mathbf{W}_{\mathrm{opt}}^{T}[n]\,\mathbf{X}[n]$
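To illustrate the bowl-shaped surface, the sketch below runs steepest descent on the unconstrained quadratic cost $J(\mathbf{W}) = \sigma_d^2 - 2\mathbf{p}^T\mathbf{W} + \mathbf{W}^T\mathbf{R}\mathbf{W}$, assuming the correlation matrix R and cross-correlation vector p are known exactly. In practice they are estimated on the fly, which is what LMS-type algorithms do, and the constrained (Frost) version adds a projection step. The values of `R` and `p` are made up for illustration.

```python
def matvec(R, w):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(r_ij * w_j for r_ij, w_j in zip(row, w)) for row in R]


def steepest_descent(R, p, mu, steps):
    """Minimize J(w) = sigma_d^2 - 2 p^T w + w^T R w by steepest descent.

    grad J = 2 (R w - p); the factor 2 is folded into mu. For a positive
    definite R the surface is a bowl with a single minimum at w = R^{-1} p,
    and the iteration converges when 0 < mu < 2 / lambda_max(R).
    """
    w = [0.0] * len(p)
    for _ in range(steps):
        g = [a - b for a, b in zip(matvec(R, w), p)]  # half the gradient
        w = [wi - mu * gi for wi, gi in zip(w, g)]
    return w


if __name__ == "__main__":
    R = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical input correlation matrix
    p = [1.0, 0.5]                # hypothetical cross-correlation vector
    print(steepest_descent(R, p, mu=0.2, steps=500))
```

For this R the eigenvalues are about 2.21 and 0.79, so any step size below about 0.9 converges; larger steps overshoot the bottom of the bowl and diverge.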
Visualization of Gradient Descent
From http://en.wikipedia.org/wiki/Gradient_descent; Image in Public Domain
Generalized Sidelobe Canceller (GSC)
• Simplifies Frost's constrained adaptation into two stages
  – A fixed delay-and-sum beamformer (DSB)
  – A blocking matrix (BM) whose outputs are adaptively filtered and subtracted
  – Adaptation can use any algorithm; we use NLMS here
  – The simplification comes mostly from enforcing a distortionless response
GSC (cont'd)
• Upper branch DSB result
• Lower branch BM tracks are
  where the traditional blocking matrix is
GSC (cont'd)
• Final output is
• Adaptation algorithm for each BM track is (NLMS, much faster than constrained adaptation)
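The GSC structure on these slides can be sketched in its common textbook form: a DSB upper branch, the traditional Griffiths-Jim blocking matrix (differences of adjacent pre-steered tracks), and one FIR filter per BM track whose outputs are subtracted from the DSB output, all adapted jointly with NLMS. This is a hedged sketch, not the thesis implementation; the tap count, step size `mu`, and normalization by the total BM input power are assumptions, and the function name `gsc` is hypothetical.

```python
import random


def gsc(tracks, taps=4, mu=0.05, eps=1e-8):
    """Generalized sidelobe canceller on pre-steered (time-aligned) tracks.

    Upper branch: delay-and-sum (mean of the aligned tracks).
    Lower branch: Griffiths-Jim blocking matrix (adjacent-track differences),
    each BM track passed through its own FIR filter; the filter outputs are
    subtracted from the DSB output and all filters are adapted jointly with
    NLMS, normalized by the total power in the BM tap-delay lines.
    """
    num_mics = len(tracks)
    n = len(tracks[0])
    bm = [[tracks[k][i] - tracks[k + 1][i] for i in range(n)]
          for k in range(num_mics - 1)]
    w = [[0.0] * taps for _ in range(num_mics - 1)]
    out = []
    for i in range(n):
        dsb = sum(tr[i] for tr in tracks) / num_mics
        # Tap-delay-line contents b_k[i], b_k[i-1], ... for each BM track.
        u = [[bm[k][i - j] if i - j >= 0 else 0.0 for j in range(taps)]
             for k in range(num_mics - 1)]
        y = dsb - sum(w[k][j] * u[k][j]
                      for k in range(num_mics - 1) for j in range(taps))
        power = eps + sum(v * v for row in u for v in row)
        for k in range(num_mics - 1):
            for j in range(taps):
                w[k][j] += mu * y * u[k][j] / power  # NLMS update
        out.append(y)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4000
    target = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    interferer = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    gains = [2.0, 1.0, -0.6, 1.6]  # hypothetical interferer gain at each mic
    tracks = [[target[i] + g * interferer[i] for i in range(n)] for g in gains]
    y = gsc(tracks)
    dsb = [sum(t[i] for t in tracks) / len(tracks) for i in range(n)]
    tail = range(3000, n)
    print("DSB error power:", sum((dsb[i] - target[i]) ** 2 for i in tail))
    print("GSC error power:", sum((y[i] - target[i]) ** 2 for i in tail))
```

A practical GSC usually delays the upper branch by about half the filter length so the BM filters can model non-causal relationships; in this toy demo the interferer reaches every mic at the same sample, so that delay is omitted.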
Limitations of Current Models and Methods
• Blocking Matrix Leakage
– Far-field assumption not valid for immersive microphone arrays
– Target steering might be incorrect
• Most research limited to equispaced linear arrays
– Hard to construct
– Limited useful frequency range
– Want to explore other geometries and find the best
Part 2: Amplitude Correction
• Nearfield acoustics means the target component has a different amplitude at each microphone
• Propose and test a few models to correct the resulting cancellation error
  – 1/r model
  – Sound propagation filtering
  – Statistical filtering
Simple 1/r Model
• For a point source, the acoustic wave equation is solved by a spherical wave whose amplitude is inversely proportional to r
• So make a BM that compensates for this 1/r decay (keep tracks in distance order)
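A minimal sketch of a 1/r blocking matrix, assuming the tracks are already time-aligned and the source-to-microphone distances are known: tracks are sorted by distance, each is multiplied by its distance r (undoing the 1/r spreading), and adjacent tracks are then differenced. The exact scaling used in the thesis may differ, and both function names are hypothetical.

```python
def traditional_bm(tracks):
    """Griffiths-Jim BM: differences of adjacent pre-steered tracks."""
    return [[a - b for a, b in zip(tracks[k], tracks[k + 1])]
            for k in range(len(tracks) - 1)]


def inverse_r_bm(tracks, distances):
    """BM that undoes 1/r spreading before differencing.

    Tracks are put in order of source-to-mic distance, each is multiplied by
    its distance r (so a target arriving as s/r becomes s), and adjacent
    tracks are then differenced so the target component cancels.
    """
    order = sorted(range(len(tracks)), key=lambda k: distances[k])
    scaled = [[distances[k] * v for v in tracks[k]] for k in order]
    return traditional_bm(scaled)


if __name__ == "__main__":
    import math

    s = [math.sin(0.1 * i) for i in range(200)]
    r = [1.2, 0.8, 2.5]  # hypothetical source-to-mic distances (m)
    tracks = [[v / rk for v in s] for rk in r]
    print(max(abs(v) for row in traditional_bm(tracks) for v in row))  # leakage
    print(max(abs(v) for row in inverse_r_bm(tracks, r) for v in row))  # ~0
```

Multiplying by r also amplifies noise on the farthest microphones; scaling every track by r divided by the nearest distance instead changes only the overall gain of each BM track.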
ISO Acoustic Physics Model
• Fluid dynamics can be taken into account to design a filter based on distance, temperature, humidity, and pressure (ISO standard 9613)
• Might allow us to add easily obtainable information to enhance beamforming
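Part 1 of ISO 9613 (ISO 9613-1) gives a closed-form atmospheric absorption coefficient α in dB/m as a function of frequency, temperature, relative humidity, and pressure. The sketch below implements that formula and combines it with 1/r spreading to get a per-frequency amplitude gain for a given distance, which is one plausible ingredient of the distance-dependent filter described above. How the thesis turns it into a BM filter is not shown, and the function names are hypothetical.

```python
import math


def iso9613_alpha_db_per_m(f_hz, temp_c, rel_humidity_pct, pressure_kpa=101.325):
    """Atmospheric absorption coefficient (dB/m) from the ISO 9613-1 formulas."""
    T = temp_c + 273.15
    T0 = 293.15   # reference air temperature, K
    T01 = 273.16  # triple-point isotherm temperature, K
    pr = 101.325  # reference ambient pressure, kPa
    pa = pressure_kpa
    # Molar concentration of water vapour h (%) from relative humidity.
    C = -6.8346 * (T01 / T) ** 1.261 + 4.6151
    h = rel_humidity_pct * (10.0 ** C) * (pr / pa)
    # Relaxation frequencies of oxygen and nitrogen (Hz).
    fr_o = (pa / pr) * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = (pa / pr) * (T / T0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((T / T0) ** (-1.0 / 3.0) - 1.0)))
    f2 = f_hz * f_hz
    return 8.686 * f2 * (
        1.84e-11 * (pr / pa) * (T / T0) ** 0.5
        + (T / T0) ** -2.5 * (
            0.01275 * math.exp(-2239.1 / T) / (fr_o + f2 / fr_o)
            + 0.1068 * math.exp(-3352.0 / T) / (fr_n + f2 / fr_n)))


def propagation_gain(f_hz, r_m, temp_c, rel_humidity_pct, pressure_kpa=101.325):
    """Amplitude gain over distance r_m: 1/r spreading times ISO absorption."""
    alpha = iso9613_alpha_db_per_m(f_hz, temp_c, rel_humidity_pct, pressure_kpa)
    return 10.0 ** (-alpha * r_m / 20.0) / r_m
```

At speech frequencies and room-scale distances the absorption term is small (a few dB per kilometre at 1 kHz in typical indoor air), so most of the amplitude difference between microphones still comes from the 1/r term.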
Statistical Amplitude Scaling
• Lump all corruptive effects together and minimize the energy of the difference between tracks
• Carry this out as a function of frequency to get a frequency-dependent scale factor for each track
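Minimizing the energy of the difference between two tracks in each frequency bin has a least-squares solution: for track spectra A and B collected over frames, the scale is a[k] = Σ A[k] conj(B[k]) / Σ |B[k]|². The sketch below estimates it with a naive DFT over non-overlapping frames; the frame length, the lack of windowing, and the function names are assumptions for illustration, not the thesis settings.

```python
import cmath


def dft(frame):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def statistical_scale(track_a, track_b, frame_len=16, eps=1e-12):
    """Per-bin scale a[k] minimizing sum over frames |A_f[k] - a[k] B_f[k]|^2.

    Setting the derivative to zero gives the least-squares solution
    a[k] = sum_f A_f[k] conj(B_f[k]) / sum_f |B_f[k]|^2.
    """
    num = [0j] * frame_len
    den = [0.0] * frame_len
    for start in range(0, len(track_a) - frame_len + 1, frame_len):
        A = dft(track_a[start:start + frame_len])
        B = dft(track_b[start:start + frame_len])
        for k in range(frame_len):
            num[k] += A[k] * B[k].conjugate()
            den[k] += abs(B[k]) ** 2
    return [num[k] / (den[k] + eps) for k in range(frame_len)]
```

The statistical BM track in bin k would then be A[k] - a[k]B[k]. Because the estimate uses everything present in both tracks, noise and interference shape a[k] as much as the target does when SNR is low.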
ISO and Statistical BMs
• ISO Model (Frequency Domain)
• Statistical Scaling (Frequency Domain)
A Perfect Blocking Matrix
• Audio Cage data was collected with the target and interfering speakers recorded separately, so a perfect BM (no target leakage) can be simulated
• Shows upper bound on possible improvement
Experimental Evaluation of Methods
• Set initial intelligibility to around 0.3
• Beamform for many target and noise scenarios
• Find the mean correlation coefficient of the BM tracks (want as low as possible) and of the overall output (want as large as possible)
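The evaluation metric can be sketched as the Pearson correlation coefficient between each signal and a clean reference. This sketch assumes the reference is the clean target recording (available because the Audio Cage data kept target and noise separate) and averages the absolute value over BM tracks, so negative correlation also counts as leakage; both choices, and the function names, are assumptions.

```python
import math


def corrcoef(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # undefined for a constant signal; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def mean_abs_correlation(tracks, reference):
    """Average |rho| between each track and the reference signal."""
    return sum(abs(corrcoef(t, reference)) for t in tracks) / len(tracks)
```

For the BM tracks a value near 0 indicates little target leakage; for the GSC output a value near 1 indicates the target is well preserved.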
Results
• Most real methods make little difference
  – Statistical scaling a little worse because of poor SNR
  – ISO filtering a little better because of the extra information
  – 1/r model made no difference
• Perfect BM made a slight improvement, but array geometry was most important!
• Listen to some examples...
Output Correlation Chart
BM Correlation Chart
Part 3: Automatic Steering
• If the steering delays aren't right, target signal leaks into the BM and the DSB output is weaker
• Cross correlation is a highly robust technique for finding similarities between signals, so use it to fine-tune the delays
• Apply window and correlation-strength thresholds to try to improve performance in a poor-SNR environment
GCC and PHAT-β
• Find the cross correlation between tracks over only a small window of possible lag adjustments, and whiten it (PHAT-β weighting) to make the peak stand out
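A sketch of GCC with PHAT-β weighting searched only over lags |τ| ≤ D: the cross-spectrum G[k] = X2[k] conj(X1[k]) is divided by |G[k]|^β (β = 1 is full PHAT whitening, β = 0 is plain cross correlation), and the inverse DFT is evaluated only at lags inside the window. It uses a naive DFT and zero-padding for clarity; the parameter names (`max_lag` for D, `beta`) and the lag sign convention are assumptions.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat_beta(x1, x2, max_lag, beta=1.0, eps=1e-12):
    """Return (best_lag, peak_value) of GCC-PHAT-beta over |lag| <= max_lag.

    A positive lag means x2 is a delayed copy of x1. The tracks are assumed
    to have equal length and are zero-padded to twice that length so the
    circular correlation does not wrap around.
    """
    n = 2 * len(x1)
    X1 = dft(list(x1) + [0.0] * (n - len(x1)))
    X2 = dft(list(x2) + [0.0] * (n - len(x2)))
    cross = [X2[k] * X1[k].conjugate() for k in range(n)]
    weighted = [g / (abs(g) ** beta + eps) for g in cross]  # PHAT-beta whitening
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Inverse DFT evaluated only at the lags inside the search window.
        val = sum(weighted[k] * cmath.exp(2j * cmath.pi * k * lag / n)
                  for k in range(n)).real / n
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val
```

Restricting the search to a small window keeps spurious peaks far from the current steering estimate from being selected; an FFT would replace the naive DFT in practice.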
Correlation Coefficient Threshold
• Since the environment is noisy and the speaker might go silent, update only if the maximum correlation is sufficiently strong
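The thresholded update can be sketched as follows: compute the correlation coefficient of the two tracks at the candidate lag and accept the lag correction only if it clears the threshold; a silent or noise-only stretch yields a low coefficient and the delay is left alone. Treating the candidate lag as an additive correction to the current delay, and the function names, are hypothetical.

```python
import math


def corr_at_lag(x1, x2, lag):
    """Pearson correlation between x1[n] and x2[n + lag] over their overlap.

    Returns 0.0 if either segment has zero variance (e.g. a silent talker).
    """
    if lag >= 0:
        a, b = x1[:len(x1) - lag], x2[lag:]
    else:
        a, b = x1[-lag:], x2[:len(x2) + lag]
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def thresholded_update(current_delay, x1, x2, candidate_lag, threshold):
    """Accept a lag correction only if its correlation coefficient is strong.

    Hypothetical convention: candidate_lag is the residual misalignment of x2
    relative to x1 and is added to the current steering delay when accepted.
    Returns (new_delay, accepted).
    """
    rho = corr_at_lag(x1, x2, candidate_lag)
    if rho >= threshold:
        return current_delay + candidate_lag, True
    return current_delay, False
```

A low threshold accepts noisy estimates and lets the focal point wander; a high threshold rejects most updates, including the ones needed to recover from an earlier bad one.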
Experimental Evaluation
• Same setup as before
  – Initial intelligibility ~0.3
  – Find output correlation with the closest mic
• Vary the correlation threshold from 0.1 to 0.9
Results
• A tighter threshold is better, but updates never help vs. the original GSC
  – Low threshold: erratic focal point movement
  – High threshold: can't recover from bad updates
  – Low SNR makes good estimates very difficult
• Retracing the lags (multilateration) shows the search window D should be tighter
• Array geometry is still more important
• Listen to some more examples...
Output Correlation Chart
Normal GSC Performance for Comparison
Part 4: Array Geometry
• Since array geometry is the most important factor, we need to find what the best layouts are and why
• Start by generating beamfields to visualize array performance and look for patterns qualitatively
• Then propose parameters and run computer simulations to evaluate them quantitatively
Volumetric Beamfield Plots
• The GSC beamfield changes over time, but the DSB is the root of the system and its performance is constant
• Need to see performance in three dimensions
• Use a layered approach, with color indicating intensity and transparency revealing features inside the space
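The values behind a volumetric plot can be sketched as the DSB's relative output power for a point source placed at each point of a 3D grid, with the beamformer steered to a fixed focal point. This sketch assumes free-field propagation at c = 343 m/s, unit amplitudes (1/r spreading ignored so the focal point is the peak), and power averaged over a few hypothetical frequencies; rendering with color and transparency is left to a plotting tool, and the function names are hypothetical.

```python
import cmath
import math


def dsb_beamfield(mics, focus, grid, freqs_hz, c=343.0):
    """Relative DSB output power (dB) for a unit source at each grid point.

    The DSB is steered to `focus`; a source at q reaches mic m with phase
    delay k*|m - q|. Amplitudes are taken as 1 and the power is averaged
    over `freqs_hz`, then normalized to the peak.
    """
    power = []
    for q in grid:
        total = 0.0
        for f in freqs_hz:
            k = 2.0 * math.pi * f / c
            acc = sum(cmath.exp(1j * k * (math.dist(m, focus) - math.dist(m, q)))
                      for m in mics)
            total += abs(acc / len(mics)) ** 2
        power.append(total / len(freqs_hz))
    peak = max(power)
    return [10.0 * math.log10(max(p / peak, 1e-12)) for p in power]


def grid_3d(xs, ys, zs):
    """All (x, y, z) combinations of the given axis samples."""
    return [(x, y, z) for x in xs for y in ys for z in zs]
```

Each returned value maps to one voxel; slicing the list by z gives the layers for a stacked, semi-transparent rendering.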
Linear Array
• Generally good performance
  – Office too small for sidelobes to appear
• Mainlobe elongated toward array
Perimeter Array
• Also generally good
  – Very tight mainlobe
• No height resolution
  – Not a problem in an office though
  – Motivation for ceiling arrays
Random Arrays
• Performance highly variable
  – One was the best of the lot, one very bad
• Need to find ways to describe and select the best random arrays (coming soon)
A Monte Carlo Experiment for Analysis of Geometry
• Propose the following parameters for describing array geometry in 2D, and evaluate array performance for many randomly chosen geometries:
  – Centroid
    • Array center of gravity (mean position)
  – Dispersion
    • Mic spread (standard deviation of positions)
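A minimal sketch of the two parameters for a 2D layout. The slide defines dispersion as the standard deviation of the positions; here it is taken as the RMS distance of the mics from the centroid, which is one reasonable 2D reading and an assumption, as are the function names.

```python
import math


def centroid(mics):
    """Mean (x, y) position of the microphones."""
    n = len(mics)
    return (sum(p[0] for p in mics) / n, sum(p[1] for p in mics) / n)


def dispersion(mics):
    """RMS distance of the microphones from their centroid (assumed 2D spread)."""
    cx, cy = centroid(mics)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in mics) / len(mics))


if __name__ == "__main__":
    square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(centroid(square), dispersion(square))
```

For the four corners of a 2 m square, every mic is √2 m from the center, so the dispersion equals √2 m; the per-axis standard deviations would each be 1 m, and √(1² + 1²) gives the same value.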
Parameter Examples
Monte Carlo (cont'd)
• For a given centroid and dispersion, evaluate the array based on:
  – PSR (Peak-to-Sidelobe Ratio)
    • Worst-case interference
  – MLW (Main Lobe Width)
    • Tightness of the enhancement area
    • Redefined in 2D to use the x and y 3 dB widths:
      $w_{3\mathrm{dB}} = \sqrt{x_{3\mathrm{dB}}^{2} + y_{3\mathrm{dB}}^{2}}$
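A sketch of both metrics on a 2D beamfield sampled on a grid in dB. MLW follows the definition above, with x3dB and y3dB measured along the row and column through the peak using linear interpolation between samples; PSR is taken as the peak level minus the highest other local maximum (4-neighbour test). These measurement details, and the function names, are assumptions rather than the thesis procedure.

```python
import math


def _half_width(values, coords, i0, step):
    """Distance from the peak at index i0 to the -3 dB crossing, walking in
    direction `step` (+1 or -1) and interpolating linearly between samples."""
    level = values[i0] - 3.0
    i = i0
    while 0 <= i + step < len(values):
        j = i + step
        if values[j] <= level:
            frac = (values[i] - level) / (values[i] - values[j])
            return abs(coords[i] + frac * (coords[j] - coords[i]) - coords[i0])
        i = j
    return abs(coords[i] - coords[i0])  # never fell 3 dB: clipped by the grid


def main_lobe_width(field_db, xs, ys):
    """Return (w3dB, x3dB, y3dB) through the global peak of field_db[iy][ix]."""
    iy, ix = max(((j, i) for j in range(len(ys)) for i in range(len(xs))),
                 key=lambda ji: field_db[ji[0]][ji[1]])
    row = field_db[iy]
    col = [field_db[j][ix] for j in range(len(ys))]
    x3 = _half_width(row, xs, ix, -1) + _half_width(row, xs, ix, +1)
    y3 = _half_width(col, ys, iy, -1) + _half_width(col, ys, iy, +1)
    return math.hypot(x3, y3), x3, y3


def peak_to_sidelobe_ratio(field_db):
    """Peak level minus the highest other local maximum, in dB."""
    ny, nx = len(field_db), len(field_db[0])
    maxima = []
    for j in range(ny):
        for i in range(nx):
            v = field_db[j][i]
            nbrs = [field_db[j + dj][i + di]
                    for dj, di in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= j + dj < ny and 0 <= i + di < nx]
            if all(v > u for u in nbrs):
                maxima.append(v)
    maxima.sort(reverse=True)
    return maxima[0] - maxima[1] if len(maxima) > 1 else math.inf
```

A larger PSR means weaker worst-case interference, and a smaller w3dB means a tighter enhancement area; the dispersion tradeoff in the results is a tension between these two numbers.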
Monte Carlo Simulation
• Test variation of one parameter while holding the other constant
• Generate random positions within an 8 m × 8 m square and target a sound source 1 m below its center
• Choose 120 random geometries for each run (a "class" of arrays)
• Compare to a rectangular array
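One way to generate a class of arrays with a fixed centroid and dispersion is sketched below: draw random positions in the 8 m × 8 m square, shift and scale each layout to the requested parameters, and reject layouts that leave the square. It uses the RMS-distance reading of dispersion; the rejection scheme, mic count, and function names are assumptions.

```python
import math
import random

SIDE = 8.0  # side of the square region, m


def centroid_and_dispersion(pts):
    """Centroid (x, y) and RMS distance from it for a list of 2D points."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    d = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / n)
    return (cx, cy), d


def random_array_class(n_arrays, n_mics, centroid, dispersion, seed=0,
                       max_tries=100_000):
    """Random 2D geometries with exactly the requested centroid and dispersion.

    Each draw is uniform in the square, then shifted and scaled about its own
    centroid; draws that end up outside the square are discarded.
    """
    rng = random.Random(seed)
    arrays = []
    for _ in range(max_tries):
        if len(arrays) == n_arrays:
            break
        pts = [(rng.uniform(0.0, SIDE), rng.uniform(0.0, SIDE)) for _ in range(n_mics)]
        (cx, cy), d = centroid_and_dispersion(pts)
        s = dispersion / d
        arr = [(centroid[0] + (x - cx) * s, centroid[1] + (y - cy) * s) for x, y in pts]
        if all(0.0 <= x <= SIDE and 0.0 <= y <= SIDE for x, y in arr):
            arrays.append(arr)
    if len(arrays) < n_arrays:
        raise RuntimeError("too many rejected geometries; relax the parameters")
    return arrays
```

For a centroid-displacement run, call it with several centroids at a fixed dispersion (and the reverse for a dispersion run); each array in a class would then be scored with PSR and MLW for a source 1 m below the center of the square.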
Layout
Centroid Displacement
Dispersion
Results
• A centroid centered over the target is always best
  – Irregular arrays are more robust when the centroid shifts
• Dispersion is a classic tradeoff
  – Tightly packed array: tight mainlobe but strong sidelobes
  – Widely spread array: wide mainlobe but weak sidelobes
Part 5: Final Conclusions & Future Work
• Statistical methods for improving the GSC were ineffective
  – Low SNR introduces large error
• Introducing separate, concrete information helped
  – ISO model gave a tiny improvement
  – A more accurate target position (laser, SSL) is always best for steering
• Array geometry is the most important factor in improving performance
  – Linear array good, but random arrays have the potential to do better
  – Found that a ceiling array should be centered over its intended target, but...
  – Open question: how does one describe the best array for beamforming on human speech?
Special Thanks
• Advisor
  – Dr. Kevin Donohue
• Thesis Committee Members
  – Dr. Jens Hannemann
  – Dr. Samson Cheung
• Everyone at the UK Vis Center
Questions?
Extra Slides
Frost Algorithm
• Solution to the constrained optimization
  subject to the constraint (C a selection matrix)
  The constraint vector dictates the sum of column weights, often F = [1 0 0 0 ...]
• Solution (P and F constant matrices):
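For reference, the standard form of Frost's linearly constrained algorithm is sketched below, using the slide's symbols where possible (C the constraint matrix, P and F constant) and the output definition $y[n] = \mathbf{W}^T[n]\mathbf{X}[n]$; the notation on the original slide may differ. Here $\boldsymbol{\mathcal{F}}$ denotes the constraint vector (e.g. [1 0 0 0 ...]) and $\mathbf{F}$ the constant vector built from it. Because $\mathbf{C}^T\mathbf{P} = \mathbf{0}$, every iterate satisfies the constraint exactly.

```latex
\min_{\mathbf{W}}\ \mathbf{W}^T \mathbf{R}_{XX} \mathbf{W}
\quad \text{subject to} \quad \mathbf{C}^T \mathbf{W} = \boldsymbol{\mathcal{F}}

\mathbf{W}[0] = \mathbf{F}, \qquad
\mathbf{W}[n+1] = \mathbf{P}\bigl(\mathbf{W}[n] - \mu\, y[n]\, \mathbf{X}[n]\bigr) + \mathbf{F}

\mathbf{P} = \mathbf{I} - \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T,
\qquad
\mathbf{F} = \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\boldsymbol{\mathcal{F}}
```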