msee defense

Post on 09-Jun-2015

www.vis.uky.edu | Dedicated to Research, Education and Industrial Outreach | 859.257.1257

Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment

Phil Townsend

MSEE Candidate

University of Kentucky

Overview

1) Introduction

- Adaptive Beamforming and the GSC

2) Amplitude Scaling Improvements

- 1/r Model, Acoustic Physics, Statistical

3) Automatic Target Alignment

- Thresholded Cross Correlation using PHAT-β

4) Array Geometry Analysis

- Volumetric Beamfield Plots

- Monte Carlo Test of Geometric Parameters

5) Final Conclusions and Questions

Part 1: Introduction

• What's beamforming?

• A spatial filter that enhances sound based on its spatial position through the coherent processing of signals from distributed microphones

– Reduce room noise/effects

– Suppress interfering speakers
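A minimal sketch of the coherent-processing idea, assuming a plain delay-and-sum scheme with integer-sample delays. The function name `delay_and_sum`, the speed-of-sound constant, and the rounding of delays are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def delay_and_sum(signals, mic_positions, focal_point, fs):
    """Align each channel on the focal point and average (hypothetical helper).

    signals: (M, N) array, one row per microphone
    mic_positions: (M, 3) array in metres
    focal_point: length-3 target position in metres
    fs: sampling rate in Hz
    """
    signals = np.asarray(signals, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mics - np.asarray(focal_point, dtype=float), axis=1)
    # Extra propagation delay of each mic relative to the closest one, in samples.
    delays = np.round((dists - dists.min()) / SPEED_OF_SOUND * fs).astype(int)
    n_out = signals.shape[1] - delays.max()
    # Advance the later channels so the target component lines up, then average.
    aligned = np.stack([sig[d:d + n_out] for sig, d in zip(signals, delays)])
    return aligned.mean(axis=0)
```

Sound arriving from the focal point adds in phase across channels, while sound from elsewhere adds with mismatched delays and is attenuated.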

Adaptive Beamforming

• Optimization of Generalized Filter Coefficients

– Often requires minimizing output energy while keeping target component unchanged

• Estimate statistics on the fly

– Input Correlation Matrix unknown/changing

• Gradient Descent Toward Optimal Taps

– Constrained Lowest Energy Output Forms Unique Minimum to Bowl-Shaped Surface

$y[n] = W_{opt}^{T}[n]\, X[n]$

Visualization of Gradient Descent

From http://en.wikipedia.org/wiki/Gradient_descent; Image in Public Domain

Generalized Sidelobe Canceller (GSC)

• Simplifies Frost's constrained adaptation into two stages

– A fixed, Delay-Sum Beamformer

– A Blocking Matrix that's adaptively filtered and subtracted

– Adaptation can be any algorithm; we use NLMS here

– Simplification comes mostly from enforcing distortionless response

GSC (cont'd)

• Upper branch DSB result

• Lower branch BM tracks are

where traditional Blocking Matrix is

GSC (cont'd)

• Final output is

• Adaptation algorithm for each BM track is (NLMS, much faster than constrained adaptation)
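The two-branch structure can be sketched as follows. This is a rough illustration and not the exact equations from the slides: it assumes the channels are already time-aligned on the target, uses the traditional adjacent-difference Blocking Matrix, and normalizes the NLMS step per BM track. The name `gsc_nlms` and the parameters `num_taps`, `mu`, and `eps` are hypothetical.

```python
import numpy as np


def gsc_nlms(aligned, num_taps=16, mu=0.1, eps=1e-8):
    """Generalized Sidelobe Canceller sketch (hypothetical helper).

    aligned: (M, N) array of channels already time-aligned on the target.
    Returns the length-N beamformer output.
    """
    aligned = np.asarray(aligned, dtype=float)
    n_mics, n_samples = aligned.shape
    dsb = aligned.mean(axis=0)            # upper branch: delay-sum output
    bm = aligned[:-1] - aligned[1:]       # lower branch: adjacent differences
    w = np.zeros((n_mics - 1, num_taps))  # one adaptive FIR filter per BM track
    out = dsb.copy()                      # first samples pass through unfiltered
    for n in range(num_taps - 1, n_samples):
        # Most recent num_taps samples of every BM track, newest first.
        z = bm[:, n - num_taps + 1:n + 1][:, ::-1]
        out[n] = dsb[n] - np.sum(w * z)
        # NLMS update, normalized by the energy of each track's tap vector.
        energy = np.sum(z * z, axis=1, keepdims=True) + eps
        w += mu * out[n] * z / energy
    return out
```

Because the difference rows cancel a perfectly aligned, equal-amplitude target, the adaptive filters only see noise references, so minimizing output energy cannot remove the target. Any amplitude or delay mismatch breaks that assumption, which is the leakage problem the following slides address.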

Limitations of Current Models and Methods

• Blocking Matrix Leakage

– Farfield assumption not valid for immersive microphone arrays

– Target steering might be incorrect

• Most research limited to equispaced linear arrays

– Hard to construct

– Limited useful frequency range

– Want to explore other geometries and find the best

Part 2: Amplitude Correction

• Nearfield acoustics means target component has different amplitude in each microphone

• Propose and test a few models to correct cancellation

– 1/r Model

– Sound propagation filtering

– Statistical filtering

Simple 1/r Model

• The acoustic wave equation is solved by a spherical wave whose amplitude is inversely proportional to the distance r

• So make a BM using that fact (keep tracks in distance order)
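One way such a matrix could be built, as a sketch only: if the target component at microphone i has amplitude proportional to 1/r_i, then scaling the nearer channel by r_i/r_j before subtracting the next-farther channel j cancels the target. The exact matrix used in the thesis is not recoverable from the slide, so the scaling convention and the name `one_over_r_blocking_matrix` are assumptions.

```python
import numpy as np


def one_over_r_blocking_matrix(distances):
    """Blocking matrix for 1/r amplitude decay (hypothetical helper).

    distances: length-M source-to-microphone distances in metres.
    Returns an (M-1, M) matrix whose rows pair microphones in distance order.
    """
    r = np.asarray(distances, dtype=float)
    order = np.argsort(r, kind="stable")  # keep tracks in distance order
    n_mics = len(r)
    bm = np.zeros((n_mics - 1, n_mics))
    for k in range(n_mics - 1):
        near, far = order[k], order[k + 1]
        bm[k, near] = r[near] / r[far]  # attenuate the nearer (louder) channel
        bm[k, far] = -1.0
    return bm
```

When all distances are equal, the rows reduce to the traditional [1, -1] adjacent-difference pattern.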

ISO Acoustic Physics Model

• Fluid dynamics can be taken into account to design a filter based on distance, temperature, humidity, and pressure (ISO standard 9613)

• Might allow us to add easily-obtainable information to enhance beamforming

Statistical Amplitude Scaling

• Lump all corruptive effects together and minimize energy of difference of tracks

• Carry out as a function of frequency to get

ISO and Statistical BMs

• ISO Model (Frequency Domain)

• Statistical Scaling (Frequency Domain)
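A sketch of what per-frequency statistical scaling could look like, assuming a least-squares fit: for each frequency bin, choose the scale factor a that minimizes the summed energy of (reference track minus a times the other track) over all frames, which gives a = Σ A·conj(B) / Σ |B|². The framing scheme, the name `statistical_scale`, and the `frame_len` parameter are assumptions, and the thesis may define the estimator differently.

```python
import numpy as np


def statistical_scale(x_ref, x_other, frame_len=256):
    """Least-squares scale factor per frequency bin (hypothetical helper).

    Returns a complex array with frame_len // 2 + 1 entries such that
    scale * X_other best matches X_ref in each bin, averaged over frames.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_other = np.asarray(x_other, dtype=float)
    n_frames = min(len(x_ref), len(x_other)) // frame_len
    n_used = n_frames * frame_len
    # Non-overlapping rectangular frames, transformed independently.
    ref_f = np.fft.rfft(x_ref[:n_used].reshape(n_frames, frame_len), axis=1)
    oth_f = np.fft.rfft(x_other[:n_used].reshape(n_frames, frame_len), axis=1)
    num = np.sum(ref_f * np.conj(oth_f), axis=0)
    den = np.sum(np.abs(oth_f) ** 2, axis=0)
    return num / np.maximum(den, 1e-12)  # guard against empty bins
```

The estimate is driven by whatever dominates the tracks, so in poor SNR it fits the noise rather than the target.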

A Perfect Blocking Matrix

• Audio Cage data was collected with targets and speakers separate, so a perfect BM can be simulated

• Shows upper bound on possible improvement

Experimental Evaluation of Methods

• Set initial intelligibility to around 0.3

• Beamform for many target and noise scenarios

• Find mean correlation coefficient of BM tracks (want as low as possible) and overall output (want as large as possible)
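The two metrics can be sketched as below, assuming the correlation is taken against the clean target recording. The reference signal and the names `correlation_coefficient` and `mean_track_correlation` are assumptions, since the slide does not state what the tracks are correlated with.

```python
import numpy as np


def correlation_coefficient(a, b):
    """Pearson correlation coefficient of two equal-length signals."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return 0.0 if denom == 0.0 else float(np.dot(a, b) / denom)


def mean_track_correlation(target, tracks):
    """Mean absolute correlation between the target and each BM track."""
    return float(np.mean([abs(correlation_coefficient(target, t)) for t in tracks]))
```

A low value for the BM tracks indicates little target leakage; a high value for the beamformer output indicates the target was preserved.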

Results

• Most real methods make little difference

– Statistical scaling a little worse because of bad SNR

– ISO filtering a little better because of more information

– 1/r model made no difference

• Perfect BM made slight improvement, but array geometry was most important!

• Listen to some examples...

Output Correlation Chart

BM Correlation Chart

Part 3: Automatic Steering

• If steering delays aren't right then target signal leakage occurs and DSB is weaker.

• Cross correlation is a highly robust technique for finding similarities between signals, so use it to fine-tune delays

• Apply window and correlation strength thresholds to try to improve performance in a poor SNR environment

GCC and PHAT-β

• Find the cross correlation between tracks over only a small window of possible movements

and whiten to make the spike stand out
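A sketch of windowed GCC with PHAT-β weighting, assuming the usual form in which the cross-spectrum is divided by its magnitude raised to the power β (β = 0 gives plain cross correlation, β = 1 gives full PHAT whitening). The name `gcc_phat_beta`, the default β, and the zero-padding choice are illustrative assumptions.

```python
import numpy as np


def gcc_phat_beta(x1, x2, beta=0.8, max_lag=10):
    """Generalized cross correlation with PHAT-beta weighting (hypothetical helper).

    Returns (lags, r) restricted to -max_lag..max_lag, with max_lag >= 1.
    A peak at a positive lag means x1 is delayed relative to x2.
    """
    n_fft = len(x1) + len(x2)  # zero-pad so circular wrap-around cannot occur
    spec1 = np.fft.rfft(x1, n_fft)
    spec2 = np.fft.rfft(x2, n_fft)
    cross = spec1 * np.conj(spec2)
    # Partial whitening: flatten the magnitude, keep the phase.
    cross = cross / np.maximum(np.abs(cross), 1e-12) ** beta
    r = np.fft.irfft(cross, n_fft)
    # Negative lags live at the end of the inverse FFT output.
    window = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, window
```

Restricting the search to a small lag window keeps the steering estimate from jumping to a distant spurious peak.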

Correlation Coefficient Threshold

• Since environment is noisy and speaker might go silent, update only if max correlation is sufficiently strong
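The thresholding rule can be sketched as follows, assuming a normalized time-domain cross correlation searched over a small window around the current lag. The name `thresholded_lag_update` and its parameters are hypothetical, and the slides' own version would use the PHAT-β correlation rather than this plain one.

```python
import numpy as np


def thresholded_lag_update(ref, sig, current_lag, max_lag=5, threshold=0.5):
    """Return a new lag only if the correlation peak is strong enough.

    ref, sig: equal-length 1-D signals; a positive lag means sig is delayed.
    Searches current_lag - max_lag .. current_lag + max_lag.
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    n = len(ref)
    norm = np.sqrt(np.dot(ref, ref) * np.dot(sig, sig))
    if norm == 0.0:
        return current_lag  # one track is silent: keep the old delay
    best_lag, best_rho = current_lag, -np.inf
    for lag in range(current_lag - max_lag, current_lag + max_lag + 1):
        # Correlate only the samples that overlap at this lag.
        if lag >= 0:
            rho = np.dot(ref[:n - lag], sig[lag:]) / norm
        else:
            rho = np.dot(ref[-lag:], sig[:n + lag]) / norm
        if rho > best_rho:
            best_lag, best_rho = lag, rho
    # Weak peaks are treated as unreliable and ignored.
    return best_lag if best_rho >= threshold else current_lag
```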

Experimental Evaluation

• Same setup as before

– Initial intelligibility ~0.3

– Find output correlation with closest mic

• Vary correlation threshold from 0.1 to 0.9

Results

• Tighter threshold better but updates never help vs original GSC

– Low threshold: erratic focal point movement

– High threshold: can't recover from bad updates

– Low SNR makes good estimates very difficult

• Retrace of lags (multilateration) shows search window D should be tighter

• Array geometry still more important

• Listen to some more examples...

Output Correlation Chart

Normal GSC Performance for Comparison

Part 4: Array Geometry

• Since array geometry is the most important factor, we need to find what the best layouts are and why

• Start by generating beamfields to visualize array performance and look for patterns qualitatively

• Then propose parameters and run computer simulations quantitatively

Volumetric Beamfield Plots

• GSC beamfield changes over time, but the DSB is the root of the system and its performance is constant.

• Need to see performance in three dimensions

• Use layered approach with colors to indicate intensity and transparency to see features inside the space
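A sketch of how a DSB beamfield could be evaluated on a 3D grid before plotting, under simplifying assumptions: a single frequency, free-field propagation, and no amplitude decay. The slides' plots are presumably broadband; the name `dsb_beamfield` and its arguments are hypothetical.

```python
import numpy as np


def dsb_beamfield(mic_positions, focal_point, grid_points, freq, c=343.0):
    """Single-frequency delay-sum response in dB at each grid point.

    mic_positions: (M, 3), focal_point: (3,), grid_points: (G, 3), in metres.
    Returns a length-G array; 0 dB is the response at the focal point.
    """
    mics = np.asarray(mic_positions, dtype=float)
    focal = np.asarray(focal_point, dtype=float)
    grid = np.asarray(grid_points, dtype=float)
    # Steering delays applied by the beamformer (mic to focal point).
    steer = np.linalg.norm(mics - focal, axis=1) / c
    # Actual propagation delays from every grid point to every mic.
    travel = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2) / c
    phase = 2.0 * np.pi * freq * (travel - steer[None, :])
    response = np.abs(np.exp(-1j * phase).mean(axis=1))
    return 20.0 * np.log10(np.maximum(response, 1e-12))
```

The returned values can be reshaped to the grid dimensions and drawn as colored, partially transparent layers.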

Linear Array

• Generally good performance

– Office too small for sidelobes to appear

• Mainlobe elongated toward array

Perimeter Array

• Also generally good

– Very tight mainlobe

• No height resolution

– Not a problem in an office though

– Motivation for ceiling arrays

Random Arrays

• Performance highly variable

– One best of the lot, one very bad

• Need to find ways to describe and select best random arrays (coming soon)

A Monte Carlo Experiment for Analysis of Geometry

• Propose the following parameters for describing array geometry in 2D and evaluate array performance for many randomly-chosen geometries:

– Centroid: array center of gravity (mean position)

– Dispersion: mic spread (standard deviation of positions)
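The two parameters can be computed as in the sketch below. Dispersion is taken here as the root-mean-square distance of the microphones from the centroid, which is one reasonable reading of "standard deviation of positions" and is an assumption; the function names are hypothetical.

```python
import numpy as np


def array_centroid(positions):
    """Mean microphone position; positions is an (M, 2) array in metres."""
    return np.mean(np.asarray(positions, dtype=float), axis=0)


def array_dispersion(positions):
    """RMS distance of the microphones from their centroid."""
    pts = np.asarray(positions, dtype=float)
    offsets = pts - pts.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(offsets ** 2, axis=1))))
```

For example, four microphones at the corners (±1, ±1) have their centroid at the origin and a dispersion of √2 m.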

Parameter Examples

Monte Carlo (cont'd)

• For a given centroid and dispersion, evaluate the array based on:

– PSR (Peak to Sidelobe Ratio): worst-case interference

– MLW (Main Lobe Width): tightness of enhancement area, redefined in 2D to use x and y 3dB widths

$w_{3dB} = \sqrt{x_{3dB}^{2} + y_{3dB}^{2}}$
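A sketch of measuring the 2D main lobe width from a sampled beam-power map, assuming the x and y 3 dB widths are combined as a root-sum-of-squares. The combination rule, the grid layout, and the name `main_lobe_width` are assumptions made for illustration.

```python
import numpy as np


def main_lobe_width(power, dx, dy):
    """2D main lobe width from a linear beam-power map (hypothetical helper).

    power: 2-D array indexed [iy, ix]; dx, dy: grid spacing in metres.
    Widths are measured through the peak where power stays above half the peak.
    """
    power = np.asarray(power, dtype=float)
    iy, ix = np.unravel_index(np.argmax(power), power.shape)
    half = 0.5 * power[iy, ix]  # the 3 dB point in linear power

    def run_length(line, centre):
        # Walk outwards from the peak until the power drops below half.
        lo = centre
        while lo > 0 and line[lo - 1] >= half:
            lo -= 1
        hi = centre
        while hi < len(line) - 1 and line[hi + 1] >= half:
            hi += 1
        return hi - lo

    x_width = run_length(power[iy, :], ix) * dx
    y_width = run_length(power[:, ix], iy) * dy
    return float(np.hypot(x_width, y_width))
```

The result is quantized to the grid spacing, so the beamfield should be sampled finely around the main lobe.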

Monte Carlo Simulation

• Test variation of one parameter while holding the other constant.

• Generate random positions from an 8 × 8 m square and target a sound source 1 m below center

• Choose 120 random geometries for each run (a “class” of arrays)

• Compare to rectangular array
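One possible way to draw a "class" of random arrays with a fixed centroid and dispersion is sketched below: sample uniformly in the square, rescale about the mean to hit the requested dispersion, shift to the requested centroid, and reject layouts that leave the square. The slides do not say how the geometries were generated, so this procedure, the microphone count, and the name `random_geometry` are all assumptions.

```python
import numpy as np


def random_geometry(rng, n_mics, centroid, dispersion, side=8.0, max_tries=1000):
    """Random 2-D layout with a prescribed centroid and RMS dispersion.

    rng: numpy.random.Generator; returns an (n_mics, 2) array in metres.
    """
    centroid = np.asarray(centroid, dtype=float)
    for _ in range(max_tries):
        pts = rng.uniform(0.0, side, size=(n_mics, 2))
        centred = pts - pts.mean(axis=0)
        spread = np.sqrt(np.mean(np.sum(centred ** 2, axis=1)))
        # Rescale to the requested dispersion, then move to the centroid.
        pts = centred * (dispersion / spread) + centroid
        if np.all((pts >= 0.0) & (pts <= side)):
            return pts
    raise RuntimeError("no layout fits inside the square for these parameters")


# One class of 120 geometries; 16 microphones is an illustrative choice.
rng = np.random.default_rng(0)
array_class = [random_geometry(rng, 16, (4.0, 4.0), 2.0) for _ in range(120)]
```

Varying `centroid` while fixing `dispersion`, or the reverse, isolates one parameter per run.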

Layout

Centroid Displacement

Dispersion

Results

• Centroid centered over target always best

– Irregular arrays more robust when centroid shifts

• Dispersion a classic tradeoff

– Tightly-packed array: tight mainlobe but strong sidelobes

– Widely-spread array: wide mainlobe but weak sidelobes

Part 5: Final Conclusions & Future Work

• Statistical methods for improving GSC ineffective

– Low SNR introduces large error

• Introducing separate, concrete info helped

– ISO model gave a tiny improvement

– More accurate target position (laser, SSL) always best for steering

• Array geometry is most important to improving performance

– Linear array good, but random arrays have potential to do better

– Found that a ceiling array should be centered over its intended target, but...

– Open question: how does one describe the best array for beamforming on human speech?

Special Thanks

• Advisor

– Dr. Kevin Donohue

• Thesis Committee Members

– Dr. Jens Hannemann

– Dr. Samson Cheung

• Everyone at the UK Vis Center

Questions?

Extra Slides

Frost Algorithm

• Solution to the constrained optimization

subject to the constraint (C a selection matrix)

The constraint vector dictates the sum of column weights, often F = [1 0 0 0...]

• Solution (P and F constant matrices):
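For reference, a sketch of the constrained LMS update in the standard form of Frost's algorithm, W[n+1] = P (W[n] − μ y[n] X[n]) + F, with P = I − C (CᵀC)⁻¹ Cᵀ and F = C (CᵀC)⁻¹ f, where f is the constraint vector. The slide's own equations are not recoverable here, so the notation, the stacking order of X, and the name `frost_step` are assumptions.

```python
import numpy as np


def frost_step(w, x, c, f, mu):
    """One iteration of Frost's constrained LMS (hypothetical helper).

    w: stacked weights, length M*J (all mics at tap 0, then tap 1, ...)
    x: stacked input samples in the same order
    c: (M*J, J) selection matrix; column j picks the M weights of tap j
    f: length-J constraint vector, e.g. [1, 0, 0, ...]
    Returns (updated weights, output sample y).
    """
    ctc_inv = np.linalg.inv(c.T @ c)
    p = np.eye(len(w)) - c @ ctc_inv @ c.T  # projects onto the constraint null space
    f_vec = c @ ctc_inv @ f                 # minimum-norm weights meeting the constraint
    y = float(w @ x)
    # Unconstrained gradient step, then project back onto the constraint plane.
    return p @ (w - mu * y * x) + f_vec, y
```

P and F depend only on the constraint, so in practice they would be computed once outside the loop. The projection after every step is the extra cost that the GSC structure avoids.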
