Signal Processing Algorithms for Wireless Acoustic Sensor Networks
Alexander Bertrand
Electrical Engineering Department (ESAT), Katholieke Universiteit Leuven
06-07-2010, University of Oldenburg, MEDI-AKU-SIGNAL Kolloquium
Outline
1. Introduction
2. Multi-channel Wiener filter (MWF)
3. Example: distributed MWF in binaural hearing aids
4. DANSE in fully connected WASN
5. Tree-DANSE
6. Multi-speaker VAD
Tracking of speech power; noise reduction
Traditional sensor array DSP
centralized processing
known / fixed sensor positions
Sensor array DSP
Long distance (SNR drops 6 dB for each doubling of distance)
Sharp angle
#microphones is limited
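The 6 dB figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming free-field spherical spreading (amplitude proportional to 1/r) and a noise level that does not depend on distance; the function name `level_drop_db` is illustrative only.

```python
import math

def level_drop_db(r_near: float, r_far: float) -> float:
    """Drop in received speech level (dB) when moving from r_near to r_far.

    Assumes free-field spherical spreading: amplitude ~ 1/r, power ~ 1/r^2.
    With a distance-independent noise floor, the SNR drops by the same amount.
    """
    return 20.0 * math.log10(r_far / r_near)

for distance in (1.0, 2.0, 4.0, 8.0):
    drop = level_drop_db(1.0, distance)
    print(f"{distance:4.1f} m: SNR lower by {drop:5.2f} dB than at 1 m")
```

Each doubling of the distance adds 20·log10(2) dB, roughly 6 dB, which is why microphones close to the speaker are so valuable.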
Distributed sensor arrays
Wireless acoustic sensor network (WASN)
• More spatial information
• More sensors
• Subset: high SNR recordings
Distributed sensor arrays
• Challenges
1) Unknown/changing positions, link failure → ADAPTIVE
2) Bandwidth efficiency
3) Distributed processing
4) Subset selection
Multi-channel Wiener Filtering (MWF)
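- Goal: estimate speech component in 1 of the N microphones
- Output = sum of filtered microphone signals
- Signal model: $\mathbf{y}(\omega) = \mathbf{d}(\omega) + \mathbf{n}(\omega)$
- Criterion: $\min_{\mathbf{w}} E\{ |d_1(\omega) - \mathbf{w}(\omega)^H \mathbf{y}(\omega)|^2 \}$
[Figure: microphone signal $y_1(\omega)$ with speech component $d_1(\omega)$ and noise component $n_1(\omega)$; the microphone signals pass through filters W1-W4 and are summed (+) to give the clean speech estimate]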
Multi-channel Wiener Filtering (MWF)
- Solution: $\mathbf{w}(\omega) = \mathbf{R}_{yy}(\omega)^{-1} \mathbf{r}_{yd}(\omega)$
- with $\mathbf{R}_{yy}(\omega) = E\{ \mathbf{y}(\omega) \mathbf{y}(\omega)^H \}$
- and $\mathbf{r}_{yd}(\omega) = E\{ \mathbf{y}(\omega) d_1^*(\omega) \} = E\{ \mathbf{d}(\omega) d_1^*(\omega) \} = \mathbf{R}_{dd}(\omega) [1\ 0\ \dots\ 0]^T$ (speech and noise uncorrelated)
Multi-channel Wiener Filtering (MWF)
- Needs:
  - N x N noise+speech correlation matrix Ryy
  - N x 1 clean speech correlation vector (column of Rdd)
- Rdd can be estimated as Rdd = Ryy - Rnn, with Rnn estimated in the noise-only frames identified by a voice activity detection (VAD) mechanism
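The estimation recipe above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions that are not in the slides: real-valued signals (a single "frequency bin"), an instantaneous mixing vector `a`, white sensor noise, and an oracle VAD. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 20000, 4                       # number of samples, number of microphones

# Oracle VAD (assumption): alternating noise-only and speech blocks of 2000 samples.
vad = (np.arange(T) // 2000) % 2 == 1

a = np.array([1.0, 0.8, 0.6, 0.9])    # hypothetical mixing (steering) vector
s = rng.standard_normal(T) * vad      # speech source, active only when vad is True
d = np.outer(a, s)                    # speech component in each microphone (N x T)
n = rng.standard_normal((N, T))       # white sensor noise
y = d + n                             # microphone signals

# Correlation matrices: Ryy from speech+noise frames, Rnn from noise-only frames.
Ryy = y[:, vad] @ y[:, vad].T / vad.sum()
Rnn = y[:, ~vad] @ y[:, ~vad].T / (~vad).sum()
Rdd = Ryy - Rnn

# MWF with microphone 1 as reference: w = Ryy^{-1} Rdd e_1.
w = np.linalg.solve(Ryy, Rdd[:, 0])
d1_hat = w @ y                        # sum of filtered microphone signals

mse_mwf = np.mean((d1_hat[vad] - d[0, vad]) ** 2)
mse_ref = np.mean((y[0, vad] - d[0, vad]) ** 2)
print(f"MSE unprocessed reference mic: {mse_ref:.3f}")
print(f"MSE MWF output:                {mse_mwf:.3f}")
```

In a real system the statistics would be tracked per frequency bin with complex-valued signals and an imperfect VAD, so the subtraction Ryy - Rnn is considerably noisier than in this toy setting.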
Multi-channel Wiener Filtering (MWF)
RECAP
- Given: N microphone signals
- Choose one (arbitrary) reference microphone
- MWF computes optimal filters such that sum of outputs is as close as possible to speech component in target microphone
[Figure: filters F1-F4 followed by a summation (+), noise = electro music. Noise frame: destructive interference. Speech frame: constructive interference.]
Outline
1. Introduction
2. Multi-channel Wiener filter (MWF)
3. Example: distributed MWF in binaural hearing aids
4. DANSE in fully connected WASN
5. Tree-DANSE
6. Multi-speaker VAD
7. Subset selection
8. Conclusions
Example: binaural hearing aids
MWF left and MWF right, connected by a binaural link (= 2-node WASN)
- large bandwidth needed
- full matrix inversion
Example: binaural hearing aids
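[Figure: binaural link between two nodes, with filters w11, g12 on one side and g21, w22 on the other, each pair followed by a summation (+)]
Converges to optimum if single desired source (Doclo et al., 2007)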
Motivation for DANSE
• > 2 nodes? E.g. supporting external sensor nodes or multiple hearing aid users.
• Multiple desired sources e.g. conversation monitoring.
DANSE
• The previous scenarios require a more general framework: distributed adaptive node-specific signal estimation (DANSE)
• Allows for multiple nodes (fully connected topology)
• Allows for multiple target sources: estimating K sources requires communication of K-channel signals (DANSE_K)
DANSE
Considered here:
• Fully connected WSN
• Multi-channel sensor signal observations
• Goal: each node estimates a node-specific signal, but all these signals lie in a common latent signal subspace (dimension = # targets)
[Figure: 3 nodes, fully connected]
Binaural hearing aids (revisited)
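[Figure: binaural link between two nodes, with filters w11, g12 on one side and g21, w22 on the other, each pair followed by a summation (+)]
J=2, DANSE_2 (K=2)
[Figure: binaural link with two channels per node; filters w11(1), g12(1), w11(2), g12(2) at one node and w22(1), g21(1), w22(2), g21(2) at the other; the auxiliary channels capture the signal space]
Converges to optimum if # desired sources ≤ 2
J=2, DANSE_K
[Figure: binaural link carrying K channels in each direction; signals labeled z_1, z_2, d_1, d_2 and filters W_11, G_12, G_21, W_22]
Converges to optimum if K = # desired sources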
Sequential updating
Sequential round-robin update
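A round-robin update can be simulated with exact second-order statistics. This is a minimal sketch of single-channel DANSE (K=1), not the author's implementation. It assumes real-valued signals, one desired source with a hypothetical steering vector `a`, one interferer `b` plus white sensor noise, and known covariance matrices. Each node solves a local MWF on its own sensors plus the signals broadcast by the other nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 3, 3]                                   # sensors per node (J = 3 nodes)
J, M = len(sizes), sum(sizes)
idx = np.split(np.arange(M), np.cumsum(sizes)[:-1]) # sensor indices of each node

a = 1.0 + 0.5 * rng.random(M)                       # desired-source steering vector (assumed)
b = rng.standard_normal(M)                          # interferer steering vector (assumed)
Rdd = np.outer(a, a)                                # rank-1 speech correlation matrix
Rnn = 0.5 * np.outer(b, b) + np.eye(M)              # interferer + white sensor noise
Ryy = Rdd + Rnn

def compression(k, w):
    """Matrix C such that node k's input [y_k; z_q for q != k] equals C^T y."""
    cols = [np.eye(M)[:, idx[k]]]                   # node k's own sensors, uncompressed
    for q in range(J):
        if q != k:
            c = np.zeros((M, 1))
            c[idx[q], 0] = w[q]                     # z_q = w_q^T y_q, broadcast by node q
            cols.append(c)
    return np.hstack(cols)

def local_mwf(k, w):
    """Local MWF at node k; the reference is node k's first microphone."""
    C = compression(k, w)
    ref = idx[k][0]
    return C, np.linalg.solve(C.T @ Ryy @ C, C.T @ Rdd[:, ref])

w = [rng.standard_normal(m) for m in sizes]         # random non-zero initialization
for cycle in range(200):                            # sequential round-robin updating
    for k in range(J):
        _, w_tilde = local_mwf(k, w)
        w[k] = w_tilde[:sizes[k]]                   # only node k's broadcast filter changes

errors = []
for k in range(J):
    C, w_tilde = local_mwf(k, w)
    w_network = C @ w_tilde                         # equivalent filter on all M sensors
    w_central = np.linalg.solve(Ryy, Rdd[:, idx[k][0]])
    errors.append(np.linalg.norm(w_network - w_central) / np.linalg.norm(w_central))
print("relative error vs. centralized MWF per node:", errors)
```

Each node only ever inverts a matrix of size (own sensors + J - 1), and each node broadcasts a single channel, which is where the bandwidth and complexity savings over the centralized MWF come from.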
DANSE with simultaneous updating
- Simultaneous updating: parallel computing
- Sometimes convergence to optimal solution, but not always
- Solution: relaxation yields convergence and optimality:
$\mathbf{W}^{i+1} = (1-\alpha)\,\mathbf{W}^{i} + \alpha\,\mathbf{W}^{new}$
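The effect of relaxation can be illustrated on a toy fixed-point iteration. This sketch is not DANSE itself: `toy_map` is a hypothetical stand-in for "recompute the filters from the current state" and is chosen so that the plain update overshoots the fixed point. The relaxation factor `alpha` is an assumed name for the weight in the convex combination.

```python
import numpy as np

def relaxed_update(w_old, w_new, alpha):
    """Convex combination of the previous filter and the newly computed one."""
    return (1.0 - alpha) * w_old + alpha * w_new

def toy_map(w):
    """Hypothetical update rule with fixed point 1.0; it overshoots by a factor 1.2."""
    return 2.2 - 1.2 * w

w_plain = np.zeros(3)      # updated without relaxation
w_relaxed = np.zeros(3)    # updated with relaxation, alpha = 0.5
for _ in range(20):
    w_plain = toy_map(w_plain)
    w_relaxed = relaxed_update(w_relaxed, toy_map(w_relaxed), alpha=0.5)

print("without relaxation:", w_plain)
print("with relaxation:   ", w_relaxed)
```

Without relaxation the error grows by a factor 1.2 per step and the iterate oscillates away from the fixed point; with alpha = 0.5 the error shrinks by a factor 0.1 per step. The fixed point itself is unchanged, which mirrors the claim that relaxation yields convergence without sacrificing optimality.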
[Figure: without relaxation (S-DANSE); 4 nodes, 3-6 sensors/node]
[Figure: with relaxation (rS-DANSE); 4 nodes, 3-6 sensors/node]
DANSE audio demo (tracking omitted)
Unfiltered
rS-DANSE
Centralized MWF
Robust DANSE
- Theory: DANSE == centralized MWF, but…
- Numerical errors due to:
  - Estimation errors in Rdd (especially at low SNR nodes) → ripple effect
  - Reference microphones are close to each other → ill-conditioned basis for signal subspace
- Solution: estimate speech component in communicated signals, preferably from high SNR nodes (= Robust DANSE or R-DANSE)
- Convergence is proven under certain dependency conditions
What if not fully connected?
Nodes must pass on information from other nodes
1) Nodes act as relays (virtually fully connected):
   - huge increase in bandwidth if limited connections
   - routing problem
2) Nodes broadcast the sum of all filtered inputs:
   - no increase in bandwidth
   - no routing problem (?)
FEEDBACK !!
- Intuition
- Theoretical analysis
- Conclusion: feedback causes major problems
- Direct feedback (one edge) vs. indirect feedback (loops)
Direct feedback cancellation
• Transmitter feedback cancellation
• Receiver feedback cancellation
- Prune to a tree topology → T-DANSE (= still optimal output!!)
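Pruning an ad-hoc network to a tree removes all loops. The slides do not say how the tree is chosen, so the sketch below simply assumes a breadth-first spanning tree from an arbitrary root; the function name `prune_to_tree` and the example link table are hypothetical.

```python
from collections import deque

def prune_to_tree(adjacency, root):
    """Return the edges of a breadth-first spanning tree of a connected graph.

    adjacency maps each node to the set of nodes it has a link with.
    Links that are not returned are the ones pruned to remove loops.
    """
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                tree_edges.append((node, neighbor))
                queue.append(neighbor)
    return tree_edges

# Hypothetical 5-node network containing loops (0-1-2 and 1-2-3).
links = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
tree = prune_to_tree(links, root=0)
print(tree)  # [(0, 1), (0, 2), (1, 3), (3, 4)]
```

A tree on J nodes always has J - 1 links, so exactly one path exists between any two nodes and indirect feedback through loops cannot occur. Which tree is best (for example in terms of link quality or depth) is a separate design question not covered here.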
Multi-speaker VAD
- Goal: Track individual speech power of multiple simultaneous speakers or other non-stationary sources (VAD)
- Exploit spatial diversity from WASN
[Figure: WASN layout with speaker and microphone positions]
• Ad-hoc microphone array
• Assumptions:
1. Speakers in near-field
2. Speakers are independent
3. Limited noise/reverberation
4. Sources to track are well-grounded (= they attain zero-values)
• Advantages:
• Array geometry unknown
• Speaker positions unknown
• Energy-based → low data rate, synchronization not crucial
→ WASNs!
Data model
Non-negative blind source separation
- Theorem (Plumbley, 2002):
“An orthogonal mixture of non-negative, well-grounded source signals, that preserves non-negativity, is a permutation of the original signals.”
Exploiting non-negativity and well-groundedness (J=N=2 example)
[Figure: sources in the (s1, s2) plane, before and after mixing y=As]
Orthogonal transformation preserves uncorrelatedness → simple decorrelation (whitening) of the measurements gives the original sources up to a rotation
[Figure: (s1, s2) plane after whitening: rotation unknown (?)]
- Well-grounded source signals
[Figure: well-grounded sources in the (s1, s2) plane, before and after mixing y=As]
[Figure: (s1, s2) plane after whitening, well-grounded sources (!)]
- Two different techniques:
1. Two steps:
   - Whitening, ignoring non-negativity constraints (=easy)
   - Search for rotation matrix that restores non-negativity (=hard)
2. Whitening with non-negativity constraints (=hard)
- 1st approach (Oja & Plumbley) = NPCA (Non-negative principal component analysis)
- 2nd approach (Bertrand & Moonen) = MNICA (Multiplicative non-negative independent component analysis)
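The first technique (whiten, then rotate) can be sketched for the J=N=2 case. This is a brute-force illustration, not the NPCA learning rule of Oja & Plumbley and not MNICA: the rotation is found by a simple grid search over the angle, the sources are assumed uniform on [0, 1] (non-negative and well-grounded), and the mixing matrix `A` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
s = rng.uniform(0.0, 1.0, size=(2, T))      # non-negative, well-grounded, independent
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # hypothetical mixing matrix
y = A @ s                                   # observed mixtures

# Step 1: whitening. The covariance is used to decorrelate, but the mean is
# not subtracted from the data, so the non-negative structure is preserved.
evals, evecs = np.linalg.eigh(np.cov(y))
V = evecs @ np.diag(evals ** -0.5) @ evecs.T
z = V @ y                                   # equals the scaled sources up to a rotation

def rotation(theta):
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[c, -sn], [sn, c]])

def negativity(theta):
    """Energy of the negative part of the rotated signals (zero when all >= 0)."""
    x = rotation(theta) @ z
    return np.sum(np.minimum(x, 0.0) ** 2)

# Step 2: search for the rotation that restores non-negativity (grid search).
thetas = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
best = thetas[np.argmin([negativity(t) for t in thetas])]
s_hat = rotation(best) @ z                  # sources up to permutation and scaling

corr = np.corrcoef(np.vstack([s, s_hat]))[:2, 2:]
print(np.round(corr, 3))
```

The correlation matrix between true and recovered sources should be close to a permutation matrix, in line with the theorem. A grid search only works in two dimensions; for more sources the rotation search becomes the hard part, which is the motivation for dedicated algorithms.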
MNICA: results