Information Theoretic Approaches to Data Association and Fusion in Sensor Networks
John Fisher, Alexander Ihler, Jason Williams , Alan Willsky
MIT CSAIL/LIDS
Haixiao Cai, Sanjeev Kulkarni, Sergio Verdu
Princeton University
SensorWeb MURI Review Meeting September 22, 2003
Problem/Motivation
- Large number of simple, myopic sensors: need to perform local fusion to support global inference (Battlespace Awareness).
- Critical need to understand statistical relationships between sensor outputs in the face of many modes of uncertainty (sensors, scene, geometry, etc.).
Challenges
- Uncertainty in scene and sensor geometry
- Complex, dynamic environment
- Uncalibrated, multi-modal sensors
- Unknown joint sensor statistics
- Need for fast, low-complexity algorithms
Activity and Accomplishments
Research
- Application of the data association method to the multi-modal (A/V) correspondence problem. A/V is a surrogate for other modalities, primarily because we can easily collect this data (vs. IR, EM, etc.).
- Extensions and empirical results for multi-modal feature-aided tracking.
- Generalization of data association to triangulated graphs.
- Improved K-L divergence/MI estimators.
- New developments in applied information-theoretic sensor management.
Activity and Accomplishments: Tech Transition
ARL
- Visits; student (Ihler) on-site at ARL
- Plans to transition the data association method to DARPA's CTS program (Ft. Belvoir installation)
Publications
- 4 conference publications: IPSN (2), ICME (invited), ICASSP (invited)
- 1 journal submission, accepted pending second review
- 3 sensor network workshop panels: ARO, NSF, SAMSI
A Common Thread
- Fusion and correspondence are difficult given the types of sensor uncertainties we are facing.
- Various information-theoretic measures, and the need to estimate them, arise naturally in such problems.
- Exploiting sensor data subject to a common excitation provides a mechanism for estimating such quantities.
Overview
- Estimating information-theoretic measures from sensor data (MIT, Princeton)
- Applications: data association, multi-modal tracking, inferring group interactions, sensor management
- Future directions: information-driven sensor fusion
Data Association (last year)
- Measurements: separated signals, directions of arrival
- 1 signal / 2 sensors: can localize
- >2 signals, 2 sensors: ambiguous
[Figure: sensors A and B each receive two signals (A1, A2 and B1, B2); the pairing across sensors is ambiguous.]
Association as a Hypothesis Test
Assuming independent sources, hypotheses are of the form
$$H_1:\ \{A_1,B_1\},\,\{A_2,B_2\}\quad\Longrightarrow\quad p_{H_1}(A_1,B_1)\,p_{H_1}(A_2,B_2)$$
$$H_2:\ \{A_1,B_2\},\,\{A_2,B_1\}\quad\Longrightarrow\quad p_{H_2}(A_1,B_2)\,p_{H_2}(A_2,B_1)$$
We compare the asymptotics of the test using known models to the test using densities estimated from a single realization:
$$\frac{p_{H_1}(A_1,B_1)\,p_{H_1}(A_2,B_2)}{p_{H_2}(A_1,B_2)\,p_{H_2}(A_2,B_1)} \quad\text{vs.}\quad \frac{\hat p(A_1,B_1)\,\hat p(A_2,B_2)}{\hat p(A_1,B_2)\,\hat p(A_2,B_1)}$$
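A minimal numerical sketch of this test, assuming jointly Gaussian scalar signals so that mutual information has the closed form -0.5*log(1 - rho^2); the nonparametric MI estimators developed in this work would replace `gaussian_mi`, and the toy data below is purely illustrative:

```python
import numpy as np

def gaussian_mi(x, y):
    """MI (nats) between two scalar signals under a Gaussian assumption:
    I(X;Y) = -0.5 * log(1 - rho^2), with rho the sample correlation."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

def association_statistic(a1, a2, b1, b2):
    """Estimated log-likelihood-ratio rate for H1: {A1,B1},{A2,B2} versus
    H2: {A1,B2},{A2,B1}. With densities estimated from the single
    realization, only the statistical-dependence (MI) terms survive."""
    return (gaussian_mi(a1, b1) + gaussian_mi(a2, b2)
            - gaussian_mi(a1, b2) - gaussian_mi(a2, b1))

# Toy example: source s1 drives the pair (A1, B1), source s2 drives (A2, B2).
rng = np.random.default_rng(0)
N = 5000
s1, s2 = rng.standard_normal((2, N))
a1 = s1 + 0.3 * rng.standard_normal(N)
b1 = s1 + 0.3 * rng.standard_normal(N)
a2 = s2 + 0.3 * rng.standard_normal(N)
b2 = s2 + 0.3 * rng.standard_normal(N)
print(association_statistic(a1, a2, b1, b2))  # positive: evidence for H1
```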
Asymptotics of Likelihood Ratio
Decomposes into two sets of terms: statistical dependencies (groupings) and differences in model parameterizations.
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_k \log\frac{p_{H_1}\big(A_1^k,B_1^k\big)\,p_{H_1}\big(A_2^k,B_2^k\big)}{p_{H_2}\big(A_1^k,B_2^k\big)\,p_{H_2}\big(A_2^k,B_1^k\big)}$$
$$\longrightarrow\;\underbrace{\big[I(A_1;B_1)+I(A_2;B_2)\big]\;-\;\big[I(A_1;B_2)+I(A_2;B_1)\big]}_{\text{statistical dependence}}$$
$$+\;\underbrace{\big[D\big(p(A_1,B_2)\,\big\|\,p_{H_2}(A_1,B_2)\big)+D\big(p(A_2,B_1)\,\big\|\,p_{H_2}(A_2,B_1)\big)\big]\;-\;\big[D\big(p(A_1,B_1)\,\big\|\,p_{H_1}(A_1,B_1)\big)+D\big(p(A_2,B_2)\,\big\|\,p_{H_1}(A_2,B_2)\big)\big]}_{\text{model differences}}$$
Asymptotics of Likelihood Ratio
If we estimate the densities from a single realization, the statistical dependence terms remain while the model divergences go away:
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_k \log\frac{\hat p\big(A_1^k,B_1^k\big)\,\hat p\big(A_2^k,B_2^k\big)}{\hat p\big(A_1^k,B_2^k\big)\,\hat p\big(A_2^k,B_1^k\big)}$$
$$\longrightarrow\;\underbrace{\big[I(A_1;B_1)+I(A_2;B_2)\big]\;-\;\big[I(A_1;B_2)+I(A_2;B_1)\big]}_{\text{statistical dependence}}$$
with the model-difference divergence terms dropping out.
High Dimensional Data
Learn low-dimensional auxiliary variables which summarize statistical dependency of measurements
$$\frac{1}{N}\log L(f,g) \;=\; \frac{1}{N}\sum_k \log\frac{\hat p\big(f_1^k,f_2^k\big)\,\hat p\big(f_3^k,f_4^k\big)}{\hat p\big(g_1^k,g_2^k\big)\,\hat p\big(g_3^k,g_4^k\big)}$$
where, consistent with the hypothesized pairings,
$$f_1^k=f_1\big(A_1^k\big),\quad f_2^k=f_2\big(B_1^k\big),\quad f_3^k=f_3\big(A_2^k\big),\quad f_4^k=f_4\big(B_2^k\big)$$
$$g_1^k=g_1\big(A_1^k\big),\quad g_2^k=g_2\big(B_2^k\big),\quad g_3^k=g_3\big(A_2^k\big),\quad g_4^k=g_4\big(B_1^k\big)$$
By the data-processing inequality,
$$\lim_{N\to\infty}\frac{1}{N}\log L \;\le\; I(f_1;f_2)+I(f_3;f_4) \;\le\; I(A_1;B_1)+I(A_2;B_2)$$
and the auxiliary variables are chosen by
$$\{f^*,g^*\} \;=\; \arg\max_{f,\,g}\ \frac{1}{N}\log L(f,g)$$
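For jointly Gaussian variables, I(f;g) = -0.5*log(1 - rho^2) is monotone in the correlation rho, so linear auxiliary variables maximizing this MI proxy reduce to canonical correlation analysis. A minimal numpy sketch along those lines, as a simplified stand-in for the learned statistics above (the synthetic data is illustrative):

```python
import numpy as np

def leading_cca(X, Y, reg=1e-6):
    """First canonical pair for data X (N x dx), Y (N x dy): linear auxiliary
    variables f = X @ wx, g = Y @ wy with maximal correlation rho, hence
    maximal Gaussian MI -0.5 * log(1 - rho^2)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both blocks, then take the top singular pair of the
    # cross-covariance of the whitened variables.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]

# Example: two 20-dimensional measurements sharing one latent component.
rng = np.random.default_rng(1)
z = rng.standard_normal(2000)
X = np.outer(z, rng.standard_normal(20)) + rng.standard_normal((2000, 20))
Y = np.outer(z, rng.standard_normal(20)) + rng.standard_normal((2000, 20))
wx, wy, rho = leading_cca(X, Y)
print("rho:", rho, "MI lower bound:", -0.5 * np.log(1 - rho ** 2))
```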
AV Association/Correspondence
- New since last year: a direct application of the 2-sensor / multiple-source case
- Unknown joint statistics
- High-dimensional data
- Varying scene parameters
- Surrogate for multi-modal sensors
[Figure: example audio/video sequences labeled consistent vs. inconsistent.]
AV Association/Correspondence
[Figure: association matrix for 8 subjects, with entries such as 0.68, 0.61, 0.19, and 0.20.]
General Structure Tests
- Generalization to hypothesis tests over graphical structures
- How are the observations related to each other?
$$H_i:\ p \;=\; \prod_{j=1}^{M_i} p\big(S_j^i\big), \quad\text{where}\quad S_j^i=\big\{x_{j_1},\dots,x_{j_d}\big\} \quad\text{and}\quad S_j^i\cap S_k^i=\emptyset\ \ (j\neq k)$$
[Figure: three candidate groupings of the variables $x_1,\dots,x_6$, compared pairwise.]
General Structure Tests
$$S_1^1=\{x_1,x_4\},\quad S_2^1=\{x_2,x_3\},\quad S_3^1=\{x_5,x_6\} \qquad (H_1)$$
$$S_1^2=\{x_1,x_3,x_4\},\quad S_2^2=\{x_2\},\quad S_3^2=\{x_5,x_6\} \qquad (H_2)$$
[Figure: the two graphical structures $H_1$ and $H_2$ over $x_1,\dots,x_6$.]
$$S_{jk}^{12} \;=\; S_j^1\cap S_k^2 \quad \forall\, j,k$$
$$\begin{array}{lll} S_{11}^{12}=\{x_1,x_4\} & S_{12}^{12}=\emptyset & S_{13}^{12}=\emptyset\\ S_{21}^{12}=\{x_3\} & S_{22}^{12}=\{x_2\} & S_{23}^{12}=\emptyset\\ S_{31}^{12}=\emptyset & S_{32}^{12}=\emptyset & S_{33}^{12}=\{x_5,x_6\} \end{array}$$
Intersection Sets - groupings on which the hypotheses agree
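Computing the intersection sets is elementary; a small sketch for the example above:

```python
# Intersection sets S12[j][k] = S1[j] & S2[k]; the hypotheses agree on
# exactly these groupings.
S1 = [{1, 4}, {2, 3}, {5, 6}]     # groupings under H1
S2 = [{1, 3, 4}, {2}, {5, 6}]     # groupings under H2
S12 = [[sj & sk for sk in S2] for sj in S1]
for row in S12:
    print([sorted(s) for s in row])
# [[1, 4], [], []]
# [[3], [2], []]
# [[], [], [5, 6]]
```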
General Structure Tests
Asymptotics have a similar decomposition as in the 2-variable case (via the intersection sets):
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_t \log\frac{\prod_j p_{H_1}\big(S_j^1\big)}{\prod_j p_{H_2}\big(S_j^2\big)}$$
$$\longrightarrow\;\underbrace{\sum_j D\Big(p\big(S_j^1\big)\,\Big\|\,\prod_k p\big(S_{jk}^{12}\big)\Big) \;-\; \sum_j D\Big(p\big(S_j^2\big)\,\Big\|\,\prod_k p\big(S_{kj}^{12}\big)\Big)}_{\text{statistical dependence}}$$
$$+\;\underbrace{\sum_j D\Big(p\big(S_j^2\big)\,\Big\|\,p_{H_2}\big(S_j^2\big)\Big) \;-\; \sum_j D\Big(p\big(S_j^1\big)\,\Big\|\,p_{H_1}\big(S_j^1\big)\Big)}_{\text{model differences}}$$
General Structure Tests
- Extension of the previous work on data association to such tests is straightforward.
- Estimation from a single realization incurs a reduction in separability only in the model-difference terms.
- The "curse of dimensionality" (with respect to density estimation) arises in two ways:
  - Individual measurements may be of high dimension; we could still design low-dimensional auxiliary variables.
  - The number of variables in a group may be large.
- New results provide a solution.
General Structure Tests
The test implies potentially 6 joint densities, but is simplified by looking at the intersection sets.
[Figure: the structures $H_1$ and $H_2$ over $x_1,\dots,x_6$.]
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_t \log\frac{\hat p_{H_1}(x_1,x_4)\,\hat p_{H_1}(x_2,x_3)\,\hat p_{H_1}(x_5,x_6)}{\hat p_{H_2}(x_2)\,\hat p_{H_2}(x_1,x_3,x_4)\,\hat p_{H_2}(x_5,x_6)}$$
$$\longrightarrow\; D\big(p(x_2,x_3)\,\big\|\,p(x_2)\,p(x_3)\big)\;-\;D\big(p(x_1,x_3,x_4)\,\big\|\,p(x_3)\,p(x_1,x_4)\big)$$
The shared grouping $\{x_5,x_6\}$ cancels, so only the groupings on which the hypotheses disagree survive.
General Structure Tests
- For high-dimensional variables, learning auxiliary variables reduces dimensionality in one respect:
The divergences from the previous slide are replaced by divergences over learned features:
$$D\big(\hat p(x_2,x_3)\,\big\|\,\hat p(x_2)\,\hat p(x_3)\big) \;\longrightarrow\; D\big(\hat p(f_2,f_3)\,\big\|\,\hat p(f_2)\,\hat p(f_3)\big)$$
$$D\big(\hat p(x_1,x_3,x_4)\,\big\|\,\hat p(x_3)\,\hat p(x_1,x_4)\big) \;\longrightarrow\; D\big(\hat p(g_1,g_3,g_4)\,\big\|\,\hat p(g_3)\,\hat p(g_1,g_4)\big)$$
But we would still have to estimate a 3-dimensional density, and this only gets worse with larger groupings.
K-L Divergence with Permutations
- A simple idea which mitigates many of the dimensionality issues.
- Exploits the fact that the structures are distinguished by their groupings of variables.
- Key ideas:
  1. Permuting the sample order between groupings maintains the statistical dependency structure.
  2. D(X||Y) >= D(f(X)||f(Y)); this has the advantage that we can design a single (possibly vector-valued) function of all variables rather than one function for each variable.
- Currently doing comparative analysis (bias, variance) with the previous approach.
K-L Divergence with Permutations
Estimate the density of a single statistic $f$ under the joint sample order and under independently permuted sample orders:
$$\hat p\big(f(x_1,x_2,x_3)\big)\quad\text{with}\quad (x_1,x_2,x_3)\sim p(x_1,x_2,x_3)$$
$$\hat p\big(f(x_1,x_2,x_3)\big)\quad\text{with}\quad (x_1,x_2,x_3)\sim p(x_1)\,p(x_2)\,p(x_3)\ \text{(via permuted sample order)}$$
yielding a lower-bound estimate of
$$D\big(p(x_1,x_2,x_3)\,\big\|\,p(x_1)\,p(x_2)\,p(x_3)\big)$$
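A minimal sketch of the permutation idea, assuming a fixed scalar statistic f (a plain sum here, chosen only for illustration; in practice f would be designed/optimized) and histogram density estimates. Permuting each column's sample order independently preserves the marginals but destroys cross-variable dependence, and by D(X||Y) >= D(f(X)||f(Y)) the result lower-bounds the joint-versus-product divergence:

```python
import numpy as np

def kl_hist(u, v, bins=30):
    """Histogram estimate of D(p_u || p_v) from scalar samples u, v."""
    lo, hi = min(u.min(), v.min()), max(u.max(), v.max())
    cu, _ = np.histogram(u, bins=bins, range=(lo, hi))
    cv, _ = np.histogram(v, bins=bins, range=(lo, hi))
    pu = (cu + 1e-10) / cu.sum()     # smooth to avoid log(0)
    pv = (cv + 1e-10) / cv.sum()
    return float(np.sum(pu * np.log(pu / pv)))

def permutation_kl(X, f, rng):
    """Lower bound on D(p(x1,...,xd) || p(x1)...p(xd)): permute the sample
    order of each variable independently to simulate the product of the
    marginals, then compare the 1-D distributions of f."""
    Xperm = np.column_stack([rng.permutation(col) for col in X.T])
    return kl_hist(f(X), f(Xperm))

rng = np.random.default_rng(2)
z = rng.standard_normal(5000)
X = np.column_stack([z + 0.5 * rng.standard_normal(5000) for _ in range(3)])
f = lambda X: X.sum(axis=1)          # illustrative scalar statistic
print(permutation_kl(X, f, rng))     # clearly positive: columns are dependent
```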
More General Structures
Analysis has been extended to comparisons between triangulated graphs.
Can be expressed as sums and differences of product terms.
Admits a wide class of Markov processes.
Modeling Group Interactions
- Object 3 tries to interpose itself between objects 1 and 2.
- The graph describes the state (position) dependency structure.
[Figure: dependency graph linking the states $x_{1,t}, x_{2,t}, x_{3,t}$ to $x_{1,t-1}, x_{2,t-1}, x_{3,t-1}$.]
Modeling Group Interactions
[Figure: three candidate dependency structures $H_1$, $H_2$, $H_3$ over the states $x_{i,t}$ and $x_{i,t-1}$, and the pairwise tests $H_1$ vs. $H_2$ and $H_2$ vs. $H_3$.]
Previous Work and Current Efforts (Princeton)
- Developed fast algorithms based on block sorting for entropy and divergence estimation for discrete sources.
- Simulations and text data show excellent results.
- Have provided analysis of the methods showing universal consistency.
- Have recently investigated estimation of mutual information.
- Currently analyzing performance for hidden Markov sources.
- Investigating extensions to continuous-alphabet sources.
- Applications to various types of data.
A "Distilled" Problem
- The problem: how to estimate the entropy, divergence, and mutual information of two sources based only on one realization from each source?
- Assumption: both are finite-alphabet, finite-memory, stationary sources.
- Our goal: good estimates, fast convergence, and reasonable computational complexity.
Two Approaches to Estimating Mutual Information
- Estimate mutual information via entropy: I(X;Y) = H(X) + H(Y) - H(X,Y).
- Estimate mutual information via divergence: I(X;Y) = D(p_{XY} || p_X p_Y).
- We use our entropy and divergence estimators based on the Burrows-Wheeler block-sorting transform.
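A simplified plug-in sketch of both routes for i.i.d. discrete data. The actual estimators in this work are built on the Burrows-Wheeler block-sorting transform and also handle sources with memory; `empirical_entropy` below is only an i.i.d. stand-in:

```python
import numpy as np
from collections import Counter

def empirical_entropy(seq):
    """Plug-in entropy (nats) of a discrete sample."""
    counts = np.array(list(Counter(seq).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mi_via_entropy(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return (empirical_entropy(x) + empirical_entropy(y)
            - empirical_entropy(list(zip(x, y))))

def mi_via_divergence(x, y):
    # I(X;Y) = D(p_xy || p_x p_y), plug-in version
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

rng = np.random.default_rng(3)
x = rng.integers(0, 4, 20000)
y = (x + rng.integers(0, 2, 20000)) % 4   # noisy copy of x
print(mi_via_entropy(x, y), mi_via_divergence(x, y))  # the two plug-ins agree
```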
Estimating Mutual Information
- Analysis and simulations show that both approaches converge to the true value.
- The entropy approach appears better than the divergence approach.
- The divergence approach does not use the fact that the second distribution, p_X p_Y, is a product of two marginal distributions.
Hidden Markov Processes
- X is the underlying Markov chain.
- Y is a deterministic mapping of X, or X observed through a discrete memoryless channel (DMC).
- Then Y is a hidden Markov process (HMP).
- Useful in a wide range of applications.
Entropy of HMP
- To get the mutual information between the input and output of a DMC, we need the entropy of the output, which is an HMP if the input is Markov.
- The entropy of an HMP can be bracketed by an upper bound and a lower bound.
- These bounds can be calculated recursively.
$$H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1, X_1\big) \;\le\; H(Y) \;\le\; H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1\big)$$
$$\lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1, X_1\big) \;=\; H(Y) \;=\; \lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1\big)$$
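A minimal sketch computing both bounds by brute-force enumeration for a small binary HMM; the transition matrix, channel, and stationary distribution below are illustrative assumptions, not parameters from the talk:

```python
import itertools
import numpy as np

# Illustrative binary HMM: state transitions P, DMC emissions W[x, y] = p(y|x).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
W = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([2 / 3, 1 / 3])            # stationary distribution: pi @ P = pi

def seq_prob(y, alpha0):
    """p(y_1..y_d) via the forward recursion, starting from state dist alpha0."""
    alpha = alpha0 * W[:, y[0]]
    for yt in y[1:]:
        alpha = (alpha @ P) * W[:, yt]
    return alpha.sum()

def block_entropy(d, alpha0):
    """H(Y_1..Y_d) in nats, by enumerating all binary output sequences."""
    probs = np.array([seq_prob(y, alpha0)
                      for y in itertools.product([0, 1], repeat=d)])
    return float(-(probs * np.log(probs)).sum())

for d in range(2, 9):
    # Upper bound: H(Y_d | Y_{d-1},...,Y_1)
    upper = block_entropy(d, pi) - block_entropy(d - 1, pi)
    # Lower bound: H(Y_d | Y_{d-1},...,Y_1, X_1), averaging over X_1
    lower = sum(pi[x] * (block_entropy(d, np.eye(2)[x])
                         - block_entropy(d - 1, np.eye(2)[x]))
                for x in (0, 1))
    print(d, round(lower, 6), round(upper, 6))   # both approach H(Y)
```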
Estimating entropy of HMP
MSE of Our Estimators
- The MSE of our entropy estimator for i.i.d. sources satisfies
$$E\Big[\big(\hat H(q)-H(q)\big)^2\Big] \;\le\; \frac{1}{n}\,\operatorname{var}\big[\log q(Z)\big] \;+\; O\!\Big(\frac{1}{n^2}\Big)$$
- The MSE of our mutual information estimator for i.i.d. sources satisfies
$$E\Big[\big(\hat I(X;Y)-I(X;Y)\big)^2\Big] \;\le\; \frac{1}{n}\,\operatorname{var}\Big[\log \frac{q_{XY}(X,Y)}{q_X(X)\,q_Y(Y)}\Big] \;+\; O\!\Big(\frac{1}{n^2}\Big)$$
- We have convergence results for the divergence estimator, and for Markov sources and stationary ergodic sources.
MSE of Entropy Estimator for HMP
- We can prove that $H(Y_d \mid Y_{d-1},\dots,Y_1)$ converges to $H(Y)$ exponentially fast with respect to $d$, provided the hidden Markov process's mapping $\Phi$ satisfies: there exists an $a \in \mathcal{Y}$ such that $\Phi(i) = a$ for exactly one $i \in \mathcal{X}$.
- We want to further establish the convergence rate of our entropy estimator for HMPs.
Association vs. the Generative Model
- The MI fusion approach is equivalent to learning a latent variable model of the audio/video measurements:
$$Y_k^v \;=\; \alpha_k^v\,\beta^v \;+\; \alpha_k^{av_v}\,\beta^{av_v} \;+\; n_k^v$$
$$Y_k^a \;=\; \alpha_k^a\,\beta^a \;+\; \alpha_k^{av_a}\,\beta^{av_a} \;+\; n_k^a$$
- Random variables: $\alpha_k^v,\ \alpha_k^{av_v},\ \alpha_k^{av_a},\ \alpha_k^a,\ n^v,\ n^a$
- Parameters (appearance bases): $\beta^v,\ \beta^{av_v},\ \beta^{av_a},\ \beta^a$
- Simultaneously learn the statistics of the joint audio/video variables and the parameters, with $I\big(\alpha^{av_v};\alpha^{av_a}\big)$ as the statistic of association (consistent with the theory).
Incorporating Motion Parameters
- Extension of multi-modal fusion to include nuisance parameters.
- Audio is an indirect pointer to the object of interest.
- Combine a motion model (nuisance parameters $T_k$) with the audio/video appearance model:
$$Y_k^v \;=\; T_k\big(\alpha_k^v\,\beta^v + \alpha_k^{av_v}\,\beta^{av_v}\big) \;+\; n_k^v, \qquad Y_k^a \;=\; \alpha_k^a\,\beta^a + \alpha_k^{av_a}\,\beta^{av_a} \;+\; n_k^a$$
Incorporating Motion Parameters
[Figure: example frames and the average image, without vs. with the motion model.]
Information Theoretic Sensor Management
- Following Zhao, Shin, and Reich (2002), Chu, Haussecker, and Zhao (2002), and Ertin, Fisher, and Potter (2003), we have started extending information-theoretic approaches to sensor management.
- Specifically, consider the case where a subset of measurements over time has been incorporated into the belief state:
  - When is it better to incorporate a measurement from the past versus a new measurement?
  - How can we efficiently choose a set of measurements (i.e., avoid the greedy approach)? A greedy baseline is sketched below.
[Figure: state trajectory $x_k, x_{k+1}, x_{k+2}, x_{k+3}, \dots, x_{k+M}$ with candidate measurements $z_k^0, z_k^1, \dots, z_k^N$ through $z_{k+M}^0, z_{k+M}^1, \dots, z_{k+M}^N$ available at each time.]
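As a point of reference for the "avoid the greedy approach" question, here is a minimal sketch of the greedy baseline for a linear-Gaussian belief state, where a candidate measurement's information gain has a closed form from the Kalman covariance; the state and sensor matrices are illustrative assumptions:

```python
import numpy as np

def greedy_select(Sigma, H_list, R_list, budget):
    """Greedily pick measurements z_i = H_i x + v_i, v_i ~ N(0, R_i),
    maximizing I(x; z_i) = 0.5*[logdet(H_i Sigma H_i' + R_i) - logdet(R_i)]
    and conditioning the covariance on each choice."""
    chosen = []
    for _ in range(budget):
        def gain(i):
            S = H_list[i] @ Sigma @ H_list[i].T + R_list[i]
            return 0.5 * (np.linalg.slogdet(S)[1]
                          - np.linalg.slogdet(R_list[i])[1])
        best = max((i for i in range(len(H_list)) if i not in chosen), key=gain)
        chosen.append(best)
        Hb, Rb = H_list[best], R_list[best]
        K = Sigma @ Hb.T @ np.linalg.inv(Hb @ Sigma @ Hb.T + Rb)
        Sigma = Sigma - K @ Hb @ Sigma        # posterior covariance update
    return chosen, Sigma

# Example: 2-D state, three candidate scalar sensors of differing quality.
Sigma0 = np.diag([4.0, 1.0])
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[0.5]]), np.array([[0.5]]), np.array([[2.0]])]
print(greedy_select(Sigma0, H_list, R_list, budget=2)[0])
```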
Summary
- Applied the association method to multi-modal data.
- New MI/K-L divergence estimators based on the permutation approach: mitigates the dimensionality issues and avoids some of the combinatorics.
- Extended the approach to triangulated graphs.
- New estimators for information measures (entropy, divergence, mutual information) based on the BWT (block sorting):
  - Don't require knowledge of the distribution or parameters of the sources.
  - Efficient algorithms, good estimates, fast convergence.
  - Significantly outperform the other algorithms tested.
- Investigating use in several applications, including as a component of correspondence and fusion algorithms.