Information Theoretic Approaches to Data Association and Fusion in Sensor Networks
John Fisher, Alexander Ihler, Jason Williams , Alan Willsky
MIT CSAIL/LIDS
Haixiao Cai, Sanjeev Kulkarni, Sergio Verdu
Princeton University
SensorWeb MURI Review Meeting September 22, 2003
Problem/Motivation
- Large number of simple, myopic sensors: need to perform local fusion to support global inference (Battlespace Awareness).
- Critical need to understand statistical relationships between sensor outputs in the face of many modes of uncertainty (sensors, scene, geometry, etc.).
Challenges
- Uncertainty in scene and sensor geometry
- Complex, dynamic environment
- Uncalibrated, multi-modal sensors
- Unknown joint sensor statistics
- Need for fast, low-complexity algorithms
Activity and Accomplishments
Research
- Application of the data association method to the multi-modal (A/V) correspondence problem. A/V is a surrogate for other modalities, primarily because we can easily collect this data (vs. IR, EM, etc.).
- Extensions and empirical results for multi-modal feature-aided tracking.
- Generalization of data association to triangulated graphs.
- Improved K-L divergence/MI estimators.
- New developments in applied information-theoretic sensor management.
Activity and Accomplishments: Tech Transition
ARL
- Visits; student (Ihler) on-site at ARL
- Plans to transition the data association method to DARPA's CTS program (Ft. Belvoir installation)
Publications
- 4 conference publications: IPSN (2), ICME (invited), ICASSP (invited)
- 1 journal submission, accepted pending second review
- 3 sensor network workshop panels: ARO, NSF, SAMSI
A Common Thread
- Fusion and correspondence are difficult given the types of sensor uncertainties we are facing.
- Various information-theoretic measures, and the need to estimate them, arise naturally in such problems.
- Exploiting sensor data subject to a common excitation provides a mechanism for estimating such quantities.
Overview
- Estimating information-theoretic measures from sensor data (MIT, Princeton)
- Applications: data association, multi-modal tracking, inferring group interactions, sensor management
- Future directions: information-driven sensor fusion
Data Association (last year)
- Measurements: separated signals, directions of arrival
- 1 signal / 2 sensors: can localize
- >2 signals, 2 sensors: ambiguous
[Figure: sensors A and B each receive two signals (A1, A2 and B1, B2); the pairing across sensors is ambiguous.]
Association as a Hypothesis Test
Assuming independent sources, hypotheses are of the form
$$H_1:\ \{A_1,B_1\},\,\{A_2,B_2\}\quad\Longrightarrow\quad p_{H_1}(A_1,B_1)\,p_{H_1}(A_2,B_2)$$
$$H_2:\ \{A_1,B_2\},\,\{A_2,B_1\}\quad\Longrightarrow\quad p_{H_2}(A_1,B_2)\,p_{H_2}(A_2,B_1)$$
We compare the asymptotics of the test using known models to the test using densities estimated from a single realization:
$$\frac{p_{H_1}(A_1,B_1)\,p_{H_1}(A_2,B_2)}{p_{H_2}(A_1,B_2)\,p_{H_2}(A_2,B_1)} \quad\text{vs.}\quad \frac{\hat p(A_1,B_1)\,\hat p(A_2,B_2)}{\hat p(A_1,B_2)\,\hat p(A_2,B_1)}$$
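A minimal numerical sketch of this test, assuming jointly Gaussian scalar signals so that mutual information has the closed form -0.5*log(1 - rho^2); the nonparametric MI estimators developed in this work would replace `gaussian_mi`, and the toy data below is purely illustrative:

```python
import numpy as np

def gaussian_mi(x, y):
    """MI (nats) between two scalar signals under a Gaussian assumption:
    I(X;Y) = -0.5 * log(1 - rho^2), with rho the sample correlation."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

def association_statistic(a1, a2, b1, b2):
    """Estimated log-likelihood-ratio rate for H1: {A1,B1},{A2,B2} versus
    H2: {A1,B2},{A2,B1}. With densities estimated from the single
    realization, only the statistical-dependence (MI) terms survive."""
    return (gaussian_mi(a1, b1) + gaussian_mi(a2, b2)
            - gaussian_mi(a1, b2) - gaussian_mi(a2, b1))

# Toy example: source s1 drives the pair (A1, B1), source s2 drives (A2, B2).
rng = np.random.default_rng(0)
N = 5000
s1, s2 = rng.standard_normal((2, N))
a1 = s1 + 0.3 * rng.standard_normal(N)
b1 = s1 + 0.3 * rng.standard_normal(N)
a2 = s2 + 0.3 * rng.standard_normal(N)
b2 = s2 + 0.3 * rng.standard_normal(N)
print(association_statistic(a1, a2, b1, b2))  # positive: evidence for H1
```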
Asymptotics of Likelihood Ratio
Decomposes into two sets of terms: statistical dependencies (groupings) and differences in model parameterizations.
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_k \log\frac{p_{H_1}\big(A_1^k,B_1^k\big)\,p_{H_1}\big(A_2^k,B_2^k\big)}{p_{H_2}\big(A_1^k,B_2^k\big)\,p_{H_2}\big(A_2^k,B_1^k\big)}$$
$$\longrightarrow\;\underbrace{\big[I(A_1;B_1)+I(A_2;B_2)\big]\;-\;\big[I(A_1;B_2)+I(A_2;B_1)\big]}_{\text{statistical dependence}}$$
$$+\;\underbrace{\big[D\big(p(A_1,B_2)\,\big\|\,p_{H_2}(A_1,B_2)\big)+D\big(p(A_2,B_1)\,\big\|\,p_{H_2}(A_2,B_1)\big)\big]\;-\;\big[D\big(p(A_1,B_1)\,\big\|\,p_{H_1}(A_1,B_1)\big)+D\big(p(A_2,B_2)\,\big\|\,p_{H_1}(A_2,B_2)\big)\big]}_{\text{model differences}}$$
Asymptotics of Likelihood Ratio
If we estimate the densities from a single realization, the statistical dependence terms remain while the model divergences go away:
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_k \log\frac{\hat p\big(A_1^k,B_1^k\big)\,\hat p\big(A_2^k,B_2^k\big)}{\hat p\big(A_1^k,B_2^k\big)\,\hat p\big(A_2^k,B_1^k\big)}$$
$$\longrightarrow\;\underbrace{\big[I(A_1;B_1)+I(A_2;B_2)\big]\;-\;\big[I(A_1;B_2)+I(A_2;B_1)\big]}_{\text{statistical dependence}}$$
with the model-difference divergence terms dropping out.
High Dimensional Data
Learn low-dimensional auxiliary variables which summarize statistical dependency of measurements
$$\frac{1}{N}\log L(f,g) \;=\; \frac{1}{N}\sum_k \log\frac{\hat p\big(f_1^k,f_2^k\big)\,\hat p\big(f_3^k,f_4^k\big)}{\hat p\big(g_1^k,g_2^k\big)\,\hat p\big(g_3^k,g_4^k\big)}$$
where, consistent with the hypothesized pairings,
$$f_1^k=f_1\big(A_1^k\big),\quad f_2^k=f_2\big(B_1^k\big),\quad f_3^k=f_3\big(A_2^k\big),\quad f_4^k=f_4\big(B_2^k\big)$$
$$g_1^k=g_1\big(A_1^k\big),\quad g_2^k=g_2\big(B_2^k\big),\quad g_3^k=g_3\big(A_2^k\big),\quad g_4^k=g_4\big(B_1^k\big)$$
By the data-processing inequality,
$$\lim_{N\to\infty}\frac{1}{N}\log L \;\le\; I(f_1;f_2)+I(f_3;f_4) \;\le\; I(A_1;B_1)+I(A_2;B_2)$$
and the auxiliary variables are chosen by
$$\{f^*,g^*\} \;=\; \arg\max_{f,\,g}\ \frac{1}{N}\log L(f,g)$$
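For jointly Gaussian variables, I(f;g) = -0.5*log(1 - rho^2) is monotone in the correlation rho, so linear auxiliary variables maximizing this MI proxy reduce to canonical correlation analysis. A minimal numpy sketch along those lines, as a simplified stand-in for the learned statistics above (the synthetic data is illustrative):

```python
import numpy as np

def leading_cca(X, Y, reg=1e-6):
    """First canonical pair for data X (N x dx), Y (N x dy): linear auxiliary
    variables f = X @ wx, g = Y @ wy with maximal correlation rho, hence
    maximal Gaussian MI -0.5 * log(1 - rho^2)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both blocks, then take the top singular pair of the
    # cross-covariance of the whitened variables.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]

# Example: two 20-dimensional measurements sharing one latent component.
rng = np.random.default_rng(1)
z = rng.standard_normal(2000)
X = np.outer(z, rng.standard_normal(20)) + rng.standard_normal((2000, 20))
Y = np.outer(z, rng.standard_normal(20)) + rng.standard_normal((2000, 20))
wx, wy, rho = leading_cca(X, Y)
print("rho:", rho, "MI lower bound:", -0.5 * np.log(1 - rho ** 2))
```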
AV Association/Correspondence
- New since last year: a direct application of the 2-sensor / multiple-source case
- Unknown joint statistics
- High-dimensional data
- Varying scene parameters
- Surrogate for multi-modal sensors
[Figure: example audio/video sequences labeled consistent vs. inconsistent.]
AV Association/Correspondence
[Figure: association matrix for 8 subjects, with entries such as 0.68, 0.61, 0.19, and 0.20.]
General Structure Tests
- Generalization to hypothesis tests over graphical structures
- How are the observations related to each other?
$$H_i:\ p \;=\; \prod_{j=1}^{M_i} p\big(S_j^i\big), \quad\text{where}\quad S_j^i=\big\{x_{j_1},\dots,x_{j_d}\big\} \quad\text{and}\quad S_j^i\cap S_k^i=\emptyset\ \ (j\neq k)$$
[Figure: three candidate groupings of the variables $x_1,\dots,x_6$, compared pairwise.]
General Structure Tests
$$S_1^1=\{x_1,x_4\},\quad S_2^1=\{x_2,x_3\},\quad S_3^1=\{x_5,x_6\} \qquad (H_1)$$
$$S_1^2=\{x_1,x_3,x_4\},\quad S_2^2=\{x_2\},\quad S_3^2=\{x_5,x_6\} \qquad (H_2)$$
[Figure: the two graphical structures $H_1$ and $H_2$ over $x_1,\dots,x_6$.]
$$S_{jk}^{12} \;=\; S_j^1\cap S_k^2 \quad \forall\, j,k$$
$$\begin{array}{lll} S_{11}^{12}=\{x_1,x_4\} & S_{12}^{12}=\emptyset & S_{13}^{12}=\emptyset\\ S_{21}^{12}=\{x_3\} & S_{22}^{12}=\{x_2\} & S_{23}^{12}=\emptyset\\ S_{31}^{12}=\emptyset & S_{32}^{12}=\emptyset & S_{33}^{12}=\{x_5,x_6\} \end{array}$$
Intersection Sets - groupings on which the hypotheses agree
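Computing the intersection sets is elementary; a small sketch for the example above:

```python
# Intersection sets S12[j][k] = S1[j] & S2[k]; the hypotheses agree on
# exactly these groupings.
S1 = [{1, 4}, {2, 3}, {5, 6}]     # groupings under H1
S2 = [{1, 3, 4}, {2}, {5, 6}]     # groupings under H2
S12 = [[sj & sk for sk in S2] for sj in S1]
for row in S12:
    print([sorted(s) for s in row])
# [[1, 4], [], []]
# [[3], [2], []]
# [[], [], [5, 6]]
```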
General Structure Tests
Asymptotics have a similar decomposition as in the 2-variable case (via the intersection sets):
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_t \log\frac{\prod_j p_{H_1}\big(S_j^1\big)}{\prod_j p_{H_2}\big(S_j^2\big)}$$
$$\longrightarrow\;\underbrace{\sum_j D\Big(p\big(S_j^1\big)\,\Big\|\,\prod_k p\big(S_{jk}^{12}\big)\Big) \;-\; \sum_j D\Big(p\big(S_j^2\big)\,\Big\|\,\prod_k p\big(S_{kj}^{12}\big)\Big)}_{\text{statistical dependence}}$$
$$+\;\underbrace{\sum_j D\Big(p\big(S_j^2\big)\,\Big\|\,p_{H_2}\big(S_j^2\big)\Big) \;-\; \sum_j D\Big(p\big(S_j^1\big)\,\Big\|\,p_{H_1}\big(S_j^1\big)\Big)}_{\text{model differences}}$$
General Structure Tests
- Extension of the previous work on data association to such tests is straightforward.
- Estimation from a single realization incurs a reduction in separability only in the model-difference terms.
- The "curse of dimensionality" (with respect to density estimation) arises in two ways:
  - Individual measurements may be of high dimension; we could still design low-dimensional auxiliary variables.
  - The number of variables in a group may be large.
- New results provide a solution.
General Structure Tests
The test implies potentially 6 joint densities, but is simplified by looking at the intersection sets.
[Figure: the structures $H_1$ and $H_2$ over $x_1,\dots,x_6$.]
$$\frac{1}{N}\log L \;=\; \frac{1}{N}\sum_t \log\frac{\hat p_{H_1}(x_1,x_4)\,\hat p_{H_1}(x_2,x_3)\,\hat p_{H_1}(x_5,x_6)}{\hat p_{H_2}(x_2)\,\hat p_{H_2}(x_1,x_3,x_4)\,\hat p_{H_2}(x_5,x_6)}$$
$$\longrightarrow\; D\big(p(x_2,x_3)\,\big\|\,p(x_2)\,p(x_3)\big)\;-\;D\big(p(x_1,x_3,x_4)\,\big\|\,p(x_3)\,p(x_1,x_4)\big)$$
The shared grouping $\{x_5,x_6\}$ cancels, so only the groupings on which the hypotheses disagree survive.
General Structure Tests
- For high-dimensional variables, learning auxiliary variables reduces dimensionality in one respect:
The divergences from the previous slide are replaced by divergences over learned features:
$$D\big(\hat p(x_2,x_3)\,\big\|\,\hat p(x_2)\,\hat p(x_3)\big) \;\longrightarrow\; D\big(\hat p(f_2,f_3)\,\big\|\,\hat p(f_2)\,\hat p(f_3)\big)$$
$$D\big(\hat p(x_1,x_3,x_4)\,\big\|\,\hat p(x_3)\,\hat p(x_1,x_4)\big) \;\longrightarrow\; D\big(\hat p(g_1,g_3,g_4)\,\big\|\,\hat p(g_3)\,\hat p(g_1,g_4)\big)$$
But we would still have to estimate a 3-dimensional density, and this only gets worse with larger groupings.
K-L Divergence with Permutations
- A simple idea which mitigates many of the dimensionality issues.
- Exploits the fact that the structures are distinguished by their groupings of variables.
- Key ideas:
  1. Permuting the sample order between groupings maintains the statistical dependency structure.
  2. D(X||Y) >= D(f(X)||f(Y)); this has the advantage that we can design a single (possibly vector-valued) function of all variables rather than one function for each variable.
- Currently doing comparative analysis (bias, variance) with the previous approach.
K-L Divergence with Permutations
Estimate the density of a single statistic $f$ under the joint sample order and under independently permuted sample orders:
$$\hat p\big(f(x_1,x_2,x_3)\big)\quad\text{with}\quad (x_1,x_2,x_3)\sim p(x_1,x_2,x_3)$$
$$\hat p\big(f(x_1,x_2,x_3)\big)\quad\text{with}\quad (x_1,x_2,x_3)\sim p(x_1)\,p(x_2)\,p(x_3)\ \text{(via permuted sample order)}$$
yielding a lower-bound estimate of
$$D\big(p(x_1,x_2,x_3)\,\big\|\,p(x_1)\,p(x_2)\,p(x_3)\big)$$
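A minimal sketch of the permutation idea, assuming a fixed scalar statistic f (a plain sum here, chosen only for illustration; in practice f would be designed/optimized) and histogram density estimates. Permuting each column's sample order independently preserves the marginals but destroys cross-variable dependence, and by D(X||Y) >= D(f(X)||f(Y)) the result lower-bounds the joint-versus-product divergence:

```python
import numpy as np

def kl_hist(u, v, bins=30):
    """Histogram estimate of D(p_u || p_v) from scalar samples u, v."""
    lo, hi = min(u.min(), v.min()), max(u.max(), v.max())
    cu, _ = np.histogram(u, bins=bins, range=(lo, hi))
    cv, _ = np.histogram(v, bins=bins, range=(lo, hi))
    pu = (cu + 1e-10) / cu.sum()     # smooth to avoid log(0)
    pv = (cv + 1e-10) / cv.sum()
    return float(np.sum(pu * np.log(pu / pv)))

def permutation_kl(X, f, rng):
    """Lower bound on D(p(x1,...,xd) || p(x1)...p(xd)): permute the sample
    order of each variable independently to simulate the product of the
    marginals, then compare the 1-D distributions of f."""
    Xperm = np.column_stack([rng.permutation(col) for col in X.T])
    return kl_hist(f(X), f(Xperm))

rng = np.random.default_rng(2)
z = rng.standard_normal(5000)
X = np.column_stack([z + 0.5 * rng.standard_normal(5000) for _ in range(3)])
f = lambda X: X.sum(axis=1)          # illustrative scalar statistic
print(permutation_kl(X, f, rng))     # clearly positive: columns are dependent
```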
More General Structures
Analysis has been extended to comparisons between triangulated graphs.
Can be expressed as sums and differences of product terms.
Admits a wide class of Markov processes.
Modeling Group Interactions
- Object 3 tries to interpose itself between objects 1 and 2.
- The graph describes the state (position) dependency structure.
[Figure: dependency graph linking the states $x_{1,t}, x_{2,t}, x_{3,t}$ to $x_{1,t-1}, x_{2,t-1}, x_{3,t-1}$.]
Modeling Group Interactions
[Figure: three candidate dependency structures $H_1$, $H_2$, $H_3$ over the states $x_{i,t}$ and $x_{i,t-1}$, and the pairwise tests $H_1$ vs. $H_2$ and $H_2$ vs. $H_3$.]
Previous Work and Current Efforts (Princeton)
- Developed fast algorithms based on block sorting for entropy and divergence estimation for discrete sources.
- Simulations and text data show excellent results.
- Have provided analysis of the methods showing universal consistency.
- Have recently investigated estimation of mutual information.
- Currently analyzing performance for hidden Markov sources.
- Investigating extensions to continuous-alphabet sources.
- Applications to various types of data.
A "Distilled" Problem
- The problem: how to estimate the entropy, divergence, and mutual information of two sources based only on one realization from each source?
- Assumption: both are finite-alphabet, finite-memory, stationary sources.
- Our goal: good estimates, fast convergence, and reasonable computational complexity.
Two Approaches to Estimating Mutual Information
- Estimate mutual information via entropy: I(X;Y) = H(X) + H(Y) - H(X,Y).
- Estimate mutual information via divergence: I(X;Y) = D(p_{XY} || p_X p_Y).
- We use our entropy and divergence estimators based on the Burrows-Wheeler block-sorting transform.
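A simplified plug-in sketch of both routes for i.i.d. discrete data. The actual estimators in this work are built on the Burrows-Wheeler block-sorting transform and also handle sources with memory; `empirical_entropy` below is only an i.i.d. stand-in:

```python
import numpy as np
from collections import Counter

def empirical_entropy(seq):
    """Plug-in entropy (nats) of a discrete sample."""
    counts = np.array(list(Counter(seq).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mi_via_entropy(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return (empirical_entropy(x) + empirical_entropy(y)
            - empirical_entropy(list(zip(x, y))))

def mi_via_divergence(x, y):
    # I(X;Y) = D(p_xy || p_x p_y), plug-in version
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

rng = np.random.default_rng(3)
x = rng.integers(0, 4, 20000)
y = (x + rng.integers(0, 2, 20000)) % 4   # noisy copy of x
print(mi_via_entropy(x, y), mi_via_divergence(x, y))  # the two plug-ins agree
```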
Estimating Mutual Information
- Analysis and simulations show that both approaches converge to the true value.
- The entropy approach appears better than the divergence approach.
- The divergence approach does not use the fact that the second distribution, p_X p_Y, is a product of two marginal distributions.
Hidden Markov Processes
- X is the underlying Markov chain.
- Y is a deterministic mapping of X, or X observed through a discrete memoryless channel (DMC).
- Then Y is a hidden Markov process (HMP).
- Useful in a wide range of applications.
Entropy of HMP
- To get the mutual information between the input and output of a DMC, we need the entropy of the output, which is an HMP if the input is Markov.
- The entropy of an HMP can be bracketed by an upper bound and a lower bound.
- These bounds can be calculated recursively.
$$H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1, X_1\big) \;\le\; H(Y) \;\le\; H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1\big)$$
$$\lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1, X_1\big) \;=\; H(Y) \;=\; \lim_{d\to\infty} H\big(Y_d \,\big|\, Y_{d-1},\dots,Y_1\big)$$
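A minimal sketch computing both bounds by brute-force enumeration for a small binary HMM; the transition matrix, channel, and stationary distribution below are illustrative assumptions, not parameters from the talk:

```python
import itertools
import numpy as np

# Illustrative binary HMM: state transitions P, DMC emissions W[x, y] = p(y|x).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
W = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([2 / 3, 1 / 3])            # stationary distribution: pi @ P = pi

def seq_prob(y, alpha0):
    """p(y_1..y_d) via the forward recursion, starting from state dist alpha0."""
    alpha = alpha0 * W[:, y[0]]
    for yt in y[1:]:
        alpha = (alpha @ P) * W[:, yt]
    return alpha.sum()

def block_entropy(d, alpha0):
    """H(Y_1..Y_d) in nats, by enumerating all binary output sequences."""
    probs = np.array([seq_prob(y, alpha0)
                      for y in itertools.product([0, 1], repeat=d)])
    return float(-(probs * np.log(probs)).sum())

for d in range(2, 9):
    # Upper bound: H(Y_d | Y_{d-1},...,Y_1)
    upper = block_entropy(d, pi) - block_entropy(d - 1, pi)
    # Lower bound: H(Y_d | Y_{d-1},...,Y_1, X_1), averaging over X_1
    lower = sum(pi[x] * (block_entropy(d, np.eye(2)[x])
                         - block_entropy(d - 1, np.eye(2)[x]))
                for x in (0, 1))
    print(d, round(lower, 6), round(upper, 6))   # both approach H(Y)
```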
Estimating entropy of HMP
MSE of Our Estimators
- The MSE of our entropy estimator for i.i.d. sources satisfies
$$E\Big[\big(\hat H(q)-H(q)\big)^2\Big] \;\le\; \frac{1}{n}\,\operatorname{var}\big[\log q(Z)\big] \;+\; O\!\Big(\frac{1}{n^2}\Big)$$
- The MSE of our mutual information estimator for i.i.d. sources satisfies
$$E\Big[\big(\hat I(X;Y)-I(X;Y)\big)^2\Big] \;\le\; \frac{1}{n}\,\operatorname{var}\Big[\log \frac{q_{XY}(X,Y)}{q_X(X)\,q_Y(Y)}\Big] \;+\; O\!\Big(\frac{1}{n^2}\Big)$$
- We have convergence results for the divergence estimator, and for Markov sources and stationary ergodic sources.
MSE of Entropy Estimator for HMP
- We can prove that $H(Y_d \mid Y_{d-1},\dots,Y_1)$ converges to $H(Y)$ exponentially fast with respect to $d$, provided the hidden Markov process's mapping $\Phi$ satisfies: there exists an $a \in \mathcal{Y}$ such that $\Phi(i) = a$ for exactly one $i \in \mathcal{X}$.
- We want to further establish the convergence rate of our entropy estimator for HMPs.
Association vs. the Generative Model
- The MI fusion approach is equivalent to learning a latent variable model of the audio/video measurements:
$$Y_k^v \;=\; \alpha_k^v\,\beta^v \;+\; \alpha_k^{av_v}\,\beta^{av_v} \;+\; n_k^v$$
$$Y_k^a \;=\; \alpha_k^a\,\beta^a \;+\; \alpha_k^{av_a}\,\beta^{av_a} \;+\; n_k^a$$
- Random variables: $\alpha_k^v,\ \alpha_k^{av_v},\ \alpha_k^{av_a},\ \alpha_k^a,\ n^v,\ n^a$
- Parameters (appearance bases): $\beta^v,\ \beta^{av_v},\ \beta^{av_a},\ \beta^a$
- Simultaneously learn the statistics of the joint audio/video variables and the parameters, with $I\big(\alpha^{av_v};\alpha^{av_a}\big)$ as the statistic of association (consistent with the theory).
Incorporating Motion Parameters
- Extension of multi-modal fusion to include nuisance parameters.
- Audio is an indirect pointer to the object of interest.
- Combine a motion model (nuisance parameters $T_k$) with the audio/video appearance model:
$$Y_k^v \;=\; T_k\big(\alpha_k^v\,\beta^v + \alpha_k^{av_v}\,\beta^{av_v}\big) \;+\; n_k^v, \qquad Y_k^a \;=\; \alpha_k^a\,\beta^a + \alpha_k^{av_a}\,\beta^{av_a} \;+\; n_k^a$$
Incorporating Motion Parameters
[Figure: example frames and the average image, without vs. with the motion model.]
Information Theoretic Sensor Management
- Following Zhao, Shin, and Reich (2002), Chu, Haussecker, and Zhao (2002), and Ertin, Fisher, and Potter (2003), we have started extending information-theoretic approaches to sensor management.
- Specifically, consider the case where a subset of measurements over time has been incorporated into the belief state:
  - When is it better to incorporate a measurement from the past versus a new measurement?
  - How can we efficiently choose a set of measurements (i.e., avoid the greedy approach)? A greedy baseline is sketched below.
[Figure: state trajectory $x_k, x_{k+1}, x_{k+2}, x_{k+3}, \dots, x_{k+M}$ with candidate measurements $z_k^0, z_k^1, \dots, z_k^N$ through $z_{k+M}^0, z_{k+M}^1, \dots, z_{k+M}^N$ available at each time.]
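As a point of reference for the "avoid the greedy approach" question, here is a minimal sketch of the greedy baseline for a linear-Gaussian belief state, where a candidate measurement's information gain has a closed form from the Kalman covariance; the state and sensor matrices are illustrative assumptions:

```python
import numpy as np

def greedy_select(Sigma, H_list, R_list, budget):
    """Greedily pick measurements z_i = H_i x + v_i, v_i ~ N(0, R_i),
    maximizing I(x; z_i) = 0.5*[logdet(H_i Sigma H_i' + R_i) - logdet(R_i)]
    and conditioning the covariance on each choice."""
    chosen = []
    for _ in range(budget):
        def gain(i):
            S = H_list[i] @ Sigma @ H_list[i].T + R_list[i]
            return 0.5 * (np.linalg.slogdet(S)[1]
                          - np.linalg.slogdet(R_list[i])[1])
        best = max((i for i in range(len(H_list)) if i not in chosen), key=gain)
        chosen.append(best)
        Hb, Rb = H_list[best], R_list[best]
        K = Sigma @ Hb.T @ np.linalg.inv(Hb @ Sigma @ Hb.T + Rb)
        Sigma = Sigma - K @ Hb @ Sigma        # posterior covariance update
    return chosen, Sigma

# Example: 2-D state, three candidate scalar sensors of differing quality.
Sigma0 = np.diag([4.0, 1.0])
H_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
R_list = [np.array([[0.5]]), np.array([[0.5]]), np.array([[2.0]])]
print(greedy_select(Sigma0, H_list, R_list, budget=2)[0])
```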
Summary
- Applied the association method to multi-modal data.
- New MI/K-L divergence estimators based on the permutation approach: mitigates the dimensionality issues and avoids some of the combinatorics.
- Extended the approach to triangulated graphs.
- New estimators for information measures (entropy, divergence, mutual information) based on the BWT (block sorting):
  - Don't require knowledge of the distribution or parameters of the sources.
  - Efficient algorithms, good estimates, fast convergence.
  - Significantly outperform the other algorithms tested.
- Investigating use in several applications, including as a component of correspondence and fusion algorithms.