
Stavri Nikolov1, Tim Dixon2, John Lewis1, Nishan Canagarajah1,

Dave Bull1, Tom Troscianko2, Jan Noyes2

1 Centre for Communications Research, University of Bristol, UK
2 Department of Experimental Psychology, University of Bristol, UK

How Multi-Modality Displays Affect Decision Making
NATO ARW 2006, 21-25 October 2006, Velingrad, Bulgaria

2 Overview

• Multi-Sensor Image Fusion
• Multi-Modality Fused Image/Video Displays
• Target Detection in Fused Images with Short Display Times (results)
• Scanpath Assessment of Fused Videos
• Multi-Modality Image Segmentation
• Summary

3 How Does Image/Video Fusion Affect Decision Making?

• Experiment 1: Target Detection in Fused Images with Short Display Times; Decision: is the target present or not?

• Experiment 2: Target Tracking in Fused Videos (+ secondary task); Decision: where to look to follow the target?

• Experiment 3: Image Segmentation (decomposing an image into meaningful regions/objects) in Fused Images; Decision: which objects to segment and how?

4

Multi-Sensor Image Fusion

5 Multi-Sensor Image Fusion: Definition

• the process by which several images coming from different sensors, or some of their features, are combined to form a fused image (a minimal pixel-level sketch follows below)

• the aim of the fusion process is to create a single image (or visual representation) that captures most of the important and complementary information in the input images and better resolves any uncertainties, inconsistencies or ambiguities
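As a concrete illustration of the simplest pixel-level case, the sketch below averages two pre-registered input images. The file names and the NumPy/OpenCV usage are illustrative assumptions, not anything stated in the slides.

```python
# Minimal pixel-level fusion sketch: average two pre-registered images.
# File names and the OpenCV/NumPy dependency are illustrative assumptions.
import cv2
import numpy as np

visible = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
infrared = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
assert visible.shape == infrared.shape, "inputs must be co-registered"

# Equal-weight average fusion (AVE); more sophisticated schemes (pyramids,
# wavelets) replace this line with a multi-resolution combination rule.
fused = 0.5 * visible + 0.5 * infrared

cv2.imwrite("fused_average.png", fused.astype(np.uint8))
```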

6 Multi-Sensor Image Fusion: Example

Visible and IR images courtesy of Octec Ltd, UK

[Figure: example visible and IR inputs and the fused result F]

7 Multi-Sensor Image Fusion: Applications

• Many different applications of image fusion:

– remote sensing

– surveillance

– defence

– computer vision

– robotics

– medical imaging

– microscopic imaging

– art

8 Multi-Sensor Image Fusion: Applications

• Image fusion is used in:

– night vision systems

– binocular vision

– 3-D scene model building from multiple views

– image/photo mosaics

– digital cameras and microscopes to extend the effective depth of field by combining multi-focus images

– target detection

9 Multi-Sensor Image Fusion: Different Levels

• Image fusion can be performed at different levels of the information representation:

– signal level

– pixel level

– feature / region level

– object level

– symbolic level

10

Multi-Modality Image Displays

11 Multi-Modality Image Displays

• Adjacent (side-by-side) displays (*)

• Window displays

• Fade in/out displays

• Checkerboard displays (*)

• Gaze-contingent multi-modality displays (*)

• Hybrid fused displays (*)

• Interleaved video displays

12 Adjacent and Checkerboard Displays

Images from the Eden Project Multi-Sensor Data Set

13 Gaze-Contingent Multi-Modal Displays

Demo of a gaze-contingent multi-modal display (GCMMD) using aerial photographs and maps of England (from Multimap.com).

"Multi-Modality Gaze-Contingent Displays for Image Fusion", S. G. Nikolov, M. G. Jones, I. D. Gilchrist, D. R. Bull, C. N. Canagarajah, Proceedings of Fusion 2002

14 Hybrid Fused Image Displays

[Figure: hybrid fused displays at weightings (1.0, 0.0), (0.8, 0.2), (0.6, 0.4), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0); a hedged blending sketch follows below]

“Hybrid Fused Displays: Between Pixel- and Region-Based Image Fusion", S. G. Nikolov, J. J. Lewis, R. J. O’Callaghan, D. R. Bull and C. N. Canagarajah, Proceedings of Fusion 2004
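The weight pairs shown in the figure appear to interpolate between a pixel-based and a region-based fused result. The sketch below is one plausible reading of such a blend, not the authors' exact construction; the function and variable names are assumptions.

```python
# Hedged sketch: linear blend between a pixel-fused and a region-fused image.
# The (w_pixel, w_region) pairs mirror the weightings shown on the slide;
# whether the original hybrid displays blend exactly this way is an assumption.
import numpy as np

def hybrid_display(pixel_fused: np.ndarray, region_fused: np.ndarray,
                   w_pixel: float, w_region: float) -> np.ndarray:
    """Weighted combination of two fused images of identical shape."""
    assert abs(w_pixel + w_region - 1.0) < 1e-6
    return w_pixel * pixel_fused + w_region * region_fused

# The six weightings from the slide:
weights = [(1.0, 0.0), (0.8, 0.2), (0.6, 0.4), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]
```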

15 Fused Image Assessment

• The results of image fusion are:
  – either used for presentation to a human observer for easier and enhanced interpretation
  – or subjected to further computer analysis or processing, e.g. target detection or tracking, with the aim of improved accuracy and more robust performance

• Finding an optimal fused image is a very difficult problem, since in most cases this is task and application dependent.

16 Which Fused Image is Better?

… it depends what we want to do with it, i.e. the task we have!

Original visible and IR "UN Camp" images courtesy of TNO Human Factors

17 Categories of Fused Image Assessment Metrics

• Input Image Metrics (IIMs)
• Input and Fused Image Metrics (IFIMs)
• Fused Image Metrics (FIMs)

[Diagram: input images A and B pass through the fusion process F to produce the fused image]

18 Fused Image Assessment Metrics

• A number of image quality metrics have been proposed in the past, but all require a reference image

• In practice an ideal fused image is rarely known and is application and task specific

• Other metrics try to estimate what information is transferred from the input images to the fused image

• Two such metrics that we used in our study to assess the quality of the fused images are Piella's image quality index (IQI) [03] and Petrovic's edge-based Q^AB/F metric [00, 03] (both of which are IFIMs); a hedged sketch of the underlying quality index follows below
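Piella's IQI builds on the Wang-Bovik universal image quality index, applied in local windows with saliency weighting. The global-version sketch below only illustrates the core index; it is not the metric as configured in the study.

```python
# Hedged sketch of the Wang-Bovik universal image quality index Q, which
# Piella's fusion quality index applies in local windows with saliency
# weighting; this global version only illustrates the core formula.
import numpy as np

def universal_quality_index(x: np.ndarray, y: np.ndarray) -> float:
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Q combines correlation, luminance and contrast distortion in one term.
    return 4.0 * cov * mx * my / ((vx + vy) * (mx**2 + my**2))
```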

19

Experiment 1: Target Detection in Fused Images

Decision: Is the target present or not?

20 Experiment 1, Task 1: Objective Human Task Performance

• Testing 3 fusion schemes: AVE, CP & DT-CWT, and 3 JPEG2000 compression levels: clean, low (.3 bpp) and high (.2 bpp).

• Using a signal detection paradigm to assess Ps' ability to detect the presence of the soldier (target) in briefly displayed images (a hedged d′ sketch follows below).

[Figure: example stimuli for each fusion scheme (Average, Contrast Pyramid, DT-CWT) at each compression level (clean, low, high), with target-present and target-absent versions]
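The slides state that a signal detection paradigm was used; one standard summary statistic for such a yes/no task is d′ = Z(hit rate) − Z(false-alarm rate). Whether d′ was the statistic actually reported is an assumption here; the sketch only shows the computation.

```python
# Hedged sketch: sensitivity d' from a yes/no signal detection task.
# Whether d' was the reported statistic in this study is an assumption.
from statistics import NormalDist

def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    z = NormalDist().inv_cdf
    # Loglinear correction avoids infinite z-scores at rates of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Example: 40 hits, 8 misses, 6 false alarms, 42 correct rejections.
print(d_prime(40, 8, 6, 42))
```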

21 Task 1: Method

• Fixation point '+' shown for 750 ms, an image presented for 15 ms, followed by an inter-stimulus interval of 15 ms, and a mask for 250 ms.

22 Experiment 1, Task 2: Subjective Image Assessment

• Show pairs of images; ask Ps to rate both out of 5 (5 = best quality, 1 = worst quality).

• Images paired by fusion type and by compression level.

23 Target Detection in Fused Images: Main Results

• The results showed a significant effect of fusion but not of compression for JPEG2000 images

• Subjective ratings differed for JPEG2000 images, whilst metric results for both JPEG (different study) and JPEG2000 showed similar trends

“Characterisation of Image Fusion Quality Metrics for Surveillance Applications over Bandlimited Channels", E. F. Canga, T. D. Dixon, S. G. Nikolov, D. R. Bull, C. N. Canagarajah, J. M. Noyes, T. Troscianko, Proceedings of Fusion 2005

24

Experiment 2: Target Tracking in Fused Videos

Decision: Where to look to follow the target?

25 Experiment 2

• Applying an eye-tracking paradigm to the fused image assessment process.

• Moving beyond still images: assessing participants' ability to accurately track a figure.

• Using footage taken recently at the Eden Project Biome.

• Videos of a 'soldier' walking through thick foliage, filmed in both visible light and IR, and at two natural luminance levels.

• All videos registered using our Video Fusion Toolbox (VFT).

26 Original Videos Used

• High Luminance (HL)
• Low Luminance (LL)

Videos from the Eden Project Multi-Sensor Data Set

27 Fused Videos Used

Low Luminance:
• Fused Average
• Fused DWT
• Fused DT-CWT

High Luminance:
• Fused Average
• Fused DWT
• Fused DT-CWT

(a hedged wavelet-fusion sketch follows below)
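For readers unfamiliar with wavelet fusion, the sketch below shows one common way a DWT-fused frame can be produced: average the coarse approximation band and keep the larger-magnitude detail coefficients. The wavelet, decomposition level and combination rule are assumptions, not the settings used for the Eden Project sequences.

```python
# Hedged sketch of frame-by-frame DWT fusion. The wavelet name, level and
# combination rule are illustrative assumptions.
import numpy as np
import pywt

def dwt_fuse(frame_a: np.ndarray, frame_b: np.ndarray,
             wavelet: str = "db2", level: int = 3) -> np.ndarray:
    ca = pywt.wavedec2(frame_a.astype(np.float32), wavelet, level=level)
    cb = pywt.wavedec2(frame_b.astype(np.float32), wavelet, level=level)
    fused = [0.5 * (ca[0] + cb[0])]  # average the coarse approximation
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        # choose-max rule: keep the detail coefficient with larger magnitude
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip((ha, va, da), (hb, vb, db))))
    return pywt.waverec2(fused, wavelet)
```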

28 Tasks + Methods

• Participants asked to visually track the soldier as accurately as possible throughout the video sequence.

• Tobii x50 Eye-Tracker used to record eye movements.

• Participants also asked to press SPACE at specific points in the two sequences (when the soldier walked past features of the scene).

• 10 Ps (5m, 5f): mean age = 27.1 (s.d. = 6.76).

• Each shown 6 displays: Viz, IR, Viz+IR*, AVE, DWT, DT-CWT.

• All Ps shown each condition in 3 separate sessions.

• Half shown the above order first, half the reverse order. Order switched for the 2nd session and switched back for the 3rd.

• Eye position and reaction times recorded.

29 Accuracy Results I

• Eye position translated onto the target box for each participant.

• Calculated an accuracy ratio, hits:total views, for each condition (a hedged sketch follows below).

• Also considered Tobii accuracy coding.
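A hedged sketch of the accuracy ratio described above: the fraction of gaze samples falling inside the per-frame target box. The data layout and function name are illustrative assumptions, not the Tobii export format.

```python
# Hedged sketch of the accuracy ratio: hits : total views, pairing each gaze
# sample with the target bounding box of the corresponding video frame.
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def accuracy_ratio(gaze: Iterable[Tuple[float, float]],
                   boxes: Iterable[Box]) -> float:
    hits = total = 0
    for (gx, gy), (x0, y0, x1, y1) in zip(gaze, boxes):
        total += 1
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            hits += 1
    return hits / total if total else 0.0
```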

30 Accuracy Results II

Videos from the Eden Project Multi-Sensor Data Set

31 Results (High Luminance)

• Accuracy scores revealed:
  – Main effect of display modality (p = .001).
  – No main effect of session (p > .05).
  – No interaction (p > .05).
  – Post hoc tests revealed differences between Viz and: AVE, DWT, CWT; and between IR and: AVE, DWT.

• RT scores revealed:
  – No significant effects.

[Chart: estimated marginal means of accuracy by display modality (Viz, IR, AVE, DWT, CWT) for Sessions 1-3; y-axis 0 to 0.6]

"Scanpath Analysis of Fused Multi-Sensor Images with Luminance Change", T.D. Dixon, S.G. Nikolov, J.J. Lewis, J. Li, E.F. Canga, J.M. Noyes, T. Troscianko, D.R. Bull and C.N. Canagarajah, Proceedings of Fusion 2006

32 Results (Low Luminance)

• Accuracy scores revealed:
  – Main effect of display modality (p < .001).
  – No main effect of session (p > .05).
  – No interaction (p > .05).
  – Post hoc tests revealed differences between Viz and: IR, AVE, DWT, CWT.

• RT scores revealed:
  – Main effect of fusion: IR significantly closer to 'ideal' timing.

[Chart: estimated marginal means of accuracy by display modality (Viz, IR, AVE, DWT, CWT) for Sessions 1-3; y-axis 0 to 0.3]

33 Target Tracking in Fused Videos: Conclusions I

• The current experimental results reveal two methods for differentiating between fusion schemes: the use of scanpath accuracy and of RTs.

• Fused videos with higher (perceived) quality do not necessarily lead to better tracking performance.

• The AVE and DWT fusion methods were found to perform best in this tracking task. From a subjective point of view, the DWT appeared to create a sequence that was much noisier and with more artefacts than the CWT method.

34 Target Tracking in Fused Videos: Conclusions II

• All of the fusion methods performed significantly better than the inputs, highlighting the advantages of using a fused sequence even when luminance levels are high.

• Results suggest that when luminance is low, any method of attaining additional information regarding the target location will significantly improve upon a visible light camera alone.

35

Experiment 3: Multi-Modal Image Segmentation

Decision: Which objects to segment and how?

36 Multi-Modal Image Segmentation

• Multi-modal sensors and multi-sensor systems produce sets of multi-modal images

• Many applications need good segmentation

• How best to segment a set of multi-modal images?

• To study how fusion affects segmentation

• Previous evaluation methods:
  – subjective
  – based on ground truth

• Need for an objective measure of the quality of segmentation techniques

37 Joint vs. Uni-Modal Segmentation

Two approaches investigated (a hedged sketch follows below):

• Uni-modal segmentation: S1 = σ(I1), …, SN = σ(IN)
  – Each image segmented separately
  – Different segmentations for each image in the set

• Joint segmentation: Sjoint = σ(I1, …, IN)
  – All images in the set contribute to a single segmentation
  – The segmentation accounts for all features from all input images
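A hedged sketch contrasting the two approaches: `segment` is a toy stand-in for whatever segmentation algorithm σ is used, and stacking the inputs as channels is one plausible way to realise joint segmentation.

```python
# Hedged sketch of uni-modal vs. joint segmentation. `segment` is a toy
# stand-in for a real segmentation algorithm sigma.
import numpy as np

def segment(image: np.ndarray, n_levels: int = 4) -> np.ndarray:
    """Toy stand-in for sigma: quantise intensity (averaged over channels
    if multi-channel) into n_levels bands and return a label map."""
    intensity = image if image.ndim == 2 else image.mean(axis=-1)
    norm = (intensity - intensity.min()) / (np.ptp(intensity) + 1e-9)
    return np.minimum((norm * n_levels).astype(int), n_levels - 1)

def unimodal_segmentations(images: list[np.ndarray]) -> list[np.ndarray]:
    # S1 = sigma(I1), ..., SN = sigma(IN): one label map per input image
    return [segment(img) for img in images]

def joint_segmentation(images: list[np.ndarray]) -> np.ndarray:
    # S_joint = sigma(I1, ..., IN): all inputs contribute to one label map
    stacked = np.stack(images, axis=-1)
    return segment(stacked)
```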

38 Uni-Modal and Joint Image Segmentation

[Figure: original IR image in red, original visible image in green, joint segmentation; uni-modal segmentation of each input, and the union of the uni-modal segmentations]

39 Multi-Sensor Image Segmentation Data Set

• To enable objective comparison of different segmentation techniques

• Need some method of finding a "ground truth" for natural images

• The human visual system is good at segmenting images

• The Berkeley Segmentation Database:
  – 1000 natural images
  – 12000 human segmentations

[Martin et al., "A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics", ICCV, 2001]

40 Multi-Sensor Image Segmentation Data Set

• 11 sets of multi-modal images

• 14 IR and 11 grey-scale images

• 33 fused images from 3 pixel-based fusion algorithms:
  – contrast pyramids
  – discrete wavelet transform
  – dual-tree complex wavelet transform

• All images have been segmented by the techniques described, using the same "good" parameters across the whole data set

41 Image Data Set: Examples

Images from the Multi-Sensor Image Segmentation Data Set

42 Experimental Setup

• 63 subjects

• The instructions were to: "Divide each image into pieces, most important pieces first, where each piece represents a distinguished thing in the image. The number of things in each image is completely up to you. Something between 2 and 20 is usually reasonable. Take care and try and be as accurate as possible."

• 5 images segmented by each subject

• Images pseudo-randomly distributed so that:
  – each subject sees only one image from each set
  – they see at least one IR, one visible and one fused image
  – an image is not distributed a second time unless all images have been distributed once; etc.

43 The Segmentation Tool

The Berkeley Segmentation Tool (SegTool)

44 The Human Segmentations

• 315 human segmentations produced

• ~20 rejected as obviously wrong

• 5-6 segmentations for each image

• 1 expert segmentation for each image

The human segmentations are available to download from www.ImageFusion.org

45 Examples of Human Segmentations

[Figure: human segmentations of the "UN Camp" CWT fused image by Users 5, 15, 35, 39, 54 and 61]

46 Segmentation Error Measure I

We adopt the approach used with the Berkeley Segmentation Dataset:

• Precision, P: the fraction of detections that are true positives rather than false positives

• Recall, R: the fraction of true positives that are detected rather than missed

• The F-measure is a weighted harmonic mean:
  F = PR / (αR + (1 − α)P), with α = 0.5 used here
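In code, the weighted harmonic mean above is simply the following; with α = 0.5 it reduces to the familiar 2PR / (P + R).

```python
# The weighted harmonic mean from the slide: F = P*R / (alpha*R + (1-alpha)*P).
def f_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (precision * recall) / (alpha * recall + (1.0 - alpha) * precision)

# Example: P = 0.8, R = 0.6  ->  F = 0.48 / 0.70 ≈ 0.686
print(f_measure(0.8, 0.6))
```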

47 Segmentation Error Measure II

• Correspondences computed by:
  – comparing the segmentation to each human segmentation of that image
  – computing the correspondence as a minimum-cost bipartite assignment problem (a hedged sketch follows below)
  – averaging the scores to give a single P, R and F value for each image

• The measure tolerates localisation errors and finds explicit correspondences only
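A hedged sketch of the correspondence step: matching boundary pixels between a machine and a human segmentation as a minimum-cost assignment via SciPy's Hungarian solver. The distance tolerance and cost construction are illustrative assumptions, not the exact benchmark implementation used with the Berkeley data.

```python
# Hedged sketch: boundary correspondence as a minimum-cost bipartite
# assignment. Tolerance and cost construction are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_boundaries(machine_pts: np.ndarray, human_pts: np.ndarray,
                     max_dist: float = 3.0) -> int:
    """machine_pts, human_pts: (N,2) and (M,2) arrays of boundary pixel coords.
    Returns the number of matched pairs within the localisation tolerance."""
    cost = cdist(machine_pts, human_pts)           # pairwise distances
    cost[cost > max_dist] = 1e6                    # forbid distant matches
    rows, cols = linear_sum_assignment(cost)       # min-cost bipartite matching
    matched = int(np.sum(cost[rows, cols] <= max_dist))
    # precision = matched / len(machine_pts); recall = matched / len(human_pts)
    return matched
```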

48 Analysis of Human Segmentations

49 Examples of Automatic and Human Segmentations I

Images from the Multi-Sensor Image Segmentation Data Set

50 Examples of Automatic and Human Segmentations II

Images from the Multi-Sensor Image Segmentation Data Set

51 Joint vs. Uni-Modal Segmentation (Original Images)

52 Multi-Sensor Image Segmentation: Results

• Using the human segmentations as "ground truth" for evaluation:
  – Found UoB_Uni to give the best segmentations of the uni-modal techniques
  – Found the joint segmentations to be better than the uni-modal segmentations of the original images
  – Found the joint segmentations to be at least as good as the uni-modal segmentations of the fused images

• The relevance of these results to region-based fusion was confirmed

"Joint- versus Uni-Modal Segmentation for Region-Based Image Fusion", J. J. Lewis, S. G. Nikolov, A. Toet, D. R. Bull and C. N. Canagarajah, Proceedings of Fusion 2006

53 Multi-Sensor Image Segmentation: Work in Progress

• Recent results indicate that schemes for fusion of visible and IR imagery should prioritise terrain features from the visible imagery and man-made targets from the IR imagery in the fusion process, in order to produce a fused image that is optimally tuned to human visual cognition and decision making

• By comparing the human segmentations of the input images to the human segmentations of the fused images, we can hopefully study how image fusion affects segmentation decisions

54 Summary I

• Multi-sensor image fusion affects decision making in various ways

• By applying tasks to the image fusion assessment process, it has been found that DT-CWT fusion can lead to better human target-detection performance than the AVE, pyramid and DWT methods

• In addition, the objective tasks utilised have been shown to produce very different patterns of results from comparative subjective tasks

55 Summary II

• Fused videos with higher (perceived) quality do not necessarily lead to better tracking performance

• In most cases there are significant advantages to using a fused video sequence for target tracking, even at HL levels and more so at LL levels

• Using the Multi-Sensor Segmentation Data Set, we are trying to produce fused images that are optimally tuned to human visual cognition and decision making, and to study how image fusion affects segmentation decisions

56 Acknowledgements

• NATO and the ARW organisers

• The Data and Information Fusion Defence Technology Centre (DIF-DTC), UK, for partially funding this research

• The Image Fusion Toolbox (IFT) and the Video Fusion Toolbox (VFT) development team at the University of Bristol

• Lex Toet (TNO Defence and Security, The Netherlands), Dave Dwyer (Octec Ltd, UK) and Equinox Corp (USA) for providing some of the image sequences used in this study (all these image sequences are available through www.ImageFusion.org)

• The Eden Project in Cornwall