active object recognition under gaze control

37
Computational Vision and Active Perception Active Object Recognition under Gaze Control Active Object Recognition under Gaze Control Jan-Olof Eklundh, Mårten Björkman CVAP, KTH, Stockholm

Upload: others

Post on 28-Dec-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Active Object Recognition under Gaze Control

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Active Object Recognition under Gaze Control

Active Object Recognition under Gaze Control

Jan-Olof Eklundh, Mårten BjörkmanCVAP, KTH, Stockholm

Page 2: Active Object Recognition under Gaze Control

2

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

The KTH Head of 1991The KTH Head of 1991

Independent eye and neck movementsEye rotations around optical centerEccentric neckDrive towards symmetry constrained redundancyMonocular stabilization and pursuit, binocular stereopsis and accommodation independent but integratedBinocular fixation at lateral speeds up to 115°/s,5 m/s in depth. Saccades up to 360°/s

Page 3: Active Object Recognition under Gaze Control

3

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

QuickTime™ and aH.261 decompressor

are needed to see this picture.

Page 4: Active Object Recognition under Gaze Control

4

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

QuickTime™ and aH.261 decompressor

are needed to see this picture.

Page 5: Active Object Recognition under Gaze Control

5

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

What did this system “see”?What did this system “see”?

High performance through tightly integrated hardware. No resources leftInformation about ego-motion, independent object motion and depth available, but couldn’t be utilizedAppearance of 3D objects too, as also to some extent poseGoal of current work to do that in visual search and hand-eye coordination tasks

Page 6: Active Object Recognition under Gaze Control

6

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

S-o-t-a object classificationS-o-t-a object classification

Page 7: Active Object Recognition under Gaze Control

7

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

A robot looking at a table at 1.5 mA robot looking at a table at 1.5 m

Objects subtend only a fraction of the scene and are not centered (unless attentional step)

Page 8: Active Object Recognition under Gaze Control

8

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Desirable system structureDesirable system structure

where ”what”attention segmentation

recognitionwhat

Run concurrently. Motion powerful in bootstrapping,but static objects often as important.

Page 9: Active Object Recognition under Gaze Control

9

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

With stereo and motionWith stereo and motion

Page 10: Active Object Recognition under Gaze Control

10

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

What the system “sees”What the system “sees”

Page 11: Active Object Recognition under Gaze Control

11

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion F-g-s by integration of multiple cues

from motion and appearanceF-g-s by integration of multiple cues

from motion and appearance

Original Foreground mask

QuickTime™ and aYUV420 codec decompressor

are needed to see this picture.

Page 12: Active Object Recognition under Gaze Control

12

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

The cuesThe cues

Motion

Colour Prediction

Texture(contrast)

Combined

Page 13: Active Object Recognition under Gaze Control

13

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion F-g-s by integration of multiple cues

from motion and appearanceF-g-s by integration of multiple cues

from motion and appearance

Original Foreground mask

QuickTime™ and aYUV420 codec decompressor

are needed to see this picture.

Page 14: Active Object Recognition under Gaze Control

14

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Typical static scenesTypical static scenes

Page 15: Active Object Recognition under Gaze Control

15

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion A system for searching for

static objectsA system for searching for

static objectsA wide field of view for attentionRecognition in foveated viewSteps:

Divide scene into depth layersSelect candidate objects through attentionFixate and track objects of potential interestRecognize/classify objects in foveal view, possibly after a second binocularly based segmentation

Technically: two pairs of stereo camerasProblem: transfer of views

Page 16: Active Object Recognition under Gaze Control

16

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Flow of informationFlow of information

Left

Segmentation Global hue SIFT features Local hue

Attention

Gaze direction

Recognition

RegistrationRegistration

Left RightRight

FixationCalibration

Wide field Foveal

Page 17: Active Object Recognition under Gaze Control

17

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Stereo computationsStereo computations

Relative orientations have to be known to• relate disparities to depths• simplify estimation of disparities

∆x∆y

(1+x )α – y rxy α + r + x r

2

z

zy

= + 1z

1 – x t- y t

Using corner features and optical flow model

Unstable process => use robust methodsFirst assume r and r to be zeroOn-line calibration allows the use of expected retinal size

z y

Page 18: Active Object Recognition under Gaze Control

18

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Figure-ground segmentationFigure-ground segmentation

Disparity map is sliced into layers.Widths are set after objects searched for

Page 19: Active Object Recognition under Gaze Control

19

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Figure-ground segmentationFigure-ground segmentation

BinoCues BinoAttn

Page 20: Active Object Recognition under Gaze Control

20

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Appearance based attentionAppearance based attention

Local hue histograms correlated with that of requested object.

Fast implementation using rotating sums.

Page 21: Active Object Recognition under Gaze Control

21

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Saliency peaksSaliency peaks

Peaks from blob detection of depth slices.Based on Differences of Gaussians.Hue saliency map used for weighting.Random value added before selection.Inhibition on return

Page 22: Active Object Recognition under Gaze Control

22

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

FixationFixation

0 0 c0 0 da b e

F =

The foveal system continuously tries to fixate• done using corner features• and affine essential matrix

Zero disparity filters won’t work

Page 23: Active Object Recognition under Gaze Control

23

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Foveated segmentationFoveated segmentation

To boost the ensuing recognition/classification

• Foveal segmentation based on disparities• Rectification using affine fundamental matrix

• Only search for disparities around zero => Large number of false positives

• Points clustered in 3D using mean shift

Page 24: Active Object Recognition under Gaze Control

24

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Foveated segmentationFoveated segmentation

Page 25: Active Object Recognition under Gaze Control

25

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Foveated segmentationFoveated segmentation

Page 26: Active Object Recognition under Gaze Control

26

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion Small example object database in

real-time experiments - in total 24Small example object database in real-time experiments - in total 24

Here models of SIFT features and hue histograms. Texture descriptors also included now.

Page 27: Active Object Recognition under Gaze Control

27

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion Visual scene searchVisual scene search

QuickTime™ and aYUV420 codec decompressor

are needed to see this picture.

Page 28: Active Object Recognition under Gaze Control

28

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Segmentation robustnessSegmentation robustness

Page 29: Active Object Recognition under Gaze Control

29

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Effect of occlusionsEffect of occlusions

Page 30: Active Object Recognition under Gaze Control

30

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Effect of rotationsEffect of rotations

Page 31: Active Object Recognition under Gaze Control

31

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Recognition experimentsRecognition experiments

24 objectsLearned over a range of views, represented by two featuresArranged in 24 “scenes”“Is X in the scene?”3 fixations allowed

Page 32: Active Object Recognition under Gaze Control

32

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion SIFT features in wide field, no

disparity based segmentation SIFT features in wide field, no disparity based segmentation

Page 33: Active Object Recognition under Gaze Control

33

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Colour cue, wide field vs wide + central field disparity segmentation

Colour cue, wide field vs wide + central field disparity segmentation

Page 34: Active Object Recognition under Gaze Control

34

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

SIFT features, wide field vs wide + central field disparity segmentation

SIFT features, wide field vs wide + central field disparity segmentation

Page 35: Active Object Recognition under Gaze Control

35

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Wide field + central disparity based segmentation, combined features

Wide field + central disparity based segmentation, combined features

Page 36: Active Object Recognition under Gaze Control

36

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

ConclusionsConclusions

Gaze control essential. In fact, many current methods assume foveation or something with similar effects3D cues powerful for figure-ground segmentation (informs about the scene)3D cues thereby also support recognition and categorizationIntegration of multiple cues essential

Page 37: Active Object Recognition under Gaze Control

37

Com

puta

tiona

l Vis

ion

and

Activ

e Pe

rcep

tion

Comments. Future workComments. Future work

We have a running system, that normally finds objects within three saccadesExperiments tedious (learning, scene setups)More cues being added, especially textureFocus on classification and eventually categorizationApplications to hand-eye coordination and manipulationPotential for computing both local and global shape properties